Using Class Based Document Frequency to Select Features in Text Classification
Authors: Baoli Li, Qiuling Yan, Liping Han. 《国际计算机前沿大会会议论文集》, 2015, Issue B12, pp. 50-52 (3 pages)
Document Frequency (DF) is reported to be a simple yet quite effective measure for feature selection in text classification, which is a key step in processing big textual data collections. The calculation is based on how many documents in a collection contain a feature, which can be a word, a phrase, an n-gram, or a specially derived attribute. It is an unsupervised and class-independent metric. Features with the same DF value may have quite different distributions over categories, and thus different discriminative power. For example, in a binary classification problem, if feature A appears in only one category while feature B, which has the same DF value as feature A, is evenly distributed across both categories, then feature A is obviously more effective than feature B for classification. To overcome this weakness of the original document frequency feature selection metric, we propose a class-based document frequency strategy that refines the original DF to some extent. Extensive experiments on three text classification datasets demonstrate the effectiveness of the proposed measures.
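The feature A versus feature B example in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy documents, the labels, and the helper names `document_frequency`, `class_document_frequency`, and `max_class_df` are hypothetical, and the "largest per-class DF" score is a stand-in chosen for illustration, not the measure proposed in the paper (the abstract does not give its formula).

```python
from collections import Counter, defaultdict


def document_frequency(docs):
    """Plain DF: for each feature, the number of documents containing it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a feature counts once per document
    return df


def class_document_frequency(docs, labels):
    """Per-class DF: for each feature, the number of documents of each class containing it."""
    cdf = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for feature in set(tokens):
            cdf[feature][label] += 1
    return cdf


def max_class_df(cdf, feature):
    """Illustrative score (an assumption, not the paper's measure): largest per-class DF."""
    return max(cdf[feature].values())


# Toy binary problem: "a" occurs only in class "pos"; "b" is split evenly.
docs = [["a", "b"], ["a", "x"], ["b", "y"], ["z"]]
labels = ["pos", "pos", "neg", "neg"]

df = document_frequency(docs)
cdf = class_document_frequency(docs, labels)

print(df["a"], df["b"])                              # same plain DF for both features
print(dict(cdf["a"]), dict(cdf["b"]))                # different per-class distributions
print(max_class_df(cdf, "a"), max_class_df(cdf, "b"))
```

Plain DF cannot separate the two features, since both occur in two documents. The per-class counts show that "a" is concentrated in one category while "b" is spread across both, which is the distinction a class-based refinement of DF is meant to capture.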
Keywords: DOCUMENT FREQUENCY DIFFERENT DISTRIBUTION