
Integrating Class Frequency Into Association Rules Based Chinese Text Categorization

Cited by: 12
Abstract: This paper proposes ARCTC, an algorithm that combines term class frequency with association-rule-based Chinese text categorization. The algorithm treats each document as a transaction and each keyword as an item. To suit the characteristics of text transactions, it uses the class frequency of a term to filter out words that have little relevance to classification, and then applies an improved association rule mining algorithm to mine the correlations between items and categories. The mined rules are used to form a set of class feature words for each category. An unlabeled document is classified by intersecting its set of words with each of these sets; the category whose intersection has the most elements is the one assigned. Experiments show that the algorithm reduces both training and test time while achieving good recall, precision, and F-Measure.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core; 2004, No. 6, pp. 30-36 (7 pages)
Funding: Supported by the Ministry of Science and Technology project "Research on Key Technologies and Application Systems for Science and Technology E-Government Systems" (2001BA110B01)
Keywords: computer application; Chinese information processing; association-based classification; Chinese text categorization; term class frequency; class feature term set
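
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's ARCTC implementation: the abstract does not give the exact class-frequency criterion or the improved mining algorithm, so the sketch assumes a simple stand-in. A term is kept as a feature of a class only if it occurs in at most `max_class_freq` classes and in at least `min_support` training documents of that class. The function names, both thresholds, and the toy data are hypothetical, and documents are assumed to be already segmented into terms.

```python
from collections import Counter, defaultdict


def class_frequency(docs):
    """Map each term to the set of classes in which it occurs."""
    classes_of = defaultdict(set)
    for terms, label in docs:
        for term in set(terms):
            classes_of[term].add(label)
    return classes_of


def build_feature_sets(docs, max_class_freq=1, min_support=2):
    """Build one feature-word set per class (assumed simplification).

    A term is kept for a class when it appears in at most
    max_class_freq classes (class-frequency filter) and in at least
    min_support documents of that class (single-item rule support).
    """
    classes_of = class_frequency(docs)
    doc_count = Counter()  # (term, label) -> number of documents
    for terms, label in docs:
        for term in set(terms):
            doc_count[(term, label)] += 1

    features = defaultdict(set)
    for (term, label), count in doc_count.items():
        if len(classes_of[term]) <= max_class_freq and count >= min_support:
            features[label].add(term)
    return dict(features)


def classify(terms, features):
    """Return the class whose feature set shares the most terms with
    the document, or None when no feature word matches."""
    if not features:
        return None
    words = set(terms)
    overlap = {label: len(words & fs) for label, fs in features.items()}
    best = max(overlap, key=overlap.get)
    return best if overlap[best] > 0 else None


# Toy training data: each document is (list of terms, class label).
train = [
    (["stock", "market", "bank", "report"], "finance"),
    (["bank", "loan", "market", "report"], "finance"),
    (["match", "goal", "team", "report"], "sports"),
    (["team", "goal", "coach", "report"], "sports"),
]
features = build_feature_sets(train)
label = classify(["bank", "market", "news"], features)
```

In this toy data, "report" occurs in both classes and is removed by the class-frequency filter, while terms seen in only one document of a class fall below the support threshold. The remaining feature sets are small, which is consistent with the abstract's claim that filtering shortens both training and classification.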

