
An Effective Unsupervised Feature Computing Model

Cited by: 2
Abstract: This paper proposes an unsupervised feature extraction method based on a semantic explicit quantum entangling model and a latent quantum co-occurrence model. The method addresses the lack of support for incremental and distributed computation in current text clustering methods, and lays a foundation for large-scale text clustering in the Internet environment as well as for single-document and multi-document summarization. Theoretical analysis and numerical experiments show that the model requires no domain knowledge base and still maintains good clustering performance after removing about 96% of the redundant information.
Source: Journal of Jilin University: Science Edition (indexed in CAS, CSCD, and the Peking University Core Journals list), 2010, No. 1, pp. 79-84 (6 pages)
Funding: National Key Basic Research Development Program of China (973 Program), Grant No. 2004CB318000
Keywords: unsupervised; feature extraction; entangling model; window function