Improved suffix tree clustering for Uyghur text
Cited by: 2
Abstract: When suffix tree clustering selects base clusters, the resulting base-cluster phrases can be ill-formed (not conforming to Uyghur grammar), repeated, and redundant. To solve these problems, an improved Suffix Tree Clustering (STC) algorithm is proposed. First, a phrase mutual information algorithm improves base cluster selection so that the chosen base-cluster phrases obey Uyghur grammar rules. Second, a phrase reduction (merging) algorithm based on Uyghur grammar merges repeated phrases among the selected base clusters. Third, building on the first two steps, a phrase redundancy-removal algorithm based on Uyghur grammar removes redundant base-cluster phrases. Experimental results show that, compared with traditional STC, the improved algorithm achieves higher recall and precision, indicating that it effectively improves clustering performance.
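The abstract names the three steps but gives no formulas, so the following Python sketch shows only one plausible reading, under loudly stated assumptions. (1) Word n-grams shared by at least `min_docs` documents stand in for STC base clusters (in STC, a base cluster is the set of documents sharing a phrase), and "phrase mutual information" is approximated by the weakest pointwise mutual information PMI(x, y) = log(P(x, y) / (P(x)P(y))) between adjacent words of a phrase; the Uyghur grammar rules are not modelled at all. (2) "Repeated" phrases are assumed to be phrases covering exactly the same document set. (3) A phrase is assumed redundant when a longer phrase containing it still covers at least `ratio` of its documents. All names and thresholds (`select_phrases`, `merge_repeats`, `drop_redundant`, `improved_base_clusters`, `min_pmi`, `ratio`) are hypothetical and are not the authors' code.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """All contiguous n-word phrases in one tokenized document."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def adjacent_pmi(docs):
    """PMI(x, y) = log(P(x, y) / (P(x) P(y))) for every adjacent word pair in the corpus."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(ngrams(tokens, 2))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    return {
        (x, y): math.log((c / n_bi) / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))
        for (x, y), c in bigrams.items()
    }


def select_phrases(docs, max_len=3, min_docs=2, min_pmi=0.0):
    """Step 1 (assumed stand-in): multiword phrases shared by >= min_docs documents
    whose weakest adjacent-word PMI is >= min_pmi. Maps phrase -> frozenset of doc ids."""
    pmi = adjacent_pmi(docs)
    coverage = defaultdict(set)  # phrase -> ids of documents containing it
    for doc_id, tokens in enumerate(docs):
        for n in range(2, max_len + 1):
            for phrase in ngrams(tokens, n):
                coverage[phrase].add(doc_id)
    return {
        p: frozenset(ids)
        for p, ids in coverage.items()
        if len(ids) >= min_docs and min(pmi[pair] for pair in zip(p, p[1:])) >= min_pmi
    }


def merge_repeats(phrases):
    """Step 2 (assumed criterion): phrases covering the identical document set are
    repeats; keep one label per set, preferring the longest phrase."""
    best = {}
    for phrase, ids in phrases.items():
        if ids not in best or (len(phrase), phrase) > (len(best[ids]), best[ids]):
            best[ids] = phrase
    return {phrase: ids for ids, phrase in best.items()}


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous sub-phrase of `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def drop_redundant(phrases, ratio=0.8):
    """Step 3 (assumed criterion): drop a phrase when some longer phrase in the set
    contains it and still covers at least `ratio` of its documents."""
    return {
        p: ids
        for p, ids in phrases.items()
        if not any(
            len(q) > len(p) and contains(q, p) and len(q_ids) >= ratio * len(ids)
            for q, q_ids in phrases.items()
        )
    }


def improved_base_clusters(docs, **step1_options):
    """Hypothetical pipeline: select phrases, merge repeats, then drop redundant ones."""
    return drop_redundant(merge_repeats(select_phrases(docs, **step1_options)))


if __name__ == "__main__":
    # Placeholder tokens stand in for whitespace-tokenized Uyghur words.
    corpus = [s.split() for s in ["a b c d", "a b c e", "x a b y", "e f g", "h f g"]]
    for phrase, doc_ids in improved_base_clusters(corpus).items():
        print(" ".join(phrase), sorted(doc_ids))
```

Brute-force n-gram enumeration keeps the sketch short; STC proper finds shared phrases with a generalized suffix tree, which can be built in time linear in the total text length (for example with Ukkonen's algorithm).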
Source: Journal of Computer Applications (CSCD; Peking University Core Journal), 2012, No. 4, pp. 1078-1081 (4 pages)
Funding: National Natural Science Foundation of China (60963017); National Social Science Fund of China (10BTQ045, 11XTQ007); Xinjiang University Doctoral Fund (BS100120)
Keywords: Uyghur; Suffix Tree (ST); Mutual Information (MI); reduction (merging); redundancy

References (14)

  • [1] ZAMIR O, ETZIONI O, MADANI O, et al. Fast and intuitive clustering of Web documents[C]//Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining. New York: AAAI Press, 1997: 287-290.
  • [2] HONG YI, SAM K. Learning assignment order of instances for the constrained K-means clustering algorithm[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2009, 39(2): 568-574.
  • [3] HALL L O, GOLDGOF D B. On convergence properties of the single-pass and online fuzzy c-means algorithm[C]//2010 IEEE International Conference on Fuzzy Systems. Washington, DC: IEEE, 2010: 1-3.
  • [4] AIOLLI F, SAN-MARTINO G, HAGENBUCHNER M, et al. Learning nonsparse kernels by self-organizing maps for structured data[J]. IEEE Transactions on Neural Networks, 2009, 20(12): 1938-1949.
  • [5] ZAMIR O, ETZIONI O. Web document clustering: A feasibility demonstration[C]//SIGIR '98: Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM Press, 1998: 46-54.
  • [6] CHEN CHUNXI, BERTIL S. Parallel construction of large suffix trees on a PC cluster[C]//Euro-Par 2005 Parallel Processing: 11th International Euro-Par Conference. Berlin: Springer, 2005: 1227-1236.
  • [7] WANG JUNZE, MO YIJUN, HUANG BENXIONG, et al. Web search results clustering based on a novel suffix tree structure[C]//Autonomic and Trusted Computing: 5th International Conference. Berlin: Springer, 2008: 540-554.
  • [8] KOPIDAKI S, PAPADAKOS P, TZITZIKAS Y. STC+ and NM-STC: Two novel online results clustering methods for Web searching[C]//WISE 2009: 10th International Conference. Berlin: Springer, 2009: 523-537.
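  • [9] DU HONGBIN, XIA KEWEN, LIU NANPING, WU TAO. An improved text clustering algorithm based on generalized suffix tree[J]. Information and Control, 2009, 38(3): 331-336. (in Chinese)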
  • [10] HAN WEN, GUO-SHUN HUANG, ZHAO LI. Clustering Web search results using semantic information[C]//Proceedings of the Eighth International Conference on Machine Learning and Cybernetics. Liverpool: World Academic Press, 2009: 1504-1509.
