
Study on text clustering based on ontology and similarity (基于本体及相似度的文本聚类研究)
Times cited: 9
Abstract: Conventional text clustering ignores the intension of concepts and the relations between them. To improve clustering quality and obtain satisfactory results, this paper designs and improves TCBOS (text clustering based on ontology and similarity). Organizing each text as an ontology makes the meanings of concepts and the relations between them easy to represent. The paper studies methods for text preprocessing and word segmentation, designs a method that uses finite-state automata to extract concepts and relations automatically, and improves the semantic expansion of concepts and the similarity computation. The closeness of two documents is measured by the semantic similarity of their ontologies, and the K-medoids algorithm is refined to cluster texts according to this similarity. Experiments show that the method avoids the isolated-term and high-dimensionality problems of term-based representations and improves clustering quality in terms of accuracy and association degree, offering a way to analyze and recommend texts automatically.
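The abstract names the pipeline (ontology-based semantic similarity between documents, then K-medoids over that similarity) but gives no formulas. The following is a minimal Python sketch of that pipeline under loudly stated assumptions: the toy is-a hierarchy `PARENT`, the Wu-Palmer concept measure `concept_sim`, the average best-match document measure `doc_sim`, and the first-k medoid initialization are all hypothetical stand-ins, not the authors' TCBOS measures. It only illustrates how a concept-level similarity can be lifted to documents and fed to K-medoids as the distance 1 - similarity.

```python
# Hypothetical toy is-a hierarchy (child -> parent); a stand-in for a text ontology.
PARENT = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "entity",
    "car": "vehicle",
    "truck": "vehicle",
}


def path_to_root(concept):
    """Return [concept, parent, ..., root] by following is-a links upward."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth of a concept, counting the root as depth 1."""
    return len(path_to_root(concept))


def concept_sim(a, b):
    """Assumed concept measure (Wu-Palmer): 2*depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # The first ancestor of a that is also an ancestor of b is the lowest common subsumer.
    lcs = next(c for c in path_to_root(a) if c in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def doc_sim(doc_a, doc_b):
    """Assumed document measure: symmetric average of best-match concept similarities."""
    def directed(x, y):
        return sum(max(concept_sim(c, d) for d in y) for c in x) / len(x)
    return (directed(doc_a, doc_b) + directed(doc_b, doc_a)) / 2


def k_medoids(docs, k, max_iter=100):
    """Alternating K-medoids on distance = 1 - doc_sim; returns {medoid: [member indices]}."""
    n = len(docs)
    dist = [[1.0 - doc_sim(docs[i], docs[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # naive deterministic initialization: the first k documents
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: every medoid keeps itself; other documents join the nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: in each cluster, the member with the smallest total distance becomes the medoid.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][o] for o in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break  # medoids are stable, so the assignment will not change
        medoids = new_medoids
    return clusters


docs = [{"dog", "cat"}, {"dog", "animal"}, {"car", "truck"}, {"car", "vehicle"}]
clusters = k_medoids(docs, k=2)
print(sorted(sorted(members) for members in clusters.values()))  # → [[0, 1], [2, 3]]
```

Because the clustering step only consumes a distance matrix, a different concept measure (for example one that also weights non-taxonomic relations) could be tried by replacing `concept_sim` alone.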
Authors: 王刚 (Wang Gang), 邱玉辉 (Qiu Yuhui)
Source: Application Research of Computers (《计算机应用研究》), 2010, No. 7, pp. 2494-2497 (4 pages). Indexed in CSCD and the Peking University core journal list (北大核心).
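Funding: Shaanxi Provincial Department of Education project (09JK317); Research on key problems and applications of intelligent information processing technology (2008akxy005); Research on ontology-based services (AYQDZR200916)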
Keywords: ontology; similarity; text clustering; semantic