An EMD-Based Metric for Document Semantic Similarity (cited by: 3)
Abstract: Existing EMD (Earth Mover's Distance)-based measures of document semantic similarity do not satisfy the metric axioms, which hinders their wider use in information retrieval and data mining. To address this, the paper proposes a new EMD-based metric for document semantic similarity, Mdss_EMD (Metric for Document Semantic Similarity based on EMD). First, after analyzing the drawbacks of EMD and its existing modifications, the concepts of document width and virtual term are introduced. Next, virtual terms are added to the initial document vectors to align their total weights, so that all metric axioms are satisfied. Finally, to improve the adaptability and processing speed of the metric, the similarity distance of the virtual term is made elastic and the EMD algorithm is simplified. The approach extends EMD to metric spaces and substantially improves its indexing capability and accuracy. Preliminary experiments indicate that the overall performance of Mdss_EMD is better than that of the original EMD and other existing similar methods.
Source: Journal of Electronics & Information Technology (《电子与信息学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2008, No. 9, pp. 2156-2161 (6 pages)
Keywords: information retrieval; EMD (Earth Mover's Distance); metric; document similarity; matching; semantic distance
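
The weight-alignment idea in the abstract can be illustrated with a small sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: both documents are weight vectors over one shared vocabulary, each is padded with a single virtual term up to a fixed common total weight (`total`, a hypothetical stand-in for whatever role the paper's "document width" plays), and the virtual term sits at one constant distance `virtual_dist` from every real term. The helper names `min_cost_transport` and `padded_emd` are likewise hypothetical. This is not the Mdss_EMD algorithm itself, and it omits the paper's elastic virtual-term distance and its simplification of EMD.

```python
def min_cost_transport(supply, demand, cost, eps=1e-12):
    """Cost of the cheapest flow shipping `supply` onto `demand`.

    Assumes sum(supply) == sum(demand). Uses successive shortest paths
    with Bellman-Ford, which is adequate for small illustrative inputs.
    """
    n, m = len(supply), len(demand)
    src, snk, size = n + m, n + m + 1, n + m + 2
    graph = [[] for _ in range(size)]  # node -> indices into `edges`
    edges = []  # [head, residual capacity, unit cost]; edge e ^ 1 is its reverse

    def add_edge(u, v, cap, c):
        graph[u].append(len(edges))
        edges.append([v, cap, c])
        graph[v].append(len(edges))
        edges.append([u, 0.0, -c])

    for i in range(n):
        add_edge(src, i, supply[i], 0.0)
        for j in range(m):
            add_edge(i, n + j, float("inf"), cost[i][j])
    for j in range(m):
        add_edge(n + j, snk, demand[j], 0.0)

    total = 0.0
    while True:
        # Bellman-Ford: cheapest residual path from source to sink.
        dist = [float("inf")] * size
        prev = [-1] * size
        dist[src] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if dist[u] == float("inf"):
                    continue
                for e in graph[u]:
                    v, cap, c = edges[e]
                    if cap > eps and dist[u] + c < dist[v] - eps:
                        dist[v], prev[v], changed = dist[u] + c, e, True
            if not changed:
                break
        if prev[snk] == -1:  # no augmenting path left: the flow is complete
            return total
        # Find the bottleneck capacity along the path, then push that much.
        push, v = float("inf"), snk
        while v != src:
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = snk
        while v != src:
            edges[prev[v]][1] -= push
            edges[prev[v] ^ 1][1] += push
            v = edges[prev[v] ^ 1][0]
        total += push * dist[snk]


def padded_emd(wp, wq, ground, virtual_dist, total):
    """EMD after padding both weight vectors to `total` with a virtual term.

    `ground[i][j]` is the distance between vocabulary terms i and j.
    Assumption (not from the paper): the virtual term is `virtual_dist`
    away from every real term and 0 away from itself.
    """
    if sum(wp) > total or sum(wq) > total:
        raise ValueError("total must be at least each document's weight")
    k = len(ground)
    supply = list(wp) + [total - sum(wp)]  # last entry: virtual term weight
    demand = list(wq) + [total - sum(wq)]
    cost = [list(row) + [virtual_dist] for row in ground]
    cost.append([virtual_dist] * k + [0.0])
    return min_cost_transport(supply, demand, cost) / total


# Three terms on a line; documents with different total weights.
ground = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
p, q, r = [2, 0, 0], [0, 1, 0], [0, 0, 3]
for a, b in [(p, q), (q, r), (p, r)]:
    print(a, b, padded_emd(a, b, ground, virtual_dist=1.0, total=4))
```

Under these assumptions, every padded vector carries the same total weight, so the transport cost is a distance between equal-mass distributions. For the triangle inequality to hold in this sketch, the extended ground distance must itself be a metric, which requires `ground` to be a metric and `virtual_dist` to be at least half the largest ground distance.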