Abstract
This paper addresses two problems: knowledge organization models based on metadata or traditional topic maps cannot represent knowledge at multiple levels and granularities, and the low accuracy of existing similarity algorithms degrades the quality of knowledge fusion. It proposes an extended topic map similarity algorithm (ETMSC) for multi-source knowledge fusion, together with three principles for selecting the threshold: correlation, level correspondence, and experimental determination. The algorithm combines comprehensive information theory with the structural characteristics and semantic information of extended topic maps. It integrates syntactic, semantic, and pragmatic similarity: it extends the similarity of the compositional structure among topic map elements, and it also takes full account of the similarity of their meaning and of the context in which they occur. The criterion for judging topic map similarity depends on a threshold, and the choice of threshold depends on the data set. Experimental results show that, compared with similarity algorithms based purely on syntax or purely on semantics, ETMSC improves accuracy, measured by F-measure, by 9.2% to 11.1%.
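The abstract gives no formulas, so the following is only a minimal sketch of the general idea: combine a syntactic, a semantic, and a pragmatic similarity score into one value and compare it with a data-set-dependent threshold. The linear weighting, the equal default weights, and the names `SimilarityScores`, `combined_similarity`, and `should_merge` are assumptions made for illustration, not the ETMSC definition. `f_measure` is the standard harmonic mean of precision and recall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimilarityScores:
    """Per-aspect similarity of two topic map elements, each in [0, 1]."""
    syntactic: float  # similarity of compositional structure
    semantic: float   # similarity of meaning
    pragmatic: float  # similarity of the context of use


def combined_similarity(scores, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of the three aspects (assumed linear combination)."""
    w_syn, w_sem, w_prag = weights
    total = w_syn + w_sem + w_prag
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = (w_syn * scores.syntactic
                + w_sem * scores.semantic
                + w_prag * scores.pragmatic)
    return weighted / total


def should_merge(scores, threshold, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Treat two elements as similar when the combined score reaches the
    threshold. The threshold is expected to be tuned per data set."""
    return combined_similarity(scores, weights) >= threshold


def f_measure(precision, recall):
    """Standard F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In this sketch a pair scoring 0.9, 0.6, and 0.3 on the three aspects would be merged at a threshold of 0.5 but not at 0.7, which shows why the threshold has to be determined experimentally for each data set.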
Source
Journal of Xi'an Jiaotong University (《西安交通大学学报》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2010, No. 2, pp. 20-24 (5 pages)
Funding
National High Technology Research and Development Program of China (2008AA01Z131)
National Natural Science Foundation of China (60803162)
Keywords
knowledge fusion
topic map
similarity algorithm