Abstract
XML documents are semi-structured, and the traditional tf-idf method considers only how often a keyword occurs in a document while ignoring the semantic information carried by the nodes of an XML document. To address this, we use the vector space model to design a relevance ranking strategy for XML keyword query results. The relevance computation takes into account how well each node distinguishes a document, how explicitly a node describes the document, and how directly a node describes the document. This improves the accuracy of the node weight measure, so that the most relevant information is returned to the user. Experiments on the DBLP dataset verify the effectiveness of the method.
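The abstract does not give the weighting formulas, so the following is only a minimal sketch of the general idea: a vector space model in which each term occurrence is scaled by a weight for the node it appears in, rather than counted equally as in plain tf-idf. The names `NODE_WEIGHT`, `DEFAULT_WEIGHT`, `weighted_tf`, `idf_table`, `cosine` and `rank`, the constant weights, the smoothed idf formula, and the two sample records are all assumptions made for illustration; they are not taken from the paper, which derives node weights from the three factors listed above.

```python
import math
from collections import Counter

# Hypothetical node weights: how strongly each element type is assumed to
# describe its document. Placeholder constants, not the paper's values.
NODE_WEIGHT = {"title": 1.0, "author": 0.8, "booktitle": 0.4}
DEFAULT_WEIGHT = 0.2


def weighted_tf(doc):
    """doc is a list of (tag, text) pairs; returns term -> node-weighted frequency."""
    tf = Counter()
    for tag, text in doc:
        w = NODE_WEIGHT.get(tag, DEFAULT_WEIGHT)
        for term in text.lower().split():
            tf[term] += w  # each occurrence counts as the weight of its node
    return tf


def idf_table(docs):
    """Smoothed inverse document frequency over the whole collection."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update({t for _, text in doc for t in text.lower().split()})
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs):
    """Return (score, doc_index) pairs, best match first."""
    idf = idf_table(docs)
    q = {t: idf.get(t, 0.0) for t in query.lower().split()}
    scored = []
    for i, doc in enumerate(docs):
        vec = {t: f * idf[t] for t, f in weighted_tf(doc).items()}
        scored.append((cosine(q, vec), i))
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Two made-up records in the style of DBLP entries.
DOCS = [
    [("title", "xml keyword search"), ("author", "li wang")],
    [("title", "graph databases"), ("booktitle", "xml keyword workshop")],
]
```

With these placeholder weights, `rank("xml keyword", DOCS)` places the first record ahead of the second, because the query terms occur in its `title` node rather than in a lower-weighted `booktitle` node.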
Source
《吉林大学学报(理学版)》
CAS
CSCD
Peking University Core Journals
2013, Issue 6, pp. 1118-1122 (5 pages)
Journal of Jilin University:Science Edition
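Funding
National Natural Science Foundation of China (Grant No. 61075049)
Anhui Provincial Outstanding Young Talents Fund for Universities (Grant No. 2012SQRL194)
Natural Science Research Project of Anhui Provincial Universities (Grant No. KJ2012Z428)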