Abstract
Knowledge graph construction often suffers from quality problems such as erroneous or missing triplets, which undermine the credibility of downstream applications. Accuracy evaluation is the basis for selecting and optimizing knowledge graphs. An embedding-model-based method is proposed to reduce reliance on manually labeled data and to make evaluation efficient on large-scale data. Triplet verification is formulated as a label-free, automated threshold selection problem, and three threshold selection strategies are proposed to make the evaluation more robust. In addition, an evaluation method that incorporates triplet importance is proposed: importance indicators are defined from both network structure and relation semantics, so that triplets in key structures and frequently accessed triplets receive greater weight. The effects of embedding model representation capacity, knowledge graph density, and the way triplet importance is computed on evaluation performance are analyzed and compared. Experiments show that, compared with existing automated methods for evaluating knowledge graph accuracy, the proposed method reduces evaluation error under zero-shot conditions by nearly 30% on average, with the largest gains on dense graphs with high error rates.
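The pipeline summarized in the abstract (score each triplet, pick a cut-off without labels, then weight the verdicts by importance) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not name its three threshold strategies, so Otsu's between-class-variance criterion is used as a stand-in, higher scores are assumed to mean more plausible triplets, and the function names, scores, and weights are hypothetical.

```python
from typing import Sequence


def otsu_threshold(scores: Sequence[float]) -> float:
    """Choose the cut that maximizes between-class variance (Otsu's criterion).

    Assumption: a stand-in for an unsupervised threshold rule; the abstract
    does not say which strategies the paper actually uses.
    """
    ordered = sorted(scores)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two scores")
    total = sum(ordered)
    best_cut, best_var = ordered[0], -1.0
    left_sum = 0.0
    for i in range(1, n):  # candidate split: ordered[:i] vs ordered[i:]
        left_sum += ordered[i - 1]
        if ordered[i] == ordered[i - 1]:
            continue  # no cut can separate two equal scores
        mean_left = left_sum / i
        mean_right = (total - left_sum) / (n - i)
        # Between-class variance up to a constant factor of 1 / n**2.
        between = i * (n - i) * (mean_left - mean_right) ** 2
        if between > best_var:
            best_var = between
            best_cut = (ordered[i - 1] + ordered[i]) / 2
    return best_cut


def weighted_accuracy(
    scores: Sequence[float], weights: Sequence[float], threshold: float
) -> float:
    """Importance-weighted share of triplets judged correct (score >= threshold)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    kept = sum(w for s, w in zip(scores, weights) if s >= threshold)
    return kept / total_weight


# Hypothetical plausibility scores (higher = more plausible) and importance weights.
scores = [0.9, 0.85, 0.8, 0.2, 0.1]
weights = [3.0, 1.0, 1.0, 1.0, 2.0]

cut = otsu_threshold(scores)
print("threshold:", cut)
print("unweighted accuracy:", weighted_accuracy(scores, [1.0] * len(scores), cut))
print("weighted accuracy:", weighted_accuracy(scores, weights, cut))
```

On this toy input the weighted and unweighted estimates differ because the two low-scoring triplets carry different importance, which is the effect the importance indicators are meant to capture.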
Authors
张明韬
杨国利
白晓颖
ZHANG Ming-Tao; YANG Guo-Li; BAI Xiao-Ying (Advanced Institute of Big Data, Beijing, Beijing 100195, China; School of Computer Science, Peking University, Beijing 100871, China)
Source
Journal of Software (《软件学报》)
PKU Core Journal (北大核心)
2025, Issue 12, pp. 5674-5694 (21 pages)
Funding
National Natural Science Foundation of China (72201275).
Keywords
knowledge graph
accuracy evaluation
embedding model