Abstract
Extracting entity-relation triples from unstructured text is the basis for building large-scale knowledge graphs. In practice, the annotated data for extraction tasks is often imbalanced: negative samples far outnumber positive ones, or the proportion of easy samples is too high, so model training tends to be dominated by negative or easy samples. To improve triple extraction under imbalanced annotation, we propose a joint optimization framework based on multi-task interactive feature extraction. The framework first extends the Partition Filter Network (PFN) to extract features for three subtasks (subject recognition, object recognition, and subject-object alignment), so that the subtasks can interact with one another while each stays focused on its own task. It then introduces an improved Dice loss to address the imbalance in the subject-object correlation matrix, and incorporates mean-square-deviation uncertainty into the joint optimization to reduce the impact of noise in each subtask. Experimental results show that the proposed method achieves the best overall performance on two common benchmark datasets, NYT and WebNLG.
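The two loss components named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's improved Dice loss or its exact uncertainty formulation, so the code below assumes the standard soft Dice loss and the common log-variance form of uncertainty weighting for multi-task learning. The function names `dice_loss` and `uncertainty_weighted_loss` and the `smooth` parameter are hypothetical, chosen for illustration only.

```python
import math


def dice_loss(probs, labels, smooth=1e-6):
    """Standard soft Dice loss over a flattened binary matrix.

    Assumption: this is the textbook form, not the paper's improved variant.
    probs  : predicted probabilities in [0, 1]
    labels : gold labels (0 or 1)
    """
    intersection = sum(p * y for p, y in zip(probs, labels))
    total = sum(probs) + sum(labels)
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine subtask losses with learnable log-variances s_i.

    Assumption: the common form sum_i exp(-s_i) * L_i + s_i, where a larger
    s_i (a noisier subtask) down-weights that subtask's loss.
    """
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


# A sparse subject-object matrix: 1 positive cell among 100.
labels = [1] + [0] * 99
all_negative = [0.0] * 100

# Predicting "all negative" is 99% accurate, yet the Dice loss stays near 1,
# because the loss depends only on overlap with the positive cells.
accuracy = sum(1 for p, y in zip(all_negative, labels) if round(p) == y) / len(labels)
print(accuracy, dice_loss(all_negative, labels))

# Three subtask losses (subject, object, alignment) with equal log-variances.
print(uncertainty_weighted_loss([0.4, 0.5, 0.9], [0.0, 0.0, 0.0]))
```

In a real training loop the log-variances would be trainable parameters optimized together with the network weights; here they are fixed numbers so that the sketch runs without a deep learning library.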
Authors
XU Xinli (徐新黎)
LU Qilin (卢齐林)
YANG Xuhua (杨旭华)
HUANG Yujiao (黄玉娇)
LONG Haixia (龙海霞)
MA Gangfeng (马钢峰)
Affiliations: College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China; College of Zhijiang, Zhejiang University of Technology, Shaoxing 312030, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Peking University Core Journal (北大核心)
2025, Issue 6, pp. 1333-1341 (9 pages)
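Funding
Supported by the Key Science and Technology Development Program of Zhejiang Province (2024C03274, 2022C03113)
Supported by the National Natural Science Foundation of China (62176236, 62106225)
Supported by the Natural Science Foundation of Zhejiang Province (LY23F030008)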
Keywords
triplet extraction
multi-task interaction
information extraction
knowledge graph