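Abstract: Document-level relation extraction (DocRE) aims to identify semantic relations between entities in natural-language documents. Existing methods mostly use only a subset of sentences as evidence, which leads to low evidence utilization, and they also suffer from class imbalance. To address these problems, a model named prioritized focal loss and evidence inference fusion for document-level relation extraction (PFLEI-RE) is proposed, which combines evidence inference fusion with a prioritized focal loss to improve performance. The evidence inference fusion strategy integrates key evidence into a pseudo-document and combines it with the original document to strengthen complex reasoning. The prioritized focal loss function is introduced to mitigate class imbalance and strengthen learning on minority classes. Experiments show that PFLEI-RE achieves an F1 of 62.76 and an Ign_F1 of 60.67 on the DocRED dataset, improvements of 1.46 and 1.36 respectively over the ATLOP baseline, which verifies its effectiveness.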
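To make the class-imbalance idea concrete, the following is a minimal sketch of the standard binary focal loss, which down-weights examples the model already classifies well. The abstract does not specify how the "prioritized" variant differs, so this is not the PFLEI-RE loss; the function name `focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration.

```python
import math


def focal_loss(probs, targets, gamma=2.0, alpha=0.25):
    """Mean binary focal loss over predicted probabilities and 0/1 labels.

    Standard focal loss only; the prioritized variant in the abstract is
    not specified there, so nothing here should be read as that method.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, targets):
        # p_t is the probability the model assigns to the true class.
        p_t = p if y == 1 else 1.0 - p
        # alpha_t balances positive and negative classes.
        alpha_t = alpha if y == 1 else 1.0 - alpha
        p_t = min(max(p_t, eps), 1.0)
        # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# A confidently correct positive contributes far less than a badly missed one.
easy = focal_loss([0.9], [1])
hard = focal_loss([0.1], [1])
print(easy < hard)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy, which is a convenient sanity check when experimenting with the parameters.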
Funding: Supported by the National Science and Technology Innovation 2030 New Generation Artificial Intelligence Major Project (Grant No. 2018AAA0101800) and the National Natural Science Foundation of China (Grant No. 72271188).
Abstract: As production scenarios grow more complex, enterprises in the industrial domain accumulate vast amounts of production information. This raises the question of how to extract value from complex documents and establish coherent links between pieces of information. In this work, we present a framework for knowledge graph construction in the industrial domain, based on knowledge-enhanced document-level entity and relation extraction. The approach alleviates the shortage of annotated data in the industrial domain and models the interplay among industrial documents. To improve the accuracy of named entity recognition, domain-specific knowledge is incorporated into the initialization of the word embedding matrix in the bidirectional long short-term memory conditional random field (BiLSTM-CRF) framework. For relation extraction, this paper introduces the knowledge-enhanced graph inference (KEGI) network, a method designed for long paragraphs in the industrial domain. The method captures complex interactions among entities by constructing a document graph, and it integrates knowledge representation into both node construction and path inference through TransR. At the application level, BiLSTM-CRF and KEGI are used to build a knowledge graph from a knowledge representation model and Chinese fault reports for a steel production line, namely SPOnto and SPFRDoc. The F1 score for entity and relation extraction improves by 2% to 6%. The quality of the extracted knowledge graph meets the requirements of real-world production applications. The results show that KEGI can mine production reports in depth, extracting a wealth of knowledge and patterns and thereby providing a comprehensive solution for production management.
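For readers unfamiliar with TransR, the sketch below shows its usual scoring function: head and tail entity vectors are projected into a relation-specific space by a matrix, and a triple is scored by the squared distance between the translated head and the tail (lower means more plausible). How KEGI feeds these representations into node construction and path inference is not described in the abstract, so the helper names `mat_vec` and `transr_score` and the toy vectors are purely illustrative assumptions.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transr_score(head, relation, tail, projection):
    """TransR score ||M_r h + r - M_r t||^2; lower is more plausible.

    `projection` is the relation-specific matrix M_r that maps entity
    vectors into the relation space. Illustrative sketch only.
    """
    h_r = mat_vec(projection, head)
    t_r = mat_vec(projection, tail)
    return sum((h + r - t) ** 2 for h, r, t in zip(h_r, relation, t_r))


# Toy example: 3-dimensional entities, 2-dimensional relation space.
projection = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]]
head = [1.0, 2.0, 5.0]
relation = [1.0, -1.0]
matching_tail = [2.0, 1.0, 9.0]   # projected head + relation lands exactly here
unrelated_tail = [0.0, 0.0, 0.0]

print(transr_score(head, relation, matching_tail, projection))
print(transr_score(head, relation, unrelated_tail, projection))
```

The third coordinate of each entity is discarded by this particular projection, which illustrates why TransR can treat two entities as close under one relation and distant under another.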