Funding: funded by the National Key Technology R&D Program of China under Grant No. 2021YFD2100605, the National Natural Science Foundation of China under Grant No. 62433002, the Project of Construction and Support for High-Level Innovative Teams of Beijing Municipal Institutions under Grant No. BPHR20220104, and the Beijing Scholars Program under Grant No. 099.
Abstract: Entity relation extraction, a fundamental and essential task in natural language processing (NLP), has garnered significant attention over an extended period. It aims to extract the core of semantic knowledge from unstructured text, i.e., entities and the relations between them. At present, the main dilemma of Chinese entity relation extraction research lies in nested entities, relation overlap, and the lack of entity-relation interaction. This dilemma is particularly prominent in complex knowledge extraction tasks with high-density knowledge, imprecise syntactic structure, and a lack of semantic roles. To address these challenges, this paper presents an innovative "character-level" Chinese part-of-speech (CN-POS) tagging approach and incorporates part-of-speech (POS) information into the pre-trained model, aiming to improve its semantic understanding and syntactic information processing capabilities. Additionally, a relation reference filling mechanism (RF) is proposed to enhance the semantic interaction between relations and entities, use relations to guide entity modeling, improve the boundary prediction ability of the entity model for nested entities, and increase the cascading accuracy of entity-relation triples. Meanwhile, a "Queue" sub-task connection strategy is adopted to alleviate triple cascading errors caused by overlapping relations, and a syntax-enhanced entity relation extraction model (SE-RE) is constructed. The model shows excellent performance on the self-constructed E-commerce Product Information (EPI) dataset. The results demonstrate that integrating POS enhancement into the pre-trained encoding model significantly boosts the performance of entity relation extraction models compared to baseline methods. Specifically, the F1-score fluctuation in sub-tasks caused by error accumulation was reduced by 3.21%, while the F1-score for entity-relation triple extraction improved by 1.91%.
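To make the POS-enhancement idea above concrete, the following illustrative sketch (not the authors' SE-RE code) adds a character-level POS embedding to each character embedding before a generic transformer encoder; the vocabulary size, tag set, and dimensions are assumptions for the example.

```python
import torch
import torch.nn as nn

class POSEnhancedEncoder(nn.Module):
    """Sketch: add character-level POS embeddings to character embeddings
    before a transformer encoder (vocabulary, tag set, and sizes are assumed)."""
    def __init__(self, vocab_size=8000, num_pos_tags=30, dim=256, layers=4):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.pos_emb = nn.Embedding(num_pos_tags, dim, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, char_ids, pos_ids):
        # Each Chinese character carries the POS tag of the word it belongs to.
        x = self.char_emb(char_ids) + self.pos_emb(pos_ids)
        return self.encoder(x)  # (batch, seq_len, dim) contextual representations

# Toy usage: a batch of 2 sentences, 10 characters each.
chars = torch.randint(1, 8000, (2, 10))
pos = torch.randint(1, 30, (2, 10))
print(POSEnhancedEncoder()(chars, pos).shape)  # torch.Size([2, 10, 256])
```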
Funding: Hainan Province High-Level Talent Project of the Basic and Applied Basic Research Plan (Natural Science Field) in 2019 (No. 2019RC100), the Haikou City Key Science and Technology Plan Project (2020–049), and the Hainan Province Key Research and Development Project (ZDYF2020018).
Abstract: Entity relation extraction (ERE) is an important task in the field of information extraction. With the wide application of pre-trained language models (PLM) in natural language processing (NLP), using PLMs has become a new research direction for ERE. In this paper, BERT is used to extract entity relations, and a separated pipeline architecture is proposed. ERE is decomposed into an entity-relation classification sub-task and an entity-pair annotation sub-task, and both sub-tasks conduct pre-training and fine-tuning independently. Combining dynamic and static masking, new Verb-MLM and Entity-MLM BERT pre-training tasks are put forward to enhance the correlation between BERT pre-training and the targeted NLP downstream task, ERE. An inter-layer attention-sharing mechanism is added to the model, sharing attention parameters according to the similarity of the attention matrices. Comparative experiments on the SemEval-2010 Task 8 dataset demonstrate that the new MLM tasks and the inter-layer attention-sharing mechanism effectively improve the performance of BERT on entity relation extraction.
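As an illustration of the Verb-MLM/Entity-MLM idea described above, the sketch below re-samples a mask on every call (dynamic masking) but restricts it to positions flagged as verbs or entity tokens; the mask-token id, masking probability, and toy inputs are assumptions, not the paper's settings.

```python
import torch

MASK_ID = 103  # assumed [MASK] token id (103 in BERT-base vocabularies)

def targeted_dynamic_mask(input_ids, target_positions, mask_prob=0.3):
    """Sketch of Verb-/Entity-MLM: instead of masking tokens uniformly,
    preferentially mask positions flagged as verbs or entity tokens.
    Masks are re-sampled on every call (dynamic masking)."""
    labels = input_ids.clone()
    # Sample a fresh Bernoulli mask over the targeted positions only.
    probs = torch.rand(input_ids.shape) * target_positions.float()
    masked = probs > (1.0 - mask_prob)
    corrupted = input_ids.clone()
    corrupted[masked] = MASK_ID
    labels[~masked] = -100  # ignored by the MLM cross-entropy loss
    return corrupted, labels

# Toy usage: positions 2-4 form an entity span, position 6 is a verb.
ids = torch.tensor([[101, 7, 55, 56, 57, 9, 88, 12, 102]])
targets = torch.tensor([[0, 0, 1, 1, 1, 0, 1, 0, 0]], dtype=torch.bool)
corrupted, labels = targeted_dynamic_mask(ids, targets)
print(corrupted)
print(labels)
```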
Abstract: Entity and relation extraction is a critical task in information extraction. Recent approaches have emphasized obtaining improved span representations. However, existing work suffers from two major drawbacks. First, there is an overabundance of low-quality candidate spans, which hinders the effective extraction of information from high-quality candidate spans. Second, the information encoded by existing marker strategies is often too simple to fully capture the nuances of a span, resulting in the loss of potentially valuable information. To address these issues, we propose HSEM, which enhances entity and relation extraction with high-quality spans and enhanced markers; it assigns adaptive weights to different spans so that the model focuses on high-quality spans. Specifically, the HSEM model enriches the marker representation to incorporate more span information and enhance entity categorization. Additionally, we design a span scoring framework that assesses span quality based on the fusion of internal information and focuses the model on training with high-quality samples to improve performance. Experimental results on six benchmark datasets demonstrate that our model achieves state-of-the-art results after discriminating span quality.
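The following sketch illustrates the general span-scoring idea described above (it is not the HSEM implementation): each candidate span is represented by its boundary states plus a width embedding, a small MLP produces a quality weight, and that weight scales the per-span classification loss so training focuses on high-quality spans; all dimensions are assumed.

```python
import torch
import torch.nn as nn

class SpanScorer(nn.Module):
    """Sketch of span-quality weighting: boundary states + width embedding
    feed an MLP quality score that re-weights the span classification loss.
    Hidden size, max width, and type count are illustrative assumptions."""
    def __init__(self, dim=256, max_width=8, num_types=5):
        super().__init__()
        self.width_emb = nn.Embedding(max_width + 1, dim)
        self.quality = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.classifier = nn.Linear(3 * dim, num_types)

    def forward(self, hidden, spans, labels):
        # hidden: (seq_len, dim); spans: list of (start, end); labels: (num_spans,)
        starts = hidden[[s for s, _ in spans]]
        ends = hidden[[e for _, e in spans]]
        widths = self.width_emb(torch.tensor([e - s for s, e in spans]))
        rep = torch.cat([starts, ends, widths], dim=-1)
        weight = torch.sigmoid(self.quality(rep)).squeeze(-1)  # adaptive span weight
        loss = nn.functional.cross_entropy(self.classifier(rep), labels, reduction="none")
        return (weight * loss).mean()  # high-quality spans dominate training

# Toy usage: one 12-token sentence with three candidate spans.
h = torch.randn(12, 256)
loss = SpanScorer()(h, [(0, 2), (3, 3), (5, 9)], torch.tensor([1, 0, 4]))
print(loss.item())
```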
Funding: Supported by the National Natural Science Foundation of China (71804017), the R&D Program of Beijing Municipal Education Commission (KZ202210005013), and the Sichuan Social Science Planning Project (SC22B151).
Abstract: Addressing the lack of a well-classified, standard corpus for joint entity and relation extraction in the Chinese academic field, this paper builds a management-science dataset that can be used for joint entity and relation extraction and establishes a deep learning model to extract entity and relation information from scientific texts. With the definition of entity and relation classifications, we build a Chinese scientific text corpus based on the abstracts of projects funded by the National Natural Science Foundation of China (NSFC) in 2018–2019. By combining word2vec features with the clue-word feature, a special stylistic marker in scientific documents, we establish a joint entity-relation extraction model based on the BiLSTM-CNN-CRF architecture for scientific information extraction. The constructed dataset contains 13,060 unique entities and 9,728 entity relation labels. For entity prediction, the model reaches a precision of 69.15%, a recall of 61.03%, and an F1-score of 64.83%. For relation prediction, the precision is higher than that of entity prediction, which reflects the effectiveness of the mixed input features and the integration of local features via the CNN layer in the model.
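A minimal sketch of the feature combination described above, assuming 100-dimensional word2vec vectors and a binary clue-word indicator: the features are concatenated, encoded by a BiLSTM, refined by a CNN for local n-gram features, and projected to per-token tag scores. The CRF decoding layer used in the paper is omitted here for brevity; sizes and tag counts are assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMCNNTagger(nn.Module):
    """Sketch of the BiLSTM-CNN feature extractor: word2vec vectors are
    concatenated with a binary clue-word indicator, passed through a BiLSTM,
    and a CNN adds local features before per-token emission scores.
    The CRF layer on top of the emissions is omitted for brevity."""
    def __init__(self, w2v_dim=100, hidden=128, num_tags=15):
        super().__init__()
        self.lstm = nn.LSTM(w2v_dim + 1, hidden, batch_first=True, bidirectional=True)
        self.cnn = nn.Conv1d(2 * hidden, hidden, kernel_size=3, padding=1)
        self.emission = nn.Linear(hidden, num_tags)

    def forward(self, w2v, clue_flags):
        x = torch.cat([w2v, clue_flags.unsqueeze(-1).float()], dim=-1)
        h, _ = self.lstm(x)  # (batch, seq, 2*hidden)
        local = torch.relu(self.cnn(h.transpose(1, 2))).transpose(1, 2)
        return self.emission(local)  # per-token tag scores

# Toy usage: batch of 2 sentences, 20 tokens, 100-d word2vec vectors.
vecs = torch.randn(2, 20, 100)
clues = torch.randint(0, 2, (2, 20))
print(BiLSTMCNNTagger()(vecs, clues).shape)  # torch.Size([2, 20, 15])
```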
Funding: Partially supported by the Natural Science Foundation of China (62062046).
Abstract: Providing knowledge graphs for materials science facilitates the understanding of key data, such as materials structure and properties, and their relations. However, very little work has been devoted to it. Meanwhile, directly applying machine learning to materials computation still suffers from the lack of data and the cost of acquiring it. To tackle these problems, we propose literature-aided automatic entity and relation extraction using deliberately designed matching rules, especially for copper-based composites. Next, we fuse the extracted knowledge by calculating semantic similarity. Finally, the materials knowledge graphs are constructed and visualized in the Neo4j graph database. The experimental results show that a total of 6,154 entities and 15,561 pairs of relations are extracted from 69,600 open-access documents on copper-based composites, with precision and accuracy rates over 80%. Further, we exemplify the effectiveness by building materials structure-property-value meta-paths and analyzing their impacts.
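For illustration only, the sketch below mimics the rule-matching, similarity-based fusion, and graph-construction pipeline with a toy regex rule, difflib string similarity, and generated Cypher MERGE statements; the rule, threshold, and sentences are invented examples, not the paper's actual rules or data. In practice the statements would be executed with the official neo4j Python driver rather than printed.

```python
import re
from difflib import SequenceMatcher

# Toy matching rule for "material ... property ... value" sentences (an assumption).
RULE = re.compile(r"(?P<material>[A-Za-z0-9/\-]+ composites?) (?:has|have|shows?) a "
                  r"(?P<property>[a-z ]+) of (?P<value>[\d.]+ ?\w+)")

def extract(sentences):
    """Apply the matching rule to each sentence and collect (material, property, value) triples."""
    triples = []
    for s in sentences:
        m = RULE.search(s)
        if m:
            triples.append((m["material"], m["property"].strip(), m["value"]))
    return triples

def fuse(names, threshold=0.85):
    """Merge near-duplicate entity mentions by string similarity (threshold assumed)."""
    canonical, mapping = [], {}
    for n in names:
        match = next((c for c in canonical
                      if SequenceMatcher(None, n.lower(), c.lower()).ratio() >= threshold), None)
        mapping[n] = match or n
        if match is None:
            canonical.append(n)
    return mapping

sentences = ["Cu/graphene composites show a tensile strength of 320 MPa",
             "Cu-graphene composite has a hardness of 95 HV"]
triples = extract(sentences)
alias = fuse([t[0] for t in triples])
for mat, prop, val in triples:
    # Emit Cypher MERGE statements that build the materials knowledge graph.
    print(f'MERGE (m:Material {{name: "{alias[mat]}"}}) '
          f'MERGE (m)-[:HAS_PROPERTY {{value: "{val}"}}]->(:Property {{name: "{prop}"}})')
```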