Funding: Supported by the Key Areas Science and Technology Research Plan of the Xinjiang Production and Construction Corps Financial Science and Technology Plan Project under Grant Agreement No. 2023AB048, for the project "Research and Application Demonstration of Data-driven Elderly Care System".
Abstract: In recent years, with the rapid development of deep learning, relational triplet extraction has also achieved groundbreaking progress. Traditional pipeline models are limited by error propagation. To overcome this limitation, recent research has focused on jointly modeling the two key subtasks, named entity recognition and relation extraction, within a unified framework. To support future research, this paper provides a comprehensive review of recently published studies on relational triplet extraction. The review examines the public datasets commonly used for relational triplet extraction and systematically reviews current mainstream joint extraction methods, including joint decoding methods and parameter-sharing methods, with joint decoding methods further divided into table-filling, tagging, and sequence-to-sequence approaches. In addition, this paper conducts small-scale replication experiments on recent well-performing models of each method, both to verify the reproducibility of their code and to compare the performance of different models under uniform conditions. Each method has its own advantages in model design, task handling, and application scenarios, but each also faces challenges such as processing complex sentence structures, extracting cross-sentence relations, and adapting to low-resource environments. Finally, this paper systematically summarizes each method and discusses future prospects for the joint extraction of relational triples.
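To make the table-filling idea concrete, here is a minimal sketch of how triples can be read off a filled word-pair table. It is not taken from any reviewed model: it assumes a simplified, hypothetical scheme in which entity spans are already known and each table cell (i, j) holds the relation between the entity starting at token i and the entity starting at token j. The function name `decode_table` is illustrative.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def decode_table(
    tokens: List[str],
    spans: Dict[int, int],              # entity start index -> end index (inclusive)
    table: Dict[Tuple[int, int], str],  # (subject start, object start) -> relation; absent = none
) -> List[Triple]:
    """Read (subject, relation, object) triples off a filled word-pair table."""
    triples: List[Triple] = []
    for (i, j), relation in sorted(table.items()):
        # Keep a cell only if both its row and its column start a known entity.
        if i in spans and j in spans:
            subject = " ".join(tokens[i : spans[i] + 1])
            obj = " ".join(tokens[j : spans[j] + 1])
            triples.append((subject, relation, obj))
    return triples


tokens = ["Barack", "Obama", "was", "born", "in", "Honolulu"]
print(decode_table(tokens, {0: 1, 5: 5}, {(0, 5): "born_in"}))
# [('Barack Obama', 'born_in', 'Honolulu')]
```

Triples that share a subject simply occupy several cells in the same row, which is one reason table-filling formulations cope with overlapping triples without a fixed extraction order.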
Abstract: Edge computing, a novel paradigm for performing computations at the network edge, is highly relevant in the healthcare domain for extracting medical knowledge from traditional Uygur medical texts. Medical knowledge extraction methods based on edge computing deploy deep learning models on edge devices to perform entity and relation extraction locally. This approach avoids transferring large amounts of sensitive data to cloud data centers, effectively safeguarding the privacy of healthcare services. However, existing relation extraction methods mainly adopt a sequential pipeline, which classifies the relations between entities only after entity recognition has identified them. This mode suffers from error propagation between tasks, insufficient consideration of the dependencies between the two subtasks, and neglect of the interrelations between different relations within a sentence. To address these challenges, a joint extraction model with parameter sharing for edge computing, named CoEx-Bert, is proposed. It shares parameters between two models to extract entities and relations jointly. Specifically, CoEx-Bert employs two models that share hidden-layer parameters and combines their two loss functions for joint backpropagation to optimize the model parameters. Additionally, by considering contextual relations, it effectively resolves the entity overlapping problem when extracting knowledge from unstructured Uygur medical texts. Finally, the model is deployed on edge devices for real-time extraction and inference of Uygur medical knowledge. Experimental results demonstrate that CoEx-Bert outperforms existing state-of-the-art methods, achieving an accuracy, recall, and F1-score of 90.65%, 92.45%, and 91.54%, respectively, on the traditional Uygur medical literature dataset. These results represent increases of 6.45% in accuracy, 9.45% in recall, and 7.95% in F1-score over the baseline.
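The three reported CoEx-Bert figures are mutually consistent if the reported "accuracy" is read as precision, since the F1-score is the harmonic mean of precision and recall. A quick arithmetic check under that assumption (the helper name `f1_score` is illustrative):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# CoEx-Bert's reported figures, reading the reported "accuracy" as precision.
print(round(f1_score(90.65, 92.45), 2))  # 91.54
```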
Funding: Supported by the National Natural Science Foundation of China (Nos. 62002206 and 62202373) and the open topic of the Green Development Big Data Decision-Making Key Laboratory (DM202003).
Abstract: Extracting valuable information from biomedical texts is a current research hotspot that attracts wide scholarly attention. Biomedical corpora contain numerous complex long sentences and overlapping relational triples, which makes most general-domain joint modeling methods difficult to apply effectively in this field. To handle the complex semantic environment of biomedical texts, this paper proposes a novel perspective on joint entity and relation extraction. Existing studies divide relation triple extraction into several steps or modules; however, the three elements of a relation triple are interdependent and inseparable, so we regard joint extraction as a tripartite classification problem. From this triple-classification perspective, we design a multi-granularity 2D convolution to refine the word-pair table and better exploit the dependencies between biomedical word pairs. We then use a biaffine predictor to help predict the labels of word pairs for relation extraction. Our model, MCTPL (Multi-granularity Convolutional Tokens Pairs of Labeling), makes better use of the elements of triples and extracts overlapping triples better than previous approaches. We evaluated the model on two publicly available datasets. The experimental results show that on the CPI dataset our model improves the F1 score of relation triple extraction by 2.34% over the current best model, and on the DDI dataset it improves the F1 score by 1.68%. Our model achieves state-of-the-art performance compared with other baseline models for entity relation extraction from biomedical text.
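A biaffine predictor scores each word pair (i, j) for each label with a bilinear term plus a linear term. The sketch below shows that scoring rule in plain Python. The parameter layout (one `(U, w, b)` triple per label), the label names, and the function names are illustrative assumptions rather than MCTPL's implementation, and the multi-granularity convolution that would produce the word representations is omitted.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def biaffine_score(h_i: Vector, h_j: Vector, U: Matrix, w: Vector, b: float) -> float:
    """Score one label for word pair (i, j): h_i^T U h_j + w^T [h_i; h_j] + b."""
    bilinear = sum(
        h_i[a] * U[a][c] * h_j[c] for a in range(len(h_i)) for c in range(len(h_j))
    )
    linear = sum(wk * xk for wk, xk in zip(w, h_i + h_j))  # h_i + h_j concatenates lists
    return bilinear + linear + b


def predict_label(
    h_i: Vector, h_j: Vector, params: Dict[str, Tuple[Matrix, Vector, float]]
) -> str:
    """Return the label whose biaffine score is highest for this word pair."""
    return max(params, key=lambda label: biaffine_score(h_i, h_j, *params[label]))


# Hypothetical two-label parameters for 2-dimensional word representations.
params = {
    "relation": ([[0.0, 2.0], [0.0, 0.0]], [0.0, 0.0, 0.0, 0.0], 0.5),
    "none": ([[0.0, 0.0], [0.0, 0.0]], [1.0, 0.0, 0.0, 0.0], 0.0),
}
print(predict_label([1.0, 0.0], [0.0, 1.0], params))  # relation
```

The bilinear term lets the score depend on how the two words interact, which a plain concatenation followed by a linear layer cannot express on its own.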
Funding: Supported by the National Key Research and Development Program [2020YFB1006302].
Abstract: Span-based models for the joint entity and relation extraction task have been studied extensively. However, these models sample a large number of negative entities and negative relations during training; these negatives are essential, but they produce grossly imbalanced data distributions, which in turn lead to suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which classifies the entities and relations in the first phase and predicts the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt to combine entity type and entity distance as global features, which proves effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
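Span-based models enumerate every candidate span, and most candidates are negatives. The sketch below shows only the control flow of a two-phase scheme on the entity side, under simple assumptions: phase 1 separates entities from negatives and phase 2 assigns types only to the survivors. The classifiers are hypothetical callables standing in for learned scorers, and the type-plus-distance encoding in `global_features` is one plausible choice, not the paper's; relations would follow the same pattern over span pairs.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # (start, end), end inclusive


def enumerate_spans(n_tokens: int, max_width: int) -> List[Span]:
    """Every candidate span of at most max_width tokens; most will be negatives."""
    return [
        (start, start + width - 1)
        for start in range(n_tokens)
        for width in range(1, max_width + 1)
        if start + width - 1 < n_tokens
    ]


def two_phase(
    spans: List[Span],
    is_entity: Callable[[Span], bool],   # phase 1: entity vs. negative
    entity_type: Callable[[Span], str],  # phase 2: typing, for phase-1 survivors only
) -> Dict[Span, str]:
    kept = [span for span in spans if is_entity(span)]
    return {span: entity_type(span) for span in kept}


def global_features(head: Span, tail: Span, types: Dict[Span, str]) -> Tuple[str, str, int]:
    """Entity type pair plus index gap between two spans (0 if they overlap)."""
    gap = max(tail[0] - head[1], head[0] - tail[1], 0)
    return types[head], types[tail], gap


spans = enumerate_spans(6, 2)
types = two_phase(
    spans,
    lambda s: s in {(0, 1), (5, 5)},                 # stand-in phase-1 classifier
    lambda s: "PER" if s == (0, 1) else "LOC",       # stand-in phase-2 typer
)
print(types)                                   # {(0, 1): 'PER', (5, 5): 'LOC'}
print(global_features((0, 1), (5, 5), types))  # ('PER', 'LOC', 4)
```

Because the typer in phase 2 never sees the negatives filtered out in phase 1, its label distribution is far less skewed than that of a single classifier trained over all enumerated spans.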
Funding: Supported by the National Natural Science Foundation of China (71804017), the R&D Program of Beijing Municipal Education Commission (KZ202210005013), and the Sichuan Social Science Planning Project (SC22B151).
Abstract: To address the lack of a classification scheme and a gold-standard corpus for joint entity and relation extraction in the Chinese academic field, this paper builds a management science dataset for joint entity and relation extraction and establishes a deep learning model to extract entity and relation information from scientific texts. Based on defined entity and relation classes, we build a Chinese scientific text corpus from the abstracts of projects funded by the National Natural Science Foundation of China (NSFC) in 2018–2019. By combining word2vec features with clue-word features, which capture a style peculiar to scientific documents, we establish a joint entity-relation extraction model based on BiLSTM-CNN-CRF for scientific information extraction. The constructed dataset contains 13,060 unique entities and 9,728 entity relation labels. For entity prediction, the model achieves an accuracy of 69.15%, a recall of 61.03%, and an F1 value of 64.83%. For relation prediction, the accuracy is higher than for entity prediction, which reflects the effectiveness of the mixed input features and of integrating local features through the model's CNN layer.
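In a BiLSTM-CNN-CRF tagger, the CRF layer selects the highest-scoring tag sequence with Viterbi decoding. The sketch below is the standard algorithm in plain Python, not the authors' code; it assumes the per-token emission scores have already been produced by the BiLSTM and CNN layers, and the toy O/B/I scores are invented for illustration.

```python
from typing import List, Sequence


def viterbi(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> List[int]:
    """Highest-scoring tag sequence for a linear-chain CRF (scores in log space).

    emissions[t][k]: score of tag k at position t (e.g. from BiLSTM-CNN features)
    transitions[j][k]: score of moving from tag j to tag k
    start[k]: score of starting with tag k
    """
    n_tags = len(start)
    score = [start[k] + emissions[0][k] for k in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(n_tags):
            # Best previous tag for ending in tag k at position t.
            best_prev = max(range(n_tags), key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_prev] + transitions[best_prev][k] + emissions[t][k])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda k: score[k])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


tags = ["O", "B", "I"]
emissions = [[0, 2, 0], [0, 0, 1], [1, 0, 0]]
transitions = [[0, 0, -10], [0, 0, 0], [0, 0, 0]]  # O -> I is heavily penalized
print([tags[k] for k in viterbi(emissions, transitions, [0, 0, 0])])  # ['B', 'I', 'O']
```

The transition matrix is what lets the CRF rule out invalid sequences such as an I tag directly after O, which independent per-token classification cannot enforce.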
Funding: Supported by the National Key Research and Development Program of China under [Grant number 2021YFB3900903] and the National Natural Science Foundation of China under [Grant number 41971337].
Abstract: Spatial relation extraction is the process of identifying geographic entities in text and determining the spatial relations between them. Traditional spatial relation extraction mainly uses rule-based pattern matching, supervised learning, or unsupervised learning methods. However, these methods suffer from poor timeliness, high labor costs, and heavy dependence on large-scale data. Because pre-trained language models greatly alleviate these shortcomings, supervised learning methods that incorporate pre-trained language models have become the mainstream approach to relation extraction. Pipeline extraction and joint extraction, the two dominant paradigms of relation extraction, have both performed well on different datasets; the main difference between them is whether the contextual information of entities and relations is shared. In this paper, we compare the two paradigms for spatial relation extraction on Chinese geography-domain corpus data to determine which pre-trained-language-model-based method is better suited to Chinese spatial relation extraction. Before the comparison experiments, we fine-tuned the hyperparameters of both models to optimize extraction accuracy. The results show that pipeline extraction outperforms joint extraction for spatial relation extraction from sentence-level Chinese text, because the two tasks focus on different contextual information, and sharing that information makes it difficult to meet the needs of both. In addition, we compare the two models with a rule-based template approach in extracting topological, directional, and distance relations, summarize the shortcomings of this experiment, and provide an outlook for future work.
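For contrast with the learned models, a rule-based template approach can be as simple as matching trigger patterns for each relation category. The sketch below is purely illustrative: the Chinese trigger words, the category order, and the function name are assumptions, not the templates used in the paper.

```python
import re
from typing import Optional

# Hypothetical trigger templates, checked in order; the first match wins.
TEMPLATES = [
    ("distance", re.compile(r"相距|距离|\d+(?:\.\d+)?\s*(?:公里|千米|米)")),
    ("directional", re.compile(r"以[东南西北]|[东南西北]{1,2}(?:部|侧|面|方向)")),
    ("topological", re.compile(r"位于|境内|相邻|毗邻|接壤|穿过")),
]


def classify_spatial_relation(sentence: str) -> Optional[str]:
    """Return the first relation category whose template matches, else None."""
    for category, pattern in TEMPLATES:
        if pattern.search(sentence):
            return category
    return None


print(classify_spatial_relation("天津位于北京东南方向"))  # directional
```

Because the first match wins, a sentence containing both a topological trigger (位于) and a directional one (东南方向) is labeled directional. Such hand-written priorities and keyword lists are exactly the brittleness that motivates the learned pipeline and joint models.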
Abstract: Because Satellite Frequency and Orbit (SFO) resources are scarce natural resources, constructing a Satellite Frequency and Orbit Knowledge Graph (SFO-KG) is crucial for optimizing their utilization. When building the SFO-KG from unstructured Chinese data, extracting Chinese entity relations is the fundamental step. Although Relation Extraction (RE) methods for English were studied and developed extensively, and earlier than their Chinese counterparts, applying them directly to Chinese texts faces significant challenges due to linguistic differences such as unique grammar, pictographic characters, and prevalent polysemy. The absence of comprehensive reviews of Chinese RE research calls for a systematic investigation. We review Chinese RE through four methodological approaches: pipeline RE, joint entity-relation extraction, open-domain RE, and multimodal RE. In addition, we analyze the essential research infrastructure of Chinese RE, including specialized datasets, evaluation benchmarks, and competitions. Finally, we summarize and analyze the current research challenges and development trends of Chinese RE from the perspectives of dataset ecosystem construction, open-domain RE, N-ary RE, and RE based on large language models. This comprehensive review aims to facilitate SFO-KG construction and its practical application in SFO resource management.
Funding: Supported in part by a grant from the Beijing Natural Science Foundation project (No. 9244025).
Abstract: A comprehensive examination of the victimization process, coupled with the development of effective preventive strategies, is the most promising approach to mitigating telecom network fraud. However, the limited availability of telecom fraud case text data hinders the development of robust data extraction algorithms, which complicates the identification of victimization patterns. To address this gap, this study proposes a victimization process analysis model based on mixture-of-experts joint event extraction, using real telecom fraud case data. The model uses LERT-MoE to extract trigger words and arguments related to the victimization process from law enforcement reports, and then applies a dot-product attention mechanism for argument role classification. To the best of our knowledge, this is the first attempt to apply a mixture-of-experts model with a purpose-built dot-product attention mechanism to the in-depth analysis of telecom network fraud victimization patterns, overcoming the limitations of previous methods in handling the complexity and diversity of fraudulent behaviors. Additionally, the Apriori algorithm is employed to uncover prevalent behavioral patterns in the victimization process. Experimental results demonstrate that the proposed model outperforms baseline models in precision, accuracy, and F1-score on event extraction from telecom fraud cases. Furthermore, the model identifies more granular fraud patterns within the victimization process, offering a valuable knowledge base for developing targeted preventive strategies. The identified patterns can be used to design focused awareness campaigns, enhance fraud detection algorithms, and improve law enforcement training, thereby significantly increasing the effectiveness of anti-fraud initiatives.
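Apriori mines itemsets that co-occur in at least a minimum fraction of transactions, growing candidates one item at a time and pruning any candidate that has an infrequent subset. A minimal sketch, assuming each fraud case has already been reduced to a set of extracted behavior labels; the labels below are invented for illustration and are not from the study's data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Frequent itemsets (support >= min_support) via level-wise Apriori search."""
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        # Fraction of transactions containing every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    singletons = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[FrozenSet[str], float] = {}
    current = {c for c in singletons if support(c) >= min_support}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical behavior labels extracted from four fraud cases.
cases = [
    {"fake_police_call", "safe_account_transfer", "remote_app_install"},
    {"fake_police_call", "safe_account_transfer"},
    {"fake_police_call", "phishing_link"},
    {"safe_account_transfer", "remote_app_install"},
]
frequent = apriori(cases, min_support=0.5)
print(frequent[frozenset({"fake_police_call", "safe_account_transfer"})])  # 0.5
```

The prune step is what keeps the search tractable: the three-item candidate here is discarded without a support count because one of its pairs appears in only one case.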