Funding: This research was supported by the National Natural Science Foundation of China [41631177, 41801320].
Abstract: As an effective way of organizing geographic information, a geographic knowledge graph (GeoKG) facilitates numerous geography-related analyses and services. The completeness of the triplets that encode geographic knowledge determines the quality of a GeoKG and has therefore drawn considerable attention in related domains. The large volume of unstructured geographic knowledge scattered across web texts has been regarded as a potential source for enriching the triplets in GeoKGs. The crux of triplet extraction from web texts lies in detecting the key phrases that indicate the correct geo-relations between geo-entities. However, current methods for key-phrase detection are ineffective because the terms describing geo-relations are sparse in web texts, which results in an insufficient training corpus. In this study, an unsupervised context-enhanced method is proposed to detect geo-relation key phrases from web texts for extracting triplets. External semantic knowledge is introduced to mitigate the sparseness of geo-relation description terms in web texts. Specifically, the contexts of geo-entities are fused with category semantic knowledge and word semantic knowledge. Subsequently, an enhanced corpus is generated using frequency-based statistics. Finally, the geo-relation key phrases are detected from the enhanced contexts using statistical lexical features from the enhanced corpus. Experiments are conducted with real web texts. In comparison with well-known frequency-based methods, the proposed method improves the precision of detecting geo-relation key phrases by approximately 20%. Moreover, compared with the well-defined geo-relation properties in DBpedia, the proposed method provides quintuple key phrases for indicating the geo-relations between geo-entities, which facilitates the generation of new triplets from web texts.
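To make the frequency-based baseline mentioned in this abstract concrete, here is a minimal sketch of ranking candidate key phrases by raw n-gram counts over the words that appear between two entity mentions. It is not the paper's context-enhanced method. The function names, the single-token entity assumption, and the sample sentences in the usage note are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def between_context(tokens, head, tail):
    """Return the tokens strictly between the first mentions of head and tail.

    Assumption: each entity is a single whitespace-delimited token.
    """
    if head not in tokens or tail not in tokens:
        return []
    i, j = tokens.index(head), tokens.index(tail)
    lo, hi = min(i, j), max(i, j)
    return tokens[lo + 1:hi]


def rank_key_phrases(samples, max_n=3):
    """Count every n-gram (1 <= n <= max_n) found between entity pairs.

    samples: iterable of (sentence, head_entity, tail_entity) tuples.
    Returns (phrase, count) pairs sorted by descending count.
    """
    counts = Counter()
    for text, head, tail in samples:
        ctx = between_context(text.lower().split(), head.lower(), tail.lower())
        for n in range(1, max_n + 1):
            for k in range(len(ctx) - n + 1):
                counts[" ".join(ctx[k:k + n])] += 1
    return counts.most_common()
```

Calling `rank_key_phrases` on a handful of sentences such as "Paris is the capital of France" and "Lyon is located in France" shows that raw counts tend to put function words such as "is" at the top, and that phrases seen only once are indistinguishable from noise. This is the kind of sparseness problem the abstract describes.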
Funding: This work received funding from the Key Areas Science and Technology Research Plan of the Xinjiang Production and Construction Corps Financial Science and Technology Plan Project under Grant Agreement No. 2023AB048, for the project "Research and Application Demonstration of Data-driven Elderly Care System".
Abstract: In recent years, with the rapid development of deep learning, relational triplet extraction has made substantial progress. Traditional pipeline models are limited by error propagation between stages. To overcome this limitation, recent research has focused on jointly modeling the two key subtasks, named entity recognition and relation extraction, within a unified framework. To support future research, this paper provides a comprehensive review of recently published studies on relational triplet extraction. The review examines commonly used public datasets and systematically surveys current mainstream joint extraction methods, including joint decoding methods and parameter sharing methods, with joint decoding methods further divided into table filling, tagging, and sequence-to-sequence approaches. In addition, the paper conducts small-scale replication experiments on models that have performed well in recent years for each method, to verify the reproducibility of the code and to compare the performance of different models under uniform conditions. Each method has its own advantages in terms of model design, task handling, and application scenarios, but also faces challenges such as processing complex sentence structures, cross-sentence relation extraction, and adaptability in low-resource environments. Finally, the paper systematically summarizes each method and discusses the future development prospects of joint extraction of relational triplets.
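As a rough illustration of the table-filling family named in this abstract, the sketch below decodes triplets from a token-pair table that has already been filled in. It is a simplified toy and does not reproduce any specific model from the review. The function name `decode_table`, the convention of reading the relation label from the cell indexed by the two entities' start tokens, and the example label in the usage note are assumptions made for illustration.

```python
def decode_table(tokens, spans, table):
    """Read relational triplets out of a filled token-pair table.

    tokens: list of words in the sentence.
    spans:  list of (start, end) entity spans, end inclusive.
    table:  square matrix; table[i][j] is a relation label (str) or None.
            Assumption: the label for an entity pair is stored in the cell
            indexed by the start token of the head and the start token of
            the tail.
    """
    triples = []
    for hs, he in spans:
        for ts, te in spans:
            if (hs, he) == (ts, te):
                continue  # an entity is not related to itself here
            label = table[hs][ts]
            if label is not None:
                head = " ".join(tokens[hs:he + 1])
                tail = " ".join(tokens[ts:te + 1])
                triples.append((head, label, tail))
    return triples
```

For the sentence "New York is in the United States", with spans `(0, 1)` and `(5, 6)` and a single non-empty cell `table[0][5] = "located_in"`, the function returns one triplet linking "New York" to "United States". In a real joint model the table would be predicted by a neural scorer, and entities and relations would be recovered from the same table, so there is no separate pipeline stage through which errors can propagate.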