Funding: Supported by the National Key Research and Development Program (Grant No. 2024YFB3312700), the National Natural Science Foundation of China (Grant No. 52405541), and the Changzhou Municipal Sci&Tech Program (Grant No. CJ20241131).
Abstract: Under the paradigm of Industry 5.0, intelligent manufacturing transcends mere efficiency enhancement by emphasizing human-machine collaboration, where human expertise plays a central role in assembly processes. Despite advancements in intelligent and digital technologies, assembly process design still relies heavily on manual knowledge reuse, causing inefficiencies and inconsistent quality in process documentation. To address these issues, this paper proposes a knowledge push method for complex product assembly process design based on a distillation-model-based dynamically enhanced graph and a Bayesian network. First, an initial knowledge graph is constructed using a BERT-BiLSTM-CRF model trained with integrated human expertise and a fine-tuned large language model. Then, a confidence-based dynamic weighted fusion strategy is employed to achieve dynamic incremental construction of the knowledge graph with low resource consumption. Subsequently, a Bayesian network model is constructed from the relationships among assembly components, assembly features, and operations, and Bayesian network reasoning is used to push assembly process knowledge under different design requirements. Finally, the feasibility of the Bayesian network construction method and the effectiveness of Bayesian network reasoning are verified through a specific example, significantly improving the utilization of assembly process knowledge and the efficiency of assembly process design.
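The push step reasons over a discrete Bayesian network linking components, features, and operations. The sketch below illustrates the idea with inference by enumeration; the network structure, entity names, and probabilities are hypothetical, not taken from the paper.

```python
# Hypothetical two-layer network: component -> feature -> operation.
# P(feature | component) and P(operation | feature) as nested dicts.
p_feature = {"shaft": {"threaded_hole": 0.7, "flange": 0.3}}
p_operation = {
    "threaded_hole": {"bolt_tightening": 0.8, "press_fit": 0.2},
    "flange": {"bolt_tightening": 0.4, "press_fit": 0.6},
}

def push_operations(component):
    """Marginalize over features to rank candidate assembly operations."""
    scores = {}
    for feat, pf in p_feature[component].items():
        for op, po in p_operation[feat].items():
            scores[op] = scores.get(op, 0.0) + pf * po
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Given a design requirement involving a shaft, rank the operations to push:
print(push_operations("shaft"))
```

Ranking candidate operations by posterior score is what lets the system "push" the most relevant process knowledge for a given design requirement.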
Abstract: Since Google introduced the concept of Knowledge Graphs (KGs) in 2012, their construction technologies have evolved into a comprehensive methodological framework encompassing knowledge acquisition, extraction, representation, modeling, fusion, computation, and storage. Within this framework, knowledge extraction, as the core component, directly determines KG quality. In military domains, traditional manual curation models face efficiency constraints due to data fragmentation, complex knowledge architectures, and confidentiality protocols. Meanwhile, crowdsourced ontology construction approaches from general domains prove non-transferable, while human-crafted ontologies struggle with generalization deficiencies. To address these challenges, this study proposes an Ontology-Aware LLM Methodology for Military Domain Knowledge Extraction (LLM-KE). This approach leverages the deep semantic comprehension capabilities of Large Language Models (LLMs) to simulate human experts' cognitive processes in crowdsourced ontology construction, enabling automated extraction of military textual knowledge. It concurrently enhances knowledge processing efficiency and improves KG completeness. Empirical analysis demonstrates that this method effectively resolves the scalability and dynamic-adaptation challenges in military KG construction, establishing a novel technological pathway for advancing military intelligence development.
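The core idea of ontology-aware extraction is to constrain what the LLM may emit. As a rough, hypothetical sketch (the actual LLM-KE prompts, ontology, and data are not shown in the abstract), one might embed the schema in the prompt and validate returned triples against it:

```python
import json

# Hypothetical ontology fragment guiding extraction (illustrative only):
ontology = {
    "entity_types": ["Unit", "Equipment", "Operation"],
    "relation_types": [("Unit", "equipped_with", "Equipment")],
}

def build_extraction_prompt(text):
    """Embed the ontology schema in the prompt so the model only emits
    triples that conform to it (prompt wording is a sketch)."""
    return (
        "Extract (head, relation, tail) triples from the text.\n"
        f"Allowed entity types: {ontology['entity_types']}\n"
        f"Allowed relations: {ontology['relation_types']}\n"
        "Answer as a JSON list of [head, relation, tail].\n"
        f"Text: {text}"
    )

def validate(triples):
    """Keep only triples whose relation exists in the ontology."""
    allowed = {r for _, r, _ in ontology["relation_types"]}
    return [t for t in triples if t[1] in allowed]

prompt = build_extraction_prompt("The 3rd Battalion is equipped with a radar.")
# A mocked LLM response, parsed and filtered against the ontology:
response = '[["3rd Battalion", "equipped_with", "radar"], ["radar", "made_by", "X"]]'
print(validate(json.loads(response)))  # keeps only the conformant triple
```

The validation step is what makes the extraction "ontology-aware": anything the model hallucinates outside the schema is dropped before it reaches the KG.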
Funding: Supported by grants from the National Innovation Platform Development Program (No. 2020021105012440), the National Natural Science Foundation of China (Nos. 82172524 and 81974355), and the Hubei Provincial Key R&D Project of Artificial Intelligence (No. 2021BEA161).
Abstract: Spleen-Stomach disorders are prevalent clinical conditions in Traditional Chinese Medicine (TCM). The complex diagnostic and treatment model used in TCM is based on a "symptom-pattern-disease-formula" framework that relies heavily on practitioners' experience. However, this model faces several challenges, including ambiguous knowledge representation, unstructured data, and difficulties with knowledge sharing. Recent advancements in artificial intelligence, natural language processing, and medical knowledge engineering have significantly advanced research on knowledge graphs (KGs) and intelligent diagnosis and treatment systems for these disorders, making these technologies crucial for modernizing TCM. This article systematically reviews two core research pathways related to Spleen-Stomach disorders. The first pathway focuses on constructing knowledge graphs for "structured knowledge representation", covering ontology modeling, entity recognition, relation extraction, graph fusion, semantic reasoning, visualization services, and an ensemble model for predicting treatment efficacy. The second pathway involves the development of intelligent diagnosis and treatment systems, with a focus on "clinical applications", covering key technologies such as quantitative modeling of the four TCM diagnostic methods (inspection, auscultation-olfaction, interrogation, and palpation), semantic analysis of classical texts, pattern differentiation algorithms, and multimodal consultation recommenders. Through the synthesis and analysis of current research, several ongoing challenges are identified: inconsistent modeling and annotation of TCM clinical knowledge, limited semantic reasoning capabilities, insufficient integration between KGs and intelligent diagnostic models, and limited clinical adaptability of existing intelligent diagnostic systems. To address these challenges, this review suggests future research directions, including enhancing heterogeneous multi-source knowledge integration techniques, deepening semantic reasoning through collaborative reasoning frameworks that incorporate large language models, and developing effective cross-disease transfer learning strategies. These directions aim to improve the interpretability, reasoning accuracy, and clinical applicability of intelligent diagnosis and treatment systems for Spleen-Stomach disorders in TCM.
Funding: This work was co-funded by the European Research Council (project ScienceGRAPH, Grant agreement ID: 819536) and by the TIB Leibniz Information Centre for Science and Technology.
Abstract: Purpose: This work aims to normalize the NLPCONTRIBUTIONS scheme (henceforward, NLPCONTRIBUTIONGRAPH) to structure, directly from article sentences, the contributions information in Natural Language Processing (NLP) scholarly articles via a two-stage annotation methodology: 1) a pilot stage to define the scheme (described in prior work); and 2) an adjudication stage to normalize the graphing model (the focus of this paper). Design/methodology/approach: We re-annotate, a second time, the contributions-pertinent information across 50 previously annotated NLP scholarly articles in terms of a data pipeline comprising contribution-centered sentences, phrases, and triple statements. Specifically, care was taken in the adjudication annotation stage to reduce annotation noise while formulating the guidelines for our proposed novel NLP contributions structuring and graphing scheme. Findings: The application of NLPCONTRIBUTIONGRAPH to the 50 articles ultimately resulted in a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples. The intra-annotation agreement between the first and second stages, in terms of F1-score, was 67.92% for sentences, 41.82% for phrases, and 22.31% for triple statements, indicating that annotation decision variance grows with the granularity of the information. Research limitations: NLPCONTRIBUTIONGRAPH has limited scope for structuring scholarly contributions compared with STEM (Science, Technology, Engineering, and Medicine) scholarly knowledge at large. Further, the annotation scheme in this work is designed by only an intra-annotator consensus: a single annotator first annotated the data to propose the initial scheme, after which the same annotator re-annotated the data to normalize the annotations in an adjudication stage. However, the expected goal of this work is to achieve a standardized retrospective model for capturing NLP contributions from scholarly articles. This would entail a larger initiative enlisting multiple annotators to accommodate different worldviews into a "single" set of structures and relationships as the final scheme. Given that this is the first proposal of the scheme, and considering the complexity of the annotation task within a realistic timeframe, our intra-annotation procedure is well suited. Nevertheless, the model proposed in this work is presently limited in that it does not incorporate multiple annotator worldviews; incorporating them is planned as future work to produce a robust model. Practical implications: We demonstrate NLPCONTRIBUTIONGRAPH data integrated into the Open Research Knowledge Graph (ORKG), a next-generation KG-based digital library with intelligent computations enabled over structured scholarly knowledge, as a viable aid to assist researchers in their day-to-day tasks. Originality/value: NLPCONTRIBUTIONGRAPH is a novel scheme to annotate research contributions from NLP articles and integrate them into a knowledge graph, which to the best of our knowledge does not exist in the community. Furthermore, our quantitative evaluations over the two-stage annotation tasks offer insights into task difficulty.
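The reported intra-annotation agreement can be computed in principle by treating the first-stage annotations as reference and the second-stage annotations as predictions. A minimal sketch with toy data (not the paper's dataset):

```python
def agreement_f1(stage1, stage2):
    """F1 between two annotation passes, treating stage 1 as reference.
    Items are hashable annotations (sentence ids, phrases, or triples)."""
    a, b = set(stage1), set(stage2)
    if not a or not b:
        return 0.0
    tp = len(a & b)                       # annotations kept in both passes
    precision, recall = tp / len(b), tp / len(a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy triples: agreement drops as granularity increases, mirroring the
# sentence > phrase > triple trend reported above (data is illustrative).
s1 = {("model", "achieves", "SOTA"), ("method", "uses", "BERT")}
s2 = {("model", "achieves", "SOTA"), ("method", "based-on", "BERT")}
print(agreement_f1(s1, s2))  # prints 0.5
```

Because a triple must match exactly in all three slots, one diverging relation label halves the agreement here, which is why triple-level F1 sits far below sentence-level F1.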
Funding: Support is gratefully acknowledged from the National Natural Science Foundation of China (Grant No. 42050104), the National Science and Technology Support Program (Grant No. 2012BAH34F00), and the National Oil and Gas Major Special Project (Grant No. 2016ZX05033005).
Abstract: The massive volume of multi-sourced, multi-structured data in the upstream petroleum industry poses great challenges to data integration and smart applications. The knowledge graph, as an emerging technology, can potentially provide a way to tackle the challenges associated with oil and gas big data. This paper proposes an engineering-based method that improves upon traditional natural language processing to construct a domain knowledge graph based on a petroleum exploration and development ontology. The exploration and development knowledge graph is constructed by assembling Sinopec's multi-sourced heterogeneous databases and contains millions of nodes. Two applications based on the constructed knowledge graph are developed and validated, demonstrating their effectiveness and advantages in providing better knowledge services for the oil and gas industry.
Funding: The National Natural Science Foundation of China (No. 51805079).
Abstract: Heterogeneity problems exist between CAD models and assembly process documents. In the assembly process planning stage, these problems reduce the efficiency of information interaction. Based on a knowledge graph, this paper proposes an assembly information model (KGAM) to integrate geometric information from the CAD model with non-geometric and semantic information from the assembly process document. KGAM describes the integrated assembly process information as a knowledge graph in the form of "entity-relationship-entity" and "entity-attribute-value" triples, improving the efficiency of information interaction. Taking the trial assembly stage of a certain type of aeroengine compressor rotor component as an example, KGAM is used to build its assembly process knowledge graph. The trial data show that the query and update rate of assembly attribute information is more than doubled, and that of assembly semantic information is more than tripled. In conclusion, KGAM can resolve the heterogeneity problems between the CAD model and the assembly process document and improve information interaction efficiency.
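The "entity-relationship-entity" / "entity-attribute-value" representation can be sketched as a flat triple store with one-hop queries. All entity and attribute names below are hypothetical examples, not from the aeroengine case study:

```python
# Sketch of the two triple forms KGAM uses (hypothetical data):
triples = [
    ("rotor_disk_2", "assembled_with", "tie_rod"),   # entity-relationship-entity
    ("rotor_disk_2", "bolt_torque_Nm", 45),          # entity-attribute-value
    ("tie_rod", "material", "GH4169"),
]

def query(head, relation):
    """Return all tails for (head, relation): a one-hop KG lookup."""
    return [t for h, r, t in triples if h == head and r == relation]

print(query("rotor_disk_2", "bolt_torque_Nm"))  # prints [45]
```

Storing geometric, non-geometric, and semantic facts in one uniform triple shape is what lets a single query interface span both the CAD model and the process document, which is the source of the reported query/update speedups.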
Funding: Supported by the Basic Science Research Program through the NRF (National Research Foundation of Korea); by the MSIT (Ministry of Science and ICT), Korea, under the National Program for Excellence in SW supervised by the IITP (Institute for Information & communications Technology Promotion); and by the Gachon University research fund of 2019 (Nos. NRF2019R1A2C1008412, 2015-0-00932, GCU-2019-0773).
Abstract: A knowledge graph is a structured graph in which data obtained from multiple sources are standardized to acquire and integrate human knowledge. Research is being actively conducted to cover a wide variety of knowledge, as it can be applied to applications that help humans. However, existing studies construct knowledge graphs without the time information that knowledge implies. Knowledge stored without time information becomes outdated over time, and the possibility that knowledge later becomes false or changes meaning is excluded. As a result, such graphs can neither reflect dynamically changing information nor accept newly emerged information. To solve this problem, this paper proposes Time-Aware PolarisX, an automatically extended knowledge graph that includes time information. Time-Aware PolarisX combines a BERT-based relation extractor with an ensemble NER model that includes a time tag as its entity extractor, extracting knowledge consisting of subject, relation, and object from unstructured text. Two application experiments show that the proposed system overcomes the limitations of existing systems that do not consider time information, for example when applied to a chatbot. We also verify, through a comparative experiment with an existing model, that the accuracy of the extraction model is improved.
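A minimal sketch of why time tags matter: keying facts by (subject, relation) and stamping each object with a validity date lets newer extractions supersede outdated ones instead of coexisting with them. The entities and dates below are invented; this is not the PolarisX implementation.

```python
from datetime import date

# A time-stamped fact store: (subject, relation) -> (object, valid_from).
facts = {}

def add_fact(subj, rel, obj, when):
    """Keep only the most recent object per (subject, relation)."""
    key = (subj, rel)
    if key not in facts or facts[key][1] < when:
        facts[key] = (obj, when)

# A later extraction supersedes the earlier, now-outdated one:
add_fact("ACME Corp", "ceo", "Alice", date(2018, 5, 1))
add_fact("ACME Corp", "ceo", "Bob", date(2023, 2, 1))
print(facts[("ACME Corp", "ceo")][0])  # prints Bob
```

A chatbot querying this store answers with the currently valid fact, whereas a time-unaware graph would hold both CEO triples with no way to choose between them.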
Funding: Supported by the Sichuan Science and Technology Program under Grants No. 2022YFQ0052 and No. 2021YFQ0009.
Abstract: At present, knowledge embedding methods are widely used in knowledge graph (KG) reasoning and have been successfully applied to KGs with large numbers of entities and relations. However, research and production environments also contain many KGs with few entities and relations, called sparse KGs. Limited by the performance of knowledge extraction methods, and because some common-sense information never appears in natural corpora, the relations between entities are often incomplete. To solve this problem, a method combining a graph neural network with information enhancement is proposed. The improved method increases the mean reciprocal rank (MRR) and Hit@3 by 1.6% and 1.7%, respectively, when the sparsity of the FB15K-237 dataset is 10%. When the sparsity is 50%, MRR and Hit@10 increase by 0.8% and 1.8%, respectively.
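The two evaluation metrics used above are standard in KG link prediction and can be computed from the rank of the true entity in each query's candidate list (the ranks below are illustrative):

```python
def mrr(ranks):
    """Mean reciprocal rank over the 1-based rank of each correct answer."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hit_at_k(ranks, k):
    """Fraction of queries whose correct answer ranks within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

ranks = [1, 3, 2, 10, 4]          # illustrative ranks of the true entity
print(round(mrr(ranks), 3))       # prints 0.437
print(hit_at_k(ranks, 3))         # prints 0.6
```

MRR rewards placing the true entity near the top of the ranking, while Hit@k only checks whether it appears in the top k, which is why papers typically report both.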
Funding: The National Key Research and Development Program of China (2022YFB3305800-5), the National Natural Science Foundation of China (62125301, 62021003), the Beijing Outstanding Young Scientist Program (BJJWZYJH01201910005020), the Natural Science Foundation of Beijing Municipality (KZ202110005009), and Youth Beijing Scholar (037).
Abstract: In wastewater treatment systems, extracting meaningful features from process data is essential for effective monitoring and control. However, the multi-timescale data generated by different sampling frequencies pose a challenge to accurate feature extraction. To solve this issue, a multi-timescale feature extraction method based on adaptive entropy is proposed. First, an expert knowledge graph is constructed by analyzing the characteristics of wastewater components and water quality data, illustrating the various water quality parameters and the network of relationships among them. Second, multiscale entropy analysis is used to investigate the inherent multi-timescale patterns of the water quality data in depth, minimizing information loss while uniformly optimizing the timescale. Third, partial least squares is harnessed for feature extraction, yielding an enhanced representation of the sample data and iterative refinement of the expert knowledge graph. Experimental results show that the multi-timescale feature extraction algorithm enhances the representation of water quality data and improves monitoring capability.
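Multiscale entropy analysis begins by coarse-graining the series at each timescale (non-overlapping window averages) before an entropy measure is applied at that scale. A sketch of the coarse-graining step, with illustrative readings rather than the paper's data:

```python
def coarse_grain(series, scale):
    """Non-overlapping window averages: the coarse-graining step applied
    before computing an entropy measure at each timescale."""
    n = len(series) // scale
    return [sum(series[i * scale:(i + 1) * scale]) / scale for i in range(n)]

# Hypothetical hourly water-quality readings viewed at a 4-hour scale:
readings = [7.1, 7.3, 7.0, 7.2, 6.8, 6.9, 7.4, 7.5]
print(coarse_grain(readings, 4))
```

Repeating this for several scales yields one coarse-grained series per timescale, so sensors sampled at different frequencies can be compared on a common footing with minimal information loss.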
Abstract: The top drive system used in drilling has a complex structure and diverse fault types, and existing fault-tree analysis and expert systems struggle to cope with complex and changeable field conditions. Leveraging the advantages of knowledge graphs in fusing structured and unstructured information, analyzing correlations among fault modes, and transferring prior knowledge, this paper proposes a knowledge-graph-based fault diagnosis method for drilling top drive systems. Using the Transformer-based bidirectional encoder model BERT (Bidirectional Encoder Representations from Transformers), the hybrid neural network models BERT-BiLSTM-CRF and BERT-BiLSTM-Attention were built to perform named entity recognition and relation extraction on top drive fault text, respectively; similarity computation then enabled effective fusion of fault knowledge and intelligent question answering, yielding the complete fault diagnosis method. The results show that: (1) on fault entity recognition, the BERT-BiLSTM-CRF model achieves a precision of 95.49% and can effectively identify information entities in fault text; (2) on fault relation extraction, the BERT-BiLSTM-Attention model achieves a precision of 93.61%, correctly establishing the relation edges of the knowledge graph; (3) the developed question answering system realizes intelligent application of the knowledge graph, with answer accuracy exceeding 90% across multiple question types, meeting field requirements. It is concluded that the knowledge-graph-based fault diagnosis method can effectively exploit prior knowledge of the top drive system, enable rapid fault localization and intelligent diagnosis, and has good application prospects.
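The similarity-computation step that routes a user question to KG entities can be sketched with a simple character-level Jaccard measure. The paper does not specify its similarity function, and the fault entities below are hypothetical:

```python
def jaccard(a, b):
    """Character-set Jaccard similarity: a simple stand-in for the
    similarity computation (the actual measure is not specified here)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

# Hypothetical fault entities in the top-drive knowledge graph:
entities = ["hydraulic pump leakage", "motor overheating", "brake wear"]

def answer(question):
    """Route a free-text question to the most similar KG entity."""
    return max(entities, key=lambda e: jaccard(question.lower(), e))

print(answer("Why is the motor running hot?"))  # prints motor overheating
```

Once the question is matched to an entity node, the QA system can traverse the graph's relation edges (causes, remedies, affected components) to assemble an answer, which is how prior fault knowledge gets reused at diagnosis time.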
Abstract: Chinese named entity recognition (NER) aims to extract the entities contained in unstructured text and assign them predefined entity categories. To address the insufficient semantic learning of most Chinese NER methods when contextual information is scarce, an NER framework that hierarchically fuses multiple kinds of knowledge, HTLR (Chinese NER method based on Hierarchical Transformer fusing Lexicon and Radical), is proposed to help the model learn richer and more comprehensive contextual and semantic information. First, potential words in the corpus are identified and vectorized using published Chinese lexicons and word-vector tables, and the semantic relations between words and their associated characters are modeled via optimized position encoding, so as to learn Chinese lexical knowledge. Second, the corpus is converted into the corresponding glyph-based code sequences published by the Handian website to represent glyph information, and an RFE-CNN (Radical Feature Extraction-Convolutional Neural Network) model is proposed to extract glyph knowledge. Finally, a Hierarchical Transformer model is proposed, in which lower-level modules separately learn the semantic relations between characters and words and between characters and glyphs, and a higher-level module further fuses character, lexical, and glyph knowledge, helping the model learn semantically richer character representations. Experiments on the public Weibo, Resume, MSRA, and OntoNotes 4.0 datasets show that, compared with the mainstream method NFLAT (Non-Flat-LAttice Transformer for Chinese named entity recognition), the F1 score of the proposed method improves by 9.43, 0.75, 1.76, and 6.45 percentage points on the four datasets respectively, reaching state-of-the-art performance. These results indicate that multi-source semantic knowledge, hierarchical fusion, the RFE-CNN structure, and the Hierarchical Transformer structure are effective for learning rich semantic knowledge and improving model performance.