Abstract: In the international shipping industry, digital intelligence transformation has become essential, with both governments and enterprises actively working to integrate diverse datasets. The maritime and shipping domain is characterized by a vast array of document types, filled with complex, large-scale, and often chaotic knowledge and relationships. Managing these documents effectively is crucial for developing a Large Language Model (LLM) for the maritime domain that lets practitioners access and leverage valuable information. A Knowledge Graph (KG) offers a state-of-the-art solution for enhancing knowledge retrieval, providing more accurate responses and enabling context-aware reasoning. This paper presents a framework for constructing a knowledge graph from maritime and shipping documents using GraphRAG, a hybrid tool combining graph-based retrieval and generation capabilities. The extraction of entities and relationships from these documents and the KG construction process are detailed. Furthermore, the KG is integrated with an LLM to develop a Q&A system, and the resulting system significantly improves answer accuracy compared to traditional LLMs. Additionally, the KG construction process is up to 50% faster than conventional LLM-based approaches, underscoring the efficiency of the method. This study provides a promising approach to digital intelligence in shipping, advancing knowledge accessibility and decision-making.
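The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of graph-backed retrieval for Q&A, written in plain Python rather than with the GraphRAG tool itself. The triples, the function names `build_graph` and `retrieve_context`, and the substring-based entity matching are all assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(triples):
    """Index (head, relation, tail) triples under both of their entities."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph[tail].append((f"inverse of {relation}", head))
    return graph


def retrieve_context(graph, question, max_facts=5):
    """Return facts about every known entity whose name appears in the question."""
    facts = []
    lowered = question.lower()
    for entity, edges in graph.items():
        if entity.lower() in lowered:
            for relation, other in edges:
                facts.append(f"{entity} -[{relation}]-> {other}")
    return facts[:max_facts]


# Hypothetical triples, invented purely for illustration.
TRIPLES = [
    ("bill of lading", "issued_by", "carrier"),
    ("bill of lading", "describes", "cargo"),
    ("carrier", "operates", "vessel"),
]

graph = build_graph(TRIPLES)
context = retrieve_context(graph, "Who issues a bill of lading?")
# The retrieved facts would be placed in front of the question sent to an LLM.
prompt = "Answer using these facts:\n" + "\n".join(context)
```

In a real system the entity matching would come from an extraction model rather than substring search; the sketch only shows how graph facts can ground the prompt.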
Abstract: This article proposes a document-level prompt learning approach that uses LLMs to extract timeline-based storylines. Verification tests on datasets such as ESCv1.2 and Timeline17 show that the proposed prompt + one-shot learning approach works well. The findings also indicate that, although timeline-based storyline extraction shows promise in practical applications of LLMs, it remains a complex natural language processing task that requires further research.
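As a rough illustration of what a "prompt + one-shot" setup can look like, the sketch below assembles an instruction, a single worked example, and the target document into one prompt string. The wording of the instruction, the function name `build_one_shot_prompt`, and the example texts are assumptions; the article's actual prompt is not given in the abstract.

```python
def build_one_shot_prompt(example_doc, example_timeline, target_doc):
    """Assemble an instruction, one worked example, and the target document."""
    example_lines = "\n".join(f"{date}: {event}" for date, event in example_timeline)
    return (
        "Extract the storyline as a dated timeline, one event per line.\n\n"
        f"Document:\n{example_doc}\n"
        f"Timeline:\n{example_lines}\n\n"
        f"Document:\n{target_doc}\n"
        "Timeline:\n"
    )


# Made-up example text, not drawn from ESCv1.2 or Timeline17.
prompt = build_one_shot_prompt(
    example_doc="On 2 May the storm formed. On 4 May it made landfall.",
    example_timeline=[("2 May", "storm formed"), ("4 May", "storm made landfall")],
    target_doc="On 1 June talks began. On 9 June an agreement was signed.",
)
```

The prompt ends with an open "Timeline:" slot so that the model's continuation is the extracted timeline for the target document.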
Funding: Supported by the National Natural Science Foundation of China (41431177 and 41601413), the National Basic Research Program of China (2015CB954102), the Natural Science Research Program of Jiangsu Province, China (BK20150975 and 14KJA170001), and the Outstanding Innovation Team in Colleges and Universities in Jiangsu Province, China.
Abstract: In addition to soil samples, conventional soil maps, and experienced soil surveyors, text about soils (e.g., soil survey reports) is an important potential data source for extracting soil–environment relationships. Because the words describing soil–environment relationships are often mixed with unrelated words, the first step is to extract the needed words and organize them in a structured way. This paper applies natural language processing (NLP) techniques to automatically extract and structure information about soil–environment relationships from soil survey reports. The method includes two steps: (1) construction of a knowledge frame and (2) information extraction using either a rule-based method or a statistic-based method, depending on the type of information. For uniformly written text, the rule-based approach was used; the variables of this type are slope, elevation, accumulated temperature, annual mean temperature, annual precipitation, and frost-free period. For text written in diverse styles, the statistic-based method was adopted; the variables of this type are landform and parent material. The soil species of China soil survey reports were selected as the experimental dataset. Precision (P), recall (R), and F1-measure (F1) were used to evaluate the performance of the method. For the rule-based method, the P values were 1, the R values were above 92%, and the F1 values were above 96% for all the variables involved. For the method based on conditional random fields (CRFs), the P, R, and F1 values for parent material were 84.15%, 83.13%, and 83.64%, respectively; the values for landform were 88.33%, 76.81%, and 82.17%, respectively. To explore the impact of text type on the performance of the CRFs-based method, CRFs models were trained and validated separately on the descriptive texts of soil types and of typical profiles. For parent material, the maximum F1 value for the descriptive text of soil types was 90.7%, while the maximum F1 value for the descriptive text of soil profiles was only 75%. For landform, the maximum F1 value for the descriptive text of soil types was 85.33%, similar to that for the descriptive text of soil profiles (85.71%). These results suggest that NLP techniques are effective for extracting and structuring soil–environment relationship information from a text data source.
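The rule-based step for uniformly written variables can be pictured as filling slots of a knowledge frame with pattern matches. The sketch below is an assumption-laden illustration: the patterns are written for an invented English sentence, whereas the paper's rules target Chinese soil survey reports and are not given in the abstract, and the slot names are hypothetical.

```python
import re

# Hypothetical patterns for English sentences; the paper's own rules are not
# given in the abstract and were written for Chinese soil survey reports.
RULES = {
    "elevation_m": re.compile(r"elevation of (\d+)\s*(?:-|to)\s*(\d+)\s*m\b"),
    "slope_deg": re.compile(r"slopes? of (\d+)\s*(?:-|to)\s*(\d+)\s*degrees"),
    "annual_precipitation_mm": re.compile(
        r"annual precipitation of (\d+)\s*(?:-|to)\s*(\d+)\s*mm\b"
    ),
}


def extract_frame(text):
    """Fill each frame slot with the (low, high) range found by its rule."""
    frame = {}
    for slot, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            frame[slot] = (int(match.group(1)), int(match.group(2)))
    return frame


# Invented sentence used only to exercise the rules.
SENTENCE = (
    "The soil occurs on low hills at an elevation of 200-500 m, with slopes "
    "of 5 to 15 degrees and annual precipitation of 900-1100 mm."
)
frame = extract_frame(SENTENCE)
```

Rules like these tend to give very high precision on uniformly phrased text, which fits the pattern the abstract reports, while missing any phrasing the patterns do not anticipate.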
Abstract: A growing amount of data is uploaded to the internet every day, and it is important to understand the volume of those data in order to find a better scheme for processing them. However, the volume of internet data is beyond the processing capabilities of the current internet infrastructure. Engineering work that uses technology to organize and analyze information and to extract useful information is therefore of interest in both industry and academia. The goal of this paper is to explore entity relationships using deep learning, introduce semantic knowledge through a pre-trained language model, and develop an advanced entity relationship information extraction method, called RoBERTa+MTL, that combines the Robustly Optimized BERT Approach (RoBERTa) with multi-task learning and with linguistic characteristics of the field. To improve the effectiveness of model interaction, multi-task learning is used to bring in the observation information of auxiliary tasks. Experimental results show that the method achieves an accuracy of 88.95 in entity relationship extraction, and 86.35% accuracy after being combined with multi-task learning.
Abstract: The correlations of the apparent extraction equilibrium constant ($\lg K_{\mathrm{ex}}$) with the electronic effect parameter ($\sum\sigma^{\Phi}$) and the steric effect parameter ($\sum\upsilon$) of the substituents in extractant molecules are investigated by linear regression analysis for the extraction of rare earths by monoacidic organophosphorus extractants of various classes and structures. The results indicate that the linear free energy relationship $\lg K_{\mathrm{ex}} = \rho\sum\sigma^{\Phi} + \psi\sum\upsilon + h$ generally holds for this kind of extraction system. Accordingly, the quantitative structure-behaviour relationships of the extractants are discussed. These relationships can be applied, in a preliminary way, to predict the $\lg K_{\mathrm{ex}}$ values for rare earth extraction by extractants of this class with definite structures, and can thus provide some direction for the design of new rare earth (RE) extractants.
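Fitting the coefficients of a relationship of the form lg K_ex = ρ·Σσ + ψ·Συ + h is an ordinary two-variable least-squares problem. The sketch below solves the normal equations in plain Python. The input values are synthetic and noise-free, generated from arbitrarily chosen coefficients with no chemical meaning; they are not data from the paper, and the function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_lfer(sigma_sums, upsilon_sums, lg_k_values):
    """Least-squares fit of lg K_ex = rho * sigma + psi * upsilon + h."""
    rows = [[s, u, 1.0] for s, u in zip(sigma_sums, upsilon_sums)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, lg_k_values)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free values generated from rho = -2.0, psi = 0.5, h = 1.0.
# They are arbitrary illustration values, not measurements.
SIGMA = [-1.0, -0.5, 0.0, 0.5, 1.0]
UPSILON = [0.5, 1.5, 1.0, 2.0, 0.0]
LG_K = [-2.0 * s + 0.5 * u + 1.0 for s, u in zip(SIGMA, UPSILON)]

rho, psi, h = fit_lfer(SIGMA, UPSILON, LG_K)
```

Because the synthetic data contain no noise, the fit recovers the generating coefficients up to floating-point error; with measured values the residuals would indicate how well the linear relationship holds.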
Funding: Supported by the National Natural Science Foundation of China.
Abstract: Introduction. Metal extraction is a process of complex formation by organic ligands with metal ions or the corresponding ionic groups; that is to say, it is a multicomponent coordination process in a heterogeneous phase system. The extraction behaviour depends chiefly on the stability of the metal complexes, which is closely related to both the structure of the ligands and the nature of the metal ions. Therefore, the physicochemical behaviour of the substituent in the ligand will influence the extractive ability of the extractant for a certain
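Abstract: Coal is the main force in decarbonizing energy consumption. Carbon emissions from coal development and utilization account for 60% to 70% of China's total carbon emissions, which makes coal the key to meeting the country's emission-reduction targets. This work on the construction and application of a knowledge graph of carbon emission control technologies for coal mining and utilization focuses on those technologies, systematically organizes the related knowledge, and builds a knowledge graph on that basis. The graph reveals the internal connections, applicable conditions, implementation effects, and emission-reduction pathways of the different technologies, supports practitioners in acquiring frontier knowledge in this field, and promotes the coal industry's green, low-carbon transition. The construction proceeds in five steps. First, professional books, terminology dictionaries, authoritative research reports, core-journal literature from CNKI, and various standards and specifications related to coal emission-reduction technologies are widely collected, and a hybrid bottom-up and top-down approach is used to build a conceptual knowledge model of the domain. Second, the BIO tagging scheme and a BERT+CRF (Bidirectional Encoder Representations from Transformers and Conditional Random Fields) model are used to recognize entities in the domain. Third, building on entity recognition, a BiLSTM-Attention model is applied to mine the relationships between entities, achieving relation extraction. Fourth, entity disambiguation and coreference resolution are used for knowledge fusion, eliminating contradictory and redundant information in the data. Fifth, entities and relationships are stored in the Neo4j graph database, which completes the construction of the domain knowledge graph on the basis of the structured methods and models above. The resulting conceptual model covers four major categories (emission characteristics, mining methods, utilization methods, and carbon-reduction technologies), which are further subdivided into 12 subclasses and 30 fine-grained classes, forming a complete concept classification system. Ten types of named entities and six types of relations are defined. Using the proposed combination of construction methods and models, 12,631 nodes and 32,209 inter-entity relations are extracted, revealing the complex associations between carbon emission technologies and emission characteristics, mining methods, and utilization methods. The constructed knowledge graph supports mining enterprises in selecting suitable carbon-reduction technology pathways. As low-carbon development scenarios in the coal industry expand, data accumulate, and artificial intelligence and large models advance, future work will build on multimodal data fusion to optimize the graph construction method, extend the graph's range of applications, and improve the precision of technology pathway recommendation.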
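The BIO tagging scheme used in the entity recognition step marks the first token of an entity with `B-<type>`, continuation tokens with `I-<type>`, and everything else with `O`. The sketch below shows how a predicted tag sequence is turned back into entities. The entity types `TECH` and `METHOD` and the example sentence are invented for illustration and are not among the ten entity types defined in the study.

```python
def decode_bio(tokens, tags):
    """Collect (entity_text, entity_type) pairs from B-/I-/O tags."""
    entities = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that was still open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity:
            # close the open entity (if any) and drop this token.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Invented sentence and hypothetical entity types.
TOKENS = ["Carbon", "capture", "reduces", "emissions", "from", "coal", "mining"]
TAGS = ["B-TECH", "I-TECH", "O", "O", "O", "B-METHOD", "I-METHOD"]
entities = decode_bio(TOKENS, TAGS)
```

The decoded entities would then become the nodes passed to relation extraction and, eventually, to graph storage.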
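Abstract: Pesticide registration texts are information-dense and logically complex, with long distances between entities and heterogeneous entity lengths, while traditional joint extraction methods suffer from overlapping triples, exposure bias, and redundant computation. To address these issues, this study proposes a multi-feature fusion single-stage entity and relation joint extraction model (MF-SERel). First, in the encoding layer, semantic and syntactic features are fused to enrich the character vector representations and improve the model's ability to represent complex corpora. Second, in the multi-dimensional tagging framework layer, an HT-BES multi-dimensional tagging strategy is proposed to solve the overlapping-triple problem. Using parallel scoring functions and a fine-grained classification component, joint entity and relation extraction is converted into a multi-label tagging task along the relation dimension. Because this process contains no interdependent steps, tagging is performed in a single parallel stage, which avoids exposure bias and reduces redundant computation. Finally, the decoding layer decodes entity-relation triples from the predicted fine-grained classification labels. The proposed model is compared with the baseline models GraphRel, CasRel, and TPLinker on the Pesticide Registration Dataset (PRD) and the public Dataset of Unstructured Information Extraction (DuIE). The results show that MF-SERel performs well on both datasets. On PRD, MF-SERel improves inference speed by 20% and the F1 value by 2.3%, indicating good knowledge-mining ability on pesticide registration texts. On DuIE, it improves inference speed by 54% and the F1 value by 1.7%, which demonstrates good generalization ability. In summary, the proposed MF-SERel model offers a new method for the structured extraction of knowledge in the pesticide domain.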
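Abstract: Entity and relation extraction in the Chinese judicial domain plays an important role in improving case-handling efficiency, but existing relation extraction models lack domain knowledge and struggle with overlapping entities, which makes it difficult to distinguish and extract entities and relations accurately. By introducing domain knowledge, a legal information enhancement module is proposed. It strengthens the ability of the proposed legal potential relationship and global correspondence (LPRGC) model to understand terminology, rules, and contextual information in legal texts, thereby improving the accuracy of entity and relation recognition and the performance of the extraction algorithm. To address overlapping entities, a relation extraction method based on potential relations and entity alignment is designed. It precisely tags entity positions, filters potential relations, and aligns entities with a global matrix, so that relations between overlapping entities are captured more accurately and mapped to the correct entity pairs, which improves the accuracy of the extraction results. In entity and relation extraction experiments on the China Legal Intelligence Technology Evaluation dataset, the LPRGC model achieves precision, recall, and F1 values of 85.21%, 81.19%, and 83.15%, respectively, all better than the comparison models. In handling entity overlap in particular, its F1 value reaches 81.45% on the single-entity-overlap type and 80.67% on the multiple-entity-overlap type. The LPRGC model clearly improves the accuracy of entity and relation extraction over existing methods and is markedly effective on entity overlap in complex legal texts.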
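The three reported metrics are linked by the standard definition of F1 as the harmonic mean of precision and recall, so the reported F1 can be checked from the other two values. The short sketch below does this for the LPRGC figures quoted above; the function name is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the LPRGC model, in percent.
lprgc_f1 = round(f1_score(85.21, 81.19), 2)
```

The computed value agrees with the reported F1 of 83.15%.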
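Abstract: Most current methods fail to make full use of span semantic information and implicit local context information. To address this, a joint entity and relation extraction model based on spans and multi-level feature fusion is proposed. The model first feeds the text into the pre-trained language model BERT (Bidirectional Encoder Representations from Transformers) to obtain word vectors, and fuses them with syntactic dependency information obtained through graph convolution to form richer text features. A multi-head attention layer then weights the text features to suppress interference from noisy features and promote interaction between features, after which the text is split into span sequences for entity recognition. Finally, a bidirectional gated recurrent unit extracts implicit local context information, which is fused with entity type information into candidate entity span pairs, and a sigmoid function is used for relation classification. Experiments show that the model achieves good improvements on the SciERC and CoNLL04 datasets.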
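Two mechanical pieces of a span-based pipeline can be sketched without any neural network: enumerating candidate spans up to a maximum width, and making a multi-label relation decision by thresholding sigmoid scores. The maximum width, the threshold of 0.5, the logits, and the label names below are all assumptions for illustration; the abstract does not state the model's settings.

```python
import math


def enumerate_spans(tokens, max_width):
    """Return every (start, end) token span, end exclusive, up to max_width tokens."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_width, len(tokens)) + 1):
            spans.append((start, end))
    return spans


def sigmoid(x):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_relations(logits, labels, threshold=0.5):
    """Keep each relation label whose sigmoid score reaches the threshold."""
    return [label for label, logit in zip(labels, logits) if sigmoid(logit) >= threshold]


TOKENS = ["BERT", "encodes", "text"]
spans = enumerate_spans(TOKENS, max_width=2)

# Hypothetical logits for one candidate span pair; a trained model would produce these.
relations = predict_relations([2.0, -1.0, 0.3], ["Used-for", "Part-of", "Compare"])
```

Using an independent sigmoid per relation, rather than a softmax over relations, lets one span pair receive several relation labels at once or none at all.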