Abstract: Relation extraction in the wind turbine fault domain is hampered by abundant domain-specific terminology, single-entity and multi-entity overlap, multiple relations within a single sentence, and complex sentence structures. To address these problems, this paper proposes CA-CasRel, an improved CasRel cascade binary tagging framework. Because the ERNIE model incorporates entity information and knowledge graph information during pre-training, the study first replaces BERT with ERNIE for sentence encoding and embedding, capturing entity-relation information in the text sequence more deeply. In addition, after the head entity is predicted, the head entity vector is encoded with a cosine attention mechanism. Cosine attention is direction-sensitive and reinforces semantic purity, so it can efficiently extract essential semantic features and capture sparse features without bias. Compared with the baseline CasRel model, the proposed architecture improves precision, recall, and F1 score by 1.71%, 6.09%, and 4.05%, respectively, achieving state-of-the-art performance in the domain.
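The abstract does not give the formula for its cosine attention, so the following is only a minimal sketch of one common formulation: attention scores are the cosine similarity between a query (here standing in for the head entity vector) and each key (token encodings), divided by a temperature before the softmax. The function names, the temperature `tau`, and the toy vectors are assumptions for illustration, not the authors' implementation.

```python
import math

def cosine(u, v, eps=1e-8):
    """Cosine similarity; depends only on direction, not on vector length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cosine_attention(query, keys, values, tau=0.1):
    """Attend over `values` with weights from cosine(query, key) / tau.

    `tau` is a hypothetical temperature; smaller values sharpen the weights.
    """
    scores = [cosine(query, k) / tau for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights

# Toy example: the first key points the same way as the query but is 10x longer.
keys = [[10.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
context, weights = cosine_attention([1.0, 0.0], keys, values)
```

Because the scores are normalized by vector length, rescaling the query or a key leaves the weights essentially unchanged, which is one way to read the "direction sensitivity" the abstract mentions.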
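Abstract: With the rapid development of medicine in China, the volume of Chinese medical text keeps growing. To extract valuable information from these texts and solve the entity-relation extraction problem in the Chinese medical domain, researchers have proposed a series of models based on bidirectional LSTM. However, because of issues such as the training speed of bidirectional LSTM, this paper adopts the cascade pointer network framework for entity-relation extraction from Chinese medical text. To compensate for the framework's weak subject-entity recognition and to address the gradient problem that arises when the encoding layer is reused, the paper proposes a subject-entity enhancement module and introduces conditional layer normalization, yielding the Subject Enhanced Cascade Binary Pointer Tagging Framework for Chinese Medical Text (SE-CAS). The subject-entity enhancement module accurately identifies valid subject entities and filters out erroneous ones. Conditional layer normalization replaces the simple addition used in the original model and is applied to the encoding layer and the subject-entity encoding layer. Experiments show that the proposed model improves the F1 score on the CMeIE dataset by 5.73%. Ablation experiments confirm that each module improves performance and that the gains are cumulative.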
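As a rough illustration of conditional layer normalization replacing simple addition, the sketch below normalizes a token encoding and then derives the scale and shift from a condition vector (assumed here to be the subject-entity encoding). The names `w_gamma` and `w_beta`, the shapes, and the toy numbers are assumptions; the abstract does not specify how SE-CAS parameterizes the layer.

```python
import math

def normalize(x, eps=1e-12):
    """Plain layer normalization without learned scale or shift."""
    mean = sum(x) / len(x)
    var = sum((a - mean) ** 2 for a in x) / len(x)
    return [(a - mean) / math.sqrt(var + eps) for a in x]

def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]

def conditional_layer_norm(x, cond, w_gamma, w_beta):
    """Layer norm whose scale and shift are functions of `cond`.

    gamma = 1 + w_gamma @ cond and beta = w_beta @ cond, so zero projection
    weights reduce the layer to ordinary layer normalization.
    """
    normed = normalize(x)
    gamma = [1.0 + g for g in matvec(w_gamma, cond)]
    beta = matvec(w_beta, cond)
    return [g * n + b for g, n, b in zip(gamma, normed, beta)]

# Toy example: a 3-dim token encoding conditioned on a 2-dim subject vector.
token = [1.0, 2.0, 3.0]
subject = [1.0, 0.0]
zeros = [[0.0, 0.0] for _ in range(3)]
shift = [[0.5, 0.0] for _ in range(3)]
plain = conditional_layer_norm(token, subject, zeros, zeros)
shifted = conditional_layer_norm(token, subject, zeros, shift)
```

Compared with adding the subject vector directly to every token encoding, this form lets the condition modulate the normalized features through a learned scale and shift.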
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 72293563, 72293565, and 72101051.
Abstract: As the global demand for renewable energy increases, the photovoltaic (PV) industry, which is a vital component of clean energy, plays a crucial role in achieving energy transition and sustainable development goals. Consequently, the PV industry has grown rapidly in recent years, leading to an expanded and increasingly sophisticated industrial chain. However, the effective assessment of the development of the PV industry is complicated by the extensiveness of the PV industrial chain. This paper presents a method for constructing a knowledge graph of the PV industry chain using enterprise bidding data, effectively coupling the product and supply networks. First, by leveraging relevant knowledge in the PV field, we employ two deep learning models, the BERT-BiLSTM-CRF model and an improved CasRel model, for entity and relationship extraction, respectively. Subsequently, entity-linking technology is applied to facilitate knowledge fusion. Finally, the Neo4j graph database is utilized for knowledge storage and graphical representation, comprehensively illustrating the technical process of constructing the PV industry chain knowledge graph. The knowledge graph of the PV industry chain facilitates a timely understanding of the industry’s overall status and development trends while identifying bottlenecks and risks within the chain. Furthermore, it can aid enterprises in devising more effective risk management strategies and countermeasures, continuously optimizing the industry chain structure, and promoting the sustainable development of the industry.
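The storage step described above can be sketched as turning each extracted (head, relation, tail) triple into a parameterized Cypher `MERGE` statement. The `Entity` label, the `name` property, the helper names, and the example companies are all hypothetical; the abstract does not describe the paper's actual graph schema.

```python
import re

def to_rel_type(name):
    """Turn a free-text relation name into a safe Cypher relationship type.

    Relationship types cannot be passed as query parameters in Cypher, so
    the name is sanitized before being placed in the query text.
    """
    rel = re.sub(r"\W+", "_", name.strip()).strip("_").upper()
    if not rel or rel[0].isdigit():
        raise ValueError(f"cannot derive a relationship type from {name!r}")
    return rel

def triple_to_cypher(head, relation, tail):
    """Build a MERGE query and its parameters for one triple.

    MERGE creates the nodes and the relationship only if they do not exist
    yet, so loading the same triple twice does not duplicate it.
    """
    rel = to_rel_type(relation)
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{rel}]->(t)"
    )
    return query, {"head": head, "tail": tail}

# Hypothetical triple extracted from bidding data.
query, params = triple_to_cypher("Company A", "supplies to", "Company B")
```

The resulting query and parameter dictionary would then be handed to whatever Neo4j client the pipeline uses; entity names travel as parameters rather than being spliced into the query text.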
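Abstract: A knowledge graph of the sheep disease domain is a prerequisite for sheep disease prevention, control, and intelligent diagnosis and treatment. To address the fuzzy semantic boundaries, overlapping entity roles, and complex relation semantics of sheep disease texts, this study proposes a knowledge graph construction method based on the CaRoMHPE model (CasRel-based model combined with RoBERTa, multi-scale cross-attention mechanism, and hybrid position encoding in multi-head attention). First, based on the characteristics of the sheep disease corpus, a sheep disease dataset containing 9 entity types and 8 relation types was built, covering the key entities and relations across the whole diagnosis and treatment process and providing data support for entity-relation extraction. Then, with CasRel (cascade relational triple extraction) as the base model, RoBERTa-wwm-ext (robustly optimized BERT approach) replaces BERT (bidirectional encoder representations from transformers) as the pre-trained encoder to strengthen the model's contextual understanding and its handling of complex linguistic structures. A multi-scale cross-attention mechanism is added after the subject tagging module to refine the semantic relations between entities, and hybrid position encoding (HPE) is incorporated to improve the multi-head attention mechanism, enhancing entity boundary delineation and role discrimination in relation extraction. The results show that the model achieves a precision, recall, and F1 score of 94.70%, 94.04%, and 94.37%, respectively, which are 9.14, 9.21, and 9.18 percentage points higher than those of the CasRel model, improving entity-relation extraction for sheep disease information. Finally, on the basis of the extracted triples, semantic embedding and the cosine similarity algorithm are combined to eliminate duplicated synonyms and resolve potential ambiguities, producing a normalized knowledge graph that provides strong support for intelligent sheep disease diagnosis and treatment.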
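The final normalization step can be pictured as follows: entity names whose embeddings are close in cosine similarity are mapped to one canonical name, and triples that become identical are dropped. This is a minimal sketch under stated assumptions; the greedy grouping, the 0.9 threshold, the two-dimensional toy embeddings, and the example entity names are illustrative and not taken from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def merge_synonyms(embeddings, threshold=0.9):
    """Greedily map each entity name to the first earlier name it resembles."""
    canonical = {}
    representatives = []
    for name, vec in embeddings.items():
        match = next(
            (r for r in representatives
             if cosine_similarity(vec, embeddings[r]) >= threshold),
            None,
        )
        if match is None:
            representatives.append(name)
            canonical[name] = name
        else:
            canonical[name] = match
    return canonical

def normalize_triples(triples, canonical):
    """Rewrite entities to canonical names and drop duplicate triples."""
    seen, result = set(), []
    for head, rel, tail in triples:
        key = (canonical.get(head, head), rel, canonical.get(tail, tail))
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result

# Hypothetical embeddings and triples.
embeddings = {
    "sheep pox": [1.0, 0.0],
    "ovine pox": [0.99, 0.05],
    "fever": [0.0, 1.0],
}
triples = [("sheep pox", "symptom", "fever"), ("ovine pox", "symptom", "fever")]
canonical = merge_synonyms(embeddings)
clean = normalize_triples(triples, canonical)
```

In this toy run the two names for the same disease collapse into one node, so only a single triple would reach the graph.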