381 articles found
1. Construction of a Maritime Knowledge Graph Using GraphRAG for Entity and Relationship Extraction from Maritime Documents (cited by: 1)
Authors: Yi Han, Tao Yang, Meng Yuan, Pinghua Hu, Chen Li. Journal of Computer and Communications, 2025, No. 2, pp. 68-93 (26 pages)
In the international shipping industry, digital intelligence transformation has become essential, with both governments and enterprises actively working to integrate diverse datasets. The domain of maritime and shipping is characterized by a vast array of document types, filled with complex, large-scale, and often chaotic knowledge and relationships. Effectively managing these documents is crucial for developing a Large Language Model (LLM) in the maritime domain, enabling practitioners to access and leverage valuable information. A Knowledge Graph (KG) offers a state-of-the-art solution for enhancing knowledge retrieval, providing more accurate responses and enabling context-aware reasoning. This paper presents a framework for utilizing maritime and shipping documents to construct a knowledge graph using GraphRAG, a hybrid tool combining graph-based retrieval and generation capabilities. The extraction of entities and relationships from these documents and the KG construction process are detailed. Furthermore, the KG is integrated with an LLM to develop a Q&A system, demonstrating that the system significantly improves answer accuracy compared to traditional LLMs. Additionally, the KG construction process is up to 50% faster than conventional LLM-based approaches, underscoring the efficiency of the method. This study provides a promising approach to digital intelligence in shipping, advancing knowledge accessibility and decision-making.
Keywords: Maritime Knowledge Graph; GraphRAG; Entity and Relationship Extraction; Document Management
2. A Chinese Named Entity Recognition Method for News Domain Based on Transfer Learning and Word Embeddings
Authors: Rui Fang, Liangzhong Cui. Computers, Materials & Continua, 2025, No. 5, pp. 3247-3275 (29 pages)
Named Entity Recognition (NER) is vital in natural language processing for the analysis of news texts, as it accurately identifies entities such as locations, persons, and organizations, which is crucial for applications like news summarization and event tracking. However, NER in the news domain faces challenges due to insufficient annotated data, complex entity structures, and strong context dependencies. To address these issues, we propose a new Chinese named entity recognition method that integrates transfer learning with word embeddings. Our approach leverages the ERNIE pre-trained model for transfer learning to obtain general language representations, and incorporates the Soft-lexicon word embedding technique to handle varied entity structures. This dual strategy enhances the model's understanding of context and boosts its ability to process complex texts. Experimental results show that our method achieves an F1 score of 94.72% on a news dataset, surpassing baseline methods by 3% to 4%, thereby confirming its effectiveness for Chinese named entity recognition in the news domain.
Keywords: News domain; named entity recognition (NER); transfer learning; word embeddings; ERNIE; soft-lexicon
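F1 scores such as the 94.72% quoted in the abstract above are usually computed at the entity level. The abstract does not say which scoring convention the authors used, so the following is only a minimal sketch under stated assumptions: entities are represented as hypothetical `(start, end, type)` tuples, and a prediction counts as correct only on an exact match of both boundaries and type. The function name `span_f1` is illustrative and not taken from the paper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical entity representation: (start offset, end offset, entity type).
Span = Tuple[int, int, str]


def span_f1(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match entity scoring.

    Assumption: a predicted entity is a true positive only if its
    boundaries and its type both match a gold entity exactly.
    """
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(pred)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = [(0, 2, "PER"), (5, 7, "LOC"), (9, 12, "ORG")]
    pred = [(0, 2, "PER"), (5, 7, "ORG")]  # second span has the wrong type
    print(span_f1(gold, pred))
```

Exact-match scoring is strict: in the usage example the second prediction has the right boundaries but the wrong type, so it counts as both a false positive and a missed gold entity.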
3. Multi-Modal Pre-Synergistic Fusion Entity Alignment Based on Mutual Information Strategy Optimization
Authors: Huayu Li, Xinxin Chen, Lizhuang Tan, Konstantin I. Kostromitin, Athanasios V. Vasilakos, Peiying Zhang. Computers, Materials & Continua, 2025, No. 11, pp. 4133-4153 (21 pages)
To address the challenge of missing modal information in entity alignment and to mitigate information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model's initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multilevel feature fusion between modalities, supporting the learning of more effective alignment strategies. Extensive experiments on multiple public datasets show that the MPSEA method achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset, compared to existing state-of-the-art methods. These results confirm the effectiveness of the proposed model.
Keywords: Knowledge graph; multi-modal; entity alignment; feature fusion; pre-synergistic fusion
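The entity alignment results above are reported as Hits@1 and MRR. As a reminder of what those two metrics measure, here is a minimal sketch of their standard definitions. It assumes each test entity has already been assigned the 1-based rank of its correct counterpart in the candidate list; the function names and the example ranks are illustrative and are not taken from the paper or its datasets.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int = 1) -> float:
    """Fraction of queries whose correct match is ranked within the top k.

    `ranks` holds the 1-based rank of the correct counterpart per query
    (an assumed input format for this sketch).
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries; 1.0 means every match is ranked first."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / rank for rank in ranks) / len(ranks)


if __name__ == "__main__":
    example_ranks = [1, 2, 1, 4]  # made-up ranks for four test entities
    print(hits_at_k(example_ranks, k=1))        # 0.5
    print(mean_reciprocal_rank(example_ranks))  # 0.6875
```

Hits@1 only rewards a correct top-ranked candidate, while MRR also gives partial credit for near misses, which is why the two are typically reported together.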
4. Syntax-Enhanced Entity Relation Extraction with Complex Knowledge
Authors: Mingwen Bi, Hefei Chen, Zhenghong Yang. Computers, Materials & Continua, 2025, No. 4, pp. 861-876 (16 pages)
Entity relation extraction, a fundamental and essential task in natural language processing (NLP), has long garnered significant attention. It aims to extract the core of semantic knowledge from unstructured text, i.e., entities and the relations between them. At present, the main dilemma of Chinese entity relation extraction research lies in nested entities, relation overlap, and lack of entity relation interaction. This dilemma is particularly prominent in complex knowledge extraction tasks with high-density knowledge, imprecise syntactic structure, and lack of semantic roles. To address these challenges, this paper presents an innovative "character-level" Chinese part-of-speech (CN-POS) tagging approach and incorporates part-of-speech (POS) information into the pre-trained model, aiming to improve its semantic understanding and syntactic information processing capabilities. Additionally, a relation reference filling mechanism (RF) is proposed to enhance the semantic interaction between relations and entities, utilize relations to guide entity modeling, improve the boundary prediction ability of entity models for nested entity phenomena, and increase the cascading accuracy of entity-relation triples. Meanwhile, the "Queue" sub-task connection strategy is adopted to alleviate triplet cascading errors caused by overlapping relations, and a Syntax-enhanced entity relation extraction model (SE-RE) is constructed. The model shows excellent performance on the E-commerce Product Information dataset (EPI) constructed for this article. The results demonstrate that integrating POS enhancement into the pre-trained encoding model significantly boosts the performance of entity relation extraction models compared to baseline methods. Specifically, the F1-score fluctuation in subtasks caused by error accumulation was reduced by 3.21%, while the F1-score for entity-relation triplet extraction improved by 1.91%.
Keywords: entity relation extraction; complex knowledge; syntax-enhanced; semantic interaction; pre-trained BERT
5. Railway accident entity extraction method based on accident phase classification and mutual learning
Authors: Zhibo Cheng, Yanhua Wu, Zheqian Liu, Yong Shi, Ze Li. Railway Sciences, 2025, No. 6, pp. 815-832 (18 pages)
Purpose: This study aims to enhance the accuracy of key entity extraction from railway accident report texts and address challenges such as complex domain-specific semantics, data sparsity and strong inter-sentence semantic dependencies. A robust entity extraction method tailored for accident texts is proposed. Design/methodology/approach: This method is implemented through a dual-branch multi-task mutual learning model named R-MLP, which jointly performs entity recognition and accident phase classification. The model leverages a shared BERT encoder to extract contextual features and incorporates a sentence span indexing module to align feature granularity. A cross-task mutual learning mechanism is also introduced to strengthen semantic representation. Findings: R-MLP effectively mitigates the impact of semantic complexity and data sparsity in domain entities and enhances the model's ability to capture inter-sentence semantic dependencies. Experimental results show that R-MLP achieves a maximum F1-score of 0.736 in extracting six types of key railway accident entities, significantly outperforming baseline models such as RoBERTa and MacBERT. Originality/value: This demonstrates the proposed method's superior generalization and accuracy in domain-specific entity extraction tasks, confirming its effectiveness and practical value.
Keywords: Accident report texts; entity extraction; Accident phase classification; Multi-task model; Mutual learning mechanism
6. Chinese Named Entity Recognition Method for Musk Deer Domain Based on Cross-Attention Enhanced Lexicon Features
Authors: Yumei Hao, Haiyan Wang, Dong Zhang. Computers, Materials & Continua, 2025, No. 5, pp. 2989-3005 (17 pages)
Named entity recognition (NER) in the musk deer domain is the extraction of specific types of entities from unstructured texts, constituting a fundamental component of the knowledge graph, Q&A system, and text summarization system of the musk deer domain. Due to limited annotated data, diverse entity types, and the ambiguity of Chinese word boundaries in musk deer domain NER, we present a novel NER model, CAELF-GP, which is based on cross-attention mechanism enhanced lexical features (CAELF). Specifically, we employ BERT as a character encoder and advocate the integration of external lexical information at the character representation layer. In the feature fusion module, instead of indiscriminately merging external dictionary information, we adopt a feature fusion method based on a cross-attention mechanism, which guides the model to focus on important lexical information by calculating the correlation between each character and its corresponding word sets. This module enhances the model's semantic representation ability and entity boundary recognition capability. Finally, we introduce the decoding module of GlobalPointer (GP) for entity type recognition, capable of identifying both nested and non-nested entities. Since there is currently no publicly available dataset for the musk deer domain, we built a named entity recognition dataset for this domain by collecting relevant literature and working under the guidance of domain experts. The dataset facilitates the training and validation of the model and provides a data foundation for subsequent related research. The model is evaluated on two public datasets and on the musk deer domain dataset. The results show that it is superior to the baseline models, offering a promising technical avenue for the intelligent recognition of named entities in the musk deer domain.
Keywords: Named entity recognition; musk deer; cross-attention; lexicon enhancement
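7. A Discussion of the Workflow for Converting Legacy DLG Data into Fundamental Geographic Entities Based on GEOWAY Entity
Author: 张岱琼. 《测绘与空间地理信息》, 2025, No. 6, pp. 77-80 (4 pages)
The Technical Outline for Pilot Construction of the New Fundamental Surveying and Mapping System (《新型基础测绘体系建设试点技术大纲》), issued by the Ministry of Natural Resources in March 2021, identifies geographic entities as the breakthrough point for actively advancing pilot work on new fundamental surveying and mapping. One very important task in this work is converting legacy fundamental surveying and mapping vector data (DLG) into geographic entities, which can greatly reduce the cost of constructing geographic entities. This paper first analyzes legacy DLG data, then clarifies the relationships among DLG, graphic primitives, and geographic entities. Based on the Shanxi Province fundamental geographic entity pilot project, it then discusses the workflow for converting the legacy DLG data of Xinfu District into geographic entities using the GEOWAY Entity software. The feasibility of the workflow is verified through actual operation, and directions for future improvement are proposed.
Keywords: GEOWAY Entity; legacy DLG; fundamental geographic entities; workflow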
8. Tibetan Medical Named Entity Recognition Based on Syllable-Word-Sentence Embedding Transformer
Authors: Jin Zhang, Ziyue Zhang, Lobsang Yeshi, Dorje Tashi, Xiangshi Wang, Yuqing Cai, Yongbin Yu, Xiangxiang Wang, Nyima Tashi, Gadeng Luosang. CAAI Transactions on Intelligence Technology, 2025, No. 4, pp. 1148-1158 (11 pages)
Tibetan medical named entity recognition (Tibetan MNER) involves extracting specific types of medical entities from unstructured Tibetan medical texts. Tibetan MNER provides important data support for work related to Tibetan medicine. However, existing Tibetan MNER methods often struggle to comprehensively capture multi-level semantic information, failing to sufficiently extract multi-granularity features and effectively filter out irrelevant information, which ultimately impacts the accuracy of entity recognition. This paper proposes an improved embedding representation method called syllable-word-sentence embedding. By leveraging features at different granularities and using un-scaled dot-product attention to focus on key features for feature fusion, the syllable-word-sentence embedding is integrated into the transformer, enhancing the specificity and diversity of feature representations. The model leverages multi-level and multi-granularity semantic information, thereby improving the performance of Tibetan MNER. We evaluate our proposed model on datasets from various domains. The results indicate that the model effectively identified three types of entities in the Tibetan news dataset we constructed, achieving an F1 score of 93.59%, which represents an improvement of 1.24% compared to the vanilla FLAT. Additionally, results from the Tibetan medical dataset we developed show that it is effective in identifying five kinds of medical entities, with an F1 score of 71.39%, which is a 1.34% improvement over the vanilla FLAT.
Keywords: named entity recognition; syllable-word-sentence embedding; Tibetan lexicon; Tibetan medicine
9. Multi-Modal Named Entity Recognition with Auxiliary Visual Knowledge and Word-Level Fusion
Authors: Huansha Wang, Ruiyang Huang, Qinrang Liu, Xinghao Wang. Computers, Materials & Continua, 2025, No. 6, pp. 5747-5760 (14 pages)
Multi-modal Named Entity Recognition (MNER) aims to better identify meaningful textual entities by integrating information from images. Previous work has focused on extracting visual semantics at a fine-grained level, or obtaining entity-related external knowledge from knowledge bases or Large Language Models (LLMs). However, these approaches ignore the poor semantic correlation between visual and textual modalities in MNER datasets and do not explore different multi-modal fusion approaches. In this paper, we present MMAVK, a multi-modal named entity recognition model with auxiliary visual knowledge and word-level fusion, which aims to leverage the Multi-modal Large Language Model (MLLM) as an implicit knowledge base. It also extracts vision-based auxiliary knowledge from the image for more accurate and effective recognition. Specifically, we propose vision-based auxiliary knowledge generation, which guides the MLLM to extract external knowledge exclusively derived from images to aid entity recognition by designing target-specific prompts, thus avoiding redundant recognition and cognitive confusion caused by the simultaneous processing of image-text pairs. Furthermore, we employ a word-level multi-modal fusion mechanism to fuse the extracted external knowledge with each word embedding produced by the transformer-based encoder. Extensive experimental results demonstrate that MMAVK outperforms or equals the state-of-the-art methods on the two classical MNER datasets, even when the large models employed have significantly fewer parameters than other baselines.
Keywords: Multi-modal named entity recognition; large language model; multi-modal fusion
10. Causal Representation Enhances Cross-Domain Named Entity Recognition in Large Language Models
Authors: Jiahao Wu, Jinzhong Xu, Xiaoming Liu, Guan Yang, Jie Liu. Computers, Materials & Continua, 2025, No. 5, pp. 2809-2828 (20 pages)
In cross-domain named entity recognition, large language models face a scarcity of labeled data in specific domains, and the entity bias arising from the variation of entity information between domains makes them prone to spurious correlations when dealing with specific domains and entities. To solve this problem, this paper proposes a cross-domain named entity recognition method based on causal graph structure enhancement. By establishing a causal learning and intervention module, the method captures the cross-domain invariant causal structural representations between feature representations of text sequences and annotation sequences, so as to improve the utilization of causal structural features by large language models in the target domains and thus effectively alleviate the entity bias triggered by spurious correlations. Meanwhile, a semantic feature fusion module effectively combines the semantic information of the source and target domains. The results show an improvement of 2.47% and 4.12% in the political and medical domains, respectively, compared with the benchmark model, and an excellent performance in small-sample scenarios, which proves the effectiveness of causal graph structural enhancement in improving the accuracy of cross-domain entity recognition and reducing spurious correlations.
Keywords: Large language model; entity bias; causal graph structure
11. RoBGP: A Chinese Nested Biomedical Named Entity Recognition Model Based on RoBERTa and Global Pointer (cited by: 3)
Authors: Xiaohui Cui, Chao Song, Dongmei Li, Xiaolong Qu, Jiao Long, Yu Yang, Hanchao Zhang. Computers, Materials & Continua (indexed: SCIE, EI), 2024, No. 3, pp. 3603-3618 (16 pages)
Named Entity Recognition (NER) stands as a fundamental task within the field of biomedical text mining, aiming to extract specific types of entities such as genes, proteins, and diseases from complex biomedical texts and categorize them into predefined entity types. This process can provide basic support for the automatic construction of knowledge bases. In contrast to general texts, biomedical texts frequently contain numerous nested entities and local dependencies among these entities, presenting significant challenges to prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). Our model initially utilizes the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then incorporates a Bidirectional Long Short-Term Memory network for capturing bidirectional semantic information, effectively addressing the issue of long-distance dependencies. Furthermore, the Global Pointer model is employed to comprehensively recognize all nested entities in the text. We conduct extensive experiments on the Chinese medical dataset CMeEE, and the results demonstrate the superior performance of RoBGP over several baseline models. This research confirms the effectiveness of RoBGP in Chinese biomedical NER, providing reliable technical support for biomedical information extraction and knowledge base construction.
Keywords: Biomedicine; knowledge base; named entity recognition; pretrained language model; global pointer
12. Low Resource Chinese Geological Text Named Entity Recognition Based on Prompt Learning (cited by: 1)
Authors: Hang He, Chao Ma, Shan Ye, Wenqiang Tang, Yuxuan Zhou, Zhen Yu, Jiaxin Yi, Li Hou, Mingcai Hou. Journal of Earth Science (indexed: SCIE, CAS, CSCD), 2024, No. 3, pp. 1035-1043 (9 pages)
Geological reports are a significant accomplishment for geologists involved in geological investigations and scientific research, as they contain rich data and textual information. With the rapid development of science and technology, a large number of textual reports have accumulated in the field of geology. However, many non-hot topics and non-English speaking regions are neglected in mainstream geoscience databases for geological information mining, making it more challenging for some researchers to extract necessary information from these texts. Natural Language Processing (NLP) has obvious advantages in processing large amounts of textual data. The objective of this paper is to identify geological named entities from Chinese geological texts using NLP techniques. We propose the RoBERTa-Prompt-Tuning-NER method, which leverages the concept of Prompt Learning and requires only a small amount of annotated data to train superior models for recognizing geological named entities in low-resource dataset configurations. The RoBERTa layer captures context-based information and longer-distance dependencies through dynamic word vectors. Finally, we conducted experiments on the constructed Geological Named Entity Recognition (GNER) dataset. Our experimental results show that the proposed model achieves the highest F1 score, 80.64%, in comparison with four baseline algorithms, demonstrating the reliability and robustness of using the model for Named Entity Recognition of geological texts.
Keywords: Prompt Learning; Named Entity Recognition (NER); low resource; geological text; text information mining; big data; geology
13. Chinese named entity recognition with multi-network fusion of multi-scale lexical information (cited by: 1)
Authors: Yan Guo, Hong-Chen Liu, Fu-Jiang Liu, Wei-Hua Lin, Quan-Sen Shao, Jun-Shun Su. Journal of Electronic Science and Technology (indexed: EI, CAS, CSCD), 2024, No. 4, pp. 53-80 (28 pages)
Named entity recognition (NER) is an important part of knowledge extraction and one of the main tasks in constructing knowledge graphs. In today's Chinese named entity recognition (CNER) task, the BERT-BiLSTM-CRF model is widely used and often yields notable results. However, recognizing each entity with high accuracy remains challenging. Many entities do not appear as single words but as part of complex phrases, making it difficult to achieve accurate recognition using word embedding information alone, because the intricate lexical structure often impacts the performance. To address this issue, we propose an improved Bidirectional Encoder Representations from Transformers (BERT) character word conditional random field (CRF) (BCWC) model. It incorporates a pre-trained word embedding model using the skip-gram with negative sampling (SGNS) method, alongside traditional BERT embeddings. By comparing datasets with different word segmentation tools, we obtain enhanced word embedding features for segmented data. These features are then processed using multi-scale convolution and iterated dilated convolutional neural networks (IDCNNs) with varying expansion rates to capture features at multiple scales and extract diverse contextual information. Additionally, a multi-attention mechanism is employed to fuse word and character embeddings. Finally, CRFs are applied to learn sequence constraints and optimize entity label annotations. A series of experiments conducted on three public datasets demonstrates that the proposed method outperforms recent advanced baselines. BCWC can address the challenge of recognizing complex entities by combining character-level and word-level embedding information, thereby improving the accuracy of CNER. Such a model has potential for applications that need more precise knowledge extraction, such as knowledge graph construction and information retrieval, particularly in domain-specific natural language processing tasks that require high entity recognition precision.
Keywords: Bi-directional long short-term memory (BiLSTM); Chinese named entity recognition (CNER); Iterated dilated convolutional neural network (IDCNN); Multi-network integration; Multi-scale lexical features
14. Unveiling the intrinsic properties of single NiZnFeO_x entity for promoting electrocatalytic oxygen evolution
Authors: Zhihao Gu, Jiabo Le, Hehe Wei, Zehui Sun, Mahmoud Elsayed Hafez, Wei Ma. Chinese Chemical Letters (indexed: SCIE, CAS, CSCD), 2024, No. 4, pp. 181-186 (6 pages)
Although considerable research efforts have been devoted to the design and development of non-noble electrocatalysts for the oxygen evolution reaction (OER), substantial enhancement of OER performance in commercial-scale water electrolysis remains a big challenge. This could result from the difficulties in detecting the intrinsic properties and from overlooking the assembly process in the electrochemical OER process. Here, we employ a microjet collision method to investigate the intrinsic OER activities of individual NiZnFeO_x entities with and without a moderate magnetic field. Our results demonstrate that single NiZnFeO_x nanoparticles (NPs) show excellent OER performance, with the lowest onset potential (~1.35 V vs. RHE) and the greatest magnetic enhancement (~118%) among bulk materials, single agglomerations and NPs. Furthermore, we explore the utility of theoretical investigation by density functional theory (DFT) calculations for studying the OER process on NiZnFeO_x surfaces without and with spin alignment, indicating that monodispersed NiZnFeO_x NPs with total spin alignment facilitate the OER process under the external magnetic field. It is found that good dispersion of NiZnFeO_x NPs would increase the electrical conductivity and the surface spin state, thereby promoting their OER activities. This work provides a test for uncovering the essential role of NP assembly in significantly promoting magnet-assisted OER.
Keywords: Single entity; OER; Magnetic enhancement; Electrocatalysis; Well-dispersion
15. GeoNER: Geological Named Entity Recognition with Enriched Domain Pre-Training Model and Adversarial Training
Authors: MA Kai, HU Xinxin, TIAN Miao, TAN Yongjian, ZHENG Shuai, TAO Liufeng, QIU Qinjun. Acta Geologica Sinica (English Edition) (indexed: SCIE, CAS, CSCD), 2024, No. 5, pp. 1404-1417 (14 pages)
As important geological data, a geological report contains rich expert and geological knowledge, but the challenge facing current research into geological knowledge extraction and mining is how to achieve an accurate understanding of geological reports guided by domain knowledge. While generic named entity recognition models and tools can be utilized for the processing of geoscience reports and documents, their effectiveness is hampered by a dearth of domain-specific knowledge, which in turn leads to a pronounced decline in recognition accuracy. This study summarizes six types of typical geological entities, with reference to the ontological system of geological domains, and builds a high-quality corpus for the task of geological named entity recognition (GNER). In addition, GeoWoBERT-advBGP (Geological Word-base BERT-adversarial training Bi-directional Long Short-Term Memory Global Pointer) is proposed to address the issues of ambiguity, diversity and nested entities for the geological entities. The model first uses the fine-tuned word granularity-based pre-training model GeoWoBERT (Geological Word-base BERT) and combines the text features that are extracted using the BiLSTM (Bi-directional Long Short-Term Memory), followed by an adversarial training algorithm to improve the robustness of the model and enhance its resistance to interference. Decoding is finally performed using a global association pointer algorithm. The experimental results show that the proposed model achieves high performance on the constructed dataset and is capable of mining the rich geological information.
Keywords: geological named entity recognition; geological report; adversarial training; confrontation training; global pointer; pre-training model
Network Configuration Entity Extraction Method Based on Transformer with Multi-Head Attention Mechanism
16
作者 Yang Yang Zhenying Qu +2 位作者 Zefan Yan Zhipeng Gao Ti Wang 《Computers, Materials & Continua》 SCIE EI 2024年第1期735-757,共23页
Nowadays, ensuring the quality of network services has become increasingly vital. Experts are turning to knowledge graph technology, with a significant emphasis on entity extraction in the identification of device configurations. This research paper presents a novel entity extraction method that leverages a combination of active learning and attention mechanisms. Initially, an improved active learning approach is employed to select the most valuable unlabeled samples, which are subsequently submitted for expert labeling. This approach successfully addresses the problems of isolated points and sample redundancy within the network configuration sample set. Then the labeled samples are utilized to train the model for network configuration entity extraction. Furthermore, the multi-head self-attention of the transformer model is enhanced by introducing the Adaptive Weighting method based on the Laplace mixture distribution. This enhancement enables the transformer model to dynamically adapt its focus to words in various positions, displaying exceptional adaptability to abnormal data and further elevating the accuracy of the proposed model. Through comparisons with Random Sampling (RANDOM), Maximum Normalized Log-Probability (MNLP), Least Confidence (LC), Token Entropy (TE), and Entropy Query by Bagging (EQB), the proposed method, Entropy Query by Bagging and Maximum Influence Active Learning (EQBMIAL), achieves comparable performance with only 40% of the samples on both datasets, while other algorithms require 50% of the samples. Furthermore, the entity extraction algorithm with the Adaptive Weighted Multi-head Attention mechanism (AW-MHA) is compared with BILSTM-CRF, Mutil_Attention-Bilstm-Crf, Deep_Neural_Model_NER and BERT_Transformer, achieving precision rates of 75.98% and 98.32% on the two datasets, respectively. Statistical tests demonstrate the statistical significance and effectiveness of the proposed algorithms in this paper.
Keywords: entity extraction; network configuration; knowledge graph; active learning; Transformer
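The abstract above lists Least Confidence (LC) among the baseline sampling strategies. As a rough illustration of that baseline only (not the proposed EQBMIAL method), the sketch below scores each unlabeled sample as one minus its highest predicted class probability and sends the most uncertain samples for expert labeling. The sample ids, probabilities, and function names are hypothetical, and real sequence-labeling variants usually score a whole tag sequence rather than a single distribution.

```python
from typing import Dict, List, Sequence


def least_confidence_scores(probs: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each unlabeled sample as 1 - (highest predicted class probability)."""
    return {sample_id: 1.0 - max(p) for sample_id, p in probs.items()}


def select_for_labeling(probs: Dict[str, Sequence[float]], budget: int) -> List[str]:
    """Return the `budget` sample ids the model is least confident about."""
    scores = least_confidence_scores(probs)
    # Sort by descending uncertainty; ties are broken by sample id for determinism.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:budget]


# Hypothetical model outputs for three unlabeled configuration samples.
pool = {
    "cfg-1": [0.95, 0.03, 0.02],
    "cfg-2": [0.40, 0.35, 0.25],
    "cfg-3": [0.70, 0.20, 0.10],
}
picked = select_for_labeling(pool, budget=2)
print(picked)
```

Here the two samples with the flattest predicted distributions are picked first, which is the intuition behind uncertainty-based active learning in general.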
Few-Shot Named Entity Recognition with the Integration of Spatial Features
17
Authors: LIU Zhiwei, HUANG Bo, XIA Chunming, XIONG Yujie, ZANG Zhensen, ZHANG Yongqiang. Wuhan University Journal of Natural Sciences (CAS, CSCD), 2024, Issue 2, pp. 125-133 (9 pages)
The few-shot named entity recognition (NER) task aims to train a robust model in the source domain and transfer it to the target domain with very few annotated data. Currently, some approaches rely on the prototypical network for NER. However, these approaches often overlook the spatial relations in the span boundary matrix because entity words tend to depend more on adjacent words. To address this limitation, we propose using a multidimensional convolution module to capture short-distance spatial dependencies. Additionally, we utilize an improved prototypical network and assign different weights to different samples that belong to the same class, thereby enhancing the performance of the few-shot NER task. Further experimental analysis demonstrates that our approach has significantly improved over baseline models across multiple datasets.
Keywords: named entity recognition; prototypical network; spatial relation; multidimensional convolution
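The idea of a prototypical network with per-sample weights, as mentioned in the abstract above, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' model: the embeddings, the weights, the class labels, and the function names are all hypothetical, and the abstract does not say how the weights are computed. Each class prototype is a weighted mean of its support embeddings, and a query is assigned to the nearest prototype by Euclidean distance.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def weighted_prototype(embeddings: List[Vector], weights: List[float]) -> List[float]:
    """Weighted mean of support embeddings belonging to one class."""
    total = sum(weights)
    dim = len(embeddings[0])
    return [
        sum(w * e[d] for e, w in zip(embeddings, weights)) / total
        for d in range(dim)
    ]


def classify(query: Vector, prototypes: Dict[str, Vector]) -> str:
    """Assign the query to the class whose prototype is closest (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical 2-D support embeddings with assumed per-sample weights.
prototypes = {
    "PER": weighted_prototype([[1.0, 0.0], [0.8, 0.2]], [1.0, 1.0]),
    "LOC": weighted_prototype([[0.0, 1.0], [0.2, 0.8]], [3.0, 1.0]),
}
print(classify([0.7, 0.3], prototypes))
```

With equal weights this reduces to the standard prototypical-network mean; unequal weights let more representative support samples pull the prototype toward themselves.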
SciCN: A Scientific Dataset for Chinese Named Entity Recognition
18
Authors: Jing Yang, Bin Ji, Shasha Li, Jun Ma, Jie Yu. Computers, Materials & Continua (SCIE, EI), 2024, Issue 3, pp. 4303-4315 (13 pages)
Named entity recognition (NER) is a fundamental task of information extraction (IE), and it has attracted considerable research attention in recent years. The abundant annotated English NER datasets have significantly promoted NER research in the English field. By contrast, far less effort has been devoted to Chinese NER research, especially in the scientific domain, due to the scarcity of Chinese NER datasets. To alleviate this problem, we present a Chinese scientific NER dataset, SciCN, which contains entity annotations of titles and abstracts derived from 3,500 scientific papers. We manually annotate a total of 62,059 entities, and these entities are classified into six types. Compared to English scientific NER datasets, SciCN has a larger scale and is more diverse, for it not only contains more paper abstracts but these abstracts are also derived from more research fields. To investigate the properties of SciCN and provide baselines for future research, we adapt a number of previous state-of-the-art Chinese NER models to evaluate SciCN. Experimental results show that SciCN is more challenging than other Chinese NER datasets. In addition, previous studies have proven the effectiveness of using lexicons to enhance Chinese NER models. Motivated by this fact, we provide a scientific domain-specific lexicon. Validation results demonstrate that our lexicon delivers better performance gains than lexicons of other domains. We hope that the SciCN dataset and the lexicon will enable us to benchmark the NER task regarding the Chinese scientific domain and make progress for future research. The dataset and lexicon are available at: https://github.com/yangjingla/SciCN.git.
Keywords: named entity recognition; dataset; scientific information extraction; lexicon
Semantic Entity Recognition and Relation Construction Method for Assembly Process Document
19
Authors: 顾星海, 花豹, 刘亚辉, 孙学民, 鲍劲松. Journal of Shanghai Jiaotong University (Science) (EI), 2024, Issue 3, pp. 537-556 (20 pages)
Assembly process documents record the designers' intention or knowledge. However, common knowledge extraction methods are not well suited to assembly process documents because of their tabular form and unstructured natural language texts. In this paper, an assembly semantic entity recognition and relation construction method oriented to assembly process documents is proposed. First, the assembly process sentences are extracted from the table through concerned region recognition and cell division, and they are stored as a key-value object file. Then, the semantic entities in the sentence are identified through the sequence tagging model based on the specific attention mechanism for assembly operation type. Syntactic rules are designed to automatically construct relations between entities. Finally, using the self-constructed corpus, it is shown that the sequence tagging model in the proposed method performs better than the mainstream named entity recognition model when handling assembly process design language. The effectiveness of the proposed method is also analyzed through a simulation experiment in a small-scale real scene, compared with the manual method. The results show that the proposed method can help designers accumulate knowledge automatically and efficiently.
Keywords: assembly process design; knowledge extraction; named entity recognition; text extraction in table; dependency syntactic parsing; attention mechanism
A U-Shaped Network-Based Grid Tagging Model for Chinese Named Entity Recognition
20
Authors: Yan Xiang, Xuedong Zhao, Junjun Guo, Zhiliang Shi, Enbang Chen, Xiaobo Zhang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 6, pp. 4149-4167 (19 pages)
Chinese named entity recognition (CNER) has received widespread attention as an important task of Chinese information extraction. Most previous research has focused on individually studying flat CNER, overlapped CNER, or discontinuous CNER. However, a unified CNER is often needed in real-world scenarios. Recent studies have shown that grid tagging-based methods based on character-pair relationship classification hold great potential for achieving unified NER. Nevertheless, how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge. In this study, we enhance the character-pair grid representation by incorporating both local and global information. Significantly, we introduce a new approach by considering the character-pair grid representation matrix as a specialized image, converting the classification of character-pair relationships into a pixel-level semantic segmentation task. We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image, allowing for a more comprehensive understanding of associative features between character pairs. This approach leads to improved accuracy in predicting their relationships, ultimately enhancing entity recognition performance. We conducted experiments on two public CNER datasets in the biomedical domain, namely CMeEE-V2 and Diakg. The results demonstrate the effectiveness of our approach, which achieves F1-score improvements of 7.29 percentage points and 1.64 percentage points compared to the current state-of-the-art (SOTA) models, respectively.
Keywords: Chinese named entity recognition; character-pair relation classification; grid tagging; U-shaped segmentation network
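To make the notion of a character-pair grid more concrete, here is a deliberately simplified sketch. It assumes one label per (start, end) cell, which covers flat and overlapped entities but not discontinuous ones; the abstract does not specify the paper's actual tagging scheme, and the example text, spans, label name, and function names are hypothetical. The resulting n x n matrix is the kind of object that can be viewed as an image and passed to a segmentation-style network.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start index, inclusive end index, entity type)
Grid = List[List[Optional[str]]]


def build_grid(n_chars: int, spans: List[Span]) -> Grid:
    """Fill cell (start, end) with the entity type; all other cells stay None."""
    grid: Grid = [[None] * n_chars for _ in range(n_chars)]
    for start, end, label in spans:
        grid[start][end] = label
    return grid


def decode_grid(text: str, grid: Grid) -> List[Tuple[str, str]]:
    """Read entities back from the upper triangle of the grid."""
    entities = []
    for i, row in enumerate(grid):
        for j, label in enumerate(row):
            if label is not None and i <= j:
                entities.append((text[i:j + 1], label))
    return entities


# Hypothetical example with two overlapping entities sharing the same start.
text = "糖尿病肾病"
grid = build_grid(len(text), [(0, 2, "dis"), (0, 4, "dis")])
print(decode_grid(text, grid))
```

Because each cell is predicted independently, overlapping spans such as the two above can coexist in the same grid, which is one reason grid-based formulations are attractive for unified NER.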