期刊文献+
共找到78篇文章
< 1 2 4 >
每页显示 20 50 100
A Chinese Named Entity Recognition Method for News Domain Based on Transfer Learning and Word Embeddings
1
作者 Rui Fang Liangzhong Cui 《Computers, Materials & Continua》 2025年第5期3247-3275,共29页
Named Entity Recognition(NER)is vital in natural language processing for the analysis of news texts,as it accurately identifies entities such as locations,persons,and organizations,which is crucial for applications li... Named Entity Recognition(NER)is vital in natural language processing for the analysis of news texts,as it accurately identifies entities such as locations,persons,and organizations,which is crucial for applications like news summarization and event tracking.However,NER in the news domain faces challenges due to insufficient annotated data,complex entity structures,and strong context dependencies.To address these issues,we propose a new Chinesenamed entity recognition method that integrates transfer learning with word embeddings.Our approach leverages the ERNIE pre-trained model for transfer learning and obtaining general language representations and incorporates the Soft-lexicon word embedding technique to handle varied entity structures.This dual-strategy enhances the model’s understanding of context and boosts its ability to process complex texts.Experimental results show that our method achieves an F1 score of 94.72% on a news dataset,surpassing baseline methods by 3%–4%,thereby confirming its effectiveness for Chinese-named entity recognition in the news domain. 展开更多
关键词 News domain named entity recognition(NER) transfer learning word embeddings ERNIE soft-lexicon
在线阅读 下载PDF
Named Entity Identification of Chinese Poetry and Wine Culture Based on ALBERT
2
作者 YANG Zhuang LI Zhaofei +2 位作者 WANG Jihua WEI Xudong ZHANG Yijie 《Journal of Shanghai Jiaotong university(Science)》 2025年第5期1065-1072,共8页
The task of identifying Chinese named entities of Chinese poetry and wine culture is a key step in the construction of a knowledge graph and a question and answer system.Aimed at the characteristics of Chinese poetry ... The task of identifying Chinese named entities of Chinese poetry and wine culture is a key step in the construction of a knowledge graph and a question and answer system.Aimed at the characteristics of Chinese poetry and wine culture entities with different lengths and high training cost of named entity recognition models at the present stage,this study proposes a lite BERT+bi-directional long short-term memory+attentional mechanisms+conditional random field(ALBERT+BILSTM+Att+CRF).The method first obtains the characterlevel semantic information by ALBERT module,then extracts its high-dimensional features by BILSTM module,weights the original word vector and the learned text vector by attention layer,and finally predicts the true label in CRF module(including five types:poem title,author,time,genre,and category).Through experiments on data sets related to Chinese poetry and wine culture,the results show that the method is more effective than existing mainstream models and can efficiently extract important entity information in Chinese poetry and wine culture,which is an effective method for the identification of named entities of varying lengths of poetry. 展开更多
关键词 poetry and wine culture named entity identification deep learning ALBERT bi-directional long short-term memory(BILSTM) attentional mechanisms(Att) conditional random field(CRF)
原文传递
Chinese Named Entity Recognition Method for Musk Deer Domain Based on Cross-Attention Enhanced Lexicon Features
3
作者 Yumei Hao Haiyan Wang Dong Zhang 《Computers, Materials & Continua》 2025年第5期2989-3005,共17页
Named entity recognition(NER)in musk deer domain is the extraction of specific types of entities from unstructured texts,constituting a fundamental component of the knowledge graph,Q&A system,and text summarizatio... Named entity recognition(NER)in musk deer domain is the extraction of specific types of entities from unstructured texts,constituting a fundamental component of the knowledge graph,Q&A system,and text summarization system of musk deer domain.Due to limited annotated data,diverse entity types,and the ambiguity of Chinese word boundaries in musk deer domain NER,we present a novel NER model,CAELF-GP,which is based on cross-attention mechanism enhanced lexical features(CAELF).Specifically,we employ BERT as a character encoder and advocate the integration of external lexical information at the character representation layer.In the feature fusion module,instead of indiscriminately merging external dictionary information,we innovatively adopted a feature fusion method based on a cross-attention mechanism,which guides the model to focus on important lexical information by calculating the correlation between each character and its corresponding word sets.This module enhances the model’s semantic representation ability and entity boundary recognition capability.Ultimately,we introduce the decoding module of GlobalPointer(GP)for entity type recognition,capable of identifying both nested and non-nested entities.Since there is currently no publicly available dataset for the musk deer domain,we built a named entity recognition dataset for this domain by collecting relevant literature and working under the guidance of domain experts.The dataset facilitates the training and validation of the model and provides data foundation for subsequent related research.The model undergoes experimentation on two public datasets and the dataset of musk deer domain.The results show that it is superior to the baseline models,offering a promising technical avenue for the intelligent recognition of named entities in the musk deer domain. 展开更多
关键词 named entity recognition musk deer cross-attention lexicon enhancement
在线阅读 下载PDF
Tibetan Medical Named Entity Recognition Based on Syllable-Word-Sentence Embedding Transformer
4
作者 Jin Zhang Ziyue Zhang +7 位作者 Lobsang Yeshi Dorje Tashi Xiangshi Wang Yuqing Cai Yongbin Yu Xiangxiang Wang Nyima Tashi Gadeng Luosang 《CAAI Transactions on Intelligence Technology》 2025年第4期1148-1158,共11页
Tibetan medical named entity recognition(Tibetan MNER)involves extracting specific types of medical entities from unstructured Tibetan medical texts.Tibetan MNER provide important data support for the work related to ... Tibetan medical named entity recognition(Tibetan MNER)involves extracting specific types of medical entities from unstructured Tibetan medical texts.Tibetan MNER provide important data support for the work related to Tibetan medicine.However,existing Tibetan MNER methods often struggle to comprehensively capture multi-level semantic information,failing to sufficiently extract multi-granularity features and effectively filter out irrelevant information,which ultimately impacts the accuracy of entity recognition.This paper proposes an improved embedding representation method called syllable-word-sentence embedding.By leveraging features at different granularities and using un-scaled dot-product attention to focus on key features for feature fusion,the syllable-word-sentence embedding is integrated into the transformer,enhancing the specificity and diversity of feature representations.The model leverages multi-level and multi-granularity semantic information,thereby improving the performance of Tibetan MNER.We evaluate our proposed model on datasets from various domains.The results indicate that the model effectively identified three types of entities in the Tibetan news dataset we constructed,achieving an F1 score of 93.59%,which represents an improvement of 1.24%compared to the vanilla FLAT.Additionally,results from the Tibetan medical dataset we developed show that it is effective in identifying five kinds of medical entities,with an F1 score of 71.39%,which is a 1.34%improvement over the vanilla FLAT. 展开更多
关键词 named entity recognition syllable-word-sentence embedding Tibetan lexicon Tibetan medicine
在线阅读 下载PDF
FedCE:A Contrast Enhancement Federated Learning Method for Heterogeneous Medical Named Entity Recognition
5
作者 Kai Chang Hailong Sun +6 位作者 Jindou Wan Naiqian Zhang Yiming Liu Kuo Yang Zixin Shu Jianan Xia Xuezhong Zhou 《Tsinghua Science and Technology》 2025年第6期2384-2398,共15页
Medical Named Entity Recognition(NER)plays a crucial role in attaining precise patient portraits as well as providing support for intelligent diagnosis and treatment decisions.Federated Learning(FL)enables collaborati... Medical Named Entity Recognition(NER)plays a crucial role in attaining precise patient portraits as well as providing support for intelligent diagnosis and treatment decisions.Federated Learning(FL)enables collaborative modeling and training across multiple endpoints without exposing the original data.However,the statistical heterogeneity exhibited by clinical medical text records poses a challenge for FL methods to support the training of NER models in such scenarios.We propose a Federated Contrast Enhancement(FedCE)method for NER to address the challenges faced by non-large-scale pre-trained models in FL for labelheterogeneous.The method leverages a multi-view encoder structure to capture both global and local semantic information,and employs contrastive learning to enhance the interoperability of global knowledge and local context.We evaluate the performance of the FedCE method on three real-world clinical record datasets.We investigate the impact of factors,such as pooling methods,maximum input text length,and training rounds on FedCE.Additionally,we assess how well FedCE adapts to the base NER models and evaluate its generalization performance.The experimental results show that the FedCE method has obvious advantages and can be effectively applied to various basic models,which is of great theoretical and practical significance for advancing FL in healthcare settings. 展开更多
关键词 Federated Learning(FL) contrast enhancement heterogeneous data named entity Recognition(NER)
原文传递
Multi-Modal Named Entity Recognition with Auxiliary Visual Knowledge and Word-Level Fusion
6
作者 Huansha Wang Ruiyang Huang +1 位作者 Qinrang Liu Xinghao Wang 《Computers, Materials & Continua》 2025年第6期5747-5760,共14页
Multi-modal Named Entity Recognition(MNER)aims to better identify meaningful textual entities by integrating information from images.Previous work has focused on extracting visual semantics at a fine-grained level,or ... Multi-modal Named Entity Recognition(MNER)aims to better identify meaningful textual entities by integrating information from images.Previous work has focused on extracting visual semantics at a fine-grained level,or obtaining entity related external knowledge from knowledge bases or Large Language Models(LLMs).However,these approaches ignore the poor semantic correlation between visual and textual modalities in MNER datasets and do not explore different multi-modal fusion approaches.In this paper,we present MMAVK,a multi-modal named entity recognition model with auxiliary visual knowledge and word-level fusion,which aims to leverage the Multi-modal Large Language Model(MLLM)as an implicit knowledge base.It also extracts vision-based auxiliary knowledge from the image formore accurate and effective recognition.Specifically,we propose vision-based auxiliary knowledge generation,which guides the MLLM to extract external knowledge exclusively derived from images to aid entity recognition by designing target-specific prompts,thus avoiding redundant recognition and cognitive confusion caused by the simultaneous processing of image-text pairs.Furthermore,we employ a word-level multi-modal fusion mechanism to fuse the extracted external knowledge with each word-embedding embedded from the transformerbased encoder.Extensive experimental results demonstrate that MMAVK outperforms or equals the state-of-the-art methods on the two classical MNER datasets,even when the largemodels employed have significantly fewer parameters than other baselines. 展开更多
关键词 Multi-modal named entity recognition large language model multi-modal fusion
在线阅读 下载PDF
Research of Clinical Named Entity Recognition Based on Bi-LSTM-CRF 被引量:17
7
作者 QIN Ying ZENG Yingfei 《Journal of Shanghai Jiaotong university(Science)》 EI 2018年第3期392-397,共6页
Electronic Medical Records(EMR) with unstructured sentences and various conceptual expressions provide rich information for medical information extraction. However, common Named Entity Recognition(NER)in Natural Langu... Electronic Medical Records(EMR) with unstructured sentences and various conceptual expressions provide rich information for medical information extraction. However, common Named Entity Recognition(NER)in Natural Language Processing(NLP) are not well suitable for clinical NER in EMR. This study aims at applying neural networks to clinical concept extractions. We integrate Bidirectional Long Short-Term Memory Networks(Bi-LSTM) with a Conditional Random Fields(CRF) layer to detect three types of clinical named entities. Word representations fed into the neural networks are concatenated by character-based word embeddings and Continuous Bag of Words(CBOW) embeddings trained both on domain and non-domain corpus. We test our NER system on i2b2/VA open datasets and compare the performance with six related works, achieving the best result of NER with F1 value 0.853 7. We also point out a few specific problems in clinical concept extractions which will give some hints to deeper studies. 展开更多
关键词 clinical named entity recognition bidirectional long short-term memory networks conditional random fields
原文传递
Arabic Named Entity Recognition:A BERT-BGRU Approach 被引量:6
8
作者 Norah Alsaaran Maha Alrabiah 《Computers, Materials & Continua》 SCIE EI 2021年第7期471-485,共15页
Named Entity Recognition(NER)is one of the fundamental tasks in Natural Language Processing(NLP),which aims to locate,extract,and classify named entities into a predefined category such as person,organization and loca... Named Entity Recognition(NER)is one of the fundamental tasks in Natural Language Processing(NLP),which aims to locate,extract,and classify named entities into a predefined category such as person,organization and location.Most of the earlier research for identifying named entities relied on using handcrafted features and very large knowledge resources,which is time consuming and not adequate for resource-scarce languages such as Arabic.Recently,deep learning achieved state-of-the-art performance on many NLP tasks including NER without requiring hand-crafted features.In addition,transfer learning has also proven its efficiency in several NLP tasks by exploiting pretrained language models that are used to transfer knowledge learned from large-scale datasets to domain-specific tasks.Bidirectional Encoder Representation from Transformer(BERT)is a contextual language model that generates the semantic vectors dynamically according to the context of the words.BERT architecture relay on multi-head attention that allows it to capture global dependencies between words.In this paper,we propose a deep learning-based model by fine-tuning BERT model to recognize and classify Arabic named entities.The pre-trained BERT context embeddings were used as input features to a Bidirectional Gated Recurrent Unit(BGRU)and were fine-tuned using two annotated Arabic Named Entity Recognition(ANER)datasets.Experimental results demonstrate that the proposed model outperformed state-of-the-art ANER models achieving 92.28%and 90.68%F-measure values on the ANERCorp dataset and the merged ANERCorp and AQMAR dataset,respectively. 展开更多
关键词 named entity recognition ARABIC deep learning BGRU BERT
在线阅读 下载PDF
Adversarial Active Learning for Named Entity Recognition in Cybersecurity 被引量:5
9
作者 Tao Li Yongjin Hu +1 位作者 Ankang Ju Zhuoran Hu 《Computers, Materials & Continua》 SCIE EI 2021年第1期407-420,共14页
Owing to the continuous barrage of cyber threats,there is a massive amount of cyber threat intelligence.However,a great deal of cyber threat intelligence come from textual sources.For analysis of cyber threat intellig... Owing to the continuous barrage of cyber threats,there is a massive amount of cyber threat intelligence.However,a great deal of cyber threat intelligence come from textual sources.For analysis of cyber threat intelligence,many security analysts rely on cumbersome and time-consuming manual efforts.Cybersecurity knowledge graph plays a significant role in automatics analysis of cyber threat intelligence.As the foundation for constructing cybersecurity knowledge graph,named entity recognition(NER)is required for identifying critical threat-related elements from textual cyber threat intelligence.Recently,deep neural network-based models have attained very good results in NER.However,the performance of these models relies heavily on the amount of labeled data.Since labeled data in cybersecurity is scarce,in this paper,we propose an adversarial active learning framework to effectively select the informative samples for further annotation.In addition,leveraging the long short-term memory(LSTM)network and the bidirectional LSTM(BiLSTM)network,we propose a novel NER model by introducing a dynamic attention mechanism into the BiLSTM-LSTM encoderdecoder.With the selected informative samples annotated,the proposed NER model is retrained.As a result,the performance of the NER model is incrementally enhanced with low labeling cost.Experimental results show the effectiveness of the proposed method. 展开更多
关键词 Adversarial learning active learning named entity recognition dynamic attention mechanism
在线阅读 下载PDF
RoBGP:A Chinese Nested Biomedical Named Entity Recognition Model Based on RoBERTa and Global Pointer 被引量:3
10
作者 Xiaohui Cui Chao Song +4 位作者 Dongmei Li Xiaolong Qu Jiao Long Yu Yang Hanchao Zhang 《Computers, Materials & Continua》 SCIE EI 2024年第3期3603-3618,共16页
Named Entity Recognition(NER)stands as a fundamental task within the field of biomedical text mining,aiming to extract specific types of entities such as genes,proteins,and diseases from complex biomedical texts and c... Named Entity Recognition(NER)stands as a fundamental task within the field of biomedical text mining,aiming to extract specific types of entities such as genes,proteins,and diseases from complex biomedical texts and categorize them into predefined entity types.This process can provide basic support for the automatic construction of knowledge bases.In contrast to general texts,biomedical texts frequently contain numerous nested entities and local dependencies among these entities,presenting significant challenges to prevailing NER models.To address these issues,we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer(RoBGP).Our model initially utilizes the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors.It then incorporates a Bidirectional Long Short-Term Memory network for capturing bidirectional semantic information,effectively addressing the issue of long-distance dependencies.Furthermore,the Global Pointer model is employed to comprehensively recognize all nested entities in the text.We conduct extensive experiments on the Chinese medical dataset CMeEE and the results demonstrate the superior performance of RoBGP over several baseline models.This research confirms the effectiveness of RoBGP in Chinese biomedical NER,providing reliable technical support for biomedical information extraction and knowledge base construction. 展开更多
关键词 BIOMEDICINE knowledge base named entity recognition pretrained language model global pointer
在线阅读 下载PDF
A CONDITIONAL RANDOM FIELDS APPROACH TO BIOMEDICAL NAMED ENTITY RECOGNITION 被引量:4
11
作者 Wang Haochang Zhao Tiejun Li Sheng Yu Hao 《Journal of Electronics(China)》 2007年第6期838-844,共7页
Named entity recognition is a fundamental task in biomedical data mining. In this letter, a named entity recognition system based on CRFs (Conditional Random Fields) for biomedical texts is presented. The system mak... Named entity recognition is a fundamental task in biomedical data mining. In this letter, a named entity recognition system based on CRFs (Conditional Random Fields) for biomedical texts is presented. The system makes extensive use of a diverse set of features, including local features, full text features and external resource features. All features incorporated in this system are described in detail, and the impacts of different feature sets on the performance of the system are evaluated. In order to improve the performance of system, post-processing modules are exploited to deal with the abbreviation phenomena, cascaded named entity and boundary errors identification. Evaluation on this system proved that the feature selection has important impact on the system performance, and the post-processing explored has an important contribution on system performance to achieve better resuits. 展开更多
关键词 Conditional Random Fields (CRFs) named entity recognition Feature selection Post-processing
在线阅读 下载PDF
A Novel Named Entity Recognition Scheme for Steel E-Commerce Platforms Using a Lite BERT 被引量:2
12
作者 Maojian Chen Xiong Luo +2 位作者 Hailun Shen Ziyang Huang Qiaojuan Peng 《Computer Modeling in Engineering & Sciences》 SCIE EI 2021年第10期47-63,共17页
In the era of big data,E-commerce plays an increasingly important role,and steel E-commerce certainly occupies a positive position.However,it is very difficult to choose satisfactory steel raw materials from diverse s... In the era of big data,E-commerce plays an increasingly important role,and steel E-commerce certainly occupies a positive position.However,it is very difficult to choose satisfactory steel raw materials from diverse steel commodities online on steel E-commerce platforms in the purchase of staffs.In order to improve the efficiency of purchasers searching for commodities on the steel E-commerce platforms,we propose a novel deep learning-based loss function for named entity recognition(NER).Considering the impacts of small sample and imbalanced data,in our NER scheme,the focal loss,the label smoothing,and the cross entropy are incorporated into a lite bidirectional encoder representations from transformers(BERT)model to avoid the over-fitting.Moreover,through the analysis of different classic annotation techniques used to tag data,an ideal one is chosen for the training model in our proposed scheme.Experiments are conducted on Chinese steel E-commerce datasets.The experimental results show that the training time of a lite BERT(ALBERT)-based method is much shorter than that of BERT-based models,while achieving the similar computational performance in terms of metrics precision,recall,and F1 with BERT-based models.Meanwhile,our proposed approach performs much better than that of combining Word2Vec,bidirectional long short-term memory(Bi-LSTM),and conditional random field(CRF)models,in consideration of training time and F1. 展开更多
关键词 named entity recognition bidirectional encoder representations from transformers steel E-commerce platform annotation technique
在线阅读 下载PDF
A Federated Named Entity Recognition Model with Explicit Relation for Power Grid 被引量:2
13
作者 Jingtang Luo Shiying Yao +2 位作者 Changming Zhao Jie Xu Jim Feng 《Computers, Materials & Continua》 SCIE EI 2023年第5期4207-4216,共10页
The power grid operation process is complex,and many operation process data involve national security,business secrets,and user privacy.Meanwhile,labeled datasets may exist in many different operation platforms,but th... The power grid operation process is complex,and many operation process data involve national security,business secrets,and user privacy.Meanwhile,labeled datasets may exist in many different operation platforms,but they cannot be directly shared since power grid data is highly privacysensitive.How to use these multi-source heterogeneous data as much as possible to build a power grid knowledge map under the premise of protecting privacy security has become an urgent problem in developing smart grid.Therefore,this paper proposes federated learning named entity recognition method for the power grid field,aiming to solve the problem of building a named entity recognition model covering the entire power grid process training by data with different security requirements.We decompose the named entity recognition(NER)model FLAT(Chinese NER Using Flat-Lattice Transformer)in each platform into a global part and a local part.The local part is used to capture the characteristics of the local data in each platform and is updated using locally labeled data.The global part is learned across different operation platforms to capture the shared NER knowledge.Its local gradients fromdifferent platforms are aggregated to update the global model,which is further delivered to each platform to update their global part.Experiments on two publicly available Chinese datasets and one power grid dataset validate the effectiveness of our method. 展开更多
关键词 Power grid named entity recognition federal learning
在线阅读 下载PDF
Chinese Cyber Threat Intelligence Named Entity Recognition via RoBERTa-wwm-RDCNN-CRF 被引量:2
14
作者 Zhen Zhen Jian Gao 《Computers, Materials & Continua》 SCIE EI 2023年第10期299-323,共25页
In recent years,cyber attacks have been intensifying and causing great harm to individuals,companies,and countries.The mining of cyber threat intelligence(CTI)can facilitate intelligence integration and serve well in ... In recent years,cyber attacks have been intensifying and causing great harm to individuals,companies,and countries.The mining of cyber threat intelligence(CTI)can facilitate intelligence integration and serve well in combating cyber attacks.Named Entity Recognition(NER),as a crucial component of text mining,can structure complex CTI text and aid cybersecurity professionals in effectively countering threats.However,current CTI NER research has mainly focused on studying English CTI.In the limited studies conducted on Chinese text,existing models have shown poor performance.To fully utilize the power of Chinese pre-trained language models(PLMs)and conquer the problem of lengthy infrequent English words mixing in the Chinese CTIs,we propose a residual dilated convolutional neural network(RDCNN)with a conditional random field(CRF)based on a robustly optimized bidirectional encoder representation from transformers pre-training approach with whole word masking(RoBERTa-wwm),abbreviated as RoBERTa-wwm-RDCNN-CRF.We are the first to experiment on the relevant open source dataset and achieve an F1-score of 82.35%,which exceeds the common baseline model bidirectional encoder representation from transformers(BERT)-bidirectional long short-term memory(BiLSTM)-CRF in this field by about 19.52%and exceeds the current state-of-the-art model,BERT-RDCNN-CRF,by about 3.53%.In addition,we conducted an ablation study on the encoder part of the model to verify the effectiveness of the proposed model and an in-depth investigation of the PLMs and encoder part of the model to verify the effectiveness of the proposed model.The RoBERTa-wwm-RDCNN-CRF model,the shared pre-processing,and augmentation methods can serve the subsequent fundamental tasks such as cybersecurity information extraction and knowledge graph construction,contributing to important applications in downstream tasks such as intrusion detection and advanced persistent threat(APT)attack detection. 展开更多
关键词 CYBERSECURITY cyber threat intelligence named entity recognition
在线阅读 下载PDF
Extracting Named Entity Using Entity Labeling in Geological Text Using Deep Learning Approach 被引量:1
15
作者 Qinjun Qiu Miao Tian +5 位作者 Zhong Xie Yongjian Tan Kai Ma Qingfang Wang Shengyong Pan Liufeng Tao 《Journal of Earth Science》 SCIE CAS CSCD 2023年第5期1406-1417,共12页
Artificial intelligence(AI) is the key to mining and enhancing the value of big data, and knowledge graph is one of the important cornerstones of artificial intelligence, which is the core foundation for the integrati... Artificial intelligence(AI) is the key to mining and enhancing the value of big data, and knowledge graph is one of the important cornerstones of artificial intelligence, which is the core foundation for the integration of statistical and physical representations. Named entity recognition is a fundamental research task for building knowledge graphs, which needs to be supported by a high-quality corpus, and currently there is a lack of high-quality named entity recognition corpus in the field of geology, especially in Chinese. In this paper, based on the conceptual structure of geological ontology and the analysis of the characteristics of geological texts, a classification system of geological named entity types is designed with the guidance and participation of geological experts, a corresponding annotation specification is formulated, an annotation tool is developed, and the first named entity recognition corpus for the geological domain is annotated based on real geological reports. The total number of words annotated was 698 512 and the number of entities was 23 345. The paper also explores the feasibility of a model pre-annotation strategy and presents a statistical analysis of the distribution of technical and term categories across genres and the consistency of corpus annotation. Based on this corpus, a Lite Bidirectional Encoder Representations from Transformers(ALBERT)-Bi-directional Long Short-Term Memory(BiLSTM)-Conditional Random Fields(CRF) and ALBERT-BiLSTM models are selected for experiments, and the results show that the F1-scores of the recognition performance of the two models reach 0.75 and 0.65 respectively, providing a corpus basis and technical support for information extraction in the field of geology. 展开更多
关键词 ontology geological reports named entity recognition geological corpus construction semi-automated annotation platforms deep learning
原文传递
Low Resource Chinese Geological Text Named Entity Recognition Based on Prompt Learning 被引量:1
16
作者 Hang He Chao Ma +6 位作者 Shan Ye Wenqiang Tang Yuxuan Zhou Zhen Yu Jiaxin Yi Li Hou Mingcai Hou 《Journal of Earth Science》 SCIE CAS CSCD 2024年第3期1035-1043,共9页
Geological reports are a significant accomplishment for geologists involved in geological investigations and scientific research as they contain rich data and textual information.With the rapid development of science ... Geological reports are a significant accomplishment for geologists involved in geological investigations and scientific research as they contain rich data and textual information.With the rapid development of science and technology,a large number of textual reports have accumulated in the field of geology.However,many non-hot topics and non-English speaking regions are neglected in mainstream geoscience databases for geological information mining,making it more challenging for some researchers to extract necessary information from these texts.Natural Language Processing(NLP)has obvious advantages in processing large amounts of textual data.The objective of this paper is to identify geological named entities from Chinese geological texts using NLP techniques.We propose the RoBERTa-Prompt-Tuning-NER method,which leverages the concept of Prompt Learning and requires only a small amount of annotated data to train superior models for recognizing geological named entities in low-resource dataset configurations.The RoBERTa layer captures context-based information and longer-distance dependencies through dynamic word vectors.Finally,we conducted experiments on the constructed Geological Named Entity Recognition(GNER)dataset.Our experimental results show that the proposed model achieves the highest F1 score of 80.64%among the four baseline algorithms,demonstrating the reliability and robustness of using the model for Named Entity Recognition of geological texts. 展开更多
关键词 Prompt Learning named entity Recognition(NER) low resource geological text text information mining big data geology.
原文传递
Chinese named entity recognition with multi-network fusion of multi-scale lexical information 被引量:1
17
作者 Yan Guo Hong-Chen Liu +3 位作者 Fu-Jiang Liu Wei-Hua Lin Quan-Sen Shao Jun-Shun Su 《Journal of Electronic Science and Technology》 EI CAS CSCD 2024年第4期53-80,共28页
Named entity recognition(NER)is an important part in knowledge extraction and one of the main tasks in constructing knowledge graphs.In today’s Chinese named entity recognition(CNER)task,the BERT-BiLSTM-CRF model is ... Named entity recognition(NER)is an important part in knowledge extraction and one of the main tasks in constructing knowledge graphs.In today’s Chinese named entity recognition(CNER)task,the BERT-BiLSTM-CRF model is widely used and often yields notable results.However,recognizing each entity with high accuracy remains challenging.Many entities do not appear as single words but as part of complex phrases,making it difficult to achieve accurate recognition using word embedding information alone because the intricate lexical structure often impacts the performance.To address this issue,we propose an improved Bidirectional Encoder Representations from Transformers(BERT)character word conditional random field(CRF)(BCWC)model.It incorporates a pre-trained word embedding model using the skip-gram with negative sampling(SGNS)method,alongside traditional BERT embeddings.By comparing datasets with different word segmentation tools,we obtain enhanced word embedding features for segmented data.These features are then processed using the multi-scale convolution and iterated dilated convolutional neural networks(IDCNNs)with varying expansion rates to capture features at multiple scales and extract diverse contextual information.Additionally,a multi-attention mechanism is employed to fuse word and character embeddings.Finally,CRFs are applied to learn sequence constraints and optimize entity label annotations.A series of experiments are conducted on three public datasets,demonstrating that the proposed method outperforms the recent advanced baselines.BCWC is capable to address the challenge of recognizing complex entities by combining character-level and word-level embedding information,thereby improving the accuracy of CNER.Such a model is potential to the applications of more precise knowledge extraction such as knowledge graph construction and information retrieval,particularly in domain-specific natural language processing tasks that require high entity recognition precision. 展开更多
关键词 Bi-directional long short-term memory(BiLSTM) Chinese named entity recognition(CNER) Iterated dilated convolutional neural network(IDCNN) Multi-network integration Multi-scale lexical features
在线阅读 下载PDF
Data Masking for Chinese Electronic Medical Records with Named Entity Recognition 被引量:1
18
作者 Tianyu He Xiaolong Xu +3 位作者 Zhichen Hu Qingzhan Zhao Jianguo Dai Fei Dai 《Intelligent Automation & Soft Computing》 SCIE 2023年第6期3657-3673,共17页
With the rapid development of information technology,the electronifi-cation of medical records has gradually become a trend.In China,the population base is huge and the supporting medical institutions are numerous,so ... With the rapid development of information technology,the electronifi-cation of medical records has gradually become a trend.In China,the population base is huge and the supporting medical institutions are numerous,so this reality drives the conversion of paper medical records to electronic medical records.Electronic medical records are the basis for establishing a smart hospital and an important guarantee for achieving medical intelligence,and the massive amount of electronic medical record data is also an important data set for conducting research in the medical field.However,electronic medical records contain a large amount of private patient information,which must be desensitized before they are used as open resources.Therefore,to solve the above problems,data masking for Chinese electronic medical records with named entity recognition is proposed in this paper.Firstly,the text is vectorized to satisfy the required format of the model input.Secondly,since the input sentences may have a long or short length and the relationship between sentences in context is not negligible.To this end,a neural network model for named entity recognition based on bidirectional long short-term memory(BiLSTM)with conditional random fields(CRF)is constructed.Finally,the data masking operation is performed based on the named entity recog-nition results,mainly using regular expression filtering encryption and principal component analysis(PCA)word vector compression and replacement.In addi-tion,comparison experiments with the hidden markov model(HMM)model,LSTM-CRF model,and BiLSTM model are conducted in this paper.The experi-mental results show that the method used in this paper achieves 92.72%Accuracy,92.30%Recall,and 92.51%F1_score,which has higher accuracy compared with other models. 展开更多
关键词 named entity recognition Chinese electronic medical records data masking principal component analysis regular expression
在线阅读 下载PDF
Overview of Named Entity Recognition 被引量:8
19
作者 Xing Liu Huiqin Chen Wangui Xia 《Journal of Contemporary Educational Research》 2022年第5期65-68,共4页
Named entity recognition,as a sub-task of information extraction,has attracted widespread attention from scholars at home and abroad since it was proposed,and a series of studies and discussions have been carried out ... Named entity recognition,as a sub-task of information extraction,has attracted widespread attention from scholars at home and abroad since it was proposed,and a series of studies and discussions have been carried out based on it.This paper discusses the existing named entity recognition technology based on its history of development. 展开更多
关键词 named entity recognition Information extraction
在线阅读 下载PDF
Person-specific named entity recognition using SVM with rich feature sets 被引量:2
20
作者 Hui NIE 《Chinese Journal of Library and Information Science》 2012年第3期27-46,共20页
Purpose: The purpose of the study is to explore the potential use of nature language process(NLP) and machine learning(ML) techniques and intents to find a feasible strategy and effective approach to fulfill the NER t... Purpose: The purpose of the study is to explore the potential use of nature language process(NLP) and machine learning(ML) techniques and intents to find a feasible strategy and effective approach to fulfill the NER task for Web oriented person-specific information extraction.Design/methodology/approach: An SVM-based multi-classification approach combined with a set of rich NLP features derived from state-of-the-art NLP techniques has been proposed to fulfill the NER task. A group of experiments has been designed to investigate the influence of various NLP-based features to the performance of the system,especially the semantic features. Optimal parameter settings regarding with SVM models,including kernel functions,margin parameter of SVM model and the context window size,have been explored through experiments as well.Findings: The SVM-based multi-classification approach has been proved to be effective for the NER task. This work shows that NLP-based features are of great importance in datadriven NE recognition,particularly the semantic features. The study indicates that higher order kernel function may not be desirable for the specific classification problem in practical application. The simple linear-kernel SVM model performed better in this case. Moreover,the modified SVM models with uneven margin parameter are more common and flexible,which have been proved to solve the imbalanced data problem better.Research limitations/implications: The SVM-based approach for NER problem is only proved to be effective on limited experiment data. Further research need to be conducted on the large batch of real Web data. In addition,the performance of the NER system need be tested when incorporated into a complete IE framework.Originality/value: The specially designed experiments make it feasible to fully explore the characters of the data and obtain the optimal parameter settings for the NER task,leading to a preferable rate in recall,precision and F1measures. The overall system performance(F1value) for all types of name entities can achieve above 88.6%,which can meet the requirements for the practical application. 展开更多
关键词 named entity recognition Natural language processing SVM-based classifier Feature selection
原文传递
上一页 1 2 4 下一页 到第
使用帮助 返回顶部