A Chinese Named Entity Recognition Method for News Domain Based on Transfer Learning and Word Embeddings
1
Authors: Rui Fang, Liangzhong Cui. Source: Computers, Materials & Continua, 2025, No. 5, pp. 3247-3275, 29 pages.
Named Entity Recognition (NER) is vital in natural language processing for the analysis of news texts, as it accurately identifies entities such as locations, persons, and organizations, which is crucial for applications like news summarization and event tracking. However, NER in the news domain faces challenges due to insufficient annotated data, complex entity structures, and strong context dependencies. To address these issues, we propose a new Chinese named entity recognition method that integrates transfer learning with word embeddings. Our approach leverages the ERNIE pre-trained model for transfer learning to obtain general language representations and incorporates the Soft-lexicon word embedding technique to handle varied entity structures. This dual strategy enhances the model's understanding of context and boosts its ability to process complex texts. Experimental results show that our method achieves an F1 score of 94.72% on a news dataset, surpassing baseline methods by 3% to 4%, thereby confirming its effectiveness for Chinese named entity recognition in the news domain.
Keywords: news domain; named entity recognition (NER); transfer learning; word embeddings; ERNIE; Soft-lexicon
Low Resource Chinese Geological Text Named Entity Recognition Based on Prompt Learning (Cited by: 2)
2
Authors: Hang He, Chao Ma, Shan Ye, Wenqiang Tang, Yuxuan Zhou, Zhen Yu, Jiaxin Yi, Li Hou, Mingcai Hou. Source: Journal of Earth Science (SCIE, CAS, CSCD), 2024, No. 3, pp. 1035-1043, 9 pages.
Geological reports are a significant accomplishment for geologists involved in geological investigations and scientific research, as they contain rich data and textual information. With the rapid development of science and technology, a large number of textual reports have accumulated in the field of geology. However, many non-hot topics and non-English-speaking regions are neglected in mainstream geoscience databases for geological information mining, making it more challenging for some researchers to extract necessary information from these texts. Natural Language Processing (NLP) has obvious advantages in processing large amounts of textual data. The objective of this paper is to identify geological named entities from Chinese geological texts using NLP techniques. We propose the RoBERTa-Prompt-Tuning-NER method, which leverages the concept of Prompt Learning and requires only a small amount of annotated data to train superior models for recognizing geological named entities in low-resource dataset configurations. The RoBERTa layer captures context-based information and longer-distance dependencies through dynamic word vectors. Finally, we conducted experiments on the constructed Geological Named Entity Recognition (GNER) dataset. Our experimental results show that the proposed model achieves the highest F1 score, 80.64%, when compared with four baseline algorithms, demonstrating the reliability and robustness of using the model for named entity recognition of geological texts.
Keywords: Prompt Learning; named entity recognition (NER); low resource; geological text; text information mining; big data; geology
Chinese named entity recognition with multi-network fusion of multi-scale lexical information (Cited by: 3)
3
Authors: Yan Guo, Hong-Chen Liu, Fu-Jiang Liu, Wei-Hua Lin, Quan-Sen Shao, Jun-Shun Su. Source: Journal of Electronic Science and Technology (EI, CAS, CSCD), 2024, No. 4, pp. 53-80, 28 pages.
Named entity recognition (NER) is an important part of knowledge extraction and one of the main tasks in constructing knowledge graphs. In today's Chinese named entity recognition (CNER) task, the BERT-BiLSTM-CRF model is widely used and often yields notable results. However, recognizing each entity with high accuracy remains challenging. Many entities do not appear as single words but as part of complex phrases, making it difficult to achieve accurate recognition using word embedding information alone, because the intricate lexical structure often impacts the performance. To address this issue, we propose an improved Bidirectional Encoder Representations from Transformers (BERT) character word conditional random field (CRF) (BCWC) model. It incorporates a pre-trained word embedding model using the skip-gram with negative sampling (SGNS) method, alongside traditional BERT embeddings. By comparing datasets with different word segmentation tools, we obtain enhanced word embedding features for segmented data. These features are then processed using multi-scale convolution and iterated dilated convolutional neural networks (IDCNNs) with varying expansion rates to capture features at multiple scales and extract diverse contextual information. Additionally, a multi-attention mechanism is employed to fuse word and character embeddings. Finally, CRFs are applied to learn sequence constraints and optimize entity label annotations. A series of experiments conducted on three public datasets demonstrates that the proposed method outperforms recent advanced baselines. BCWC is capable of addressing the challenge of recognizing complex entities by combining character-level and word-level embedding information, thereby improving the accuracy of CNER. Such a model has potential for applications requiring more precise knowledge extraction, such as knowledge graph construction and information retrieval, particularly in domain-specific natural language processing tasks that require high entity recognition precision.
Keywords: bi-directional long short-term memory (BiLSTM); Chinese named entity recognition (CNER); iterated dilated convolutional neural network (IDCNN); multi-network integration; multi-scale lexical features
Chinese Named Entity Recognition with Character-Level BLSTM and Soft Attention Model (Cited by: 1)
4
Authors: Jize Yin, Senlin Luo, Zhouting Wu, Limin Pan. Source: Journal of Beijing Institute of Technology (EI, CAS), 2020, No. 1, pp. 60-71, 12 pages.
Unlike named entity recognition (NER) for English, the absence of word boundaries reduces the final accuracy for Chinese NER. To avoid the accumulated error introduced by word segmentation, a deep model that extracts character-level features is built and serves as the basis for the new Chinese NER method proposed in this paper. This method converts the raw text to a character vector sequence, extracts global text features with a bidirectional long short-term memory, and extracts local text features with a soft attention model. A linear-chain conditional random field is also used to label all the characters with the help of the global and local text features. Experiments based on the Microsoft Research Asia (MSRA) dataset are designed and implemented. Results show that the proposed method performs well compared to other methods, which indicates that the extracted global and local text features have a positive influence on Chinese NER. For more variety in the test domains, a resume dataset from Sina Finance is also used to demonstrate the effectiveness of the proposed method.
Keywords: Chinese named entity recognition (NER); character-level; bidirectional long short-term memory; soft attention model
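The abstract of entry 4 mentions a linear-chain conditional random field that labels every character. As a rough illustration of how such a layer is usually decoded, here is a minimal Viterbi sketch in plain Python. It is not taken from the paper: the function name `viterbi`, the tag set, and the score values are all assumptions made for the example, and in a real model the emission scores would come from the BLSTM and attention features.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][tag] is the score of `tag` at character t (hypothetical values),
    transitions[(prev, cur)] is the score of moving from `prev` to `cur`
    (missing pairs default to 0.0).
    """
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t-1][tag] = previous tag on the best path into `tag`
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    tags = ["O", "B-PER", "I-PER"]
    # Penalize the invalid transition O -> I-PER (assumed constraint).
    transitions = {("O", "I-PER"): -10.0}
    emissions = [
        {"O": 1.0, "B-PER": 0.8, "I-PER": 0.0},
        {"O": 0.0, "B-PER": 0.1, "I-PER": 1.0},
    ]
    print(viterbi(emissions, transitions, tags))
```

In this toy input the per-character scores alone would pick `O` followed by `I-PER`; the transition penalty is what steers decoding to a well-formed tag sequence, which is the role the CRF layer plays in the entries listed here.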
Data Masking for Chinese Electronic Medical Records with Named Entity Recognition (Cited by: 1)
5
Authors: Tianyu He, Xiaolong Xu, Zhichen Hu, Qingzhan Zhao, Jianguo Dai, Fei Dai. Source: Intelligent Automation & Soft Computing (SCIE), 2023, No. 6, pp. 3657-3673, 17 pages.
With the rapid development of information technology, the electronification of medical records has gradually become a trend. In China, the population base is huge and the supporting medical institutions are numerous, which drives the conversion of paper medical records to electronic medical records. Electronic medical records are the basis for establishing a smart hospital and an important guarantee for achieving medical intelligence, and the massive amount of electronic medical record data is also an important data set for conducting research in the medical field. However, electronic medical records contain a large amount of private patient information, which must be desensitized before the records are used as open resources. To solve these problems, data masking for Chinese electronic medical records with named entity recognition is proposed in this paper. Firstly, the text is vectorized to satisfy the input format required by the model. Secondly, because input sentences vary in length and the contextual relationship between sentences is not negligible, a neural network model for named entity recognition based on bidirectional long short-term memory (BiLSTM) with conditional random fields (CRF) is constructed. Finally, the data masking operation is performed based on the named entity recognition results, mainly using regular expression filtering encryption and principal component analysis (PCA) word vector compression and replacement. In addition, comparison experiments with the hidden Markov model (HMM), the LSTM-CRF model, and the BiLSTM model are conducted in this paper. The experimental results show that the method used in this paper achieves 92.72% Accuracy, 92.30% Recall, and 92.51% F1 score, which is higher than the other models.
Keywords: named entity recognition; Chinese electronic medical records; data masking; principal component analysis; regular expression
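Entry 5 describes masking performed on top of NER output, partly through regular-expression filtering. The sketch below shows one plausible shape of that step and nothing more: the placeholder format, the `mask_record` helper, and the 11-digit phone pattern are assumptions for illustration, and the PCA-based word vector replacement mentioned in the abstract is not covered.

```python
import re
from typing import List, Tuple

# (start, end_exclusive, label) offsets, as an NER model might return them (assumed format)
Entity = Tuple[int, int, str]

# Assumed pattern: any run of 11 digits is treated as a phone number.
PHONE_PATTERN = re.compile(r"\d{11}")


def mask_record(text: str, entities: List[Entity]) -> str:
    """Replace recognized entity spans and phone-like digit runs with placeholders."""
    # Work right-to-left so that earlier offsets stay valid after each replacement.
    for start, end, label in sorted(entities, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return PHONE_PATTERN.sub("[PHONE]", text)


if __name__ == "__main__":
    record = "患者张三,电话13800000000,就诊于某医院。"
    name_start = record.index("张三")
    print(mask_record(record, [(name_start, name_start + 2, "NAME")]))
```

Replacing spans from the end of the string backwards avoids recomputing offsets, which matters once several entities in one record are masked with placeholders of different lengths.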
A U-Shaped Network-Based Grid Tagging Model for Chinese Named Entity Recognition
6
Authors: Yan Xiang, Xuedong Zhao, Junjun Guo, Zhiliang Shi, Enbang Chen, Xiaobo Zhang. Source: Computers, Materials & Continua (SCIE, EI), 2024, No. 6, pp. 4149-4167, 19 pages.
Chinese named entity recognition (CNER) has received widespread attention as an important task of Chinese information extraction. Most previous research has focused on individually studying flat CNER, overlapped CNER, or discontinuous CNER. However, a unified CNER is often needed in real-world scenarios. Recent studies have shown that grid tagging-based methods based on character-pair relationship classification hold great potential for achieving unified NER. Nevertheless, how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge. In this study, we enhance the character-pair grid representation by incorporating both local and global information. Significantly, we introduce a new approach by considering the character-pair grid representation matrix as a specialized image, converting the classification of character-pair relationships into a pixel-level semantic segmentation task. We devise a U-shaped network to extract multi-scale and deeper semantic information from the grid image, allowing for a more comprehensive understanding of associative features between character pairs. This approach leads to improved accuracy in predicting their relationships, ultimately enhancing entity recognition performance. We conducted experiments on two public CNER datasets in the biomedical domain, namely CMeEE-V2 and Diakg. The results demonstrate the effectiveness of our approach, which achieves F1-score improvements of 7.29 percentage points and 1.64 percentage points compared to the current state-of-the-art (SOTA) models, respectively.
Keywords: Chinese named entity recognition; character-pair relation classification; grid tagging; U-shaped segmentation network
End-to-End Chinese Entity Recognition Based on BERT-BiLSTM-ATT-CRF (Cited by: 3)
7
Authors: LI Daiyi, TU Yaofeng, ZHOU Xiangsheng, ZHANG Yangming, MA Zongmin. Source: ZTE Communications, 2022, No. S01, pp. 27-35, 9 pages.
Traditional named entity recognition methods require professional domain knowledge and substantial human effort to extract features, while Chinese named entity recognition methods based on neural network models suffer from overly uniform vector representations of characters. To solve these problems, we propose a Chinese named entity recognition method based on the BERT-BiLSTM-ATT-CRF model. Firstly, we use the bidirectional encoder representations from transformers (BERT) pre-trained language model to obtain the semantic vector of each word according to its context. Secondly, the word vectors trained by BERT are input into a bidirectional long short-term memory network embedded with an attention mechanism (BiLSTM-ATT) to capture the most important semantic information in the sentence. Finally, the conditional random field (CRF) is used to learn the dependence between adjacent tags to obtain the globally optimal sentence-level tag sequence. The experimental results show that the proposed model achieves state-of-the-art performance on both the Microsoft Research Asia (MSRA) corpus and the People's Daily corpus, with F1 values of 94.77% and 95.97%, respectively.
Keywords: named entity recognition (NER); feature extraction; BERT model; BiLSTM; attention mechanism; CRF
Positive unlabeled named entity recognition with multi-granularity linguistic information
8
Authors: Ouyang Xiaoye, Chen Shudong, Wang Rong. Source: High Technology Letters (EI, CAS), 2021, No. 4, pp. 373-380, 8 pages.
Research on named entity recognition for label-few domains is becoming increasingly important. In this paper, a novel algorithm, positive unlabeled named entity recognition (PUNER) with multi-granularity language information, is proposed. It combines positive unlabeled (PU) learning and deep learning to obtain multi-granularity language information from a few labeled instances and many unlabeled instances in order to recognize named entities. First, PUNER selects reliable negative instances from unlabeled datasets, uses positive instances and a corresponding number of negative instances to train the PU learning classifier, and iterates continuously to label all unlabeled instances. Second, a neural network-based architecture is used to implement the PU learning classifier, and comprehensive text semantics are obtained through multi-granular language information, which helps the classifier correctly recognize named entities. Performance tests of PUNER are carried out on three multilingual NER datasets: CoNLL 2003, CoNLL 2002, and SIGHAN Bakeoff 2006. Experimental results demonstrate the effectiveness of the proposed PUNER.
Keywords: named entity recognition (NER); deep learning; neural network; positive-unlabeled (PU) learning; label-few domain; multi-granularity
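Construction of an NER Dataset for UAV Fault Diagnosis and Model Application
9
Authors: 贾龙飞, 李志农, 王奉涛, 李喆. Source: 《兵器装备工程学报》 (PKU Core), 2025, No. 8, pp. 45-52, 8 pages.
To address the lack of a dedicated NER dataset in the field of UAV fault diagnosis, this paper proposes a scheme for constructing an NER dataset for this vertical domain together with a named entity recognition model. Based on the characteristics of text data in the UAV fault diagnosis domain, a dictionary containing 5,677 domain-specific terms was created to assist word segmentation, and Chinese labels were used for annotation. By combining machine annotation with manual proofreading and manual annotation, a named entity recognition dataset for UAV fault diagnosis containing 235,045 characters and 38,421 entities was constructed and named UFDNER. A named entity recognition model was trained on this dataset by combining the pre-trained language model BERT with the BiLSTM-CRF method. The model reaches an F1 score of 87.84% on the test set, providing a strong tool model for fault information recognition and knowledge graph construction in this domain. As an NER dataset for UAV fault diagnosis, UFDNER offers rich and reliable data support for NER research in this field and fills the gap in NER datasets for UAV fault diagnosis.
Keywords: UAV fault diagnosis; NER dataset; named entity recognition; pre-trained model; BiLSTM-CRF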
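A Feature Selection Method for Text Classification Integrating NER (Cited by: 3)
10
Authors: 施德明, 林洋港, 陈恩红. Source: 《计算机工程与科学》 (CSCD), 2007, No. 11, pp. 152-156, 5 pages.
Text classification automatically assigns free text to a number of predefined categories and plays an important role in fields such as information retrieval. Selecting effective text features is an important step that affects classifier performance. In many applications, the text to be processed contains many named entities, such as well-known figures in a particular industry, which can strongly influence the category a text belongs to. However, current text feature methods use only the statistical significance of keywords and do not consider the classification features that keywords carry as named entities. To address this problem, this paper proposes a method that integrates named entity recognition (NER) into feature selection for text classification. In addition to retaining the statistical features of keywords, it retains the classification features of words as named entities. Experimental results show that, relative to other feature selection methods, the proposed method improves text classification accuracy to a certain extent.
Keywords: named entity recognition; named entity; feature selection; text classification; hidden Markov model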
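FCG-NNER: A Chinese Nested Named Entity Recognition Method Fusing Glyph Information (Cited by: 5)
11
Authors: 陈鹏, 马洪彬, 周佳伦, 李琳宇, 余肖生. Source: 《重庆理工大学学报(自然科学)》 (CAS, PKU Core), 2023, No. 12, pp. 222-231, 10 pages.
Span-based models are the main approach to nested named entity recognition; their core idea is to convert entity recognition into a span classification problem. In Chinese datasets, however, words have no explicit delimiters, so semantic and boundary information is unclear, which leads to poor Chinese nested named entity recognition. To solve this problem, FCG-NNER, a span-based Chinese nested named entity recognition algorithm that fuses glyph information, is proposed. First, the glyph information of Chinese characters is obtained with a convolutional neural network. Second, a cross-Biaffine decoding layer fuses the original text information with the glyph information. Then, a diagonal-fusion CNN layer captures local interactions between different spans. Finally, the output of the cross-Biaffine decoding layer and the output of the diagonal-fusion CNN layer are added and fed into a fully connected layer to obtain the final prediction. Two representative Chinese nested NER datasets (CMeEE and CLUENER2020) are used for experimental validation. The results show that FCG-NNER achieves a precision of 65.02%, a recall of 67.93%, and an F1 score of 0.6644 on CMeEE, and a precision of 79.45%, a recall of 82.33%, and an F1 score of 0.8086 on CLUENER2020, indicating that FCG-NNER clearly outperforms the baselines on both datasets.
Keywords: Chinese nested named entity recognition; glyph features; span classification; feature fusion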
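Entry 11 reports precision, recall, and F1 together, so the figures can be cross-checked with the standard definition of F1 as the harmonic mean of precision and recall. The short sketch below does that; the helper name `f1_score` and the dictionary layout are choices made for this example, while the numbers are the ones quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in the same unit)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 as a fraction), as quoted in the abstract
reported = {
    "CMeEE": (65.02, 67.93, 0.6644),
    "CLUENER2020": (79.45, 82.33, 0.8086),
}

if __name__ == "__main__":
    for name, (p, r, f1) in reported.items():
        computed = f1_score(p, r) / 100  # convert from percent to a fraction
        print(f"{name}: computed F1 = {computed:.4f}, reported F1 = {f1:.4f}")
```

Rounded to four decimal places, the recomputed values agree with the reported F1 scores for both datasets.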
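A Chinese NER Model Based on BERT with Multi-Knowledge-Graph Fusion Embedding (Cited by: 3)
12
Authors: 张凤荔, 黄鑫, 王瑞锦, 周志远, 韩英军. Source: 《电子科技大学学报》 (EI, CAS, CSCD, PKU Core), 2023, No. 3, pp. 390-397, 8 pages.
To address the low efficiency of constructing domain-specific knowledge graphs, the insufficient use of existing domain knowledge graphs, and the difficulty traditional models have in extracting entities with highly specialized domain semantics, a Chinese NER model based on BERT with multi-knowledge-graph fusion embedding (BERT-FKG) is proposed. It shares attributes between entities across multiple knowledge graphs by fusing semantics, which enriches the knowledge carried by sentence embeddings. The model shows better performance on Chinese NER tasks in both the open domain and the medical domain. Experimental results show that sharing attributes between similar entities across multiple domain knowledge graphs, based on computed semantic similarity, lets the model absorb more domain knowledge and improves accuracy on NER tasks.
Keywords: BERT; Chinese named entity recognition; medical domain; multi-knowledge-graph fusion embedding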
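RIB-NER: A Span-Based Chinese Named Entity Recognition Model (Cited by: 2)
13
Authors: 田红鹏, 吴璟玮. Source: 《计算机工程与科学》 (CSCD, PKU Core), 2024, No. 7, pp. 1311-1320, 10 pages.
Named entity recognition is an important foundation for many downstream tasks in natural language processing. As an important international language, Chinese is distinctive in many respects. Traditionally, Chinese named entity recognition models use a sequence labeling mechanism, which requires a conditional random field to capture label dependencies; however, this approach is prone to label misclassification. To address this problem, a span-based named entity recognition model, RIB-NER, is proposed. First, RoBERTa-wwm-ext is used as the embedding layer to provide character-level embeddings and obtain more contextual semantic and lexical information. Second, the parallel convolution kernels of IDCNN are used to strengthen positional information between words, so that words are more closely connected. At the same time, a BiLSTM network is integrated into the model to capture contextual information. Finally, a biaffine model scores the start and end tokens in a sentence, and these tokens are used to explore spans. Experiments on the MSRA and Weibo corpora show that RIB-NER identifies entity boundaries fairly accurately, obtaining F1 scores of 95.11% and 73.94%, respectively, and recognizes entities better than traditional deep learning approaches.
Keywords: Chinese named entity recognition; biaffine model; iterated dilated convolutional neural network; pre-trained model; span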
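Entry 13 (like entries 11, 16, and 17) scores start and end positions and then reads entities off as spans. The following sketch illustrates only the decoding side, under stated assumptions: the score matrix is taken as given (in the papers it would come from a biaffine scorer), and the threshold, maximum span length, and greedy non-overlap rule in `decode_spans` are hypothetical choices, not details from the abstract.

```python
from typing import List, Tuple


def decode_spans(scores: List[List[float]],
                 threshold: float = 0.5,
                 max_len: int = 10) -> List[Tuple[int, int]]:
    """Pick non-overlapping spans from a start-by-end score matrix.

    scores[i][j] is the (assumed) score that characters i..j inclusive form an entity.
    Only the upper triangle (j >= i) is read.
    """
    n = len(scores)
    candidates = [
        (scores[i][j], i, j)
        for i in range(n)
        for j in range(i, min(n, i + max_len))
        if scores[i][j] >= threshold
    ]
    chosen: List[Tuple[int, int]] = []
    # Greedy: take the highest-scoring span first, skip anything that overlaps it.
    for _, i, j in sorted(candidates, reverse=True):
        if all(j < s or i > e for s, e in chosen):
            chosen.append((i, j))
    return sorted(chosen)


if __name__ == "__main__":
    scores = [[0.0] * 4 for _ in range(4)]
    scores[0][1] = 0.9  # characters 0..1
    scores[1][2] = 0.8  # overlaps the span above, so it is dropped
    scores[3][3] = 0.7  # single-character span
    print(decode_spans(scores))
```

The greedy non-overlap rule suits flat NER; for nested NER, as in entry 11, overlapping spans would be kept and classified independently instead.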
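A Large-Model Named Entity Recognition Method for Specific Information Domains
14
Authors: 李永斌, 刘楝, 郑杰. Source: 《电子与信息学报》 (PKU Core), 2026, No. 2, pp. 662-672, 11 pages.
In specific information domains, especially the open-source information domain, named entity recognition with traditional models faces difficulties such as a lack of sufficient annotated data and an inability to meet the demands of complex information extraction tasks. Focusing on the open-source information domain, this paper proposes a named entity recognition method based on large language models, which aims to use their strong semantic reasoning ability to understand complex extraction requirements accurately and complete extraction tasks automatically. Expert knowledge is incorporated into the model through instruction fine-tuning and retrieval-augmented generation. Combined with a question regression module, this enables a low-parameter general-purpose large model base to adapt quickly to the specific domain of open-source information and form a domain expert model. Experimental results show that an efficient domain expert system can be built at low cost, providing a more effective solution for named entity recognition in the open-source information domain.
Keywords: large language model; named entity recognition; instruction fine-tuning; retrieval-augmented generation; knowledge base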
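A Chinese Named Entity Recognition Algorithm with Soft-Attention Mask Embedding
15
Authors: 王秀慧, 徐永波. Source: 《吉林大学学报(工学版)》 (PKU Core), 2026, No. 1, pp. 231-238, 8 pages.
The semantics of Chinese words are somewhat ambiguous. Chinese text contains features that are only weakly relevant to named entity recognition: the same word has different meanings in different contexts, and different words and phrases contribute differently to recognizing named entities. Without weighting or masking, these features interfere with the model's recognition accuracy. This paper therefore proposes a Chinese named entity recognition (CNER) algorithm with soft-attention mask embedding. First, a multi-level CNER model is built. In its word vector representation layer, jieba is used to segment the Chinese text passed from the input layer, and Word2Vec is used to obtain a vector for each word, forming a word vector sequence. Second, the BiLSTM layer applies bidirectional long short-term memory processing to the word vector sequence, producing for each word vector a feature vector that fuses the preceding and following context. Third, a soft-attention mask module is embedded after the BiLSTM layer. Its soft attention mechanism weights and masks the feature vectors output by the BiLSTM layer, focusing on features that contribute substantially to entity recognition and removing or suppressing unimportant ones to improve recognition precision. Finally, the conditional random field (CRF) layer labels and decodes the feature vectors processed by the soft-attention mask module to obtain the best entity label sequence, which is the Chinese named entity recognition result. Experimental results show that the algorithm recognizes Chinese named entities accurately and performs well in terms of entity label coverage and F1 score.
Keywords: Chinese naming; soft attention; entity recognition; mask operation; Word2Vec; BiLSTM model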
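Chinese Named Entity Recognition Based on Enhanced Span Information Representation
16
Authors: 杨力益, 邢树礼, 毛国君. Source: 《计算机工程与应用》 (PKU Core), 2026, No. 5, pp. 263-271, 9 pages.
Named entity recognition is a fundamental task in natural language processing. Most previous Chinese named entity recognition methods fail to make full use of the semantic information contained in text spans themselves, which leads to insufficient feature extraction and weaker recognition. In addition, relationships between neighboring spans are often overlooked. To solve these problems, a Chinese named entity recognition method based on enhanced span information representation is proposed. The method has two core modules. The span filter distinguishes entities from non-entities, computing its score from the feature vectors of the first and last tokens with embedded position information. The span classifier uses a span representation that fuses boundary and internal information to compute span scores per entity type, supplemented by a single two-dimensional convolutional layer that models interactions between spans to correct the scores. The sum of the two modules' output scores determines the prediction for each span. The method achieves F1 scores of 96.71%, 96.15%, 81.88%, and 75.28% on the four Chinese named entity recognition tasks Resume, MSRA, CLUENER2020, and CMeEEV2, respectively. Ablation results verify the effectiveness of each component.
Keywords: Chinese named entity recognition; entity extraction; span information representation; inter-span interaction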
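A Chinese Named Entity Recognition Method Based on the Biaffine Mechanism and Lexical Enhancement
17
Authors: 张润梅, 王明曦, 陈中. Source: 《计算机应用研究》 (PKU Core), 2026, No. 1, pp. 120-128, 9 pages.
To address blurred entity boundaries, complex structures, and scarce data in specialized domains in Chinese named entity recognition, WLASC, a model based on the Biaffine mechanism and lexical enhancement, is proposed. A dynamic biaffine module and a multi-level lexical enhancement module are designed in the encoding layer. Relative position encoding and biaffine transformation strengthen context modeling and effectively resolve blurred entity boundaries. Multi-level lexical information and a multi-head attention mechanism are used to weight and fuse features of different granularities, improving nested entity recognition precision and reducing dependence on annotated data. In addition, a bidirectional gated recurrent neural network fuses the extracted features to strengthen the model's expressive power. Experiments on CANER, a civil aviation flight safety dataset, and on the public datasets Weibo and Resume show that the improved algorithm raises F1 by at most 9.77%, 6.97%, and 1.38%, and by at least 2.72%, 1.27%, and 0.31%, respectively. The CANER experiments show that the model handles structurally complex entities and specialized terminology in specialized Chinese domains, and the public dataset experiments show that it generalizes well.
Keywords: Chinese named entity recognition; multi-level lexical enhancement; Biaffine mechanism; feature fusion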
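Research on Chinese Medical Named Entity Recognition
18
Author: 沈锐. Source: 《科技风》, 2026, No. 2, pp. 58-60, 3 pages.
Chinese medical named entity recognition is an interdisciplinary applied technology formed by the deep integration of artificial intelligence and medicine. Its purpose is to identify medical entity names in Chinese medical records and assign them to predefined categories, and it is a prerequisite for data mining and information extraction on medical records. This paper introduces Chinese medical named entity recognition, reviews deep learning models and their innovations, summarizes the datasets and evaluation metrics commonly used in the field, and finally offers some views on future research directions as a reference for subsequent research.
Keywords: Chinese medical named entity recognition; artificial intelligence; deep learning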
FedCE: A Contrast Enhancement Federated Learning Method for Heterogeneous Medical Named Entity Recognition
19
Authors: Kai Chang, Hailong Sun, Jindou Wan, Naiqian Zhang, Yiming Liu, Kuo Yang, Zixin Shu, Jianan Xia, Xuezhong Zhou. Source: Tsinghua Science and Technology, 2025, No. 6, pp. 2384-2398, 15 pages.
Medical Named Entity Recognition (NER) plays a crucial role in attaining precise patient portraits as well as providing support for intelligent diagnosis and treatment decisions. Federated Learning (FL) enables collaborative modeling and training across multiple endpoints without exposing the original data. However, the statistical heterogeneity exhibited by clinical medical text records poses a challenge for FL methods to support the training of NER models in such scenarios. We propose a Federated Contrast Enhancement (FedCE) method for NER to address the challenges faced by non-large-scale pre-trained models in FL with label-heterogeneous data. The method leverages a multi-view encoder structure to capture both global and local semantic information, and employs contrastive learning to enhance the interoperability of global knowledge and local context. We evaluate the performance of the FedCE method on three real-world clinical record datasets. We investigate the impact of factors such as pooling methods, maximum input text length, and training rounds on FedCE. Additionally, we assess how well FedCE adapts to the base NER models and evaluate its generalization performance. The experimental results show that the FedCE method has obvious advantages and can be effectively applied to various base models, which is of great theoretical and practical significance for advancing FL in healthcare settings.
Keywords: Federated Learning (FL); contrast enhancement; heterogeneous data; named entity recognition (NER)
Generating Chinese named entity data from parallel corpora (Cited by: 2)
20
Authors: Ruiji FU, Bing QIN, Ting LIU. Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2014, No. 4, pp. 629-641, 13 pages.
Annotating named entity recognition (NER) training corpora is a costly but necessary process for supervised NER approaches. This paper presents a general framework to generate large-scale NER training data from parallel corpora. In our method, we first employ a high-performance NER system on one side of a bilingual corpus. Then, we project the named entity (NE) labels to the other side according to the word-level alignments. Finally, we propose several strategies to select high-quality auto-labeled NER training data. We apply our approach to Chinese NER using an English-Chinese parallel corpus. Experimental results show that our approach can collect high-quality labeled data and can help improve Chinese NER.
Keywords: named entity recognition; Chinese named entity; training data generating; parallel corpora
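The projection step in entry 20, copying named entity labels from one side of a parallel sentence pair to the other through word alignments, can be sketched in a few lines. This is a simplified illustration rather than the authors' framework: the `project_labels` helper, the plain type labels (no BIO prefixes), and the example alignment are assumptions, and the paper's strategies for selecting high-quality projected data are not modeled.

```python
from typing import List, Tuple


def project_labels(src_labels: List[str],
                   alignments: List[Tuple[int, int]],
                   tgt_len: int) -> List[str]:
    """Copy each source token's entity type onto the target tokens aligned to it.

    alignments holds (source_index, target_index) pairs, as a word aligner might emit.
    Target tokens with no aligned entity token keep the label "O".
    """
    projected = ["O"] * tgt_len
    for src_idx, tgt_idx in alignments:
        if src_labels[src_idx] != "O":
            projected[tgt_idx] = src_labels[src_idx]
    return projected


if __name__ == "__main__":
    english = ["Obama", "visited", "Beijing"]
    english_labels = ["PER", "O", "LOC"]  # output of an English NER system (assumed)
    chinese = ["奥巴马", "访问", "了", "北京"]
    alignments = [(0, 0), (1, 1), (2, 3)]  # hypothetical word alignments
    labels = project_labels(english_labels, alignments, len(chinese))
    print(list(zip(chinese, labels)))
```

Alignment errors propagate directly into the projected labels, which is presumably why the abstract adds a separate step for selecting high-quality auto-labeled data.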