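Abstract: The pre-trained model BERT (Bidirectional Encoder Representations from Transformers) has achieved notable results on Chinese named entity recognition thanks to its strong performance, but it does not fully account for lexical (word-level) information when processing Chinese text. To overcome this limitation, the article proposes a Chinese named entity recognition method based on LEBERT (Lexicon Enhanced BERT), combined with BiLSTM (bidirectional long short-term memory) and CRF (conditional random field) models to further improve recognition performance. LEBERT introduces word embeddings during the pre-training stage so that the model better captures the semantics of words; BiLSTM captures bidirectional dependencies in the sequence; and the CRF layer decodes the optimal label sequence, taking into account the transition probabilities between labels while also avoiding illegal entities. Experimental results show that the method achieves F1 scores of 71.57%, 96.53%, and 82.04% on the Weibo, Resume, and OntoNotes datasets, respectively, outperforming other mainstream methods.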
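The CRF decoding step described in this abstract can be illustrated with a minimal sketch. It is not the article's implementation: the BIO label set, the emission scores, and the helper names (`bio_transition_allowed`, `build_transitions`, `viterbi_decode`) are all assumptions made for illustration, and the transition scores are hand-set (0 for legal moves, a large negative value for illegal ones) rather than learned as they would be in a trained CRF layer.

```python
NEG_INF = -1e9  # score given to forbidden transitions (assumed constant)


def bio_transition_allowed(prev, curr):
    """Under the BIO scheme, I-X may only follow B-X or I-X."""
    if curr.startswith("I-"):
        return prev in ("B-" + curr[2:], "I-" + curr[2:])
    return True


def build_transitions(labels):
    """Hypothetical transition matrix: 0.0 if legal, NEG_INF if illegal."""
    return [[0.0 if bio_transition_allowed(p, c) else NEG_INF for c in labels]
            for p in labels]


def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence for per-token emission scores."""
    n = len(labels)
    # A sequence may not start inside an entity, so penalise initial I- labels.
    score = [emissions[0][j] + (NEG_INF if labels[j].startswith("I-") else 0.0)
             for j in range(n)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n):
            # Best previous label for reaching label j at step t.
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(range(n), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [labels[i] for i in path]


# Made-up scores for a three-character input; columns follow `labels`.
labels = ["O", "B-PER", "I-PER"]
emissions = [
    [0.1, 0.2, 2.0],  # a per-token argmax would pick I-PER here, which is illegal
    [0.0, 0.1, 1.5],
    [1.0, 0.0, 0.2],
]
decoded = viterbi_decode(emissions, build_transitions(labels), labels)
```

With these made-up scores, `decoded` is `["B-PER", "I-PER", "O"]`: choosing the best label per token independently would open the entity with `I-PER`, whereas decoding over the whole sequence with transition constraints yields a well-formed entity.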
Funding: Supported by the Center for Language Education and Cooperation Commissioning Projects (22YHXZ1011) and the Construction Project of Teaching Quality and Teaching Reform Project for Undergraduate Universities in Guangdong Province.
Abstract: In current Chinese NER tasks, language models suffer from problems such as limited accuracy in recognizing Chinese entity boundaries and insufficient learning of Chinese character and vocabulary information during training. This article proposes an entity recognition model, LEBERT-IDGRU-CRF, which is based on BERT and introduces external dictionaries for training. The model performs lexical matching on the input text through an external dictionary to construct word pairs, and then passes the vector matrix to the feature extraction layer, which introduces an attention mechanism for further extraction. Comparative experiments on four datasets show improved results and verify the feasibility of the model.
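The dictionary-matching step mentioned in this abstract can be sketched roughly as follows. The abstract does not specify how the word pairs are built, so this assumes one common arrangement: each character is paired with every dictionary word that covers it. The function name `match_lexicon`, the `max_word_len` parameter, and the toy dictionary are hypothetical.

```python
def match_lexicon(sentence, lexicon, max_word_len=4):
    """Pair each character with the lexicon words that span its position."""
    matched = [[] for _ in sentence]
    for start in range(len(sentence)):
        # Try every candidate word of 2..max_word_len characters from `start`.
        last_end = min(start + max_word_len, len(sentence))
        for end in range(start + 2, last_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                # Attach the word to every character it covers.
                for pos in range(start, end):
                    matched[pos].append(word)
    return list(zip(sentence, matched))


# Toy dictionary, assumed purely for illustration.
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
pairs = match_lexicon("南京市长江大桥", lexicon)
for char, words in pairs:
    print(char, words)
```

In a full model, each matched word would be looked up in a word-embedding table and fused with the character representation; that part is omitted here because the abstract gives no details on it.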