
Integrated Tibetan part-of-speech tagging method with lexical information
Abstract: Word segmentation errors can compromise the accuracy of part-of-speech tagging, and features at single-syllable granularity are insufficient to capture contextual information comprehensively, leading to imprecise entity boundary recognition. To address these issues, a joint method for Tibetan automatic word segmentation and part-of-speech tagging that integrates lexical information was proposed. A lexical vector information base was constructed, and the syllable encodings produced by BERT were fused with the corresponding word-level vectors to obtain a more comprehensive feature input, enhancing the model's understanding of lexical semantics. A BERT + SoftLexicon (BiLSTM) + CRF model integrating Tibetan syllable and lexical features was trained on a part-of-speech tagging dataset of 70,000 sentences. Experimental results show that on a 7,000-sentence test corpus the method achieves an F1-score of 92.74%, an improvement of 1.8% and 1.9% over the baseline integrated model and a large language model, respectively.
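The fusion step described in the abstract can be illustrated with a minimal pure-Python sketch of SoftLexicon-style feature construction: for each syllable, lexicon words covering it are grouped by the syllable's role (Begin/Middle/End/Single), each group's vectors are averaged, and the averages are concatenated with the syllable's contextual vector. The function name, the toy lexicon, and the vectors below are illustrative assumptions, not the paper's implementation.

```python
def soft_lexicon_fuse(syll_vecs, sentence, lexicon):
    """Concatenate each syllable's vector with averaged B/M/E/S lexicon vectors.

    syll_vecs: list of per-syllable context vectors (e.g. from BERT)
    sentence:  list of syllable strings
    lexicon:   dict mapping syllable tuples (words) to word vectors
    """
    dim = len(next(iter(lexicon.values())))  # lexicon vector dimension
    n = len(sentence)
    fused = []
    for i in range(n):
        sets = {"B": [], "M": [], "E": [], "S": []}
        # Enumerate every lexicon word whose span covers syllable i
        # and record syllable i's role inside that word.
        for start in range(0, i + 1):
            for end in range(i, n):
                word = tuple(sentence[start:end + 1])
                if word not in lexicon:
                    continue
                if start == i and end == i:
                    sets["S"].append(lexicon[word])
                elif start == i:
                    sets["B"].append(lexicon[word])
                elif end == i:
                    sets["E"].append(lexicon[word])
                else:
                    sets["M"].append(lexicon[word])
        parts = list(syll_vecs[i])
        for key in ("B", "M", "E", "S"):
            vecs = sets[key]
            if vecs:  # average the matched word vectors for this role
                parts += [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
            else:     # no match: pad with zeros so the width is fixed
                parts += [0.0] * dim
        fused.append(parts)
    return fused

# Toy usage with two syllables and a two-entry lexicon (hypothetical data):
fused = soft_lexicon_fuse(
    [[0.1], [0.2]],
    ["bod", "skad"],
    {("bod", "skad"): [1.0, 0.0], ("skad",): [0.0, 1.0]},
)
```

In the full model this fixed-width fused vector, rather than the bare syllable encoding, is what feeds the downstream BiLSTM+CRF tagger.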
Authors: WAN Me-cuo; HUAQUE Cai-rang; BAI Ying; HUAN Ke-you; ZHANG Rui (School of Computer Science, Qinghai Normal University, Xining 810008, China; The State Key Laboratory of Tibetan Intelligence, Qinghai Normal University, Xining 810008, China; Key Laboratory of Tibetan Information Processing, Ministry of Education, Qinghai Normal University, Xining 810008, China)
Source: Computer Engineering and Design (《计算机工程与设计》, PKU Core Journal), 2025, No. 12, pp. 3578-3585 (8 pages)
Funding: National Natural Science Foundation of China (62166034); State Key Laboratory of Tibetan Intelligent Information Processing and Application fund (2020-ZJ-Y05).
Keywords: Tibetan part-of-speech tagging; integrated tagging; lexical enhancement; large language models; BERT; Tibetan word segmentation; feature fusion