Abstract
The proposed multi-algorithm fusion method for keyword extraction and ambiguity handling in standard texts works in two steps. First, it combines TF-IDF with TextRank and takes word position, part of speech, word length, and word frequency into account to extract keywords from a standard text. Then, it processes the same text with HanLP and resolves ambiguities by comparing the two results. Analysis of the experimental results indicates that the method markedly improves both the efficiency and the quality of keyword extraction and ambiguity handling for standard texts. It also offers a new approach to standards knowledge mining that combines large models with knowledge bases and intelligent agents.
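The abstract does not give the fusion formula, so the following is only a minimal sketch of how TF-IDF and TextRank scores might be combined with position and word-length adjustments. The mixing weight `alpha`, the bonus factors, the co-occurrence window, and all function names are assumptions made for illustration, not the authors' implementation; part-of-speech weighting and the HanLP comparison step are omitted because they need an external tagger.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Return one {word: score} dict per pre-tokenized document (smoothed IDF)."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return [
        {w: (c / len(doc)) * (math.log((1 + n) / (1 + df[w])) + 1.0)
         for w, c in Counter(doc).items()}
        for doc in docs
    ]


def textrank(tokens, window=3, damping=0.85, iters=50):
    """Score words by iterating PageRank over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if v != w:
                neighbors[w].add(v)
                neighbors[v].add(w)
    rank = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        rank = {
            w: (1 - damping)
            + damping * sum(rank[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
    return rank


def normalize(scores):
    """Scale scores so the largest value is 1.0."""
    top = max(scores.values(), default=0.0)
    return {w: s / top for w, s in scores.items()} if top else dict(scores)


def fused_keywords(docs, index, alpha=0.5, top_k=5):
    """Rank the words of docs[index] by a weighted TF-IDF/TextRank mix.

    alpha and the two bonus factors below are hypothetical choices.
    """
    tokens = docs[index]
    tfidf = normalize(tf_idf(docs)[index])
    tr = normalize(textrank(tokens))
    first_pos = {}
    for i, w in enumerate(tokens):
        first_pos.setdefault(w, i)
    fused = {}
    for w in tfidf:
        base = alpha * tfidf[w] + (1 - alpha) * tr.get(w, 0.0)
        # Assumed heuristic: earlier first occurrence earns a small bonus.
        position_bonus = 1.0 + 0.2 * (1 - first_pos[w] / len(tokens))
        # Assumed heuristic: longer words earn a bonus, capped at 4 characters.
        length_bonus = 1.0 + 0.1 * min(len(w), 4)
        fused[w] = base * position_bonus * length_bonus
    return sorted(fused, key=fused.get, reverse=True)[:top_k]


if __name__ == "__main__":
    corpus = [
        ["standard", "text", "keyword", "extraction", "standard", "keyword"],
        ["network", "protocol", "text"],
        ["network", "security", "policy"],
    ]
    print(fused_keywords(corpus, 0, top_k=3))
```

Word frequency enters through the TF term; for Chinese text the token lists would come from a word segmenter, and a part-of-speech weight could be multiplied in the same way as the two bonuses.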
Authors
付振秋
田辉
FU Zhenqiu; TIAN Hui (Information and Communication Integration Innovation Research Center, China Academy of Information and Communications Technology, Beijing 100191, China; Taier Rongchuang (Beijing) Technology Co., Ltd., Beijing 100191, China)
Source
《信息通信技术与政策》
2025, No. 2, pp. 87-96 (10 pages)
Information and Communications Technology and Policy
Keywords
standard text
keywords
extraction
ambiguity