Abstract
Traditional topic identification for ethnic-minority literature is imprecise and lacks deep semantic understanding. To address this, the paper proposes a topic-mining algorithm for Shui literature based on the LDA (Latent Dirichlet Allocation) model and text clustering. A Python crawler collected 990 Shui-related papers from CNKI (China National Knowledge Infrastructure). The LDA model was used to mine the topic distribution of this corpus, and text clustering was then performed with an integrated Shui feature dictionary. This produced keywords for five major topics: Shui culture, sports, music, medicine, and Shui script (Shuishu). The method was evaluated by precision, recall, and F-measure. The experimental results show that the method effectively mines the topic keywords and popular research areas of Shui literature and makes its thematic structure clearer. It also supports the next steps of citation analysis of Shui literature and digital preservation of ethnic literature, and has some application prospects and practical value.
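The abstract names precision, recall, and F-measure as the evaluation metrics but does not say how they are computed. Below is a minimal sketch. It assumes a set-based comparison between each topic's extracted keywords and a hand-labelled reference list, and it assumes the balanced F-measure (F1). The function name `evaluate_keywords` and the toy keywords are hypothetical illustrations, not taken from the paper.

```python
from typing import Iterable, Tuple


def evaluate_keywords(extracted: Iterable[str],
                      reference: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one topic's keyword list.

    Assumption: a keyword counts as correct only if it appears verbatim
    in the hand-labelled reference set for the same topic.
    """
    extracted_set = set(extracted)
    reference_set = set(reference)
    hits = len(extracted_set & reference_set)  # correctly extracted keywords

    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical toy data for a "Shui script" topic; not results from the paper.
extracted = ["shuishu", "script", "manuscript", "festival"]
reference = ["shuishu", "script", "manuscript", "divination", "calendar"]
p, r, f = evaluate_keywords(extracted, reference)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # prints "P=0.75 R=0.60 F=0.67"
```

With 3 of 4 extracted keywords correct and 3 of 5 reference keywords recovered, precision is 0.75, recall is 0.60, and F1 is their harmonic mean. To summarise all five topics, you could average the per-topic scores (macro-averaging), although the abstract does not say which averaging the paper used.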
Author
杨秀璋
YANG Xiu-zhang (School of Information, Guizhou University of Finance and Economics, Guiyang 550025)
Source
Modern Computer (《现代计算机》), 2019, No. 5, pp. 13-17 (5 pages)
Funding
Youth Science and Technology Talent Growth Project of the Department of Education of Guizhou Province (黔教合KY字[2016]172)