
Research on Shui Literature Topic Mining Based on the LDA Model and Text Clustering (基于LDA模型和文本聚类的水族文献主题挖掘研究)

Cited by: 16
Abstract: To address imprecise topic identification and the lack of deep semantic understanding in traditional analysis of ethnic literature, this paper proposes a topic mining algorithm for Shui literature based on the LDA model and text clustering. A Python crawler collects 990 Shui-related documents from CNKI. The LDA model is used to mine the topic distribution of these documents, a Shui feature dictionary is incorporated into text clustering, and keywords are extracted for five major topics: Shui culture, sports, music, medicine, and Shuishu (水书, the Shui script). The method is evaluated by precision, recall, and F-measure. The experimental results show that it effectively extracts the topic keywords and popular research areas of Shui literature and makes its thematic structure clearer. The results support subsequent Shui citation analysis and the digital preservation of ethnic literature, and the method has some application prospects and practical value.
Author: YANG Xiu-zhang (杨秀璋), School of Information, Guizhou University of Finance and Economics, Guiyang 550025
Source: Modern Computer (《现代计算机》), 2019, No. 5, pp. 13-17 (5 pages)
Funding: Youth Science and Technology Talent Growth Project of the Guizhou Provincial Department of Education (黔教合KY字[2016]172)
Keywords: LDA model; text clustering; Shui literature; topic mining; ethnic studies
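The abstract names precision, recall, and F-measure as the evaluation metrics but does not give formulas. The sketch below shows one common way to compute them for a set of extracted topic keywords against a reference set. It assumes the balanced F-measure (F1), which the record does not confirm; the function name `precision_recall_f` and the keyword lists are hypothetical illustrations, not data from the paper.

```python
def precision_recall_f(predicted, relevant):
    """Return (precision, recall, F-measure) for two keyword collections.

    Assumption: F-measure is the balanced F1 score (harmonic mean of
    precision and recall). The paper may use a different weighting.
    """
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)  # keywords that are both extracted and relevant
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical keyword sets for one topic; not taken from the paper.
predicted = ["Shuishu", "script", "divination", "festival"]
relevant = ["Shuishu", "script", "manuscript", "divination", "calendar"]

p, r, f = precision_recall_f(predicted, relevant)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

With these toy sets, three of the four extracted keywords are relevant and three of the five relevant keywords are found, so precision is 0.75 and recall is 0.6.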

