An Improved TFIDF-Based Approach to Extracting Keywords from Web Pages (一种改进的TFIDF网页关键词提取方法)  Cited by: 31
Abstract  The classical TFIDF keyword extraction method is simple to implement and has low time complexity, but its results are unsatisfactory: it struggles to capture the features that matter most to a text's content. This paper proposes a keyword extraction method that takes into account the structural features of Chinese text and the part-of-speech features of Chinese words, draws on an extended version of the Tongyici Cilin synonym thesaurus (同义词词林), and scores candidate words with an improved TFIDF formula. Experimental results show that the method clearly outperforms the classical method and extracts satisfactory keywords.
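Source  Computer Applications and Software (《计算机应用与软件》), CSCD, 2011, No. 5, pp. 25-27 (3 pages)
Funding  National Natural Science Foundation of China (90920004, 60970056, 60873150); Natural Science Foundation of Jiangsu Province (BK2008160); Major Basic Research Project of Natural Science in Jiangsu Higher Education Institutions (08KJA520002)
Keywords  text structure; keyword extraction; term frequency-inverse document frequency (TFIDF)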
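
The abstract contrasts classical TFIDF scoring with a variant that also uses text structure, part of speech, and a synonym thesaurus. This record does not give the improved formula, so the following is only a minimal sketch of the general idea, not the authors' method. It ranks terms by TF-IDF using a smoothed IDF variant, log((N+1)/(df+1)) + 1, and exposes two hypothetical hooks: `synonyms`, a map that merges words into one canonical form, and `term_weight`, a per-term multiplier that could stand in for a structure or part-of-speech weight. Both hooks, their names, and any weight values are assumptions for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus_tokens, top_k=3,
                   synonyms=None, term_weight=None):
    """Rank the terms of one tokenized document by a weighted TF-IDF score.

    synonyms:    hypothetical dict mapping a word to a canonical synonym.
    term_weight: hypothetical dict of per-term multipliers (for example a
                 boost for title words or nouns); the paper's real weights
                 are not given in this record.
    """
    synonyms = synonyms or {}
    term_weight = term_weight or {}

    def normalize(tokens):
        # Merge synonyms so their counts are pooled under one form.
        return [synonyms.get(t, t) for t in tokens]

    doc = normalize(doc_tokens)
    corpus = [set(normalize(d)) for d in corpus_tokens]
    n_docs = len(corpus)
    counts = Counter(doc)
    total = len(doc)

    scores = {}
    for term, count in counts.items():
        tf = count / total                       # relative term frequency
        df = sum(1 for d in corpus if term in d)  # document frequency
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0  # smoothed IDF
        scores[term] = tf * idf * term_weight.get(term, 1.0)

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


corpus = [
    ["web", "page", "keyword", "keyword"],
    ["web", "page", "news"],
    ["web", "sport", "news"],
]
print(tfidf_keywords(corpus[0], corpus, top_k=2))
```

With every weight left at 1.0 and no synonym map, the function reduces to plain (smoothed) TF-IDF ranking, which serves as the baseline the abstract compares against. For Chinese text the token lists would come from a word segmenter, which is outside this sketch.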