
Automatic Text Summarization Based on Thematic Word Weight and Sentence Features
Cited by: 17
Abstract  To generate high-quality automatic summaries, a word-weighting formula is built on top of a combined-word recognition algorithm. The formula accounts for word frequency, part of speech, word position, and word length, so that words and phrases expressing the theme receive higher weights. Sentence weights are computed from each sentence's content and position, the cue words it contains, and the user's preference. Summary generation takes the similarity among candidate summary sentences into account, which avoids adding redundant information. The evaluation of summaries is refined from sentence granularity to word granularity, and a method for computing precision and recall at the word level is proposed. Experimental results show that the proposed algorithm generates high-quality summaries, with an average precision of 77.1%.
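The abstract names two steps that are easy to illustrate: skipping candidate sentences that are too similar to ones already chosen, and scoring a summary by precision and recall at the word level. The following is a minimal sketch only, not the paper's method. The abstract does not give the formulas, so the Jaccard similarity measure, the `max_similarity` threshold, the multiset-overlap definition of precision and recall, and all function names are assumptions made for illustration.

```python
from collections import Counter


def word_level_precision_recall(candidate_words, reference_words):
    """Precision and recall from the multiset overlap of two word lists.

    Assumption: a word counts as matched as many times as it occurs in
    both lists; the paper's exact definition is not stated in the abstract.
    """
    cand = Counter(candidate_words)
    ref = Counter(reference_words)
    overlap = sum((cand & ref).values())  # multiset intersection
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


def jaccard(words_a, words_b):
    """Jaccard similarity of two word lists, treated as sets (assumed measure)."""
    set_a, set_b = set(words_a), set(words_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def select_sentences(scored_sentences, max_sentences, max_similarity=0.5):
    """Greedily pick the highest-weight sentences, skipping near-duplicates.

    scored_sentences: list of (weight, words) pairs; the weights are taken
    as given here, since the paper's weighting formula is not reproduced.
    """
    chosen = []
    ranked = sorted(scored_sentences, key=lambda pair: pair[0], reverse=True)
    for _weight, words in ranked:
        if len(chosen) >= max_sentences:
            break
        # Keep a sentence only if it is not too similar to any chosen one.
        if all(jaccard(words, prev) <= max_similarity for prev in chosen):
            chosen.append(words)
    return chosen
```

For example, `select_sentences` could be fed tokenized sentences with precomputed weights, and the words of the selected sentences compared against a human-written reference summary with `word_level_precision_recall`.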
Source  Journal of South China University of Technology (Natural Science Edition), 2010, No. 7, pp. 50-55 (6 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding  Natural Science Foundation of Guangdong Province (07006474); Guangdong Provincial Science and Technology Research Project (2007B010200044)
Keywords  thematic word; automatic text summarization; combined word; weight computing; sentence feature

