Abstract
The keywords and citation counts of scientific papers correlate strongly with the papers' topics and are therefore strong cues for locating their important content. However, neither feature has been used in current multi-document automatic summarization methods for scientific papers, and their effect on summarization quality remains unexplored. By designing comparative algorithms and experiments, this study quantitatively analyzes how these two features affect the automatic summarization of scientific papers. The results show that introducing a keyword factor and a citation-count factor can significantly improve summary quality. Using both factors together yields the most significant positive effect. Using the citation-count factor alone also yields a fairly significant positive effect, though a weaker one than using both factors. Using the keyword factor alone has no significant effect, and its results are even worse than those of the baseline group. In addition, both factors are fairly sensitive to changes in summary size.
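The abstract does not give the scoring formulas, so the following is only a minimal sketch of how the four compared configurations (baseline, keyword factor only, citation-count factor only, both factors) might be set up. Everything in it is an assumption made for illustration: the function names `tokenize` and `summarize`, the dictionary fields `sentences`, `keywords`, and `cited_times`, the term-frequency baseline score, and the multiplicative form of both factors. None of these are taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [t for t in tokens if t]


def summarize(papers, size, use_keywords=False, use_citations=False):
    """Return the `size` highest-scoring sentences across all papers.

    Hypothetical scoring, NOT the paper's formulas:
      baseline        = mean corpus term frequency of the sentence's tokens
      keyword factor  = 1 + share of tokens that occur in the paper's keywords
      citation factor = 1 + ln(1 + cited_times)
    """
    # Corpus-wide term frequencies drive the baseline sentence score.
    corpus_tf = Counter(
        tok for p in papers for s in p["sentences"] for tok in tokenize(s)
    )
    scored = []
    for p in papers:
        # Multi-word keywords are split into single tokens for matching.
        keyword_tokens = {t for k in p["keywords"] for t in tokenize(k)}
        for s in p["sentences"]:
            toks = tokenize(s)
            if not toks:
                continue
            score = sum(corpus_tf[t] for t in toks) / len(toks)
            if use_keywords:
                hits = sum(1 for t in toks if t in keyword_tokens)
                score *= 1.0 + hits / len(toks)
            if use_citations:
                score *= 1.0 + math.log1p(p["cited_times"])
            scored.append((score, s))
    # Stable sort: sentences with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:size]]
```

Calling `summarize(papers, size)` with each of the four flag combinations, and with several values of `size`, would reproduce the shape of the comparison described above; how the resulting summaries are evaluated is left open here because the abstract does not specify it.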
Source
Journal of the China Society for Scientific and Technical Information (《情报学报》)
CSSCI
CSCD
PKU Core (北大核心)
2017, No. 11, pp. 1165-1174 (10 pages)
Funding
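National Social Science Fund of China project "Identification of Non-relevant Online Scientific and Technical Information and Its Application" (15CTQ022)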
Keywords
keywords
citation count
scientific papers
multi-document automatic summarization