
Automatic Extraction of Text Subject Sentences Based on a Synthesized Method (cited by: 16)

A Synthesized Method of Extracting Subject Sentences from Text
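Abstract A synthesized method for extracting subject sentences is proposed, with emphasis on extracting the subject concepts of a text and on the corresponding weighting scheme. Based on the relations among concepts, synonymous concepts are semantically merged and hypernym/hyponym concepts are semantically focused, imitating the principle that human indexing experts follow when analyzing the subject of a text: "cover every aspect of the subject while still placing emphasis." When the weights of hypernym and hyponym concepts are adjusted, the method accounts for the reinforcing effect of hyponyms on their hypernyms while ensuring that the adjustment does not disturb the overall distribution of the text's subjects, so that the subject concepts are extracted more precisely. Several weighting metrics are combined to evaluate how well each sentence reflects the subject. On this basis, a subject-sentence selection algorithm ties the number of extracted subject sentences to the number of subjects in the text, guarantees that every major subject has a corresponding subject sentence selected, and removes duplicate subject sentences, which further improves the subject coverage and generality of the extracted sentences.

An extraction method for subject sentences of text is put forward. The method is based mainly on concept equivalence and hierarchies, with emphasis on the synonymy and hyponymy/hypernymy relations among terms. The concepts and the relationships among them are represented and organized in a concept base designed for automatic text processing, and a prototype system was implemented. Based on concept analysis, the weights of subject terms are recalculated; the adjusted weight accurately reflects the contribution of each term to the document topic. Sentences are ranked by a weighted combination of various metrics. Nonredundant key sentences are extracted by the rotary choice algorithm, which trades off maximal topic coverage against minimal redundancy of key sentences. The algorithm tries to achieve the following goals simultaneously: (1) choose the most significant sentences as topic sentences for each topic; (2) remove redundant sentences from the candidate topic sentences.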
Source Journal of Shanghai Jiaotong University (《上海交通大学学报》; indexed in EI, CAS, CSCD, PKU Core), 2006, No. 5, pp. 771-774, 782 (5 pages)
Funding National High Technology Research and Development Program of China (863 Program), project 2002AA119050
Keywords subject sentence; subject extraction; text compressing
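
The pipeline outlined in the abstract (synonym merging, hypernym focusing, weighted sentence scoring, and rotary selection with de-duplication) can be illustrated with a minimal Python sketch. This is not the authors' implementation. All function names, the boost factor `alpha`, the position feature, the metric weights, and the Jaccard-overlap redundancy test with its `max_overlap` threshold are assumptions introduced only for illustration; the paper's own concept base and weighting formulas are not reproduced here.

```python
from collections import defaultdict


def merge_synonyms(term_weights, synonym_of):
    """Fold each term's weight into its canonical concept (semantic merging)."""
    merged = defaultdict(float)
    for term, weight in term_weights.items():
        merged[synonym_of.get(term, term)] += weight
    return dict(merged)


def focus_on_hypernyms(weights, hypernym_of, alpha=0.5):
    """Boost each hypernym by alpha times its hyponyms' weights, then rescale
    so the total weight is unchanged (an assumed reading of the abstract)."""
    total = sum(weights.values())
    boosted = dict(weights)
    for concept, weight in weights.items():
        parent = hypernym_of.get(concept)
        if parent in boosted:  # only hypernyms that occur in the text
            boosted[parent] += alpha * weight
    scale = total / sum(boosted.values())
    return {c: w * scale for c, w in boosted.items()}


def score_sentence(concepts, position, n_sentences, weights,
                   w_concept=0.8, w_position=0.2):
    """Weighted combination of two assumed metrics: concept weight and position."""
    concept_score = sum(weights.get(c, 0.0) for c in concepts)
    position_score = 1.0 - position / n_sentences  # earlier sentences score higher
    return w_concept * concept_score + w_position * position_score


def jaccard(a, b):
    """Concept-set overlap, used here as a stand-in redundancy measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rotary_select(sentences, weights, k, max_overlap=0.6):
    """Visit topics in turn (heaviest first); for each, pick the best unused,
    non-redundant sentence that mentions it. Stop at k sentences or when a
    full round adds nothing. `sentences` is a list of concept sets."""
    n = len(sentences)
    scores = [score_sentence(s, i, n, weights) for i, s in enumerate(sentences)]
    topics = sorted(weights, key=weights.get, reverse=True)
    chosen = []
    progressed = True
    while len(chosen) < k and progressed:
        progressed = False
        for topic in topics:
            if len(chosen) >= k:
                break
            candidates = [
                i for i in range(n)
                if topic in sentences[i] and i not in chosen
                and all(jaccard(sentences[i], sentences[j]) <= max_overlap
                        for j in chosen)
            ]
            if candidates:
                chosen.append(max(candidates, key=scores.__getitem__))
                progressed = True
    return sorted(chosen)


if __name__ == "__main__":
    raw = {"car": 2.0, "automobile": 1.0, "vehicle": 1.0, "engine": 2.0}
    weights = focus_on_hypernyms(
        merge_synonyms(raw, {"automobile": "car"}), {"car": "vehicle"})
    sentences = [{"car", "vehicle"}, {"car", "vehicle"}, {"engine"}, {"weather"}]
    print(rotary_select(sentences, weights, k=3))  # [0, 2]
```

In the toy run, sentence 1 duplicates sentence 0 and sentence 3 mentions no weighted concept, so only two sentences are returned even though three were requested. Rescaling after the boost is one simple way to keep the total topic weight constant; other normalizations would also fit the abstract's description.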


