
Text Segmentation Algorithm Oriented to Small General-text
Cited by: 1
Abstract: Text segmentation is an important research topic in natural language processing. Existing models cannot effectively segment small general-text, so this paper proposes a statistical algorithm based on the Hidden Markov Model (HMM). The algorithm uses the length of each structural block in a small text, together with its lexical information, to segment a single-topic small general-text into the different aspects of its discussion. Two methods are designed for computing the HMM emission probabilities: one based on sentence groups and one based on segmentation points. Experiments on Medline abstracts show that the algorithm is effective for segmenting small general-text and performs markedly better than the classic TextTiling algorithm.
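Source: Computer Engineering (《计算机工程》), 2008, No. 22, pp. 43-45 (3 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心).
Funding: National Natural Science Foundation of China (60473071); Specialized Research Fund for the Doctoral Program of Higher Education (20020610007); Youth Fund of the College of Computer Science, Sichuan University.
Keywords: text segmentation; small general-text; Hidden Markov Model (HMM); boundary recognition; similarity metric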

References (6)

  • 1. Beeferman D, Berger A, Lafferty J. Statistical Models for Text Segmentation[J]. Machine Learning, 1999, 34(1-3): 177-210.
  • 2. Hearst M A. TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages[J]. Computational Linguistics, 1997, 23(1): 33-64.
  • 3. Passonneau R J, Litman D J. Discourse Segmentation by Human and Automated Means[J]. Computational Linguistics, 1997, 23(1): 103-139.
  • 4. Shi Jing, Dai Guozhong. Text Segmentation Based on the PLSA Model[J]. Journal of Computer Research and Development, 2007, 44(2): 242-248. Cited by: 25
  • 5. Zhu Jingbo, Ye Na, Luo Haitao. A Text Segmentation Model Based on Multiple Discriminant Analysis[J]. Journal of Software, 2007, 18(3): 555-564. Cited by: 15
  • 6. Fragkou P, Petridis V, Kehagias A. A Dynamic Programming Algorithm for the Segmentation of Greek Texts[C]//Proceedings of the CONSOLE XII Conference. [S. l.]: IEEE Press, 2003.

Secondary references (41)

  • 1. Salton G, Singhal A, Buckley C, Mitra M. Automatic text decomposition using text segments and text themes. In: Bernstein M, Carr L, Osterbye K, eds. Proc. of the 7th ACM Conf. on Hypertext. New York: ACM Press, 1996. 53-65.
  • 2. Hearst MA. TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 1997, 23(1): 33-64.
  • 3. Morris J, Hirst G. Lexical cohesion computed by thesauri relations as an indicator of the structure of text. Computational Linguistics, 1991, 17(1): 21-42.
  • 4. Kozima H. Text segmentation based on similarity between words. In: Proc. of the 31st Annual Meeting of the Association for Computational Linguistics. 1993. 286-288. Http://acl.ldc.upenn.edu/P/P93/P931041.pdf
  • 5. Passonneau RJ, Litman DJ. Intention-Based segmentation: Human reliability and correlation with linguistic cues. In: Proc. of the 31st Meeting of the Association for Computational Linguistics. 1993. 148-155. Http://acl.ldc.upenn.edu/P/P93/P931020.pdf
  • 6. Reynar JC. Topic segmentation: Algorithms and application [Ph.D. Thesis]. Pennsylvania: University of Pennsylvania, 1998.
  • 7. Ponte JM, Croft WB. Text segmentation by topic. In: Peters C, Thanos C, eds. Proc. of the 1st European Conf. on Research and Advanced Technology for Digital Libraries. Berlin, Heidelberg: Springer-Verlag, 1997. 120-129.
  • 8. Reynar JC. Statistical models for topic segmentation. In: Proc. of the 37th Annual Meeting of the Association for Computational Linguistics. 1999. 357-364. Http://acl.ldc.upenn.edu/P/P99/P991046.pdf
  • 9. Kauchak D, Chen F. Feature-Based segmentation of narrative documents. In: Proc. of the 43rd Annual Meeting of the Association for Computational Linguistics. 2005. 32-39. Http://acl.ldc.upenn.edu/W/W05/W05-04.pdf
  • 10. Choi FYY. Advances in domain independent linear text segmentation. In: Proc. of the North American Chapter of the Association for Computational Linguistics Annual Meeting. Seattle: Association for Computational Linguistics. 2000. http://acl.ldc.upenn.edu/A/A00/A002004.pdf

Co-citation literature (32)

Co-cited literature (2)

Citing literature (1)

Secondary citing literature (2)
