
A Document Re-ranking Method Based on Topic Word Pairs (cited by 2)

English title: Re-ranking based on topic word pairs
Abstract: This paper proposes a document re-ranking method based on topic word pairs that improves precision while preserving recall. A topic word pair consists of two closely related words that jointly characterize the same topic: one word comes from the query and the other from the documents. Topic word pairs are selected with Probabilistic Latent Semantic Indexing (PLSI), and documents are then re-ranked according to how these pairs are distributed within them. Experiments on the NTCIR-5 Chinese single-language information retrieval (SLIR) document collection, evaluated with the standard TREC method, show precision improvements of 53.6% on the rigid result set and 55.8% on the relaxed result set, compared with the initial retrieval without any re-ranking or query expansion.
Source: Computer Engineering and Applications, 2007, No. 11, pp. 161-163 (3 pages). Indexed in CSCD and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (Grant Nos. 60442005 and 60673040); National Social Science Fund of China (No. 06BYY029); Key Project of Science and Technology Research of the Ministry of Education (No. 105117).
Keywords: topic word pair; Probabilistic Latent Semantic Indexing (PLSI); document re-ranking
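The abstract names two steps, selecting topic word pairs with PLSI and re-ranking documents by how those pairs are distributed, but does not give the exact scoring formulas. The Python sketch below is one possible reading under stated assumptions, not the authors' implementation. `plsi` fits the standard asymmetric PLSI model, P(w|d) = Σ_z P(z|d) P(w|z), with EM. The pair criterion in `topic_word_pairs` (score(q, w) = Σ_z P(q|z) P(w|z)), the window-based co-occurrence count in `pair_hits`, and the additive score fusion in `rerank` (with its `weight`, `window` and `top_n` parameters) are hypothetical choices made for illustration. Tokens are assumed to be already segmented, since Chinese text has no spaces between words.

```python
import math
import random
from collections import Counter


def _normalize(weights):
    """Scale a dict of non-negative weights so the values sum to 1."""
    total = sum(weights.values())
    if total == 0:
        return {k: 1.0 / len(weights) for k in weights}
    return {k: v / total for k, v in weights.items()}


def plsi(docs, num_topics, iters=50, seed=0):
    """Fit P(w|d) = sum_z P(z|d) P(w|z) with EM.

    docs: list of token lists. Returns (p_w_z, p_z_d) where
    p_w_z[z][w] = P(w|z) and p_z_d[d][z] = P(z|d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    counts = [Counter(doc) for doc in docs]
    topics = range(num_topics)
    p_w_z = [_normalize({w: rng.random() + 1e-3 for w in vocab}) for _ in topics]
    p_z_d = [_normalize({z: rng.random() + 1e-3 for z in topics}) for _ in docs]
    for _ in range(iters):
        acc_w_z = [dict.fromkeys(vocab, 0.0) for _ in topics]
        acc_z_d = [dict.fromkeys(topics, 0.0) for _ in docs]
        for d, cnt in enumerate(counts):
            for w, n in cnt.items():
                # E-step: P(z|d,w) is proportional to P(w|z) P(z|d).
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in topics]
                total = sum(joint)
                if total == 0:
                    continue
                for z in topics:
                    resp = n * joint[z] / total
                    acc_w_z[z][w] += resp
                    acc_z_d[d][z] += resp
        # M-step: re-estimate both distributions from expected counts.
        p_w_z = [_normalize(row) for row in acc_w_z]
        p_z_d = [_normalize(row) for row in acc_z_d]
    return p_w_z, p_z_d


def topic_word_pairs(query, p_w_z, top_n=2):
    """Pair each query word with non-query words that share its topics.

    Hypothetical criterion (not from the paper): score(q, w) = sum_z P(q|z) P(w|z).
    Returns a list of (query_word, doc_word, score).
    """
    pairs = []
    for q in query:
        scores = Counter()
        for row in p_w_z:
            for w, p in row.items():
                if w not in query:
                    scores[w] += row.get(q, 0.0) * p
        pairs.extend((q, w, s) for w, s in scores.most_common(top_n) if s > 0)
    return pairs


def pair_hits(tokens, q, w, window):
    """Count (q, w) occurrences that lie within `window` tokens of each other."""
    pos_q = [i for i, t in enumerate(tokens) if t == q]
    pos_w = [i for i, t in enumerate(tokens) if t == w]
    return sum(1 for i in pos_q for j in pos_w if abs(i - j) <= window)


def rerank(ranked, pairs, weight=1.0, window=5):
    """Re-order a first-pass ranking using topic word pair evidence.

    ranked: list of (doc_id, initial_score, tokens). The additive, log-damped
    bonus is a hypothetical fusion rule, not taken from the paper.
    """
    rescored = []
    for doc_id, score, tokens in ranked:
        bonus = sum(s * math.log1p(pair_hits(tokens, q, w, window)) for q, w, s in pairs)
        rescored.append((doc_id, score + weight * bonus))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = [["retrieval", "query", "ranking", "precision"],
            ["query", "expansion", "retrieval", "recall"],
            ["football", "match", "goal", "team"]]
    p_w_z, _ = plsi(docs, num_topics=2, iters=30)
    pairs = topic_word_pairs(["query"], p_w_z)
    initial = [(f"d{i}", 1.0 - 0.1 * i, doc) for i, doc in enumerate(docs)]
    print(pairs)
    print(rerank(initial, pairs))
```

Because `rerank` only reorders the documents returned by the first-pass search, recall measured over the full retrieved list cannot change; any gain appears as higher precision near the top of the list, which matches the goal stated in the abstract.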