
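Technical Report of the Information Retrieval Laboratory, Harbin Institute of Technology, for the 2005 863 Information Retrieval Evaluation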

Technology Report of HIT.IRLab for Evaluation 2005 of 863 Information Retrieval
Abstract: We first retrieve from the body text of all web pages with Lucene, a tool based on the vector space model, and then rerank that result set with Lemur, a tool based on language models. The two result lists are then combined by linear interpolation, and the top 1000 documents of the combined list are returned as the final result set. When formulating queries from topics, keywords are selected from the 〈title〉 and 〈desc〉 fields of each topic and weighted using a modified tf*idf scheme. On the 50 topics of the official evaluation, the three evaluation metrics were: with automatically constructed queries, MAP = 0.3107, P@10 = 0.624, R-Precision = 0.3672; with manually constructed queries, MAP = 0.3538, P@10 = 0.684, R-Precision = 0.4078.
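The two technical steps in the abstract (weighting query keywords by tf*idf, then combining the Lucene and Lemur lists by linear interpolation and keeping the top 1000) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The report says only that the tf*idf scheme is "modified" and that the combination is a linear interpolation. The following are therefore hypothetical choices, and the function names are illustrative:

- the title boost
- the idf formula log(N/df)
- min-max score normalization
- the interpolation weight `lam`

```python
import math
from collections import Counter


def weight_query_terms(title_terms, desc_terms, doc_freq, num_docs, title_boost=2.0):
    """Assign tf*idf-style weights to query keywords (hypothetical scheme).

    tf counts occurrences in the <title> and <desc> fields, with each <title>
    occurrence counted title_boost times; idf = log(num_docs / df).
    """
    tf = Counter()
    for t in title_terms:
        tf[t] += title_boost
    for t in desc_terms:
        tf[t] += 1.0
    weights = {}
    for term, freq in tf.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # term absent from the collection: drop it from the query
        weights[term] = freq * math.log(num_docs / df)
    return weights


def min_max(scores):
    """Rescale a {doc_id: score} map to [0, 1] so two systems become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def fuse_linear(vsm_scores, lm_scores, lam=0.5, top_k=1000):
    """Linearly interpolate two normalized score maps and keep the top_k documents.

    A document missing from one list contributes 0 from that list.
    Ties are broken by doc_id so the ranking is deterministic.
    """
    a, b = min_max(vsm_scores), min_max(lm_scores)
    fused = {d: lam * a.get(d, 0.0) + (1 - lam) * b.get(d, 0.0)
             for d in set(a) | set(b)}
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    vsm = {"d1": 12.0, "d2": 8.0, "d3": 4.0}   # toy vector-space scores
    lm = {"d1": -5.0, "d2": -3.0, "d3": -9.0}  # toy language-model scores
    print(fuse_linear(vsm, lm, lam=0.5, top_k=2))
```

Normalizing before interpolating is a deliberate choice here. The two tools score on different scales (a vector-space similarity versus a language-model score), so adding raw scores would likely let one system dominate the combination.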
Source: Journal of Chinese Information Processing (《中文信息学报》; indexed in CSCD and the Peking University core journal list), 2006, Issue B03, pp. 83-90 (8 pages)
Funding: National Natural Science Foundation of China (grants 60435020, 60575042, 60503072)
Keywords: query formulation; vector space model; language model; result combination