Technical Report of HIT.IRLab for the 2005 863 Information Retrieval Evaluation

Abstract: An initial result set is retrieved from the body text of all web pages with Lucene, a tool based on the vector space model. This set is then reranked with Lemur, a language-model-based tool, to form a second result set. The two result sets are combined by linear interpolation, and the top 1000 documents of the combined set are returned as the final results. When queries are formulated from topics, keywords are selected from the 〈title〉 and 〈desc〉 fields of each topic and weighted using a modified tf*idf method. In the official evaluation on 50 topics, automatically constructed queries achieved MAP = 0.3107, P@10 = 0.624, and R-Precision = 0.3672; manually constructed queries achieved MAP = 0.3538, P@10 = 0.684, and R-Precision = 0.4078.
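The weighting and fusion steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how tf*idf was modified, how scores were normalized, or which interpolation weight was used. The function names (`tfidf_weights`, `min_max`, `fuse`), the smoothed idf formula, the min-max normalization, and the default weight `lam=0.5` are all assumptions made for illustration, and the score maps stand in for the output of the Lucene and Lemur passes.

```python
import math
from collections import Counter


def tfidf_weights(keywords, doc_freq, num_docs):
    """Weight each query keyword by tf * idf (plain variant; the paper's
    modification is not described in the abstract)."""
    tf = Counter(keywords)
    weights = {}
    for term, count in tf.items():
        # Assumed smoothed idf; terms missing from doc_freq get df = 0.
        idf = math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))
        weights[term] = count * idf
    return weights


def min_max(scores):
    """Rescale a {doc_id: score} map to [0, 1] so two systems are comparable.
    (Assumed normalization; the abstract does not specify one.)"""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(first_pass, reranked, lam=0.5, top_k=1000):
    """Linearly interpolate two normalized score maps and keep the top_k docs.
    lam is a hypothetical interpolation weight for the first-pass scores."""
    a, b = min_max(first_pass), min_max(reranked)
    fused = {doc: lam * a.get(doc, 0.0) + (1 - lam) * b.get(doc, 0.0)
             for doc in set(a) | set(b)}
    # Sort by fused score (descending); ties are broken by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

For example, `fuse({"d1": 3.0, "d2": 1.0, "d3": 2.0}, {"d1": -5.0, "d2": -4.0, "d3": -6.0}, top_k=2)` ranks `d1` ahead of `d2`: the two score scales differ (similarity scores versus log-likelihood-style scores), which is why this sketch normalizes each list before interpolating.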

Authors: 张志昌, 张宇, 高立琦, 袁新成, 胡晓光, 刘挺, 李生

Affiliation: Information Retrieval Laboratory, Harbin Institute of Technology

Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2006, Issue B03, pp. 83-90 (8 pages)

Funding: National Natural Science Foundation of China (60435020, 60575042, 60503072)

Keywords: query formulation; vector space model; language model; result combination

Classification: TP391 [Automation and Computer Technology - Computer Application Technology]

References (6)

1. Outline of the 2005 863 Information Retrieval Evaluation [EB]. http://www.863data.org.cn (2005).
2. Apache Lucene [CP]. http://lucene.apache.org/java/docs/index.html (2005).
3. Lemur project [CP]. http://www.lemurproject.org/.
4. C. Zhai and J. Lafferty. Model-based feedback in the KL-divergence retrieval model [C]. In: Tenth International Conference on Information and Knowledge Management (CIKM 2001). 2001, 403-410.
5. C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval [C]. In: Proceedings of SIGIR'01, Sep. 2001, 334-342.
6. 张敏, 高剑峰, 马少平. Web information retrieval based on link description text and its context [J]. Journal of Computer Research and Development (计算机研究与发展), 2004, 41(1): 221-226. Cited by: 22.

