摘要
首先用向量空间模型工具Lucene从全部网页正文信息中检索,再用语言模型工具Lemur对结果集进行重排序,然后将两次的结果进行融合,远回融合结果的前1000篇文档作为最终结果集。构造查询输入时,从主题的〈title〉字段和〈dese〉字段选择关键词,并依据tf*idf的思想对关键词赋予权值。时正式评测的50个主题集检索,获得的三项评价指标为:程序自动构造查询时,MAP=0.3107,P@10=0.624,R-Preeision=0.3672;人工构造查询时,MAP=0.3538,P@10=0.684,R-Preelsion=0.4078。
A rough set of relevant results is returned by Lucene, which based on vector space model, after searching all web pages, and is then reranked by Lemur, a language model based tool, to form a second set of relevant results. These two sets are combined by a linear interpolation into one set afterward and the top 1000 pages in it are returned as final results. When formulating queries from topics, key words of queries.are selected from 〈 title 〉 fields and 〈 desc 〉 fileds of topics, and weights of them are calculated using a modified ff * idf method. In the official evaluation on 50 topics, MAP 0. 3107, P@ 10 0. 624, R-Precision 0. 3672 and MAP 0. 3538, P@ 100. 684, R-Precision 0. 4078 are achieved with queries constructed automatically and artificially respectively.
出处
《中文信息学报》
CSCD
北大核心
2006年第B03期83-90,共8页
Journal of Chinese Information Processing
基金
国家自然科学基金资助项目(60435020,60575042,60503072)
关键词
查询构造
向量空间模型
语言模型
结果融合
query formulation
vector space model
language model
result combination