1 引言 World Wide Web是目前全球最大的信息系统,在WWW上查询Web文档主要依赖于Internet上的索引信息系统,如Yahoo、Infoseek、AltaVista、WebCrawler、Excite、Lycos等等。由于WWW太大又没有良好的结构且Web服务器的自治性,所以Web文...1 引言 World Wide Web是目前全球最大的信息系统,在WWW上查询Web文档主要依赖于Internet上的索引信息系统,如Yahoo、Infoseek、AltaVista、WebCrawler、Excite、Lycos等等。由于WWW太大又没有良好的结构且Web服务器的自治性,所以Web文档的查询难以做到全面而精确。衡量Web文档查询的质量主要有两个方面:①是否能把所有相关的文档资源找出来,不要有所遗漏。展开更多
Recently,we designed a new experimental system MSearch,which is a cross-media meta-search system built on the database of the WikipediaMM task of ImageCLEF 2008.For a meta-search engine,the kernel problem is how to me...Recently,we designed a new experimental system MSearch,which is a cross-media meta-search system built on the database of the WikipediaMM task of ImageCLEF 2008.For a meta-search engine,the kernel problem is how to merge the results from multiple member search engines and provide a more effective rank list.This paper deals with a novel fusion model employing supervised learning.Our fusion model employs ranking SVM in training the fusion weight for each member search engine. We assume the fusion weight of each member search engine as a feature of a result document returned by the meta-search engine. For a returned result document,we first build a feature vector to represent the document,and set the value of each feature as the document's score returned by the corresponding member search engine.Then we construct a training set from the documents returned from the meta-search engine to learn the fusion parameter.Finally,we use the linear fusion model based on the overlap set to merge the results set.Experimental results show that our approach significantly improves the performance of the cross-media meta-search(MSearch) and outperforms many of the existing fusion methods.展开更多
文摘1 引言 World Wide Web是目前全球最大的信息系统,在WWW上查询Web文档主要依赖于Internet上的索引信息系统,如Yahoo、Infoseek、AltaVista、WebCrawler、Excite、Lycos等等。由于WWW太大又没有良好的结构且Web服务器的自治性,所以Web文档的查询难以做到全面而精确。衡量Web文档查询的质量主要有两个方面:①是否能把所有相关的文档资源找出来,不要有所遗漏。
基金supported by the National Natural Science Foundation of China(No.60605020)the National High-Tech R&D Program (863) of China(Nos.2006AA01Z320 and 2006AA010105)
文摘Recently,we designed a new experimental system MSearch,which is a cross-media meta-search system built on the database of the WikipediaMM task of ImageCLEF 2008.For a meta-search engine,the kernel problem is how to merge the results from multiple member search engines and provide a more effective rank list.This paper deals with a novel fusion model employing supervised learning.Our fusion model employs ranking SVM in training the fusion weight for each member search engine. We assume the fusion weight of each member search engine as a feature of a result document returned by the meta-search engine. For a returned result document,we first build a feature vector to represent the document,and set the value of each feature as the document's score returned by the corresponding member search engine.Then we construct a training set from the documents returned from the meta-search engine to learn the fusion parameter.Finally,we use the linear fusion model based on the overlap set to merge the results set.Experimental results show that our approach significantly improves the performance of the cross-media meta-search(MSearch) and outperforms many of the existing fusion methods.