
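Design of a Heuristic Vertical Search Strategy Based on Web Page Content Evaluation and the Web Graph (cited by: 3)

Published English title: Design of Enlightening Vertical Search Strategy Based on Web Page Content Evaluation and Web Graph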
Abstract: To improve on Authorities and Hubs, the traditional Web-graph-based vertical search strategy, this paper proposes a heuristic vertical search strategy that combines Web page content evaluation with the Web graph. A vector space model is introduced to judge how relevant a page's content is to the topic, which further improves the precision of topical page downloads. Experiments show that the algorithm effectively increases the concentration of on-topic pages among those downloaded. Precision rises gradually as the number of downloaded pages grows and stabilizes once that number reaches a certain level. The algorithm is robust and can be applied in related vertical search engine systems.
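Author: Li Guangli (李广丽)
Source: 《情报理论与实践》 (Information Studies: Theory & Application), CSSCI, Peking University Core Journal, 2009, No. 9, pp. 121-124 (4 pages)
Funding: One of the results supported by a Jiangxi Provincial Department of Education fund project (赣教技字[2006]177号) and the East China Jiaotong University school-level fund (project no. 08xx05)
Keywords: vertical search engine; Web page; content evaluation; Web graph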
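
The abstract names two ingredients, a vector space model for topic relevance and the Authorities and Hubs link analysis, but does not give the formula that merges them. The following is a minimal sketch of how such a combination might look, not the paper's actual method. The function names, the raw term-frequency weighting, and the linear blend controlled by `alpha` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_relevance(page_terms, topic_terms):
    """Cosine similarity between raw term-frequency vectors (vector space model)."""
    page_vec = Counter(page_terms)
    topic_vec = Counter(topic_terms)
    dot = sum(page_vec[t] * topic_vec[t] for t in topic_vec)
    norm_page = math.sqrt(sum(c * c for c in page_vec.values()))
    norm_topic = math.sqrt(sum(c * c for c in topic_vec.values()))
    if norm_page == 0 or norm_topic == 0:
        return 0.0
    return dot / (norm_page * norm_topic)


def hits(links, iterations=20):
    """Plain Authorities-and-Hubs iteration over {page: [pages it links to]}."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                new_auth[q] += hub[p]
        # Hub score of a page: sum of the authority scores of pages it links to.
        new_hub = {p: sum(new_auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize both score vectors to unit Euclidean length.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}
    return auth, hub


def crawl_priority(links, page_terms, topic_terms, alpha=0.5):
    """Hypothetical blend: alpha * authority + (1 - alpha) * content relevance."""
    auth, _ = hits(links)
    return {
        p: alpha * auth[p]
        + (1 - alpha) * cosine_relevance(page_terms.get(p, []), topic_terms)
        for p in auth
    }
```

With a blend like this, a crawler could download the highest-priority pages first, so that a well-linked but off-topic page does not outrank a page whose content matches the topic. The right value of `alpha` would have to be tuned experimentally.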



