Reviews of Relevance Algorithm in Focused Crawler
(Chinese title: 主题爬虫相关度算法研究综述)

Cited by: 6
Abstract: This paper first describes the goal of relevance algorithms in focused crawlers and what relevance computation means. Then, taking an evolutionary view of information processing and following how information feature terms are handled, it systematically analyzes current relevance computation methods for focused crawlers at three levels (the character layer, the language layer, and the semantic layer) and compares the advantages and disadvantages of algorithms across these levels. Finally, it summarizes existing research results and indicates directions for future work.
Source: Computer and Modernization (《计算机与现代化》), 2013, No. 4, pp. 27-30, 39 (5 pages in total)
Funding: Special Fund for Basic Scientific Research Operating Expenses of Public-Welfare Research Institutes (2012-J-06)
Keywords: relevance; algorithm; focused crawler; concept