
Improvements of the HITS Algorithm Based on the Model of Credibility
Abstract HITS is a classical Web link analysis algorithm whose main weaknesses are topic drift and mutual reinforcement. To address these problems, an improved algorithm, T-HITS, is proposed. A network structure graph maps each set of spam links to its corresponding website, and the spam links are then excluded with the help of link text. Finally, a credibility model is used to correct the results. Experimental data show that the improved algorithm raises the relevance of query results and reduces the occurrence of topic drift.
Source Science Technology and Engineering (《科学技术与工程》), 2009, No. 21, pp. 6390-6394 (5 pages)
Keywords HITS algorithm; credibility model; search engine
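For orientation, the baseline that the abstract sets out to improve can be sketched as follows. This is a minimal illustration of the classical HITS hub/authority iteration only, not of T-HITS: the spam-link filtering and the credibility correction are not shown, since this record gives no detail on them. The function name `hits`, the dictionary-based graph representation, and the fixed iteration count are assumptions made for the example.

```python
import math


def hits(graph, iterations=50):
    """Classical HITS on a dict mapping each page to the pages it links to.

    Hypothetical helper for illustration; returns (hub, authority) score dicts.
    """
    # Collect every page that appears as a link source or a link target.
    pages = set(graph) | {q for targets in graph.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking to q.
        auth = {p: 0.0 for p in pages}
        for p, targets in graph.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub update: sum the new authority scores of the pages p links to.
        hub = {p: sum(auth[q] for q in graph.get(p, ())) for p in pages}
        # Normalize both score vectors to unit Euclidean length.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Toy graph (made up for the example): pages "a" and "b" both link to "c".
hub, auth = hits({"a": ["c"], "b": ["c"], "c": []})
```

In this toy graph all authority weight ends up on "c", while "a" and "b" share the hub weight equally. Because hub and authority scores feed each other in every round, a tightly interlinked group of pages can inflate its own scores, which is the mutual reinforcement problem the abstract refers to.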

