Abstract
HITS is a classic Web link analysis algorithm whose main weaknesses are topic drift and mutual reinforcement. To address these problems, an improved algorithm, T-HITS, is proposed. It uses a network structure graph to map sets of spam links to their corresponding websites and excludes the spam links with the help of link text. Finally, a credibility model is used to correct the results. Experimental data show that the improved algorithm increases the relevance of query results and reduces the occurrence of topic drift.
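For orientation, the baseline HITS iteration that T-HITS builds on can be sketched as follows. This is a minimal illustration of standard HITS only, not the T-HITS method: the abstract does not specify the spam-link mapping or the credibility model, so neither is modeled here. The function name `hits`, the edge-list input, and the example graph are assumptions made for illustration.

```python
import math

def hits(edges, iterations=50):
    """Baseline HITS on a directed edge list of (source, target) pairs.

    Returns (hub, authority) dicts of L2-normalized scores.
    Hypothetical helper for illustration; not the paper's T-HITS.
    """
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum the new authority scores of pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        # Normalize both vectors so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in new_auth.items()}
        hub = {n: v / h_norm for n, v in new_hub.items()}
    return hub, auth

# Assumed toy graph: pages "a" and "b" both link to "c"; "a" also links to "d".
edges = [("a", "c"), ("b", "c"), ("a", "d")]
hub, auth = hits(edges)
best_authority = max(auth, key=auth.get)
best_hub = max(hub, key=hub.get)
```

In this sketch every link contributes equally to the scores, which is one way to see why spam links can inflate a page's authority and why filtering them before the iteration, as the abstract describes, could matter.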
Source
Science Technology and Engineering (《科学技术与工程》)
2009, No. 21, pp. 6390-6394 (5 pages)
Keywords
HITS algorithm
credibility model
search engine