Abstract
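Classical link analysis methods (such as PageRank and HITS) focus more on the authority of a Web page than on its topic relevance, so topic drift sets in quickly when they are used to guide topic-specific crawling. To address this, the Inherit/Feedback method for Web topic mining is proposed on the basis of a topic-correlation topology model. The basic idea is that, along a crawl path, a node inherits the topic relevance of its parent nodes and feeds its own topic relevance back to them. A topic-specific crawling algorithm based on Inherit/Feedback (IFC) is also proposed. Experimental results show that this method guides topic-specific crawling effectively.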
Classical hyperlink analysis algorithms (such as PageRank and HITS) focus on the authority of a Web page rather than its topic relevance. A crawler guided by these algorithms therefore drifts away from the topic quickly during crawling. This paper presents a new hyperlink analysis method called Inherit/Feedback. The key idea is that a page inherits topic-specific correlation from its ancestors and receives feedback from its descendants. Various applications can be enhanced by the Inherit/Feedback method, such as page ranking and topic-specific crawling. A new topic-specific crawling algorithm based on Inherit/Feedback (IFC) is also proposed. The experiments show that IFC performs quite well in guiding a topic-specific crawling agent and that it can be applied to further discovery and mining of topic-specific websites.
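The inherit and feedback steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption: the function name `inherit_feedback`, the weights `alpha` and `beta`, the use of simple averages, and the requirement that pages are listed in crawl order with parents before children. It is not the IFC algorithm itself, only one way the two passes might look.

```python
from collections import defaultdict

def inherit_feedback(links, local_relevance, alpha=0.5, beta=0.3):
    """Hypothetical two-pass scoring sketch (not the paper's formulas).

    links: dict mapping a parent page to the list of its child pages.
    local_relevance: dict mapping each page to its own topic relevance,
        listed in crawl order (parents before children) -- an assumption.
    alpha, beta: illustrative inherit and feedback weights.
    """
    # Build the reverse map: child -> list of parents.
    parents = defaultdict(list)
    for parent, children in links.items():
        for child in children:
            parents[child].append(parent)

    # Inherit pass: each page adds a share of its parents' average score.
    score = {}
    for page in local_relevance:
        inherited = [score[p] for p in parents[page] if p in score]
        bonus = alpha * sum(inherited) / len(inherited) if inherited else 0.0
        score[page] = local_relevance[page] + bonus

    # Feedback pass: children first, each parent receives a share of
    # its children's average score.
    for page in reversed(list(local_relevance)):
        children = links.get(page, [])
        if children:
            score[page] += beta * sum(score[c] for c in children) / len(children)
    return score

# Toy crawl path: seed -> a -> c is on-topic, seed -> b is off-topic.
links = {"seed": ["a", "b"], "a": ["c"]}
local = {"seed": 1.0, "a": 0.8, "b": 0.0, "c": 0.6}
scores = inherit_feedback(links, local)
```

In this toy graph the off-topic page `b` still inherits some relevance from `seed`, while `a` is raised further by feedback from its on-topic child `c`, which is the kind of signal a crawler could use to prioritize its frontier.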
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journal
2004, No. 5, pp. 807-811 (5 pages)
Journal of Computer Research and Development
Funding
Guangdong Province Key Science and Technology Program (C10201, A1020103)
Keywords
hyperlink analysis
topic-specific crawling
Web mining