Abstract
Removing duplicate web pages is an effective way to improve retrieval quality. This paper presents a web page deduplication algorithm based on feature codes, describes its concrete implementation steps, and implements it with a binary sort tree. The algorithm identifies duplicates with high accuracy and has good application prospects in information retrieval.
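The abstract names the two ingredients of the approach, a per-page feature code and a binary sort tree for lookup, without giving details. The following is a minimal sketch of how they might fit together, not the paper's implementation. The feature-code rule used here (the last few non-space characters before each sentence end) is an assumption made for illustration, and the names `feature_code`, `FeatureCodeTree`, and `deduplicate` are hypothetical.

```python
import re


def feature_code(text, width=3, max_parts=8):
    """Hypothetical feature code (an assumption, not the paper's rule):
    join the last `width` non-space characters of the first `max_parts` sentences."""
    sentences = [s for s in re.split(r"[。.!?!?]", text) if s.strip()]
    parts = ["".join(s.split())[-width:] for s in sentences[:max_parts]]
    return "|".join(parts)


class Node:
    """One binary sort tree node: a feature code plus the page that produced it."""

    def __init__(self, code, url):
        self.code = code
        self.url = url
        self.left = None
        self.right = None


class FeatureCodeTree:
    """Unbalanced binary sort tree keyed by feature code."""

    def __init__(self):
        self.root = None

    def insert_if_new(self, code, url):
        """Insert the code and return None, or return the URL already stored for it."""
        if self.root is None:
            self.root = Node(code, url)
            return None
        node = self.root
        while True:
            if code == node.code:
                return node.url  # same feature code: treat as a duplicate
            side = "left" if code < node.code else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(code, url))
                return None
            node = child


def deduplicate(pages):
    """Split (url, text) pairs into kept URLs and (duplicate, original) URL pairs."""
    tree = FeatureCodeTree()
    kept, duplicates = [], []
    for url, text in pages:
        original = tree.insert_if_new(feature_code(text), url)
        if original is None:
            kept.append(url)
        else:
            duplicates.append((url, original))
    return kept, duplicates


if __name__ == "__main__":
    pages = [
        ("a", "Web search is useful. Duplicates hurt quality."),
        ("b", "Web  search is useful.  Duplicates hurt quality."),
        ("c", "A different page. With other text."),
    ]
    print(deduplicate(pages))
```

In this sketch each page costs one tree descent, so lookup and insertion happen in a single pass. The tree is not balanced, so performance degrades if feature codes arrive in sorted order; a self-balancing tree would avoid that at the cost of more code.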
Source
《微计算机信息》 (Microcomputer Information)
Peking University Core Journal
2006, Issue 03X, pp. 113-115 (3 pages)
Control & Automation
Funding
Guangxi Region Key Science and Technology Project (桂科攻0428002-1)
Keywords
web page deduplication
web page feature code
binary sort tree