Abstract
Search engines currently return so many candidate results that users cannot accurately locate the ones relevant to their topic. To address this problem, this paper proposes a method for clustering search engine results based on hyperlink information. By mining both the hyperlink anchor documents of web pages and the content of the pages themselves, the method groups the pages into distinct subcategories. Because it clusters on page content while also making full use of Web structure and hyperlink information, it reflects the content characteristics of website documents better than traditional structure mining methods, and thereby improves clustering accuracy.
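The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, assuming one plausible reading: each result page is represented by a term vector built from its own content plus the anchor text of links pointing to it, and pages are grouped by cosine similarity. The function names, the `anchor_weight` of 2.0, the similarity `threshold` of 0.3, and the single-pass clustering scheme are all illustrative assumptions, not details taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def page_vector(content, anchor_texts, anchor_weight=2.0):
    """Build a term vector from page content plus weighted in-link anchor text.

    anchor_weight is an assumed value; the paper's weighting is not known.
    """
    vec = Counter(tokenize(content))
    for anchor in anchor_texts:
        for term in tokenize(anchor):
            vec[term] += anchor_weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_results(pages, threshold=0.3):
    """Single-pass clustering (an assumed stand-in for the paper's method).

    pages is a list of (url, content, anchor_texts) tuples. Each page joins
    the most similar existing cluster if the similarity reaches the
    threshold, and otherwise starts a new cluster.
    """
    clusters = []  # each entry: {"centroid": Counter, "members": [url, ...]}
    for url, content, anchors in pages:
        vec = page_vector(content, anchors)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [url]})
        else:
            best["centroid"].update(vec)  # accumulate term weights
            best["members"].append(url)
    return [cluster["members"] for cluster in clusters]
```

In this sketch the anchor text acts as an external description of a page, so two pages with little overlapping body text can still land in the same cluster when the links pointing to them use similar wording.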
Source
Computer Development & Applications (《电脑开发与应用》)
2007, Issue 5, pp. 16-17, 20 (3 pages)
Keywords
search engines, hyperlink, structure mining, clustering