Abstract
With the rapid growth of information on the Web, how to obtain high-quality information from it has become a hot research topic in many fields, such as databases and information retrieval (IR). Web search engines are the most commonly used tools for information retrieval; however, their current performance is far from satisfactory. We propose a new approach that uses link analysis to cluster the search results returned by a Web search engine. Unlike document clustering algorithms in IR, which are based on keywords or phrases shared between documents, our approach is based on the common links shared by two Web pages, using co-citation and coupling analysis. We also extend the standard K-means clustering algorithm to make it better suited to handling noisy pages, and apply it to Web search results. Preliminary experiments are conducted to investigate its effectiveness. The experimental results show that clustering Web search results via link analysis is promising.
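The link-based idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's similarity measure or its K-means extension, so everything below is an assumption: cosine similarity over link sets, an equal weighting of coupling (shared out-links) and co-citation (shared in-links), and a simplified threshold rule in which a page that is not similar enough to any existing cluster seed starts its own cluster and singleton clusters are reported as noise. The names `cosine`, `link_similarity`, `cluster_by_links`, and `threshold` are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sets of links (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def link_similarity(p, q):
    """Assumed measure: mean of coupling and co-citation similarity."""
    coupling = cosine(p["out"], q["out"])   # shared out-links
    cocitation = cosine(p["in"], q["in"])   # shared in-links
    return 0.5 * (coupling + cocitation)


def cluster_by_links(pages, threshold=0.5):
    """Single-pass, threshold-based grouping (a simplification, not the
    paper's extended K-means). Each cluster is represented by its seed page."""
    groups = []  # list of (seed_page, [member names])
    for name, page in pages.items():
        best, best_sim = None, 0.0
        for seed, members in groups:
            sim = link_similarity(page, seed)
            if sim > best_sim:
                best, best_sim = members, sim
        if best is not None and best_sim >= threshold:
            best.append(name)
        else:
            groups.append((page, [name]))  # page seeds a new cluster
    clusters = [m for _, m in groups if len(m) > 1]
    noise = [m[0] for _, m in groups if len(m) == 1]  # singletons as noise
    return clusters, noise


# Hypothetical search results: "a" and "b" share all links, "c" shares none.
pages = {
    "a": {"out": {"x", "y"}, "in": {"h1"}},
    "b": {"out": {"x", "y"}, "in": {"h1"}},
    "c": {"out": {"z"}, "in": {"h2"}},
}
clusters, noise = cluster_by_links(pages, threshold=0.5)
print(clusters, noise)  # [['a', 'b']] ['c']
```

Treating low-similarity pages as noise rather than forcing them into the nearest cluster is one plausible reading of "more natural to handle noises"; the paper itself may define this differently.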
Source
《计算机工程与应用》
CSCD
Peking University Core Journal (北大核心)
2005, Issue 2, pp. 179-183 (5 pages)
Computer Engineering and Applications
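Funding
Natural Science Foundation of Hunan Province (Grant No. 03092)
Key Scientific Research Project of the Ministry of Education of China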