Abstract
With the rapid development of Internet technology, acquiring the needed information quickly and effectively from the vast amount of information on the Web has become an important problem, and data mining is an effective way to address it. Within data mining, clustering is an important method for discovering how data are distributed. This paper first presents the relevant theory of Web mining, then discusses the hierarchical clustering method used in Web mining in some detail, and finally applies this algorithm, combined with an improved feature weight calculation method and a text similarity calculation method, to build a training text database.
Source
《网络安全技术与应用》
2010, No. 7, pp. 61-62 (2 pages)
Network Security Technology & Application
Keywords
data mining
Web text mining
hierarchical clustering algorithm