Abstract
In response to the growing demand for Web data mining, this paper proposes a new Web mining method based on support vector machines and clustering. It starts from the observation that support vectors do not appear in the correctly classified regions outside the margin between the two classes of samples. By introducing the clustering concepts of class centroid, class radius, and distance between class centroids, the method removes non-support vectors quickly and accurately while preserving the generalization ability of the algorithm. Experiments show that the improved algorithm prunes the training samples quickly and precisely and also handles the generalization problem well.
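The abstract does not give the exact pruning rule, so the following is only a minimal sketch of how class centroids, class radii, and the centroid distance might be combined to discard samples that are unlikely to be support vectors. The function names, the `slack` parameter, and the rule itself (keep a sample when its distance to the opposite class centroid is at most d + slack * r) are assumptions made for illustration, not the authors' method.

```python
import math


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def class_radius(points, center):
    """Largest distance from the class centroid to any sample of the class."""
    return max(math.dist(p, center) for p in points)


def prune_non_support_candidates(samples, labels, slack=0.2):
    """Return one keep/drop flag per sample for a two-class problem.

    ASSUMPTION (not taken from the paper): a sample is kept when its
    distance to the opposite class centroid is at most d + slack * r,
    where d is the distance between the two class centroids and r is
    the radius of the sample's own class. Samples on the far side of
    their own class are dropped as unlikely support vectors.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")

    by_class = {c: [s for s, l in zip(samples, labels) if l == c] for c in classes}
    centers = {c: centroid(pts) for c, pts in by_class.items()}
    radii = {c: class_radius(by_class[c], centers[c]) for c in classes}
    d = math.dist(centers[classes[0]], centers[classes[1]])
    other = {classes[0]: classes[1], classes[1]: classes[0]}

    keep = []
    for s, l in zip(samples, labels):
        threshold = d + slack * radii[l]
        keep.append(math.dist(s, centers[other[l]]) <= threshold)
    return keep
```

The retained samples would then be passed to whatever SVM trainer is in use. A larger `slack` keeps more samples, which trades training speed for a lower risk of discarding a true support vector.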
Source
Computer and Modernization (《计算机与现代化》)
2009, No. 12, pp. 33-35, 163 (4 pages in total)
Keywords
Web mining
support vector machine
clustering