Abstract
Web logs contain a large amount of user browsing information, and clustering similar users and related pages from these logs is a prerequisite for building adaptive websites. Basic preprocessing performs data cleaning, user identification, session identification and data reduction on the logs, producing a sequence database of the pages each user visited. A discretization technique is also used to compute how frequently each user visits each page. On this basis, a user-page association matrix is constructed and used as the input to an improved fuzzy C-means (FCM) clustering algorithm, which clusters similar users and related pages. Experimental results demonstrate the effectiveness of the improved FCM algorithm.
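The pipeline summarized above (cleaning, user and session identification, a user-page matrix, then fuzzy C-means) can be illustrated with a minimal sketch. The abstract does not specify the log format, the session rule, the discretization step, or what the "improved" FCM changes, so everything below is an assumption: log records are hypothetical `(client_ip, timestamp, url)` tuples, users are identified by IP alone, sessions split on an assumed 30-minute gap, matrix entries are normalized visit counts rather than the paper's discretized frequencies, and the clustering step is the standard (unimproved) FCM with fuzzifier `m = 2`. All function and constant names are hypothetical.

```python
import math
import random
from collections import defaultdict

# Hypothetical log record format: (client_ip, unix_timestamp_seconds, requested_url).
SAMPLE_LOG = [
    ("10.0.0.1", 0, "/sports/a.html"),
    ("10.0.0.1", 60, "/sports/b.html"),
    ("10.0.0.1", 90, "/logo.gif"),         # embedded image, removed by cleaning
    ("10.0.0.2", 10, "/sports/a.html"),
    ("10.0.0.2", 4000, "/sports/b.html"),  # gap > 30 min, starts a new session
    ("10.0.0.3", 5, "/news/x.html"),
    ("10.0.0.3", 50, "/news/y.html"),
    ("10.0.0.4", 20, "/news/x.html"),
    ("10.0.0.4", 80, "/news/y.html"),
]
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")  # assumed filter list
SESSION_TIMEOUT = 30 * 60  # assumed inactivity threshold in seconds


def clean(records):
    """Data cleaning: drop requests for embedded resources (images, stylesheets, scripts)."""
    return [r for r in records if not r[2].lower().endswith(STATIC_SUFFIXES)]


def sessionize(records):
    """Identify users by IP (assumption), then split their requests into timeout sessions."""
    by_user = defaultdict(list)
    for ip, ts, url in records:
        by_user[ip].append((ts, url))
    sessions = {}
    for ip, clicks in by_user.items():
        clicks.sort()
        seqs, current, last_ts = [], [], None
        for ts, url in clicks:
            if last_ts is not None and ts - last_ts > SESSION_TIMEOUT:
                seqs.append(current)
                current = []
            current.append(url)
            last_ts = ts
        seqs.append(current)
        sessions[ip] = seqs
    return sessions


def user_page_matrix(sessions):
    """Rows are users, columns are pages; each entry is that page's share of the user's views."""
    users = sorted(sessions)
    pages = sorted({p for seqs in sessions.values() for s in seqs for p in s})
    col = {p: j for j, p in enumerate(pages)}
    matrix = []
    for user in users:
        row = [0.0] * len(pages)
        for seq in sessions[user]:
            for page in seq:
                row[col[page]] += 1.0
        total = sum(row)
        matrix.append([x / total for x in row])
    return users, pages, matrix


def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Standard fuzzy c-means. Returns memberships U (n x c) and centres V (c x dim)."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    U = []
    for _ in range(n):  # random initial memberships, each row summing to 1
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([x / s for x in row])
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centre update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m
        V = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            sw = sum(w)
            V.append([sum(w[i] * X[i][d] for i in range(n)) / sw for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        new_U = []
        for i in range(n):
            dist = [math.dist(X[i], V[j]) for j in range(c)]
            zeros = [j for j in range(c) if dist[j] == 0.0]
            if zeros:  # point sits exactly on a centre: assign it only to that centre (or centres)
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:  # same formula rescaled by the smallest distance, so no ratio exceeds 1
                d_min = min(dist)
                w = [(d_min / d) ** p for d in dist]
                total = sum(w)
                row = [x / total for x in w]
            new_U.append(row)
        shift = max(abs(new_U[i][j] - U[i][j]) for i in range(n) for j in range(c))
        U = new_U
        if shift < eps:
            break
    return U, V


if __name__ == "__main__":
    users, pages, matrix = user_page_matrix(sessionize(clean(SAMPLE_LOG)))
    U, _ = fcm(matrix, c=2)                               # user clustering on the rows
    P, _ = fcm([list(col) for col in zip(*matrix)], c=2)  # page clustering on the columns
    for user, row in zip(users, U):
        print(user, [round(u, 3) for u in row])
    for page, row in zip(pages, P):
        print(page, [round(u, 3) for u in row])
```

Clustering users and clustering pages reuse the same matrix: rows describe users by the pages they view, and the transposed columns describe pages by the users who view them. The membership update divides by the smallest distance before exponentiating; this is algebraically identical to the textbook formula but avoids floating-point overflow once a centre nearly coincides with a point.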
Source
Computer Technology and Development (计算机技术与发展)
2008, No. 6, pp. 32-35 (4 pages)
Keywords
fuzzy c-means clustering
Web log preprocessing
association matrix
user clustering
page clustering