Abstract
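Based on log mining, this article proposes an algorithm for clustering Web pages in a user-interest space. The algorithm builds on the user access-frequency matrix A. The rows of A correspond to page vectors and the columns to user vectors; each element of A is a user's access frequency for a page. Clustering the rows of A groups related pages, and clustering the columns of A groups users with similar interests. The article regards these two clusterings in A as a pair of dual problems. Based on A and the duality between the weights of the two clusterings, it introduces the concept of a user-interest space. The user-interest space highlights the interests that users have in common and is an orthogonal space. Experimental results show that page clustering in the user-interest space performs better than page clustering done directly in A.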
This paper presents an algorithm for clustering Web pages in a user-interest space. The algorithm is based on the users' access-frequency matrix, in which rows correspond to URLs, columns correspond to user IDs, and each element is a user's access frequency for a page. Clustering the row vectors discovers relevant Web pages, and clustering the column vectors finds users with similar interests. The paper argues that row clustering and column clustering form a pair of dual problems, and it proposes the concept of a "user-interest space" based on the matrix and this duality. Experimental results show that clustering Web pages in the user-interest space gives better results than clustering them directly in the users' access-frequency matrix.
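The matrix described in the abstract can be illustrated with a small sketch. It assumes log records are available as (user ID, URL) pairs, uses raw visit counts as the access frequency, and uses cosine similarity as the closeness measure that a clustering step would consume. None of these choices is stated in the abstract; the function names and the sample log are hypothetical, and the construction of the user-interest space itself is not shown because the abstract does not specify it.

```python
from collections import defaultdict
from math import sqrt


def build_access_matrix(log_records):
    """Build A with one row per page (URL) and one column per user.

    A[i][j] is the number of times users[j] visited pages[i].
    Raw counts stand in for "access frequency" (an assumption).
    """
    counts = defaultdict(int)
    for user_id, url in log_records:
        counts[(url, user_id)] += 1
    pages = sorted({url for url, _ in counts})
    users = sorted({user_id for _, user_id in counts})
    matrix = [[counts.get((p, u), 0) for u in users] for p in pages]
    return pages, users, matrix


def cosine(x, y):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def transpose(matrix):
    """Swap rows and columns so user vectors become rows."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical log: (user ID, URL) pairs.
LOG = [
    ("u1", "/a"), ("u1", "/a"), ("u1", "/b"),
    ("u2", "/a"), ("u2", "/b"),
    ("u3", "/c"),
]

pages, users, A = build_access_matrix(LOG)

# Row-to-row similarity relates pages; column-to-column similarity relates users.
page_similarity = [[cosine(r1, r2) for r2 in A] for r1 in A]
user_vectors = transpose(A)
user_similarity = [[cosine(c1, c2) for c2 in user_vectors] for c1 in user_vectors]
```

Here the same matrix feeds both views: comparing rows relates pages, and comparing columns relates users, which is the row/column symmetry the abstract refers to as a dual pair of problems. Any standard clustering method could then be run on either similarity table.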
Source
《微电子学与计算机》
CSCD
PKU Core Journal (Peking University core journal list)
2003, Issue 8, pp. 10-14, 68 (6 pages in total)
Microelectronics & Computer
Funding
Supported by the Frontier Youth Fund of the Institute of Computing Technology, Chinese Academy of Sciences (20026180-24).