
Clustering Web Pages in User-Interest Space (cited by: 7)
Abstract: Based on Web log mining, this paper proposes an algorithm for clustering Web pages in a user-interest space. The algorithm starts from the user access-frequency matrix A, whose rows are page vectors (one per URL), whose columns are user vectors (one per UserID), and whose elements are the frequencies with which each user accessed each page. Clustering the rows of A groups related pages, and clustering the columns of A groups users with similar interests. The paper argues that these two clusterings are a pair of dual problems. Building on A and on the duality between the weights of the two clusterings, it introduces the concept of a user-interest space, an orthogonal space that emphasizes the interests users have in common. Experimental results show that clustering pages in the user-interest space gives better results than clustering pages directly in A.
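Author: 郭岩 (Guo Yan)
Source: 《微电子学与计算机》 (Microelectronics & Computer), CSCD, Peking University Core Journal, 2003, No. 8, pp. 10-14, 68 (6 pages)
Funding: Supported by the Frontier Youth Fund of the Institute of Computing Technology, Chinese Academy of Sciences (20026180-24).
Keywords: Web pages; clustering; user-interest space; Web log mining; data mining; Internet; duality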
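
The access-frequency matrix described in the abstract, and the baseline of clustering directly in A, can be sketched roughly as follows. This is a minimal illustration and not the paper's algorithm: the log records, the cosine similarity measure, the greedy grouping rule, and the 0.5 threshold are all assumptions made for the example. The abstract does not say how the user-interest space itself is constructed, so that step is left out.

```python
from collections import Counter
from math import sqrt

# Hypothetical log records: one (user_id, url) pair per page request.
LOG = [
    ("u1", "/news"), ("u1", "/news"), ("u1", "/sports"),
    ("u2", "/news"), ("u2", "/sports"),
    ("u3", "/music"), ("u3", "/music"), ("u3", "/video"),
]


def build_matrix(log):
    """Return (pages, users, A), where A[i][j] counts visits of users[j] to pages[i]."""
    counts = Counter((url, user) for user, url in log)
    pages = sorted({url for _, url in log})
    users = sorted({user for user, _ in log})
    A = [[counts[(p, u)] for u in users] for p in pages]
    return pages, users, A


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = sqrt(sum(a * a for a in x))
    ny = sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def transpose(A):
    """Swap rows and columns so that user vectors become rows."""
    return [list(col) for col in zip(*A)]


def group(vectors, labels, threshold=0.5):
    """Greedy one-pass grouping: a vector joins the first cluster whose
    seed vector is at least `threshold` similar, else it starts a new cluster.
    This is an assumed stand-in for whatever clustering method the paper uses."""
    clusters = []  # list of (seed_vector, member_labels)
    for vec, label in zip(vectors, labels):
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(label)
                break
        else:
            clusters.append((vec, [label]))
    return [members for _, members in clusters]


pages, users, A = build_matrix(LOG)
page_clusters = group(A, pages)             # row clustering: related pages
user_clusters = group(transpose(A), users)  # column clustering: similar users
print(page_clusters)
print(user_clusters)
```

Row clustering and column clustering run on the same matrix and differ only by a transpose, which is the symmetry behind the duality the abstract refers to.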

References (8)

  • 1 Song Qinbao, Shen Junyi. An efficient and multi-purpose algorithm for mining Web logs [J]. Journal of Computer Research and Development, 2001, 38(3): 328-333. Cited by: 115
  • 2 Su Zhong, Ma Shaoping, Yang Qiang, Zhang Hongjiang. Web document clustering based on Web-Log Mining [J]. Journal of Software, 2002, 13(1): 99-104. Cited by: 29
  • 3 Hou Zixin et al. Linear Algebra and Its Applications [M]. Nankai University Press, 1990. 373, 325-326.
  • 4 Bu Dongbo, Bai Shuo, Li Guojie. A duality strategy for weight computation in text clustering [J]. Journal of Software, 2002, 13(11): 2083-2089. Cited by: 20
  • 5 Huang Song, Liu Xiaoming, Song Zilin. Clustering Web users based on generalized sessions [J]. Journal of Computer Research and Development, 2001, 38(10): 1224-1228. Cited by: 8
  • 6 Zhong Su, Qiang Yang, Hongjiang Zhang, Xiaowei Xu, Yuhen Hu. Correlation-based Document Clustering using Web Logs. http://ifsc.ualr.edu/xwxu/publications/hicssclus-ter.pdf.
  • 7 Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. http://www.cs.cornell.edu/home/kleinber/auth.pdf.
  • 8 Robert Cooley, Bamshad Mobasher, Jaideep Srivastava. Data preparation for mining World Wide Web browsing patterns. Knowledge and Information Systems, Vol. 1, No. 1, 1999. http://maya.cs.depaul.edu/~mobasher/papers/webminer-kais.ps.


