Clustering Web Pages in User-Interest Space (用户兴趣空间的Web页面聚类)

Cited by: 7

Abstract: Based on Web-log mining, this paper proposes an algorithm for clustering Web pages in a user-interest space. The algorithm is built on the user access-frequency matrix A. The rows of A are page vectors (one per URL), the columns are user vectors (one per UserID), and each element is the frequency with which a user accessed a page. Clustering the rows of A groups related pages, and clustering its columns groups users with similar interests. The paper argues that these two clusterings form a pair of dual problems. Based on A and on the duality between the weights used in the two clusterings, it introduces the concept of a user-interest space, an orthogonal space that highlights the interests users have in common. Experimental results show that clustering pages in the user-interest space gives better results than clustering pages directly in A.
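The access-frequency matrix described in the abstract can be illustrated with a minimal sketch. The log records, the function names (`build_access_matrix`, `cosine`, `transpose`), and the use of cosine similarity as the row/column comparison are all assumptions made for illustration; the abstract does not specify the paper's similarity measure or how the user-interest space itself is constructed, so the sketch covers only the baseline of comparing rows and columns of A directly.

```python
from collections import Counter
from math import sqrt

# Hypothetical log records: one (user_id, url) pair per page request.
log = [
    ("u1", "/sports"), ("u1", "/sports"), ("u1", "/scores"),
    ("u2", "/sports"), ("u2", "/scores"),
    ("u3", "/recipes"), ("u3", "/recipes"), ("u3", "/baking"),
]

def build_access_matrix(records):
    """Return (pages, users, A); A[i][j] counts visits by users[j] to pages[i]."""
    counts = Counter((url, user) for user, url in records)
    pages = sorted({url for _, url in records})
    users = sorted({user for user, _ in records})
    A = [[counts[(p, u)] for u in users] for p in pages]
    return pages, users, A

def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def transpose(M):
    """Swap rows and columns so that user vectors become rows."""
    return [list(col) for col in zip(*M)]

pages, users, A = build_access_matrix(log)
At = transpose(A)

# Row comparison: pages visited by the same users look alike.
page_sim = cosine(A[pages.index("/sports")], A[pages.index("/scores")])
page_dissim = cosine(A[pages.index("/sports")], A[pages.index("/recipes")])

# Column comparison: users who visit the same pages look alike.
user_sim = cosine(At[users.index("u1")], At[users.index("u2")])
user_dissim = cosine(At[users.index("u1")], At[users.index("u3")])
```

Any standard clustering method could then be run over these pairwise similarities. The same matrix serves both problems, with rows compared for page clustering and columns for user clustering, which is the symmetry the abstract refers to as duality.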

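Author: Guo Yan (郭岩)

Affiliation: Institute of Computing Technology, Chinese Academy of Sciences

Source: Microelectronics & Computer (《微电子学与计算机》), indexed in CSCD and the Peking University Core Journals list, 2003, No. 8, pp. 10-14, 68 (6 pages)

Funding: Supported by the Frontier Youth Fund of the Institute of Computing Technology, Chinese Academy of Sciences (20026180-24).

Keywords: Web page clustering; user-interest space; log mining; data mining; Internet; Web-log mining; clustering; duality (duplex)

Classification: TP393.4 [Automation and Computer Technology: Computer Application Technology]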


