Analyzing Scale of Web Logs and Mining Users' Interests (cited by: 62)
Abstract: This paper asks whether Web logs contain regular patterns of user access and, if so, how those patterns can be described and exploited. Using real Web logs, it analyzes how the number of users, the number of Web documents, and the average number of Web documents accessed per user change as the log grows. From an analysis of users' motivations for accessing the Web, it concludes that access is driven more by stable interests than by casual ones, so the access logs for a given period contain users' stable interests. To exploit these interests, the paper proposes a user-behavior-based model for retrieving related documents and a search engine, SISI (Similar Interests, Similar access on Internet), which mines related pages from the latent human judgments of relatedness contained in Web logs. SISI's measured retrieval performance is consistent with the analysis of the model: retrieval accuracy and retrieval time depend mainly on the number of users, and the number of records returned depends mainly on the number of documents.
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list, 2005, No. 9, pp. 1483-1496 (14 pages)
Funding: Supported by the Frontier Youth Fund of the Institute of Computing Technology, Chinese Academy of Sciences (2002618024)
Keywords: Web-log mining; scale of Web logs; interest; user behavior
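The three quantities the abstract tracks against log scale (number of users, number of Web documents, and average number of documents accessed per user) can be illustrated with a minimal Python sketch. It is not the paper's method: the `(user_id, url)` record format, the function name `scale_statistics`, and the idea of sampling at fixed record counts are all assumptions made for illustration, and a real log would first need user identification and cleaning.

```python
from collections import defaultdict


def scale_statistics(records, checkpoints):
    """Report user/document counts as a log grows.

    records:     iterable of (user_id, url) pairs in log order (assumed format).
    checkpoints: record counts at which to take a measurement (hypothetical).
    Returns one dict per checkpoint reached, in log order.
    """
    wanted = set(checkpoints)
    docs_by_user = defaultdict(set)  # user_id -> distinct URLs that user accessed
    all_docs = set()                 # distinct URLs seen so far
    stats = []

    for n, (user, url) in enumerate(records, start=1):
        docs_by_user[user].add(url)
        all_docs.add(url)
        if n in wanted:
            # Total distinct (user, document) pairs, averaged over users.
            pairs = sum(len(docs) for docs in docs_by_user.values())
            stats.append({
                "records": n,
                "users": len(docs_by_user),
                "documents": len(all_docs),
                "docs_per_user": pairs / len(docs_by_user),
            })
    return stats


if __name__ == "__main__":
    # Tiny made-up log, for demonstration only.
    sample_log = [
        ("u1", "/a"), ("u2", "/a"), ("u1", "/b"),
        ("u1", "/a"), ("u3", "/c"), ("u2", "/b"),
    ]
    for row in scale_statistics(sample_log, [2, 4, 6]):
        print(row)
```

Plotting each column against `records` for a real log would show how the three quantities change with log scale, which is the kind of relationship the abstract describes analyzing.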