Abstract
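This paper asks whether Web logs contain regular characteristics of users' Web access and how such characteristics can be exploited. It studies how the number of users, the number of Web documents, and the number of Web documents accessed per user vary with log size. From an analysis of users' motivations for accessing the Web, it concludes that Web access logs covering a given period of time contain users' stable interests. Building on the stable interests contained in the logs, it proposes a user-behavior-based model for retrieving related documents and a search engine system, SISI. The measured retrieval performance of SISI agrees with the conclusions drawn from analyzing the retrieval model: retrieval accuracy and retrieval time depend mainly on the number of users, while the number of records returned depends mainly on the number of documents.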
This paper focuses on Web-log mining. Do Web logs really contain characteristics of user access? If so, can these characteristics be described clearly, and how can they be used? To answer these questions, the paper analyzes real Web logs. First, it analyzes how the number of users, the number of Web documents, and the average number of Web documents accessed per user change as the scale of the Web logs increases. It concludes that users' Web access is driven more by stable interests than by casual ones, and that users' stable interests must therefore be contained in Web logs. To make use of these stable interests, the paper presents a model and a search engine, SISI (Similar Interests, Similar access on Internet), which mines related pages by exploiting the latent human judgments about page relatedness contained in Web logs. The performance of SISI is consistent with the analysis of the model: retrieval accuracy and time cost depend mainly on the number of users, and the number of result records depends mainly on the number of Web documents.
Source
《计算机学报》
EI
CSCD
PKU Core Journal (北大核心)
2005, No. 9, pp. 1483-1496 (14 pages)
Chinese Journal of Computers
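Funding
Supported by the Youth Fund for Frontier Fields of the Institute of Computing Technology, Chinese Academy of Sciences (No. 2002618024)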