Session Segmentation Based on Query Logs of Web Search (网页搜索引擎查询日志的Session划分研究)

Cited by: 16
Abstract: A session in web search query logs is the consecutive sequence of search actions that a particular user performs over a period of time to satisfy a single information need. Correct session segmentation is an essential foundation for work such as the analysis of user search behavior, yet sessions have not been studied systematically so far. To address the problems in related work, this paper gives a unified definition of the session and carries out exploratory and comparative studies. We conclude that (1) statistical language models are not suitable for session segmentation because of severe data sparseness, and (2) a decision tree method that uses multiple attributes obtains fairly good results: evaluated at the session level, it achieves an F-measure of 78.6%.
Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2009, No. 2, pp. 54-61 (8 pages)
Funding: National Natural Science Foundation of China (60603094); Beijing Natural Science Foundation (4082030); National 863 Program (2006AA010105)
Keywords: computer application; Chinese information processing; web information retrieval; search logs; session segmentation
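
The abstract reports an F-measure "evaluated at the session level" but does not spell out the matching rule. The sketch below is one plausible reading, not the authors' evaluation code: it assumes a predicted session counts as correct only when both its first and last query coincide with a gold session. The function names and the boundary-set representation are hypothetical, chosen for illustration.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (first query index, last query index), inclusive


def sessions_from_boundaries(n_queries: int, boundaries: Set[int]) -> List[Span]:
    """Split queries 0..n_queries-1 into sessions.

    `boundaries` holds every index i > 0 at which a new session starts.
    """
    spans: List[Span] = []
    start = 0
    for i in range(1, n_queries):
        if i in boundaries:
            spans.append((start, i - 1))
            start = i
    if n_queries > 0:
        spans.append((start, n_queries - 1))
    return spans


def session_level_f(gold: List[Span], predicted: List[Span]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 under the exact-match assumption:
    a predicted session is correct only if both ends match a gold session."""
    correct = len(set(gold) & set(predicted))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# One user's log of six queries; gold sessions start at queries 2 and 4,
# while the (hypothetical) segmenter places boundaries at queries 2 and 5.
gold = sessions_from_boundaries(6, {2, 4})
predicted = sessions_from_boundaries(6, {2, 5})
p, r, f = session_level_f(gold, predicted)
```

Under this reading a single misplaced boundary spoils two sessions at once, so session-level scores are stricter than scores computed over individual boundary decisions.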