
Query clustering using user-query logs (cited by: 4)

Original title: 基于用户查询日志的查询聚类
Abstract: A new query clustering algorithm based on user-query logs is presented. Query clustering has usually relied on query impression logs and click-through logs, which are sparse and therefore tend to produce small clusters. User-query logs are large and much denser, so small clusters are less of a problem, but they are also noisier and harder to process. To discover similar queries while limiting the influence of noise, queries issued by the same user within the same session (co-occurring queries) are assumed to have a high probability of being similar. Under this assumption, the query co-occurrence relations are used to build a neighbor-query vector space: each query is represented by a vector of its neighbor queries, and the similarity between neighbor-query vectors serves as the query similarity for clustering. The clusters are generated with a modified version of the density-based algorithm DBSCAN (density-based spatial clustering of applications with noise). In experiments on a real dataset of 95,262 queries, the algorithm achieves 79.77% precision and 48.21% recall, with an average cluster size of 51.
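Source: Journal of Beijing University of Aeronautics and Astronautics (《北京航空航天大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2010, No. 4, pp. 500-503 (4 pages)
Funding: National 863 Program of China (2007AA010302); National Natural Science Foundation of China (60603039, 90718018)
Keywords: clustering algorithms; search engines; log mining (given as "data mining" in the English keyword list)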
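
The pipeline summarized in the abstract (session co-occurrence, neighbor-query vectors, vector similarity, density-based clustering) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices here are assumptions: the abstract does not say how DBSCAN was modified, so a plain DBSCAN is used; cosine similarity stands in for the unspecified similarity function; each query is given a self-weight equal to its session count so that directly co-occurring queries score as similar; and the names `neighbor_vectors`, `cosine`, `dbscan`, `cluster_queries`, the thresholds `min_sim` and `min_pts`, and the toy `SESSIONS` data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Toy sessions (hypothetical data): each list holds the queries one user
# issued within one session.
SESSIONS = [
    ["cheap flights", "flight tickets", "airfare"],
    ["cheap flights", "airfare", "flight tickets"],
    ["flight tickets", "airfare"],
    ["python tutorial", "learn python", "python course"],
    ["python tutorial", "python course"],
    ["learn python", "python course", "python tutorial"],
    ["weather", "cheap flights"],  # a noisy co-occurrence
]


def neighbor_vectors(sessions):
    """Map each query to a sparse vector of co-occurrence counts."""
    vectors = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        unique = sorted(set(session))
        for q in unique:
            # Assumption of this sketch: self-weight = sessions containing q.
            vectors[q][q] += 1
        for a, b in combinations(unique, 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return {q: dict(v) for q, v in vectors.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def dbscan(items, similarity, min_sim, min_pts):
    """Plain DBSCAN on a similarity function; returns {item: label}, -1 = noise."""
    def region(p):
        return [q for q in items if similarity(p, q) >= min_sim]

    labels = {}
    cluster_id = -1
    for p in items:
        if p in labels:
            continue
        neighbors = region(p)
        if len(neighbors) < min_pts:
            labels[p] = -1  # provisional noise; may become a border point later
            continue
        cluster_id += 1
        labels[p] = cluster_id
        queue = [q for q in neighbors if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q) == -1:
                labels[q] = cluster_id  # noise point reached from a core point
            if q in labels:
                continue
            labels[q] = cluster_id
            q_neighbors = region(q)
            if len(q_neighbors) >= min_pts:  # q is a core point: keep expanding
                queue.extend(q_neighbors)
    return labels


def cluster_queries(sessions, min_sim=0.8, min_pts=3):
    """Cluster queries by the similarity of their neighbor-query vectors."""
    vectors = neighbor_vectors(sessions)
    queries = sorted(vectors)
    labels = dbscan(queries, lambda a, b: cosine(vectors[a], vectors[b]), min_sim, min_pts)
    clusters = defaultdict(set)
    for query, label in labels.items():
        clusters[label].add(query)
    return dict(clusters)


if __name__ == "__main__":
    for label, members in sorted(cluster_queries(SESSIONS).items()):
        print(label, sorted(members))
```

On these toy sessions the flight queries and the Python queries form two clusters, while "weather", which co-occurs only once with an unrelated query, is labeled noise (`-1`). This is the role the abstract assigns to density-based clustering: dense neighborhoods of mutually similar queries are kept and sparse, noisy ones are discarded.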

References (10)

  • 1 Wen Jirong, Nie Jianyun, Zhang Hongjiang. Query clustering using user logs [J]. ACM Transactions on Information Systems, 2002, 20(1): 59-81.
  • 2 Fonseca B M, Golgher P B, De Moura E S, et al. Using association rules to discover search engines related queries [C]//1st Latin American Web Congress. Santiago: Citeseer, 2003: 66-71.
  • 3 Beeferman D, Berger A L. Agglomerative clustering of a search engine query log [C]//Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2000: 407-416.
  • 4 Baeza-Yates R A, Tiberi A. Extracting semantic relations from query logs [C]//Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2007: 76-85.
  • 5 Chan W, Leung W, Lee D. Clustering search engine query log containing noisy clickthroughs [C]//Proceedings of SAINT Conference 2004. Tokyo: IEEE Computer Society, 2004: 305-308.
  • 6 张辉, 谢科, 庞斌, 吴辉. A key-feature-based clustering algorithm for search engine results [J]. Journal of Beijing University of Aeronautics and Astronautics, 2007, 33(6): 739-742 (in Chinese). Cited by: 4.
  • 7 张刚, 刘悦, 郭嘉丰, 程学旗. A hierarchical clustering method for retrieval results [J]. Journal of Computer Research and Development, 2008, 45(3): 542-547 (in Chinese). Cited by: 15.
  • 8 Yi J, Maghoul F. Query clustering using click-through graph [C]//Proceedings of the 18th International Conference on World Wide Web. Madrid: ACM Press, 2009: 1055-1056.
  • 9 Deshpande M, Karypis G. Item-based top-n recommendation algorithms [J]. ACM Transactions on Information Systems, 2004, 22(1): 143-177.
  • 10 Ester M, Kriegel H P, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise [C]//Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. Portland: AAAI Press, 1996: 226-231.

