期刊文献+

网页排序中的随机模型及算法 被引量:2

Research on stochastic models and algorithms for web page ranking
原文传递
导出
摘要 随着互联网规模的日益增长,搜索引擎已经成为互联网上有效的信息获取工具.而在众多搜索引擎的背后,是信息检索技术,也即网页排序算法在起作用.网页排序包括重要性排序和相关性排序.通过我们研究发现,尽管这两类排序所依据的准则不同,但是都可以通过建立适当的随机过程模型来研究.对于网页重要性排序,我们通过分析用户浏览网页的行为建立了Markov骨架过程的框架.基于该框架我们分析了三种不同的随机过程模型对用户行为模拟的合理程度,并设计了名为BrowseRank的一组新算法,该算法可以根据用户上网行为来计算网页的重要性.在网页相关性排序中,我们主要针对排序结果联合问题建立了一个基于Markov链的监督学习框架.通过将传统方法的监督化,使原来难于解决的问题变的易于学习,将原来的NP-难问题转化为一个半正定规划问题,提高了效率. As the World Wide Web grows rapidly, search engines have become the most popular tools to access the large volume of information from it. And the key factor of the search engine is the ranking model of web pages, which contains two types: static rank and dynamic rank. In past, different approaches have been designed to solve these two problems separately. In this thesis, we analyze them on the same base—stochastic process model, and design new algorithms to solve them effciently and effectively. Firstly, we establish a framework on Markov skeleton process to compute the page importance by investigating real browsing behaviors of users. Within this framework, we design a group of eight new novel algorithms all referred to as BrowseRank, to compute the page importance based on the continuous-time time-homogeneous Markov process, which is one of three special cases of the Markov skeleton process. And from the experimental results, we flnd BrowseRank outperforms other baseline algorithms, such as PageRank and TrustRank. Secondly, we build a supervised framework for rank aggregation based on Markov chain. Within this framework, we not only generalize some unsupervised algorithms to supervised ones, but also design a new approach referred to as Supervised MC2 for rank aggregation, which transform the original NP-hard problem to a semi-deffned programme.
出处 《中国科学:数学》 CSCD 北大核心 2011年第12期1095-1103,共9页 Scientia Sinica:Mathematica
基金 国家自然科学基金(批准号:11001010)资助项目
关键词 信息检索 排序联合问题 MARKOV骨架过程 BrowseRank算法 information retrieval rank aggregation Markov skeleton process BrowseRank
  • 相关文献

参考文献2

二级参考文献11

  • 1Brin S,Page L.The Anatomy of a Large-scale Hypertextual Web Search Engine.Computer Networks and ISDN Systems,1998,30(1-7):107-117.
  • 2Page L,Brin S,Motwani R,Winograd T.The Pagerank Citation Flanking:Bringing Order to the Web.Technical Report 1999-66,Stanford InfoLab,1999.
  • 3Liu Y,Gao B,Liu T-Y,Zhang Y,Ma Z,He S,Li H.Browserank:Letting Web Users Vote for Page Importance.In SIGIR '08:Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development In Information Retrieval,2008,451-458.
  • 4Liu Y,Liu T-Y,Gao B,Ma Z,Li H.A Framework to Compute Page Importance Based on User Behaviors.Information Retrieval,2010,13(1):22-15.
  • 5Langville A N,Meyer C D.Deeper Inside Pagerank.Internet Mathematics,2004,1(3):335-400.
  • 6Qian M P,Gong G L.Stochastic Process.Second Version.Beijing:Peking University Press (in Chinese),1997.
  • 7Gyongyi Z,Garcia-Molina H,Pedersen J.Combating Web Spam with Trustrank.In VLDB '04:Proceedings of the Thirtieth international conference on Very large data bases,2004,576-587.
  • 8Stroock D W.An Introduction to Markov Processes (Graduate Texts in Mathematics).New York:Springer-Verlag,2006.
  • 9Stewart W J.Introduction to the Numerical Solution of Markov Chains.Princeton:Princeton University Press,1994.
  • 10Wang Z K,Yang X Q.Birth and Death Processes and Markov Chains.New York:Springer-Verlag,1992.

共引文献8

同被引文献29

引证文献2

二级引证文献4

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部