摘要
随着互联网规模的日益增长,搜索引擎已经成为互联网上有效的信息获取工具.而在众多搜索引擎的背后,是信息检索技术,也即网页排序算法在起作用.网页排序包括重要性排序和相关性排序.通过我们研究发现,尽管这两类排序所依据的准则不同,但是都可以通过建立适当的随机过程模型来研究.对于网页重要性排序,我们通过分析用户浏览网页的行为建立了Markov骨架过程的框架.基于该框架我们分析了三种不同的随机过程模型对用户行为模拟的合理程度,并设计了名为BrowseRank的一组新算法,该算法可以根据用户上网行为来计算网页的重要性.在网页相关性排序中,我们主要针对排序结果联合问题建立了一个基于Markov链的监督学习框架.通过将传统方法的监督化,使原来难于解决的问题变的易于学习,将原来的NP-难问题转化为一个半正定规划问题,提高了效率.
As the World Wide Web grows rapidly, search engines have become the most popular tools to access the large volume of information from it. And the key factor of the search engine is the ranking model of web pages, which contains two types: static rank and dynamic rank. In past, different approaches have been designed to solve these two problems separately. In this thesis, we analyze them on the same base—stochastic process model, and design new algorithms to solve them effciently and effectively. Firstly, we establish a framework on Markov skeleton process to compute the page importance by investigating real browsing behaviors of users. Within this framework, we design a group of eight new novel algorithms all referred to as BrowseRank, to compute the page importance based on the continuous-time time-homogeneous Markov process, which is one of three special cases of the Markov skeleton process. And from the experimental results, we flnd BrowseRank outperforms other baseline algorithms, such as PageRank and TrustRank. Secondly, we build a supervised framework for rank aggregation based on Markov chain. Within this framework, we not only generalize some unsupervised algorithms to supervised ones, but also design a new approach referred to as Supervised MC2 for rank aggregation, which transform the original NP-hard problem to a semi-deffned programme.
出处
《中国科学:数学》
CSCD
北大核心
2011年第12期1095-1103,共9页
Scientia Sinica:Mathematica
基金
国家自然科学基金(批准号:11001010)资助项目