Stochastic Models and Algorithms in Web Page Ranking (网页排序中的随机模型及算法). Cited by: 2

Research on stochastic models and algorithms for web page ranking


Abstract: As the World Wide Web grows rapidly, search engines have become effective and popular tools for accessing the large volume of information on it. Behind every search engine is information retrieval technology, namely the algorithms that rank web pages. Web page ranking covers two tasks: importance ranking (static rank) and relevance ranking (dynamic rank). Although the two rely on different criteria and have in the past been treated separately, we find that both can be studied by building a suitable stochastic process model. In this thesis we analyze them on that common basis and design new algorithms that solve them efficiently and effectively.

First, for importance ranking, we establish a framework based on the Markov skeleton process by analyzing the real browsing behavior of users. Within this framework we assess how reasonably three different stochastic process models simulate user behavior, and we design a group of new algorithms, all referred to as BrowseRank, that compute page importance from user browsing behavior. BrowseRank is based on the continuous-time time-homogeneous Markov process, which is one of three special cases of the Markov skeleton process. In our experiments, BrowseRank outperforms baseline algorithms such as PageRank and TrustRank.

Second, for relevance ranking, we build a supervised learning framework based on Markov chains for the rank aggregation problem. Within this framework we generalize some unsupervised algorithms to supervised ones, which makes a previously hard problem easier to learn. We also design a new approach for rank aggregation, referred to as Supervised MC2, which transforms the original NP-hard problem into a semidefinite programming problem and thereby improves efficiency.
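The abstract does not spell out how page importance is obtained from a continuous-time Markov process, so the following is only a minimal sketch under stated assumptions. It relies on a standard fact: the stationary distribution of a continuous-time time-homogeneous Markov process is proportional to the stationary distribution of its embedded (jump) chain multiplied by the mean staying time in each state. The function names, the `damping` reset parameter, and the toy transition matrix are hypothetical and are not taken from the paper.

```python
def embedded_stationary(P, damping=0.85, tol=1e-12, max_iter=1000):
    """Stationary distribution of the embedded chain by power iteration.

    P is a row-stochastic transition matrix (list of lists) estimated,
    for example, from observed page-to-page transitions. The uniform
    reset with probability (1 - damping) is an assumption added here so
    that the iteration converges for any P.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                new[j] += damping * pi[i] * P[i][j]
        diff = sum(abs(a - b) for a, b in zip(new, pi))
        pi = new
        if diff < tol:
            break
    return pi


def ctmc_importance(P, mean_stay):
    """Importance scores of a continuous-time Markov process.

    Weights the embedded-chain stationary probability of each page by
    its mean staying time, then normalizes so the scores sum to 1.
    """
    pi_tilde = embedded_stationary(P)
    weighted = [p * t for p, t in zip(pi_tilde, mean_stay)]
    total = sum(weighted)
    return [w / total for w in weighted]


if __name__ == "__main__":
    # Toy data (hypothetical): three pages, transition frequencies and
    # mean staying times in seconds.
    P = [[0.0, 1.0, 0.0],
         [0.5, 0.0, 0.5],
         [1.0, 0.0, 0.0]]
    mean_stay = [10.0, 2.0, 30.0]
    print([round(s, 4) for s in ctmc_importance(P, mean_stay)])
```

In this sketch a page that users visit rarely can still score highly if they stay on it for a long time. That is the intuition behind ranking by browsing behavior rather than by link structure alone.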

Authors: Liu Yuting (刘玉婷), Ma Zhiming (马志明)

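Affiliations: Department of Mathematics, School of Science, Beijing Jiaotong University; Academy of Mathematics and Systems Science, Chinese Academy of Sciences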

Source: Scientia Sinica: Mathematica (《中国科学：数学》), indexed in CSCD and the Peking University Core Journals list, 2011, No. 12, pp. 1095-1103 (9 pages)

Funding: Supported by the National Natural Science Foundation of China (Grant No. 11001010)

Keywords: information retrieval; rank aggregation; Markov skeleton process; BrowseRank

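Classification: O211.6 [Science: Probability Theory and Mathematical Statistics]; TP393.092 [Automation and Computer Technology: Computer Application Technology]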

