Parallel realization of PageRank algorithm on Nutch (Nutch中PageRank的并行实现)

Cited by: 4
Abstract The Nutch search engine does not currently implement PageRank computation. To address this gap, the classical PageRank algorithm is analyzed and then improved by introducing a weighting factor that controls the relative contribution of inter-site (external) and intra-site (internal) links. Taking advantage of MapReduce's strength in processing large data sets, a MapReduce-based distributed parallel PageRank algorithm is designed and implemented on a Nutch cluster. Experimental results show that the larger the data volume and the more nodes in the cluster, the more efficiently PageRank is computed. The distributed parallel algorithm also scales well.
Authors Liang Zhengyou (梁正友), Pan Tao (潘涛)
Source Computer Engineering and Design (《计算机工程与设计》), indexed in CSCD and the Peking University Core Journals list, 2010, No. 20, pp. 4354-4356, 4409 (4 pages)
Funding Guangxi Science Foundation project (桂科自0832059)
Keywords Nutch search engine; PageRank algorithm; MapReduce model; compute clusters; parallel computation
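
The abstract describes two ideas: a factor that weights inter-site links against intra-site links, and a MapReduce formulation of PageRank. The record does not give the paper's formula, so the following is only a minimal single-process sketch under stated assumptions. The names `EXTERNAL_WEIGHT`, `link_weights`, `map_page`, `reduce_page` and `pagerank_iteration` are hypothetical, the normalized weighting scheme is one plausible reading of the "weighting factor", the damping value 0.85 is the conventional default rather than a value from the paper, and rank held by pages without out-links is simply dropped. It does not use the Nutch or Hadoop APIs.

```python
from collections import defaultdict
from urllib.parse import urlparse

DAMPING = 0.85         # conventional PageRank damping factor (assumed, not from the paper)
EXTERNAL_WEIGHT = 0.7  # hypothetical weight for inter-site links; intra-site links get 1 - this


def site_of(url):
    """Return the host part of a URL, used here as the 'site' identity."""
    return urlparse(url).netloc


def link_weights(src, targets, external_weight=EXTERNAL_WEIGHT):
    """Share of src's rank sent to each target; the shares sum to 1."""
    raw = {}
    for target in targets:
        is_external = site_of(target) != site_of(src)
        raw[target] = external_weight if is_external else 1.0 - external_weight
    total = sum(raw.values())
    if total == 0:  # e.g. external_weight == 1.0 and only intra-site links
        return {t: 1.0 / len(raw) for t in raw}
    return {t: w / total for t, w in raw.items()}


def map_page(src, rank, targets):
    """Map step: emit (page, contribution) pairs for one source page."""
    yield src, 0.0  # keep the page in the output even if nothing links to it
    for target, weight in link_weights(src, targets).items():
        yield target, rank * weight


def reduce_page(contributions, n_pages):
    """Reduce step: combine all contributions received by one page."""
    return (1.0 - DAMPING) / n_pages + DAMPING * sum(contributions)


def pagerank_iteration(graph, ranks):
    """One iteration; graph maps every page URL to its list of out-links."""
    grouped = defaultdict(list)  # stands in for the shuffle between map and reduce
    for src, targets in graph.items():
        for page, value in map_page(src, ranks[src], targets):
            grouped[page].append(value)
    n_pages = len(graph)
    return {page: reduce_page(values, n_pages) for page, values in grouped.items()}


if __name__ == "__main__":
    graph = {
        "http://a.com/1": ["http://a.com/2", "http://b.com/1"],
        "http://a.com/2": ["http://a.com/1"],
        "http://b.com/1": ["http://a.com/1"],
    }
    ranks = {page: 1.0 / len(graph) for page in graph}
    for _ in range(20):
        ranks = pagerank_iteration(graph, ranks)
    for page, rank in sorted(ranks.items()):
        print(page, round(rank, 4))
```

In a real cluster the map and reduce functions would run as separate tasks over partitions of the link database, with one job per iteration. The in-memory `grouped` dictionary here only imitates that grouping step.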

