期刊文献+

增量式迭代计算模型研究与实现 被引量:8

Research and Implementation Incremental Iterative Model
在线阅读 下载PDF
导出
摘要 不动点迭代广泛存在于数据挖掘和机器学习算法中,这些算法已应用到诸如社会网络分析、高性能计算、推荐系统、搜索引擎、模式识别等诸多领域中.在云计算环境中,利用MapReduce编程模型所带来的便利,通过普通的PC集群运行相应的迭代算法,可以提高迭代算法的执行效率.但由于数据的快速变化,每当数据发生改变,整个迭代算法也需要重新运行,这将会导致大量的运算资源浪费和性能损失.文中研究基于原始迭代结果和新增数据的增量迭代计算DELTA(Delta data based incrEmentaL iTerAtive computing),并提出DELTA模型以解决上述问题.文中理论证明了DELTA模型的正确性,阐述了其适用范围,并列举了PageRank、K-means和Descendant Query算法在DELTA模型中的运用.文中还扩展HaLoop为ΔHaLoop框架,使其支持增量式的迭代计算.通过一系列的测试用例,对DELTA模型功能、性能进行了分析和讨论,实验结果表明DELTA模型在获得准确的迭代结果的基础上性能优势明显.文中提出的DELTA模型能够适应多数迭代算法,对云计算环境下的迭代计算的应用和优化起到推动作用. The fixed point iterative algorithms widely exist in the area of data mining and machine learning, which have been applied in many fields, such as social network analysis, high- performance computing and recommended system. In cloud computing environment, we can utilize the convenience brought by MapReduce to improve the efficiency of iterative algorithms on big data through running the algorithm on larger PC-cluster. However, the entire iterative algorithm has to be re-executed when new data is introduced, which cause large amount of computing resource wastes and performance losses. In this paper, the original iterative results new data based incremental iterative computing, which is named as DELTA(Delta data based incrEmentaL iTerAtive computing), is well studied, and the corresponding DELTA model is proposed. We prove the correctness of the model, and describe the application scope. Then, the application cases of DELTA model applying on the iterative algorithms are enumerated, such like PageRank,K-means and Descendant Query. Finally, AHaLoop is implemented by extending HaLoop to support DELTA model. A series of test cases are designed to analyze the DELTA model on functionality and performance. The results show that the model improves the iteration performance without any loss of accuracy. The DELTA model proposed in this paper can adapt many iterative algorithms, which promotes the application and optimization of iterative algorithms in cloud computing environment.
出处 《计算机学报》 EI CSCD 北大核心 2016年第1期109-125,共17页 Chinese Journal of Computers
基金 国家自然科学基金(61433008 61202088 61272179 61173028) 教育部博士点基金(20130042120006) 教育部-中国移动科研基金项目(MCM20125021) 中国博士后科学基金面上基金(2013M540232) 中央高校基本科研业务费专项资金(N130417001) 辽宁省博士启动基金(201403314)资助
关键词 云计算 大数据 MAPREDUCE 迭代计算 增量迭代 cloud computing big data MapReduce iterative computing incremental iterative
  • 相关文献

参考文献30

  • 1Page L, Sergey B, Motwani R, Winograd T. The PageRank citation ranking: Bringing order to the web. Stanford InfoLab, USA: Technical Report 422, 1999.
  • 2Kiri W, Claire C, Seth R, Stefan S. Constrained K-means clustering with background knowledge//Proceedings of the 18th International Conference on Machine Learning. Williamstown, USA, 2001:577-584.
  • 3Breese J S, Heckerman D, Kadie C. Empirical analysis of predictive algorithms for collaborative filtering:/Proceedings of the Uncertainty in Artificial Intelligence. San Francisco, USA, 1998: 43-52.
  • 4Mingmin C. Bruzzone L. A novel transductive SVM for semisupervised classification of remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 2006, 44(11) : 3363-3373.
  • 5孟小峰,慈祥.大数据管理:概念、技术与挑战[J].计算机研究与发展,2013,50(1):146-169. 被引量:2439
  • 6王珊,王会举,覃雄派,周烜.架构大数据:挑战、现状与展望[J].计算机学报,2011,34(10):1741-1752. 被引量:621
  • 7Dean J, Sanjay G. MapReduce: Simplified data processing on large elusters//Proceedings of the 6th Symposium on Operating Systems Design and Implementation. Berkeley, USA, 2004: 137-149.
  • 8Bu Yingyi, Howe B, Balazinska M, Ernst M D. The HaLoop approach to large-scale iterative data analysis. The International Journal on Very Large Data Bases, 2012, 21 (2) : 169-190.
  • 9Daniel P, Frank D. Large-scale incremental processing using distributed transactions and notifications//Proceedings of the 9th Symposium on Operating Systems Design and Implemen- tation. Berkeley, USA, 2010:137-149.
  • 10Bhatotia P, Wieder A, Rodrigues R, et al. Ineoop: MapReduce for incremental computations//Proceedings of the 2nd ACM Symposium on Cloud Computing. Cascais, Portugal, 2011, 7.

二级参考文献209

  • 1[OL].<http://hadoop.apache.org.>.
  • 2WinterCorp: 2005 TopTen Program Summary. http:// www. wintercorp, com/WhitePapers/WC TopTenWP. pdf.
  • 3TDWI Checklist Report: Big Data Analytics. http://tdwi. org/research/2010/08/Big-Data-Analytics, aspx.
  • 4Chaudhuri S, Dayal U. An overview of data warehousing and OLAP technology. SIGMOD Rec, 1997,26(1): 65-74.
  • 5Madden S, DeWitt D J, Stonebraker M. Database parallelism choices greatly impact scalability. DatabaseColumn Blog. http://www, databasecolumn, com/2007/10/database-parallelism-choices, html.
  • 6Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters//Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI ' 04). San Francisco, California, USA, 2004: 137-150.
  • 7DeWitt D J, Gerber R H, Graefe G, Heytens M L, Kumar K B, Muralikrishna M. GAMMA--A high performance dataflow database machine//Proceedings of the 12th International Conference on Very Large Data Bases (VLDB' 86). Kyoto, Japan, 1986:228-237.
  • 8Fushimi S, Kitsuregawa M, Tanaka H. An overview of the system software of a parallel relational database machine// Proceedings of the 12th International Conference on Very Large DataBases(VLDB'86). Kyoto, Japan, 1986:209-219.
  • 9Brewer E A. Towards robust distributed systems//Proceedings of the 19th Annual ACM Symposium on Principles of Distributed Computing (PODC' 00). Portland, Oregon, USA, 2000:7.
  • 10http: //www. dbms2, com/2008/08/26/known-applications of mapreduce/.

共引文献2861

同被引文献84

引证文献8

二级引证文献118

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部