摘要
大数据的挖掘是当今的研究热点,也有着巨大的商业价值。新型框架Spark部署在Hadoop平台上,它的机器学习算法几乎可以完全替代传统的Mahout Map Reduce的编程模式,但由于Spark的内存模型特点,执行速度快。该文研究了Spark中的机器学习中的聚类算法KMeans,先分析了算法思想,再通过实验分析其应用的方法,然后通过实验结果分析其应用场景和不足。
Mining big data is current research hotspot, also have a huge commercial value.A new framework of Spark is deployed on the Hadoop platform, in which machine learning algorithms can be almost completely replace the traditional Mahout Map Reduce programming mode. But the characteristics of Spark memory model, efficiency of execution is high. This paper studies the KMeans clustering algorithm in Spark machine learning。The first analyze the idea of the algorithm, and then through the experimental analyze method and its application, and then through results of experimental analyze its application scenarios and lacks.
作者
陈虹君
CHEN Hong-jun (ChengDu College of University Of Electronic Science And Technology of China, Chengdu 611731, China)
出处
《电脑知识与技术》
2015年第2期56-57,60,共3页
Computer Knowledge and Technology