Abstract
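Data-intensive applications in the smart grid often involve data transfers across data centers and data migration within a data center, which poses new challenges for data distribution. To make full use of computing and storage resources and to meet the practical need for reliable storage and efficient processing of large-scale smart grid data, a cloud-computing-based storage method for data-intensive applications is proposed, which maps datasets to a set of points in a data space. A two-stage classification process is designed: the first stage uses the conventional K-means algorithm to obtain an initial classification of the point set; the second stage introduces a cost function for data migration, based on the membership of each dataset in the initial clusters, and adjusts the initial classification to produce a placement of datasets onto data centers. Experimental results show that the algorithm effectively improves data access efficiency while maintaining global load balance.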
In the distributed environment of the smart grid, data-intensive applications often involve complex data transmission between and within data centers and may use a large number of datasets. An application may need several datasets located in different data centers, which raises major challenges, including the high cost of data movement between data centers and data dependencies within a single center. To support efficient storage and management of large-scale data in the smart grid, a two-stage data placement strategy is proposed. In the first stage, an initial classification is obtained with the K-means algorithm; in the second, datasets are placed in different data centers by a clustering scheme based on data dependency. Simulations show that the algorithm effectively reduces the cost of data movement while keeping the data evenly distributed.
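The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the mapping from datasets to points, the form of the migration cost function, or the balancing rule. The sketch assumes each dataset is already a coordinate tuple with a known size, uses the increase in distance to the cluster centroid as a hypothetical stand-in for migration cost, and enforces a hypothetical per-center `capacity`. The names `kmeans` and `rebalance` are illustrative only.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Stage 1: plain K-means. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


def rebalance(points, sizes, labels, centroids, capacity):
    """Stage 2 (assumed rule): while some center exceeds `capacity`,
    move the dataset whose move adds the least distance-based cost
    to a center that still has room. Returns (labels, loads)."""
    labels = list(labels)
    k = len(centroids)
    load = [0.0] * k
    for lab, size in zip(labels, sizes):
        load[lab] += size
    while True:
        over = {c for c in range(k) if load[c] > capacity}
        if not over:
            break
        best = None  # (extra_cost, dataset_index, target_center)
        for i, (p, size) in enumerate(zip(points, sizes)):
            src = labels[i]
            if src not in over:
                continue
            for dst in range(k):
                if dst == src or load[dst] + size > capacity:
                    continue
                # Hypothetical migration cost: increase in centroid distance.
                extra = math.dist(p, centroids[dst]) - math.dist(p, centroids[src])
                if best is None or extra < best[0]:
                    best = (extra, i, dst)
        if best is None:
            break  # No feasible move; leave the remaining overload in place.
        _, i, dst = best
        load[labels[i]] -= sizes[i]
        load[dst] += sizes[i]
        labels[i] = dst
    return labels, load


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11)]
    initial, cents = kmeans(pts, k=2)
    final, loads = rebalance(pts, [1] * len(pts), initial, cents, capacity=3)
    print("initial:", initial)
    print("final:  ", final, "loads:", loads)
```

In this toy input, K-means groups four datasets in one cluster and two in the other; the adjustment step then moves one dataset so that neither center exceeds the assumed capacity. A real cost function would account for dataset sizes, inter-center bandwidth, and the dependencies between datasets mentioned in the abstract.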
Source
《电力系统自动化》
EI
CSCD
PKU Core Journal (北大核心)
2012, No. 12, pp. 66-70, 100 (6 pages)
Automation of Electric Power Systems
Funding
Science and Technology Project of State Grid Corporation of China (SG11075-1)
Keywords
smart grid
cloud computing
data distribution
data movement
consistent hashing algorithm