Abstract
Real-time electricity consumption data are an important basis for describing the operating state of the power grid. Statistical transforms of these data over each time period, including the mean, the variance, and discrete Fourier transform (DFT) components, serve as important modeling parameters for extracting features of consumption behavior. DBSCAN, a density-based clustering algorithm, can partition the sample points in a space into classes more accurately and reliably. The stages of hierarchical clustering, namely subdomain partitioning, clustering within each subdomain, and cluster merging, provide an important reference for a distributed implementation of DBSCAN. Based on the density parameters of DBSCAN, only the boundary feature samples of each sub-cluster are retained, which further improves the computational efficiency of cluster merging. Distributed in-memory computing systems, represented by Spark, keep the intermediate results of data processing in memory to reduce read and write overhead, providing a fast and efficient environment for iterative analysis of large-scale data. Experimental results show that the distributed DBSCAN clustering algorithm implemented on an in-memory computing system can analyze large-scale user electricity consumption behavior accurately and efficiently.
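As a rough illustration of the pipeline the abstract describes, the sketch below builds a feature vector (mean, variance, and the first DFT magnitudes) for each load profile and labels the vectors with a plain single-machine DBSCAN. It is not the authors' implementation: the load profiles, the function names, and the values of `eps` and `min_pts` are assumptions made for the example, the features are left unscaled, and the subdomain partitioning, boundary-sample merging, and Spark execution are omitted.

```python
import cmath
import math
from statistics import fmean, pvariance

NOISE = -1


def dft_magnitudes(series, n_components):
    """Normalized magnitudes of DFT components 1..n_components."""
    n = len(series)
    mags = []
    for k in range(1, n_components + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        mags.append(abs(coeff) / n)
    return mags


def features(series, n_components=2):
    """Feature vector: mean, variance, then the first DFT magnitudes."""
    return [fmean(series), pvariance(series)] + dft_magnitudes(series, n_components)


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical 24-hour load profiles: three flat users, three users with a
# daily sinusoidal cycle, and one isolated heavy user.
profiles = [[1.0 + 0.01 * u] * 24 for u in range(3)]
profiles += [[2.0 + 0.01 * u + math.sin(2 * math.pi * t / 24) for t in range(24)]
             for u in range(3)]
profiles.append([5.0] * 24)

labels = dbscan([features(p) for p in profiles], eps=0.1, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In a distributed setting, the same `dbscan` routine could in principle run independently on each subdomain, with only boundary samples exchanged for the merge step; that part is not shown here.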
Authors
赵永彬
陈硕
刘明
王佳楠
贲驰
ZHAO Yong-bin; CHEN Shuo; LIU Ming; WANG Jia-nan; BEN Chi (Information & Telecommunication Branch, State Grid Liaoning Electric Power Company, Shenyang 110004, China; Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China; University of Chinese Academy of Sciences, Beijing 100049, China; State Grid Electric Power Control Northeast Branch Center, Shenyang 110180, China)
Source
《小型微型计算机系统》
CSCD
PKU Core Journal (北大核心)
2018, No. 5, pp. 1108-1112 (5 pages)
Journal of Chinese Computer Systems
Funding
Supported by the Science and Technology Project of Liaoning Electric Power Company (SGLNXT00DKJS1600242)
Keywords
behavior extraction
hierarchical clustering
cluster boundary feature
distributed memory computing