How to effectively reduce the energy consumption of large-scale data centers is a key issue in cloud computing. This paper presents a novel low-power task scheduling algorithm (L3SA) for large-scale cloud data centers. A winner tree is introduced in which the data nodes serve as the leaf nodes, and the final winner is selected with the aim of reducing energy consumption. The complexity of large-scale cloud data centers is fully considered, and a task comparison coefficient is defined to make the task scheduling strategy more reasonable. Experiments and performance analysis show that the proposed algorithm can effectively improve node utilization and reduce the overall power consumption of the cloud data center.
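The winner-tree selection mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not define the task comparison coefficient or the energy model, so the `key` function, the node tuples, and the wattage figures below are hypothetical placeholders; this is a generic tournament tree, not the L3SA algorithm itself.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class WinnerTree:
    """Tournament (winner) tree: leaves are candidates, the root is the
    index of the leaf with the smallest key."""

    def __init__(self, leaves: Sequence[T], key: Callable[[T], float]):
        if not leaves:
            raise ValueError("at least one leaf is required")
        self.leaves: List[T] = list(leaves)
        self.key = key
        self.n = len(self.leaves)
        # tree[1] is the root; tree[n .. 2n-1] hold the leaf indices.
        self.tree = [0] * (2 * self.n)
        for i in range(self.n):
            self.tree[self.n + i] = i
        for pos in range(self.n - 1, 0, -1):
            self.tree[pos] = self._play(self.tree[2 * pos], self.tree[2 * pos + 1])

    def _play(self, a: int, b: int) -> int:
        """Return the index of the leaf that wins the match (smaller key)."""
        return a if self.key(self.leaves[a]) <= self.key(self.leaves[b]) else b

    def winner(self) -> int:
        """Index of the current overall winner."""
        return self.tree[1]

    def update(self, i: int, leaf: T) -> None:
        """Replace leaf i and replay only the matches on its path to the root."""
        self.leaves[i] = leaf
        pos = (self.n + i) // 2
        while pos >= 1:
            self.tree[pos] = self._play(self.tree[2 * pos], self.tree[2 * pos + 1])
            pos //= 2


# Hypothetical data nodes: (name, estimated extra power in watts if the task runs there).
nodes = [("n0", 30.0), ("n1", 12.5), ("n2", 20.0)]
tree = WinnerTree(nodes, key=lambda node: node[1])
chosen = tree.winner()               # index of the cheapest node
tree.update(chosen, ("n1", 40.0))    # assumed cost increase after placing a task
next_chosen = tree.winner()
```

After a task is placed, only the matches between the updated leaf and the root are replayed, so each scheduling decision costs O(log n) comparisons rather than a full scan of all nodes.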
Data quality strongly influences the application of grain big data, so data cleaning is a necessary and important task. In the MapReduce framework, parallel techniques are often used to execute data cleaning in a highly scalable manner, but owing to the lack of effective design, the data cleaning process contains a large amount of redundant computation, which lowers performance. In this research, we found that some tasks are often carried out multiple times on the same input files, or require the same operation results, during data cleaning. To address this problem, we propose a new optimization technique based on task merging. By merging simple or redundant computations on the same input files, the number of loop computations in MapReduce can be greatly reduced. Experiments show that this significantly reduces overall system runtime, which shows that the data cleaning process is optimized. We optimized several data cleaning modules, including entity identification, inconsistent data restoration, and missing value filling. Experimental results show that the proposed method increases the efficiency of grain big data cleaning.
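The idea of merging tasks that read the same input file can be sketched in plain Python, without a MapReduce runtime. The function names, the record layout, and the two cleaning operations are hypothetical, and the sketch assumes the operations are record-wise and safe to compose in order; the paper's actual merge rules are not given in the abstract.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Record = dict
Op = Callable[[Record], Record]


def merge_tasks(tasks: Iterable[Tuple[str, Op]]) -> Dict[str, List[Op]]:
    """Group cleaning operations by the input file they read."""
    merged: Dict[str, List[Op]] = defaultdict(list)
    for path, op in tasks:
        merged[path].append(op)
    return dict(merged)


def run_merged(
    merged: Dict[str, List[Op]],
    load: Callable[[str], Iterable[Record]],
) -> Dict[str, List[Record]]:
    """Scan each input once and apply every operation registered for it."""
    results: Dict[str, List[Record]] = {}
    for path, ops in merged.items():
        cleaned = []
        for record in load(path):      # single pass over the file
            for op in ops:             # all merged operations per record
                record = op(record)
            cleaned.append(record)
        results[path] = cleaned
    return results


# Hypothetical cleaning operations for illustration only.
def strip_name(rec: Record) -> Record:
    return {**rec, "name": rec["name"].strip()}


def fill_moisture(rec: Record, default: float = 0.0) -> Record:
    # Placeholder missing-value rule; a real pipeline would choose the fill value carefully.
    return rec if rec.get("moisture") is not None else {**rec, "moisture": default}


# In-memory stand-in for input files.
FILES = {"grain_a": [{"name": " wheat ", "moisture": None},
                     {"name": "rice", "moisture": 13.5}]}

merged = merge_tasks([("grain_a", strip_name), ("grain_a", fill_moisture)])
cleaned = run_merged(merged, load=lambda path: FILES[path])
```

Run separately, the two operations would each scan `grain_a`; merged, the file is read once and both operations are applied per record.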
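Based on tourism big data, and considering lag effects over long time ranges as well as the multi-task influence among different Search Intensity Indexes (SII), a Multi-tasking Tourism Information Analysis Based on Big Data (MTIABD) framework is proposed. A fused-information re-ranking technique is used to forecast tourism demand; specifically, a graph-guided structure models the lagged influence of historical variables on future variables. Each variable is encoded independently by a convolutional neural network (CNN) along the time dimension, the lag effects are modeled dynamically with a bipartite graph, and the information is mined through graph aggregation, achieving accurate forecasting of tourism demand. Building on these techniques, a tourism demand forecasting system is constructed in which tourists can retrieve information about different attractions according to their needs. Extensive experiments on real-world datasets show that the proposed MTIABD framework outperforms existing methods in both one-step and multi-step forecasting. In terms of Mean Absolute Percentage Error (MAPE), compared with the Instance-wise Graph-based Framework for Multivariate Time Series Forecasting (IGMTF), MTIABD improves performance by 16.75% on the HK-2021 dataset and by 19.79% on the MO-2021 dataset.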
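For readers unfamiliar with the evaluation metric named in this abstract, the standard definition of MAPE can be written as a short function. The sample values are made up for illustration and are not taken from the HK-2021 or MO-2021 datasets.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent.

    Requires equal-length, non-empty inputs with no zero in `actual`.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up visitor counts and forecasts, for illustration only.
error = mape([100.0, 200.0], [110.0, 180.0])
```

Lower MAPE means a more accurate forecast, so an improvement under this metric corresponds to a smaller error value.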
Funding: Supported by the National Natural Science Foundation of China (61202004, 61272084), the National Key Basic Research Program of China (973 Program) (2011CB302903), the Specialized Research Fund for the Doctoral Program of Higher Education (20093223120001, 20113223110003), the China Postdoctoral Science Foundation Funded Project (2011M500095, 2012T50514), the Natural Science Foundation of Jiangsu Province (BK2011754, BK2009426), the Jiangsu Postdoctoral Science Foundation Funded Project (1102103C), the Natural Science Fund of Higher Education of Jiangsu Province (12KJB520007), and the Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (yx002001).