6 articles found
Energy Cost Minimization Using String Matching Algorithm in Geo-Distributed Data Centers
1
Authors: Muhammad Imran Khan Khalil, Syed Adeel Ali Shah, Izaz Ahmad Khan, Mohammad Hijji, Muhammad Shiraz, Qaisar Shaheen. Computers, Materials & Continua (SCIE, EI), 2023, No. 6, pp. 6305-6322 (18 pages)
Data centers are being distributed worldwide by cloud service providers (CSPs) to save energy costs through efficient workload allocation strategies. Many CSPs are challenged by the significant rise in user demands due to their extensive energy consumption during workload processing. Numerous research studies have examined distinct operating cost mitigation techniques for geo-distributed data centers (DCs). However, operating cost savings during workload processing that also consider string-matching techniques in geo-distributed DCs remain unexplored. In this research, we propose a novel string matching-based geographical load balancing (SMGLB) technique to mitigate the operating cost of the geo-distributed DC. The primary goal of this study is to use a string-matching algorithm (i.e., Boyer Moore) to compare the contents of incoming workloads to those of documents that have already been processed in a data center. A successful match prevents the global load balancer from sending the user's request to a data center for processing; instead, the results of the previously processed workload are displayed to the user to save energy. On the contrary, if no match can be discovered, the global load balancer will allocate the incoming workload to a specific DC for processing considering variable energy prices, the number of active servers, on-site green energy, and traces of incoming workload. The results of numerical evaluations show that the SMGLB can minimize the operating expenses of the geo-distributed data centers more than the existing workload distribution techniques.
Keywords: string matching; optimization; geo-distributed data centers; geographical load balancing; green energy
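The match-then-dispatch idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the SMGLB implementation: it uses the Boyer-Moore-Horspool simplification (bad-character rule only) rather than full Boyer-Moore, it treats a "match" as the workload text occurring inside a previously processed document, and the fallback picks a data center by electricity price alone. The names `horspool_find`, `route`, `processed`, and `prices` are hypothetical.

```python
from typing import Optional


def horspool_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Bad-character table: how far the window may jump when the character
    # aligned with the pattern's last position is `ch`.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        pos += shift.get(text[pos + m - 1], m)
    return -1


def route(workload: str,
          processed: dict[str, str],
          prices: dict[str, float]) -> tuple[str, Optional[str]]:
    """Serve a cached result on a match; otherwise pick a data center.

    processed maps the text of an already processed document to its result.
    prices maps a data center name to its current electricity price.
    Both structures are assumptions made for this sketch.
    """
    for document, result in processed.items():
        if horspool_find(document, workload) != -1:
            return ("cache", result)
    # Assumed fallback: cheapest electricity only. The abstract also lists
    # active servers, on-site green energy, and workload traces as inputs.
    cheapest = min(prices, key=prices.get)
    return (cheapest, None)
```

For example, `route("data", {"geo-distributed data centers": "r1"}, {"DC-A": 0.12, "DC-B": 0.08})` returns the cached result without touching a data center, while an unmatched workload is sent to `DC-B`.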
Carbon-Aware Energy Cost Optimization of Data Analytics Across Geo-Distributed Data Centers
2
Authors: Yi-Ting Chen, Lai-Long Luo, De-Ke Guo, Qian He. Journal of Computer Science & Technology, 2025, No. 3, pp. 654-670 (17 pages)
The number and scale of data centers worldwide are growing rapidly in the era of big data, leading to massive energy consumption and formidable carbon emissions. To achieve the efficient and sustainable development of the information technology (IT) industry, researchers propose to schedule data or data analytics jobs to data centers with low electricity prices and carbon emission rates. However, due to the highly heterogeneous and dynamic nature of geo-distributed data centers in terms of resource capacity, electricity price, and the rate of carbon emissions, it is quite difficult to optimize the electricity cost and carbon emission of data centers over a long period. In this paper, we propose an energy-aware data backup and job scheduling method with minimal cost (EDJC) to minimize the electricity cost of geo-distributed data analytics jobs, and simultaneously ensure the long-term carbon emission budget of each data center. Specifically, we first design a cost-effective data backup algorithm to generate a data backup strategy that minimizes cost based on historical job requirements. After that, based on the data backup strategy, we utilize an online carbon-aware job scheduling algorithm to calculate the job scheduling strategy in each time slot. In this algorithm, we use Lyapunov optimization to decompose the long-term job scheduling optimization problem into a series of real-time job scheduling optimization subproblems, and thereby minimize the electricity cost and satisfy the carbon emission budget. The experimental results show that the EDJC method can significantly reduce the total electricity cost of the data center and meet the carbon emission constraints of the data center at the same time.
Keywords: data analytics; geo-distributed data center; carbon emission; energy cost
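The Lyapunov decomposition mentioned in this abstract is commonly realized as a drift-plus-penalty rule: each data center keeps a virtual queue that tracks how far it has overshot its per-slot carbon budget, and each slot's decision trades electricity cost against that backlog. The sketch below shows the generic technique only. It is not the EDJC algorithm, and the function names, the trade-off weight `V`, and all numbers are assumptions for illustration.

```python
def pick_data_center(energy_kwh: float,
                     prices: dict[str, float],
                     carbon_rates: dict[str, float],
                     queues: dict[str, float],
                     V: float) -> str:
    """Choose the data center minimizing V * cost + queue * carbon for one job."""
    def score(dc: str) -> float:
        cost = prices[dc] * energy_kwh           # electricity cost of the job
        carbon = carbon_rates[dc] * energy_kwh   # emissions caused by the job
        return V * cost + queues[dc] * carbon
    return min(prices, key=score)


def update_queue(queue: float, emitted: float, budget_per_slot: float) -> float:
    """Virtual queue update: backlog grows when emissions exceed the slot budget."""
    return max(queue + emitted - budget_per_slot, 0.0)


# Hypothetical inputs: DC "A" is cheap but carbon-heavy, "B" the opposite.
prices = {"A": 0.05, "B": 0.10}
carbon_rates = {"A": 0.8, "B": 0.2}
queues = {"A": 0.0, "B": 0.0}

first_choice = pick_data_center(1.0, prices, carbon_rates, queues, V=10.0)
# With empty queues only price matters, so the cheap data center wins.

queues["A"] = 5.0  # pretend "A" has accumulated a carbon backlog
second_choice = pick_data_center(1.0, prices, carbon_rates, queues, V=10.0)
```

A larger `V` favors low cost and tolerates longer carbon backlogs; a smaller `V` enforces the budget more aggressively. With the inputs above, the backlog on "A" is enough to shift the second job to "B".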
Efficient Location-Aware Data Placement for Data-Intensive Applications in Geo-distributed Scientific Data Centers (cited by: 3)
3
Authors: Jinghui Zhang, Jian Chen, Junzhou Luo, Aibo Song. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2016, No. 5, pp. 471-481 (11 pages)
Recent developments in cloud computing and big data have spurred the emergence of data-intensive applications for which massive scientific datasets are stored in globally distributed scientific data centers that have a high frequency of data access by scientists worldwide. Multiple associated data items distributed in different scientific data centers may be requested for one data processing task, and data placement decisions must respect the storage capacity limits of the scientific data centers. Therefore, the optimization of data access cost in the placement of data items in globally distributed scientific data centers has become an increasingly important goal. Existing data placement approaches for geo-distributed data items are insufficient because they either cannot cope with the cost incurred by the associated data access, or they overlook storage capacity limitations, which are a very practical constraint of scientific data centers. In this paper, inspired by applications in the field of high energy physics, we propose an integer-programming-based data placement model that addresses the above challenges as a Non-deterministic Polynomial-time (NP)-hard problem. In addition, we use a Lagrangian-relaxation-based heuristic algorithm to obtain ideal data placement solutions. Our simulation results demonstrate that our algorithm is effective and significantly reduces overall data access cost.
Keywords: data placement; geo-distributed data center; Lagrangian relaxation
CRL: Efficient Concurrent Regeneration Codes with Local Reconstruction in Geo-Distributed Storage Systems (cited by: 1)
4
Authors: Quan-Qing Xu, Wei-Ya Xi, Khai Leong Yong, Chao Jin. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2018, No. 6, pp. 1140-1151 (12 pages)
As a typical erasure coding choice, Reed-Solomon (RS) codes have such high repair cost that there is a penalty for high reliability and storage efficiency; they are therefore not suitable for geo-distributed storage systems. We present a novel family of concurrent regeneration codes with local reconstruction (CRL) in this paper. The CRL codes enjoy three benefits. Firstly, they are able to minimize the network bandwidth for node repair. Secondly, they can reduce the number of accessed nodes by calculating parities from a subset of data chunks and using an implied parity chunk. Thirdly, they are faster than existing erasure codes for reconstruction in geo-distributed storage systems. In addition, we demonstrate how the CRL codes overcome the limitations of the Reed-Solomon codes. We also illustrate analytically that they are excellent in the trade-off between chunk locality and minimum distance. Furthermore, we present theoretical analysis including latency analysis and reliability analysis for the CRL codes. By using quantity comparisons, we prove that the data reconstruction time of CRL(6, 2, 2) is only 0.657x that of Azure LRC(6, 2, 2), where there are six data chunks, two global parities, and two local parities, and that the data reconstruction time of CRL(10, 4, 2) is only 0.656x that of HDFS-Xorbas(10, 4, 2), where there are 10 data chunks, four local parities, and two global parities. Our experimental results show the performance of CRL through evaluations in two kinds of environments: 1) its encoding and decoding throughputs in memory are at least 57.25% and 66.85% higher than those of its competitors, and 2) its encoding and decoding throughputs in JBOD (Just a Bunch Of Disks) are at least 1.46x and 1.21x those of its competitors. We also illustrate that the encoding and decoding throughputs of CRL are 28.79% and 30.19% higher than those of LRC in a geo-distributed environment.
Keywords: concurrent regeneration code; local reconstruction; geo-distributed storage system
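The benefit of local reconstruction that this abstract relies on can be seen with plain XOR parities. The sketch below is not the CRL construction and omits global parities entirely; it only shows, for an assumed layout of six data chunks in two local groups of three, that a single lost chunk can be rebuilt from its own group. The helper names and chunk contents are hypothetical.

```python
from functools import reduce


def xor_chunks(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# Assumed layout: six data chunks, two local groups, one XOR parity per group.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE", b"FFFF"]
groups = [[0, 1, 2], [3, 4, 5]]
local_parities = [xor_chunks([data[i] for i in group]) for group in groups]


def repair(lost: int,
           data: list[bytes],
           groups: list[list[int]],
           local_parities: list[bytes]) -> bytes:
    """Rebuild one lost data chunk from the survivors in its local group."""
    group_index = next(k for k, group in enumerate(groups) if lost in group)
    survivors = [data[i] for i in groups[group_index] if i != lost]
    # XOR of the surviving chunks and the group parity cancels everything
    # except the lost chunk.
    return xor_chunks(survivors + [local_parities[group_index]])
```

Here `repair(1, data, groups, local_parities)` reads three chunks (two survivors plus one local parity), whereas repairing one chunk of a Reed-Solomon code with six data chunks requires reading six chunks.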
Training Large Models on Heterogeneous and Geo-Distributed Resource with Constricted Networks
5
Authors: Zan Zong, Minkun Guo, Mingshu Zhai, Yinan Tang, Jianjiang Li, Jidong Zhai. Big Data Mining and Analytics, 2025, No. 4, pp. 966-980 (15 pages)
As the computational demands driven by large model technologies continue to grow rapidly, leveraging GPU hardware to expedite parallel training processes has emerged as a commonly used strategy. When computational resources within a single cluster are insufficient for large-model training, the hybrid utilization of heterogeneous acceleration hardware has emerged as a promising technical solution. The utilization of heterogeneous acceleration hardware and scheduling of diverse cloud resources have become a focal point of considerable interest. However, these computing resources are often geographically distributed. Due to the lack of awareness of heterogeneous devices and network topologies, existing parallel training frameworks struggle to leverage mixed GPU resources across constrained networks effectively. To boost the computing capability of the connected heterogeneous clusters, we propose HGTrainer, an optimizer designed to plan heterogeneous parallel strategies across distributed clusters for large model training. HGTrainer can adaptively saturate heterogeneous clusters because of the expanded tunable parallelism space for heterogeneous accelerators, with the awareness of relatively lower inter-cluster bandwidth. To achieve this goal, we formulate the model partitioning problem among heterogeneous hardware and introduce a hierarchical searching algorithm to solve the optimization problem. Besides, a mixed-precision pipeline method is used to reduce the cost of inter-cluster communications. We evaluate HGTrainer on heterogeneous connected clusters with popular large language models. The experimental results show that HGTrainer improves training throughput by 1.49× on average for the mixed heterogeneous cluster compared with the state-of-the-art Metis.
Keywords: deep learning system; large model training; heterogeneous geo-distributed clusters
Wide Area Analytics for Geographically Distributed Datacenters (cited by: 1)
6
Authors: Siqi Ji, Baochun Li. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2016, No. 2, pp. 125-135 (11 pages)
Big data analytics, the process of organizing and analyzing data to get useful information, is one of the primary uses of cloud services today. Traditionally, collections of data are stored and processed in a single datacenter. As the volume of data grows at a tremendous rate, it is less efficient for only one datacenter to handle such large volumes of data from a performance point of view. Large cloud service providers are deploying datacenters geographically around the world for better performance and availability. A widely used approach for analytics of geo-distributed data is the centralized approach, which aggregates all the raw data from local datacenters to a central datacenter. However, it has been observed that this approach consumes a significant amount of bandwidth, leading to worse performance. A number of mechanisms have been proposed to achieve optimal performance when data analytics are performed over geo-distributed datacenters. In this paper, we present a survey on the representative mechanisms proposed in the literature for wide area analytics. We discuss basic ideas, present proposed architectures and mechanisms, and discuss several examples to illustrate existing work. We point out the limitations of these mechanisms, give comparisons, and conclude with our thoughts on future research directions.
Keywords: big data; analytics; geo-distributed datacenters