Funding: This work was supported by the National Key Research and Development Program of China (2021YFB2900603) and the National Natural Science Foundation of China (61831008).
Abstract: A dynamic multi-beam resource allocation algorithm for large low Earth orbit (LEO) constellations based on on-board distributed computing is proposed in this paper. The allocation is a combinatorial optimization process under a series of complex constraints, and it is important for matching resources to requirements. A complex algorithm is not practical because LEO on-board resources are limited. The proposed genetic algorithm (GA), based on a two-dimensional individual model and an uncorrelated single-parent inheritance method, is designed to support distributed computation and enhance the feasibility of on-board application. A distributed system composed of eight embedded devices is built to verify the algorithm. A typical scenario is built in the system to evaluate the resource allocation process, the algorithm's mathematical model, the trigger strategy, and the distributed computation architecture. According to the simulation and measurement results, the proposed algorithm can produce an allocation result for more than 1,500 tasks within 14 s, with a success rate above 91% in a typical scene. The response time is decreased by 40% compared with the conventional GA.
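To make the two-dimensional individual model and single-parent inheritance concrete, the following Python sketch encodes an individual as a beam-by-time-slot assignment matrix and reproduces by mutation only, so islands could evolve on separate processors without exchanging genomes. This is a hypothetical illustration, not the paper's implementation: the dimensions, mutation rate, and toy fitness function are all assumptions.

```python
import random

# Toy dimensions: an individual is a 2-D matrix assigning one of N_TASKS
# task identifiers (or -1 for idle) to every (beam, time slot) cell.
N_BEAMS, N_SLOTS, N_TASKS = 8, 16, 40
POP_SIZE, GENERATIONS, MUTATION_RATE = 30, 200, 0.05


def random_individual():
    return [[random.randrange(-1, N_TASKS) for _ in range(N_SLOTS)]
            for _ in range(N_BEAMS)]


def fitness(ind):
    # Toy objective: number of distinct tasks served, penalizing a task
    # scheduled on two beams in the same slot (a stand-in for the real
    # constraint set, which is not specified here).
    served = {cell for row in ind for cell in row if cell >= 0}
    conflicts = 0
    for slot in range(N_SLOTS):
        col = [ind[b][slot] for b in range(N_BEAMS) if ind[b][slot] >= 0]
        conflicts += len(col) - len(set(col))
    return len(served) - 2 * conflicts


def mutate(parent):
    # Uncorrelated single-parent inheritance: the child is a mutated copy of
    # one parent, so no crossover couples individuals across compute nodes.
    child = [row[:] for row in parent]
    for b in range(N_BEAMS):
        for s in range(N_SLOTS):
            if random.random() < MUTATION_RATE:
                child[b][s] = random.randrange(-1, N_TASKS)
    return child


def evolve():
    pop = [random_individual() for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: POP_SIZE // 2]               # keep the better half
        pop = elite + [mutate(p) for p in elite]   # children from single parents
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("best fitness:", fitness(best))
```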
Abstract: Infrastructure as a Service (IaaS) in cloud computing enables flexible resource distribution over the Internet, but achieving optimal scheduling remains a challenge. Effective resource allocation in cloud-based environments, particularly within the IaaS model, poses persistent difficulties. Existing methods often struggle with slow optimization, imbalanced workload distribution, and inefficient use of available assets. These limitations result in longer processing times, increased operational expenses, and inadequate resource deployment, particularly under fluctuating demands. To overcome these issues, a novel Clustered Input-Oriented Salp Swarm Algorithm (CIOSSA) is introduced. This approach combines two distinct strategies: Task Splitting Agglomerative Clustering (TSAC) with an Input-Oriented Salp Swarm Algorithm (IOSSA), which prioritizes tasks based on urgency, and a refined multi-leader model that accelerates the optimization process, enhancing both speed and accuracy. By continuously assessing system capacity before task distribution, the model ensures that assets are deployed effectively and costs are controlled. The dual-leader technique expands the potential solution space, leading to substantial gains in processing speed, cost-effectiveness, asset efficiency, and system throughput, as demonstrated by comprehensive tests. As a result, the proposed model outperforms existing approaches in terms of makespan, resource utilization, throughput, and convergence speed, demonstrating that CIOSSA is scalable, reliable, and appropriate for the dynamic settings found in cloud computing.
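For context, the following Python sketch shows the standard salp swarm update rule that IOSSA builds on: a single leading salp explores around the best-known food source and followers chain behind it; CIOSSA's multi-leader refinement would let several salps use the leader rule at once. The sphere objective, bounds, and population sizes are assumptions, not the paper's benchmark.

```python
import math
import random


def sphere(x):
    # Placeholder objective; in CIOSSA this would be a makespan/cost model.
    return sum(v * v for v in x)


def salp_swarm(obj, dim=10, pop=30, iters=200, lb=-10.0, ub=10.0):
    X = [[random.uniform(lb, ub) for _ in range(dim)] for _ in range(pop)]
    food, food_fit = min(((x, obj(x)) for x in X), key=lambda t: t[1])

    for l in range(1, iters + 1):
        c1 = 2 * math.exp(-(4 * l / iters) ** 2)   # exploration/exploitation balance
        for i in range(pop):
            if i == 0:                             # leader salp explores around food
                for j in range(dim):
                    c2, c3 = random.random(), random.random()
                    step = c1 * ((ub - lb) * c2 + lb)
                    X[i][j] = food[j] + step if c3 >= 0.5 else food[j] - step
            else:                                  # followers average with predecessor
                X[i] = [(X[i][j] + X[i - 1][j]) / 2 for j in range(dim)]
            X[i] = [min(max(v, lb), ub) for v in X[i]]   # clamp to bounds
            fit = obj(X[i])
            if fit < food_fit:
                food, food_fit = X[i][:], fit
    return food, food_fit


if __name__ == "__main__":
    best, best_fit = salp_swarm(sphere)
    print("best objective value:", round(best_fit, 6))
```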
Abstract: Cloud computing is an advanced computing model in which applications, data, and countless IT services are provided over the Internet. Task scheduling plays a crucial role in cloud computing systems. The task scheduling problem can be viewed as finding an optimal mapping or assignment of the subtasks of different tasks onto the available set of resources so that the desired goals for the tasks are achieved. As the number of cloud users grows, more tasks need to be scheduled, and the cloud's performance depends on the task scheduling algorithms used. Numerous algorithms have been proposed in the past to solve the task scheduling problem for heterogeneous networks of computers. Existing work proposes energy- and deadline-aware task scheduling methods for data-intensive applications. A scientific workflow is a combination of fine-grained and coarse-grained tasks, and every task scheduled to a VM incurs system overhead; if many fine-grained tasks execute in a scientific workflow, the scheduling overhead increases. To reduce this overhead, multiple small tasks are combined into a larger task, which decreases the scheduling overhead and improves the execution time of the workflow. Horizontal clustering is used to cluster the fine-grained tasks, and a replication technique is combined with it. The proposed scheduling algorithm improves performance metrics such as execution time and cost. This research can be further extended with improved clustering techniques and replication methods.
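The horizontal-clustering idea can be illustrated with the short Python sketch below: fine-grained tasks at the same workflow depth are merged into bundles so that each bundle pays the per-job scheduling overhead only once. The workflow representation, runtime threshold, and bundle size are assumptions made for illustration, not values from the paper.

```python
from collections import defaultdict

# A toy workflow: task -> (runtime_seconds, depth_in_workflow).
# "Fine-grained" tasks are those whose runtime is below a threshold.
TASKS = {
    "t1": (2, 1), "t2": (3, 1), "t3": (1, 1), "t4": (2, 1),
    "t5": (40, 2), "t6": (1, 2), "t7": (2, 2), "t8": (1, 2),
}
FINE_GRAIN_THRESHOLD = 5     # seconds (assumed)
CLUSTER_SIZE = 2             # fine-grained tasks merged per clustered job (assumed)


def horizontal_clustering(tasks, threshold, k):
    """Merge fine-grained tasks at the same workflow depth into bundles of k,
    so each bundle is submitted to a VM as a single job."""
    by_depth = defaultdict(list)
    coarse = []
    for name, (runtime, depth) in tasks.items():
        (by_depth[depth] if runtime < threshold else coarse).append(name)

    clustered_jobs = [[name] for name in coarse]     # coarse tasks stay as-is
    for depth, fine in sorted(by_depth.items()):
        for i in range(0, len(fine), k):
            clustered_jobs.append(fine[i:i + k])     # one job per bundle
    return clustered_jobs


if __name__ == "__main__":
    for job in horizontal_clustering(TASKS, FINE_GRAIN_THRESHOLD, CLUSTER_SIZE):
        print("job:", job)
```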
Abstract: In many applications, such as computational fluid dynamics, weather prediction, image processing, and the state analysis of Markov chains, the order n of the matrix is often very large, and no serial algorithm can solve the problem. A distributed cluster-based solution for very large linear systems is discussed; it includes the definitions of notations, the partition of the matrix, the communication mechanism, and a master-slave algorithm. The computing cost is O(n^3/N), the memory cost is O(n^2/N), the I/O cost is O(n^2/N), and the communication cost is O(Nn), where N is the number of computing nodes or processes. Tests show that the solution can effectively solve double-precision matrices of size up to 10^6 × 10^6.
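The row-block partition behind the per-node O(n^2/N) memory figure can be sketched as follows. This hypothetical Python/NumPy example simulates N nodes that each own a contiguous block of rows and perform one Jacobi sweep on their slice, with the "master" gathering and broadcasting the updated vector; the paper's actual master-slave algorithm is elimination-based, so this only illustrates the partitioning idea.

```python
import numpy as np


def distributed_jacobi(A, b, n_nodes=4, iters=200):
    """Row-block partitioned Jacobi: node k updates only rows [lo, hi) of x."""
    n = len(b)
    bounds = np.linspace(0, n, n_nodes + 1, dtype=int)   # row ranges per node
    x = np.zeros(n)
    D = np.diag(A)
    for _ in range(iters):
        new_parts = []
        for k in range(n_nodes):                          # the work one slave would do
            lo, hi = bounds[k], bounds[k + 1]
            rows = A[lo:hi]
            # x_i <- (b_i - sum_{j != i} a_ij x_j) / a_ii for the owned rows
            sums = rows @ x - D[lo:hi] * x[lo:hi]
            new_parts.append((b[lo:hi] - sums) / D[lo:hi])
        x = np.concatenate(new_parts)                     # master gathers + broadcasts
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 400
    A = rng.standard_normal((n, n))
    np.fill_diagonal(A, np.abs(A).sum(axis=1) + 1.0)      # strictly diagonally dominant
    b = rng.standard_normal(n)
    x = distributed_jacobi(A, b)
    print("residual norm:", np.linalg.norm(A @ x - b))
```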
Funding: Supported by the National Basic Research Program of China (973 Program) (No. 2012CB821200 (2012CB821206)), the National Natural Science Foundation of China (No. 61003281, No. 91024001, and No. 61070142), the Beijing Natural Science Foundation (Study on Internet Multi-mode Area Information Accurate Searching and Mining Based on Agent, No. 4111002), and the Chinese Universities Scientific Fund under Grant No. BUPT 2009RC0201.
Abstract: Cloud computing is a new paradigm in which dynamic and virtualized computing resources are provided as services over the Internet. However, because cloud resources are open and dynamically configured, resource allocation and scheduling are extremely important challenges in cloud infrastructure. Based on distributed agents, this paper presents a trusted data acquisition mechanism for efficiently scheduling cloud resources to satisfy various user requests. The mechanism defines, collects, and analyzes multiple key trust targets of cloud service resources based on the historical information of servers in a cloud data center. As a result, using this trust computing mechanism, cloud providers can utilize their resources efficiently and also provide highly trusted resources and services to many users.
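A minimal sketch of the trust-based selection idea follows: historical per-server metrics gathered by agents are combined into a weighted trust score, and requests are handed to the most trusted servers. The metric names, weights, and normalization are assumptions for illustration; the paper's actual trust targets are not reproduced here.

```python
# Hypothetical per-server history collected by distributed agents.
HISTORY = {
    "server-a": {"availability": 0.999, "task_success": 0.97, "avg_resp_ms": 120},
    "server-b": {"availability": 0.990, "task_success": 0.99, "avg_resp_ms": 80},
    "server-c": {"availability": 0.950, "task_success": 0.90, "avg_resp_ms": 300},
}
WEIGHTS = {"availability": 0.4, "task_success": 0.4, "latency": 0.2}  # assumed


def trust_score(metrics, max_resp_ms=500.0):
    # Normalize response time into [0, 1] so faster servers score higher,
    # then combine the trust targets with fixed weights.
    latency_score = max(0.0, 1.0 - metrics["avg_resp_ms"] / max_resp_ms)
    return (WEIGHTS["availability"] * metrics["availability"]
            + WEIGHTS["task_success"] * metrics["task_success"]
            + WEIGHTS["latency"] * latency_score)


def schedule(history, n_needed=2):
    # Rank servers by trust and route the request to the most trusted ones.
    ranked = sorted(history, key=lambda s: trust_score(history[s]), reverse=True)
    return ranked[:n_needed]


if __name__ == "__main__":
    for name in schedule(HISTORY):
        print(name, round(trust_score(HISTORY[name]), 3))
```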
Funding: Supported in part by the Key Research and Development Program of Shaanxi under Grant 2023-ZDLGY-34.
Abstract: Spark performs excellently in large-scale data-parallel computing and iterative processing. However, with the increase in data size and program complexity, the default scheduling strategy has difficulty meeting the demands of resource utilization and performance optimization. Scheduling strategy optimization, as a key direction for improving Spark's execution efficiency, has attracted widespread attention. This paper first introduces the basic theories of Spark, compares several default scheduling strategies, and discusses common scheduling performance evaluation indicators and factors affecting scheduling efficiency. Subsequently, existing scheduling optimization schemes are summarized according to three scheduling modes: load characteristics, cluster characteristics, and the matching of both. Representative algorithms are analyzed in terms of performance indicators and applicable scenarios, and the advantages and disadvantages of the different scheduling modes are compared. The article also explores in detail the integration of Spark scheduling strategies with specific application scenarios and the challenges encountered in production environments. Finally, the limitations of the existing schemes are analyzed and future prospects are outlined.
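As a small point of reference for the "default scheduling strategies" mentioned above, the PySpark snippet below switches the intra-application job scheduler from the default FIFO mode to FAIR mode and tags jobs with a scheduler pool. The application name, master URL, and pool name are assumptions; the allocation file that defines pool weights is omitted.

```python
from pyspark import SparkConf, SparkContext

# Switch Spark's intra-application job scheduling from the default FIFO mode
# to FAIR mode, so jobs submitted from different threads share executors.
conf = (SparkConf()
        .setAppName("scheduling-mode-demo")
        .setMaster("local[4]")                       # assumed local test setup
        .set("spark.scheduler.mode", "FAIR"))
sc = SparkContext(conf=conf)

# Jobs can then be tagged with a fair-scheduler pool (pools are defined in an
# allocation file referenced by spark.scheduler.allocation.file; omitted here).
sc.setLocalProperty("spark.scheduler.pool", "interactive")

result = sc.parallelize(list(range(100_000))).map(lambda x: x * x).sum()
print("sum of squares:", result)

sc.stop()
```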
Funding: This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 701697, the Major Program of the National Social Science Fund of China (Grant No. 17ZDA092), the Basic Research Programs (Natural Science Foundation) of Jiangsu Province (BK20180794), the 333 High-Level Talent Cultivation Project of Jiangsu Province (BRA2018332), and the PAPD fund.
Abstract: Hadoop is a well-known parallel computing system for distributed computing and large-scale data processing. "Straggling" tasks, however, have a serious impact on task allocation and scheduling in a Hadoop system. Speculative Execution (SE) is an efficient method of handling straggling tasks: it monitors the real-time running status of tasks and selectively backs up stragglers on another node to increase the chance of completing the entire job early. Existing speculative execution strategies face challenges such as the misjudgment of straggling tasks and the improper selection of backup nodes, which lead to inefficient speculative execution. This paper proposes an Optimized Resource Scheduling strategy for Speculative Execution (ORSE) by introducing non-cooperative game schemes. ORSE transforms the resource scheduling of backup tasks into a multi-party non-cooperative game problem, where the tasks are regarded as game participants and the total task execution time of the entire cluster as the utility function. The most beneficial strategy for each computing node is then obtained when the game reaches a Nash equilibrium point, i.e., the final resource scheduling scheme. The strategy has been implemented in Hadoop-2.x. Experimental results show that ORSE maintains the efficiency of speculative execution and improves fault tolerance and computation performance under Normal Load, Busy Load, and Busy Load with Skewed Data.
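The game-theoretic formulation can be made concrete with a toy best-response iteration: each straggling task repeatedly re-picks the backup node that minimizes the estimated cluster finish time given the other tasks' current choices, and the loop stops when no task wants to deviate, i.e., at a Nash equilibrium. The workloads, node speeds, and the makespan-style cost model below are assumptions standing in for ORSE's actual utility function.

```python
# Toy best-response dynamics for placing backup copies of straggling tasks.
STRAGGLERS = {"t1": 30, "t2": 25, "t3": 40, "t4": 20}      # remaining work (s), assumed
NODES = {"n1": 1.0, "n2": 1.5, "n3": 0.8}                  # relative node speed, assumed


def makespan(assignment):
    # Finish time of each node = backed-up work placed on it / its speed.
    load = {n: 0.0 for n in NODES}
    for task, node in assignment.items():
        load[node] += STRAGGLERS[task] / NODES[node]
    return max(load.values())


def best_response_equilibrium(max_rounds=100):
    assignment = {t: "n1" for t in STRAGGLERS}             # arbitrary starting point
    for _ in range(max_rounds):
        changed = False
        for task in STRAGGLERS:
            # The task evaluates every node while the others hold their choices fixed.
            best = min(NODES, key=lambda n: makespan({**assignment, task: n}))
            if makespan({**assignment, task: best}) < makespan(assignment):
                assignment[task] = best
                changed = True
        if not changed:                                     # Nash equilibrium reached
            break
    return assignment


if __name__ == "__main__":
    eq = best_response_equilibrium()
    print("backup placement:", eq, "| makespan:", round(makespan(eq), 2), "s")
```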
Abstract: In this paper, a prefetching technique is proposed to solve the performance problem caused by remote data access delay. In this technique, the map tasks that will cause the delay are predicted first, and then the input data of these tasks are preloaded before the tasks are scheduled. During execution, the input data can be read from local nodes, so the delay is hidden. The technique has been implemented in Hadoop-0.20.1. The experimental results show that the technique reduces the number of map tasks causing delay and improves the performance of Hadoop MapReduce by 20%.
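The prediction-then-preload flow can be sketched as follows: tasks whose input block is not stored on the node they are expected to run on are flagged as future remote readers, and their blocks are copied into the target node's local cache before scheduling. The placement tables, file layout, and node names are invented for illustration and do not reflect Hadoop's internal data structures.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical placement metadata: which node holds each map task's input block,
# and which node the task is expected to be scheduled on.
BLOCK_LOCATION = {"map-1": "nodeA", "map-2": "nodeB", "map-3": "nodeA"}
EXPECTED_NODE = {"map-1": "nodeA", "map-2": "nodeC", "map-3": "nodeB"}


def predict_remote_tasks():
    # A task will suffer remote-read delay if its block lives on a different
    # node from the one it is expected to run on.
    return [t for t in BLOCK_LOCATION if BLOCK_LOCATION[t] != EXPECTED_NODE[t]]


def prefetch(task, local_cache: Path, block_store: Path):
    # Copy the task's input block into the target node's local cache before
    # the task is scheduled, so the later read is served locally.
    src = block_store / f"{task}.block"
    dst = local_cache / f"{task}.block"
    shutil.copyfile(src, dst)
    return dst


if __name__ == "__main__":
    store = Path(tempfile.mkdtemp(prefix="blocks-"))
    cache = Path(tempfile.mkdtemp(prefix="cache-"))
    for t in BLOCK_LOCATION:                       # fake some input blocks
        (store / f"{t}.block").write_text(f"input of {t}")

    for t in predict_remote_tasks():
        print("prefetching", t, "->", prefetch(t, cache, store))
```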
Abstract: The development of computational agent organizations or "societies" has become the dominant computing paradigm in the arena of Distributed Artificial Intelligence, and many foreseeable future applications need agent organizations in which diversified agents cooperate in a distributed manner, forming teams. In such scenarios, the agents need to know each other in order to facilitate their interactions. Moreover, agents in such an environment are not statically defined in advance; they can adaptively enter and leave an organization. This raises the question of how agents locate each other in order to cooperate in achieving organizational goals. Locating agents is a challenging task, especially in organizations that involve a large number of agents and where resource availability is intermittent. The authors explore an approach based on the self-organizing map (SOM), which serves as a clustering method in light of the knowledge gathered about the various agents. The approach begins by categorizing agents using a selected set of agent properties. These categories are used to derive various ranks and a distance matrix. The SOM algorithm uses this matrix as input to obtain clusters of agents. These clusters reduce the search space, resulting in a relatively short agent search time.
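A minimal SOM over agent feature vectors is sketched below to show how the trained map groups similar agents onto nearby neurons, shrinking the search space to one cluster per lookup. For simplicity it trains directly on random agent property vectors rather than on the rank-derived distance matrix described in the abstract; grid size, learning-rate schedule, and data are assumptions.

```python
import numpy as np


def train_som(data, grid=(4, 4), iters=500, lr0=0.5, sigma0=1.5, seed=0):
    """Minimal online SOM: returns neuron weight vectors arranged on a 2-D grid."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    dim = data.shape[1]
    weights = rng.random((rows, cols, dim))
    coords = np.array([[r, c] for r in range(rows) for c in range(cols)],
                      dtype=float).reshape(rows, cols, 2)
    for t in range(iters):
        lr = lr0 * (1 - t / iters)                  # decaying learning rate
        sigma = sigma0 * (1 - t / iters) + 0.1      # decaying neighborhood radius
        x = data[rng.integers(len(data))]
        dists = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(dists.argmin(), dists.shape)    # best matching unit
        grid_d = np.linalg.norm(coords - coords[bmu], axis=2)
        h = np.exp(-(grid_d ** 2) / (2 * sigma ** 2))[..., None]  # neighborhood weight
        weights += lr * h * (x - weights)
    return weights


def assign_clusters(data, weights):
    flat = weights.reshape(-1, weights.shape[-1])
    return [int(np.linalg.norm(flat - x, axis=1).argmin()) for x in data]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    agents = rng.random((60, 5))        # hypothetical agent property vectors in [0, 1]
    som = train_som(agents)
    clusters = assign_clusters(agents, som)
    print("agents per neuron cluster:", {c: clusters.count(c) for c in set(clusters)})
```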
Abstract: Task scheduling is a key problem in distributed computation. This thesis analyzes the receiver-initiated (RI) task scheduling algorithm, identifies its weaknesses, and presents an improved algorithm, the PRI algorithm. This algorithm schedules concurrent tasks onto a network of workstations dynamically at runtime and initiates task scheduling from lightly loaded nodes. The threshold on each node can be modified according to the system information that is periodically detected, and the detection period can be adjusted in response to changes in the system state. Experimental results show that the PRI algorithm is superior to the RI algorithm.
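A rough event-loop sketch of the receiver-initiated idea with adaptive parameters is given below: under-loaded nodes pull work from the most loaded node, each node's threshold tracks the cluster-average load, and the detection period shrinks when loads change quickly. The adaptation rules and all constants are assumptions, not the PRI algorithm's actual policy.

```python
import random


class Node:
    def __init__(self, name, threshold=2):
        self.name = name
        self.queue = []            # pending tasks on this node
        self.threshold = threshold

    def load(self):
        return len(self.queue)


def pri_round(nodes, detect_period, last_loads):
    """One detection round: receivers below threshold pull work; parameters adapt."""
    for node in nodes:
        # Receiver-initiated: a node below its threshold asks the busiest node for work.
        if node.load() < node.threshold:
            donor = max(nodes, key=Node.load)
            if donor is not node and donor.load() > donor.threshold:
                node.queue.append(donor.queue.pop())        # transfer one task
        # Adapt the threshold toward the current cluster-average load.
        avg = sum(n.load() for n in nodes) / len(nodes)
        node.threshold = max(1, round(avg))
    # Adapt the detection period: poll faster when loads are changing quickly.
    loads = [n.load() for n in nodes]
    change = sum(abs(a - b) for a, b in zip(loads, last_loads))
    detect_period = max(0.1, min(5.0, detect_period * (0.8 if change > 2 else 1.2)))
    return detect_period, loads


if __name__ == "__main__":
    random.seed(0)
    nodes = [Node(f"w{i}") for i in range(4)]
    for _ in range(20):                                      # start with imbalanced load
        random.choice(nodes[:2]).queue.append("task")
    period, last = 1.0, [n.load() for n in nodes]
    for _ in range(10):
        period, last = pri_round(nodes, period, last)
    print("final loads:", last, "| detection period:", round(period, 2), "s")
```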
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62072434 and U23B2004, and the Innovation Funding of the Institute of Computing Technology, Chinese Academy of Sciences, under Grant Nos. E361050 and E361030.
Abstract: Edge machine learning creates a new computational paradigm by enabling the deployment of intelligent applications at the network edge. It enhances application efficiency and responsiveness by performing inference and training tasks closer to data sources. However, it encounters several challenges in practice. The variance in hardware specifications and performance across different devices presents a major issue for training and inference tasks. Additionally, edge devices typically possess limited network bandwidth and computing resources compared with data centers. Moreover, existing distributed training architectures often fail to consider the constraints on resources and communication efficiency in edge environments. In this paper, we propose DSparse, a method for distributed training based on sparse update in edge clusters with heterogeneous memory capacities. It aims to maximize the utilization of memory resources across all devices within a cluster. To reduce memory consumption during training, we adopt sparse update to prioritize the updating of selected layers on the devices in the cluster, which not only lowers memory usage but also reduces the volume of parameter data and the time required for parameter aggregation. Furthermore, DSparse utilizes a parameter aggregation mechanism based on multi-process groups, subdividing the aggregation tasks into AllReduce and Broadcast types, thereby further reducing the communication frequency for parameter aggregation. Experimental results using the MobileNetV2 model on the CIFAR-10 dataset demonstrate that DSparse reduces memory consumption by an average of 59.6% across seven devices, with a 75.4% reduction in parameter aggregation time, while maintaining model precision.
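To illustrate the sparse-update selection step in isolation, the sketch below greedily chooses which layers each device updates under its memory budget (preferring layers near the output) and freezes the rest; only the chosen layers' gradients would then enter the AllReduce group, while frozen layers would only receive Broadcast updates. The layer sizes, budgets, and cost factor are invented numbers, and this is not the actual DSparse selection algorithm.

```python
# Hypothetical per-layer parameter counts for a small CNN (in thousands).
LAYERS = {"conv1": 30, "block1": 150, "block2": 300, "block3": 600, "fc": 120}
# Per-device training memory budgets, in the same (arbitrary) units.
DEVICE_BUDGET = {"dev0": 1300, "dev1": 900, "dev2": 500, "dev3": 300}
ACTIVATION_FACTOR = 2.0   # assumed: updating a layer costs ~2x its params in memory


def select_update_layers(budget):
    """Greedy sparse-update selection: prefer layers closest to the output
    (usually the most task-specific) and freeze the rest to fit the budget."""
    chosen, used = [], 0.0
    for name in reversed(list(LAYERS)):          # fc first, conv1 last
        cost = LAYERS[name] * ACTIVATION_FACTOR
        if used + cost <= budget:
            chosen.append(name)
            used += cost
    return chosen, used


if __name__ == "__main__":
    for dev, budget in DEVICE_BUDGET.items():
        layers, used = select_update_layers(budget)
        frozen = [l for l in LAYERS if l not in layers]
        # Chosen layers' gradients -> AllReduce group; frozen layers -> Broadcast only.
        print(f"{dev}: update {layers} (mem {used:.0f}/{budget}), freeze {frozen}")
```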
Funding: This work was supported by the National Natural Science Foundation of China (No. 51877076).
Abstract: High penetration of renewable energies enlarges the peak-valley difference of the net load of the distribution system, which places higher requirements on the operation scheduling of the distribution system. From the perspective of leveraging demand-side adjustment capabilities, an optimal scheduling method for the distribution system with edge computing and data-driven modeling of price-based demand response (PBDR) is proposed. By introducing the edge computing paradigm, a collaborative interaction framework between the control center and the edge nodes is designed for the optimization of the distribution system. At the edge nodes, a classified XGBoost-based PBDR modeling method is proposed for large-scale differentiated users. At the control center, a two-stage optimization method integrating pre-scheduling and re-scheduling is proposed based on the demand response results from all edge nodes. Through the information interaction between the control center and the edge nodes, the optimized scheduling of the distribution system with large-scale users is realized. Finally, a case study is implemented on the modified IEEE 33-node system, which verifies that the proposed classified modeling method has lower errors and is beneficial to the economics of system operation. Moreover, the simulation results show that the application of edge computing can significantly reduce the calculation time of the optimal scheduling problem with PBDR modeling of large-scale users.
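The division of labor between edge nodes and the control center can be sketched as follows: each edge node fits one PBDR model per user class from local history (any regressor with a fit/predict interface; an XGBoost regressor could be dropped in) and returns the aggregate load reduction for a candidate price signal, while the control center pre-schedules, collects the edge responses, and re-schedules. The data, user classes, price signal, and the simple averaging model below are all assumptions made for illustration.

```python
import random
from statistics import mean

random.seed(0)


class MeanResponseModel:
    """Stand-in per-class PBDR model with a fit/predict interface; an XGBoost
    regressor trained on (price, load-reduction) samples could replace it."""
    def fit(self, prices, reductions):
        self.slope = mean(r / p for p, r in zip(prices, reductions))
        return self

    def predict(self, price):
        return self.slope * price


def edge_node(user_classes):
    # Each edge node trains one model per user class from local history.
    models = {cls: MeanResponseModel().fit([p for p, _ in hist], [r for _, r in hist])
              for cls, hist in user_classes.items()}

    def respond(price_signal):
        # Aggregate predicted load reduction (kW) for this node's users.
        return sum(m.predict(price_signal) for m in models.values())
    return respond


def control_center(edge_responders, base_peak_kw=1000.0):
    price = 0.2                                   # pre-scheduling price signal (assumed)
    pre_peak = base_peak_kw                       # stage 1: schedule without PBDR
    total_reduction = sum(r(price) for r in edge_responders)   # edge feedback
    re_peak = base_peak_kw - total_reduction      # stage 2: re-schedule with PBDR results
    return pre_peak, re_peak


if __name__ == "__main__":
    def fake_history(slope):
        return [(p, slope * p + random.uniform(-1, 1)) for p in (0.1, 0.15, 0.2, 0.25)]

    edges = [edge_node({"residential": fake_history(300),
                        "commercial": fake_history(500)}) for _ in range(3)]
    pre, re = control_center(edges)
    print(f"pre-scheduling peak: {pre:.0f} kW, re-scheduling peak: {re:.0f} kW")
```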
Funding: Supported by the MEyC under Grant No. TIN 2008-05913.
Abstract: This paper proposes a prediction engine designed for non-dedicated clusters, which is able to estimate the turnaround time of parallel applications even in the presence of the serial workload of the workstation owner. The prediction engine can be configured to work with three different estimation kernels: a Historical kernel, a Simulation kernel based on analytical models, and an integration of both, named the Hybrid kernel. These estimation proposals were integrated into a scheduling system, named CISNE, which can be executed in an on-line or off-line mode. The accuracy of the proposed estimation methods was evaluated for different job scheduling policies in both a real and a simulated cluster environment. In both environments, we observed that the Hybrid system gives the best results because it combines the ability of a simulation engine to capture the dynamism of a non-dedicated environment with the accuracy of the historical methods in estimating application runtime while considering the state of the resources.
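The relationship between the three kernels can be shown with a toy sketch: a historical kernel averages past runtimes, a simulation kernel applies an analytical slowdown model based on the owner's current CPU usage, and the hybrid kernel feeds the historical estimate into the simulation model. The history, slowdown formula, and numbers are invented for illustration and are not CISNE's actual models.

```python
from statistics import mean

# Hypothetical history: application name -> past dedicated runtimes (s).
HISTORY = {"render": [120, 130, 125], "fft": [60, 58, 64]}


def historical_kernel(app):
    # Estimate: mean of previously observed runtimes for this application.
    return mean(HISTORY[app])


def simulation_kernel(base_runtime, owner_cpu_usage, n_nodes):
    # Analytical toy model: the parallel job slows down in proportion to the
    # CPU share consumed by the workstation owners' serial workload.
    per_node = base_runtime / n_nodes
    slowdown = 1.0 / max(0.05, 1.0 - owner_cpu_usage)
    return per_node * slowdown


def hybrid_kernel(app, owner_cpu_usage, n_nodes):
    # Hybrid: feed the historical runtime estimate into the simulation model,
    # so the prediction reflects both past behaviour and current node state.
    return simulation_kernel(historical_kernel(app), owner_cpu_usage, n_nodes)


if __name__ == "__main__":
    for usage in (0.0, 0.3, 0.6):
        est = hybrid_kernel("render", owner_cpu_usage=usage, n_nodes=4)
        print(f"owner CPU usage {usage:.0%}: estimated turnaround {est:.1f} s")
```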
Abstract: In the cloud age, heterogeneous application modes on large-scale infrastructures bring challenges in resource utilization and manageability to data centers. Many resource and runtime management systems have been developed, or have evolved, to address these challenges and related problems from different perspectives. This paper identifies the main motivations, key concerns, common features, and representative solutions of such systems through a survey and analysis. A typical kind of these systems is generalized as the consolidated cluster system, whose design goal is identified as reducing overall costs under a quality-of-service premise. A survey of this kind of system is given, and the critical issues it addresses are summarized as resource consolidation and runtime coordination. These two issues are analyzed and classified according to the design styles and external characteristics abstracted from the surveyed work. Five representative consolidated cluster systems from both academia and industry are illustrated and compared in detail based on this analysis and classification. We hope this survey and analysis are conducive to both the design and implementation of such systems and to technology selection, in response to the constantly emerging challenges of infrastructure and application management in data centers.
Funding: This work was supported by the Shandong Medical and Health Science and Technology Development Plan Project (No. 202012070393).
Abstract: Load time series data in mobile cloud computing for the Internet of Vehicles (IoV) usually exhibit composite linear and nonlinear characteristics. In order to accurately describe the dynamic change trend of such loads, this study designs a load prediction method using the resource scheduling model for mobile cloud computing in IoV. First, a chaotic analysis algorithm is implemented to process the load time series and construct learning samples for load prediction. Second, a support vector machine (SVM) is used to establish the load prediction model, and an improved artificial bee colony (IABC) algorithm is designed to enhance the learning ability of the SVM. Finally, a CloudSim simulation platform is created to select the per-minute CPU load history data of a mobile cloud computing system composed of 50 vehicles as the data set, and a comparison experiment is conducted against a grey model, a back-propagation neural network, a radial basis function (RBF) neural network, and an SVM with an RBF kernel. As shown in the experimental results, the prediction accuracy of the proposed method is significantly higher than that of the other models, with a significantly reduced real-time prediction error for resource loads in mobile cloud environments. Compared with single-prediction models, the proposed method can build multidimensional time series to capture complex load behavior, fit and describe load change trends, approximate load variability more precisely, and provide strong generalization ability for load prediction models of mobile cloud computing resources.
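The sample-construction step can be illustrated with a time-delay (phase-space) embedding, which is the usual output of a chaotic analysis of a scalar series, followed by an RBF-kernel support vector regressor. In this sketch the IABC hyper-parameter search is replaced by fixed C/gamma values, and the synthetic per-minute load series is an assumption; it only shows the multidimensional-sample idea, not the paper's full method.

```python
import numpy as np
from sklearn.svm import SVR


def delay_embed(series, dim=4, tau=2):
    """Phase-space reconstruction: each sample is [x(t), x(t-tau), ...,
    x(t-(dim-1)*tau)] and the target is the next observation x(t+1)."""
    X, y = [], []
    start = (dim - 1) * tau
    for t in range(start, len(series) - 1):
        X.append([series[t - k * tau] for k in range(dim)])
        y.append(series[t + 1])
    return np.array(X), np.array(y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(600)
    # Synthetic per-minute CPU load: baseline + periodic patterns + noise (assumed).
    load = (40 + 10 * np.sin(2 * np.pi * t / 60)
            + 5 * np.sin(2 * np.pi * t / 17)
            + rng.normal(0, 1.5, t.size))

    X, y = delay_embed(load)
    split = int(0.8 * len(X))
    # Fixed C/gamma stand in for the values an IABC search would tune.
    model = SVR(kernel="rbf", C=10.0, gamma=0.1, epsilon=0.1)
    model.fit(X[:split], y[:split])
    pred = model.predict(X[split:])
    mae = np.mean(np.abs(pred - y[split:]))
    print(f"one-step-ahead MAE on held-out load: {mae:.2f}")
```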