Optimized task scheduling is one of the most important challenges to achieve high performance in multiprocessor environments such as parallel and distributed systems. Most introduced task-scheduling algorithms are bas...Optimized task scheduling is one of the most important challenges to achieve high performance in multiprocessor environments such as parallel and distributed systems. Most introduced task-scheduling algorithms are based on the so-called list scheduling technique. The basic idea behind list scheduling is to prepare a sequence of nodes in the form of a list for scheduling by assigning them some priority measurements, and then repeatedly removing the node with the highest priority from the list and allocating it to the processor providing the earliest start time (EST). Therefore, it can be inferred that the makespans obtained are dominated by two major factors: (1) which order of tasks should be selected (sequence subproblem); (2) how the selected order should be assigned to the processors (assignment subproblem). A number of good approaches for overcoming the task sequence dilemma have been proposed in the literature, while the task assignment problem has not been studied much. The results of this study prove that assigning tasks to the processors using the traditional EST method is not optimum; in addition, a novel approach based on the ant colony optimization algorithm is introduced, which can find far better solutions.展开更多
The volume of information that needs to be processed in big data clusters increases rapidly nowadays. It is critical to execute the data analysis in a time-efficient manner. However, simply adding more computation res...The volume of information that needs to be processed in big data clusters increases rapidly nowadays. It is critical to execute the data analysis in a time-efficient manner. However, simply adding more computation resources may not speed up the data analysis significantly. The data analysis jobs usually consist of multiple stages which are organized as a directed acyclic graph (DAG). The precedence relationships between stages cause scheduling challenges. General DAG scheduling is a well-known NP-hard problem. Moreover, we observe that in some parallel computing frameworks such as Spark, the execution of a stage in DAG contains multiple phases that use different resources. We notice that carefully arranging the execution of those resources in pipeline can reduce their idle time and improve the average resource utilization. Therefore, we propose a resource pipeline scheme with the objective of minimizing the job makespan. For perfectly parallel stages, we propose a contention-free scheduler with detailed theoretical analysis. Moreover, we extend the contention-free scheduler for three-phase stages, considering the computation phase of some stages can be partitioned. Additionally, we are aware that job stages in real-world applications are usually not perfectly parallel. We need to frequently adjust the parallelism levels during the DAG execution. Considering reinforcement learning (RL) techniques can adjust the scheduling policy on the fly, we investigate a scheduler based on RL for online arrival jobs. The RL-based scheduler can adjust the resource contention adaptively. We evaluate both contention-free and RL-based schedulers on a Spark cluster. In the evaluation, a real-world cluster trace dataset is used to simulate different DAG styles. Evaluation results show that our pipelined scheme can significantly improve CPU and network utilization.展开更多
基金Project supported by Sama Technical and Vocational Training College,Islamic Azad University,Shoushtar Branch,Shoushtar,Iran
文摘Optimized task scheduling is one of the most important challenges to achieve high performance in multiprocessor environments such as parallel and distributed systems. Most introduced task-scheduling algorithms are based on the so-called list scheduling technique. The basic idea behind list scheduling is to prepare a sequence of nodes in the form of a list for scheduling by assigning them some priority measurements, and then repeatedly removing the node with the highest priority from the list and allocating it to the processor providing the earliest start time (EST). Therefore, it can be inferred that the makespans obtained are dominated by two major factors: (1) which order of tasks should be selected (sequence subproblem); (2) how the selected order should be assigned to the processors (assignment subproblem). A number of good approaches for overcoming the task sequence dilemma have been proposed in the literature, while the task assignment problem has not been studied much. The results of this study prove that assigning tasks to the processors using the traditional EST method is not optimum; in addition, a novel approach based on the ant colony optimization algorithm is introduced, which can find far better solutions.
基金This work was supported in part by the National Science Foundation of U.S.A.under Grant Nos.~CNS 2128378,CNS 2107014,CNS 1824440,CNS 1828363,CNS 1757533,CNS 1629746,and CNS 1651947.
文摘The volume of information that needs to be processed in big data clusters increases rapidly nowadays. It is critical to execute the data analysis in a time-efficient manner. However, simply adding more computation resources may not speed up the data analysis significantly. The data analysis jobs usually consist of multiple stages which are organized as a directed acyclic graph (DAG). The precedence relationships between stages cause scheduling challenges. General DAG scheduling is a well-known NP-hard problem. Moreover, we observe that in some parallel computing frameworks such as Spark, the execution of a stage in DAG contains multiple phases that use different resources. We notice that carefully arranging the execution of those resources in pipeline can reduce their idle time and improve the average resource utilization. Therefore, we propose a resource pipeline scheme with the objective of minimizing the job makespan. For perfectly parallel stages, we propose a contention-free scheduler with detailed theoretical analysis. Moreover, we extend the contention-free scheduler for three-phase stages, considering the computation phase of some stages can be partitioned. Additionally, we are aware that job stages in real-world applications are usually not perfectly parallel. We need to frequently adjust the parallelism levels during the DAG execution. Considering reinforcement learning (RL) techniques can adjust the scheduling policy on the fly, we investigate a scheduler based on RL for online arrival jobs. The RL-based scheduler can adjust the resource contention adaptively. We evaluate both contention-free and RL-based schedulers on a Spark cluster. In the evaluation, a real-world cluster trace dataset is used to simulate different DAG styles. Evaluation results show that our pipelined scheme can significantly improve CPU and network utilization.