Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61732014, 61802412, 61671151), the Beijing Natural Science Foundation (No. 4172031), and the SenseTime Young Scholars Research Fund.
Abstract: Heterogeneous processors integrate very different compute resources, such as CPUs and GPUs, on the same chip, and can thus exploit the strengths and avoid the weaknesses of each kind of compute unit. In this work, we evaluate and analyze eight sparse matrix and graph kernels on an AMD CPU-GPU heterogeneous processor using 956 sparse matrices. We focus on five characteristics: load balancing, indirect addressing, memory reallocation, atomic operations, and dynamic characteristics. The experimental results show that although the CPU and GPU parts access the same DRAM, they exhibit very different performance behaviors. For example, although the GPU part generally outperforms the CPU part, it does not achieve the best performance in every case; in some cases the CPU part does. Moreover, the bandwidth utilization of atomic operations on heterogeneous processors can be much higher than on a high-end discrete GPU.
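To make "indirect addressing" and "load balancing" concrete, here is a minimal sketch of sparse matrix-vector multiplication (SpMV) over the compressed sparse row (CSR) format, a typical sparse kernel of the kind evaluated above. It is a generic textbook CSR kernel, not one of the paper's eight kernels, and the function and parameter names are illustrative assumptions.

```python
from typing import List, Sequence


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR format.

    Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1 of
    col_idx (their column indices) and values (their values).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Rows may have very different lengths, so giving one row to each
        # thread or GPU work-item can leave some workers idle while others
        # process long rows: the load-balancing problem.
        for j in range(row_ptr[i], row_ptr[i + 1]):
            # x is read through col_idx (indirect addressing), which yields
            # irregular memory accesses that are hard to coalesce or prefetch.
            acc += values[j] * x[col_idx[j]]
        y[i] = acc
    return y


if __name__ == "__main__":
    # A = [[10, 0, 2],
    #      [ 0, 0, 0],
    #      [ 3, 4, 5]]
    row_ptr = [0, 2, 2, 5]
    col_idx = [0, 2, 0, 1, 2]
    values = [10.0, 2.0, 3.0, 4.0, 5.0]
    print(csr_spmv(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))  # [12.0, 0.0, 12.0]
```

Here each output row is written by exactly one worker, so no synchronization is needed. A column-oriented (CSC) variant instead scatters partial sums into `y[row]` from many workers at once, which is where atomic operations come into play.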
Abstract: Cloud computing has taken over the high-performance distributed computing area, and it currently provides on-demand services and resource pooling over the web. Because user service demand changes constantly, task scheduling has emerged as a critical research topic in cloud computing. The primary goal of task scheduling is to assign tasks to available processors so as to produce the shortest possible schedule without violating precedence constraints. In a heterogeneous multiprocessor system, task assignment and scheduling substantially influence system operation. Different processing choices within a heuristic-based task scheduling method result in different makespans on a heterogeneous computing system. Therefore, an intelligent scheduling algorithm should efficiently determine the priority of every subtask, based on the resources it requires, in order to lower the makespan. This research introduces a novel, efficient task scheduling method for cloud computing systems based on the cooperation search algorithm, to tackle the essential problem of scheduling tasks on heterogeneous cloud computing resources. The basic idea of this method is to use the advantages of meta-heuristic algorithms to obtain the optimal solution. We assess the algorithm's performance on three scenarios with varying numbers of tasks. The findings demonstrate that the proposed technique outperforms the existing methods New Genetic Algorithm (NGA), Genetic Algorithm (GA), Whale Optimization Algorithm (WOA), Gravitational Search Algorithm (GSA), and Hybrid Heuristic and Genetic (HHG) by 7.9%, 2.1%, 8.8%, 7.7%, and 3.4%, respectively, in terms of makespan.
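The makespan objective that such schedulers minimize can be illustrated with a small evaluator: given a task graph with precedence constraints, an execution-time matrix for heterogeneous processors, and a candidate assignment plus priority order, it computes when each task starts and finishes. This is a minimal sketch under simplifying assumptions (no communication cost, non-preemptive tasks, a priority list that is already a topological order). A metaheuristic such as the cooperation search algorithm would typically call a function like this as its fitness evaluation, but the paper's actual encoding is not described here, and all names are hypothetical.

```python
from typing import Dict, List, Sequence


def makespan(order: Sequence[int],
             assignment: Sequence[int],
             exec_time: Sequence[Sequence[float]],
             preds: Dict[int, List[int]]) -> float:
    """Return the makespan of a schedule.

    order:      tasks in priority order (must be a topological order).
    assignment: assignment[t] is the processor that runs task t.
    exec_time:  exec_time[t][p] is the run time of task t on processor p.
    preds:      preds[t] lists the tasks that must finish before t starts.
    """
    n_procs = len(exec_time[0])
    proc_free = [0.0] * n_procs      # time at which each processor becomes idle
    finish: Dict[int, float] = {}    # finish time of every task scheduled so far
    for t in order:
        p = assignment[t]
        deps = preds.get(t, [])
        if any(q not in finish for q in deps):
            raise ValueError(f"task {t} is scheduled before one of its predecessors")
        # Precedence constraint: start only after every predecessor finishes,
        # and only once the chosen processor is free (no preemption).
        ready = max((finish[q] for q in deps), default=0.0)
        start = max(ready, proc_free[p])
        finish[t] = start + exec_time[t][p]
        proc_free[p] = finish[t]
    return max(finish.values())


if __name__ == "__main__":
    # Hypothetical diamond-shaped task graph: 0 -> {1, 2} -> 3.
    preds = {1: [0], 2: [0], 3: [1, 2]}
    # exec_time[t][p]: each task runs faster on some processors than others.
    exec_time = [[2, 3], [4, 2], [3, 5], [2, 2]]
    order = [0, 1, 2, 3]
    print(makespan(order, [0, 1, 0, 0], exec_time, preds))  # 7.0
    print(makespan(order, [0, 0, 0, 0], exec_time, preds))  # 11.0
```

The two calls differ only in where task 1 runs: placing it on the processor where it is faster lets it overlap with task 2, which cuts the makespan from 11 to 7.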
Abstract: Cloud computing currently dominates the space of high-performance distributed computing, and it provides resource pooling and on-demand services through the web. Consequently, task scheduling has become a very important research area in cloud computing environments, because users' service demands change dynamically. The main purpose of task scheduling is to assign tasks to available processors to produce the minimum schedule length without violating precedence constraints. In heterogeneous multiprocessor systems, task assignments and schedules have a significant impact on system operation. Within a heuristic-based task scheduling algorithm, different processing choices lead to different task execution times (makespans) on a heterogeneous computing system. Thus, a good scheduling algorithm should efficiently set the priority of every subtask, depending on the resources it requires, to reduce the makespan. In this paper, we propose a new, efficient task scheduling algorithm for cloud computing systems based on the RAO algorithm to solve the important problem of scheduling tasks on heterogeneous multiprocessor systems. The basic idea of this approach is to exploit the advantages of heuristic-based algorithms to reduce the search space and the time needed to find the best solution. We evaluate the algorithm's performance by applying it to three examples with different numbers of tasks and processors. The experimental results show that the proposed approach was significantly more successful than the other approaches in finding optimal solutions in terms of task execution time.
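The Rao algorithms are metaphor-less, population-based optimizers. In the Rao-1 variant, each candidate moves by a random fraction of the difference between the current best and worst candidates, and a move is kept only if it improves the objective. The sketch below applies Rao-1 to a simplified scheduling problem: independent tasks (no precedence constraints) on heterogeneous processors, with a random-key encoding in which each task's continuous key is truncated to a processor index. The encoding, the problem data, and all names are assumptions for illustration; the paper's actual variant, encoding, and handling of precedence may differ.

```python
import random
from typing import List, Sequence, Tuple


def decode(keys: Sequence[float], n_procs: int) -> List[int]:
    """Random-key decoding: truncate each task's key to a processor index."""
    return [min(int(k), n_procs - 1) for k in keys]


def load_makespan(assign: Sequence[int],
                  exec_time: Sequence[Sequence[float]]) -> float:
    """Makespan for independent tasks: the busiest processor's total load."""
    load = [0.0] * len(exec_time[0])
    for t, p in enumerate(assign):
        load[p] += exec_time[t][p]
    return max(load)


def rao1_schedule(exec_time: Sequence[Sequence[float]], pop_size: int = 20,
                  iters: int = 200, seed: int = 0) -> Tuple[List[int], float]:
    """Minimize makespan with the Rao-1 update and greedy acceptance."""
    rng = random.Random(seed)
    n_tasks, n_procs = len(exec_time), len(exec_time[0])
    hi = n_procs - 1e-9  # keys are kept in [0, n_procs)

    def cost(keys: Sequence[float]) -> float:
        return load_makespan(decode(keys, n_procs), exec_time)

    pop = [[rng.uniform(0.0, hi) for _ in range(n_tasks)]
           for _ in range(pop_size)]
    costs = [cost(x) for x in pop]
    for _ in range(iters):
        # The best and worst candidates are fixed for the whole iteration.
        best = pop[min(range(pop_size), key=costs.__getitem__)]
        worst = pop[max(range(pop_size), key=costs.__getitem__)]
        for k in range(pop_size):
            # Rao-1 move: x' = x + r * (x_best - x_worst), with r ~ U(0, 1)
            # drawn per variable, then clipped back into the key range.
            cand = [min(max(x + rng.random() * (b - w), 0.0), hi)
                    for x, b, w in zip(pop[k], best, worst)]
            c = cost(cand)
            if c < costs[k]:  # greedy acceptance: keep only improvements
                pop[k], costs[k] = cand, c
    k_best = min(range(pop_size), key=costs.__getitem__)
    return decode(pop[k_best], n_procs), costs[k_best]


if __name__ == "__main__":
    # Hypothetical 4-task, 2-processor execution-time matrix.
    exec_time = [[3, 5], [2, 4], [4, 2], [6, 3]]
    assign, ms = rao1_schedule(exec_time)
    print(assign, ms)
```

Greedy acceptance means no candidate ever gets worse, so the best makespan in the population is non-increasing across iterations. The Rao-2 and Rao-3 variants add a further term based on a randomly chosen peer candidate.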
Abstract: Due to advances in semiconductor technology, many-core processors have been widely used in high-performance computing. However, many applications still cannot run efficiently because of the memory wall, which has become a bottleneck in many-core processors. In this paper, we present a novel heterogeneous many-core processor architecture named deeply fused many-core (DFMC) for high-performance computing systems. DFMC integrates management processing elements (MPEs) and computing processing elements (CPEs), heterogeneous processor cores suited to different application features, with a unified ISA (instruction set architecture), a unified execution model, and shared memory that supports cache coherence. The DFMC processor alleviates the memory wall problem by combining a series of cooperative computing techniques for CPEs, such as multi-pattern data stream transfer, an efficient register-level communication mechanism, and a fast hardware synchronization technique. These techniques improve on-chip data reuse and optimize memory access performance. This paper describes the implementation of a full-system prototype based on FPGA with four MPEs and 256 CPEs. Our experimental results show that the cooperative computing techniques of CPEs have a significant effect: DGEMM (double-precision general matrix multiplication) achieves an efficiency of 94%, FFT (fast Fourier transform) reaches 207 GFLOPS, and FDTD (finite-difference time-domain) reaches 27 GFLOPS.
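The on-chip data reuse that the CPE techniques aim to improve is the same idea behind tiled (blocked) matrix multiplication: each tile of A and B is loaded once into fast local memory and reused for a whole block of C. The sketch below is a generic, pure-Python illustration of tiling with a hypothetical tile size `bs`; it is not DFMC's DGEMM implementation, and real code would size the tile to fit a core's local store or cache. It also shows the conventional way an efficiency figure such as the 94% for DGEMM is computed: achieved FLOP rate divided by the machine's peak FLOP rate.

```python
from typing import List

Matrix = List[List[float]]


def blocked_matmul(a: Matrix, b: Matrix, bs: int = 2) -> Matrix:
    """C = A @ B computed tile by tile (A is n x m, B is m x p)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, m, bs):
            for jj in range(0, p, bs):
                # One bs x bs tile of A and one of B contribute to one tile
                # of C; on hardware the two input tiles would be staged in
                # local memory once and reused bs times each.
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + bs, p)):
                            c[i][j] += aik * b[k][j]
    return c


def gemm_efficiency(n: int, seconds: float, peak_gflops: float) -> float:
    """Fraction of peak achieved by an n x n x n GEMM (2 * n**3 flops)."""
    achieved_gflops = 2 * n ** 3 / seconds / 1e9
    return achieved_gflops / peak_gflops


if __name__ == "__main__":
    print(blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], bs=1))
    # [[19.0, 22.0], [43.0, 50.0]]
```

The tile size trades reuse against local-memory capacity: larger tiles reuse each loaded element more times but need more fast storage.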
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2021YFB0300101) and the National Natural Science Foundation of China (Grant Nos. 62032023, 61902411, 12002382).
Abstract: With growing security awareness, more advanced and secure encryption algorithms are applied to Microsoft Office to guarantee information security, and people also set more complex passwords. However, once the original password is forgotten, the encrypted information still needs to be retrieved. Conventional brute-force cracking methods and password recovery programs can hardly meet practical decryption needs. To this end, we develop a distributed parallel password recovery program (MT-Office) for Microsoft Office on the domestic heterogeneous multi-core processor MT-3000. MT-Office takes full advantage of the multi-core and heterogeneous features of MT-3000 and is optimized in both vectorization and global computing. MT-Office also provides multiple password-generation strategies to improve recovery efficiency. Compared with other platforms (e.g., Intel and FT platforms), the MT-3000 heterogeneous platform achieves a 60×–218× speedup. For Office 2010, we perform a strong scalability test on the new-generation supercomputer at the National Supercomputer Center in Tianjin. MT-Office not only scales to 65,536 acceleration clusters on this system, showing good scalability, but also achieves an almost linear speedup. For Office 2007, MT-Office achieves a 2.5×–131.1× speedup compared with other password recovery programs. These results show that MT-Office exploits the advantages of MT-3000 well: it offers good scalability and parallelism, recovers passwords faster, and can be applied to practical engineering applications.
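Distributing a brute-force search over many cores or nodes usually comes down to splitting the candidate keyspace into disjoint index ranges and mapping each index to a password. The sketch below shows this for a mask-style attack (one character set per position) using mixed-radix decoding. It is a generic illustration, not MT-Office's actual generation strategy, and all names are hypothetical. Real Office 2007/2010 password verification runs an iterated hash-based key derivation and an encrypted-verifier check, which is replaced here by a caller-supplied `check` predicate.

```python
from typing import Callable, List, Optional, Sequence


def keyspace_size(mask: Sequence[str]) -> int:
    """Number of candidates: product of the charset sizes per position."""
    size = 1
    for charset in mask:
        size *= len(charset)
    return size


def index_to_candidate(index: int, mask: Sequence[str]) -> str:
    """Mixed-radix decode of an index; the last position varies fastest."""
    chars: List[str] = []
    for charset in reversed(mask):
        index, digit = divmod(index, len(charset))
        chars.append(charset[digit])
    return "".join(reversed(chars))


def worker_range(rank: int, n_workers: int, total: int) -> range:
    """Contiguous, near-equal share of [0, total) for one worker."""
    base, extra = divmod(total, n_workers)
    start = rank * base + min(rank, extra)
    return range(start, start + base + (1 if rank < extra else 0))


def search(rank: int, n_workers: int, mask: Sequence[str],
           check: Callable[[str], bool]) -> Optional[str]:
    """Test this worker's share of candidates; return the first match."""
    for i in worker_range(rank, n_workers, keyspace_size(mask)):
        candidate = index_to_candidate(i, mask)
        if check(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical mask: a letter from "abc", a digit, a letter from "abc".
    mask = ["abc", "0123456789", "abc"]
    target = "b7c"  # stand-in for the unknown password
    found = [search(r, 4, mask, lambda c: c == target) for r in range(4)]
    print(found)  # [None, None, 'b7c', None]
```

Because each worker needs only its rank and the worker count to find its range, no candidate lists have to be communicated, which is what lets this kind of search scale almost linearly.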