Funding: Project supported by the National 863 Foundation of China (863-306-05-01-1) and the National Natural Science Foundation of China (Grant No. 69673037).
Abstract: A k-bitonic sort, which generalizes the bitonic sort, is proposed. The theorem of the bitonic sort, which merges two monotonic sequences into one ordered sequence, is extended into the theorem of the k-bitonic sort. The k-bitonic sort merges $K$ ($K = 2k$ or $2k - 1$) monotonic sequences into one ordered sequence in $\lceil \log_2 K \rceil \lceil \log_2 N \rceil - \tfrac{\lceil \log_2 K \rceil (\lceil \log_2 K \rceil - 1)}{2}$ steps, where $k = \lceil \tfrac{K}{2} \rceil$ is an integer and $k \ge 1$. The k-bitonic sort reduces to Batcher's bitonic sort when $k = 1$.
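As a quick check of the step count stated in the abstract, the following minimal Python sketch evaluates the formula for a few values. The function name `k_bitonic_steps` and the sample values of K and N are illustrative assumptions, not taken from the paper.

```python
def ceil_log2(x: int) -> int:
    """Return ceil(log2(x)) for a positive integer x, using integer arithmetic."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return (x - 1).bit_length()


def k_bitonic_steps(K: int, N: int) -> int:
    """Evaluate ceil(log2 K) * ceil(log2 N) - ceil(log2 K) * (ceil(log2 K) - 1) / 2.

    K is the number of monotonic sequences and N is the symbol used in the
    abstract's formula (hypothetical helper, for illustration only).
    """
    lk = ceil_log2(K)
    ln = ceil_log2(N)
    # lk * (lk - 1) is always even, so integer division is exact.
    return lk * ln - (lk * (lk - 1)) // 2


if __name__ == "__main__":
    for K, N in [(2, 8), (3, 16), (4, 16), (8, 16)]:
        print(f"K={K}, N={N}: {k_bitonic_steps(K, N)} steps")
```

For K = 2 the subtracted term vanishes and the count is simply the ceiling of log2 N, which is the depth of a classic bitonic merge over N elements.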
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61972415 and 61972408.
Abstract: Multi-core architecture has become the main trend in high performance computing (HPC) because of its powerful parallel computing capability. Due to energy efficiency constraints, energy-efficient multi-core digital signal processors (DSPs) have become an alternative architecture in HPC systems. FT-M7032 is a CPU-DSP heterogeneous processor that integrates 16 CPU cores for running operating systems and four multi-core general-purpose DSP (GPDSP) clusters for providing high performance. Sorting is a fundamental operation in computer science with numerous applications and has been studied extensively, but high-performance parallel sorting algorithms are typically architecture-specific. To our knowledge, little attention has been paid to optimizing sorting on low-power multi-core DSPs. In this paper, we propose thSORT, an efficient bitonic sorting algorithm for FT-M7032. Our algorithm consists of two parts: single-core DSP sorting and multi-core DSP sorting, both designed to exploit the features of FT-M7032. We implement a vector micro-kernel for bitonic sort and propose a multi-level algorithm to merge the results of the micro-kernel. Compared with the CPU baseline, our implementation is 1.43× faster than the parallel sorting of the Boost C++ Libraries and 2.15× faster than std::sort.
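For readers unfamiliar with the underlying technique, the sketch below shows a plain scalar version of Batcher's bitonic sorting network in Python. It is not thSORT's vector micro-kernel or its multi-level merge, neither of which is described in the abstract. The function name `bitonic_sort` and the power-of-two length restriction are assumptions made for this illustration.

```python
def bitonic_sort(values):
    """Sort a sequence whose length is a power of two with a bitonic network.

    Returns a new ascending list; the input is left unchanged.
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")

    size = 2  # length of the bitonic blocks being merged in this phase
    while size <= n:
        stride = size // 2  # distance between compared elements
        while stride >= 1:
            # Every pair (i, i ^ stride) in this stage is independent, which
            # is what makes the network suitable for vector or parallel code.
            for i in range(n):
                partner = i ^ stride
                if partner > i:
                    # Blocks alternate direction so that neighbouring blocks
                    # together form a bitonic sequence for the next phase.
                    ascending = (i & size) == 0
                    if ascending:
                        out_of_order = a[i] > a[partner]
                    else:
                        out_of_order = a[i] < a[partner]
                    if out_of_order:
                        a[i], a[partner] = a[partner], a[i]
            stride //= 2
        size *= 2
    return a


if __name__ == "__main__":
    print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

Because the compare-exchange pattern depends only on the indices and not on the data, each inner stage can be mapped onto fixed-width vector instructions, which is the general property that bitonic sort implementations on SIMD hardware rely on.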