An signal noise ratio( SNR) adaptive sorting algorithm using the time-frequency( TF)sparsity of frequency-hopping( FH) signal is proposed in this paper. Firstly,the Gabor transformation is used as TF transformat...An signal noise ratio( SNR) adaptive sorting algorithm using the time-frequency( TF)sparsity of frequency-hopping( FH) signal is proposed in this paper. Firstly,the Gabor transformation is used as TF transformation in the system and a sorting model is established under undetermined condition; then the SNR adaptive pivot threshold setting method is used to find the TF single source. The mixed matrix is estimated according to the TF matrix of single source. Lastly,signal sorting is realized through improved subspace projection combined with relative power deviation of source. Theoretical analysis and simulation results showthat this algorithm has good effectiveness and performance.展开更多
Merge sort as a divide-sort-merge paradigm has been widely applied in computer science fields.As modern reduced instruction set computing architectures like the fifth generation(RISC-V)regard multiple registers as a v...Merge sort as a divide-sort-merge paradigm has been widely applied in computer science fields.As modern reduced instruction set computing architectures like the fifth generation(RISC-V)regard multiple registers as a vector register group for wide instruction parallelism,optimizing merge sort with this vectorized property is becoming increasingly common.In this paper,we overhaul the divide-sort-merge paradigm,from its register-level sort to the cache-aware merge,to develop a finegrained RISC-V vectorized merge sort(RVMS).From the register-level view,the inline vectorized transpose instruction is missed in RISC-V,so implementing it efficiently is non-trivial.Besides,the vectorized comparisons do not always work well in the merging networks.Both issues primarily stem from the expensive data shuffle instruction.To bypass it,RVMS strides to take register data as the proxy of data shuffle to accelerate the transpose operation,and meanwhile replaces vectorized comparisons with scalar cousin for more light real value swap.On the other hand,as cache-aware merge makes larger data merge in the cache,most merge schemes have two drawbacks:the in-cache merge usually has low cache utilization,while the out-of-cache merging network remains an ineffectively symmetric structure.To this end,we propose the half-merge scheme to employ the auxiliary space of in-place merge to halve the footprint of naie merge sort,and meanwhile copy one sequence to this space to avoid the former data exchange.Furthermore,an asymmetric merging network is developed to adapt to two different input sizes.Experiments on the RISC-V processor SG2042 show that four fine-grained optimization schemes including register strided transpose,hybrid merging network,half-merge strategy,and asymmetric merging network,improve performance by 4.05%,19.88%,12.23%,and 11.04%respectively.Importantly,the overall performance is 1.34x faster than the parallel sorting in the Boost C++library,and 1.85x faster than std::sort.展开更多
Multi-core architecture has become the main trend in high performance computing(HPC)because of its powerful parallel computing capability.Due to energy efficiency constraints,energy-efficient multi-core digital signal...Multi-core architecture has become the main trend in high performance computing(HPC)because of its powerful parallel computing capability.Due to energy efficiency constraints,energy-efficient multi-core digital signal processors(DSPs)have become an alternative architecture in HPC systems.FT-M7032 is a CPU-DSP heterogeneous processor that integrates 16 CPU cores for running operating systems and four multi-core general purpose DSP(GPDSP)clusters for providing high performance.Sorting is a fundamental operation in computer science with numerous applications and has been studied extensively,but high-performance parallel sorting algorithms are typically architecture-specific.To our knowledge,little attention has been paid to optimizing the sorting on the low-power multicore DSPs.In this paper,we propose thSORT,an efficient bitonic sorting algorithm for FT-M7032.Our algorithm consists of two parts:single-core DSP sorting and multi-core DSP sorting,both aiming to tap the features of FT-M7032.We implement a vector micro-kernel for bitonic sort and propose a multi-level algorithm to merge the results of the micro-kernel.When compared to the CPU baseline,our implementation is 1.43×faster than the parallel sorting of the Boost C++Libraries,and is 2.15×faster than std::sort.展开更多
基金Supported by the National Natural Science Foundation of China(64601500)
文摘An signal noise ratio( SNR) adaptive sorting algorithm using the time-frequency( TF)sparsity of frequency-hopping( FH) signal is proposed in this paper. Firstly,the Gabor transformation is used as TF transformation in the system and a sorting model is established under undetermined condition; then the SNR adaptive pivot threshold setting method is used to find the TF single source. The mixed matrix is estimated according to the TF matrix of single source. Lastly,signal sorting is realized through improved subspace projection combined with relative power deviation of source. Theoretical analysis and simulation results showthat this algorithm has good effectiveness and performance.
基金supported by National Key Research and Development Program of China(Grant No.2023YFB3001903)the National Natural Science Foundation of China(Grant Nos.62032023,42104078 and 6190241).
文摘Merge sort as a divide-sort-merge paradigm has been widely applied in computer science fields.As modern reduced instruction set computing architectures like the fifth generation(RISC-V)regard multiple registers as a vector register group for wide instruction parallelism,optimizing merge sort with this vectorized property is becoming increasingly common.In this paper,we overhaul the divide-sort-merge paradigm,from its register-level sort to the cache-aware merge,to develop a finegrained RISC-V vectorized merge sort(RVMS).From the register-level view,the inline vectorized transpose instruction is missed in RISC-V,so implementing it efficiently is non-trivial.Besides,the vectorized comparisons do not always work well in the merging networks.Both issues primarily stem from the expensive data shuffle instruction.To bypass it,RVMS strides to take register data as the proxy of data shuffle to accelerate the transpose operation,and meanwhile replaces vectorized comparisons with scalar cousin for more light real value swap.On the other hand,as cache-aware merge makes larger data merge in the cache,most merge schemes have two drawbacks:the in-cache merge usually has low cache utilization,while the out-of-cache merging network remains an ineffectively symmetric structure.To this end,we propose the half-merge scheme to employ the auxiliary space of in-place merge to halve the footprint of naie merge sort,and meanwhile copy one sequence to this space to avoid the former data exchange.Furthermore,an asymmetric merging network is developed to adapt to two different input sizes.Experiments on the RISC-V processor SG2042 show that four fine-grained optimization schemes including register strided transpose,hybrid merging network,half-merge strategy,and asymmetric merging network,improve performance by 4.05%,19.88%,12.23%,and 11.04%respectively.Importantly,the overall performance is 1.34x faster than the parallel sorting in the Boost C++library,and 1.85x faster than std::sort.
基金supported by the National Natural Science Foundation of China under Grant Nos.61972415 and 61972408.
文摘Multi-core architecture has become the main trend in high performance computing(HPC)because of its powerful parallel computing capability.Due to energy efficiency constraints,energy-efficient multi-core digital signal processors(DSPs)have become an alternative architecture in HPC systems.FT-M7032 is a CPU-DSP heterogeneous processor that integrates 16 CPU cores for running operating systems and four multi-core general purpose DSP(GPDSP)clusters for providing high performance.Sorting is a fundamental operation in computer science with numerous applications and has been studied extensively,but high-performance parallel sorting algorithms are typically architecture-specific.To our knowledge,little attention has been paid to optimizing the sorting on the low-power multicore DSPs.In this paper,we propose thSORT,an efficient bitonic sorting algorithm for FT-M7032.Our algorithm consists of two parts:single-core DSP sorting and multi-core DSP sorting,both aiming to tap the features of FT-M7032.We implement a vector micro-kernel for bitonic sort and propose a multi-level algorithm to merge the results of the micro-kernel.When compared to the CPU baseline,our implementation is 1.43×faster than the parallel sorting of the Boost C++Libraries,and is 2.15×faster than std::sort.