Abstract: Non-uniform sampling two-dimensional convolution (NUSC) maps irregularly distributed spatial sampling data onto a regular grid by convolution. As data volumes and their growth rate continue to increase, accelerating NUSC on heterogeneous computing platforms is a feasible approach. However, the complex hardware architecture and storage hierarchy of such platforms pose challenges for programming and performance tuning. This paper therefore proposes AutoNUSC, a heterogeneous parallel programming model and runtime framework. To address the difficulty of programming NUSC in heterogeneous computing environments, AutoNUSC abstracts and encapsulates the parallel execution process of NUSC and manages parallelization tasks such as task scheduling, data division, node communication, and fault-tolerant recovery. To address performance tuning, the paper implements optimization strategies in AutoNUSC, including vectorization, memory access optimization, and data reuse. Experiments show that AutoNUSC effectively reduces the workload of developing NUSC applications in heterogeneous computing environments. Within a single node, it achieves a speedup of up to 339 times over the serial program. Across multiple nodes, AutoNUSC performs task scheduling and fault-tolerant recovery efficiently, with good scalability and robustness.
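To make the core operation concrete, the following is a minimal serial sketch of mapping irregularly placed samples onto a regular grid by convolution. It is not AutoNUSC's API or implementation: the function name `grid_samples`, the Gaussian kernel, and the `support` and `sigma` parameters are all assumptions chosen for illustration, since the abstract does not specify the kernel or interface.

```python
import math

def grid_samples(samples, grid_size, support=1, sigma=0.5):
    """Spread irregular samples (x, y, value) onto a grid_size x grid_size grid.

    Hypothetical illustration: each sample contributes to the
    (2*support + 1)^2 cells around its nearest grid cell, weighted by an
    assumed Gaussian kernel of the distance to each cell centre.
    """
    grid = [[0.0] * grid_size for _ in range(grid_size)]
    for x, y, value in samples:
        cx, cy = round(x), round(y)  # nearest grid cell to the sample
        for gy in range(cy - support, cy + support + 1):
            for gx in range(cx - support, cx + support + 1):
                # Skip kernel taps that fall outside the grid.
                if 0 <= gx < grid_size and 0 <= gy < grid_size:
                    d2 = (gx - x) ** 2 + (gy - y) ** 2
                    grid[gy][gx] += value * math.exp(-d2 / (2 * sigma ** 2))
    return grid

# Usage: one sample placed exactly on the centre cell of a 5 x 5 grid.
grid = grid_samples([(2.0, 2.0, 1.0)], grid_size=5)
```

In a serial program like this one, the outer loop over samples is the natural place where the work could be divided among devices or nodes; how AutoNUSC actually partitions and schedules it is not described in the abstract.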