Funding: Supported by the National Natural Science Foundation of China (NSFC) (No. 41976027).
Abstract: The subtropical North and South Pacific Meridional Modes (NPMM and SPMM) are well-known precursors of the El Niño-Southern Oscillation (ENSO). However, the relationship between them is not constant. In the early 1980s, the relationship experienced an interdecadal transition. Changes in this connection can be attributed mainly to the phase change of the Pacific Decadal Oscillation (PDO). During the positive phase of the PDO, a shallower thermocline in the central Pacific is responsible for a stronger trade wind charging (TWC) mechanism, which leads to a stronger equatorial subsurface temperature evolution. This dynamic process strengthens the connection between the NPMM and ENSO. During the negative phase of the PDO, a shallower thermocline over the southeastern Pacific allows an enhanced wind-evaporation-SST (WES) feedback, strengthening the connection between the SPMM and ENSO. Using 35 Coupled Model Intercomparison Project Phase 6 (CMIP6) models, we examined the NPMM/SPMM performance and its connection with ENSO in the historical runs. The great majority of CMIP6 models reproduce the patterns of the NPMM and SPMM well, but they show discrepant relationships between ENSO and the NPMM/SPMM. The intermodel uncertainty in the NPMM-ENSO connection is due to differences in the TWC mechanism: a stronger TWC mechanism enhances NPMM forcing. For the SPMM, few models simulate a good relationship with ENSO. The intermodel spread in the SPMM-ENSO relationship is owing to SST bias in the southeastern Pacific, as the WES feedback is stronger when the southeastern Pacific is warmer.
Abstract: Sparse compilers are a promising solution for sparse tensor algebra optimization. In compiler implementation, reduction in sparse-dense hybrid algebra plays a key role in performance. Though GPUs provide various reduction semantics that can better utilize parallel computing and memory bandwidth capacity, the central question is how to elevate these flexible reduction semantics to a sparse compilation theory that assumes serial execution. Specifically, we have to tackle two main challenges: (1) parallelism is wasted by adopting a static synchronization granularity; (2) a static reduction strategy limits optimization space exploration. We propose Sgap: segment group and atomic parallelism to solve these problems. Atomic parallelism captures the flexible reduction semantics to systematically analyze the optimization space of sparse-dense hybrid algebra on GPU. It is a new optimization technique beyond current compiler-based and open-source runtime libraries. Segment group elevates the flexible reduction semantics to suitable levels of abstraction in the sparse compilation theory. It adopts a changeable group size and a user-defined reduction strategy to solve challenges (1) and (2), respectively. Finally, we use GPU sparse matrix-matrix multiplication (SpMM) on the TACO compiler as a use case to demonstrate the effectiveness of segment group in reduction semantics elevation. We achieve up to 1.2× speedup over the original TACO's SpMM kernels. We also apply new optimization techniques found by atomic parallelism to an open-source state-of-the-art SpMM library, dgSPARSE. We achieve 1.6× to 2.3× speedup on the algorithm tuned with atomic parallelism.
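To make the idea of a changeable reduction group concrete, the following is a minimal CPU-side sketch in NumPy, not the Sgap or TACO implementation. It assumes a CSR matrix and simulates the pattern the abstract describes: partial products are first reduced by row segment inside a group of `group_size` nonzeros, and each segment sum is then accumulated into the output. The function name `spmm_segment_group` and the parameter `group_size` are hypothetical, and `np.add.at` only stands in for a GPU atomic add.

```python
import numpy as np

def spmm_segment_group(indptr, indices, data, B, group_size):
    """Compute C = A @ B for CSR A, reducing in groups of `group_size` nonzeros.

    Illustrative only: the loop over groups is serial here, whereas on a GPU
    each group would map to a set of cooperating threads.
    """
    n_rows = len(indptr) - 1
    nnz = len(data)
    C = np.zeros((n_rows, B.shape[1]), dtype=B.dtype)
    # Row id of every nonzero (expands the CSR row pointers).
    rows = np.repeat(np.arange(n_rows), np.diff(indptr))
    # One partial product per nonzero: a_ij * B[j, :].
    partial = data[:, None] * B[indices]
    for start in range(0, nnz, group_size):
        stop = min(start + group_size, nnz)
        g_rows = rows[start:stop]
        # A new segment begins wherever the row id changes inside the group.
        seg_starts = np.flatnonzero(np.r_[True, g_rows[1:] != g_rows[:-1]])
        # Segment reduction: sum the partial products of each segment.
        seg_sums = np.add.reduceat(partial[start:stop], seg_starts, axis=0)
        # One accumulation per segment (stand-in for an atomic add).
        np.add.at(C, g_rows[seg_starts], seg_sums)
    return C

# Small example matrix A (4 x 3) in CSR form; row 1 is empty.
indptr = np.array([0, 2, 2, 3, 6])
indices = np.array([0, 2, 1, 0, 1, 2])
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
B = np.arange(6, dtype=float).reshape(3, 2)
C = spmm_segment_group(indptr, indices, data, B, group_size=4)
```

In this sketch a smaller `group_size` means more accumulations into `C` but less work per group, while a larger one does more reduction inside the group; that trade-off is the kind of tuning knob the abstract refers to, though the actual cost model on a GPU is not captured here.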
Funding: Funded by the National Key Research and Development Program of China (2024YFB4504103), the Major Science and Technology Special Projects in Henan Province (241111212300), and the National Key Research and Development Program of China (2023ZD0120604).
Abstract: To address the issues of sparse matrix load imbalance and parallelism degradation with increasing matrix size in row-split, the mainstream parallelization strategy for sparse-dense matrix-matrix multiplication (SpMM), we propose a new framework for parallel SpMM computation on DCUs (GPU-like accelerators). This framework is based on the standard CSR format, requires no additional format conversion, and thus offers strong generality. To address load imbalance, we introduce a coarse-grained two-level binning strategy that categorizes the rows of the sparse matrix into three groups based on their number of non-zero elements. Dedicated computation kernels are designed for each category to better accommodate different types of computational tasks, thereby significantly improving load balance. To address the decline in parallelism as the matrix size increases, we design multiple optimized kernels and dynamically select the optimal configuration at runtime to maximize parallelism. Experimental results show that our proposed SpMM framework significantly outperforms two current state-of-the-art row-split-based SpMM algorithms (rocSparse and GE-SpMM), achieving speedups of 5.4× and 2.28×, respectively.
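The binning step described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the framework's code: the thresholds `short_max` and `long_min`, the bin names, and all function names are hypothetical, since the abstract does not give the actual cut-offs or kernels. A single reference row-split routine stands in for the three dedicated kernels.

```python
import numpy as np

def bin_rows_by_nnz(indptr, short_max=4, long_min=64):
    """Split CSR row ids into three bins by non-zero count (thresholds assumed)."""
    nnz_per_row = np.diff(indptr)
    rows = np.arange(len(nnz_per_row))
    return {
        "short": rows[nnz_per_row <= short_max],
        "medium": rows[(nnz_per_row > short_max) & (nnz_per_row < long_min)],
        "long": rows[nnz_per_row >= long_min],
    }

def spmm_row_split(indptr, indices, data, B, rows, C):
    """Reference row-split kernel: each listed row of C is computed independently."""
    for r in rows:
        lo, hi = indptr[r], indptr[r + 1]
        # Dot the row's non-zeros with the matching rows of B.
        C[r] = data[lo:hi] @ B[indices[lo:hi]]

def spmm_binned(indptr, indices, data, B, short_max=4, long_min=64):
    """Compute C = A @ B by dispatching each bin separately."""
    C = np.zeros((len(indptr) - 1, B.shape[1]), dtype=B.dtype)
    bins = bin_rows_by_nnz(indptr, short_max, long_min)
    for rows in bins.values():
        # A real framework would launch a kernel tuned for this bin here.
        spmm_row_split(indptr, indices, data, B, rows, C)
    return C

# Small example matrix A (4 x 3) in CSR form; row 1 is empty.
indptr = np.array([0, 2, 2, 3, 6])
indices = np.array([0, 2, 1, 0, 1, 2])
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
B = np.arange(6, dtype=float).reshape(3, 2)
C = spmm_binned(indptr, indices, data, B, short_max=1, long_min=3)
```

Because binning only reads the CSR row pointer array, it needs no format conversion, which is consistent with the generality claim in the abstract.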