Funding: Supported by the National Key Research and Development Program of China under Grant 2024YFE0210800, the National Natural Science Foundation of China under Grant 62495062, and the Beijing Natural Science Foundation under Grant L242017.
Abstract: The Dynamical Density Functional Theory (DDFT) algorithm, derived by combining classical Density Functional Theory (DFT) with the fundamental Smoluchowski dynamical equation, describes how the density distribution of an inhomogeneous fluid evolves over time, and it plays a significant role in the study of inhomogeneous systems. The Sunway Bluelight II supercomputer, a new generation of supercomputer developed in China, offers powerful computational capabilities, so porting and optimizing industrial software on this platform is of significant importance. To optimize the DDFT algorithm for the Sunway Bluelight II supercomputer and the unique hardware architecture of its SW39000 processor, this work proposes three acceleration strategies: direct parallel optimization, local-memory-constrained optimization for the CPEs, and collaboration and communication optimization across multiple core groups. The method combines the characteristics of the algorithm with the hardware architecture of the Sunway Bluelight II supercomputer, optimizing the storage and data-transfer structures to integrate software and hardware more closely. This paper presents Sunway-Dynamical Density Functional Theory (SW-DDFT) for the first time. Experimental results show that SW-DDFT achieves a speedup of 6.67 times within a single core group compared to the original DDFT implementation. With six core groups (384 CPEs in total), the maximum speedup reaches 28.64 times and the parallel efficiency reaches 71%.
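The abstract does not say which baseline the 71% parallel efficiency is measured against. As a minimal sketch, assuming efficiency is taken relative to the single-core-group run (an assumption, not something the abstract states), the reported speedups reproduce a figure close to the quoted one. The function name `parallel_efficiency` is hypothetical and used only for illustration.

```python
def parallel_efficiency(base_speedup: float, scaled_speedup: float, units: int) -> float:
    """Efficiency of scaling from 1 unit to `units` units.

    Assumption: efficiency = (scaled speedup / single-unit speedup) / units.
    The abstract does not confirm this is the definition the authors used.
    """
    if base_speedup <= 0 or units <= 0:
        raise ValueError("base_speedup and units must be positive")
    return (scaled_speedup / base_speedup) / units


# Figures taken from the abstract.
single_cg_speedup = 6.67   # one core group vs. original DDFT
six_cg_speedup = 28.64     # six core groups vs. original DDFT
core_groups = 6
total_cpes = 384

cpes_per_group = total_cpes // core_groups
relative_speedup = six_cg_speedup / single_cg_speedup
efficiency = parallel_efficiency(single_cg_speedup, six_cg_speedup, core_groups)

print(f"CPEs per core group: {cpes_per_group}")                # 64
print(f"speedup from 1 to 6 groups: {relative_speedup:.2f}x")  # 4.29x
print(f"parallel efficiency: {efficiency:.1%}")                # 71.6%
```

Under this reading, going from one to six core groups gives about a 4.29-fold gain out of an ideal 6, or roughly 71.6%, which is in line with the 71% reported.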
Funding: Supported by the National Key R&D Program of China (#2022YFC2803805), the Fundamental Research Funds for the Central Universities (#202313035), the Shandong Provincial Natural Science Foundation of China (#ZR2021QF124), the China Postdoctoral Science Foundation (#2021M703031), the National Key R&D Program of China (#2021YFF0704000), the National Natural Science Foundation of China (#62036010), and the Key R&D Program of Zhejiang (#2022C03126).
Abstract: We present swRender, a new parallel rendering pipeline for the Monte Carlo path-tracing algorithm, based on the new Sunway many-core architecture (SW26010P). Previous parallel rendering schemes are unsuitable for our task because of vast differences in hardware architecture and bottlenecks in I/O communication efficiency. To that end, we create a new two-level parallel tile rendering framework to fully utilize the Sunway computing resources, a practical tile-grouping load-balancing method to maintain the framework's stability, and a novel many-core acceleration optimization to improve rendering performance at the pixel level. Our method achieves (1) an average speedup of 16x across multiple benchmarks compared to the baseline path-tracing model on the Sunway architecture, and (2) an average speedup of 2x compared to state-of-the-art CPU, co-processor, and GPU-based parallel rendering approaches. Moreover, we scale swRender to run on 15 million cores and obtain a parallel efficiency of 92%.