In June 2018, the United States claimed the No. 1 position in supercomputing according to TOP500, which ranks the top 500 most powerful computer systems in the world [1]. The US Department of Energy's Summit machine (Fig. 1) [1] claimed this distinction, which previously had been held by China's Sunway TaihuLight supercomputer.
High performance computing (HPC) plays an essential role in enabling first-principles calculations based on the Kohn–Sham density functional theory (KS-DFT) for investigating quantum structural and electronic properties of large-scale molecules and solids in condensed matter physics, quantum chemistry and materials science. This review focuses on recent advances in HPC software development for large-scale KS-DFT calculations containing tens of thousands of atoms on modern heterogeneous supercomputers, especially HPC software with independent intellectual property rights supported on Chinese domestic exascale supercomputers. We first introduce three types of DFT software developed on modern heterogeneous supercomputers, namely PWDFT (Plane-Wave Density Functional Theory), HONPAS (Hefei Order-N Packages for Ab initio Simulations) and DGDFT (Discontinuous Galerkin Density Functional Theory), based respectively on three different types of basis sets (plane waves, numerical atomic orbitals and adaptive local basis functions). Then, we describe the theoretical algorithms and parallel implementations of these three software packages on modern heterogeneous supercomputers in detail. Finally, we conclude this review and propose several promising research directions for future large-scale KS-DFT calculations towards exascale supercomputers.
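For context, all three packages ultimately solve the same single-particle Kohn–Sham equations; in atomic units a standard form (not specific to any one code) is

\[
\Big(-\tfrac{1}{2}\nabla^{2} + v_{\mathrm{ext}}(\mathbf{r}) + v_{\mathrm{H}}[\rho](\mathbf{r}) + v_{\mathrm{xc}}[\rho](\mathbf{r})\Big)\,\psi_{i}(\mathbf{r}) = \varepsilon_{i}\,\psi_{i}(\mathbf{r}),
\qquad
\rho(\mathbf{r}) = \sum_{i\in\mathrm{occ}} |\psi_{i}(\mathbf{r})|^{2},
\]

where the orbitals \(\psi_{i}\) are expanded in plane waves (PWDFT), numerical atomic orbitals (HONPAS) or adaptive local basis functions (DGDFT), and the equations are iterated to self-consistency because \(v_{\mathrm{H}}\) and \(v_{\mathrm{xc}}\) depend on the density \(\rho\).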
With the rapid development of supercomputers, their scale and complexity are ever increasing, and reliability and resilience face ever larger challenges. There are many important fault-tolerance technologies, such as proactive failure avoidance based on fault prediction, reactive fault tolerance based on checkpointing, and scheduling technologies that improve reliability. Both qualitative and quantitative descriptions of the characteristics of system faults are critical for these technologies. This study analyzes the sources of failures on two typical petascale supercomputers, Sunway BlueLight (based on multi-core CPUs) and Sunway TaihuLight (based on heterogeneous many-core CPUs). It uncovers some interesting fault characteristics and finds previously unknown correlations among the faults of the main components. Finally, the paper analyzes the failure times of the two supercomputers at various resource granularities and over different time spans, and builds a uniform multi-dimensional failure time model for petascale supercomputers.
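As a hedged illustration of the kind of failure-time modeling such a study involves (not the multi-dimensional model built in the paper), a Weibull distribution can be fitted to observed times between failures in a few lines; the data below are hypothetical.

import numpy as np
from scipy import stats

# Hypothetical times between failures, in hours.
tbf_hours = np.array([12.5, 30.1, 8.7, 55.0, 21.3, 40.2, 17.8, 63.9])

# Fix the location at 0 so only the shape and scale are estimated.
shape, loc, scale = stats.weibull_min.fit(tbf_hours, floc=0)

# The mean of the fitted distribution gives an MTBF estimate.
mtbf = stats.weibull_min.mean(shape, loc=loc, scale=scale)
print(f"shape={shape:.2f}, scale={scale:.1f} h, estimated MTBF={mtbf:.1f} h")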
With the increase of system scale, the inherent reliability of supercomputers becomes lower and lower. The cost of fault handling and task recovery increases so rapidly that the reliability issue will soon harm the usability of supercomputers. This issue is referred to as the "reliability wall", which is regarded as a critical problem for current and future supercomputers. To address this problem, we propose an autonomous fault-tolerant system, named Iaso, in the MilkyWay-2 system. Iaso introduces the concept of autonomous management in supercomputers: the computer itself, rather than manpower, takes charge of fault management. Iaso automatically manages the whole lifecycle of faults, including fault detection, fault diagnosis, fault isolation, and task recovery. Iaso endows the MilkyWay-2 system with autonomous features such as self-awareness, self-diagnosis, self-healing, and self-protection. With the help of Iaso, the cost of fault handling in supercomputers is reduced from several hours to a few seconds. Iaso greatly improves the usability and reliability of the MilkyWay-2 system.
An analysis of real-world operational data of the Tianhe-1A (TH-1A) supercomputer system shows that chilled water data not only reflect the status of the chiller system but are also related to supercomputer load. This study proposes AquaSee, a method that can predict the load and cooling system faults of supercomputers by using chilled water pressure and temperature data. The method is validated on real-world operational data of the TH-1A supercomputer system at the National Supercomputer Center in Tianjin. Datasets with various compositions are used to construct the prediction model, which is also established using different prediction sequence lengths. Experimental results show that using a combination of pressure and temperature data performs more effectively than using either pressure or temperature data alone. The best inference sequence length is two points. Furthermore, an anomaly monitoring system is set up on the chilled water data to help engineers detect chiller system anomalies.
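A hedged sketch of the flavor of model described above: a least-squares predictor that maps the last two chilled-water pressure and temperature samples to a load estimate. The variable names and numbers are hypothetical, and the paper's actual model may differ.

import numpy as np

# Each row holds the last two chilled-water samples: [p(t-1), T(t-1), p(t), T(t)];
# the target is the load at the next step.  All numbers are hypothetical.
X = np.array([[0.52, 7.1, 0.55, 7.3],
              [0.55, 7.3, 0.57, 7.6],
              [0.57, 7.6, 0.54, 7.2],
              [0.54, 7.2, 0.51, 7.0],
              [0.51, 7.0, 0.53, 7.1]])
y = np.array([0.61, 0.66, 0.58, 0.55, 0.57])

# Append a bias column and solve the least-squares problem.
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

latest = np.array([0.53, 7.1, 0.54, 7.2, 1.0])  # two newest samples plus bias
print("predicted load:", float(latest @ coef))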
Modern petascale and future exascale systems are massively heterogeneous architectures. Developing productive intra-node programming models is crucial to addressing their programming challenge. We introduce a directive-based intra-node programming model, OpenMC, and show that this new model can achieve ease of programming, high performance, and the degree of portability desired for heterogeneous nodes, especially those in TianHe supercomputers. While existing models are geared towards offloading computations to accelerators (typically one), OpenMC aims to more uniformly and adequately exploit the potential offered by multiple CPUs and accelerators in a compute node. OpenMC achieves this by providing a unified abstraction of hardware resources as workers and facilitating the exploitation of asynchronous task parallelism on the workers. We present an overview of OpenMC, a prototype implementation, and results from initial comparisons with OpenMP and hand-written code in developing six applications on two types of nodes from TianHe supercomputers.
With the Frontier supercomputer ranked first on the Top500 list, the era of exascale computing power for supercomputers has arrived, employing compute nodes with double-precision floating-point performance exceeding 100 TFLOPS. As the basic computing unit of supercomputers, the efficiency of compute nodes significantly impacts the processing efficiency of application workloads in different domains, such as scientific computing and artificial intelligence. This article systematically analyzes the architectures and key technologies of major supercomputers already built or soon to be constructed around the world, and summarizes the development trends of 100-TFLOPS compute node technology. The paper provides valuable insights on how to address the challenges imposed by future 10-exascale and even zettascale supercomputers, including improving energy efficiency, optimizing memory access, utilizing chiplet interconnection, and employing advanced packaging technology. This will help scholars in related fields to think further about future research directions, and help industry practitioners to construct more practical and efficient compute nodes for future supercomputers.
The Dynamical Density Functional Theory (DDFT) algorithm, derived by combining classical Density Functional Theory (DFT) with the fundamental Smoluchowski dynamical equation, describes the evolution of inhomogeneous fluid density distributions over time and plays a significant role in studying such evolution in inhomogeneous systems. The Sunway Bluelight II supercomputer, a new generation of China's domestically developed supercomputers, possesses powerful computational capabilities, and porting and optimizing industrial software on this platform is of significant importance. To optimize the DDFT algorithm on the Sunway Bluelight II supercomputer and the unique hardware architecture of the SW39000 processor, this work proposes three acceleration strategies to enhance computational efficiency and performance: direct parallel optimization, local-memory constrained optimization for CPEs, and multi-core-group collaboration and communication optimization. The method combines the characteristics of the program's algorithm with the unique hardware architecture of the Sunway Bluelight II supercomputer, optimizing storage and transmission structures to achieve a closer integration of software and hardware. For the first time, this paper presents Sunway-Dynamical Density Functional Theory (SW-DDFT). Experimental results show that SW-DDFT achieves a speedup of 6.67 times within a single core group compared to the original DDFT implementation; with six core groups (a total of 384 CPEs), the maximum speedup reaches 28.64 times and the parallel efficiency reaches 71%, demonstrating excellent acceleration performance.
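For reference, the standard DDFT evolution equation obtained from the Smoluchowski equation and a classical free-energy functional \(F[\rho]\) is (a textbook form, not a statement of the SW-DDFT implementation)

\[
\frac{\partial \rho(\mathbf{r}, t)}{\partial t}
= \Gamma\, \nabla \cdot \left[ \rho(\mathbf{r}, t)\, \nabla \frac{\delta F[\rho]}{\delta \rho(\mathbf{r}, t)} \right],
\]

where \(\Gamma\) is a mobility coefficient; the functional derivative \(\delta F/\delta\rho\) must be re-evaluated on the full density grid at every time step.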
Exploring the human brain is perhaps the most challenging and fascinating scientific issue of the 21st century. It will facilitate the development of various aspects of society, including economics, education, health care, national defense and daily life. Artificial intelligence techniques are becoming useful as an alternative to classical techniques or as components of integrated systems. They are used to solve complicated problems in various fields and are becoming increasingly popular. In particular, the investigation of the human brain will advance artificial intelligence techniques, drawing on the accumulating knowledge of neuroscience, brain-machine interface techniques, algorithms for spiking neural networks, and neuromorphic supercomputers. Accordingly, we provide a comprehensive survey of the research on, and motivations for, brain-inspired artificial intelligence and its engineering over its history. The goals of this work are to provide a brief review of the research associated with brain-inspired artificial intelligence and its related engineering techniques, and to motivate further work by elucidating challenges in the field where new research is required.
In this paper, a typical experiment is carried out with a high-resolution air-sea coupled model, namely the coupled ocean-atmosphere-wave-sediment transport (COAWST) model, on both a heterogeneous many-core (SW) and a homogeneous multicore (Intel) supercomputing platform. We construct a hindcast of Typhoon Lekima on both the SW and Intel platforms, compare the simulation results between the two platforms, and compare the key elements of the atmospheric and ocean modules to reanalysis data. The comparative experiment in this typhoon case indicates that the domestic many-core computing platform and the general cluster yield almost no differences in the simulated typhoon path and intensity, and the differences in surface pressure (PSFC) in the WRF model and sea surface temperature (SST) in the short-range forecast are very small, whereas a major difference can be identified at high latitudes after the first 10 days. Further heat budget analysis verifies that the differences in SST after 10 days are mainly caused by shortwave radiation variations, as influenced by typhoons subsequently generated in the system. These typhoons generated in the hindcast after the first 10 days follow obviously different trajectories on the two platforms.
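For readers unfamiliar with the heat budget analysis mentioned above, a commonly used mixed-layer form (a generic formulation, not necessarily the exact diagnostic used in the paper) is

\[
\frac{\partial T_{m}}{\partial t}
= \frac{Q_{\mathrm{sw}} + Q_{\mathrm{lw}} + Q_{\mathrm{lh}} + Q_{\mathrm{sh}}}{\rho_{0}\, c_{p}\, h}
- \mathbf{u} \cdot \nabla T_{m} + R,
\]

where \(T_{m}\) is the mixed-layer (sea surface) temperature, \(h\) the mixed-layer depth, the \(Q\) terms are the shortwave, longwave, latent and sensible heat fluxes, \(\mathbf{u}\cdot\nabla T_{m}\) the advection term, and \(R\) collects entrainment and mixing residuals; the shortwave term \(Q_{\mathrm{sw}}\) is the component identified above as driving the SST differences after the first 10 days.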
China's first parallel supercomputer capable of 10^9 operations per second, named Yinhe-II, was manufactured by the National University of Defense Technology. The main features of the supercomputer are: a 4-processor system, a principal clock frequency of 50 MHz, a 64-bit word length, a main memory of 256 MB, two independent input/output subsystems, and more than 10^9 operations per second.
China's first supercomputer capable of 100 million calculations per second was the YH-1, which was independently developed by the Institute of Computer Science at the National University of Defense Technology (NUDT) between 1978 and 1983. YH-1 played an important role in China's national defense construction and national economic development, and made China one of the few countries in the world to have successfully developed a supercomputer. Based on original archive documents, interviews with relevant personnel, and an analysis of the technological parameters of the YH-1 in China and the Cray-1 in the United States, this paper reviews in detail the historical process of the development of YH-1, analyzing its innovations and summarizing the experience and lessons learned from it. This analysis is significant for current military-civilian integration and for the commercialization of university research findings in China.
As an important branch of information technology, high-performance computing has continuously expanded its application fields and its influence. High-performance computing has always been a key application area in meteorology. We used field research and a literature review to study the application of high-performance computing in China's meteorological department, and obtained the following results: 1) The China Meteorological Department has gradually built up high-performance computer systems since its first one in 1978; high-performance computing services can support operational numerical weather prediction models. 2) The Chinese meteorological department has consistently adopted relatively advanced high-performance computing technology, and the capability of its operational systems has been continuously improved; computing power has become an important symbol of the level of meteorological modernization. 3) High-performance computing technology and meteorological numerical forecasting applications are increasingly integrated and continue to innovate and develop. 4) In the future, high-performance computing resource management will gradually transition from the current local pre-allocation mode to unified local and remote scheduling and shared use. In summary, we conclude that the high-performance computing business of the meteorological department will usher in a better tomorrow.
We have demonstrated the application of the world's fastest supercomputer, Fugaku, located in Japan, to selecting COVID-19 drugs and stopping the spread of the pandemic. Using computer simulation, the supercomputer picked the 30 most effective and promising drugs out of 2128 potential drug candidates. Twelve of them are under clinical trials outside Japan; some are being tested in Japan. The computer reduced the computation time from one year to 10 days when compared with the world's second-fastest supercomputer. Fugaku was also employed to study the behavior of the airborne, aerosolized COVID-19 virus. The 3Cs were suggested: avoid closed and crowded spaces and close contacts to stop the pandemic's spread. The progress in vaccine development and the proper use and type of masks are also described in this article. The article will greatly benefit efforts to stop the spread of, and to treat, COVID-19.
1 | Introduction. Achieving practical quantum computers (PQCs), each based on millions and even billions of integrated quantum bits (qubits), is essential for tackling real-world computational tasks involving quantum phenomena at atomic and molecular levels [1,2], such as drug discovery [3] and materials design [4]; conventional supercomputers based on digital technology are inherently inefficient for such problems. Our recent analysis [5] of dimensional scalability for the transmon qubit (i.e., the transmission-line shunted plasma oscillation qubit [6]) ...
With the convergence of supercomputing and intelligent computing, the Supercomputer Internet has been proposed to build, deploy, and run converged applications using cloud-native technologies. Message Passing Interface (MPI) applications are a representative class of supercomputing workloads in parallel computing environments. Live migration is the process of transferring a running application to a different physical location with minimal downtime; it enables a number of useful application management capabilities such as load balancing, resource consolidation, and fault tolerance. While several works have studied live migration for MPI workloads, most require modifying the operating system kernel, which hinders broader adoption in data centers. This paper uses container technology and the CRIU tool to checkpoint and restart a single container in MPI containerized environments while ensuring the continuous execution of the MPI program. The paper validates the feasibility of live migration for MPI workloads by testing with the NAS Parallel Benchmarks (NPB), LAMMPS, and GROMACS. The paper discusses the impact of migration on MPI timing functions and proposes solutions. A slight improvement in MPI computational performance due to migration is observed, along with an increase in communication latency during the iterative process.
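As a hedged illustration of why migration distorts MPI timing measurements (not the solution proposed in the paper), the sketch below measures per-iteration wall time with MPI.Wtime() and discards the one-off jump produced when a process is frozen and restored; the threshold and workload are hypothetical.

from mpi4py import MPI

comm = MPI.COMM_WORLD

def timed_loop(niters, work, freeze_threshold=5.0):
    """Average per-iteration wall time, skipping steps inflated by a migration pause."""
    timings = []
    t_prev = MPI.Wtime()
    for _ in range(niters):
        work()
        t_now = MPI.Wtime()
        dt = t_now - t_prev
        t_prev = t_now
        if dt < freeze_threshold:   # discard the one-off jump caused by checkpoint/restore
            timings.append(dt)
    return sum(timings) / max(len(timings), 1)

if __name__ == "__main__":
    avg = timed_loop(100, work=comm.Barrier)
    if comm.rank == 0:
        print(f"average per-iteration time: {avg:.6f} s")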
The authors regret that the acknowledgment section was unfortunately left out of the final submitted version. The section should read: "Acknowledgments: This study is supported by the National Natural Science Foundation of China (41925017). The calculations were partly conducted at the supercomputing center of the University of Science and Technology of China."
This paper proposes an enhanced MapReduce framework for the geo-distributed supercomputing Internet that minimizes the need for data transmission across data centers. Leveraging hierarchical scheduling techniques, the framework optimizes data locality to mitigate network latency and bandwidth consumption during reduce operations, thereby reducing overall job execution times. The paper introduces a mathematical model for task scheduling within the supercomputing Internet and formally describes the data transmission process among data centers. In the job scheduling phase, the framework facilitates efficient overlap of transfer and computation through pre-selected data centers. In the data transmission phase, the framework aggregates data to reduce the frequency of transmission, thus alleviating the adverse effects of the hierarchical network architecture on transmission. Comparative analysis with existing methods demonstrates the efficacy of the proposed framework in addressing similar computational challenges, and empirical evaluations underscore its effectiveness in practice.
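A hedged toy illustration of the locality-aware placement idea (not the paper's scheduler): for each reduce task, pick the data center that minimizes the bytes that must cross data-center links, given where its input partitions reside. Names and sizes are hypothetical.

def place_reduce_task(partition_bytes_by_dc):
    """partition_bytes_by_dc: {data_center: bytes of this task's input stored there}."""
    total = sum(partition_bytes_by_dc.values())
    # Running the task at a data center keeps its local bytes; the rest crosses the WAN.
    best_dc = min(partition_bytes_by_dc, key=lambda dc: total - partition_bytes_by_dc[dc])
    return best_dc, total - partition_bytes_by_dc[best_dc]

if __name__ == "__main__":
    task_inputs = {"dc_a": 120e9, "dc_b": 30e9, "dc_c": 10e9}   # hypothetical sizes in bytes
    dc, moved = place_reduce_task(task_inputs)
    print(f"run reduce at {dc}; {moved / 1e9:.0f} GB must cross data-center links")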
High-throughput computing tasks are a typical class of computational tasks in high-performance computing. They are commonly used for large-scale data analysis in high-energy physics, biomedicine, and other fields. These workloads usually comprise a large number of small tasks that are independent of each other but together have a huge demand for computing resources. In the current HPC resource management pattern, users tend to estimate a certain amount of resource demand and have tasks executed only after those resources are satisfied, which often results in long waits for resources. This paper proposes a non-intrusive Function-as-a-Service (FaaS) framework for the supercomputer Internet, called SuperFaaS. SuperFaaS is compatible with existing HPC resource management systems and supports elastic provisioning of HPC computing resources. SuperFaaS ensures the stable execution of tasks through resource reuse, monitoring, and fault-tolerance mechanisms. Tests show that SuperFaaS matches or improves on the service performance overhead of OpenWhisk. Using the drug screening software AutoDock Vina to calculate 20,000 drug molecule permutations on a real supercomputing system, the results show that SuperFaaS can greatly reduce the total task completion time (including resource waiting time), and the requested resources achieve more than 95% effective utilization.
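As a hedged illustration of the high-throughput pattern SuperFaaS targets (many small, independent tasks sharing reusable workers), a plain process-pool version looks like the sketch below; it shows only the resource-reuse idea, not SuperFaaS itself, and the scoring function is a placeholder.

from concurrent.futures import ProcessPoolExecutor, as_completed

def score_molecule(molecule_id: int) -> float:
    # Placeholder for one small independent task, e.g. docking one drug candidate.
    return (molecule_id * 2654435761 % 1000) / 1000.0

def main():
    molecule_ids = range(20_000)
    results = {}
    with ProcessPoolExecutor(max_workers=8) as pool:   # workers are reused across tasks
        futures = {pool.submit(score_molecule, m): m for m in molecule_ids}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    print("best-scoring candidate:", max(results, key=results.get))

if __name__ == "__main__":
    main()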
Extreme-scale numerical simulations place severe demands on parallel computing capabilities. To address the challenges of these capabilities toward exascale, we systematically analyze the major bottlenecks of parallel computing research from three perspectives: computational scale, computing efficiency, and programming productivity. For these bottlenecks, we propose a series of urgent key issues and coping strategies. This study will be useful in synchronizing the development of numerical computing capability with supercomputer peak performance.