This paper introduces the development of exascale (10^18 FLOPS) computing. Though exascale computing is a hot research direction worldwide, many challenges remain in the areas of the memory wall, communication wall, reliability wall, power wall, and the scalability of parallel computing. In response to these challenges, some thoughts and strategies are proposed.
Supercomputing technology has been evolving rapidly, in an accelerated way. It has made a significant impact on scientific research, technology innovation, economic and social development, and the lives of ordinary people. Over the past three decades, China has devoted considerable effort to the development of supercomputing technologies and made remarkable achievements in this field. China's supercomputing systems now rank among the world's most powerful. As Moore's Law approaches its limit, the development of exascale supercomputing systems faces a series of grand challenges in both technologies and applications. Based on the experience of China's supercomputing development over the past years, this paper analyzes the major technical challenges on the path towards exascale computing. Additionally, ongoing major R&D activities on next-generation supercomputing in China are introduced, and possible solutions for achieving exascale computing, including co-design and convergence computing, are discussed.
The exascale computer will be built in the near future thanks to rapid innovations in semiconductor logic, memory, architectures, interconnects, and other essential technologies. It is difficult to design an interconnection network that combines high performance with low power consumption, so building an interconnection network with high cost performance plays a critical role in building such a large-scale system. Currently, torus-interconnected networks such as the 6D-torus possess suitable properties for petascale computers. However, the diameter of a torus-interconnected network is too long to achieve efficient global communication in an exascale computer. In addition, a direct connection method does not adapt to the diverse characteristics of traffic. Here, we propose an architecture called Wormhole Optical Network (WON) for the exascale computer, based on optical circuit switching. WON is designed to integrate fully into the electrical network of a 6D-torus. WON introduces three novel techniques: a dynamic topology with optical links, a cross-dimension order routing algorithm, and a deadlock-free flow-control strategy. We evaluated WON using both a prototype system and a simulator for the exascale computer. Our analysis shows that, compared to the traditional electrical architecture, WON reduced data-communication time by 14–29% at exascale, a result obtained for a wide selection of diverse applications. By enabling an SDN controller to adjust the topology, WON maintains high utilization of optical links for inter-process communication from diverse applications. Further, we quantified WON's flexibility of job deployment for mitigating hotspot traffic: WON reduced latency by 20–35% in the large-range deployment and improved throughput by 30% in the long-distance deployment.
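The abstract's claim that the torus diameter is too long for efficient global communication at exascale can be illustrated with a small worked example. In a k-ary torus, wraparound links cap each dimension's contribution at floor(k/2) hops; the side lengths below are hypothetical, chosen only to show the scaling:

```python
# Worst-case hop count (diameter) of a torus: each dimension contributes
# at most floor(k/2) hops thanks to its wraparound link.
def torus_diameter(sides):
    """sides: list of ring lengths, one per dimension."""
    return sum(k // 2 for k in sides)

# A hypothetical 6D torus with 8 nodes per dimension (8^6 = 262,144 nodes):
print(torus_diameter([8] * 6))   # 24 hops in the worst case

# Growing each ring toward ~1M nodes pushes the diameter further up:
print(torus_diameter([10] * 6))  # 30 hops
```

An optical circuit, by contrast, can connect a distant node pair in effectively one hop, which is the motivation the abstract gives for augmenting the electrical 6D-torus with WON's optical links.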
Non-equilibrium electronic quantum transport is crucial for existing and envisioned electronic, optoelectronic, and spintronic devices. Encompassing atomistic to mesoscopic length scales in the same non-equilibrium device simulation has been challenging due to the computational cost of high-fidelity coupled multiphysics and multiscale requirements. In this work, we present ELEQTRONeX (ELEctrostatic Quantum TRansport modeling Of Nanomaterials at eXascale), a massively parallel, GPU-accelerated framework for self-consistently solving the non-equilibrium Green's function formalism and electrostatics in complex device geometries. By customizing algorithms for GPU multithreading, we achieve significant improvements in computational time and excellent scaling on up to 512 GPUs and billions of spatial grid cells. We validate our code by computing band structures, current-voltage characteristics, conductance, and drain-induced barrier lowering for various 3D configurations of carbon nanotube field-effect transistors, and demonstrate its suitability for complex device/material geometries where periodic approaches are not feasible, such as arrays of misaligned carbon nanotubes requiring fully 3D simulations.
Facing the challenges of next-generation exascale computing, the National University of Defense Technology has developed a prototype system to explore opportunities, solutions, and limits toward the next-generation Tianhe system. This paper briefly introduces the prototype system, which is deployed at the National Supercomputer Center in Tianjin and has a theoretical peak performance of 3.15 Pflops. The system comprises 512 compute nodes, each with three proprietary CPUs called Matrix-2000+. The system memory is 98.3 TB, and the storage totals 1.4 PB.
The mismatch between compute performance and I/O performance has long been a stumbling block as supercomputers evolve from petaflops to exaflops. Currently, many parallel applications are I/O intensive, and their overall running times are typically limited by I/O performance. To quantify the I/O performance bottleneck and highlight the significance of achieving scalable performance in peta/exascale supercomputing, in this paper we introduce, for the first time, a formal definition of the 'storage wall' from the perspective of parallel application scalability. We quantify the effects of the storage bottleneck by providing a storage-bounded speedup, defining the storage wall quantitatively, presenting existence theorems for the storage wall, and classifying system architectures by I/O performance variation. We analyze and extrapolate the existence of the storage wall through experiments on Tianhe-1A and case studies on Jaguar. These results provide insights into how to alleviate the storage-wall bottleneck in system design and achieve hardware/software optimizations in peta/exascale supercomputing.
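The paper's exact definition of storage-bounded speedup is not reproduced here, but its qualitative behaviour can be sketched with an Amdahl-style toy model in which compute parallelizes perfectly while I/O time stays fixed. All numbers below are illustrative assumptions, not results from the paper:

```python
# Toy 'storage wall' model: compute time shrinks with node count n,
# but the I/O phase does not scale, so speedup saturates at
# (t_compute + t_io) / t_io regardless of how many nodes are added.
def storage_bounded_speedup(t_compute, t_io, n):
    """Speedup of a job whose compute phase parallelizes perfectly
    while its I/O phase stays fixed."""
    return (t_compute + t_io) / (t_compute / n + t_io)

# With 95% compute / 5% I/O, speedup saturates near 100/5 = 20:
for n in (10, 100, 10_000):
    print(n, round(storage_bounded_speedup(95.0, 5.0, n), 1))
```

The plateau is the 'wall': past a certain scale, adding nodes buys almost nothing, which is why the abstract frames I/O scalability as an existence question rather than a tuning problem.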
With various exascale systems planned in different countries over the next three to five years, developing application software for such unprecedented computing capabilities and parallel scaling becomes a major challenge. In this study, we start our discussion with the current 125-Pflops Sunway TaihuLight system in China and its related application challenges and solutions. Based on our experience with Sunway TaihuLight, we provide a projection into the next decade and discuss potential challenges and possible trends we would probably observe in future high-performance computing software.
The ever-increasing need for high performance in scientific computation and engineering applications will push high-performance computing beyond the exascale. As an integral part of a supercomputing system, high-performance processors and their architecture designs are crucial in improving system performance. In this paper, three architecture design goals for high-performance processors beyond the exascale are introduced: effective performance scaling, efficient resource utilization, and adaptation to diverse applications. Then a high-performance many-core processor architecture with scalar processing and application-specific acceleration (Massa) is proposed, which aims to achieve the above three goals by employing distributed computational resources and application-customized hardware. Finally, some future research directions regarding the Massa architecture are discussed.
In this article, a function describing the behavior of resource discovery in the equilibrium-state analysis is introduced, based on previously defined patterns for responding to requests. By considering whether a responding structure exists for the proposition that activates resource discovery, and considering the functional advantage of resource discovery, this function presents a new pattern for resource discovery after the occurrence of a dynamic and interactive event that influences its functionality. Results indicate that, following such a dynamic and interactive event, in 60% of cases the introduced function could provide a responding structure for the request based on a previously defined system.
The dynamic and interactive nature of a distributed exascale computing system leaves the load balancer without a proper pattern for a solution. In addition to analyzing and reviewing this dynamic and interactive nature and its effect on load balancing, this article introduces a framework for managing load balancing that does not need to study that nature directly. The framework proposes a mathematical scheme for the functionality of load-balancing elements and redefines its functions and components. This redefinition makes it possible to determine the constituent parts of the framework and their functionality without analyzing the dynamic and interactive nature of the system. The proposed framework can manage and control dynamic and interactive events by reviewing changes in the functionality of resources, the pattern of data collection for executing processes related to the load balancer, and a scalable tool. In addition to performing the load balancer's functionality, our framework can continue to function under dynamic and interactive events in distributed exascale systems. On average, this framework yields a 43% improvement in responding to dynamic and interactive requests that would otherwise go unanswered.
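As a rough illustration of the core load-balancer functionality that such a framework formalizes, assigning work to the least-loaded resource, the following greedy sketch uses a min-heap of resource loads. The task/resource model here is an assumption for illustration, not the paper's scheme:

```python
import heapq

def balance(tasks, n_resources):
    """Greedy least-loaded assignment: each task goes to the resource
    with the smallest accumulated load so far."""
    heap = [(0.0, r) for r in range(n_resources)]  # (load, resource id)
    heapq.heapify(heap)
    assignment = {}
    for task, cost in tasks:
        load, r = heapq.heappop(heap)      # lightest resource
        assignment[task] = r
        heapq.heappush(heap, (load + cost, r))
    return assignment

print(balance([("p1", 4), ("p2", 2), ("p3", 1), ("p4", 1)], 2))
```

A dynamic and interactive event in the paper's sense would invalidate the recorded loads mid-run, which is exactly why the framework re-examines resource functionality rather than trusting a static cost model like this one.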
In this article, we describe the context in which an international race towards exascale computing has started. We cover the political and economic context and review the recent history of high-performance computing (HPC) architectures, with special emphasis on the recently announced European initiatives to reach exascale computing in Europe. We conclude by describing current challenges and trends.
High-performance computing (HPC) plays an essential role in enabling first-principles calculations based on Kohn–Sham density functional theory (KS-DFT) for investigating quantum structural and electronic properties of large-scale molecules and solids in condensed matter physics, quantum chemistry, and materials science. This review focuses on recent advances in HPC software development for large-scale KS-DFT calculations containing tens of thousands of atoms on modern heterogeneous supercomputers, especially HPC software with independent intellectual property rights supported on Chinese domestic exascale supercomputers. We first introduce three types of DFT software developed on modern heterogeneous supercomputers: PWDFT (Plane-Wave Density Functional Theory), HONPAS (Hefei Order-N Packages for Ab initio Simulations), and DGDFT (Discontinuous Galerkin Density Functional Theory), based respectively on three different types of basis sets (plane waves, numerical atomic orbitals, and adaptive local basis functions). We then describe the theoretical algorithms and parallel implementations of these three packages on modern heterogeneous supercomputers in detail. Finally, we conclude this review and propose several promising research directions for future large-scale KS-DFT calculations towards exascale supercomputers.
Supercomputers' capability is approaching the exascale level, which enables large computing systems to run more jobs concurrently. Since modern data-intensive scientific applications can produce millions of I/O requests per second, I/O systems suffer from heavy workloads that impede overall performance. How to allocate I/O resources and guarantee the QoS (Quality of Service) of each individual application is becoming an increasingly important question. In this paper, we propose SDQoS, a software-defined QoS framework based on the token bucket algorithm, aiming to meet the I/O requirements of concurrent applications contending for I/O resources and to improve the overall performance of I/O systems. Evaluation shows that SDQoS can effectively control I/O bandwidth within a 5%-10% deviation and improve performance by 20% in extreme cases.
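The token bucket algorithm that SDQoS builds on can be sketched as follows. The rate and capacity values are illustrative; the paper's actual parameters and API are not shown here:

```python
class TokenBucket:
    """Admit a request only if enough tokens have accumulated;
    tokens refill at a fixed rate up to a burst capacity."""
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.last = 0.0           # timestamp of the last call

    def allow(self, now, cost=1):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

tb = TokenBucket(rate=10, capacity=5)      # 10 requests/s, bursts of 5
print([tb.allow(0.0) for _ in range(6)])   # the sixth request is rejected
print(tb.allow(0.1))                       # 0.1 s later one token refilled
```

Per-application buckets like this let a scheduler cap each tenant's I/O rate while still absorbing short bursts, which matches the bandwidth-control behaviour the abstract reports.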
Recently, researchers have shown increasing interest in combining more than one programming model in systems running on high-performance computing (HPC) systems to achieve exascale by applying parallelism at multiple levels. Combining different programming paradigms, such as the Message Passing Interface (MPI), Open Multi-Processing (OpenMP), and Open Accelerators (OpenACC), can increase computation speed and improve performance. When multiple models are integrated, the probability of runtime errors increases, making them difficult to detect, especially in the absence of testing techniques for such errors. Numerous studies have been conducted to identify these errors, but no technique exists for detecting errors in three-level programming models. Despite increasing research that integrates the three programming models MPI, OpenMP, and OpenACC, no testing technology has been developed to detect the runtime errors, such as deadlocks and race conditions, that can arise from this integration. Therefore, this paper begins with a definition and explanation of the runtime errors resulting from integrating the three programming models that compilers cannot detect. For the first time, this paper presents a classification of the runtime errors that can result from integrating the three models. The paper also proposes a parallel hybrid testing technique for detecting runtime errors in systems built in the C++ programming language that use the triple programming models MPI, OpenMP, and OpenACC. This hybrid technique combines static and dynamic analysis, given that some errors can be detected statically whereas others can only be detected dynamically, so combining the two catches more errors. The proposed static analysis detects a wide range of error types in less time, whereas the potential errors that may or may not occur depending on the operating environment are left to the dynamic analysis, which completes the validation.
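As a toy illustration of the static side of such a hybrid checker, and emphatically not the paper's actual technique, one can flag an OpenMP parallel loop that accumulates into a shared variable without a reduction clause, a classic race pattern in MPI+OpenMP+OpenACC codes:

```python
import re

def flag_races(source):
    """Very naive static check: report variables updated with '+=' shortly
    after a '#pragma omp parallel for' that lacks a reduction clause."""
    findings = []
    lines = source.splitlines()
    for i, line in enumerate(lines):
        if "#pragma omp parallel for" in line and "reduction" not in line:
            # Crudely scan the next few lines as the loop body.
            for body in lines[i + 1:i + 5]:
                m = re.search(r"(\w+)\s*\+=", body)
                if m:
                    findings.append(m.group(1))
    return findings

code = """
#pragma omp parallel for
for (int i = 0; i < n; i++)
    sum += a[i];
"""
print(flag_races(code))  # ['sum']
```

A real static analyzer works on the compiler's intermediate representation rather than raw text, and the errors that depend on runtime scheduling (as the abstract notes) cannot be caught this way at all; they are exactly what the dynamic half of a hybrid technique is for.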
With the global trend of pursuing clean energy and decarbonization, power systems have been evolving at a pace never before seen in the history of electrification. This evolution makes the power system more dynamic and more distributed, with higher uncertainty. These new power system behaviors bring significant challenges in power system modeling and simulation, as more data must be analyzed for larger systems, and more complex models must be solved in a shorter time period. Conventional computing approaches will not be sufficient for future power systems. This paper provides a historical review of computing for power system operation and planning, discusses technology advancements in high-performance computing (HPC), and describes the drivers for employing HPC techniques. Several HPC application examples with different HPC techniques, including the latest quantum computing, are also presented to show how HPC techniques can help meet the requirements of power system computing in a clean-energy future.
This issue focuses on innovations in supercomputing techniques. Six invited papers were selected through a peer review procedure, covering the research progress of China's supercomputing, interconnection networks, performance evaluation, and parallel algorithms. Prof. Yutong Lu summarizes the recent progress of supercomputing systems in China by introducing the three pre-exascale supercomputers.
With the coming of the exascale computing era, programming systems and operating systems (including runtime systems) face several challenges. On the architecture side, ever-deeper levels of parallelism, heterogeneity, and the adoption of diverse domain-specific accelerators raise an urgent need for programmability, performance optimization, and portability. On the application side, big data analytics and machine learning applications need to be ported to and optimized on modern HPC systems. This issue focuses on novel ideas, methods, and efforts in system software development to resolve the above challenges and to fill the gap between applications and the underlying hardware systems.
文摘This paper introduces the development of the exascale (10^18) computing. Though exascalc computing is a hot research direction worldwide, we are facing many challenges in the areas of memory wall, communica- tion wall, reliability wall, power wall and scalability of parallel computing. According to these challenges, some thoughts and strategies are proposed.
基金supported by National Key R&D project of China under Grant no.2017YFB0202201the Program for Guangdong Introducing Innovative and Entrepreneurial Teams under Grant no.2016ZT06D211.
文摘Supercomputing technology has been evolving rapidly in an accelerated way.It has made a significant impact on scientific research,technology innovation,economic and social development,and the life of ordinary people.Over the past three decades,China has devoted considerable efforts on the development of supercomputing technologies,and made tremendous and remarkable achievements in this field.China’s supercomputing systems now rank among the world’s most powerful supercomputers.As Moore’s Law approaches its limit,the development of exascale supercomputing systems is facing a series of grand challenges in both technologies and applications.Based on the experiences of China’s supercomputing development over the past years,this paper analyzes the major technical challenges on the path towards exascale computing.Additionally,ongoing major R&D activities on next-generation supercomputing in China are introduced,and the possible solutions to achieve exascale computing,including co-design and convergence computing,are discussed.
基金supported in part by National Key R&D Program of China 2016YFB0200204National Natural Science Foundation of China 61702484National Program on Key Research Project 2016YFB0200300.
文摘The exascale computer will be built in the near future thanks to rapid innovations in semiconductor logic,memory,architectures,interconnections and other essential technologies.It is difficult to design an interconnection network that combines high performance with low power consumption.Therefore,building an interconnection network with high cost performance plays a critical role in building such a large scale system.Currently,torus-interconnected network like 6D-Torus possesses suitable properties for the petascale computer.However,the diameter within the torus-interconnected network is too long to achieve efficient global communication in the exascale computer.In addition,a direct connection method is not adaptive to the diverse characteristics of traffic.Here,we propose an architecture called Wormhole Optical Network(WON)for the exascale computer which is based on optical circuit switching.WON was designed to fully integrate into the electrical network of 6D-Torus.WON allows for the use of three novel technologies,namely the dynamic topology with optical links,algorithm of cross dimension order routing,and strategy of flow control for deadlock-free.We evaluated WON using both a prototype system and a simulator for the exascale computer.Our analysis shows that compared to the traditional electrical architecture,WON architecture reduced the time of data communication by 14–29%on exascale,a result obtained for a wide selection of diverse applications.Through enabling an SDN controller to adjust topology,WON maintains high utilization of optical links for inter-process communication from diverse applications.Further,we quantified WON’s flexibility of job deployment for mitigating hotspot traffic.We show that WON reduced latency by 20–35%in the large-range deployment and improved throughput by 30%in the long-distance deployment.
基金supported by the U.S.Department of Energy,Office of Science,under the Microelectronics Co-Design Research Program(Co-Design and Integration of Nano-sensors on CMOS)the Microelectronics Science Research Centers(Nanoscale hybrids:a new paradigm for energy-efficient optoelectronics)+3 种基金Accelerate Innovations in Emerging Technologies Program(Phonon Control for Next-Generation Superconducting Systems and Sensors)under Contract DE-AC02-05-CH11231supported by the same programs under Contract DE-NA-0003525Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia,LLC.,a wholly owned subsidiary of Honeywell International,Inc.,for the U.S.Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525the National Energy Research Scientific Computing Center(NERSC),a Department of Energy Office of Science User Facility using NERSC award ASCR-ERCAP0026882.
文摘Non-equilibrium electronic quantum transport is crucial for existing and envisioned electronic,optoelectronic,and spintronic devices.Encompassing atomistic to mesoscopic length scales in the same nonequilibrium device simulations has been challenging due to the computational cost of high-fidelity coupled multiphysics and multiscale requirements.In this work,we present ELEQTRONeX(ELEctrostatic Quantum TRansport modeling Of Nanomaterials at eXascale),a massively parallel GPU-accelerated framework for self-consistently solving the nonequilibrium Green’s function formalism and electrostatics in complex device geometries.By customizing algorithms for GPU multithreading,we achieve significant improvement in computational time,and excellent scaling on up to 512 GPUs and billions of spatial grid cells.We validate our code by computing band structures,current-voltage characteristics,conductance,and drain-induced barrier lowering for various 3D configurations of carbon nanotube field-effect transistors,and demonstrate its suitability for complex device/material geometries where periodic approaches are not feasible,such as arrays of misaligned carbon nanotubes requiring fully 3D simulations.
基金supported by the National Key Research and Development Program of China(No.2016YFB0200401)。
文摘Facing the challenges of the next generation exascale computing,National University of Defense Technology has developed a prototype system to explore opportunities,solutions,and limits toward the next generation Tianhe system.This paper briefly introduces the prototype system,which is deployed at the National Supercomputer Center in Tianjin and has a theoretical peak performance of 3.15 Pflops.A total of 512 compute nodes are found where each node has three proprietary CPUs called Matrix-2000+.The system memory is 98.3 TB,and the storage is 1.4 PB in total.
基金the National Natural Science Foundation of China(Nos.61272141 and 61120106005)the National High-Tech R&D Program(863)of China(No.2012AA01A301)
文摘The mismatch between compute performance and I/O performance has long been a stumbling block as supercomputers evolve from petaflops to exaflops. Currently, many parallel applications are I/O intensive,and their overall running times are typically limited by I/O performance. To quantify the I/O performance bottleneck and highlight the significance of achieving scalable performance in peta/exascale supercomputing, in this paper, we introduce for the first time a formal definition of the ‘storage wall' from the perspective of parallel application scalability. We quantify the effects of the storage bottleneck by providing a storage-bounded speedup,defining the storage wall quantitatively, presenting existence theorems for the storage wall, and classifying the system architectures depending on I/O performance variation. We analyze and extrapolate the existence of the storage wall by experiments on Tianhe-1A and case studies on Jaguar. These results provide insights on how to alleviate the storage wall bottleneck in system design and achieve hardware/software optimizations in peta/exascale supercomputing.
基金Project supported by the National Key Technology R&D Program of China(No.2016YFA0602200)
文摘With various exascale systems in different countries planned over the next three to five years, developing application software for such unprecedented computing capabilities and parallel scaling becomes a major challenge. In this study, we start our discussion with the current 125-Pflops Sunway TaihuLight system in China and its related application challenges and solutions. Based on our current experience with Sunway TaihuLight, we provide a projection into the next decade and discuss potential challenges and possible trends we would probably observe in future high performance computing software.
基金Project supported by the National Natural Science Foundation of China(Nos.91430214 and 61732018)
文摘The ever-increasing need for high performance in scientific computation and engineering applications will push high-perfornlance computing beyond the exascale. As an integral part of a supercomputing system, high- performance processors and their architecture designs are crucial in improving system performance. In this paper, three architecture design goals for high-performance processors beyond the exa.scale are introduced, including effective performance scaling, efficient resource utilization, and adaptation to diverse applications. Then a high-performance many-core processor architecture with scalar processing and application-specific acceleration (Massa) is proposed, which aims to achieve the above three goals by employing the techniques of distributed computational resources and application-customized hardware. Finally, some future research directions regarding the Massa architecture are discussed.
文摘In this article,the function used for the functionality of the resource discovery in the equilibrium state analysis for operation describing the functionality of the resource discovery is introduced based on previously defined patterns to respond to requests.By considering the existence or non-existence of the responding structure for the proposition that leads to activation of the resource discovery and considering the functional advantage of the resource discovery,after the occurrence of the dynamic and interactive event that influences the functionality of the resource discovery,this function presents a new pattern for the resource discovery.Results indicated that following the dynamic and interactive event that impacts the functionality of the resource discovery,in 60%of cases,the introduced function could provide a responding structure for the request based on a previously defined system.
文摘The dynamic and interactive nature of Distributed Exascale Computing System leads to a situation where the load balancerlacks the proper pattern for the solution.In addition to analyzing and reviewing the dynamic and interactive nature and itseffect on load balancing,this article introduces a framework for managing load balancing that does not need to study thedynamic and interactive nature.This framework proposes a mathematical scheme for the functionality of load-balancingelements and redefines its functions and components.The redefinition makes it possible to determine the constituent partsof the framework and their functionality without the need to analyze the dynamic and interactive nature of the system.Theproposed framework can manage and control dynamic and interactive events by reviewing changes in the functionality ofresources,the pattern of data collection to execute processes related to the load balancer,and a Scalable tool.In addition toperforming the load balancer’s functionality,our framework can continue to function under dynamic and interactive eventsin distributed exascale systems.On average,this framework has a 43%improvement,unable to respond to dynamic andinteractive requests.
Abstract: In this article, we describe the context in which an international race towards exascale computing has started. We cover the political and economic context and review the recent history of high-performance computing (HPC) architectures, with special emphasis on the recently announced European initiatives to reach exascale computing in Europe. We conclude by describing current challenges and trends.
Funding: Supported by the National Natural Science Foundation of China (21688102, 21803066, 22003061, 22173093); the Hefei National Laboratory for Physical Sciences at the Microscale (KF2020003); the Chinese Academy of Sciences Pioneer Hundred Talents Program (KJ2340000031); the Anhui Initiative in Quantum Information Technologies (AHY090400); the CAS Project for Young Scientists in Basic Research (YSBR-005); the Strategic Priority Research Program of the Chinese Academy of Sciences (XDC01040100); the Fundamental Research Funds for the Central Universities (WK2340000091, WK2060000018); the Hefei National Laboratory for Physical Sciences at the Microscale (SK2340002001); and the Research Start-Up Grants (KY2340000094) and the Academic Leading Talents Training Program (KY2340000103) from the University of Science and Technology of China.
Abstract: High-performance computing (HPC) plays an essential role in enabling first-principles calculations based on Kohn–Sham density functional theory (KS-DFT) for investigating the quantum structural and electronic properties of large-scale molecules and solids in condensed matter physics, quantum chemistry, and materials science. This review focuses on recent advances in HPC software development for large-scale KS-DFT calculations involving tens of thousands of atoms on modern heterogeneous supercomputers, especially HPC software with independent intellectual property rights supported on Chinese domestic exascale supercomputers. We first introduce three types of DFT software developed for modern heterogeneous supercomputers: PWDFT (Plane-Wave Density Functional Theory), HONPAS (Hefei Order-N Packages for Ab initio Simulations), and DGDFT (Discontinuous Galerkin Density Functional Theory), based respectively on three different types of basis sets (plane waves, numerical atomic orbitals, and adaptive local basis functions). We then describe the theoretical algorithms and parallel implementations of these three software packages in detail. Finally, we conclude the review and propose several promising research directions for future large-scale KS-DFT calculations towards exascale supercomputers.
Funding: Supported by the National Key R&D Program of China (No. 2017YFC0803700); NSFC (Nos. 61772218, 61433019); and the Outstanding Youth Foundation of Hubei Province (No. 2016CFA032).
Abstract: Supercomputers' capability is approaching the exascale level, which enables large computing systems to run more jobs concurrently. Since modern data-intensive scientific applications can produce millions of I/O requests per second, I/O systems often suffer from heavy workloads that impede overall performance. How to allocate I/O resources and guarantee the QoS (Quality of Service) of each individual application is becoming an increasingly important question. In this paper, we propose SDQoS, a software-defined QoS framework based on the token bucket algorithm, which aims to meet the I/O requirements of concurrent applications contending for I/O resources and to improve the overall performance of I/O systems. Evaluation shows that SDQoS can effectively control I/O bandwidth within a 5%-10% deviation and improve performance by 20% in extreme cases.
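The token bucket algorithm named in the abstract is a standard rate limiter, and a minimal sketch of it is shown below (the class name, parameters, and the injectable clock are our own illustrative choices; this is not the SDQoS implementation):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: tokens accrue at `rate` per second up
    to `capacity`; a request costing `cost` tokens is admitted only if
    enough tokens are available, otherwise it is rejected/throttled."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # token refill rate (tokens/s)
        self.capacity = float(capacity)  # burst size
        self._clock = clock
        self._tokens = self.capacity     # start full
        self._last = clock()

    def try_acquire(self, cost=1.0):
        now = self._clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False

# Deterministic demo with a fake clock: capacity 5 allows a burst of 5,
# then refills at 10 tokens/s.
t = [0.0]
bucket = TokenBucket(rate=10, capacity=5, clock=lambda: t[0])
burst = [bucket.try_acquire() for _ in range(6)]  # 6th request rejected
t[0] += 0.1                                       # 0.1 s later: 1 token back
late = bucket.try_acquire()
```

In an I/O QoS setting such as the one the abstract describes, each application would hold its own bucket, with `rate` set to its guaranteed bandwidth share.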
Funding: King Abdulaziz University, Deanship of Scientific Research, Grant Number KEP-PHD-20-611-42.
Abstract: Recently, researchers have shown increasing interest in combining more than one programming model in systems running on high-performance computing (HPC) platforms to achieve exascale performance by applying parallelism at multiple levels. Combining different programming paradigms, such as the Message Passing Interface (MPI), Open Multi-Processing (OpenMP), and Open Accelerators (OpenACC), can increase computation speed and improve performance. When multiple models are integrated, however, the probability of runtime errors increases, and their detection becomes difficult, especially in the absence of testing techniques capable of finding them. Numerous studies have been conducted to identify such errors, but no technique exists for detecting errors in three-level programming models. Despite the growing body of research integrating MPI, OpenMP, and OpenACC, no testing technology has been developed to detect the runtime errors, such as deadlocks and race conditions, that can arise from this integration. This paper therefore begins by defining and explaining the runtime errors, undetectable by compilers, that result from integrating the three programming models. For the first time, it presents a classification of the runtime errors that can result from this integration. The paper also proposes a parallel hybrid testing technique for detecting runtime errors in C++ systems that use the triple programming model MPI, OpenMP, and OpenACC. This hybrid approach combines static and dynamic techniques, since some errors can be detected statically while others require dynamic analysis; by combining the two, it can detect more errors. The static component detects a wide range of error types in less time, while the potential errors that may or may not occur depending on the operating environment are left to the dynamic component, which completes the validation.
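One classic building block of the dynamic detection the abstract mentions is deadlock detection via a wait-for graph: if process A is blocked waiting on B and B is blocked waiting on A, the graph contains a cycle. The sketch below (our own illustration, not the paper's technique; rank names and the graph representation are assumptions) finds such a cycle with a depth-first search:

```python
def find_deadlock(wait_for):
    """Detect a cycle in a wait-for graph, given as a dict mapping each
    process to the set of processes it is blocked on. A cycle means a
    deadlock; returns one cycle as a list, or None. Illustrative only:
    real tools intercept MPI/OpenMP calls at runtime to build this graph."""
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited / on stack / done
    color = {p: WHITE for p in wait_for}
    stack = []

    def dfs(p):
        color[p] = GRAY
        stack.append(p)
        for q in wait_for.get(p, ()):
            if color.get(q, WHITE) == GRAY:
                # q is on the current DFS path: cycle found.
                return stack[stack.index(q):] + [q]
            if color.get(q, WHITE) == WHITE:
                cycle = dfs(q)
                if cycle:
                    return cycle
        stack.pop()
        color[p] = BLACK
        return None

    for p in list(wait_for):
        if color[p] == WHITE:
            cycle = dfs(p)
            if cycle:
                return cycle
    return None

# Rank 0 blocks on a receive from rank 1 while rank 1 blocks on rank 0:
deadlocked = find_deadlock({"rank0": {"rank1"}, "rank1": {"rank0"}})
ok = find_deadlock({"rank0": {"rank1"}, "rank1": set()})
```

The same cycle check applies whether the blocked edges come from MPI point-to-point calls, OpenMP locks, or OpenACC synchronization, which is why wait-for graphs are a common substrate for hybrid-model deadlock detection.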
Funding: Supported by the U.S. Department of Energy through its Advanced Grid Modeling program; the Exascale Computing Project (ECP); The Grid Modernization Laboratory Consortium (GMLC); the Advanced Research Projects Agency-Energy (ARPA-E); the National Quantum Information Science Research Centers, Co-design Center for Quantum Advantage (C2QA); and the Office of Advanced Scientific Computing Research (ASCR).
Abstract: With the global trend of pursuing clean energy and decarbonization, power systems have been evolving at a pace never before seen in the history of electrification. This evolution makes the power system more dynamic and more distributed, with higher uncertainty. These new behaviors bring significant challenges to power system modeling and simulation, as more data must be analyzed for larger systems, and more complex models must be solved within shorter time periods. Conventional computing approaches will not be sufficient for future power systems. This paper provides a historical review of computing for power system operation and planning, discusses technology advancements in high-performance computing (HPC), and describes the drivers for employing HPC techniques. Application examples using different HPC techniques, including the latest quantum computing, are also presented to show how HPC can help us prepare to meet the computing requirements of power systems in a clean energy future.
Abstract: This issue focuses on innovations in supercomputing techniques. Six invited papers were selected through a peer-review procedure, covering the research progress of China's supercomputing, interconnection networks, performance evaluation, and parallel algorithms. Prof. Yutong Lu summarizes the recent progress of supercomputing systems in China by introducing the three pre-exascale supercomputers.
Abstract: With the coming of the exascale computing era, programming systems and operating systems (including runtime systems) face several challenges. On the architecture side, increasingly deep levels of parallelism, heterogeneity, and the adoption of diverse domain-specific accelerators raise an urgent need for programmability, performance optimization, and portability. On the application side, big data analytics and machine learning applications need to be ported to and optimized on modern HPC systems. This issue focuses on novel ideas, methods, and system software development efforts for resolving these challenges and filling the gap between applications and the underlying hardware.