Funding: Supported by the National Key R&D Project of China under Grant No. 2017YFB0202201 and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams under Grant No. 2016ZT06D211.
Abstract: Supercomputing technology has been evolving at an accelerating pace. It has made a significant impact on scientific research, technological innovation, economic and social development, and the lives of ordinary people. Over the past three decades, China has devoted considerable effort to the development of supercomputing technologies and has made remarkable achievements in this field. China's supercomputing systems now rank among the world's most powerful supercomputers. As Moore's Law approaches its limit, the development of exascale supercomputing systems faces a series of grand challenges in both technologies and applications. Based on the experience of China's supercomputing development over the past years, this paper analyzes the major technical challenges on the path towards exascale computing. Additionally, ongoing major R&D activities on next-generation supercomputing in China are introduced, and possible solutions for achieving exascale computing, including co-design and convergence computing, are discussed.
Abstract: This article introduces a function that describes the operation of resource discovery in equilibrium-state analysis, based on previously defined patterns for responding to requests. By considering whether a responding structure exists for the proposition that activates resource discovery, together with the functional advantage of resource discovery, the function presents a new resource-discovery pattern after a dynamic and interactive event has influenced the functionality of resource discovery. Results indicate that, following such an event, the introduced function could provide a responding structure for the request based on a previously defined system in 60% of cases.
Abstract: The dynamic and interactive nature of Distributed Exascale Computing Systems leads to a situation in which the load balancer lacks a proper pattern for the solution. In addition to analyzing and reviewing this dynamic and interactive nature and its effect on load balancing, this article introduces a framework for managing load balancing that does not need to study the dynamic and interactive nature. The framework proposes a mathematical scheme for the functionality of load-balancing elements and redefines its functions and components. The redefinition makes it possible to determine the constituent parts of the framework and their functionality without analyzing the dynamic and interactive nature of the system. The proposed framework can manage and control dynamic and interactive events by reviewing changes in the functionality of resources, the pattern of data collection used to execute processes related to the load balancer, and a scalable tool. In addition to performing the load balancer's functionality, the framework can continue to function under dynamic and interactive events in distributed exascale systems. On average, the framework yields a 43% improvement with respect to the inability to respond to dynamic and interactive requests.
Funding: Supported by the National Key R&D Program of China (No. 2017YFC0803700), NSFC (Nos. 61772218, 61433019), and the Outstanding Youth Foundation of Hubei Province (No. 2016CFA032).
Abstract: Supercomputers' capability is approaching the exascale level, which enables large computing systems to run more jobs concurrently. Since modern data-intensive scientific applications can sometimes produce millions of I/O requests per second, I/O systems always suffer from heavy workloads, which impedes overall performance. How to allocate I/O resources and guarantee the QoS (Quality of Service) for each individual application is becoming an increasingly important question. In this paper, we propose SDQoS, a software-defined QoS framework based on the token bucket algorithm, which aims to meet the I/O requirements of concurrent applications contending for I/O resources and to improve the overall performance of the I/O system. Evaluation shows that SDQoS can effectively control the I/O bandwidth within a 5%-10% deviation and improve performance by 20% in extreme cases.
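The token bucket algorithm named in the abstract above can be illustrated with a minimal sketch. This is a generic textbook token bucket, not the SDQoS implementation; the class name `TokenBucket`, its fields, and the choice of passing timestamps explicitly are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Generic token bucket (illustrative; not taken from SDQoS)."""
    rate: float          # tokens refilled per second, e.g. bytes/s of I/O bandwidth
    capacity: float      # maximum tokens the bucket can hold (burst size)
    tokens: float = 0.0  # tokens currently available
    last: float = 0.0    # timestamp of the previous refill, in seconds

    def allow(self, cost: float, now: float) -> bool:
        """Refill for the elapsed time, then admit the request if tokens suffice."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False  # request must wait or be queued by the caller


# Hypothetical usage: a 100 tokens/s budget with bursts of up to 200 tokens.
bucket = TokenBucket(rate=100.0, capacity=200.0, tokens=200.0)
first = bucket.allow(150, now=0.0)   # admitted from the initial burst allowance
second = bucket.allow(100, now=0.0)  # rejected: only 50 tokens remain
third = bucket.allow(100, now=0.5)   # admitted after 0.5 s of refill
```

Passing `now` explicitly keeps the sketch deterministic; a real limiter would typically read a monotonic clock instead.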
Funding: Supported by the U.S. Department of Energy through its Advanced Grid Modeling program, the Exascale Computing Program (ECP), the Grid Modernization Laboratory Consortium (GMLC), the Advanced Research Projects Agency-Energy (ARPA-E), the National Quantum Information Science Research Centers, Co-design Center for Quantum Advantage (C2QA), and the Office of Advanced Scientific Computing Research (ASCR).
Abstract: With the global trend of pursuing clean energy and decarbonization, power systems have been evolving at a pace never before seen in the history of electrification. This evolution makes the power system more dynamic and more distributed, with higher uncertainty. These new power system behaviors bring significant challenges in power system modeling and simulation, as more data need to be analyzed for larger systems and more complex models need to be solved in a shorter time period. Conventional computing approaches will not be sufficient for future power systems. This paper provides a historical review of computing for power system operation and planning, discusses technology advancements in high performance computing (HPC), and describes the drivers for employing HPC techniques. Some application examples using different HPC techniques, including the latest quantum computing, are also presented to show how HPC techniques can help meet the requirements of power system computing in a clean energy future.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61272141 and 61120106005) and the National High-Tech R&D Program (863) of China (No. 2012AA01A301).
Abstract: The mismatch between compute performance and I/O performance has long been a stumbling block as supercomputers evolve from petaflops to exaflops. Currently, many parallel applications are I/O intensive, and their overall running times are typically limited by I/O performance. To quantify the I/O performance bottleneck and highlight the significance of achieving scalable performance in peta/exascale supercomputing, in this paper we introduce for the first time a formal definition of the 'storage wall' from the perspective of parallel application scalability. We quantify the effects of the storage bottleneck by providing a storage-bounded speedup, defining the storage wall quantitatively, presenting existence theorems for the storage wall, and classifying system architectures according to I/O performance variation. We analyze and extrapolate the existence of the storage wall through experiments on Tianhe-1A and case studies on Jaguar. These results provide insights into how to alleviate the storage wall bottleneck in system design and achieve hardware/software optimizations in peta/exascale supercomputing.
Funding: King Abdulaziz University, Deanship of Scientific Research, Grant Number KEP-PHD-20-611-42.
Abstract: Recently, researchers have shown increasing interest in combining more than one programming model in systems running on high performance computing (HPC) systems to achieve exascale by applying parallelism at multiple levels. Combining different programming paradigms, such as Message Passing Interface (MPI), Open Multi-Processing (OpenMP), and Open Accelerators (OpenACC), can increase computation speed and improve performance. During the integration of multiple models, the probability of runtime errors increases, making their detection difficult, especially in the absence of testing techniques that can detect these errors. Numerous studies have been conducted to identify these errors, but no technique exists for detecting errors in three-level programming models. Despite the increasing research that integrates the three programming models MPI, OpenMP, and OpenACC, a testing technology to detect runtime errors that can arise from this integration, such as deadlocks and race conditions, has not been developed. Therefore, this paper begins with a definition and explanation of the runtime errors that result from integrating the three programming models and that compilers cannot detect. For the first time, this paper presents a classification of the operational errors that can result from the integration of the three models. The paper also proposes a parallel hybrid testing technique for detecting runtime errors in systems built in the C++ programming language that use the three programming models MPI, OpenMP, and OpenACC. This hybrid technique combines static and dynamic techniques, given that some errors can be detected statically whereas others can only be detected dynamically. The hybrid technique can detect more errors because it combines two distinct techniques. The proposed static technique detects a wide range of error types in less time, whereas the portion of potential errors that may or may not occur depending on the operating environment is left to the dynamic technique, which completes the validation. Source: CMC, 2023, vol. 74, no. 2.
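To make the notion of a runtime deadlock in the abstract above concrete, the sketch below shows one generic way a dynamic checker can flag a deadlock: build a wait-for graph of which process is blocked on which, and search it for a cycle. This is a standard textbook technique and not the hybrid technique proposed in the paper; the function name `find_cycle` and the graph encoding (a dict mapping a rank to the ranks it waits on) are assumptions made for illustration.

```python
def find_cycle(wait_for):
    """Return a list of nodes forming a cycle in the wait-for graph, or None.

    wait_for maps each process (e.g. an MPI rank) to the processes it is
    currently blocked on. A cycle means no process in it can ever proceed.
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in wait_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:               # back edge: nxt is on the current path
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(wait_for):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical scenario: rank 0 blocks receiving from rank 1 while rank 1
# blocks receiving from rank 0, so neither send is ever posted.
deadlocked = find_cycle({0: [1], 1: [0]})
# A simple chain of waits with no cycle is not a deadlock.
healthy = find_cycle({0: [1], 1: [2], 2: []})
```

The recursive search is adequate for small graphs; a checker tracking many processes would likely use an iterative traversal to avoid recursion limits.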
Abstract: With the arrival of the exascale computing era, programming systems and operating systems (including runtime systems) are facing several challenges. On the architecture side, ever-deeper levels of parallelism, heterogeneity, and the adoption of diverse domain-specific accelerators raise an urgent need for programmability, performance optimization, and portability. On the application side, big data analytics and machine learning applications need to be ported to and optimized for modern HPC systems. This issue focuses on novel ideas, methods, and system software development efforts that resolve the above challenges and fill the gap between applications and the underlying hardware systems.
Abstract: In this article, we describe the context in which an international race towards Exascale computing has started. We cover the political and economic context and review the recent history of high performance computing (HPC) architectures, with special emphasis on the recently announced European initiatives to reach Exascale computing in Europe. We conclude by describing current challenges and trends.
Funding: This work was supported by the National Key Research and Development Program of China under Grant No. 2016YFB0200501, the National Natural Science Foundation of China under Grant Nos. 61332009 and 61521092, the Open Project Program of State Key Laboratory of Mathematical Engineering and Advanced Computing under Grant No. 2016A04, the Beijing Municipal Science and Technology Commission under Grant No. Z15010101009, the Open Project Program of State Key Laboratory of Computer Architecture under Grant No. CARCH201503, China Scholarship Council, and Beijing Advanced Innovation Center for Imaging Technology.
Abstract: With the arrival of the exascale supercomputing era, power efficiency has become the most important obstacle to building an exascale system. Dataflow architecture has a native advantage in achieving high power efficiency for scientific applications. However, state-of-the-art dataflow architectures fail to exploit high parallelism in loop processing. To address this issue, we propose a pipelining loop optimization method (PLO), which makes loop iterations flow through the processing element (PE) array of a dataflow accelerator. The method consists of two techniques: architecture-assisted hardware iteration and instruction-assisted software iteration. In the hardware iteration execution model, an on-chip loop controller is designed to generate loop indexes, reducing the complexity of the computing kernel and laying a good foundation for pipelined execution. In the software iteration execution model, additional loop instructions are introduced to solve the iteration dependency problem. With these two techniques, the average number of instructions ready to execute per cycle is increased to keep the floating-point units busy. Simulation results show that the proposed method outperforms the static and dynamic loop execution models in floating-point efficiency by 2.45x and 1.1x on average, respectively, while the hardware cost of the two techniques is acceptable.