Funding: Supported by the Peking University start-up package (7100603645).
Abstract: The processor and the main memory in traditional computing systems cannot satisfy the requirements of emerging large-scale applications in terms of computing power and memory capacity. To tackle such challenges, heterogeneous computing systems, which consist of specialized accelerators and high-performance storage devices, have demonstrated their superiority in various computing domains. However, data migration between the accelerators and the storage devices becomes the major performance bottleneck in such computing systems, owing to the long data path imposed by the traditional von Neumann architecture. Many software and hardware techniques have already been discussed for resolving the data migration issues in heterogeneous computing systems. In this paper, we present a survey of these techniques with respect to system designs, architectural innovations, and application-level optimizations. We expect our survey to aid the development of the research community and to inspire researchers interested in the relevant areas.
Abstract: High performance computing (HPC) has been one of the primary drivers for advancing modern scientific research and development. For example, using numerical simulations enabled by HPC, scientists are now able to conduct predictive research, which allows a better understanding of the essence of nature and helps discover the laws of the physical world. Many engineers now exploit HPC in the design and optimization of novel products and complex engineering schemes, and such simulation-based approaches are far superior to traditional prototyping methods.
Abstract: As HPC systems gradually evolve into the exascale era, algorithms and applications are facing the challenges of architectural change, with significantly increased levels of parallelism and heterogeneity, as well as new domain requirements from big data analytics and machine learning. This issue focuses on novel ideas, methods, and software development efforts for resolving the above challenges and for filling the gap between applications and hardware systems in the coming exascale era.
Abstract: Sparse matrix operations are widely used in computational science and engineering applications such as quantum chemistry and finite element analysis, as well as in modern machine learning scenarios such as social networks and compressed deep neural networks. In the famous article "A View of the Parallel Computing Landscape", Asanovic et al. of the University of California, Berkeley.
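Sparse kernels such as sparse matrix-vector multiplication (SpMV) are the workhorse behind the applications listed above. As a minimal illustration only (the 3×3 matrix and CSR layout below are a hypothetical example, not taken from the abstract), an SpMV over the common compressed sparse row (CSR) format can be sketched as:

```python
# Minimal CSR SpMV sketch. CSR stores only the nonzeros (data), their
# column indices (indices), and the per-row offsets into them (indptr).

def csr_spmv(data, indices, indptr, x):
    """Compute y = A @ x for a CSR matrix given by (data, indices, indptr)."""
    n = len(indptr) - 1          # number of rows
    y = [0.0] * n
    for i in range(n):
        # nonzeros of row i live in data[indptr[i]:indptr[i+1]]
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

# Hypothetical example matrix:
# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]]
data    = [4.0, 1.0, 2.0, 3.0, 5.0]
indices = [0, 2, 1, 0, 2]
indptr  = [0, 2, 3, 5]
y = csr_spmv(data, indices, indptr, [1.0, 1.0, 1.0])  # y == [5.0, 2.0, 8.0]
```

Only the nonzero entries are stored and touched, which is what makes these kernels both memory-efficient and irregular in their memory access patterns.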
Funding: Supported in part by the Special Project on High-Performance Computing under the National Key R&D Program (2020YFB0204601).
Abstract: High-performance extended math libraries are used by many scientific, engineering, and artificial intelligence applications, which usually involve many common mathematical computations that account for the most time-consuming functions. To take full advantage of high-performance processors, these functions need to be intensively parallelized and optimized. It is common for processor vendors to supply highly optimized commercial math libraries; for example, Intel maintains oneMKL, and NVIDIA provides cuBLAS, cuSolver, and cuFFT. In this paper, we release a new-generation high-performance extended math library, xMath 2.0, specifically designed for the SW26010-Pro many-core processor, which includes four major modules: BLAS, LAPACK, FFT, and SPARSE. Each module is optimized for the domestic SW26010-Pro processor, leveraging parallelization on the many-core CPE mesh and optimization techniques such as assembly instruction rearrangement and computation-communication overlapping. In xMath 2.0, the BLAS module achieves an average speedup of 146.02× over the MPE version of GotoBLAS2, and the BLAS level 3 functions are 393.95× faster. The LAPACK module (calling xMath BLAS) is 233.44× faster than LAPACK calling GotoBLAS2, and the FFT module is 47.63× faster than FFTW 3.3.2. The library has been deployed on the domestic Sunway TaihuLight Pro supercomputer and has been used by dozens of users.
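The abstract does not show xMath's programming interface. As an illustrative sketch only, the following pure-Python reference implementation mimics the standard BLAS level-3 GEMM contract (C ← αAB + βC) that BLAS modules such as xMath's implement and heavily optimize; the matrices and scalars below are made-up example values:

```python
# Reference (unoptimized) GEMM: C <- alpha * A @ B + beta * C.
# Optimized BLAS implementations compute the same result with blocking,
# vectorization, and (on many-core chips like the SW26010-Pro) mesh
# parallelism, but the numerical contract is this one.

def gemm(alpha, A, B, beta, C):
    m, k = len(A), len(A[0])
    n = len(B[0])
    # start from beta-scaled C, then accumulate alpha * A @ B
    out = [[beta * C[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for p in range(k):
            a = alpha * A[i][p]
            for j in range(n):
                out[i][j] += a * B[p][j]
    return out

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
R = gemm(2.0, A, B, 1.0, C)   # R == [[39.0, 45.0], [87.0, 101.0]]
```

The α and β scalars are what distinguish GEMM from a plain matrix product: they let callers fuse scaling and accumulation into one pass over memory, which is why level-3 routines dominate the reported speedups.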
Abstract: Modern data-centric applications, such as artificial intelligence (AI) workloads, graph data analysis, and IoT and mobile systems, have emerged and are widely used in our daily lives. These applications keep raising the requirements on the whole memory hierarchy for large capacity, high performance, low power consumption, and high reliability, which may be difficult to satisfy with traditional memory architectures and systems. On the other hand, with rapid technological advancement, various emerging memory technologies have been proposed to mitigate the problems mentioned above. These emerging memory technologies offer advantages such as high density and low standby power. However, they also face the challenges of programming overhead, limited lifetime, and reliability issues. Thus, to leverage the unique features and handle the limitations of these emerging memory technologies, we expect innovations in memory architecture and system designs.
Abstract: High-Performance Computing (HPC) has advanced tremendously over the past several decades, with immense technological developments revolutionizing the ability to model, simulate, and analyze large amounts of data. Over time, these HPC systems have become much larger and more complex, featuring multi-/many-core processors with thousands of cores that are often connected to form a large-scale computing system.
Abstract: Recent progress in large-scale machine learning (ML) and deep learning (DL) has demonstrated great potential in both traditional artificial intelligence (AI) applications in computer science (such as natural language processing, knowledge engineering, and computer vision) and AI-enabled applications in scientific domains (such as AlphaFold2).
Abstract: With the advent of the Big Data era, the demand for storing, processing, and analyzing this vast and growing amount of data has emerged in the market. Since a single node cannot cope with such complexity requirements, high-performance systems typically operate in a distributed environment. Today, high-performance distributed computing systems are the foundations of many important infrastructures. However, the diverse development of hardware platforms, the spurt of data growth, and the rapid changes in applications increasingly challenge resource management, energy efficiency, performance tuning, scalability, and fault tolerance.
Abstract: In the current era of AI and Big Data, an increasing and significant amount of computing power is needed for many applications and algorithms, such as AIGC models, face detection, autonomous driving, and atmosphere simulation. Recently, there has been significant interest in the community in improving AI and big data applications with heterogeneous computing, which refers to a computing system using different types of computing cores, such as GPUs, NPUs, ASICs, DSPs, and FPGAs. Such a system can improve performance and energy efficiency by dispatching different workloads to processors designed for specialized processing and specific purposes. This issue aims to cover the challenges that can hamper efficiency and utilization for AI and big data applications on heterogeneous computing systems, such as efficient utilization of the raw hardware, I/O management, and task scheduling.
Abstract: High Performance Computing (HPC) performs more complex tasks than computing on a single node by applying parallel and distributed algorithms, and it continuously advances the traditional domains of science and engineering. However, the emergence of novel applications calls for lower network latency, which has pushed forward the horizon of edge computing. Today, the diversity of HPC systems is more extensive than ever, and rapid changes in hardware platforms and programming environments increasingly challenge high-concurrency exploitation, hybrid resource management, energy efficiency, performance tuning, scalability, and fault tolerance.
Abstract: In this article, the author De Dong was incorrectly flagged as a corresponding author. The correct corresponding author of this article is Nurbol Luktarhan. The Original Article has been corrected. Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Abstract: With the coming of the exascale computing era, programming systems and operating systems (including runtime systems) are facing several challenges. In terms of architecture, increasingly deep levels of parallelism, heterogeneity, and the adoption of diverse domain-specific accelerators raise an urgent need for programmability, performance optimization, and portability. On the other hand, big data analytics and machine learning applications need to be ported to and optimized on modern HPC systems. This issue focuses on novel ideas, methods, and system software development efforts for resolving the above challenges and for filling the gap between applications and the underlying hardware systems.
Abstract: This issue focuses on the topic of innovations in supercomputing techniques. Six invited papers were selected through a peer review procedure, covering the research progress of China's supercomputing, interconnection networks, performance evaluation, and parallel algorithms. Prof. Yutong Lu summarizes the recent progress of supercomputing systems in China by introducing the three pre-Exascale supercomputers.
Abstract: This issue focuses on the topic "Storage System and Technology". Data storage systems are an important part of high-performance computing (HPC), which cannot be separated from the support of high-performance storage systems and technologies. New storage technologies and techniques continue to be applied to HPC, such as non-volatile memory technologies, solid-state storage, parallel I/O, storage performance and scalability, storage virtualization, and deduplication.
Abstract: HPCChina is the annual conference established in 2005 by the Technical Committee on High Performance Computing (TCHPC) of the China Computer Federation (CCF). HPCChina is the leading venue in China for presenting high-quality original research in all fields related to high performance computing. In 2023, the conference received a total of 123 submissions and accepted 70 papers through a strict peer-review procedure. This special issue invited eight papers of high quality, which fall into two categories: algorithm research and system studies. The first four papers focus on parallel algorithms in numerical methods. The second four papers study a scheduling strategy, an algorithm selector, a communication protocol, and performance evaluation, all from the system perspective. We provide a short summary of each paper as follows.
Abstract: In this article, we describe the context in which an international race towards Exascale computing has started. We cover the political and economic context and review the recent history of high performance computing (HPC) architectures, with special emphasis on the recently announced European initiatives to reach Exascale computing in Europe. We conclude by describing current challenges and trends.
Abstract: As a powerful means of supporting scientific research, engineering, and many other fields, High Performance Computing plays an increasingly important role in the current world. Supercomputers are deployed in many countries and have been used in a wide range of areas. The technology of designing, building, managing, and utilizing high performance computers has been developing dramatically. With the debut of Frontier at ORNL, the world has stepped into the Exascale era.
Abstract: The modern computing paradigm is experiencing revolutions in almost every aspect: the introduction of emerging memories, such as spintronic memory and resistive memory, as well as quantum computing, is dramatically changing the design of the memory hierarchy and interfaces; new applications such as artificial intelligence and deep learning trigger wide adoption of deep neural network accelerators and neuromorphic computing circuits; and new computing models induced by these applications, such as in-memory computing, inspire the corresponding circuit- and architecture-level practices.
Funding: Supported by the National Key Research and Development Program of China (No. 2023YFB3001605).
Abstract: In this paper, we propose two mixed precision algorithms for the block-Jacobi preconditioner (BJAC): a fixed low precision strategy and an adaptive precision strategy. We evaluate the performance improvement of the proposed mixed precision BJAC preconditioners combined with the preconditioned conjugate gradient (PCG) method on problems including diffusion equations and radiation hydrodynamics equations. Numerical results show that, compared with the uniformly high precision PCG, the mixed precision preconditioners can achieve speedups from 1.3× to 1.8× without losing accuracy. Furthermore, we observe the phenomenon of convergence delay in some test cases for the mixed precision preconditioners, and we analyse the correlation between matrix features and convergence delay behaviors. The conclusions obtained are significant and valuable for the design of more efficient mixed precision preconditioners.
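As a rough sketch of the fixed low precision strategy described above (the block size, the tridiagonal test matrix, and the float32/float64 pairing below are illustrative assumptions, not the paper's actual setup or code), one can invert the diagonal blocks once, store them in low precision, and apply them inside an otherwise double-precision PCG iteration:

```python
import numpy as np

def block_jacobi_inv(A, bs, dtype=np.float32):
    """Invert the diagonal blocks of A and keep them in a lower precision."""
    n = A.shape[0]
    blocks = []
    for s in range(0, n, bs):
        e = min(s + bs, n)
        blocks.append(np.linalg.inv(A[s:e, s:e]).astype(dtype))
    return blocks

def apply_prec(blocks, r):
    """z = M^{-1} r, applied block by block (results cast back to float64)."""
    z = np.empty_like(r)
    s = 0
    for B in blocks:
        e = s + B.shape[0]
        z[s:e] = B.astype(np.float64) @ r[s:e]
        s = e
    return z

def pcg(A, b, blocks, tol=1e-10, maxit=200):
    """Standard preconditioned CG; only the preconditioner is low precision."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_prec(blocks, r)
    p = z.copy()
    rz = r @ z
    for it in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, it + 1
        z = apply_prec(blocks, r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit

# SPD test system: a 1D diffusion-like tridiagonal matrix (an assumption).
n = 16
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iters = pcg(A, b, block_jacobi_inv(A, bs=2))
```

Storing the block inverses in float32 halves the preconditioner's memory traffic, which is where the reported speedups come from; the convergence delay the paper analyses would show up here as a larger iteration count than with float64 blocks.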