The rapid development of artificial intelligence(AI)facilitates various applications from all areas but also poses great challenges in its hardware implementation in terms of speed and energy because of the explosive ...The rapid development of artificial intelligence(AI)facilitates various applications from all areas but also poses great challenges in its hardware implementation in terms of speed and energy because of the explosive growth of data.Optical computing provides a distinctive perspective to address this bottleneck by harnessing the unique properties of photons including broad bandwidth,low latency,and high energy efficiency.In this review,we introduce the latest developments of optical computing for different AI models,including feedforward neural networks,reservoir computing,and spiking neural networks(SNNs).Recent progress in integrated photonic devices,combined with the rise of AI,provides a great opportunity for the renaissance of optical computing in practical applications.This effort requires multidisciplinary efforts from a broad community.This review provides an overview of the state-of-the-art accomplishments in recent years,discusses the availability of current technologies,and points out various remaining challenges in different aspects to push the frontier.We anticipate that the era of large-scale integrated photonics processors will soon arrive for practical AI applications in the form of hybrid optoelectronic frameworks.展开更多
Discrete element method can effectively simulate the discontinuity,inhomogeneity and large deformation and failure of rock and soil.Based on the innovative matrix computing of the discrete element method,the highperfo...Discrete element method can effectively simulate the discontinuity,inhomogeneity and large deformation and failure of rock and soil.Based on the innovative matrix computing of the discrete element method,the highperformance discrete element software MatDEM may handle millions of elements in one computer,and enables the discrete element simulation at the engineering scale.It supports heat calculation,multi-field and fluidsolid coupling numerical simulations.Furthermore,the software integrates pre-processing,solver,postprocessing,and powerful secondary development,allowing recompiling new discrete element software.The basic principles of the DEM,the implement and development of the MatDEM software,and its applications are introduced in this paper.The software and sample source code are available online(http://matdem.com).展开更多
Cloud computing has taken over the high-performance distributed computing area,and it currently provides on-demand services and resource polling over the web.As a result of constantly changing user service demand,the ...Cloud computing has taken over the high-performance distributed computing area,and it currently provides on-demand services and resource polling over the web.As a result of constantly changing user service demand,the task scheduling problem has emerged as a critical analytical topic in cloud computing.The primary goal of scheduling tasks is to distribute tasks to available processors to construct the shortest possible schedule without breaching precedence restrictions.Assignments and schedules of tasks substantially influence system operation in a heterogeneous multiprocessor system.The diverse processes inside the heuristic-based task scheduling method will result in varying makespan in the heterogeneous computing system.As a result,an intelligent scheduling algorithm should efficiently determine the priority of every subtask based on the resources necessary to lower the makespan.This research introduced a novel efficient scheduling task method in cloud computing systems based on the cooperation search algorithm to tackle an essential task and schedule a heterogeneous cloud computing problem.The basic idea of thismethod is to use the advantages of meta-heuristic algorithms to get the optimal solution.We assess our algorithm’s performance by running it through three scenarios with varying numbers of tasks.The findings demonstrate that the suggested technique beats existingmethods NewGenetic Algorithm(NGA),Genetic Algorithm(GA),Whale Optimization Algorithm(WOA),Gravitational Search Algorithm(GSA),and Hybrid Heuristic and Genetic(HHG)by 7.9%,2.1%,8.8%,7.7%,3.4%respectively according to makespan.展开更多
Deep learning algorithms have been widely used in computer vision,natural language processing and other fields.However,due to the ever-increasing scale of the deep learning model,the requirements for storage and compu...Deep learning algorithms have been widely used in computer vision,natural language processing and other fields.However,due to the ever-increasing scale of the deep learning model,the requirements for storage and computing performance are getting higher and higher,and the processors based on the von Neumann architecture have gradually exposed significant shortcomings such as consumption and long latency.In order to alleviate this problem,large-scale processing systems are shifting from a traditional computing-centric model to a data-centric model.A near-memory computing array architecture based on the shared buffer is proposed in this paper to improve system performance,which supports instructions with the characteristics of store-calculation integration,reducing the data movement between the processor and main memory.Through data reuse,the processing speed of the algorithm is further improved.The proposed architecture is verified and tested through the parallel realization of the convolutional neural network(CNN)algorithm.The experimental results show that at the frequency of 110 MHz,the calculation speed of a single convolution operation is increased by 66.64%on average compared with the CNN architecture that performs parallel calculations on field programmable gate array(FPGA).The processing speed of the whole convolution layer is improved by 8.81%compared with the reconfigurable array processor that does not support near-memory computing.展开更多
This paper introduces the development of the exascale (10^18) computing. Though exascalc computing is a hot research direction worldwide, we are facing many challenges in the areas of memory wall, communica- tion wa...This paper introduces the development of the exascale (10^18) computing. Though exascalc computing is a hot research direction worldwide, we are facing many challenges in the areas of memory wall, communica- tion wall, reliability wall, power wall and scalability of parallel computing. According to these challenges, some thoughts and strategies are proposed.展开更多
Cloud computing is currently dominated within the space of highperformance distributed computing and it provides resource polling and ondemand services through the web.So,task scheduling problem becomes a very importa...Cloud computing is currently dominated within the space of highperformance distributed computing and it provides resource polling and ondemand services through the web.So,task scheduling problem becomes a very important analysis space within the field of a cloud computing environment as a result of user’s services demand modification dynamically.The main purpose of task scheduling is to assign tasks to available processors to produce minimum schedule length without violating precedence restrictions.In heterogeneous multiprocessor systems,task assignments and schedules have a significant impact on system operation.Within the heuristic-based task scheduling algorithm,the different processes will lead to a different task execution time(makespan)on a heterogeneous computing system.Thus,a good scheduling algorithm should be able to set precedence efficiently for every subtask depending on the resources required to reduce(makespan).In this paper,we propose a new efficient task scheduling algorithm in cloud computing systems based on RAO algorithm to solve an important task and schedule a heterogeneous multiple processing problem.The basic idea of this process is to exploit the advantages of heuristic-based algorithms to reduce space search and time to get the best solution.We evaluate our algorithm’s performance by applying it to three examples with a different number of tasks and processors.The experimental results show that the proposed approach significantly succeeded in finding the optimal solutions than others in terms of the time of task implementation.展开更多
The meteorological high-performance computing resource is the support platform for the weather forecast and climate prediction numerical model operation. The scientific and objective method to evaluate the application...The meteorological high-performance computing resource is the support platform for the weather forecast and climate prediction numerical model operation. The scientific and objective method to evaluate the application of meteorological high-performance computing resources can not only provide reference for the optimization of active resources, but also provide a quantitative basis for future resource construction and planning. In this paper, the concept of the utility value B and index compliance rate E of the meteorological high performance computing system are presented. The evaluation process, evaluation index and calculation method of the high performance computing resource application benefits are introduced.展开更多
Within the last few decades, increases in computational resources have contributed enormously to the progress of science and engineering (S & E). To continue making rapid advancements, the S & E community must...Within the last few decades, increases in computational resources have contributed enormously to the progress of science and engineering (S & E). To continue making rapid advancements, the S & E community must be able to access computing resources. One way to provide such resources is through High-Performance Computing (HPC) centers. Many academic research institutions offer their own HPC Centers but struggle to make the computing resources easily accessible and user-friendly. Here we present SHABU, a RESTful Web API framework that enables S & E communities to access resources from Boston University’s Shared Computing Center (SCC). The SHABU requirements are derived from the use cases described in this work.展开更多
A team of researchers from the University of Science and Technology of China(USTC)of the Chinese Academy of Sciences(CAS)and its partners have made significant advancements in random quantum circuit sampling with Zuch...A team of researchers from the University of Science and Technology of China(USTC)of the Chinese Academy of Sciences(CAS)and its partners have made significant advancements in random quantum circuit sampling with Zuchongzhi-3,a superconducting quantum computing prototype featuring 105 qubits and 182 couplers.展开更多
The Dynamical Density Functional Theory(DDFT)algorithm,derived by associating classical Density Functional Theory(DFT)with the fundamental Smoluchowski dynamical equation,describes the evolution of inhomo-geneous flui...The Dynamical Density Functional Theory(DDFT)algorithm,derived by associating classical Density Functional Theory(DFT)with the fundamental Smoluchowski dynamical equation,describes the evolution of inhomo-geneous fluid density distributions over time.It plays a significant role in studying the evolution of density distributions over time in inhomogeneous systems.The Sunway Bluelight II supercomputer,as a new generation of China’s developed supercomputer,possesses powerful computational capabilities.Porting and optimizing industrial software on this platform holds significant importance.For the optimization of the DDFT algorithm,based on the Sunway Bluelight II supercomputer and the unique hardware architecture of the SW39000 processor,this work proposes three acceleration strategies to enhance computational efficiency and performance,including direct parallel optimization,local-memory constrained optimization for CPEs,and multi-core groups collaboration and communication optimization.This method combines the characteristics of the program’s algorithm with the unique hardware architecture of the Sunway Bluelight II supercomputer,optimizing the storage and transmission structures to achieve a closer integration of software and hardware.For the first time,this paper presents Sunway-Dynamical Density Functional Theory(SW-DDFT).Experimental results show that SW-DDFT achieves a speedup of 6.67 times within a single-core group compared to the original DDFT implementation,with six core groups(a total of 384 CPEs),the maximum speedup can reach 28.64 times,and parallel efficiency can reach 71%,demonstrating excellent acceleration performance.展开更多
In the last two decades, computational hydraulics has undergone a rapid development following the advancement of data acquisition and computing technologies. Using a finite-volume Godunov-type hydrodynamic model, this...In the last two decades, computational hydraulics has undergone a rapid development following the advancement of data acquisition and computing technologies. Using a finite-volume Godunov-type hydrodynamic model, this work demonstrates the promise of modern high-performance computing technology to achieve real-time flood modeling at a regional scale. The software is implemented for high-performance heterogeneous computing using the OpenCL programming framework, and developed to support simulations across multiple GPUs using a domain decomposition technique and across multiple systems through an efficient implementation of the Message Passing Interface (MPI) standard. The software is applied for a convective storm induced flood event in Newcastle upon Tyne, demonstrating high computational performance across a GPU cluster, and good agreement against crowd- sourced observations. Issues relating to data availability, complex urban topography and differences in drainage capacity affect results for a small number of areas.展开更多
The concept and advantage of reconfigurable technology is introduced. A kind of processor architecture of re configurable macro processor (RMP) model based on FPGA array and DSP is put forward and has been implemented...The concept and advantage of reconfigurable technology is introduced. A kind of processor architecture of re configurable macro processor (RMP) model based on FPGA array and DSP is put forward and has been implemented. Two image algorithms are developed: template-based automatic target recognition and zone labeling. One is estimating for motion direction in the infrared image background, another is line picking-up algorithm based on image zone labeling and phase grouping technique. It is a kind of 'hardware' function that can be called by the DSP in high-level algorithm. It is also a kind of hardware algorithm of the DSP. The results of experiments show the reconfigurable computing technology based on RMP is an ideal accelerating means to deal with the high-speed image processing tasks. High real time performance is obtained in our two applications on RMP.展开更多
The study of global climate change seeks to understand:(1)the components of the Earth’s varying environmental system,with a particular focus on climate;(2)how these components interact to determine present conditions...The study of global climate change seeks to understand:(1)the components of the Earth’s varying environmental system,with a particular focus on climate;(2)how these components interact to determine present conditions;(3)the factors driving these components;(4)the history of global change and the projection of future change;and(5)how knowledge about global environmental variability and change can be applied to present-day and future decision-making.This paper addresses the use of high-performance computing and high-throughput computing for a global change study on the Digital Earth(DE)platform.Two aspects of the use of high-performance computing(HPC)/high-throughput computing(HTC)on the DE platform are the processing of data from all sources,especially Earth observation data,and the simulation of global change models.The HPC/HTC is an essential and efficient tool for the processing of vast amounts of global data,especially Earth observation data.The current trend involves running complex global climate models using potentially millions of personal computers to achieve better climate change predictions than would ever be possible using the supercomputers currently available to scientists.展开更多
Storage backends of parallel compute clusters are still based mostly on magnetic disks,while newer and faster storage technologies such as flash-based SSDs or non-volatile random access memory(NVRAM)are deployed withi...Storage backends of parallel compute clusters are still based mostly on magnetic disks,while newer and faster storage technologies such as flash-based SSDs or non-volatile random access memory(NVRAM)are deployed within compute nodes.Including these new storage technologies into scientific workflows is unfortunately today a mostly manual task,and most scientists therefore do not take advantage of the faster storage media.One approach to systematically include nodelocal SSDs or NVRAMs into scientific workflows is to deploy ad hoc file systems over a set of compute nodes,which serve as temporary storage systems for single applications or longer-running campaigns.This paper presents results from the Dagstuhl Seminar 17202"Challenges and Opportunities of User-Level File Systems for HPC"and discusses application scenarios as well as design strategies for ad hoc file systems using node-local storage media.The discussion includes open research questions,such as how to couple ad hoc file systems with the batch scheduling environment and how to schedule stage-in and stage-out processes of data between the storage backend and the ad hoc file systems.Also presented are strategies to build ad hoc file systems by using reusable components for networking and how to improve storage device compatibility.Various interfaces and semantics are presented,for example those used by the three ad hoc file systems BeeOND,GekkoFS,and BurstFS.Their presentation covers a range from file systems running in production to cutting-edge research focusing on reaching the performance limits of the underlying devices.展开更多
Wide-area high-performance computing is widely used for large-scale parallel computing applications owing to its high computing and storage resources.However,the geographical distribution of computing and storage reso...Wide-area high-performance computing is widely used for large-scale parallel computing applications owing to its high computing and storage resources.However,the geographical distribution of computing and storage resources makes efficient task distribution and data placement more challenging.To achieve a higher system performance,this study proposes a two-level global collaborative scheduling strategy for wide-area high-performance computing environments.The collaborative scheduling strategy integrates lightweight solution selection,redundant data placement and task stealing mechanisms,optimizing task distribution and data placement to achieve efficient computing in wide-area environments.The experimental results indicate that compared with the state-of-the-art collaborative scheduling algorithm HPS+,the proposed scheduling strategy reduces the makespan by 23.24%,improves computing and storage resource utilization by 8.28%and 21.73%respectively,and achieves similar global data migration costs.展开更多
t In this paper an overall scheme of the task management system of ternary optical computer (TOC) is proposed, and the software architecture chart is given. The function and accomplishment of each module in the syst...t In this paper an overall scheme of the task management system of ternary optical computer (TOC) is proposed, and the software architecture chart is given. The function and accomplishment of each module in the system are described in general. In addition, according to the aforementioned scheme a prototype of TOC task management system is implemented, and the feasibility, rationality and completeness of the scheme are verified via running and testing the prototype.展开更多
Technology enhancements and the growing breadth of application workflows running on high-performance computing(HPC)platforms drive the development of new data services that provide high performance on these new platfo...Technology enhancements and the growing breadth of application workflows running on high-performance computing(HPC)platforms drive the development of new data services that provide high performance on these new platforms,provide capable and productive interfaces and abstractions for a variety of applications,and are readily adapted when new technologies are deployed.The Mochi framework enables composition of specialized distributed data services from a collection of connectable modules and subservices.Rather than forcing all applications to use a one-size-fits-all data staging and I/O software configuration,Mochi allows each application to use a data service specialized to its needs and access patterns.This paper introduces the Mochi framework and methodology.The Mochi core components and microservices are described.Examples of the application of the Mochi methodology to the development of four specialized services are detailed.Finally,a performance evaluation of a Mochi core component,a Mochi microservice,and a composed service providing an object model is performed.The paper concludes by positioning Mochi relative to related work in the HPC space and indicating directions for future work.展开更多
As the scale of supercomputers rapidly grows, the reliability problem dominates the system availability. Existing fault tolerance mechanisms, such as periodic checkpointing and process redundancy, cannot effectively f...As the scale of supercomputers rapidly grows, the reliability problem dominates the system availability. Existing fault tolerance mechanisms, such as periodic checkpointing and process redundancy, cannot effectively fix this problem. To address this issue, we present a new fault tolerance framework using process replication and prefetching (FTRP), combining the benefits of proactive and reactive mechanisms. FTRP incorporates a novel cost model and a new proactive fault tolerance mechanism to improve the application execution efficiency. The novel cost model, called the 'work-most' (WM) model, makes runtime decisions to adaptively choose an action from a set of fault tolerance mechanisms based on failure prediction results and application status. Similar to program locality, we observe the failure locality phenomenon in supercomputers for the first time. In the new proactive fault tolerance mechanism, process replication with process prefetching is proposed based on the failure locality, significantly avoiding losses caused by the failures regardless of whether they have been predicted. Simulations with real failure traces demonstrate that the FTRP framework outperforms existing fault tolerance mechanisms with up to 10% improvement in application efficiency for common failure prediction accuracy, and is effective for petascale systems and beyond.展开更多
基金supported by the National Natural Science Foundation of China(61927802,61722209,and 61805145)the Beijing Municipal Science and Technology Commission(Z181100003118014)+3 种基金the National Key Research and Development Program of China(2020AAA0130000)the support from the National Postdoctoral Program for Innovative TalentShuimu Tsinghua Scholar Programthe support from the Hong Kong Research Grants Council(16306220)。
文摘The rapid development of artificial intelligence(AI)facilitates various applications from all areas but also poses great challenges in its hardware implementation in terms of speed and energy because of the explosive growth of data.Optical computing provides a distinctive perspective to address this bottleneck by harnessing the unique properties of photons including broad bandwidth,low latency,and high energy efficiency.In this review,we introduce the latest developments of optical computing for different AI models,including feedforward neural networks,reservoir computing,and spiking neural networks(SNNs).Recent progress in integrated photonic devices,combined with the rise of AI,provides a great opportunity for the renaissance of optical computing in practical applications.This effort requires multidisciplinary efforts from a broad community.This review provides an overview of the state-of-the-art accomplishments in recent years,discusses the availability of current technologies,and points out various remaining challenges in different aspects to push the frontier.We anticipate that the era of large-scale integrated photonics processors will soon arrive for practical AI applications in the form of hybrid optoelectronic frameworks.
基金Financial supports from the Natural Science Foundation of China(41761134089,41977218)Six Talent Peaks Project of Jiangsu Province(RJFW-003)the Fundamental Research Funds for the Central Universities(14380103)are gratefully acknowledged.
文摘Discrete element method can effectively simulate the discontinuity,inhomogeneity and large deformation and failure of rock and soil.Based on the innovative matrix computing of the discrete element method,the highperformance discrete element software MatDEM may handle millions of elements in one computer,and enables the discrete element simulation at the engineering scale.It supports heat calculation,multi-field and fluidsolid coupling numerical simulations.Furthermore,the software integrates pre-processing,solver,postprocessing,and powerful secondary development,allowing recompiling new discrete element software.The basic principles of the DEM,the implement and development of the MatDEM software,and its applications are introduced in this paper.The software and sample source code are available online(http://matdem.com).
文摘Cloud computing has taken over the high-performance distributed computing area,and it currently provides on-demand services and resource polling over the web.As a result of constantly changing user service demand,the task scheduling problem has emerged as a critical analytical topic in cloud computing.The primary goal of scheduling tasks is to distribute tasks to available processors to construct the shortest possible schedule without breaching precedence restrictions.Assignments and schedules of tasks substantially influence system operation in a heterogeneous multiprocessor system.The diverse processes inside the heuristic-based task scheduling method will result in varying makespan in the heterogeneous computing system.As a result,an intelligent scheduling algorithm should efficiently determine the priority of every subtask based on the resources necessary to lower the makespan.This research introduced a novel efficient scheduling task method in cloud computing systems based on the cooperation search algorithm to tackle an essential task and schedule a heterogeneous cloud computing problem.The basic idea of thismethod is to use the advantages of meta-heuristic algorithms to get the optimal solution.We assess our algorithm’s performance by running it through three scenarios with varying numbers of tasks.The findings demonstrate that the suggested technique beats existingmethods NewGenetic Algorithm(NGA),Genetic Algorithm(GA),Whale Optimization Algorithm(WOA),Gravitational Search Algorithm(GSA),and Hybrid Heuristic and Genetic(HHG)by 7.9%,2.1%,8.8%,7.7%,3.4%respectively according to makespan.
基金Supported by the National Natural Science Foundation of China(No.61802304,61834005,61772417,61602377)the Shaanxi Province KeyR&D Plan(No.2021GY-029)。
文摘Deep learning algorithms have been widely used in computer vision,natural language processing and other fields.However,due to the ever-increasing scale of the deep learning model,the requirements for storage and computing performance are getting higher and higher,and the processors based on the von Neumann architecture have gradually exposed significant shortcomings such as consumption and long latency.In order to alleviate this problem,large-scale processing systems are shifting from a traditional computing-centric model to a data-centric model.A near-memory computing array architecture based on the shared buffer is proposed in this paper to improve system performance,which supports instructions with the characteristics of store-calculation integration,reducing the data movement between the processor and main memory.Through data reuse,the processing speed of the algorithm is further improved.The proposed architecture is verified and tested through the parallel realization of the convolutional neural network(CNN)algorithm.The experimental results show that at the frequency of 110 MHz,the calculation speed of a single convolution operation is increased by 66.64%on average compared with the CNN architecture that performs parallel calculations on field programmable gate array(FPGA).The processing speed of the whole convolution layer is improved by 8.81%compared with the reconfigurable array processor that does not support near-memory computing.
文摘This paper introduces the development of the exascale (10^18) computing. Though exascalc computing is a hot research direction worldwide, we are facing many challenges in the areas of memory wall, communica- tion wall, reliability wall, power wall and scalability of parallel computing. According to these challenges, some thoughts and strategies are proposed.
文摘Cloud computing is currently dominated within the space of highperformance distributed computing and it provides resource polling and ondemand services through the web.So,task scheduling problem becomes a very important analysis space within the field of a cloud computing environment as a result of user’s services demand modification dynamically.The main purpose of task scheduling is to assign tasks to available processors to produce minimum schedule length without violating precedence restrictions.In heterogeneous multiprocessor systems,task assignments and schedules have a significant impact on system operation.Within the heuristic-based task scheduling algorithm,the different processes will lead to a different task execution time(makespan)on a heterogeneous computing system.Thus,a good scheduling algorithm should be able to set precedence efficiently for every subtask depending on the resources required to reduce(makespan).In this paper,we propose a new efficient task scheduling algorithm in cloud computing systems based on RAO algorithm to solve an important task and schedule a heterogeneous multiple processing problem.The basic idea of this process is to exploit the advantages of heuristic-based algorithms to reduce space search and time to get the best solution.We evaluate our algorithm’s performance by applying it to three examples with a different number of tasks and processors.The experimental results show that the proposed approach significantly succeeded in finding the optimal solutions than others in terms of the time of task implementation.
文摘The meteorological high-performance computing resource is the support platform for the weather forecast and climate prediction numerical model operation. The scientific and objective method to evaluate the application of meteorological high-performance computing resources can not only provide reference for the optimization of active resources, but also provide a quantitative basis for future resource construction and planning. In this paper, the concept of the utility value B and index compliance rate E of the meteorological high performance computing system are presented. The evaluation process, evaluation index and calculation method of the high performance computing resource application benefits are introduced.
文摘Within the last few decades, increases in computational resources have contributed enormously to the progress of science and engineering (S & E). To continue making rapid advancements, the S & E community must be able to access computing resources. One way to provide such resources is through High-Performance Computing (HPC) centers. Many academic research institutions offer their own HPC Centers but struggle to make the computing resources easily accessible and user-friendly. Here we present SHABU, a RESTful Web API framework that enables S & E communities to access resources from Boston University’s Shared Computing Center (SCC). The SHABU requirements are derived from the use cases described in this work.
文摘A team of researchers from the University of Science and Technology of China(USTC)of the Chinese Academy of Sciences(CAS)and its partners have made significant advancements in random quantum circuit sampling with Zuchongzhi-3,a superconducting quantum computing prototype featuring 105 qubits and 182 couplers.
基金supported by National Key Research and Development Program of China under Grant 2024YFE0210800National Natural Science Foundation of China under Grant 62495062Beijing Natural Science Foundation under Grant L242017.
文摘The Dynamical Density Functional Theory(DDFT)algorithm,derived by associating classical Density Functional Theory(DFT)with the fundamental Smoluchowski dynamical equation,describes the evolution of inhomo-geneous fluid density distributions over time.It plays a significant role in studying the evolution of density distributions over time in inhomogeneous systems.The Sunway Bluelight II supercomputer,as a new generation of China’s developed supercomputer,possesses powerful computational capabilities.Porting and optimizing industrial software on this platform holds significant importance.For the optimization of the DDFT algorithm,based on the Sunway Bluelight II supercomputer and the unique hardware architecture of the SW39000 processor,this work proposes three acceleration strategies to enhance computational efficiency and performance,including direct parallel optimization,local-memory constrained optimization for CPEs,and multi-core groups collaboration and communication optimization.This method combines the characteristics of the program’s algorithm with the unique hardware architecture of the Sunway Bluelight II supercomputer,optimizing the storage and transmission structures to achieve a closer integration of software and hardware.For the first time,this paper presents Sunway-Dynamical Density Functional Theory(SW-DDFT).Experimental results show that SW-DDFT achieves a speedup of 6.67 times within a single-core group compared to the original DDFT implementation,with six core groups(a total of 384 CPEs),the maximum speedup can reach 28.64 times,and parallel efficiency can reach 71%,demonstrating excellent acceleration performance.
基金Project supported by the UK NERC SINATRA Project(Grant No.NE/K008781/1)
文摘In the last two decades, computational hydraulics has undergone a rapid development following the advancement of data acquisition and computing technologies. Using a finite-volume Godunov-type hydrodynamic model, this work demonstrates the promise of modern high-performance computing technology to achieve real-time flood modeling at a regional scale. The software is implemented for high-performance heterogeneous computing using the OpenCL programming framework, and developed to support simulations across multiple GPUs using a domain decomposition technique and across multiple systems through an efficient implementation of the Message Passing Interface (MPI) standard. The software is applied for a convective storm induced flood event in Newcastle upon Tyne, demonstrating high computational performance across a GPU cluster, and good agreement against crowd- sourced observations. Issues relating to data availability, complex urban topography and differences in drainage capacity affect results for a small number of areas.
文摘The concept and advantage of reconfigurable technology is introduced. A kind of processor architecture of re configurable macro processor (RMP) model based on FPGA array and DSP is put forward and has been implemented. Two image algorithms are developed: template-based automatic target recognition and zone labeling. One is estimating for motion direction in the infrared image background, another is line picking-up algorithm based on image zone labeling and phase grouping technique. It is a kind of 'hardware' function that can be called by the DSP in high-level algorithm. It is also a kind of hardware algorithm of the DSP. The results of experiments show the reconfigurable computing technology based on RMP is an ideal accelerating means to deal with the high-speed image processing tasks. High real time performance is obtained in our two applications on RMP.
基金This work was supported in part by the MOST,China under Grant Nos.2009CB723906 and 2008AA12Z109by CAS under Grant No.KZCX2-YW-313.
文摘The study of global climate change seeks to understand:(1)the components of the Earth’s varying environmental system,with a particular focus on climate;(2)how these components interact to determine present conditions;(3)the factors driving these components;(4)the history of global change and the projection of future change;and(5)how knowledge about global environmental variability and change can be applied to present-day and future decision-making.This paper addresses the use of high-performance computing and high-throughput computing for a global change study on the Digital Earth(DE)platform.Two aspects of the use of high-performance computing(HPC)/high-throughput computing(HTC)on the DE platform are the processing of data from all sources,especially Earth observation data,and the simulation of global change models.The HPC/HTC is an essential and efficient tool for the processing of vast amounts of global data,especially Earth observation data.The current trend involves running complex global climate models using potentially millions of personal computers to achieve better climate change predictions than would ever be possible using the supercomputers currently available to scientists.
基金This work has also been partially funded by the German Research Foundation(DFG)through the German Priority Programme 1648"Software for Exascale Computing"(SPPEXA)and the ADA-FS project,and by the European Union's Horizon 2020 Research and Innovation Program under the NEXTGenIO Project under Grant No.671591the Spanish Ministry of Science and Innovation under Contract No.TIN2015-65316+3 种基金the Generalitat de Catalunya under Contract No.2014-SGR-1051This work was performed under the auspices of the U.S.Department of Energy by Lawrence Livermore National Laboratory under Contract No.DE-AC52-07NA27344(LLNL-JRNL-779789)also supported by the U.S.Department of Energy,Office of Science,Advanced Scientific Computing Research,under Contract No.DE-AC02-06CH11357This work is also supported in part by the National Science Foundation of USA under Grant Nos.1561041,1564647,1744336,1763547,and 1822737.
文摘Storage backends of parallel compute clusters are still based mostly on magnetic disks,while newer and faster storage technologies such as flash-based SSDs or non-volatile random access memory(NVRAM)are deployed within compute nodes.Including these new storage technologies into scientific workflows is unfortunately today a mostly manual task,and most scientists therefore do not take advantage of the faster storage media.One approach to systematically include nodelocal SSDs or NVRAMs into scientific workflows is to deploy ad hoc file systems over a set of compute nodes,which serve as temporary storage systems for single applications or longer-running campaigns.This paper presents results from the Dagstuhl Seminar 17202"Challenges and Opportunities of User-Level File Systems for HPC"and discusses application scenarios as well as design strategies for ad hoc file systems using node-local storage media.The discussion includes open research questions,such as how to couple ad hoc file systems with the batch scheduling environment and how to schedule stage-in and stage-out processes of data between the storage backend and the ad hoc file systems.Also presented are strategies to build ad hoc file systems by using reusable components for networking and how to improve storage device compatibility.Various interfaces and semantics are presented,for example those used by the three ad hoc file systems BeeOND,GekkoFS,and BurstFS.Their presentation covers a range from file systems running in production to cutting-edge research focusing on reaching the performance limits of the underlying devices.
基金This work was supported by the National key R&D Program of China(2018YFB0203901)the National Natural Science Foundation of China under(Grant No.61772053)the fund of the State Key Laboratory of Software Development Environment(SKLSDE-2020ZX15).
文摘Wide-area high-performance computing is widely used for large-scale parallel computing applications owing to its high computing and storage resources.However,the geographical distribution of computing and storage resources makes efficient task distribution and data placement more challenging.To achieve a higher system performance,this study proposes a two-level global collaborative scheduling strategy for wide-area high-performance computing environments.The collaborative scheduling strategy integrates lightweight solution selection,redundant data placement and task stealing mechanisms,optimizing task distribution and data placement to achieve efficient computing in wide-area environments.The experimental results indicate that compared with the state-of-the-art collaborative scheduling algorithm HPS+,the proposed scheduling strategy reduces the makespan by 23.24%,improves computing and storage resource utilization by 8.28%and 21.73%respectively,and achieves similar global data migration costs.
基金Project supported by the National Natural Science Foundation of China(Grant No.61073049)the Ph D Programs Foundation of the Ministry of Education of China(Grant No.20093108110016)the Shanghai Leading Academic Discipline Project(Grant No.J50103)
文摘t In this paper an overall scheme of the task management system of ternary optical computer (TOC) is proposed, and the software architecture chart is given. The function and accomplishment of each module in the system are described in general. In addition, according to the aforementioned scheme a prototype of TOC task management system is implemented, and the feasibility, rationality and completeness of the scheme are verified via running and testing the prototype.
基金This work is in part supported by the Director,Office of Advanced Scientific Computing Research,Office of Science,of the U.S.Department of Energy under Contract No.DE-AC02-06CH11357in part supported by the Exascale Computing Project under Grant No.17-SC-20-SC+1 种基金a joint project of the U.S.Department of Energy's Office of Science and National Nuclear Security Administration,responsible for delivering a capable exascale ecosystem,including software,applications,and hardware technology,to support the nation's exascale computing imperativein part supported by the U.S.Department of Energy,Office of Science,Office of Advanced Scientific Computing Research,Scientific Discovery through Advanced Computing(SciDAC)program.
文摘Technology enhancements and the growing breadth of application workflows running on high-performance computing(HPC)platforms drive the development of new data services that provide high performance on these new platforms,provide capable and productive interfaces and abstractions for a variety of applications,and are readily adapted when new technologies are deployed.The Mochi framework enables composition of specialized distributed data services from a collection of connectable modules and subservices.Rather than forcing all applications to use a one-size-fits-all data staging and I/O software configuration,Mochi allows each application to use a data service specialized to its needs and access patterns.This paper introduces the Mochi framework and methodology.The Mochi core components and microservices are described.Examples of the application of the Mochi methodology to the development of four specialized services are detailed.Finally,a performance evaluation of a Mochi core component,a Mochi microservice,and a composed service providing an object model is performed.The paper concludes by positioning Mochi relative to related work in the HPC space and indicating directions for future work.
基金Project supported by the National Natural Science Foundation of China(Nos.61272141,61120106005,and 61303068)the National High-Tech R&D Program of China(No.2012AA01A301)
文摘As the scale of supercomputers rapidly grows, the reliability problem dominates the system availability. Existing fault tolerance mechanisms, such as periodic checkpointing and process redundancy, cannot effectively fix this problem. To address this issue, we present a new fault tolerance framework using process replication and prefetching (FTRP), combining the benefits of proactive and reactive mechanisms. FTRP incorporates a novel cost model and a new proactive fault tolerance mechanism to improve the application execution efficiency. The novel cost model, called the 'work-most' (WM) model, makes runtime decisions to adaptively choose an action from a set of fault tolerance mechanisms based on failure prediction results and application status. Similar to program locality, we observe the failure locality phenomenon in supercomputers for the first time. In the new proactive fault tolerance mechanism, process replication with process prefetching is proposed based on the failure locality, significantly avoiding losses caused by the failures regardless of whether they have been predicted. Simulations with real failure traces demonstrate that the FTRP framework outperforms existing fault tolerance mechanisms with up to 10% improvement in application efficiency for common failure prediction accuracy, and is effective for petascale systems and beyond.