A team of researchers from the University of Science and Technology of China (USTC) of the Chinese Academy of Sciences (CAS) and its partners have made significant advancements in random quantum circuit sampling with Zuchongzhi-3, a superconducting quantum computing prototype featuring 105 qubits and 182 couplers.
The Dynamical Density Functional Theory (DDFT) algorithm, derived by combining classical Density Functional Theory (DFT) with the fundamental Smoluchowski dynamical equation, describes how inhomogeneous fluid density distributions evolve over time and plays a significant role in studying such systems. The Sunway Bluelight II supercomputer, a new generation of China's domestically developed supercomputers, offers powerful computational capabilities, so porting and optimizing industrial software on this platform is of significant importance. To optimize the DDFT algorithm on the Sunway Bluelight II supercomputer and the unique hardware architecture of its SW39000 processor, this work proposes three acceleration strategies to enhance computational efficiency and performance: direct parallel optimization, local-memory-constrained optimization for the CPEs (computing processing elements), and multi-core-group collaboration and communication optimization. The approach combines the characteristics of the program's algorithm with the hardware architecture of the Sunway Bluelight II supercomputer, optimizing the storage and transmission structures to achieve a closer integration of software and hardware. This paper presents Sunway-Dynamical Density Functional Theory (SW-DDFT) for the first time. Experimental results show that SW-DDFT achieves a speedup of 6.67 times within a single core group compared with the original DDFT implementation; with six core groups (384 CPEs in total), the maximum speedup reaches 28.64 times and the parallel efficiency reaches 71%, demonstrating excellent acceleration performance.
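For reference, the central evolution equation of DDFT, in the standard form found in the literature (written here in generic notation, not reproduced from this paper), relates the time derivative of the one-body density to the functional derivative of the free-energy functional supplied by classical DFT:

```latex
% Standard DDFT evolution equation (generic literature form):
% \rho(\mathbf{r},t) is the one-body density, \Gamma the mobility,
% and F[\rho] the Helmholtz free-energy functional from classical DFT.
\frac{\partial \rho(\mathbf{r},t)}{\partial t}
  = \Gamma \, \nabla \cdot \left[ \rho(\mathbf{r},t)\,
    \nabla \frac{\delta F[\rho]}{\delta \rho(\mathbf{r},t)} \right]
```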
The rapid development of artificial intelligence (AI) facilitates various applications across all areas but also poses great challenges for its hardware implementation in terms of speed and energy because of the explosive growth of data. Optical computing provides a distinctive perspective for addressing this bottleneck by harnessing the unique properties of photons, including broad bandwidth, low latency, and high energy efficiency. In this review, we introduce the latest developments in optical computing for different AI models, including feedforward neural networks, reservoir computing, and spiking neural networks (SNNs). Recent progress in integrated photonic devices, combined with the rise of AI, provides a great opportunity for the renaissance of optical computing in practical applications. This effort requires multidisciplinary contributions from a broad community. This review provides an overview of state-of-the-art accomplishments in recent years, discusses the availability of current technologies, and points out the various challenges that remain in different aspects of pushing the frontier. We anticipate that the era of large-scale integrated photonic processors will soon arrive for practical AI applications in the form of hybrid optoelectronic frameworks.
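Of the model families named above, reservoir computing is perhaps the easiest to illustrate in a few lines. The sketch below is a generic echo state network in NumPy (not code from the review; the layer sizes and spectral-radius scaling are illustrative assumptions). Only the linear readout is trained, which is precisely the property that makes fixed photonic hardware attractive as the reservoir.

```python
# Minimal echo state network sketch: a fixed random "reservoir" plus a trained
# linear readout. In a photonic implementation the reservoir update would be
# carried out by the optical hardware; here it is ordinary NumPy.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 200                       # illustrative sizes (assumption)

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))  # scale spectral radius below 1

def run_reservoir(u):
    """Collect reservoir states for an input sequence u of shape (T, n_in)."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ u_t + W @ x)    # fixed, untrained dynamics
        states.append(x.copy())
    return np.array(states)

# Toy task: predict the next sample of a sine wave.
t = np.linspace(0, 40 * np.pi, 4000)
u = np.sin(t)[:, None]
X = run_reservoir(u[:-1])
y = u[1:, 0]
W_out = np.linalg.lstsq(X, y, rcond=None)[0]   # only the readout is trained
pred = X @ W_out
print("train MSE:", np.mean((pred - y) ** 2))
```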
The discrete element method (DEM) can effectively simulate the discontinuity, inhomogeneity, and large deformation and failure of rock and soil. Based on innovative matrix computing for the discrete element method, the high-performance discrete element software MatDEM can handle millions of elements on a single computer and enables discrete element simulation at the engineering scale. It supports heat calculation, multi-field coupling, and fluid-solid coupling numerical simulations. Furthermore, the software integrates pre-processing, a solver, post-processing, and powerful secondary development, allowing users to recompile new discrete element software. The basic principles of the DEM, the implementation and development of the MatDEM software, and its applications are introduced in this paper. The software and sample source code are available online (http://matdem.com).
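The contact mechanics underlying such simulations can be illustrated in a few lines of code. The following sketch is a generic linear-spring DEM force-and-integration step for two circular particles, written from the textbook formulation rather than from MatDEM's matrix implementation; the stiffness, mass, radius, and time-step values are illustrative assumptions.

```python
# Generic 2D DEM step for two circular particles with a linear contact spring.
# This is the textbook contact model, not MatDEM's matrix formulation.
import numpy as np

k_n = 1.0e6      # normal contact stiffness, N/m (assumed)
mass = 1.0       # particle mass, kg (assumed)
radius = 0.01    # particle radius, m (assumed)
dt = 1.0e-5      # explicit time step, s (assumed; must resolve sqrt(m/k_n))

pos = np.array([[0.0, 0.0], [0.019, 0.0]])   # slight overlap -> contact force
vel = np.zeros((2, 2))

def step(pos, vel):
    force = np.zeros_like(pos)
    d = pos[1] - pos[0]
    dist = np.linalg.norm(d)
    overlap = 2 * radius - dist
    if overlap > 0:                      # particles are in contact
        n = d / dist                     # unit normal from particle 0 to 1
        f = k_n * overlap * n            # repulsive linear-spring force
        force[0] -= f
        force[1] += f
    vel = vel + force / mass * dt        # explicit (symplectic Euler) update
    pos = pos + vel * dt
    return pos, vel

for _ in range(100):
    pos, vel = step(pos, vel)
print("separation after 100 steps:", np.linalg.norm(pos[1] - pos[0]))
```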
Cloud computing has taken over the high-performance distributed computing area, and it currently provides on-demand services and resource pooling over the web. Because user service demands change constantly, the task scheduling problem has emerged as a critical analytical topic in cloud computing. The primary goal of task scheduling is to distribute tasks to the available processors so as to construct the shortest possible schedule without breaching precedence restrictions. Task assignments and schedules substantially influence system operation in a heterogeneous multiprocessor system, and different procedures inside a heuristic-based task scheduling method result in different makespans on a heterogeneous computing system. As a result, an intelligent scheduling algorithm should efficiently determine the priority of every subtask based on the resources it requires in order to lower the makespan. This research introduces a novel, efficient task scheduling method for cloud computing systems based on the cooperation search algorithm to tackle this essential task and schedule a heterogeneous cloud computing problem. The basic idea of this method is to use the advantages of meta-heuristic algorithms to obtain the optimal solution. We assess our algorithm's performance by running it through three scenarios with varying numbers of tasks. The findings demonstrate that the suggested technique beats the existing methods New Genetic Algorithm (NGA), Genetic Algorithm (GA), Whale Optimization Algorithm (WOA), Gravitational Search Algorithm (GSA), and Hybrid Heuristic and Genetic (HHG) by 7.9%, 2.1%, 8.8%, 7.7%, and 3.4%, respectively, in terms of makespan.
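Makespan, the quantity all of the compared algorithms optimize, is simply the completion time of the last processor to finish. A minimal sketch of evaluating it for a candidate task-to-processor assignment is shown below; the task and processor counts and the execution-time matrix are illustrative assumptions, and precedence constraints are omitted for brevity, so this is not the paper's encoding, only the objective it targets.

```python
# Evaluate the makespan of a candidate assignment on heterogeneous processors.
# Independent tasks only; precedence constraints are omitted for brevity.
import numpy as np

rng = np.random.default_rng(1)
n_tasks, n_procs = 20, 4                                 # illustrative sizes
exec_time = rng.uniform(1.0, 10.0, (n_tasks, n_procs))   # assumed ETC matrix

def makespan(assignment):
    """assignment[i] = processor chosen for task i; makespan = max processor load."""
    load = np.zeros(n_procs)
    for task, proc in enumerate(assignment):
        load[proc] += exec_time[task, proc]
    return load.max()

random_assign = rng.integers(0, n_procs, n_tasks)
greedy_assign = []
load = np.zeros(n_procs)
for task in range(n_tasks):
    proc = int(np.argmin(load + exec_time[task]))   # min-completion-time heuristic
    greedy_assign.append(proc)
    load[proc] += exec_time[task, proc]

print("random makespan:", makespan(random_assign))
print("greedy makespan:", makespan(greedy_assign))
```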
Deep learning algorithms have been widely used in computer vision, natural language processing, and other fields. However, because of the ever-increasing scale of deep learning models, the requirements for storage and computing performance keep growing, and processors based on the von Neumann architecture have gradually exposed significant shortcomings such as high energy consumption and long latency. To alleviate this problem, large-scale processing systems are shifting from a traditional computing-centric model to a data-centric model. A near-memory computing array architecture based on a shared buffer is proposed in this paper to improve system performance; it supports instructions that integrate storage and calculation, reducing data movement between the processor and main memory. Through data reuse, the processing speed of the algorithm is further improved. The proposed architecture is verified and tested through a parallel implementation of a convolutional neural network (CNN) algorithm. The experimental results show that, at a frequency of 110 MHz, the calculation speed of a single convolution operation is increased by 66.64% on average compared with a CNN architecture that performs parallel calculations on a field-programmable gate array (FPGA). The processing speed of the whole convolution layer is improved by 8.81% compared with a reconfigurable array processor that does not support near-memory computing.
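The data-reuse argument can be made concrete by counting memory accesses: in a sliding-window convolution, each input element is read once per output window that covers it, so holding a tile in a near-memory buffer saves that many trips to main memory. The sketch below (ordinary NumPy with illustrative sizes; it models only the counting, not the proposed hardware) tallies that reuse factor.

```python
# Count how often each input element would be re-read by a naive sliding-window
# convolution; this is the reuse a near-memory buffer can capture. Sizes are
# illustrative and the hardware itself is not modeled.
import numpy as np

H = W = 32          # input feature-map size (assumed)
K = 3               # kernel size (assumed)
stride = 1

reads = np.zeros((H, W), dtype=int)
out_h = (H - K) // stride + 1
out_w = (W - K) // stride + 1
for oy in range(out_h):
    for ox in range(out_w):
        reads[oy:oy + K, ox:ox + K] += 1   # every output window touches a KxK tile

total_reads = reads.sum()                  # reads issued without any buffering
unique_loads = H * W                       # reads needed if each element is buffered once
print("average reuse per input element:", total_reads / unique_loads)  # ~K*K in the interior
```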
This paper introduces the development of exascale (10^18) computing. Although exascale computing is a hot research direction worldwide, we face many challenges in the areas of the memory wall, communication wall, reliability wall, power wall, and the scalability of parallel computing. In response to these challenges, some thoughts and strategies are proposed.
Cloud computing currently dominates the space of high-performance distributed computing, providing resource pooling and on-demand services through the web. Because users' service demands change dynamically, the task scheduling problem has become a very important area of analysis in the cloud computing environment. The main purpose of task scheduling is to assign tasks to available processors so as to produce the minimum schedule length without violating precedence restrictions. In heterogeneous multiprocessor systems, task assignments and schedules have a significant impact on system operation. Within a heuristic-based task scheduling algorithm, different procedures lead to different task execution times (makespans) on a heterogeneous computing system. Thus, a good scheduling algorithm should be able to set priorities efficiently for every subtask, depending on the resources required, so as to reduce the makespan. In this paper, we propose a new efficient task scheduling algorithm for cloud computing systems based on the RAO algorithm to solve this important task and schedule a heterogeneous multiprocessing problem. The basic idea is to exploit the advantages of heuristic-based algorithms to reduce the search space and the time needed to reach the best solution. We evaluate our algorithm's performance by applying it to three examples with different numbers of tasks and processors. The experimental results show that the proposed approach succeeds significantly better than the others in finding optimal solutions in terms of task execution time.
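The Rao family of optimizers is metaphor-free and has a single, very compact update rule. Shown below is a sketch of the Rao-1 variant, which moves each candidate toward the current best solution and away from the current worst; this is the generic published rule applied to a toy continuous test function, not the discrete scheduling encoding used in the paper, and the population size and iteration count are illustrative assumptions.

```python
# Generic Rao-1 update on a toy continuous benchmark (sphere function).
# Rule: new_x = x + r * (best - worst), with r drawn uniformly per dimension.
# A scheduling application would map candidates to task-processor assignments.
import numpy as np

rng = np.random.default_rng(2)
dim, pop_size, iters = 10, 30, 200          # illustrative settings (assumption)

def cost(x):                                # sphere function: minimum 0 at origin
    return np.sum(x ** 2, axis=-1)

pop = rng.uniform(-5.0, 5.0, (pop_size, dim))
for _ in range(iters):
    fitness = cost(pop)
    best = pop[np.argmin(fitness)]
    worst = pop[np.argmax(fitness)]
    r = rng.random((pop_size, dim))
    candidate = pop + r * (best - worst)    # Rao-1 move
    improved = cost(candidate) < fitness    # greedy acceptance, per individual
    pop[improved] = candidate[improved]

print("best cost found:", cost(pop).min())
```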
Meteorological high-performance computing resources are the support platform for running weather forecast and climate prediction numerical models. A scientific and objective method for evaluating the application of meteorological high-performance computing resources can not only provide a reference for optimizing resources currently in operation but also provide a quantitative basis for future resource construction and planning. In this paper, the concepts of the utility value B and the index compliance rate E of a meteorological high-performance computing system are presented, and the evaluation process, evaluation indices, and calculation methods for the application benefits of high-performance computing resources are introduced.
Within the last few decades, increases in computational resources have contributed enormously to the progress of science and engineering (S & E). To continue making rapid advancements, the S & E community must be able to access computing resources. One way to provide such resources is through High-Performance Computing (HPC) centers. Many academic research institutions offer their own HPC centers but struggle to make the computing resources easily accessible and user-friendly. Here we present SHABU, a RESTful Web API framework that enables S & E communities to access resources from Boston University's Shared Computing Center (SCC). The SHABU requirements are derived from the use cases described in this work.
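A RESTful Web API of this kind is typically consumed with a plain HTTP client. The sketch below shows what such a call might look like using Python's requests library; the base URL, the /jobs endpoint, and the token header are purely hypothetical placeholders, since the abstract does not specify SHABU's actual routes or authentication scheme.

```python
# Hypothetical client call against a SHABU-style REST API. The base URL, the
# /jobs endpoint, and the token header are illustrative placeholders only;
# consult the actual SHABU documentation for real routes and auth.
import requests

BASE_URL = "https://shabu.example.edu/api/v1"   # placeholder, not the real host
TOKEN = "replace-with-issued-token"             # placeholder credential

def list_jobs(username):
    """Fetch a user's compute jobs from a hypothetical /jobs endpoint."""
    resp = requests.get(
        f"{BASE_URL}/jobs",
        params={"user": username},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    for job in list_jobs("alice"):
        print(job)
```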
This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models, pivotal for advancing high-performance computing (HPC). Emphasizing the transition of GPUs from graphics-centric processors to versatile computing units, it delves into the nuanced optimization of memory access, thread management, algorithmic design, and data structures. These optimizations are critical for exploiting the parallel processing capabilities of GPUs, addressing both theoretical frameworks and practical implementations. By integrating advanced strategies such as memory coalescing, dynamic scheduling, and parallel algorithmic transformations, this research aims to significantly elevate computational efficiency and throughput. The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains, highlighting a pathway towards achieving unparalleled processing power and efficiency in HPC environments. The paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers, fostering advancements in computational sciences and technology.
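As a concrete illustration of the memory-coalescing point, the sketch below (Python with Numba's CUDA backend, assuming a CUDA-capable GPU is available; it is not code from the study) maps thread i to element i so that consecutive threads in a warp access consecutive memory locations, which the hardware can serve with few memory transactions.

```python
# Coalesced-access illustration with Numba's CUDA backend (requires a CUDA GPU).
# Thread i reads and writes element i, so a warp's accesses are contiguous.
# Host arrays are transferred implicitly by Numba, which is fine for a demo.
import numpy as np
from numba import cuda

@cuda.jit
def scaled_add(a, b, out, alpha):
    i = cuda.grid(1)                  # global thread index
    if i < out.size:
        out[i] = alpha * a[i] + b[i]  # contiguous, coalesced access pattern

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
scaled_add[blocks, threads_per_block](a, b, out, np.float32(2.0))
print(np.allclose(out, 2.0 * a + b))
```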
Underpinned by the ultrahigh-core rockfill dam at the Nuozhadu Hydropower Station, comprehensive studies and engineering practices have been conducted to address several critical challenges: coordination of seepage deformation in dam materials, prevention and control of high-water-pressure seepage failure, static and dynamic deformation control, and construction quality monitoring. Advanced technologies have been developed for modifying impermeable soil materials and utilizing soft rocks. Constitutive models and high-performance, fine-grained computational methods for dam materials have been improved, along with innovative seismic safety measures. Additionally, a “Digital Dam” and an information system for monitoring construction quality were implemented. These efforts ensured the successful construction of the Nuozhadu Dam, making it the tallest dam in China and the third-tallest dam in the world upon completion. This achievement increased the height of core dams in China by 100 m and established a design and safety evaluation framework for ultrahigh-core rockfill dams exceeding 300 m in height. Furthermore, the current safety monitoring results indicate that the Nuozhadu Dam is safe and controllable.