Milling process simulation is an important research area in manufacturing science. To improve simulation precision and extend its applicability, numerical algorithms are increasingly used in milling modeling, but simulation efficiency decreases as model complexity grows, which limits the application of such methods. To address this problem, a high-efficiency algorithm for milling process simulation is studied; this is important for the practical application of milling process simulation. Parallel computing is widely used to solve large-scale computation problems. Its advantages include system flexibility, robustness, high computing efficiency, and a high performance-to-price ratio. With the development of computer networks, the computing resources available on the Internet allow a virtual computing environment with powerful computing capability to be assembled from ordinary microcomputers, which reduces the difficulty of building a hardware environment to support parallel computing. This paper investigates how to use network technology and parallel algorithms to improve the efficiency of milling force simulation. To predict milling forces, a simplified local milling force model is used: the end milling cutter is assumed to be divided into r differential elements along its axial direction, and at a given time the total cutting force is obtained by summing the resultant cutting forces produced by each differential cutter disc. The whole simulation time is divided into segments, the corresponding program segments are sent to microcomputers on the Internet, their results are collected, and all segment results are composed into the final result. To implement the algorithm, a distributed parallel computing framework is designed. In this framework the web server acts as the controller: using Java RMI (Remote Method Invocation), it calls the computing processes on the computing servers, and its control processes manage those servers. The code of the simulation algorithm can be sent dynamically to the computing servers, where the milling forces at different times are computed using each machine's local resources. The results calculated by every computing server are returned to the web server and composed into the final result. The framework can be reused by different simulation algorithms. Compared with running the algorithm on a single machine, the proposed algorithm achieves higher efficiency.
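As a rough illustration of the abstract's summation-and-segmentation scheme, the sketch below sums per-disc force contributions for one cutter angle and splits the simulation time into independently computable segments. The force coefficients Kt and Kr, the chip-thickness model, and all function names are illustrative assumptions rather than the authors' model; the distribution of segments over Java RMI workers is not reproduced here.

```python
import numpy as np

def disc_forces(phi, ap, r_discs, fz, Kt=600.0, Kr=200.0):
    """Cutting force of one cutter at angle phi, summed over axial disc elements.

    Assumed simplified local model: each of the r_discs differential discs of
    height dz = ap / r_discs contributes dFt = Kt * h * dz (tangential) and
    dFr = Kr * h * dz (radial), with chip thickness h = fz * sin(phi).
    Uniform chip thickness across discs is a simplification; coefficients are
    placeholders, not identified cutting constants.
    """
    dz = ap / r_discs
    h = max(fz * np.sin(phi), 0.0)           # instantaneous chip thickness (engaged half-rev only)
    dFt = Kt * h * dz                         # per-disc tangential force
    dFr = Kr * h * dz                         # per-disc radial force
    # Resolve into x/y components and sum the identical disc contributions.
    Fx = r_discs * (-dFt * np.cos(phi) - dFr * np.sin(phi))
    Fy = r_discs * ( dFt * np.sin(phi) - dFr * np.cos(phi))
    return Fx, Fy

def simulate_segment(t_slice, rpm=3000, fz=0.1, ap=2.0, r_discs=20):
    """Milling forces over one time segment -- the unit of work shipped to a remote worker."""
    omega = 2 * np.pi * rpm / 60.0
    return [disc_forces(omega * t, ap, r_discs, fz) for t in t_slice]

# Split the whole simulation time into segments, evaluate them independently
# (sequentially here; in the paper's framework each segment would go to a
# computing server), then compose the final result in order.
t = np.linspace(0.0, 0.1, 1000)
segments = np.array_split(t, 4)
result = [f for seg in segments for f in simulate_segment(seg)]
```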
Parallel computing assigns the computing model to different processors on different devices and executes it simultaneously. Accordingly, it has broad applications in the numerical simulation of geotechnical and underground engineering, whose models are usually large-scale. With parallel computing, the computing time or the memory requirement is reduced by splitting the original domain of the numerical model into many subdomains, an approach known as the domain decomposition method. In this study, a cubic, equal-volume domain decomposition strategy was used to parallelize the four-dimensional lattice spring model (4D-LSM) on a distributed-memory system based on the message passing interface (MPI). With a more efficient communication strategy introduced, this study aimed at running a one-billion-particle model on a supercomputer platform. The preprocessing procedure of the parallelized 4D-LSM was restructured, and a particle generation strategy suitable for the supercomputer platform was employed to minimize the time spent in preprocessing and calculation. On this basis, numerical calculations were performed on the TianHe-3 prototype E-class supercomputer at the National Supercomputer Center in Tianjin. Two field-scale three-dimensional blasting wave propagation models were simulated, and the numerical results verify the computing power and the advantage of the parallelized 4D-LSM in the simulation of large-scale three-dimensional models. Finally, the time complexity and space complexity of 4D-LSM and other particle-based discrete element methods are analyzed.
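A minimal sketch of the cubic, equal-volume decomposition idea: each particle position is mapped to the rank of the subdomain cube that owns it, so every MPI process can generate and store only its own particles. The process-grid layout and rank ordering are assumptions for illustration, not the 4D-LSM code's actual scheme.

```python
import numpy as np

def owner_rank(xyz, lo, hi, grid=(4, 4, 4)):
    """Map a particle position to the rank owning its cubic subdomain.

    The global box [lo, hi) is split into grid[0]*grid[1]*grid[2] equal-volume
    cubes; ranks are numbered in x-fastest order. Illustrative only.
    """
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    frac = (np.asarray(xyz, float) - lo) / (hi - lo)            # normalized position in [0, 1)
    idx = np.minimum((frac * grid).astype(int), np.array(grid) - 1)
    return int(idx[0] + grid[0] * (idx[1] + grid[1] * idx[2]))

# Each rank would generate only the particles whose owner_rank(...) equals its own
# rank, so the one-billion-particle model never has to exist on a single node.
print(owner_rank((7.5, 2.0, 9.9), lo=(0, 0, 0), hi=(10, 10, 10)))
```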
High-performance computing (HPC) refers to the ability to process data and perform complex calculations at high speeds. It is one of the most essential tools fueling the advancement of science and technology.
Ambient noise tomography is an established technique in seismology, where calculating single- or nine-component noise cross-correlation functions (NCFs) is a fundamental first step. In this study, we introduce a novel CPU-GPU heterogeneous computing framework designed to significantly enhance the efficiency of computing nine-component NCFs from seismic ambient noise data. The framework not only accelerates the computation by leveraging the Compute Unified Device Architecture (CUDA) but also improves the signal-to-noise ratio (SNR) through advanced stacking techniques such as time-frequency domain phase-weighted stacking (tf-PWS). We validated the program on multiple datasets, confirming its superior computation speed, improved reliability, and higher signal-to-noise ratios for NCFs. The study provides detailed insights into optimizing the computation of noise cross-correlation functions, thereby enhancing the precision and efficiency of ambient noise imaging.
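The sketch below shows the two computational kernels the abstract refers to: an FFT-based daily cross-correlation and a phase-weighted stack. For brevity it uses plain time-domain PWS rather than the tf-PWS variant, runs on the CPU with NumPy/SciPy instead of CUDA, and all parameter values are illustrative.

```python
import numpy as np
from scipy.signal import hilbert

def daily_ncf(a, b):
    """One day's noise cross-correlation function (NCF) computed in the frequency domain."""
    n = len(a) + len(b) - 1
    nfft = 1 << (n - 1).bit_length()                     # zero-pad to avoid circular wrap-around
    spec = np.fft.rfft(a, nfft) * np.conj(np.fft.rfft(b, nfft))
    c = np.fft.irfft(spec, nfft)
    # reorder to lags -(len(b)-1) ... +(len(a)-1)
    return np.concatenate([c[-(len(b) - 1):], c[:len(a)]])

def phase_weighted_stack(ncfs, nu=2):
    """Phase-weighted stack of daily NCFs (time-domain PWS; tf-PWS applies the
    same idea per frequency band, which is omitted here for brevity)."""
    ncfs = np.asarray(ncfs, float)
    phase = np.exp(1j * np.angle(hilbert(ncfs, axis=1)))  # instantaneous phase of each trace
    coherence = np.abs(phase.mean(axis=0))                 # close to 1 where daily NCFs agree in phase
    return ncfs.mean(axis=0) * coherence ** nu

rng = np.random.default_rng(0)
days = [rng.standard_normal(2000) for _ in range(30)]      # synthetic daily records
stacked = phase_weighted_stack([daily_ncf(d, np.roll(d, 50)) for d in days])
```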
Wide-area high-performance computing is widely used for large-scale parallel computing applications owing to its abundant computing and storage resources. However, the geographical distribution of these resources makes efficient task distribution and data placement more challenging. To achieve higher system performance, this study proposes a two-level global collaborative scheduling strategy for wide-area high-performance computing environments. The strategy integrates lightweight solution selection, redundant data placement, and task-stealing mechanisms, optimizing task distribution and data placement for efficient computing in wide-area environments. The experimental results indicate that, compared with the state-of-the-art collaborative scheduling algorithm HPS+, the proposed strategy reduces the makespan by 23.24%, improves computing and storage resource utilization by 8.28% and 21.73% respectively, and achieves similar global data migration costs.
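As a hedged, greatly simplified picture of collaborative scheduling with data-aware placement, the sketch below greedily assigns each task to the site with the smallest estimated finish time (queued work plus data-transfer cost). The cost model and data structures are assumptions for illustration; they are not the proposed strategy or the HPS+ baseline.

```python
def schedule(tasks, sites):
    """Greedy two-level placement sketch.

    tasks: list of (work, {site: data_transfer_cost}) tuples.
    sites: {site_name: compute_speed}.
    Each task goes to the site minimizing queued_work / speed + data cost.
    """
    load = {s: 0.0 for s in sites}
    plan = {}
    for tid, (work, data_cost) in enumerate(tasks):
        def finish(s):
            return (load[s] + work) / sites[s] + data_cost.get(s, 0.0)
        best = min(sites, key=finish)
        load[best] += work
        plan[tid] = best
    return plan

# An idle site could later "steal" the largest queued task from the most loaded
# site, re-running the same cost estimate before migrating it.
print(schedule([(10.0, {"A": 0.0, "B": 2.0}), (4.0, {"A": 3.0, "B": 0.0})],
               {"A": 2.0, "B": 1.0}))
```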
Brain science accelerates the study of intelligence and behavior, contributes fundamental insights into human cognition, and offers prospective treatments for brain disease. Faced with the challenges posed by imaging technologies and deep learning computational models, big data and high-performance computing (HPC) play essential roles in studying brain function, brain diseases, and large-scale brain models or connectomes. We review the driving forces behind big data and HPC methods applied to brain science, including deep learning, powerful data analysis capabilities, and computational performance solutions, each of which can be used to improve diagnostic accuracy and research output. This work reinforces predictions that big data and HPC will continue to improve brain science by making ultrahigh-performance analysis possible, by improving data standardization and sharing, and by providing new neuromorphic insights.
Tsinghua Science and Technology began publication in 1996. It is an international academic journal sponsored by Tsinghua University and published bimonthly, presenting state-of-the-art scientific achievements in computer science and other IT fields.
This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models, pivotal for advancing high-performance computing (HPC). Emphasizing the transition of GPUs from graphics-centric processors to versatile computing units, it delves into the nuanced optimization of memory access, thread management, algorithmic design, and data structures. These optimizations are critical for exploiting the parallel processing capabilities of GPUs, addressing both theoretical frameworks and practical implementations. By integrating advanced strategies such as memory coalescing, dynamic scheduling, and parallel algorithmic transformations, this research aims to significantly elevate computational efficiency and throughput. The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains, highlighting a pathway toward unparalleled processing power and efficiency in HPC environments. The paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers, fostering advancements in computational sciences and technology.
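To make the memory-coalescing point concrete, here is a toy model: a warp's 32 loads are served by one transaction per distinct 128-byte segment they touch, so contiguous per-thread accesses need one transaction while a 32-element stride needs 32. The 128-byte segment size and the counting rule are simplifying assumptions, not an exact description of any particular GPU.

```python
def transactions(addresses, segment=128):
    """Number of memory transactions needed to serve one warp's loads.

    Toy coalescing model: each distinct segment-aligned block touched by the
    32 threads costs one transaction. Real GPUs apply more nuanced rules; this
    only illustrates why contiguous accesses beat strided ones."""
    return len({addr // segment for addr in addresses})

elem = 4                                                # 4-byte floats
coalesced = [tid * elem for tid in range(32)]           # thread i reads a[i]
strided   = [tid * 32 * elem for tid in range(32)]      # thread i reads a[32*i]
print(transactions(coalesced), transactions(strided))   # 1 vs 32 transactions
```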
As emerging two-dimensional (2D) materials, carbides and nitrides (MXenes) can be solid solutions or ordered structures made up of multi-atomic layers. With remarkable and adjustable electrical, optical, mechanical, and electrochemical characteristics, MXenes have shown great potential in brain-inspired neuromorphic computing electronics, including neuromorphic gas sensors, pressure sensors, and photodetectors. This paper provides a forward-looking review of research progress on MXenes in the neuromorphic sensing domain and discusses the critical challenges that need to be resolved. Key bottlenecks such as insufficient long-term stability under environmental exposure, high costs, scalability limitations in large-scale production, and mechanical mismatch in wearable integration hinder their practical deployment. Furthermore, unresolved issues such as interfacial compatibility in heterostructures and energy inefficiency in neuromorphic signal conversion demand urgent attention. The review offers insights into future research directions to deepen the fundamental understanding of MXene properties and to promote further integration into neuromorphic computing applications through convergence with various emerging technologies.
The advancement of flexible memristors has significantly promoted the development of wearable electronics for emerging neuromorphic computing applications. Inspired by the in-memory computing architecture of the human brain, flexible memristors show great potential for emulating artificial synapses in high-efficiency, low-power neuromorphic computing. This paper provides a comprehensive overview of flexible memristors from the perspectives of development history, material systems, device structures, mechanical deformation methods, device performance analysis, stress simulation during deformation, and neuromorphic computing applications. Recent advances in flexible electronics are summarized, covering single devices, device arrays, and integration. The challenges and future perspectives of flexible memristors for neuromorphic computing are discussed in depth, paving the way for wearable smart electronics and for applications in large-scale neuromorphic computing and high-order intelligent robotics.
High-entropy oxides (HEOs) have emerged as a promising class of memristive materials, characterized by entropy-stabilized crystal structures, multivalent cation coordination, and tunable defect landscapes. These intrinsic features enable forming-free resistive switching, multilevel conductance modulation, and synaptic plasticity, making HEOs attractive for neuromorphic computing. This review outlines recent progress in HEO-based memristors across materials engineering, switching mechanisms, and synaptic emulation. Particular attention is given to vacancy migration, phase transitions, and valence-state dynamics, the mechanisms that underlie the switching behaviors observed in both amorphous and crystalline systems. Their relevance to neuromorphic functions such as short-term plasticity and spike-timing-dependent learning is also examined. While encouraging results have been achieved at the device level, challenges remain in conductance precision, variability control, and scalable integration. Addressing these challenges demands a concerted effort across materials design, interface optimization, and task-aware modeling. With such integration, HEO memristors offer a compelling pathway toward energy-efficient and adaptable brain-inspired electronics.
Storage backends of parallel compute clusters are still based mostly on magnetic disks, while newer and faster storage technologies such as flash-based SSDs or non-volatile random access memory (NVRAM) are deployed within compute nodes. Including these new storage technologies in scientific workflows is today still a largely manual task, and most scientists therefore do not take advantage of the faster storage media. One approach to systematically include node-local SSDs or NVRAM in scientific workflows is to deploy ad hoc file systems over a set of compute nodes, which serve as temporary storage systems for single applications or longer-running campaigns. This paper presents results from the Dagstuhl Seminar 17202 "Challenges and Opportunities of User-Level File Systems for HPC" and discusses application scenarios as well as design strategies for ad hoc file systems using node-local storage media. The discussion includes open research questions, such as how to couple ad hoc file systems with the batch scheduling environment and how to schedule stage-in and stage-out processes of data between the storage backend and the ad hoc file systems. Also presented are strategies to build ad hoc file systems from reusable networking components and ways to improve storage device compatibility. Various interfaces and semantics are presented, for example those used by the three ad hoc file systems BeeOND, GekkoFS, and BurstFS. Their presentation covers a range from file systems running in production to cutting-edge research focused on reaching the performance limits of the underlying devices.
Low-power and low-variability artificial neuronal devices are highly desired for high-performance neuromorphic computing. In this paper, an oscillation neuron based on a low-variability Ag nanodot (ND) threshold switching (TS) device with low operating voltage, large on/off ratio, and high uniformity is presented. Measurement results indicate that this neuron exhibits self-oscillation under applied voltages as low as 1 V. The oscillation frequency increases with the applied voltage pulse amplitude and decreases with the load resistance. The neuron can then be used to accurately evaluate resistive random-access memory (RRAM) synaptic weights when it is connected to the output of an RRAM crossbar array for neuromorphic computing. Meanwhile, simulation results show that a large RRAM crossbar array (>128×128) can be supported by the oscillation neuron owing to the high on/off ratio (>10^8) of the Ag ND TS device. Moreover, the high uniformity of the Ag ND TS device helps improve the distribution of the output frequency and suppress the degradation of neural network recognition accuracy (<1%). Therefore, the developed oscillation neuron based on the Ag ND TS device shows great potential for future neuromorphic computing applications.
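A hedged back-of-the-envelope model of why the oscillation frequency rises with pulse amplitude and falls with load resistance: treat the neuron as an RC relaxation oscillator in which the node capacitance charges through the load resistor until the TS device's threshold is reached. The capacitance, threshold, and hold voltages below are arbitrary placeholders, not measured Ag-ND device parameters.

```python
import numpy as np

def oscillation_frequency(v_app, r_load, c=1e-9, v_th=0.4, v_hold=0.1):
    """Approximate frequency of a threshold-switching relaxation oscillator.

    Toy model: the node capacitance c charges through r_load from the hold
    voltage v_hold up to the threshold v_th, the TS device then fires and
    rapidly discharges it; the fast discharge time is neglected, so the period
    equals the RC charging time."""
    if v_app <= v_th:
        return 0.0                               # never reaches threshold: no oscillation
    t_charge = r_load * c * np.log((v_app - v_hold) / (v_app - v_th))
    return 1.0 / t_charge

for v in (0.6, 0.8, 1.0):
    print(v, oscillation_frequency(v, r_load=100e3))   # frequency rises with pulse amplitude
print(oscillation_frequency(1.0, r_load=200e3))        # and falls with a larger load resistance
```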
Technology enhancements and the growing breadth of application workflows running on high-performance computing (HPC) platforms drive the development of new data services that provide high performance on these new platforms, offer capable and productive interfaces and abstractions for a variety of applications, and are readily adapted when new technologies are deployed. The Mochi framework enables the composition of specialized distributed data services from a collection of connectable modules and subservices. Rather than forcing all applications to use a one-size-fits-all data staging and I/O software configuration, Mochi allows each application to use a data service specialized to its needs and access patterns. This paper introduces the Mochi framework and methodology. The Mochi core components and microservices are described, and examples of applying the Mochi methodology to the development of four specialized services are detailed. Finally, a performance evaluation of a Mochi core component, a Mochi microservice, and a composed service providing an object model is performed. The paper concludes by positioning Mochi relative to related work in the HPC space and indicating directions for future work.
As the scale of supercomputers rapidly grows, the reliability problem comes to dominate system availability. Existing fault tolerance mechanisms, such as periodic checkpointing and process redundancy, cannot effectively solve this problem. To address this issue, we present a new fault tolerance framework using process replication and prefetching (FTRP), combining the benefits of proactive and reactive mechanisms. FTRP incorporates a novel cost model and a new proactive fault tolerance mechanism to improve application execution efficiency. The cost model, called the 'work-most' (WM) model, makes runtime decisions to adaptively choose an action from a set of fault tolerance mechanisms based on failure prediction results and application status. Similar to program locality, we observe the phenomenon of failure locality in supercomputers for the first time. In the new proactive fault tolerance mechanism, process replication with process prefetching is proposed based on this failure locality, largely avoiding the losses caused by failures regardless of whether they have been predicted. Simulations with real failure traces demonstrate that FTRP outperforms existing fault tolerance mechanisms, with up to 10% improvement in application efficiency at common failure prediction accuracies, and is effective for petascale systems and beyond.
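A minimal sketch of a 'work-most'-style runtime decision: compare the expected cost of continuing, checkpointing, or proactively replicating, given a predicted failure probability. The cost terms and numbers are illustrative assumptions; the paper's WM model additionally accounts for prediction accuracy and application status.

```python
def choose_action(p_fail, work_done, ckpt_cost, migrate_cost, restart_cost):
    """Pick the fault-tolerance action with the smallest expected cost.

    Illustrative cost terms only:
      continue   - risk losing the work done so far plus a restart penalty,
      checkpoint - pay the checkpoint cost now, lose little on failure,
      replicate  - pay the proactive replication/prefetching cost up front.
    """
    expected_cost = {
        "continue":   p_fail * (work_done + restart_cost),
        "checkpoint": ckpt_cost + p_fail * restart_cost,
        "replicate":  migrate_cost,
    }
    return min(expected_cost, key=expected_cost.get)

print(choose_action(p_fail=0.02, work_done=3600, ckpt_cost=120,
                    migrate_cost=60, restart_cost=300))   # -> "replicate"
```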
Modern computer systems are increasingly bounded by the available or permissible power at multiple layers, from individual components to data centers. To cope with this reality, it is necessary to understand how power bounds impact performance, especially for systems built from high-end nodes, each consisting of multiple power-hungry components. Because placing an inappropriate power bound on a node or a component can lead to severe performance loss, coordinating power allocation among nodes and components is mandatory to achieve the desired performance given a total power budget. In this article, we describe the paradigm of power-bounded high-performance computing, which considers coordinated power bound assignment to be a key factor in computer system performance analysis and optimization. We apply this paradigm to the problem of power coordination across multiple layers for both CPU and GPU computing. Using several case studies, we demonstrate how the principles of balanced power coordination can be applied and adapted to the interplay of workloads, hardware technology, and the available total power for performance improvement.
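The sketch below illustrates one simple way coordinated power allocation can be expressed: starting from each component's minimum cap, the remaining node budget is handed out greedily to whichever component currently gains the most performance per extra watt. The performance curves, caps, and step size are assumptions for illustration, not the article's actual coordination policy.

```python
def allocate_power(total, components, step=5.0):
    """Greedy power coordination sketch.

    components: {name: (min_cap, max_cap, perf_fn)}, where perf_fn is an
    assumed speedup-vs-power curve. Power is handed out in 'step'-watt
    increments to the component with the largest marginal gain."""
    caps = {name: spec[0] for name, spec in components.items()}
    budget = total - sum(caps.values())
    while budget >= step:
        def gain(name):
            lo, hi, perf = components[name]
            cur = caps[name]
            return perf(min(cur + step, hi)) - perf(cur)
        best = max(components, key=gain)
        if gain(best) <= 0:
            break                                    # every component is already at its max cap
        inc = min(step, components[best][1] - caps[best])
        caps[best] += inc
        budget -= inc
    return caps

# Diminishing-returns curves chosen purely for illustration.
parts = {"cpu": (40, 150, lambda w: w ** 0.5), "gpu": (60, 250, lambda w: 2 * w ** 0.5)}
print(allocate_power(300.0, parts))
```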
Low-temperature complementary metal oxide semiconductor (CMOS), or cryogenic CMOS, is a promising avenue for the continuation of Moore's law while serving the needs of high-performance computing. With temperature as a control "knob" to steepen the subthreshold slope of CMOS devices, the supply voltage can be reduced with no impact on operating speed. With optimal threshold-voltage engineering, the device ON current can be further enhanced, translating to higher performance. In this article, experimentally calibrated data were used to tune the threshold voltage and to investigate the power-performance-area trade-offs of cryogenic CMOS at the device, circuit, and system levels. We also present results from the measurement and analysis of functional memory chips fabricated in 28 nm bulk CMOS and 22 nm fully depleted silicon-on-insulator (FDSOI) technology operating at cryogenic temperature. Finally, the challenges and opportunities in the further development and deployment of such systems are discussed.
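The "temperature knob" argument follows directly from the textbook subthreshold-slope expression SS = n·ln(10)·kT/q. The short calculation below (ideal body factor n = 1) shows the slope steepening from about 60 mV/decade at 300 K to about 15 mV/decade at 77 K; real cryogenic devices saturate above the ideal value, but the trend is what enables the supply-voltage reduction described in the abstract.

```python
import numpy as np

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q   = 1.602176634e-19   # elementary charge, C

def subthreshold_slope(temp_k, body_factor=1.0):
    """Textbook subthreshold slope SS = n * ln(10) * kT/q, in mV/decade."""
    return body_factor * np.log(10) * K_B * temp_k / Q * 1e3

for t in (300, 77, 4):
    print(f"{t:>3} K: {subthreshold_slope(t):6.2f} mV/dec")
# ~59.5 mV/dec at room temperature vs ~15.3 mV/dec at 77 K: the steeper slope
# lets threshold and supply voltages be lowered without losing on/off ratio.
```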
To investigate the effects of two mineral admixtures (fly ash and ground slag) on the initial defects existing in concrete microstructures, high-resolution X-ray micro-CT (micro-focus computed tomography) is employed to quantitatively analyze the initial defects in four series of high-performance concrete (HPC) specimens containing different mineral admixtures. The high-resolution 3D images of the microstructures and the filtered defects are reconstructed with micro-CT software. The size distribution and volume fractions of the initial defects are analyzed from the 3D and 2D micro-CT images, and the analysis results are verified by water-suction tests. The results show that adding mineral admixtures to concrete as cementitious materials greatly changes the geometrical properties of the microstructures and the spatial features of the defects through the physicochemical actions of these admixtures. This is the major cause of the differences between the mechanical behaviors of HPC with and without mineral admixtures when the water-to-binder ratio and the size distribution of aggregates are held constant.
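A sketch of the kind of quantitative defect analysis the abstract describes: threshold a 3D grey-value volume, label connected voids, and report the defect volume fraction and size distribution. The synthetic data, threshold choice, and segmentation rule are illustrative stand-ins for the filtering performed in the actual micro-CT software.

```python
import numpy as np
from scipy import ndimage

def defect_statistics(volume, threshold):
    """Volume fraction and size distribution of defects in a 3D micro-CT image.

    'volume' holds CT grey values; voxels below 'threshold' are treated as
    pores/voids. Both inputs are illustrative assumptions."""
    defects = volume < threshold                        # binary defect mask
    labels, n = ndimage.label(defects)                  # connected-component labelling
    sizes = ndimage.sum(defects, labels, index=range(1, n + 1))  # voxels per defect
    return defects.mean(), np.sort(sizes)[::-1]         # volume fraction, sizes (largest first)

rng = np.random.default_rng(1)
ct = rng.normal(100, 15, size=(64, 64, 64))             # synthetic grey-value volume
fraction, sizes = defect_statistics(ct, threshold=60)
print(f"defect volume fraction: {fraction:.4%}, "
      f"largest defect: {sizes[0] if len(sizes) else 0:.0f} voxels")
```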
Satellite edge computing has garnered significant attention from researchers; however, processing a large volume of tasks within multi-node satellite networks still poses considerable challenges. The sharp increase in user demand for latency-sensitive tasks has inevitably led to offloading bottlenecks and insufficient computational capacity on individual satellite edge servers, making effective task offloading scheduling necessary to enhance the user experience. In this paper, we propose a priority-based task scheduling strategy built on a Software-Defined Network (SDN) framework for satellite-terrestrial integrated networks, which determines the execution order of tasks according to their priority. We then apply a Dueling Double Deep Q-Network (DDQN) algorithm enhanced with prioritized experience replay to derive the computation offloading strategy, improving the experience replay mechanism within the Dueling-DDQN framework. Next, we utilize the Deep Deterministic Policy Gradient (DDPG) algorithm to determine the optimal resource allocation strategy and reduce the processing latency of sub-tasks. Simulation results demonstrate that the proposed d3-DDPG algorithm outperforms other approaches, effectively reducing task processing latency and thus improving user experience and system efficiency.
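A small sketch of the priority-ordering step that precedes offloading: tasks are kept in a heap and the most urgent (smallest priority value) is dispatched first. The Dueling-DDQN offloading policy and DDPG resource allocation described in the abstract are not reproduced here; the task fields and priority values are illustrative.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    priority: float                        # smaller = more urgent (e.g. deadline-driven)
    name: str = field(compare=False)
    cycles: float = field(compare=False)   # required CPU cycles

def ordered_offloading(tasks):
    """Yield tasks in priority order; each popped task would then be handed to
    the offloading decision (local vs satellite-edge execution)."""
    heap = list(tasks)
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)

demo = [Task(0.3, "video-analytics", 8e8),
        Task(0.05, "emergency-alert", 1e7),
        Task(0.8, "bulk-sync", 5e9)]
print([t.name for t in ordered_offloading(demo)])   # emergency-alert is scheduled first
```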