Milling Process Simulation is one of the important re search areas in manufacturing science. For the purpose of improving the prec ision of simulation and extending its usability, numerical algorithm is more and more ...Milling Process Simulation is one of the important re search areas in manufacturing science. For the purpose of improving the prec ision of simulation and extending its usability, numerical algorithm is more and more used in the milling modeling areas. But simulative efficiency is decreasin g with increase of its complexity. As a result, application of the method is lim ited. Aimed at above question, high-efficient algorithm for milling process sim ulation is studied. It is important for milling process simulation’s applicatio n. Parallel computing is widely used to solve the large-scale computation question s. Its advantages include system flexibility, robust, high-efficient computing capability and high ratio of performance to price. With the development of compu ter network, utilizing the computing resource in the Internet, a virtual computi ng environment with powerful computing capability can be consisted by microc omputers, and the difficulty of building hardware environment which is used to s upport parallel computing is reduced. How to use network technology and parallel algorithm to improve simulative effic iency for milling forces simulation is investigated in the paper. In order to pr edict milling forces, a simplified local milling forces model is used in the pap er. End milling cutter is assumed to be divided by r number of differential elem ents along the axial direction of the cutter. For a given time, the total cuttin g forces can be obtained by summarizing the resultant cutting force produced by each differential cutter disc. Divide the whole simulative time into some segmen ts, send these program’s segments to microcomputers in the Internet and obtain the result of the program’s segments, all of the result of program’s segments a re composed the final result. For implementing the algorithm, a distributed Parallel computing framework is de signed in the paper. In the framework, web server plays a role of controller. Us ing Java RMI(remote method interface), the computing processes in computing serv er are called by web server. There are lots of control processes in web server a nd control the computing servers. The codes of simulative algorithm can be dynam ic sent to the computing servers, and milling forces at the different time are c omputed through utilizing the local computer’s resource. The results that are ca lculated by every computing servers are sent to the web server, and composed the final result. The framework can be used by different simulative algorithm. Comp ared with the algorithm running single machine, the efficiency of provided algor ithm is higher than that of single machine.展开更多
Parallel computing assigns the computing model to different processors on different devices and implements it simultaneously.Accordingly,it has broad applications in the numerical simulation of geotechnical engineerin...Parallel computing assigns the computing model to different processors on different devices and implements it simultaneously.Accordingly,it has broad applications in the numerical simulation of geotechnical engineering and underground engineering,of which models are always large-scale.With parallel computing,the computing time or the memory requirements will be reduced by splitting the original domain of the numerical model into many subdomains,which is thus named as the domain decomposition method.In this study,a cubic and equal volume domain decomposition strategy was utilized to realize the parallel computing on the distributed memory system of four-dimensional lattice spring model(4D-LSM)based on the message passing interface.With a more efficient communication strategy introduced,this study aimed at operating an one-billion-particle model on a supercomputer platform.The preprocessing procedure of the parallelized 4D-LSM was restructured and the particle generation strategy suitable for the supercomputer platform was employed to minimize the time consumption in preprocessing and calculation.On this basis,numerical calculations were performed on TianHe-3 prototype E class supercomputer at the National Supercomputer Center in Tianjin.Two fieldscale three-dimensional blasting wave propagation models were carried out,of which the numerical results verify the computing power and the advantage of the parallelized 4D-LSM in the simulation of large-scale three-dimension models.Subsequently,the time complexity and spatial complexity of 4D-LSM and other particle discrete element methods were analyzed.展开更多
High-performance computing(HPC)refers to the ability to process data and perform complex calculations at high speeds.It is one of the most essential tools fueling the advancement of science and technology.
Ambient noise tomography is an established technique in seismology,where calculating single-or ninecomponent noise cross-correlation functions(NCFs)is a fundamental first step.In this study,we introduced a novel CPU-G...Ambient noise tomography is an established technique in seismology,where calculating single-or ninecomponent noise cross-correlation functions(NCFs)is a fundamental first step.In this study,we introduced a novel CPU-GPU heterogeneous computing framework designed to significantly enhance the efficiency of computing 9-component NCFs from seismic ambient noise data.This framework not only accelerated the computational process by leveraging the Compute Unified Device Architecture(CUDA)but also improved the signal-to-noise ratio(SNR)through innovative stacking techniques,such as time-frequency domain phaseweighted stacking(tf-PWS).We validated the program using multiple datasets,confirming its superior computation speed,improved reliability,and higher signal-to-noise ratios for NCFs.Our comprehensive study provides detailed insights into optimizing the computational processes for noise cross-correlation functions,thereby enhancing the precision and efficiency of ambient noise imaging.展开更多
This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models,pivotal for advancing high-performance computing(HPC).Emphasizing the transition of GPUs from g...This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models,pivotal for advancing high-performance computing(HPC).Emphasizing the transition of GPUs from graphic-centric processors to versatile computing units,it delves into the nuanced optimization of memory access,thread management,algorithmic design,and data structures.These optimizations are critical for exploiting the parallel processing capabilities of GPUs,addressingboth the theoretical frameworks and practical implementations.By integrating advanced strategies such as memory coalescing,dynamic scheduling,and parallel algorithmic transformations,this research aims to significantly elevate computational efficiency and throughput.The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains,highlighting a pathway towards achieving unparalleled processing power and efficiency in HPC environments.The paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers,fostering advancements in computational sciences and technology.展开更多
Low temperature complementary metal oxide semiconductor(CMOS)or cryogenic CMOS is a promising avenue for the continuation of Moore’s law while serving the needs of high performance computing.With temperature as a con...Low temperature complementary metal oxide semiconductor(CMOS)or cryogenic CMOS is a promising avenue for the continuation of Moore’s law while serving the needs of high performance computing.With temperature as a control“knob”to steepen the subthreshold slope behavior of CMOS devices,the supply voltage of operation can be reduced with no impact on operating speed.With the optimal threshold voltage engineering,the device ON current can be further enhanced,translating to higher performance.In this article,the experimentally calibrated data was adopted to tune the threshold voltage and investigated the power performance area of cryogenic CMOS at device,circuit and system level.We also presented results from measurement and analysis of functional memory chips fabricated in 28 nm bulk CMOS and 22 nm fully depleted silicon on insulator(FDSOI)operating at cryogenic temperature.Finally,the challenges and opportunities in the further development and deployment of such systems were discussed.展开更多
Wide-area high-performance computing is widely used for large-scale parallel computing applications owing to its high computing and storage resources.However,the geographical distribution of computing and storage reso...Wide-area high-performance computing is widely used for large-scale parallel computing applications owing to its high computing and storage resources.However,the geographical distribution of computing and storage resources makes efficient task distribution and data placement more challenging.To achieve a higher system performance,this study proposes a two-level global collaborative scheduling strategy for wide-area high-performance computing environments.The collaborative scheduling strategy integrates lightweight solution selection,redundant data placement and task stealing mechanisms,optimizing task distribution and data placement to achieve efficient computing in wide-area environments.The experimental results indicate that compared with the state-of-the-art collaborative scheduling algorithm HPS+,the proposed scheduling strategy reduces the makespan by 23.24%,improves computing and storage resource utilization by 8.28%and 21.73%respectively,and achieves similar global data migration costs.展开更多
Satellite edge computing has garnered significant attention from researchers;however,processing a large volume of tasks within multi-node satellite networks still poses considerable challenges.The sharp increase in us...Satellite edge computing has garnered significant attention from researchers;however,processing a large volume of tasks within multi-node satellite networks still poses considerable challenges.The sharp increase in user demand for latency-sensitive tasks has inevitably led to offloading bottlenecks and insufficient computational capacity on individual satellite edge servers,making it necessary to implement effective task offloading scheduling to enhance user experience.In this paper,we propose a priority-based task scheduling strategy based on a Software-Defined Network(SDN)framework for satellite-terrestrial integrated networks,which clarifies the execution order of tasks based on their priority.Subsequently,we apply a Dueling-Double Deep Q-Network(DDQN)algorithm enhanced with prioritized experience replay to derive a computation offloading strategy,improving the experience replay mechanism within the Dueling-DDQN framework.Next,we utilize the Deep Deterministic Policy Gradient(DDPG)algorithm to determine the optimal resource allocation strategy to reduce the processing latency of sub-tasks.Simulation results demonstrate that the proposed d3-DDPG algorithm outperforms other approaches,effectively reducing task processing latency and thus improving user experience and system efficiency.展开更多
Brain science accelerates the study of intelligence and behavior,contributes fundamental insights into human cognition,and offers prospective treatments for brain disease.Faced with the challenges posed by imaging tec...Brain science accelerates the study of intelligence and behavior,contributes fundamental insights into human cognition,and offers prospective treatments for brain disease.Faced with the challenges posed by imaging technologies and deep learning computational models,big data and high-performance computing(HPC)play essential roles in studying brain function,brain diseases,and large-scale brain models or connectomes.We review the driving forces behind big data and HPC methods applied to brain science,including deep learning,powerful data analysis capabilities,and computational performance solutions,each of which can be used to improve diagnostic accuracy and research output.This work reinforces predictions that big data and HPC will continue to improve brain science by making ultrahigh-performance analysis possible,by improving data standardization and sharing,and by providing new neuromorphic insights.展开更多
The publication of Tsinghua Science and Technology was started in 1996.Since then,it has been an international academic journal sponsored by Tsinghua University and published bimonthly.This journal aims at presenting ...The publication of Tsinghua Science and Technology was started in 1996.Since then,it has been an international academic journal sponsored by Tsinghua University and published bimonthly.This journal aims at presenting the state-of-the-art scientific achievements in computer science and other IT fields.展开更多
Optoelectronic memristor is generating growing research interest for high efficient computing and sensing-memory applications.In this work,an optoelectronic memristor with Au/a-C:Te/Pt structure is developed.Synaptic ...Optoelectronic memristor is generating growing research interest for high efficient computing and sensing-memory applications.In this work,an optoelectronic memristor with Au/a-C:Te/Pt structure is developed.Synaptic functions,i.e.,excita-tory post-synaptic current and pair-pulse facilitation are successfully mimicked with the memristor under electrical and optical stimulations.More importantly,the device exhibited distinguishable response currents by adjusting 4-bit input electrical/opti-cal signals.A multi-mode reservoir computing(RC)system is constructed with the optoelectronic memristors to emulate human tactile-visual fusion recognition and an accuracy of 98.7%is achieved.The optoelectronic memristor provides potential for developing multi-mode RC system.展开更多
As an important complement to cloud computing, edge computing can effectively reduce the workload of the backbone network. To reduce latency and energy consumption of edge computing, deep learning is used to learn the...As an important complement to cloud computing, edge computing can effectively reduce the workload of the backbone network. To reduce latency and energy consumption of edge computing, deep learning is used to learn the task offloading strategies by interacting with the entities. In actual application scenarios, users of edge computing are always changing dynamically. However, the existing task offloading strategies cannot be applied to such dynamic scenarios. To solve this problem, we propose a novel dynamic task offloading framework for distributed edge computing, leveraging the potential of meta-reinforcement learning (MRL). Our approach formulates a multi-objective optimization problem aimed at minimizing both delay and energy consumption. We model the task offloading strategy using a directed acyclic graph (DAG). Furthermore, we propose a distributed edge computing adaptive task offloading algorithm rooted in MRL. This algorithm integrates multiple Markov decision processes (MDP) with a sequence-to-sequence (seq2seq) network, enabling it to learn and adapt task offloading strategies responsively across diverse network environments. To achieve joint optimization of delay and energy consumption, we incorporate the non-dominated sorting genetic algorithm II (NSGA-II) into our framework. Simulation results demonstrate the superiority of our proposed solution, achieving a 21% reduction in time delay and a 19% decrease in energy consumption compared to alternative task offloading schemes. Moreover, our scheme exhibits remarkable adaptability, responding swiftly to changes in various network environments.展开更多
The rise of large-scale artificial intelligence(AI)models,such as ChatGPT,Deep-Seek,and autonomous vehicle systems,has significantly advanced the boundaries of AI,enabling highly complex tasks in natural language proc...The rise of large-scale artificial intelligence(AI)models,such as ChatGPT,Deep-Seek,and autonomous vehicle systems,has significantly advanced the boundaries of AI,enabling highly complex tasks in natural language processing,image recognition,and real-time decisionmaking.However,these models demand immense computational power and are often centralized,relying on cloud-based architectures with inherent limitations in latency,privacy,and energy efficiency.To address these challenges and bring AI closer to real-world applications,such as wearable health monitoring,robotics,and immersive virtual environments,innovative hardware solutions are urgently needed.This work introduces a near-sensor edge computing(NSEC)system,built on a bilayer AlN/Si waveguide platform,to provide real-time,energy-efficient AI capabilities at the edge.Leveraging the electro-optic properties of AlN microring resonators for photonic feature extraction,coupled with Si-based thermo-optic Mach-Zehnder interferometers for neural network computations,the system represents a transformative approach to AI hardware design.Demonstrated through multimodal gesture and gait analysis,the NSEC system achieves high classification accuracies of 96.77%for gestures and 98.31%for gaits,ultra-low latency(<10 ns),and minimal energy consumption(<0.34 pJ).This groundbreaking system bridges the gap between AI models and real-world applications,enabling efficient,privacy-preserving AI solutions for healthcare,robotics,and next-generation human-machine interfaces,marking a pivotal advancement in edge computing and AI deployment.展开更多
To address the increasing demand for massive data storage and processing,brain-inspired neuromorphic comput-ing systems based on artificial synaptic devices have been actively developed in recent years.Among the vario...To address the increasing demand for massive data storage and processing,brain-inspired neuromorphic comput-ing systems based on artificial synaptic devices have been actively developed in recent years.Among the various materials inves-tigated for the fabrication of synaptic devices,silicon carbide(SiC)has emerged as a preferred choices due to its high electron mobility,superior thermal conductivity,and excellent thermal stability,which exhibits promising potential for neuromorphic applications in harsh environments.In this review,the recent progress in SiC-based synaptic devices is summarized.Firstly,an in-depth discussion is conducted regarding the categories,working mechanisms,and structural designs of these devices.Subse-quently,several application scenarios for SiC-based synaptic devices are presented.Finally,a few perspectives and directions for their future development are outlined.展开更多
The rapid advent in artificial intelligence and big data has revolutionized the dynamic requirement in the demands of the computing resource for executing specific tasks in the cloud environment.The process of achievi...The rapid advent in artificial intelligence and big data has revolutionized the dynamic requirement in the demands of the computing resource for executing specific tasks in the cloud environment.The process of achieving autonomic resource management is identified to be a herculean task due to its huge distributed and heterogeneous environment.Moreover,the cloud network needs to provide autonomic resource management and deliver potential services to the clients by complying with the requirements of Quality-of-Service(QoS)without impacting the Service Level Agreements(SLAs).However,the existing autonomic cloud resource managing frameworks are not capable in handling the resources of the cloud with its dynamic requirements.In this paper,Coot Bird Behavior Model-based Workload Aware Autonomic Resource Management Scheme(CBBM-WARMS)is proposed for handling the dynamic requirements of cloud resources through the estimation of workload that need to be policed by the cloud environment.This CBBM-WARMS initially adopted the algorithm of adaptive density peak clustering for workloads clustering of the cloud.Then,it utilized the fuzzy logic during the process of workload scheduling for achieving the determining the availability of cloud resources.It further used CBBM for potential Virtual Machine(VM)deployment that attributes towards the provision of optimal resources.It is proposed with the capability of achieving optimal QoS with minimized time,energy consumption,SLA cost and SLA violation.The experimental validation of the proposed CBBMWARMS confirms minimized SLA cost of 19.21%and reduced SLA violation rate of 18.74%,better than the compared autonomic cloud resource managing frameworks.展开更多
Recently,one of the main challenges facing the smart grid is insufficient computing resources and intermittent energy supply for various distributed components(such as monitoring systems for renewable energy power sta...Recently,one of the main challenges facing the smart grid is insufficient computing resources and intermittent energy supply for various distributed components(such as monitoring systems for renewable energy power stations).To solve the problem,we propose an energy harvesting based task scheduling and resource management framework to provide robust and low-cost edge computing services for smart grid.First,we formulate an energy consumption minimization problem with regard to task offloading,time switching,and resource allocation for mobile devices,which can be decoupled and transformed into a typical knapsack problem.Then,solutions are derived by two different algorithms.Furthermore,we deploy renewable energy and energy storage units at edge servers to tackle intermittency and instability problems.Finally,we design an energy management algorithm based on sampling average approximation for edge computing servers to derive the optimal charging/discharging strategies,number of energy storage units,and renewable energy utilization.The simulation results show the efficiency and superiority of our proposed framework.展开更多
Large language models(LLMs)have emerged as powerful tools for addressing a wide range of problems,including those in scientific computing,particularly in solving partial differential equations(PDEs).However,different ...Large language models(LLMs)have emerged as powerful tools for addressing a wide range of problems,including those in scientific computing,particularly in solving partial differential equations(PDEs).However,different models exhibit distinct strengths and preferences,resulting in varying levels of performance.In this paper,we compare the capabilities of the most advanced LLMs—DeepSeek,ChatGPT,and Claude—along with their reasoning-optimized versions in addressing computational challenges.Specifically,we evaluate their proficiency in solving traditional numerical problems in scientific computing as well as leveraging scientific machine learning techniques for PDE-based problems.We designed all our experiments so that a nontrivial decision is required,e.g,defining the proper space of input functions for neural operator learning.Our findings show that reasoning and hybrid-reasoning models consistently and significantly outperform non-reasoning ones in solving challenging problems,with ChatGPT o3-mini-high generally offering the fastest reasoning speed.展开更多
Efficient resource provisioning,allocation,and computation offloading are critical to realizing lowlatency,scalable,and energy-efficient applications in cloud,fog,and edge computing.Despite its importance,integrating ...Efficient resource provisioning,allocation,and computation offloading are critical to realizing lowlatency,scalable,and energy-efficient applications in cloud,fog,and edge computing.Despite its importance,integrating Software Defined Networks(SDN)for enhancing resource orchestration,task scheduling,and traffic management remains a relatively underexplored area with significant innovation potential.This paper provides a comprehensive review of existing mechanisms,categorizing resource provisioning approaches into static,dynamic,and user-centric models,while examining applications across domains such as IoT,healthcare,and autonomous systems.The survey highlights challenges such as scalability,interoperability,and security in managing dynamic and heterogeneous infrastructures.This exclusive research evaluates how SDN enables adaptive policy-based handling of distributed resources through advanced orchestration processes.Furthermore,proposes future directions,including AI-driven optimization techniques and hybrid orchestrationmodels.By addressing these emerging opportunities,thiswork serves as a foundational reference for advancing resource management strategies in next-generation cloud,fog,and edge computing ecosystems.This survey concludes that SDN-enabled computing environments find essential guidance in addressing upcoming management opportunities.展开更多
The number of satellites,especially those operating in Low-Earth Orbit(LEO),has been exploding in recent years.Additionally,the burgeoning development of Artificial Intelligence(AI)software and hardware has opened up ...The number of satellites,especially those operating in Low-Earth Orbit(LEO),has been exploding in recent years.Additionally,the burgeoning development of Artificial Intelligence(AI)software and hardware has opened up new industrial opportunities in both air and space,with satellite-powered computing emerging as a new computing paradigm:Orbital Edge Computing(OEC).Compared to terrestrial edge computing,the mobility of LEO satellites and their limited communication,computation,and storage resources pose challenges in designing task-specific scheduling algorithms.Previous survey papers have largely focused on terrestrial edge computing or the integration of space and ground technologies,lacking a comprehensive summary of OEC architecture,algorithms,and case studies.This paper conducts a comprehensive survey and analysis of OEC's system architecture,applications,algorithms,and simulation tools,providing a solid background for researchers in the field.By discussing OEC use cases and the challenges faced,potential research directions for future OEC research are proposed.展开更多
文摘Milling Process Simulation is one of the important re search areas in manufacturing science. For the purpose of improving the prec ision of simulation and extending its usability, numerical algorithm is more and more used in the milling modeling areas. But simulative efficiency is decreasin g with increase of its complexity. As a result, application of the method is lim ited. Aimed at above question, high-efficient algorithm for milling process sim ulation is studied. It is important for milling process simulation’s applicatio n. Parallel computing is widely used to solve the large-scale computation question s. Its advantages include system flexibility, robust, high-efficient computing capability and high ratio of performance to price. With the development of compu ter network, utilizing the computing resource in the Internet, a virtual computi ng environment with powerful computing capability can be consisted by microc omputers, and the difficulty of building hardware environment which is used to s upport parallel computing is reduced. How to use network technology and parallel algorithm to improve simulative effic iency for milling forces simulation is investigated in the paper. In order to pr edict milling forces, a simplified local milling forces model is used in the pap er. End milling cutter is assumed to be divided by r number of differential elem ents along the axial direction of the cutter. For a given time, the total cuttin g forces can be obtained by summarizing the resultant cutting force produced by each differential cutter disc. Divide the whole simulative time into some segmen ts, send these program’s segments to microcomputers in the Internet and obtain the result of the program’s segments, all of the result of program’s segments a re composed the final result. For implementing the algorithm, a distributed Parallel computing framework is de signed in the paper. In the framework, web server plays a role of controller. Us ing Java RMI(remote method interface), the computing processes in computing serv er are called by web server. There are lots of control processes in web server a nd control the computing servers. The codes of simulative algorithm can be dynam ic sent to the computing servers, and milling forces at the different time are c omputed through utilizing the local computer’s resource. The results that are ca lculated by every computing servers are sent to the web server, and composed the final result. The framework can be used by different simulative algorithm. Comp ared with the algorithm running single machine, the efficiency of provided algor ithm is higher than that of single machine.
基金National Natural Science Foundation of China,Grant/Award Number:51979187。
文摘Parallel computing assigns the computing model to different processors on different devices and implements it simultaneously.Accordingly,it has broad applications in the numerical simulation of geotechnical engineering and underground engineering,of which models are always large-scale.With parallel computing,the computing time or the memory requirements will be reduced by splitting the original domain of the numerical model into many subdomains,which is thus named as the domain decomposition method.In this study,a cubic and equal volume domain decomposition strategy was utilized to realize the parallel computing on the distributed memory system of four-dimensional lattice spring model(4D-LSM)based on the message passing interface.With a more efficient communication strategy introduced,this study aimed at operating an one-billion-particle model on a supercomputer platform.The preprocessing procedure of the parallelized 4D-LSM was restructured and the particle generation strategy suitable for the supercomputer platform was employed to minimize the time consumption in preprocessing and calculation.On this basis,numerical calculations were performed on TianHe-3 prototype E class supercomputer at the National Supercomputer Center in Tianjin.Two fieldscale three-dimensional blasting wave propagation models were carried out,of which the numerical results verify the computing power and the advantage of the parallelized 4D-LSM in the simulation of large-scale three-dimension models.Subsequently,the time complexity and spatial complexity of 4D-LSM and other particle discrete element methods were analyzed.
文摘High-performance computing(HPC)refers to the ability to process data and perform complex calculations at high speeds.It is one of the most essential tools fueling the advancement of science and technology.
基金supported by the Key Research and Development Program of China(2021YFC3000704)Institute of Geophysics,China Earthquake Administration Grant DQJB23R18+1 种基金the USTC Research Funds of the Double First-Class Initiative(YD2080002012)NSFC Grant(U2239206)。
文摘Ambient noise tomography is an established technique in seismology,where calculating single-or ninecomponent noise cross-correlation functions(NCFs)is a fundamental first step.In this study,we introduced a novel CPU-GPU heterogeneous computing framework designed to significantly enhance the efficiency of computing 9-component NCFs from seismic ambient noise data.This framework not only accelerated the computational process by leveraging the Compute Unified Device Architecture(CUDA)but also improved the signal-to-noise ratio(SNR)through innovative stacking techniques,such as time-frequency domain phaseweighted stacking(tf-PWS).We validated the program using multiple datasets,confirming its superior computation speed,improved reliability,and higher signal-to-noise ratios for NCFs.Our comprehensive study provides detailed insights into optimizing the computational processes for noise cross-correlation functions,thereby enhancing the precision and efficiency of ambient noise imaging.
文摘This study embarks on a comprehensive examination of optimization techniques within GPU-based parallel programming models,pivotal for advancing high-performance computing(HPC).Emphasizing the transition of GPUs from graphic-centric processors to versatile computing units,it delves into the nuanced optimization of memory access,thread management,algorithmic design,and data structures.These optimizations are critical for exploiting the parallel processing capabilities of GPUs,addressingboth the theoretical frameworks and practical implementations.By integrating advanced strategies such as memory coalescing,dynamic scheduling,and parallel algorithmic transformations,this research aims to significantly elevate computational efficiency and throughput.The findings underscore the potential of optimized GPU programming to revolutionize computational tasks across various domains,highlighting a pathway towards achieving unparalleled processing power and efficiency in HPC environments.The paper not only contributes to the academic discourse on GPU optimization but also provides actionable insights for developers,fostering advancements in computational sciences and technology.
基金funded by the Defense Advanced Research Project Agency(DARPA)Low Temperature Logic Technology(LTLT)program.
文摘Low temperature complementary metal oxide semiconductor(CMOS)or cryogenic CMOS is a promising avenue for the continuation of Moore’s law while serving the needs of high performance computing.With temperature as a control“knob”to steepen the subthreshold slope behavior of CMOS devices,the supply voltage of operation can be reduced with no impact on operating speed.With the optimal threshold voltage engineering,the device ON current can be further enhanced,translating to higher performance.In this article,the experimentally calibrated data was adopted to tune the threshold voltage and investigated the power performance area of cryogenic CMOS at device,circuit and system level.We also presented results from measurement and analysis of functional memory chips fabricated in 28 nm bulk CMOS and 22 nm fully depleted silicon on insulator(FDSOI)operating at cryogenic temperature.Finally,the challenges and opportunities in the further development and deployment of such systems were discussed.
基金This work was supported by the National key R&D Program of China(2018YFB0203901)the National Natural Science Foundation of China under(Grant No.61772053)the fund of the State Key Laboratory of Software Development Environment(SKLSDE-2020ZX15).
文摘Wide-area high-performance computing is widely used for large-scale parallel computing applications owing to its high computing and storage resources.However,the geographical distribution of computing and storage resources makes efficient task distribution and data placement more challenging.To achieve a higher system performance,this study proposes a two-level global collaborative scheduling strategy for wide-area high-performance computing environments.The collaborative scheduling strategy integrates lightweight solution selection,redundant data placement and task stealing mechanisms,optimizing task distribution and data placement to achieve efficient computing in wide-area environments.The experimental results indicate that compared with the state-of-the-art collaborative scheduling algorithm HPS+,the proposed scheduling strategy reduces the makespan by 23.24%,improves computing and storage resource utilization by 8.28%and 21.73%respectively,and achieves similar global data migration costs.
文摘Satellite edge computing has garnered significant attention from researchers;however,processing a large volume of tasks within multi-node satellite networks still poses considerable challenges.The sharp increase in user demand for latency-sensitive tasks has inevitably led to offloading bottlenecks and insufficient computational capacity on individual satellite edge servers,making it necessary to implement effective task offloading scheduling to enhance user experience.In this paper,we propose a priority-based task scheduling strategy based on a Software-Defined Network(SDN)framework for satellite-terrestrial integrated networks,which clarifies the execution order of tasks based on their priority.Subsequently,we apply a Dueling-Double Deep Q-Network(DDQN)algorithm enhanced with prioritized experience replay to derive a computation offloading strategy,improving the experience replay mechanism within the Dueling-DDQN framework.Next,we utilize the Deep Deterministic Policy Gradient(DDPG)algorithm to determine the optimal resource allocation strategy to reduce the processing latency of sub-tasks.Simulation results demonstrate that the proposed d3-DDPG algorithm outperforms other approaches,effectively reducing task processing latency and thus improving user experience and system efficiency.
基金supported by the National Natural Science Foundation of China(Grant No.31771466)the National Key R&D Program of China(Grant Nos.2018YFB0203903,2016YFC0503607,and 2016YFB0200300)+3 种基金the Transformation Project in Scientific and Technological Achievements of Qinghai,China(Grant No.2016-SF-127)the Special Project of Informatization of Chinese Academy of Sciences,China(Grant No.XXH13504-08)the Strategic Pilot Science and Technology Project of Chinese Academy of Sciences,China(Grant No.XDA12010000)the 100-Talents Program of Chinese Academy of Sciences,China(awarded to BN)
文摘Brain science accelerates the study of intelligence and behavior,contributes fundamental insights into human cognition,and offers prospective treatments for brain disease.Faced with the challenges posed by imaging technologies and deep learning computational models,big data and high-performance computing(HPC)play essential roles in studying brain function,brain diseases,and large-scale brain models or connectomes.We review the driving forces behind big data and HPC methods applied to brain science,including deep learning,powerful data analysis capabilities,and computational performance solutions,each of which can be used to improve diagnostic accuracy and research output.This work reinforces predictions that big data and HPC will continue to improve brain science by making ultrahigh-performance analysis possible,by improving data standardization and sharing,and by providing new neuromorphic insights.
文摘The publication of Tsinghua Science and Technology was started in 1996.Since then,it has been an international academic journal sponsored by Tsinghua University and published bimonthly.This journal aims at presenting the state-of-the-art scientific achievements in computer science and other IT fields.
基金supported by the"Science and Technology Development Plan Project of Jilin Province,China"(Grant No.20240101018JJ)the Fundamental Research Funds for the Central Universities(Grant No.2412023YQ004)the National Natural Science Foundation of China(Grant Nos.52072065,52272140,52372137,and U23A20568).
文摘Optoelectronic memristor is generating growing research interest for high efficient computing and sensing-memory applications.In this work,an optoelectronic memristor with Au/a-C:Te/Pt structure is developed.Synaptic functions,i.e.,excita-tory post-synaptic current and pair-pulse facilitation are successfully mimicked with the memristor under electrical and optical stimulations.More importantly,the device exhibited distinguishable response currents by adjusting 4-bit input electrical/opti-cal signals.A multi-mode reservoir computing(RC)system is constructed with the optoelectronic memristors to emulate human tactile-visual fusion recognition and an accuracy of 98.7%is achieved.The optoelectronic memristor provides potential for developing multi-mode RC system.
基金funded by the Fundamental Research Funds for the Central Universities(J2023-024,J2023-027).
文摘As an important complement to cloud computing, edge computing can effectively reduce the workload of the backbone network. To reduce latency and energy consumption of edge computing, deep learning is used to learn the task offloading strategies by interacting with the entities. In actual application scenarios, users of edge computing are always changing dynamically. However, the existing task offloading strategies cannot be applied to such dynamic scenarios. To solve this problem, we propose a novel dynamic task offloading framework for distributed edge computing, leveraging the potential of meta-reinforcement learning (MRL). Our approach formulates a multi-objective optimization problem aimed at minimizing both delay and energy consumption. We model the task offloading strategy using a directed acyclic graph (DAG). Furthermore, we propose a distributed edge computing adaptive task offloading algorithm rooted in MRL. This algorithm integrates multiple Markov decision processes (MDP) with a sequence-to-sequence (seq2seq) network, enabling it to learn and adapt task offloading strategies responsively across diverse network environments. To achieve joint optimization of delay and energy consumption, we incorporate the non-dominated sorting genetic algorithm II (NSGA-II) into our framework. Simulation results demonstrate the superiority of our proposed solution, achieving a 21% reduction in time delay and a 19% decrease in energy consumption compared to alternative task offloading schemes. Moreover, our scheme exhibits remarkable adaptability, responding swiftly to changes in various network environments.
基金the National Research Foundation(NRF)Singapore mid-sized center grant(NRF-MSG-2023-0002)FrontierCRP grant(NRF-F-CRP-2024-0006)+2 种基金A*STAR Singapore MTC RIE2025 project(M24W1NS005)IAF-PP project(M23M5a0069)Ministry of Education(MOE)Singapore Tier 2 project(MOE-T2EP50220-0014).
文摘The rise of large-scale artificial intelligence(AI)models,such as ChatGPT,Deep-Seek,and autonomous vehicle systems,has significantly advanced the boundaries of AI,enabling highly complex tasks in natural language processing,image recognition,and real-time decisionmaking.However,these models demand immense computational power and are often centralized,relying on cloud-based architectures with inherent limitations in latency,privacy,and energy efficiency.To address these challenges and bring AI closer to real-world applications,such as wearable health monitoring,robotics,and immersive virtual environments,innovative hardware solutions are urgently needed.This work introduces a near-sensor edge computing(NSEC)system,built on a bilayer AlN/Si waveguide platform,to provide real-time,energy-efficient AI capabilities at the edge.Leveraging the electro-optic properties of AlN microring resonators for photonic feature extraction,coupled with Si-based thermo-optic Mach-Zehnder interferometers for neural network computations,the system represents a transformative approach to AI hardware design.Demonstrated through multimodal gesture and gait analysis,the NSEC system achieves high classification accuracies of 96.77%for gestures and 98.31%for gaits,ultra-low latency(<10 ns),and minimal energy consumption(<0.34 pJ).This groundbreaking system bridges the gap between AI models and real-world applications,enabling efficient,privacy-preserving AI solutions for healthcare,robotics,and next-generation human-machine interfaces,marking a pivotal advancement in edge computing and AI deployment.
基金supported by the Natural Science Foundation of Zhejiang Province(Grant No.LQ24F040007)the National Natural Science Foundation of China(Grant No.U22A2075)the Opening Project of State Key Laboratory of Polymer Materials Engineering(Sichuan University)(Grant No.sklpme2024-1-21).
文摘To address the increasing demand for massive data storage and processing,brain-inspired neuromorphic comput-ing systems based on artificial synaptic devices have been actively developed in recent years.Among the various materials inves-tigated for the fabrication of synaptic devices,silicon carbide(SiC)has emerged as a preferred choices due to its high electron mobility,superior thermal conductivity,and excellent thermal stability,which exhibits promising potential for neuromorphic applications in harsh environments.In this review,the recent progress in SiC-based synaptic devices is summarized.Firstly,an in-depth discussion is conducted regarding the categories,working mechanisms,and structural designs of these devices.Subse-quently,several application scenarios for SiC-based synaptic devices are presented.Finally,a few perspectives and directions for their future development are outlined.
文摘The rapid advent in artificial intelligence and big data has revolutionized the dynamic requirement in the demands of the computing resource for executing specific tasks in the cloud environment.The process of achieving autonomic resource management is identified to be a herculean task due to its huge distributed and heterogeneous environment.Moreover,the cloud network needs to provide autonomic resource management and deliver potential services to the clients by complying with the requirements of Quality-of-Service(QoS)without impacting the Service Level Agreements(SLAs).However,the existing autonomic cloud resource managing frameworks are not capable in handling the resources of the cloud with its dynamic requirements.In this paper,Coot Bird Behavior Model-based Workload Aware Autonomic Resource Management Scheme(CBBM-WARMS)is proposed for handling the dynamic requirements of cloud resources through the estimation of workload that need to be policed by the cloud environment.This CBBM-WARMS initially adopted the algorithm of adaptive density peak clustering for workloads clustering of the cloud.Then,it utilized the fuzzy logic during the process of workload scheduling for achieving the determining the availability of cloud resources.It further used CBBM for potential Virtual Machine(VM)deployment that attributes towards the provision of optimal resources.It is proposed with the capability of achieving optimal QoS with minimized time,energy consumption,SLA cost and SLA violation.The experimental validation of the proposed CBBMWARMS confirms minimized SLA cost of 19.21%and reduced SLA violation rate of 18.74%,better than the compared autonomic cloud resource managing frameworks.
基金supported in part by the National Natural Science Foundation of China under Grant No.61473066in part by the Natural Science Foundation of Hebei Province under Grant No.F2021501020+2 种基金in part by the S&T Program of Qinhuangdao under Grant No.202401A195in part by the Science Research Project of Hebei Education Department under Grant No.QN2025008in part by the Innovation Capability Improvement Plan Project of Hebei Province under Grant No.22567637H
文摘Recently,one of the main challenges facing the smart grid is insufficient computing resources and intermittent energy supply for various distributed components(such as monitoring systems for renewable energy power stations).To solve the problem,we propose an energy harvesting based task scheduling and resource management framework to provide robust and low-cost edge computing services for smart grid.First,we formulate an energy consumption minimization problem with regard to task offloading,time switching,and resource allocation for mobile devices,which can be decoupled and transformed into a typical knapsack problem.Then,solutions are derived by two different algorithms.Furthermore,we deploy renewable energy and energy storage units at edge servers to tackle intermittency and instability problems.Finally,we design an energy management algorithm based on sampling average approximation for edge computing servers to derive the optimal charging/discharging strategies,number of energy storage units,and renewable energy utilization.The simulation results show the efficiency and superiority of our proposed framework.
基金supported by the ONR Vannevar Bush Faculty Fellowship(Grant No.N00014-22-1-2795).
文摘Large language models(LLMs)have emerged as powerful tools for addressing a wide range of problems,including those in scientific computing,particularly in solving partial differential equations(PDEs).However,different models exhibit distinct strengths and preferences,resulting in varying levels of performance.In this paper,we compare the capabilities of the most advanced LLMs—DeepSeek,ChatGPT,and Claude—along with their reasoning-optimized versions in addressing computational challenges.Specifically,we evaluate their proficiency in solving traditional numerical problems in scientific computing as well as leveraging scientific machine learning techniques for PDE-based problems.We designed all our experiments so that a nontrivial decision is required,e.g,defining the proper space of input functions for neural operator learning.Our findings show that reasoning and hybrid-reasoning models consistently and significantly outperform non-reasoning ones in solving challenging problems,with ChatGPT o3-mini-high generally offering the fastest reasoning speed.
文摘Efficient resource provisioning,allocation,and computation offloading are critical to realizing lowlatency,scalable,and energy-efficient applications in cloud,fog,and edge computing.Despite its importance,integrating Software Defined Networks(SDN)for enhancing resource orchestration,task scheduling,and traffic management remains a relatively underexplored area with significant innovation potential.This paper provides a comprehensive review of existing mechanisms,categorizing resource provisioning approaches into static,dynamic,and user-centric models,while examining applications across domains such as IoT,healthcare,and autonomous systems.The survey highlights challenges such as scalability,interoperability,and security in managing dynamic and heterogeneous infrastructures.This exclusive research evaluates how SDN enables adaptive policy-based handling of distributed resources through advanced orchestration processes.Furthermore,proposes future directions,including AI-driven optimization techniques and hybrid orchestrationmodels.By addressing these emerging opportunities,thiswork serves as a foundational reference for advancing resource management strategies in next-generation cloud,fog,and edge computing ecosystems.This survey concludes that SDN-enabled computing environments find essential guidance in addressing upcoming management opportunities.
基金funded by the Hong Kong-Macao-Taiwan Science and Technology Cooperation Project of the Science and Technology Innovation Action Plan in Shanghai,China(23510760200)the Oriental Talent Youth Program of Shanghai,China(No.Y3DFRCZL01)+1 种基金the Outstanding Program of the Youth Innovation Promotion Association of the Chinese Academy of Sciences(No.Y2023080)the Strategic Priority Research Program of the Chinese Academy of Sciences Category A(No.XDA0360404).
文摘The number of satellites,especially those operating in Low-Earth Orbit(LEO),has been exploding in recent years.Additionally,the burgeoning development of Artificial Intelligence(AI)software and hardware has opened up new industrial opportunities in both air and space,with satellite-powered computing emerging as a new computing paradigm:Orbital Edge Computing(OEC).Compared to terrestrial edge computing,the mobility of LEO satellites and their limited communication,computation,and storage resources pose challenges in designing task-specific scheduling algorithms.Previous survey papers have largely focused on terrestrial edge computing or the integration of space and ground technologies,lacking a comprehensive summary of OEC architecture,algorithms,and case studies.This paper conducts a comprehensive survey and analysis of OEC's system architecture,applications,algorithms,and simulation tools,providing a solid background for researchers in the field.By discussing OEC use cases and the challenges faced,potential research directions for future OEC research are proposed.