Funding: the National Key R&D Program of China (2020YFB1708300); the National Natural Science Foundation of China (52005192); the Project of Ministry of Industry and Information Technology (TC210804R-3).
Abstract: This paper aims to solve large-scale and complex isogeometric topology optimization problems that consume significant computational resources. A novel isogeometric topology optimization method with a hybrid CPU/GPU parallel strategy is proposed, and the hybrid parallel strategies for stiffness matrix assembly, equation solving, sensitivity analysis, and design variable update are discussed in detail. To ensure high CPU/GPU computing efficiency, a workload balancing strategy is presented for optimally distributing the workload between the CPU and the GPU. To illustrate the advantages of the proposed method, three benchmark examples are tested to verify the hybrid parallel strategy. The results show that the hybrid method is faster than both the serial CPU and the GPU-only parallel implementations, with speedups of up to two orders of magnitude.
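The CPU/GPU workload balancing described above can be sketched as a throughput-proportional split: give each device a share of the work matching its measured processing rate so both finish at roughly the same time. The function name and rates below are illustrative assumptions, not the paper's actual scheme.

```python
def balance_workload(n_tasks, cpu_rate, gpu_rate):
    """Split n_tasks between CPU and GPU in proportion to their measured
    throughputs (tasks per second), so both devices finish at about the
    same time. Returns (n_cpu_tasks, n_gpu_tasks)."""
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    n_gpu = round(n_tasks * gpu_share)
    return n_tasks - n_gpu, n_gpu
```

For example, if profiling shows the GPU processes elements nine times faster than the CPU, `balance_workload(1000, 10, 90)` assigns 100 elements to the CPU and 900 to the GPU.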
Funding: supported in part by the National Natural Science Foundation of China under Grant No. 61473066; in part by the Natural Science Foundation of Hebei Province under Grant No. F2021501020; in part by the S&T Program of Qinhuangdao under Grant No. 202401A195; in part by the Science Research Project of Hebei Education Department under Grant No. QN2025008; and in part by the Innovation Capability Improvement Plan Project of Hebei Province under Grant No. 22567637H.
Abstract: Recently, one of the main challenges facing the smart grid is insufficient computing resources and intermittent energy supply for various distributed components (such as monitoring systems for renewable energy power stations). To solve this problem, we propose an energy-harvesting-based task scheduling and resource management framework that provides robust and low-cost edge computing services for the smart grid. First, we formulate an energy consumption minimization problem over task offloading, time switching, and resource allocation for mobile devices, which can be decoupled and transformed into a typical knapsack problem. Then, solutions are derived by two different algorithms. Furthermore, we deploy renewable energy and energy storage units at edge servers to tackle intermittency and instability problems. Finally, we design an energy management algorithm based on sample average approximation for edge computing servers to derive the optimal charging/discharging strategies, number of energy storage units, and renewable energy utilization. Simulation results show the efficiency and superiority of the proposed framework.
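The reduction to a knapsack problem mentioned above can be solved with the standard 0/1 dynamic program once weights are integers. This is a generic textbook sketch, not either of the two algorithms the paper actually derives.

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack by dynamic programming over capacity.
    dp[w] holds the best total value achievable with capacity w."""
    dp = [0] * (capacity + 1)
    for v, wt in zip(values, weights):
        # Iterate capacity downward so each item is used at most once.
        for w in range(capacity, wt - 1, -1):
            dp[w] = max(dp[w], dp[w - wt] + v)
    return dp[capacity]
```

With values `[60, 100, 120]`, weights `[10, 20, 30]`, and capacity 50, the optimum picks the last two items for a total value of 220.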
Funding: supported by the National Natural Science Foundation of China (Grant No. T2293771); the STI 2030-Major Projects (Grant No. 2022ZD0211400); and the Sichuan Province Outstanding Young Scientists Foundation (Grant No. 2023NSFSC1919).
Abstract: Independent cascade (IC) models, which simulate how one node can activate another, are important tools for studying the dynamics of information spreading in complex networks. However, traditional implementations of the IC model face significant efficiency bottlenecks on large-scale networks and multi-round simulations. To address this problem, this study introduces a GPU-based parallel independent cascade (GPIC) algorithm featuring an optimized representation of the network data structure and parallel task scheduling strategies. Specifically, we propose a network data structure tailored for GPU processing, enhancing the computational efficiency and scalability of the IC model. In addition, we design a parallel framework that exploits the full potential of the GPU's parallel processing capabilities, further improving computational efficiency. Simulation experiments demonstrate that GPIC preserves accuracy while significantly boosting efficiency, achieving a speedup factor of 129 over the baseline IC method. Our experiments also reveal that, when using GPIC for independent cascade simulation, 100-200 simulation rounds are sufficient when simulation cost is the main concern, while high-precision studies benefit from 500 rounds to ensure reliable results, providing empirical guidance for applying the new algorithm in practical research.
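For reference, one round of the independent-cascade process that GPIC parallelizes can be written serially as follows. The graph layout (adjacency dict) and activation probability `p` are illustrative assumptions; the paper's GPU data structure differs.

```python
import random

def independent_cascade(graph, seeds, p=0.1, rng=None):
    """One round of the independent-cascade (IC) process.
    graph: dict mapping node -> list of neighbours.
    seeds: initially active nodes.
    Each newly activated node gets a single chance to activate each
    still-inactive neighbour with probability p. Returns the final
    set of active nodes."""
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active
```

A multi-round simulation repeats this with fresh randomness and averages the final cascade sizes, which is exactly the loop whose round count (100-200 versus 500) the experiments above calibrate.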
Funding: supported by the National Natural Science Foundation of China (No. 22373112 to Ji Qi; Nos. 22373111 and 21921004 to Minghui Yang) and GH-fund A (No. 202107011790).
Abstract: In this study, we investigate the efficacy of a hybrid parallel algorithm aimed at accelerating the evaluation of two-electron repulsion integrals (ERI) and Fock matrix generation on the Hygon C86/DCU (deep computing unit) heterogeneous computing platform. Multiple hybrid parallel schemes are assessed on a range of model systems, including those with up to 1200 atoms and 10000 basis functions. The findings reveal that, in Hartree-Fock (HF) calculations, a single DCU exhibits a 33.6x speedup over 32 C86 CPU cores. Compared with the efficiency of the Wuhan Electronic Structure Package on an Intel X86 and NVIDIA A100 computing platform, the Hygon platform exhibits good cost-effectiveness, showing great potential in quantum chemistry calculations and other high-performance scientific computing.
Abstract: A Vulkan-based acceleration method for the shooting and bouncing ray (SBR) technique is proposed for fast computation of the radar cross section (RCS) of electrically large, complex targets. An efficient Vulkan compute shader is designed that fully exploits GPU hardware ray tracing, significantly accelerating the ray-intersection computations in the SBR method. A double command-buffer mechanism is introduced so that the CPU and GPU cooperate efficiently, accelerating multi-angle scanning tasks. The virtual aperture plane is partitioned into mutually independent sub-tasks, further improving multi-GPU parallel utilization. Experimental results show that the proposed method achieves more than a 40x speedup over the FEKO RL-GO method when computing the RCS of electrically large, complex targets; the double command-buffer mechanism improves multi-angle scanning speed by about 42%; and dual-GPU parallel efficiency exceeds 90%.
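The partitioning of the virtual aperture into mutually independent sub-tasks can be sketched as a balanced chunking of scan angles across GPUs, each chunk processed without synchronization. The function below is an illustrative assumption, not the paper's implementation.

```python
def partition_aperture(n_angles, n_gpus):
    """Split n_angles scan angles into contiguous, non-overlapping chunks,
    one per GPU, with chunk sizes differing by at most one so the load
    stays balanced. Returns a list of ranges of angle indices."""
    base, extra = divmod(n_angles, n_gpus)
    chunks, start = [], 0
    for g in range(n_gpus):
        size = base + (1 if g < extra else 0)  # first `extra` GPUs get one more
        chunks.append(range(start, start + size))
        start += size
    return chunks
```

Because the chunks share no angles, each GPU can trace its sub-task independently, which is what keeps the reported dual-GPU parallel efficiency above 90%.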
Abstract: Under the combined influence of global warming and extreme rainstorms, flood disasters occur frequently, so improving the computational efficiency of flood models is critical for real-time flood simulation and forecasting. However, the enormous computational cost of fine-scale flood simulation prevents models from producing results in real time and issuing flood warnings. An efficient, high-accuracy, fully hydrodynamic numerical model based on GPU acceleration was constructed, and the computational efficiency of GPUs and CPUs in flood simulation was studied quantitatively. The results show that: (1) under identical scenarios, the NVIDIA Tesla P100-PCIE delivers the best computational efficiency among the tested compute engines; (2) at a fixed DEM grid resolution, GPU efficiency increases with the rainfall return period, with GPU/CPU parallel speedup ratios of 1.25-16.28; (3) at a fixed rainfall return period, the finer the DEM grid resolution, the more pronounced the GPU acceleration: at 3 m and 5 m resolutions, the NVIDIA GeForce GTX 980Ti is 4.32x and 3.26x faster than a single CPU core, respectively, while the NVIDIA Tesla P100-PCIE is 16.28x and 7.86x faster. In summary, while good simulation accuracy is maintained, the finer the DEM grid resolution, the higher the GPU acceleration efficiency.