35 articles found
Accelerating Hartree-Fock Self-consistent Field Calculation on C86/DCU Heterogenous Computing Platform
1
Authors: Ji Qi, Huimin Zhang, Dezun Shan, Minghui Yang. Chinese Journal of Chemical Physics, 2025, Issue 1, pp. 81-94, I0056 (15 pages)
In this study, we investigate the efficacy of a hybrid parallel algorithm aimed at speeding up the evaluation of two-electron repulsion integrals (ERI) and Fock matrix generation on the Hygon C86/DCU (deep computing unit) heterogeneous computing platform. Multiple hybrid parallel schemes are assessed using a range of model systems, including those with up to 1200 atoms and 10000 basis functions. Our findings reveal that, during Hartree-Fock (HF) calculations, a single DCU achieves a 33.6-fold speedup over 32 C86 CPU cores. Compared with the efficiency of the Wuhan Electronic Structure Package on the Intel X86 and NVIDIA A100 computing platform, the Hygon platform exhibits good cost-effectiveness, showing great potential in quantum chemistry calculations and other high-performance scientific computations.
Keywords: Quantum chemistry; Self-consistent field; Hartree-Fock; Electron repulsion integrals; Heterogenous parallel computing; C86/deep computing unit
Heterogeneous Computing Power Scheduling Method Based on Distributed Deep Reinforcement Learning in Cloud-Edge-End Environments
2
Authors: Jinwei Mao, Wang Luo, Jiangtao Xu, Daohua Zhu, Wei Liang, Zhechen Huang, Bao Feng, Shuang Yang. Computers, Materials & Continua, 2026, Issue 5, pp. 1964-1985 (22 pages)
With the rapid development of power Internet of Things (IoT) scenarios such as smart factories and smart homes, numerous intelligent terminal devices and real-time interactive applications impose higher demands on computing latency and resource supply efficiency. Multi-access edge computing technology deploys cloud computing capabilities at the network edge, constructs distributed computing nodes and multi-access systems, and offers infrastructure support for services with low latency and high reliability. Existing research relies on the strong assumption that the environmental state is fully observable and fails to thoroughly consider the continuous time-varying features of edge server load fluctuations, leading to insufficient adaptability of the model in a heterogeneous dynamic environment. Thus, this paper establishes a framework for end-edge collaborative task offloading based on a partially observable Markov decision process (POMDP) and proposes a method for end-edge collaborative task offloading in heterogeneous scenarios. It achieves time-series modeling of the historical load characteristics of edge servers and endows the agent with the ability to be aware of the load in dynamic environmental states. Moreover, by dynamically assessing the exploration value of historical trajectories in the central trajectory pool and adjusting the sample weight distribution, directional exploration and strategy optimization of high-value trajectories are realized. Experimental results indicate that the proposed method has distinct advantages over existing methods in terms of average delay and task failure rate, and they also verify the method's robustness in a dynamic environment.
Keywords: Edge computing; End-edge collaboration; Heterogeneous computing power scheduling; Resource allocation
High-performance CPU-GPU heterogeneous computing method for 9-component ambient noise cross-correlation
3
Authors: Jingxi Wang, Weitao Wang, Chao Wu, Lei Jiang, Hanwen Zou, Huajian Yao, Ling Chen. Earthquake Research Advances, 2025, Issue 3, pp. 81-87 (7 pages)
Ambient noise tomography is an established technique in seismology, where calculating single- or nine-component noise cross-correlation functions (NCFs) is a fundamental first step. In this study, we introduced a novel CPU-GPU heterogeneous computing framework designed to significantly enhance the efficiency of computing 9-component NCFs from seismic ambient noise data. This framework not only accelerated the computational process by leveraging the Compute Unified Device Architecture (CUDA) but also improved the signal-to-noise ratio (SNR) through innovative stacking techniques, such as time-frequency domain phase-weighted stacking (tf-PWS). We validated the program using multiple datasets, confirming its superior computation speed, improved reliability, and higher signal-to-noise ratios for NCFs. Our comprehensive study provides detailed insights into optimizing the computational processes for noise cross-correlation functions, thereby enhancing the precision and efficiency of ambient noise imaging.
Keywords: Nine-component NCFs; Heterogeneous computing; Ambient noise tomography; CUDA; tf-PWS
Joint Resource Allocation Using Evolutionary Algorithms in Heterogeneous Mobile Cloud Computing Networks (cited by: 10)
4
Authors: Weiwei Xia, Lianfeng Shen. China Communications (indexed in SCIE, CSCD), 2018, Issue 8, pp. 189-204 (16 pages)
The problem of joint radio and cloud resource allocation is studied for heterogeneous mobile cloud computing networks. The objective of the proposed joint resource allocation schemes is to maximize the total utility of users while satisfying the required quality of service (QoS), such as the end-to-end response latency experienced by each user. We formulate the problem of joint resource allocation as a combinatorial optimization problem. Three evolutionary approaches are considered to solve the problem: genetic algorithm (GA), ant colony optimization with genetic algorithm (ACO-GA), and quantum genetic algorithm (QGA). To decrease the time complexity, we propose a mapping process between the resource allocation matrix and the chromosome of GA, ACO-GA, and QGA, search the available radio and cloud resource pairs based on the resource availability matrices for ACO-GA, and encode the difference between the allocated resources and the minimum resource requirement for QGA. Extensive simulation results show that our proposed methods greatly outperform the existing algorithms in terms of running time, accuracy of the final results, total utility, resource utilization, and end-to-end response latency guarantees.
Keywords: Heterogeneous mobile cloud computing networks; Resource allocation; Genetic algorithm; Ant colony optimization; Quantum genetic algorithm
Fast weighting method for plasma PIC simulation on GPU-accelerated heterogeneous systems (cited by: 2)
5
Authors: 杨灿群, 吴强, 胡慧俐, 石志才, 陈娟, 唐滔. Journal of Central South University (indexed in SCIE, EI, CAS), 2013, Issue 6, pp. 1527-1535 (9 pages)
The particle-in-cell (PIC) method has benefited greatly from GPU-accelerated heterogeneous systems. However, the performance of PIC is constrained by the interpolation operations in the weighting process on the GPU (graphics processing unit). To address this problem, a fast weighting method for PIC simulation on GPU-accelerated systems was proposed to avoid atomic memory operations during the weighting process. The method was implemented by taking advantage of the GPU's thread synchronization mechanism and dividing the problem space properly. Moreover, software-managed shared memory on the GPU was employed to buffer the intermediate data. The experimental results show that the method achieves speedups of up to 3.5 times compared to previous works, and runs 20.08 times faster on one NVIDIA Tesla M2090 GPU than on a single core of an Intel Xeon X5670 CPU.
Keywords: GPU computing; Heterogeneous computing; Plasma physics simulations; Particle-in-cell (PIC)
Federated Feature Concatenate Method for Heterogeneous Computing in Federated Learning (cited by: 2)
6
Authors: Wu-Chun Chung, Yung-Chin Chang, Ching-Hsien Hsu, Chih-Hung Chang, Che-Lun Hung. Computers, Materials & Continua (indexed in SCIE, EI), 2023, Issue 4, pp. 351-371 (21 pages)
Federated learning is an emerging machine learning technique that enables clients to collaboratively train a deep learning model without uploading raw data to the aggregation server. Each client may be equipped with different computing resources for model training. A client equipped with a lower computing capability requires more time for model training, resulting in a prolonged training time in federated learning. Moreover, it may fail to train the entire model because of out-of-memory issues. This study aims to tackle these problems and proposes the federated feature concatenate (FedFC) method for federated learning with heterogeneous clients. FedFC leverages model splitting and feature concatenation to offload a portion of the training load from clients to the aggregation server. Each client in FedFC can collaboratively train a model with different cutting layers. Therefore, the specific features learned in the deeper layers of the server-side model are more identical for the data class classification. Accordingly, FedFC can reduce the computation load for resource-constrained clients and shorten the convergence time. The performance effectiveness is verified by considering different dataset scenarios, such as data and class imbalance among the participating clients in the experiments. The performance impacts of different cutting layers are evaluated during model training. The experimental results show that the co-adapted features have a critical impact on the adequate classification of the deep learning model. Overall, FedFC not only shortens the convergence time, but also improves the best accuracy by up to 5.9% and 14.5% when compared to conventional federated learning and SplitFed, respectively. In conclusion, the proposed approach is feasible and effective for heterogeneous clients in federated learning.
Keywords: Federated learning; Deep learning; Artificial intelligence; Heterogeneous computing
Discrete particle methods for engineering simulation: Reproducing mesoscale structures in multiphase systems (cited by: 1)
7
Authors: Ji Xu, Peng Zhao, Yong Zhang, Junwu Wang, Wei Ge. Resources Chemicals and Materials, 2022, Issue 1, pp. 69-79 (11 pages)
Most natural resources are processed as particle-fluid multiphase systems in the chemical, mineral and material industries; therefore, discrete particle methods (DPM) are reasonable choices of simulation method for engineering the relevant processes and equipment. However, direct application of these methods is challenged by the complex multiscale behavior of such systems, which leads to enormous computational cost or otherwise a qualitatively inaccurate description of the mesoscale structures. The coarse-grained DPM based on the energy-minimization multi-scale (EMMS) model, or EMMS-DPM, was proposed to reduce the computational cost by several orders of magnitude while maintaining an accurate description of the mesoscale structures, which paves the way for its engineering applications. Further empowered by the high-efficiency multi-scale DEM software DEMms and the corresponding customized heterogeneous supercomputing facilities with graphics processing units (GPUs), it may even approach real-time simulation of industrial reactors. This short review introduces the principle of DPM, in particular EMMS-DPM, and the recent developments in modeling, numerical implementation and application of large-scale DPM, which aims to reach industrial scale on the one hand and to resolve mesoscale structures critical to reaction-transport coupling on the other. Finally, the review discusses prospects for the future development of DPM in this direction.
Keywords: Coarse-graining; Discrete element method (DEM); EMMS-DPM (EMMS-based discrete particle method); GPU-CPU heterogeneous computing; Mesoscale
FPGA Accelerators for Computing Interatomic Potential-Based Molecular Dynamics Simulation for Gold Nanoparticles: Exploring Different Communication Protocols
8
Authors: Ankitkumar Patel, Srivathsan Vasudevan, Satya Bulusu. Computers, Materials & Continua (indexed in SCIE, EI), 2024, Issue 9, pp. 3803-3818 (16 pages)
Molecular Dynamics (MD) simulation for computing Interatomic Potential (IAP) is a very important High-Performance Computing (HPC) application. MD simulation on particles of experimental relevance takes huge computation time, despite using an expensive high-end server. Heterogeneous computing, a combination of a Field Programmable Gate Array (FPGA) and a computer, is proposed as a solution to compute MD simulation efficiently. In such heterogeneous computation, communication between the FPGA and the computer is necessary. One such MD simulation, explained in the paper, is the Artificial Neural Network (ANN)-based IAP computation of gold (Au₁₄₇ and Au₃₀₉) nanoparticles. MD simulation calculates the forces between atoms and the total energy of the chemical system. This work proposes the novel design and implementation of an ANN IAP-based MD simulation for Au₁₄₇ and Au₃₀₉ using communication protocols, such as Universal Asynchronous Receiver-Transmitter (UART) and Ethernet, for communication between the FPGA and the host computer. To improve the latency of MD simulation through heterogeneous computing, the UART and Ethernet communication protocols were explored to conduct an MD simulation of 50,000 cycles. In this study, computation times of 17.54 and 18.70 h were achieved with UART and Ethernet, respectively, compared to the conventional server time of 29 h for Au₁₄₇ nanoparticles. The results pave the way for the development of a Lab-on-a-chip application.
Keywords: Ethernet; Hardware accelerator; Heterogeneous computing; Interatomic potential (IAP); MD simulation; Peripheral component interconnect express (PCIe); UART
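The timings quoted in this abstract can be converted into speedup factors with simple arithmetic. The snippet below is only an illustrative calculation based on the three figures in the abstract (29 h, 17.54 h, 18.70 h); the function and variable names are made up for this example and do not come from the paper.

```python
# Wall-clock times in hours, taken from the abstract (Au147, 50,000 MD cycles).
SERVER_HOURS = 29.0
FPGA_HOURS = {"UART": 17.54, "Ethernet": 18.70}


def speedup(baseline_hours: float, accelerated_hours: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    return baseline_hours / accelerated_hours


def time_saved_percent(baseline_hours: float, accelerated_hours: float) -> float:
    """Return the reduction in run time as a percentage of the baseline."""
    return 100.0 * (1.0 - accelerated_hours / baseline_hours)


if __name__ == "__main__":
    for protocol, hours in FPGA_HOURS.items():
        print(
            f"{protocol}: {speedup(SERVER_HOURS, hours):.2f}x faster, "
            f"{time_saved_percent(SERVER_HOURS, hours):.1f}% less time"
        )
```

By this calculation the UART configuration is roughly 1.65 times faster than the server run and the Ethernet configuration roughly 1.55 times faster.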
A new heuristic for task scheduling in heterogeneous computing environment
9
Authors: Ehsan Ullah MUNIR, Jian-zhong LI, Sheng-fei SHI, Zhao-nian ZOU, Qaisar RASOOL. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (indexed in SCIE, EI, CAS, CSCD), 2008, Issue 12, pp. 1715-1723 (9 pages)
A heterogeneous computing (HC) environment utilizes diverse resources with different computational capabilities to solve computing-intensive applications having diverse computational requirements and constraints. The task assignment problem in an HC environment can be formally defined as follows: for a given set of tasks and machines, assign tasks to machines to achieve the minimum makespan. In this paper we propose a new task scheduling heuristic, high standard deviation first (HSTDF), which considers the standard deviation of the expected execution time of a task as a selection criterion. The standard deviation of the expected execution time of a task represents the amount of variation in task execution time on different machines. Our conclusion is that tasks having a high standard deviation must be assigned first for scheduling. A large number of experiments were carried out to check the effectiveness of the proposed heuristic in different scenarios, and the comparison with the existing heuristics (Max-min, Sufferage, Segmented Min-average, Segmented Min-min, and Segmented Max-min) clearly reveals that the proposed heuristic outperforms all of these heuristics in terms of average makespan.
Keywords: Heterogeneous computing; Task scheduling; Greedy heuristics; High standard deviation first (HSTDF) heuristic
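The selection rule described in this abstract (schedule the tasks whose expected execution time varies most across machines first) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not say how a machine is chosen for each task, so the sketch assumes the common greedy rule of picking the machine with the earliest completion time, and the name `hstdf_schedule` and the `etc` matrix layout are hypothetical.

```python
from statistics import pstdev


def hstdf_schedule(etc):
    """Greedy schedule that handles high-standard-deviation tasks first.

    etc[t][m] is the expected execution time of task t on machine m.
    Returns (assignment, makespan), where assignment maps task -> machine.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines  # time at which each machine becomes free

    # Selection criterion from the abstract: highest standard deviation of
    # the expected execution time across machines goes first.
    order = sorted(range(len(etc)), key=lambda t: pstdev(etc[t]), reverse=True)

    assignment = {}
    for t in order:
        # ASSUMPTION (not stated in the abstract): place the task on the
        # machine that gives it the earliest completion time.
        m = min(range(n_machines), key=lambda j: ready[j] + etc[t][j])
        ready[m] += etc[t][m]
        assignment[t] = m
    return assignment, max(ready)


if __name__ == "__main__":
    # Three tasks, two machines; task 1 varies most across machines.
    etc = [[4, 5], [2, 10], [6, 6]]
    assignment, makespan = hstdf_schedule(etc)
    print(assignment, makespan)
```

In the small example, task 1 (times 2 and 10) is placed first because a poor placement for it would cost the most, which is the intuition the abstract gives for the ordering. Population standard deviation is used here; since every row has the same length, the sample standard deviation would give the same ordering.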
Efficient Data-parallel Computations on Distributed Systems
10
Authors: 曾志勇, LU Xinda. High Technology Letters (indexed in EI, CAS), 2002, Issue 3, pp. 92-96 (5 pages)
Task scheduling determines the performance of NOW computing to a large extent. However, the computer system architecture, computing capability and system load are rarely considered together. In this paper, a biggest heterogeneous scheduling algorithm is presented. It fully considers the system characteristics (from the application's point of view), structure and state, so it can always utilize all processing resources under a reasonable premise. The experimental results show that the algorithm can significantly shorten the response time of jobs.
Keywords: Parallel algorithms; Heterogeneous computing; Message passing; Load balancing
Dynamic load balancing for real-time multiview path tracing on multi-GPU architectures
11
Authors: Erwan LERIA, Markku MAKITALO, Julius IKKALA, Pekka JÄÄSKELÄINEN. Virtual Reality & Intelligent Hardware (虚拟现实与智能硬件(中英文)), 2025, Issue 4, pp. 393-405 (13 pages)
Stereoscopic and multiview rendering are used for virtual reality and the synthetic generation of light fields from three-dimensional scenes. Because rendering multiple views using ray tracing techniques is computationally expensive, the utilization of multiprocessor machines is necessary to achieve real-time frame rates. In this study, we propose a dynamic load-balancing algorithm for real-time multiview path tracing on multi-compute device platforms. The proposed algorithm was adapted to heterogeneous hardware combinations and dynamic scenes in real time. We show that on a heterogeneous dual-GPU platform, our implementation reduces the rendering time by an average of approximately 30% to 50% compared with that of a uniform workload distribution, depending on the scene and number of views.
Keywords: Virtual reality; Multiview; Light field; Heterogeneous computing
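The abstract does not describe how the workload is redistributed, so the following is only a generic feedback scheme of the kind often used for multi-device rendering, shown to make the contrast with a uniform split concrete. It assumes that each device reports how long it took to render its share of the previous frame, and that all measured times are positive; `rebalance` and its parameters are hypothetical names, not part of the paper.

```python
def rebalance(shares, frame_times, total_items):
    """Split total_items across devices in proportion to measured throughput.

    shares[i]      : work items device i rendered in the previous frame
    frame_times[i] : time device i needed for those items (must be > 0)
    Returns a list of new per-device item counts that sums to total_items.
    """
    # Throughput of each device in items per unit of time.
    throughput = [s / t for s, t in zip(shares, frame_times)]
    total_throughput = sum(throughput)

    # Proportional share, rounded down to whole work items.
    new_shares = [int(total_items * th / total_throughput) for th in throughput]

    # Hand any items lost to rounding to the fastest device.
    fastest = throughput.index(max(throughput))
    new_shares[fastest] += total_items - sum(new_shares)
    return new_shares


if __name__ == "__main__":
    # Two GPUs started with a uniform split of 1200 tiles; the second one
    # needed three times as long (times in milliseconds).
    print(rebalance([600, 600], [10.0, 30.0], 1200))
```

With the example numbers the faster device receives 900 tiles and the slower one 300, so both would finish in about 15 ms instead of the 30 ms the uniform split takes on the slower device. A real renderer would typically also smooth the measurements over several frames to avoid oscillation.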
THUBrachy: fast Monte Carlo dose calculation tool accelerated by heterogeneous hardware for high-dose-rate brachytherapy (cited by: 1)
12
Authors: An-Kang Hu, Rui Qiu, Huan Liu, Zhen Wu, Chun-Yan Li, Hui Zhang, Jun-Li Li, Rui-Jie Yang. Nuclear Science and Techniques (indexed in SCIE, EI, CAS, CSCD), 2021, Issue 3, pp. 107-119 (13 pages)
Monte Carlo (MC) simulation is regarded as the gold standard for dose calculation in brachytherapy, but it consumes a large amount of computing resources. The development of heterogeneous computing makes it possible to substantially accelerate calculations with hardware accelerators. Accordingly, this study develops a fast MC tool, called THUBrachy, which can be accelerated by several types of hardware accelerators. THUBrachy can simulate photons with energy less than 3 MeV and considers all photon interactions in this energy range. It was benchmarked against the American Association of Physicists in Medicine Task Group No. 43 Report using a water phantom and validated with Geant4 using a clinical case. A performance test was conducted using the clinical case, showing that a multicore central processing unit, Intel Xeon Phi, and graphics processing unit (GPU) can efficiently accelerate the simulation. GPU-accelerated THUBrachy is the fastest version, which is 200 times faster than the serial version and approximately 500 times faster than Geant4. The proposed tool shows great potential for fast and accurate dose calculations in clinical applications.
Keywords: High-dose-rate brachytherapy; Monte Carlo; Heterogeneous computing; Hardware accelerators
Resource Scheduling Strategy for Performance Optimization Based on Heterogeneous CPU-GPU Platform (cited by: 1)
13
Authors: Juan Fang, Kuan Zhou, Mengyuan Zhang, Wei Xiang. Computers, Materials & Continua (indexed in SCIE, EI), 2022, Issue 10, pp. 1621-1635 (15 pages)
In recent years, with the development of processor architecture, heterogeneous processors including a central processing unit (CPU) and a graphics processing unit (GPU) have become mainstream. However, due to the differences between heterogeneous cores, heterogeneous systems now face many problems that need to be solved. To address these problems, this paper focuses on the utilization and efficiency of heterogeneous cores and designs reasonable resource scheduling strategies. To improve the performance of the system, this paper proposes a combination strategy for a single task and a multi-task scheduling strategy for multiple tasks. The combination strategy consists of two sub-strategies: the first improves the execution efficiency of tasks on the GPU by changing the thread organization structure, and the second focuses on the working state of the efficient core and develops more reasonable workload balancing schemes to improve resource utilization of heterogeneous systems. The multi-task scheduling strategy obtains the execution efficiency of heterogeneous cores and global task information through the processing of task samples. Based on this information, an improved ant colony algorithm is used to quickly obtain a reasonable task allocation scheme, which fully utilizes the characteristics of heterogeneous cores. The experimental results show that the combination strategy reduces task execution time by 29.13% on average. In the case of processing multiple tasks, the multi-task scheduling strategy reduces the execution time by up to 23.38% based on the combination strategy. Both strategies make better use of the resources of heterogeneous systems and significantly reduce the execution time of tasks on heterogeneous systems.
Keywords: Heterogeneous computing; CPU-GPU; Performance; Workload balance
Leaching from Heterogeneous Heck Catalysts: A Computational Approach
14
Authors: Peter M. Jenkins and Shik Chi Tsang (Surface Science and Catalysis Research Centre, Department of Chemistry, University of Reading, Whiteknights, Reading, RG6 6AD, UK). Chemical Research in Chinese Universities (indexed in SCIE, CAS, CSCD), 2002, Issue 2, pp. 175-177 (3 pages)
The possibility of carrying out a purely heterogeneous Heck reaction in practice without Pd leaching has previously been considered by a number of research groups, but no general consensus has yet been reached. Here, the reaction was, for the first time, evaluated by a simple computational approach. Modelling experiments were performed on one of the initial catalytic steps: phenyl halide attachment on Pd (111) to (100) and (111) to (111) ridges of a Pd crystal. Three resulting surface structures were identified as possible reactive intermediates. Following potential energy minimisation calculations based on a universal force field, the relative stabilities of these surface species were then determined. Results showed the most stable species to be one in which a Pd ridge atom is removed from the Pd crystal structure, suggesting that Pd leaching induced by phenyl halides is energetically favourable.
Keywords: Heterogeneous Heck reaction; Aryl halides; Computational modelling; Leaching of palladium
Smart data deduplication for telehealth systems in heterogeneous cloud computing
15
Authors: GAI Keke, QIU Meikang, SUN Xiaotong, ZHAO Hui. Journal of Communications and Information Networks, 2016, Issue 4, pp. 93-104 (12 pages)
The widespread application of heterogeneous cloud computing has enabled enormous advances in the real-time performance of telehealth systems. A cloud-based telehealth system allows healthcare users to obtain medical data from various data sources supported by heterogeneous cloud providers. Employing data duplications in distributed cloud databases is an alternative approach for achieving data sharing among multiple data users. However, this approach results in additional storage space being used, even though reducing data duplications would lead to a decrease in data acquisitions and real-time performance. To address this issue, this paper focuses on developing a dynamic data deduplication method that uses an intelligent blocker to determine the working mode of data duplications for each data package in heterogeneous cloud-based telehealth systems. The proposed approach is named SD2M (Smart Data Deduplication Model), in which the main algorithm applies dynamic programming to produce optimal solutions that minimize the total cost of data usage. We carry out experimental evaluations to examine the adaptability of the proposed approach.
Keywords: Data deduplication; Telehealth; Heterogeneous cloud computing; Optimal solution; Dynamic programming
Editorial for the special issue on heterogenous computing
16
Authors: Shanjiang Tang, Yusen Li. CCF Transactions on High Performance Computing, 2024, Issue 2, pp. 113-114 (2 pages)
In the current era of AI and Big Data, an increasing and significant amount of computing power is needed for many applications and algorithms such as AIGC models, face detection, autonomous driving and atmosphere simulation. Recently, there has been significant interest in the community in improving AI and big data applications with heterogenous computing, which refers to a computing system using different types of computing cores such as GPU, NPU, ASIC, DSP and FPGA. It can improve performance and energy efficiency by dispatching different workloads to processors that are designed for specialized processing and specific purposes. This issue aims to cover challenges that can hamper efficiency and utilization for AI and big data applications on heterogenous computing systems, such as efficient utilization of the raw hardware, I/O management, task scheduling, etc.
Keywords: Heterogenous computing; Computing cores; Computing system; AI; Big data; AIGC models; Face detection; Autonomous driving
CPTF–a new heuristic based branch and bound algorithm for workflow scheduling in heterogeneous distributed computing systems
17
Authors: D. Sirisha, S. Sambhu Prasad. CCF Transactions on High Performance Computing, 2024, Issue 5, pp. 472-487 (16 pages)
Computationally intensive applications embodied as workflows entail interdependent tasks that involve multifarious computation requirements and necessitate Heterogeneous Distributed Computing Systems (HDCS) to attain high performance. The scheduling of workflows on HDCS has been shown to be an NP-Complete problem. In the current work, a new heuristic-based Branch and Bound (BnB) technique, namely the Critical Path_finish Time First (CPTF) algorithm, is proposed for workflow scheduling on HDCS to achieve the best solutions. The primary merits of the CPTF algorithm are due to its bounding functions, which are tight and of low complexity. The sharp bounding functions can precisely estimate the promise of each state and aid in pruning infeasible states. Thus, the search space size is reduced. The CPTF algorithm explores the most promising states in the search space and converges to the solution quickly. Therefore, high performance is achieved. The experimental results on random and scientific workflows reveal that the CPTF algorithm can effectively exploit the high potency of the BnB technique in realizing better-quality solutions than the widely referenced heuristic scheduling algorithms. The results on the benchmark workflows show that the CPTF algorithm improved schedules in 89.36% of the cases.
Keywords: Workflow scheduling; Task scheduling; Heuristics; Heterogeneous distributed computing systems; Branch and bound technique; Makespan
Time Predictable Modeling Method for GPU Architecture with SIMT and Cache Miss Awareness
18
Author: Shaojie Zhang. Journal of Electronic Research and Application, 2024, Issue 2, pp. 109-115 (7 pages)
Graphics Processing Units (GPUs) are used to accelerate computing-intensive tasks, such as neural networks, data analysis, high-performance computing, etc. In the past decade or so, researchers have done a lot of work on GPU architecture and proposed a variety of theories and methods to study the microarchitectural characteristics of various GPUs. In this study, the GPU serves as a co-processor and works together with the CPU in an embedded real-time system to handle computationally intensive tasks. The study models the architecture of the GPU, building on prior work, and takes the SIMT mechanism and cache-miss situation into account to provide a more detailed analysis of the GPU architecture. To verify the proposed GPU architecture model, 10 GPU kernel tasks and an Nvidia GPU device were used to perform experiments. The experimental results showed that the minimum error between the kernel task execution time predicted by the proposed GPU architecture model and the actual measured kernel task execution time was 3.80%, and the maximum error was 8.30%.
Keywords: Heterogeneous computing; GPU; Architecture modeling; Time predictability
MPEFT: a makespan minimizing heuristic scheduling algorithm for workflows in heterogeneous computing systems
19
Authors: D. Sirisha, S. Sambhu Prasad. CCF Transactions on High Performance Computing, 2023, Issue 4, pp. 374-389 (16 pages)
Applications involving multifarious computational requirements take advantage of the versatility of heterogeneous computing systems (HCS) with more than one type of parallelism. Efficient scheduling of workflow applications is paramount to harnessing high performance from HCS. In the present work, a new list-based heuristic strategy, namely the maximizing parallelism for minimizing earliest finish time (MPEFT) algorithm, is proposed with the primary objective of minimizing the makespan. To minimize the makespan, the proposed scheduling policy focuses on increasing the parallelism of the workflows by choosing the globally heaviest task with the larger number of successors, so that more successors can be released. Thus, the priority policy maximizes the length of the ready queue by exploring a higher degree of parallelism in the workflow. The proposed approach is designed to proceed depth-wise whenever the tasks at subsequent levels are released and to continue level-wise otherwise. This increases the degree of parallelism and shortens the makespan. To evaluate the proposed scheduling algorithm, experiments are conducted using randomly generated workflows and the scientific workflows LIGO, Epigenomics, Cybershake, and Montage. The experimental results show that the proposed MPEFT algorithm surpasses the classical list-based heuristic algorithms in terms of makespan, speedup, efficiency and frequency of best results.
Keywords: Workflow scheduling; Task scheduling; Heuristics; Heterogeneous computing systems
KANETAS: an elastic scheduler for heterogeneous many-core systems
20
Authors: Zhao Mao, Xingjun Zhang, Longxiang Wang. CCF Transactions on High Performance Computing, 2025, Issue 3, pp. 179-193 (15 pages)
Efficient program execution on massively parallel clusters is critical for fields like scientific computing and artificial intelligence. However, traditional task scheduling algorithms do not fully leverage platform characteristics, resulting in inefficiency and long task execution times. We propose KANETAS, a reinforcement learning-based DAG (Directed Acyclic Graph) elastic task scheduling algorithm, designed to adapt to DAG tasks of various scales and structures. The Kolmogorov-Arnold Network (KAN) is applied to the DAG scheduling problem. It enhances the efficiency of heterogeneous hardware by using Graph Convolutional Networks (GCN) and the Actor-Critic algorithm (A2C), recognizing hardware features and assigning tasks to appropriate computing units. We have conducted extensive experiments to compare the proposed solution with four strong baseline algorithms, including a state-of-the-art heuristic method and a variety of deep reinforcement learning-based algorithms. The experimental results suggest that KANETAS can reduce the average makespan of the best baseline algorithm by up to 13.1%. Furthermore, compared to the MLP version, the KAN version showed superior performance. The proposed model demonstrates a clear advantage in load balancing.
Keywords: Reinforcement learning; Task scheduling algorithm; Graph neural network; Heterogeneous computing; Kolmogorov-Arnold network