Abstract: The migration of tasks aided by machine learning (ML) predictions in dynamic power management (DPM) is a system-level design technique used to reduce energy while enhancing the overall performance of the processor. In this paper, we address the issue of high system-level energy dissipation during the execution of parallel workloads with common deadlines by introducing a machine-learning-based framework that includes task migration using energy-efficient earliest deadline first scheduling (EA-EDF). ML-based EA-EDF enhances the overall throughput and optimizes energy to avoid delay and performance degradation in a multiprocessor system. The proposed system model allocates processors to the ready task set in such a way that their deadlines are guaranteed. A full task migration policy is also integrated to ensure proper task mapping and inter-process linkage among arrived tasks with the same deadlines. A task can halt execution on one CPU and be rescheduled on a different processor to avoid delay and meet its deadline. Our approach shows promising potential for machine-learning-based schedulability analysis, enables a comparison between different ML models, and shows a promising reduction in energy compared with other ML-aware task migration techniques for SoCs, such as Multi-Layer Feed-Forward Neural Networks (MLFNN) based on convolutional neural networks (CNN), Random Forest (RF), and deep learning (DL) algorithms. The simulations are conducted using the super-pipelined microarchitecture of the Advanced Micro Devices (AMD) XScale PXA270, with a 32 Kbyte instruction cache and a 32 Kbyte data cache per core, at utilization factors (u_i) of 12%, 31%, and 50%. The proposed approach consumes 5.3% less energy when almost half of the CPU is running and 1.04% less energy on a lower workload. Cumulatively, the proposed design gives significant improvements by reducing energy dissipation at three clock rates: by 4.41%, by 5.4% at 624 MHz, and by 5.9% for applications operating at the 416 and 312 MHz standard operating frequencies.
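The allocation and migration steps described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's EA-EDF framework and contains no ML or energy model. It assumes periodic tasks with implicit deadlines, for which preemptive EDF on a single core meets all deadlines when the summed utilization does not exceed 1. The names `Core`, `fits`, `allocate`, and `migrate`, and the first-fit-decreasing placement rule, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    # Maps task name -> utilization (execution time / period). Hypothetical model.
    tasks: dict = field(default_factory=dict)

    @property
    def load(self):
        return sum(self.tasks.values())


def fits(core, utilization, bound=1.0):
    # EDF utilization test for one core; a small tolerance absorbs float error.
    return core.load + utilization <= bound + 1e-9


def allocate(task_utils, n_cores):
    """First-fit-decreasing placement (an assumed heuristic, not the paper's)."""
    cores = [Core() for _ in range(n_cores)]
    for name, u in sorted(task_utils.items(), key=lambda kv: -kv[1]):
        target = next((c for c in cores if fits(c, u)), None)
        if target is None:
            raise ValueError(f"task {name} cannot be placed without missing deadlines")
        target.tasks[name] = u
    return cores


def migrate(cores, name, src, dst):
    """Move a task to another core only if the destination stays schedulable."""
    u = cores[src].tasks[name]
    if not fits(cores[dst], u):
        return False
    del cores[src].tasks[name]
    cores[dst].tasks[name] = u
    return True


# Utilization values 0.12, 0.31 and 0.50 are the factors quoted in the abstract;
# the task names and the two-core setup are made up for this example.
cores = allocate({"t1": 0.50, "t2": 0.50, "t3": 0.31, "t4": 0.12}, n_cores=2)
```

In this example the two 0.50 tasks fill the first core, so a later attempt to migrate `t3` onto it is refused, while moving `t1` to the second core is accepted because that core's load stays below 1.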
Abstract: An asynchronous wrapper with novel handshake circuits for data communication in globally asynchronous locally synchronous (GALS) systems is proposed. The handshake circuits include two communication ports and a local clock generator. Two approaches for implementing the communication ports are presented, one with pure standard cells and the other with Muller C-elements. The detailed design methodology for GALS systems is given, and the circuits are validated with VHDL and circuit simulation in standard CMOS technology.
Funding: The authors would like to thank the reviewers for their feedback and suggestions. We also wish to mention that this work is partly supported by Singapore Ministry of Education Academic Research Fund Tier 1 (R-263-000-655-133) and National Natural Science Foundation of China (NSFC) (Grant No. 61173032).
Abstract: Real-time multimedia applications are increasingly mapped onto modern embedded systems based on multiprocessor systems-on-chip (MPSoC). The tasks of these applications need to be mapped onto the MPSoC resources efficiently in order to satisfy their performance constraints. Exhaustively exploring all possible mappings, i.e., all task-to-resource combinations, may take days or weeks. Additionally, when the exploration is performed at design time, it cannot handle dynamism in the applications and in the status of the resources. A run-time mapping technique can cater for the dynamism but cannot guarantee strict timing deadlines because of the large computations involved at run time. Thus, an approach is required that performs feasible compute-intensive exploration at design time and uses the explored results at run time. This paper presents a solution in that direction. Communication-aware design space exploration (CADSE) techniques are proposed to explore different mapping options, from which one is selected at run time subject to the desired performance and the available MPSoC resources. Experiments show that the proposed exploration techniques are faster than an exhaustive exploration and provide almost the same quality of results.
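The pattern of exploring at design time and looking up at run time can be sketched as follows. This shows only the exhaustive baseline that the abstract compares against, not CADSE itself. The cost model (traffic volume between tasks placed on different resources), the per-resource `capacity` limit, and the names `comm_cost`, `explore`, and `select` are all assumptions made for illustration.

```python
from itertools import product


def comm_cost(mapping, traffic):
    # Sum the traffic of every task pair that ends up on different resources.
    return sum(vol for (a, b), vol in traffic.items() if mapping[a] != mapping[b])


def explore(n_tasks, max_resources, traffic, capacity):
    """Design time: try all r**n_tasks mappings for each resource count r."""
    table = {}
    for r in range(1, max_resources + 1):
        best = None
        for mapping in product(range(r), repeat=n_tasks):
            # Skip mappings that overload a resource (assumed constraint).
            if any(mapping.count(p) > capacity for p in range(r)):
                continue
            cost = comm_cost(mapping, traffic)
            if best is None or cost < best[0]:
                best = (cost, mapping)
        if best is not None:
            table[r] = best
    return table


def select(table, available):
    """Run time: reuse the stored result for the largest feasible resource count."""
    usable = [r for r in table if r <= available]
    return table[max(usable)] if usable else None


# Hypothetical 4-task pipeline with heavy traffic on edges 0-1 and 2-3.
traffic = {(0, 1): 10, (1, 2): 1, (2, 3): 10}
table = explore(n_tasks=4, max_resources=3, traffic=traffic, capacity=2)
```

The inner loop visits r to the power of n_tasks candidates, which is why exhaustive exploration becomes impractical as the task count grows, while `select` is a cheap lookup that suits run-time use.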