Fund: Supported by the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20211284, the Financial and Science Technology Plan Project of Xinjiang Production and Construction Corps under Grant No. 2020DB005, and the National Natural Science Foundation of China under Grant Nos. 61872219, 62002276, and 62177014.
Abstract: With the rapid development of deep learning, the sizes of data sets and deep neural network (DNN) models are growing rapidly. As a result, the prohibitively long training and inference times of conventional single-device strategies increasingly fail to meet the demands of modern tasks. Moreover, in edge computing (EC) scenarios, many devices sit idle, which wastes resources: they could share the load of busy devices but do not. To address this problem, distributed processing has been applied to offload computation from a single processor to a group of devices, accelerating the training and inference of DNN models and raising device utilization in edge computing. Compared with existing surveys, this paper presents a novel review of applying distributed processing with data and model parallelism to deep learning tasks in edge computing. For practical relevance, lightweight models commonly used in distributed systems are introduced as well. The parallel strategy, as the key technique, is described in detail. Typical applications of distributed processing are then analyzed. Finally, the challenges of distributed processing in edge computing are discussed.
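To make the data-parallelism side of this survey concrete, below is a minimal sketch of synchronous data-parallel training: each worker computes gradients on its own shard of the batch, gradients are averaged across workers, and every replica applies the identical update. Workers are simulated in one process; a real edge deployment would replace the averaging loop with an all-reduce across devices. The model, sizes, and data here are illustrative assumptions, not from the surveyed paper.

```python
# Minimal sketch of synchronous data parallelism (simulated in-process).
import copy
import torch
import torch.nn as nn

NUM_WORKERS = 4
torch.manual_seed(0)

# One identical replica of a tiny, hypothetical model per worker.
base = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
replicas = [copy.deepcopy(base) for _ in range(NUM_WORKERS)]

# Each worker holds one shard of the global batch (data parallelism).
inputs = torch.randn(32, 8).chunk(NUM_WORKERS)
targets = torch.randint(0, 2, (32,)).chunk(NUM_WORKERS)
loss_fn = nn.CrossEntropyLoss()

# 1. Local forward/backward on each worker's shard.
for model, x, y in zip(replicas, inputs, targets):
    model.zero_grad()
    loss_fn(model(x), y).backward()

# 2. "All-reduce": average gradients so every replica sees the same
#    gradient and the replicas stay synchronized after the update.
for params in zip(*(m.parameters() for m in replicas)):
    mean_grad = torch.stack([p.grad for p in params]).mean(dim=0)
    for p in params:
        p.grad = mean_grad.clone()

# 3. Identical SGD step on every replica.
lr = 0.1
with torch.no_grad():
    for m in replicas:
        for p in m.parameters():
            p -= lr * p.grad
```

Because the replicas start identical and apply identical averaged updates, they remain bit-for-bit synchronized, which is the core invariant of synchronous data parallelism.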
Fund: Australian Research Council, Grant/Award Numbers: DE200101465, DP240101108.
Abstract: Efficiently executing deep neural network inference on resource-constrained devices places a significant load on IoT systems. One innovative way to alleviate this load is branching, which adds extra layers with classification exits to a pre-trained model, enabling inputs with high-confidence predictions to exit early and thus reducing inference cost. However, branching networks were not originally tailored to IoT environments: they are susceptible to noisy and out-of-distribution (OOD) data, and they demand additional training for optimal performance. The authors introduce BrevisNet, a novel branching methodology for creating on-device branching models that are both resource-adaptive and noise-robust for IoT applications. The method combines the refined uncertainty estimation of Dirichlet distributions for classification predictions with the superior OOD detection of energy-based models. The authors propose a unique training approach and thresholding technique that improve the precision of branch predictions, offering robustness against noisy and OOD inputs. The findings demonstrate that BrevisNet surpasses existing branching techniques in training efficiency, accuracy, overall performance, and robustness.
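The control flow that branching enables can be illustrated with a generic early-exit skeleton. The sketch below gates each input on the energy score E(x) = -logsumexp(logits) used by energy-based OOD detection: low energy suggests an in-distribution, confident input that may exit at the branch, while noisy or OOD inputs fall through to the deeper exit. This is not the BrevisNet training or thresholding procedure itself; the backbone, branch placement, and threshold value are assumptions chosen only to show the inference-time structure such methods plug into.

```python
# Generic early-exit (branching) skeleton with an energy-based gate.
import torch
import torch.nn as nn

class BranchyNet(nn.Module):
    def __init__(self, num_classes: int = 10, energy_threshold: float = -5.0):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        self.exit1 = nn.Linear(64, num_classes)   # early-exit branch
        self.stage2 = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
        self.exit2 = nn.Linear(64, num_classes)   # final exit
        self.energy_threshold = energy_threshold  # assumed, tuned per task

    @staticmethod
    def energy(logits: torch.Tensor) -> torch.Tensor:
        # Lower energy = more in-distribution / more confident prediction.
        return -torch.logsumexp(logits, dim=-1)

    def forward(self, x: torch.Tensor):
        h = self.stage1(x)
        logits = self.exit1(h)
        # Exit early only when the branch is confident (low energy);
        # uncertain inputs continue to the deeper, final classifier.
        if self.energy(logits).max().item() < self.energy_threshold:
            return logits, "exit1"
        logits = self.exit2(self.stage2(h))
        return logits, "exit2"

model = BranchyNet()
with torch.no_grad():
    out, taken = model(torch.randn(1, 32))
print(taken, out.argmax(dim=-1).item())
```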
Abstract: To address the resource-scheduling and data-transfer bottlenecks that deep neural network (DNN) models face under conventional slicing and mapping methods, an efficient dynamic DNN slicing and intelligent mapping optimization algorithm based on a network-on-chip (NoC) accelerator is proposed. The algorithm flexibly partitions the computation tasks of a DNN model through dynamic slicing, and combines this with an intelligent mapping strategy to optimize task allocation and data-flow management within the NoC architecture. Experimental results show that, compared with conventional methods, the algorithm achieves significant improvements in computational throughput, NoC transfer latency, external memory accesses, and computational energy efficiency, with especially strong gains on complex models.
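The two steps this abstract names, slicing a DNN's workload into tasks and mapping those tasks onto NoC tiles, can be sketched with a simple heuristic. The paper's actual algorithm is not reproduced here; the greedy load-balancing sketch below (assign each slice to the least-loaded tile, breaking ties by Manhattan hop distance to the producing tile to reduce transfer latency) only illustrates the problem structure. The per-layer workload numbers, the slice cap, and the 4x4 mesh are all assumptions.

```python
# Hedged sketch: dynamic DNN slicing + greedy mapping onto a 2-D NoC mesh.
from itertools import product

# Per-layer workload estimates (e.g., MFLOPs) for a hypothetical DNN.
layer_flops = [120, 240, 480, 480, 240, 60]

# (1) Dynamic slicing: split any layer above a cap into equal sub-slices.
SLICE_CAP = 200
slices = []
for layer_id, flops in enumerate(layer_flops):
    parts = max(1, -(-flops // SLICE_CAP))   # ceiling division
    slices += [(layer_id, flops / parts)] * parts

# (2) Mapping onto a 4x4 mesh of NoC tiles.
tiles = list(product(range(4), range(4)))
load = {t: 0.0 for t in tiles}

def hops(a, b):
    """Manhattan distance = router hop count on a mesh NoC."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

mapping, prev = [], tiles[0]
for layer_id, work in slices:
    # Least-loaded tile first; among equal loads, stay close to the
    # previous slice's tile so inter-slice traffic crosses fewer routers.
    best = min(tiles, key=lambda t: (load[t], hops(prev, t)))
    load[best] += work
    mapping.append((layer_id, work, best))
    prev = best

for layer_id, work, tile in mapping:
    print(f"layer {layer_id}: {work:6.1f} MFLOPs -> tile {tile}")
```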