Funding: Supported by the National Natural Science Foundation of China under Grant No. 62172132.
Abstract: The surge of large-scale models in recent years has led to breakthroughs in numerous fields, but it has also brought higher computational costs and more complex network architectures. These increasingly large and intricate networks are difficult to deploy and execute, and they also worsen network over-parameterization. To address this issue, various network compression techniques, such as network pruning, have been developed. A typical pruning algorithm follows a three-step pipeline: training, pruning, and retraining. Existing methods often set the pruned filters directly to zero during retraining, which sharply reduces the parameter space. However, this direct pruning strategy frequently causes irreversible information loss. In the early stages of training, a network still contains considerable uncertainty, so estimates of filter importance may not yet be reliable. To manage the pruning process more effectively, this paper proposes a flexible neural network pruning algorithm based on the logistic growth differential equation that takes the characteristics of network training into account. Unlike pruning algorithms that remove filter weights directly, this algorithm introduces a three-stage adaptive weight decay strategy inspired by the logistic growth differential equation: a gentle decay rate in the initial training stage, a rapid decay rate in the intermediate stage, and a slower decay rate as the network converges. In addition, the decay rate at each stage is adjusted adaptively based on the filter weights. Controlling this adaptive decay rate at each stage makes it possible to manage the pruning of neural network filters effectively. In experiments on the CIFAR-10 and ILSVRC-2012 datasets, the method significantly reduces floating-point operations (FLOPs) at the same pruning rate. Specifically, at a 30% pruning rate on ResNet-110, the pruned network reduces FLOPs by 40.8% and also improves classification accuracy by 0.49% compared with the original network.
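The three-stage schedule follows from the shape of the logistic curve. The logistic equation dP/dt = rP(1 - P/K) has the solution P(t) = K / (1 + e^{-r(t - t_mid)}). Its growth rate is small at first, peaks at t_mid (where P = K/2), and falls off again as P approaches K, giving a gentle, then rapid, then slow progression. The minimal Python sketch below assumes the decay is expressed as a multiplier on the original values of filters already marked for pruning. The rate `r`, the midpoint at half of training, and the helper names are hypothetical, and the sketch omits the adaptive, weight-dependent rate adjustment described above.

```python
import math


def logistic_fraction(t: float, r: float, t_mid: float) -> float:
    """Normalized solution P(t)/K of dP/dt = r*P*(1 - P/K), midpoint form."""
    return 1.0 / (1.0 + math.exp(-r * (t - t_mid)))


def decay_multiplier(epoch: int, total_epochs: int, r: float = 0.5) -> float:
    """Hypothetical soft-pruning multiplier for filters marked as unimportant.

    Stays near 1 early (gentle decay), drops fastest around mid-training
    (rapid decay) and flattens near 0 at convergence (slow decay).
    Rescaled so it is exactly 1 at epoch 0 and exactly 0 at the last epoch.
    """
    t_mid = total_epochs / 2
    p0 = logistic_fraction(0, r, t_mid)
    p_end = logistic_fraction(total_epochs, r, t_mid)
    p = logistic_fraction(epoch, r, t_mid)
    return 1.0 - (p - p0) / (p_end - p0)


def apply_soft_decay(filters, prune_mask, epoch, total_epochs):
    """Scale (rather than zero) the original values of marked filters."""
    m = decay_multiplier(epoch, total_epochs)
    return [[w * m for w in f] if marked else list(f)
            for f, marked in zip(filters, prune_mask)]


# Per-epoch drop of the multiplier over a 100-epoch run: slow, fast, slow.
steps = [decay_multiplier(e, 100) - decay_multiplier(e + 1, 100) for e in range(100)]
```

Calling `apply_soft_decay` at each epoch shows the effect: marked filters shrink slowly at first, fastest around the midpoint, and reach zero only at the final epoch. Hard pruning, by contrast, zeroes them immediately.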
Funding: Supported by the National Key Research and Development Program of China (No. 2022ZD0119003) and the National Natural Science Foundation of China (No. 61834005).
Abstract: The dynamic routing mechanism in evolvable networks enables adaptive reconfiguration of topological structures and transmission pathways based on real-time task requirements and data characteristics. However, the greater architectural complexity and larger parameter dimensionality of evolvable networks make them hard to deploy in resource-constrained environments. Because traditional pruning strategies ignore critical paths, they cannot achieve a satisfactory trade-off between accuracy and efficiency. To address this, a critical path retention pruning (CPRP) method is proposed. A deep traversal of the computational graph yields the dependency relationships among nodes. The nodes are then grouped and sorted by their contribution values, and as many redundant operations as possible are removed while the critical path is left unaffected. As a result, computational efficiency improves while higher accuracy is maintained. On the CIFAR benchmark, CPRP-induced pruning incurs an accuracy degradation below 4.00% and outperforms traditional feature-agnostic grouping methods by 8.98% in accuracy on average. At the same time, the pruned model achieves a 2.41× inference speedup, 48.92% parameter compression, and a 53.40% reduction in floating-point operations (FLOPs).
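The abstract does not define the critical path precisely. The minimal sketch below makes three assumptions: the computational graph is a DAG, each node has a contribution score, and the critical path is the source-to-sink path with the largest total contribution. It keeps that path intact and prunes the lowest-scoring nodes elsewhere. The helper names, the scoring, and the use of a single group in place of CPRP's grouping step are all hypothetical.

```python
from graphlib import TopologicalSorter


def critical_path(edges, score, source, sink):
    """Return the source-to-sink path with the largest total node score."""
    preds = {n: set() for n in score}
    for u, v in edges:
        preds[v].add(u)
    best = {n: float("-inf") for n in score}
    parent = {}
    best[source] = score[source]
    # Longest-path relaxation in topological order (valid because it is a DAG).
    for v in TopologicalSorter(preds).static_order():
        for u in preds[v]:
            if best[u] + score[v] > best[v]:
                best[v] = best[u] + score[v]
                parent[v] = u
    if best[sink] == float("-inf"):
        raise ValueError("sink is not reachable from source")
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1]


def cprp_prune(edges, score, source, sink, prune_ratio):
    """Remove the lowest-contribution nodes that are not on the critical path."""
    protected = set(critical_path(edges, score, source, sink))
    # One group only; the real method groups nodes before sorting them.
    candidates = sorted((n for n in score if n not in protected), key=score.__getitem__)
    removed = set(candidates[: int(len(candidates) * prune_ratio)])
    kept_edges = [(u, v) for u, v in edges if u not in removed and v not in removed]
    return removed, kept_edges
```

Because the critical path is never pruned, at least one input-to-output route always survives, so this pruning cannot disconnect the graph.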
Abstract: Network processing plays an important role in the development of the Internet as more and more complex applications are deployed throughout the network. New platforms such as network processors (NPs) incorporate novel architectures to speed up packet processing, and with their advent there is a growing need for an efficient way to study their performance. In this paper, we present SimNP, a tool that provides a flexible platform for simulating a network processing system in order to supply information for workload characterization, architecture development, and application implementation. The simulator models several architectural features commonly employed by NPs, including multiple processing engines (PEs), an integrated network interface and memory controller, and hardware accelerators. The ARM instruction set is emulated and a simple memory model is provided, so applications written in a high-level language such as C can easily be compiled into an executable binary with a common compiler like gcc. New features or modules can also be added to the simulator easily. Experiments show that the simulator provides abundant information for the study of network processing systems.
Abstract: Parkinson's disease is a serious disease that can lead to death. A new dataset for this disease has recently been introduced. This study aims to improve the predictive performance of models for Parkinson's disease diagnosis. In general, deep neural network (DNN) models are designed with a fixed or randomly chosen number of neurons and layers. This study used a growing and pruning approach to analyze how parameters such as the number of neurons and the activation function affect model performance. In other words, it searched for the optimal numbers of hidden layers and neurons and the best activation and optimization functions in order to find the best DNN model. Several models were designed and evaluated. The overall results show that the DNNs performed very well, reaching 99.34% accuracy on the test data, which the study reports as the highest prediction performance so far. This study therefore presents a promising model for more accurate Parkinson's disease diagnosis.
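The abstract does not spell out the growing and pruning procedure. One hypothetical reading is a greedy, layer-by-layer width search: grow a hidden layer while validation performance improves, then shrink it while performance does not drop. In the sketch below, `score` stands in for training and evaluating a network with the given layer widths (higher is better). The step size and width bounds are assumptions. Choosing activation and optimization functions would be an outer loop around this search.

```python
from typing import Callable, List, Sequence, Tuple


def grow_then_prune(score: Callable[[Sequence[int]], float],
                    start: Sequence[int], step: int = 8,
                    min_width: int = 1, max_width: int = 512) -> Tuple[List[int], float]:
    """Hypothetical greedy width search over the hidden layers.

    For each layer: add `step` neurons while the score strictly improves
    (growing), then remove `step` neurons while the score does not get
    worse (pruning, which prefers the smaller layer on ties).
    """
    widths = list(start)
    best = score(widths)
    for i in range(len(widths)):
        # Growing phase for layer i.
        while widths[i] + step <= max_width:
            trial = widths.copy()
            trial[i] += step
            s = score(trial)
            if s <= best:
                break
            widths, best = trial, s
        # Pruning phase for layer i.
        while widths[i] - step >= min_width:
            trial = widths.copy()
            trial[i] -= step
            s = score(trial)
            if s < best:
                break
            widths, best = trial, s
    return widths, best
```

In practice each `score` call would train a model, so the step size trades search cost against resolution.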
Abstract: To address the high computational complexity of the optimal brain surgeon (OBS) process, a pruning algorithm with a penalty OBS process is presented. Compared with sensitivity-based and regularization-based methods, the penalty OBS algorithm avoids the long computation time and low pruning efficiency of the OBS process. It also achieves better generalization and higher pruning accuracy than the Levenberg-Marquardt method.
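For context, the classic OBS step (Hassibi and Stork) ranks each weight w_q by its saliency L_q = w_q^2 / (2 [H^{-1}]_{qq}), where H is the Hessian of the training error. It deletes the weight with the smallest saliency and corrects the remaining weights by δw = -(w_q / [H^{-1}]_{qq}) H^{-1} e_q. The cost of computing and storing H^{-1} is what the penalty variant aims to reduce. The abstract does not describe the penalty term, so the sketch below shows only the standard OBS step, with the inverse Hessian supplied as a plain nested list.

```python
def obs_prune_one(w, h_inv):
    """One classic Optimal Brain Surgeon step.

    w: list of weights; h_inv: inverse Hessian as a list of rows.
    Returns (index removed, its saliency, updated weights).
    """
    n = len(w)
    # Saliency L_q = w_q^2 / (2 [H^-1]_qq): estimated increase in error.
    saliency = [w[q] ** 2 / (2.0 * h_inv[q][q]) for q in range(n)]
    q = min(range(n), key=saliency.__getitem__)
    # Compensating update: delta_w = -(w_q / [H^-1]_qq) * (column q of H^-1).
    scale = w[q] / h_inv[q][q]
    new_w = [w[i] - scale * h_inv[i][q] for i in range(n)]
    new_w[q] = 0.0  # the update zeroes w_q analytically; clear rounding error
    return q, saliency[q], new_w
```

Unlike magnitude pruning, the correction term lets the surviving weights absorb part of the removed weight's role, which is why OBS needs second-order information.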
Funding: Supported by the National Natural Science Foundation of China (62073006) and the Beijing Natural Science Foundation of China (4212032).
Abstract: Deep stochastic configuration networks (DSCNs) produce redundant hidden nodes and connections during training, which complicates their model structures. To address this problem, this paper proposes a double pruning structure design algorithm for DSCNs based on mutual information and relevance. During training, mutual information is used to compute and rank the importance scores of the nodes in each hidden layer, layer by layer. The node pruning rate of each layer is set according to the current depth of the DSCN, nodes that contribute little to the model are deleted, and the network parameters are updated. Once the model completes the configuration procedure, a correlation evaluation strategy is used to rank the global connection weights and delete insignificant connections, after which the network parameters are updated again. Experimental results show that the proposed structure design method effectively reduces the size of a DSCN model and speeds up modeling, with a small loss in accuracy and no need for fine-tuning to restore it. The resulting DSCN model has practical value for regression analysis.
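The node-pruning stage can be sketched as follows, under two assumptions. First, mutual information between each hidden node's output and the target is estimated with a simple equal-width histogram. Second, the pruning rate `base_rate * (layer_idx + 1) / current_depth` depends on depth, but this formula is a hypothetical choice because the abstract does not give one. The later correlation-based connection-pruning stage is not shown.

```python
import math
from collections import Counter


def discretize(xs, bins):
    """Map values to equal-width bin indices 0..bins-1."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # constant input: everything in bin 0
    return [min(int((x - lo) / width), bins - 1) for x in xs]


def mutual_information(xs, ys, bins=8):
    """Histogram estimate of I(X;Y) in nats."""
    a, b = discretize(xs, bins), discretize(ys, bins)
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[i] * pb[j]))
               for (i, j), c in pab.items())


def prune_layer(node_outputs, target, layer_idx, current_depth, base_rate=0.3):
    """Return sorted indices of the hidden nodes to keep in one layer."""
    # Hypothetical depth-dependent rate (not specified in the abstract).
    rate = base_rate * (layer_idx + 1) / current_depth
    scores = [mutual_information(out, target) for out in node_outputs]
    order = sorted(range(len(scores)), key=scores.__getitem__)  # ascending MI
    n_prune = int(len(scores) * rate)
    return sorted(order[n_prune:])
```

A node whose output is constant carries no information about the target (MI of zero), so it is always among the first to be removed.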
Funding: Supported in part by the National Natural Science Foundation of China under grants 62073085, 61973330 and 62350055; in part by the Shenzhen Science and Technology Program, China, under grant JCYJ20230807093513027; and in part by the Fundamental Research Funds for the Central Universities, China, under grant 1243300008.
Abstract: Filter pruning compresses a neural network by reducing both its parameters and its computational cost. Existing pruning methods typically rely on pre-designed pruning criteria to measure filter importance and remove the filters deemed unimportant. However, different layers of a neural network exhibit different filter distributions, so applying the same pruning criterion to all layers is inappropriate. Some approaches apply different criteria from a set of predefined pruning rules to different layers, but the limited set of rules makes it difficult to cover all layers. Manually designing criteria for every layer is costly and hard to generalize to other networks. To solve this problem, we present a novel neural network pruning method based on a Criterion Learner and Attention Distillation (CLAD). Specifically, CLAD develops a differentiable criterion learner that is integrated into each layer of the network. The learner automatically learns an appropriate pruning criterion from the filter parameters of each layer, eliminating the need for manual design. The criterion learner is trained end-to-end by gradient-based optimization to achieve efficient pruning. In addition, attention distillation is introduced into the learner's optimization; it uses the knowledge of the unpruned network to guide the learner and improve the performance of the pruned network. Experiments on various datasets and networks demonstrate the effectiveness of the proposed method. Notably, CLAD reduces the FLOPs of ResNet-110 by about 53% on the CIFAR-10 dataset while improving the network's accuracy by 0.05%. It also reduces the FLOPs of ResNet-50 by about 46% on the ImageNet-1K dataset while maintaining a top-1 accuracy of 75.45%.
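Attention distillation is commonly implemented as attention transfer. Each feature map is collapsed into a spatial attention map by summing |activation|^p over channels, the map is L2-normalized, and the student is penalized by the distance between its maps and the teacher's. The abstract does not give CLAD's exact loss, so the sketch below assumes this standard attention-transfer form. Feature maps are given as [C][H][W] nested lists, and the function names are illustrative.

```python
import math


def attention_map(feature, p=2):
    """Spatial attention: sum over channels of |activation|^p, flattened
    and L2-normalized. `feature` is a [C][H][W] nested list."""
    h, w = len(feature[0]), len(feature[0][0])
    att = [sum(abs(ch[i][j]) ** p for ch in feature)
           for i in range(h) for j in range(w)]
    norm = math.sqrt(sum(a * a for a in att)) or 1.0
    return [a / norm for a in att]


def attention_distillation_loss(student_feats, teacher_feats, p=2):
    """Sum over matched layers of the L2 distance between normalized maps."""
    loss = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        a_s, a_t = attention_map(fs, p), attention_map(ft, p)
        loss += math.sqrt(sum((x - y) ** 2 for x, y in zip(a_s, a_t)))
    return loss
```

Because the maps are summed over channels, the pruned student and the unpruned teacher can have different numbers of filters in a layer, as long as their spatial sizes match.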
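Abstract: New services such as 5G and 8K video keep emerging, making the application scenarios of network processors (NPs) increasingly complex and diverse. Diverse network applications have different requirements for performance, flexibility, and quality-of-service guarantees. To meet them, traditional NPs attempt to integrate large numbers of processor cores, caches, accelerators, and other heterogeneous processing resources on a system on chip (SoC), providing agile customization for diverse application scenarios. However, as the breakdown of Moore's law and Dennard scaling becomes increasingly evident, developing monolithic NP chips faces enormous challenges in development cycle, cost, and innovation iteration, and is becoming less and less sustainable. To address these problems, we propose ChipletNP, a new agile and customizable NP architecture that uses chiplet technology to decouple heterogeneous resources. By making full use of mature chip products and processes and combining multiple chiplets, it meets the need for rapid customization and evolution of NPs across different application scenarios. Based on ChipletNP, we designed and implemented the Yinhe Hengxin agile NP chip (YHHX-NP). It integrates a commercial CPU, an FPGA (field programmable gate array), and an in-house agile switching chiplet. Application deployment and experimental results on this chip show that ChipletNP supports rapid, agile NP customization and can effectively host new network protocols and network functions such as SRv6 (segment routing over IPv6). In particular, the core agile switching chiplet achieves more than twice the energy efficiency of commercial chips of the same class, with latency kept within 2.82 μs. It can therefore effectively support unified chiplet communication and integration for NPs.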
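Abstract: Resistive random access memory (RRAM) supports in-memory computing and is therefore regarded as an efficient basis for neural network accelerators. Pruning compresses a model by removing redundant weights, which saves hardware resources in RRAM-based neural network accelerators. Existing structured pruning methods for RRAM tend to lose accuracy because their pruning granularity is too coarse. They also generally ignore numerical regularities among weights, leaving this latent redundancy unexploited, so they struggle to further improve the compression ratio and hardware efficiency while preserving accuracy. To address this, this paper proposes a weight-reconstruction-based pruning method for memristive neural networks. An integer-scaling-based weight reconstruction strategy extracts and shares numerical commonalities among the weights and discards the numerical components that have little effect on accuracy. Only the key weight information is mapped onto the RRAM crossbar arrays for inference, yielding a compressed weight representation. A progressive retraining mechanism then reintroduces the discarded information as a guiding signal that gradually decays, effectively recovering model accuracy while maintaining the compression ratio and hardware efficiency. Experimental results show that, compared with existing methods, the proposed method improves model compression ratio, area efficiency, and energy efficiency by up to 1.2×, 1.2×, and 1.3×, respectively, with almost no loss in model accuracy.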
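The abstract gives no formulas for the integer-scaling reconstruction or the progressive retraining. The hypothetical sketch below shows one possible interpretation:

- A weight group shares one scale factor `s`, and each weight becomes an integer multiple of `s`.
- Only the high-order bits of each integer are kept for the crossbar.
- The discarded low-order residual is added back during retraining with a coefficient that decays to zero.

The bit widths, the linear decay, and all names are assumptions, not the paper's method.

```python
def reconstruct(weights, bits=8, drop_bits=3):
    """Hypothetical integer-scaling reconstruction of one weight group.

    Returns (shared scale, kept integers, discarded residual integers),
    with kept + residual equal to the full quantized integer.
    """
    # Shared scale: the group's largest magnitude maps to the largest integer.
    s = max(abs(w) for w in weights) / (2 ** (bits - 1) - 1) or 1.0
    q = [round(w / s) for w in weights]
    # Keep the high-order part of each magnitude (low drop_bits cleared).
    kept = [(abs(v) >> drop_bits << drop_bits) * (1 if v >= 0 else -1) for v in q]
    residual = [v - k for v, k in zip(q, kept)]
    return s, kept, residual


def guided_weights(s, kept, residual, epoch, total_epochs):
    """Weights used during progressive retraining: the discarded residual
    is added back with a coefficient that decays linearly to zero."""
    alpha = max(0.0, 1.0 - epoch / total_epochs)
    return [s * (k + alpha * r) for k, r in zip(kept, residual)]
```

At the end of retraining only `s * kept` remains, which is the compressed representation that would be mapped to the crossbar in this interpretation.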