Read-write dependency is an important factor restricting software efficiency. Timing Speculative (TS) is a processing architecture that aims to improve the energy efficiency of microprocessors. The timing error rate, which is influenced by read-write dependency, limits voltage down-scaling and therefore the energy efficiency of TS processors. We proposed a method called Read-Write Dependency Aware Register Allocation. It is based on the concept of the Read-Write Dependency aware Interference Graph (RWDIG). Registers are reallocated to loosen the read-write dependencies, resulting in a reduction of timing errors. The traditional no-operation (Nop) padding method is also redesigned to increase the distance value to above 2. We analyzed the dependencies of registers and maximized the average distance value of read and write dependencies. Experimental results showed that Nop padding can reduce all read-write dependencies, as well as the overhead of timing errors. An energy saving of approximately 7% was achieved.
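The Nop padding idea in the abstract above can be illustrated with a small sketch. The abstract does not define the "distance value", so this assumes it is the number of instruction slots between a write to a register and a later read of that register, and it reads "above 2" as "at least 3". The `Instr` type, the function names, and the `min_distance` parameter are hypothetical, introduced only for illustration. The sketch covers padding only and does not model the RWDIG-based register reallocation.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Toy instruction: one optional destination register, some source registers."""
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...] = ()


NOP = Instr("nop", None, ())


def raw_distances(prog: List[Instr]) -> List[int]:
    """For every register read, return the distance back to the latest write of it."""
    last_write: Dict[str, int] = {}
    distances: List[int] = []
    for i, ins in enumerate(prog):
        for reg in ins.srcs:
            if reg in last_write:
                distances.append(i - last_write[reg])
        if ins.dst is not None:
            last_write[ins.dst] = i
    return distances


def pad_nops(prog: List[Instr], min_distance: int = 3) -> List[Instr]:
    """Insert NOPs so that every write-to-read distance is at least min_distance.

    min_distance=3 is an assumed reading of "above 2" in the abstract.
    """
    out: List[Instr] = []
    last_write: Dict[str, int] = {}
    for ins in prog:
        pos = len(out)
        need = 0  # number of NOPs required before this instruction
        for reg in ins.srcs:
            if reg in last_write:
                need = max(need, min_distance - (pos - last_write[reg]))
        out.extend([NOP] * need)
        if ins.dst is not None:
            last_write[ins.dst] = len(out)  # index this instruction will occupy
        out.append(ins)
    return out


prog = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("mul", "r4", ("r1", "r5")),
    Instr("sub", "r6", ("r4", "r1")),
]
padded = pad_nops(prog)
print(raw_distances(prog))    # distances before padding
print(raw_distances(padded))  # distances after padding
```

Padding alone lengthens the program, which is presumably why the abstract pairs it with register reallocation: reallocation can loosen dependencies without adding instructions.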
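When existing object detection models are deployed on edge devices, there is considerable room to improve the balance between detection performance and inference speed. To address this problem, this paper proposes an object detection model based on YOLO (you only look once) v8 that can be deployed on multiple types of edge devices. In the backbone network, an EC2f (extended coarse-to-fine) structure is designed, which reduces the parameter count and computational complexity while also reducing the volume of data reads and writes. In the neck network, the neck is replaced with that of YOLOv6-3.0, which accelerates model inference while keeping inference accuracy at a good level. In the prediction head, a multi-scale convolutional detection head is designed, which further reduces the model's computational complexity and parameter count. Two versions (n and s scales) are designed to suit different edge devices. Experiments on an X-ray dataset show that the two versions improve inference accuracy by 0.5 and 1.7 percentage points, respectively, over baseline models of the same scale, and improve inference speed by 11.6% and 11.2%, respectively. Generalization tests on other datasets show that the model's inference speed improves by more than 10%, with the accuracy loss kept within 1.3%. The experiments demonstrate that the model achieves a good balance between inference accuracy and speed.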
Funding: This work was supported by the Project of Hunan Social Science Achievement Evaluation Committee (XSP20YBZ090, Sheng Xiao, 2020).