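As an emerging open-source reduced instruction set architecture, RISC-V is one of the keys to processor technology development and innovation in the post-Moore era. Floating-point summation and dot product are basic building blocks of numerical computation and are widely used in many fields. RISC-V currently lacks summation and dot-product algorithms that are both highly accurate and highly efficient, because existing optimization schemes struggle to balance accuracy and efficiency: they either focus on the efficiency of low-precision algorithms or achieve high precision at the cost of efficiency. This paper uses the RISC-V Vector instruction set extension (RVV) to design and implement efficient, high-precision summation and dot-product algorithms based on error-free transformations. First, reduction instructions are avoided to prevent loss of accuracy, and RVV-based vectorized algorithms for the two operations are implemented and optimized. Second, the register configuration parameters are optimized according to the data dependencies in the algorithms. Finally, the core steps of the algorithms are optimized in assembly to increase instruction-level parallelism and improve pipeline utilization. Experimental results show that, compared with the original algorithms for the two operations, the optimized algorithms improve computational efficiency by factors of 4.4 and 4.2, respectively. The optimized algorithms match the accuracy of the quadruple-precision algorithms in the multiple-precision library MPFR while running markedly faster, at a speed comparable to double-precision computation in OpenBLAS.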
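To make the idea of an error-free transformation concrete, the following is a minimal scalar sketch in Python. It is not the RVV implementation described in the abstract: the choice of Knuth's TwoSum, Dekker's TwoProduct with Veltkamp splitting, and the compensated accumulation loops is an assumption about the kind of algorithm meant, and all function names are hypothetical.

```python
def two_sum(a, b):
    """TwoSum: return (s, e) with s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Veltkamp splitting of a double into two non-overlapping halves."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Dekker's TwoProduct: return (p, e) with p = fl(a * b), a * b == p + e.

    Assumes no overflow or underflow in the intermediate products.
    """
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def compensated_sum(values):
    """Sum doubles while accumulating the rounding errors separately."""
    s = 0.0
    c = 0.0
    for x in values:
        s, e = two_sum(s, x)
        c += e
    return s + c


def compensated_dot(xs, ys):
    """Dot product that carries the rounding error of every product and sum."""
    p = 0.0
    c = 0.0
    for x, y in zip(xs, ys):
        h, r = two_prod(x, y)
        p, q = two_sum(p, h)
        c += q + r
    return p + c


if __name__ == "__main__":
    data = [1.0, 1e100, 1.0, -1e100]
    print(sum(data))              # naive left-to-right summation
    print(compensated_sum(data))  # compensated summation
```

In this sketch the naive sum loses both small terms to cancellation, while the compensated version recovers them from the error terms. A vectorized version would keep one running sum and one error accumulator per vector lane, which is consistent with the abstract's remark that reduction instructions are avoided, although the paper's actual register layout is not given here.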
Embedded and Internet of Things (IoT) devices place extremely strict requirements on the area and power consumption of the processor because of the constraints of their working environments. To reduce the overhead of the embedded processor as much as possible, this paper designs and implements a configurable 32-bit in-order RISC-V processor core, named RV16, based on a 16-bit data path and 16-bit units. The evaluation results show that, compared with a traditional 32-bit RISC-V processor with similar features, RV16 consumes fewer hardware resources and less power. The maximum performance of RV16 on the Dhrystone and CoreMark benchmarks is 0.92 DMIPS/MHz and 1.51 CoreMark/MHz, reaching 75% and 71% of the performance of traditional 32-bit processors, respectively. Moreover, a properly configured RV16 also consumes less energy than a traditional 32-bit processor when running a program.
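As a rough illustration of how a 32-bit operation can be carried out on a 16-bit data path, the sketch below adds two 32-bit values in two 16-bit steps with a carry between them. The abstract does not describe how RV16 actually sequences its operations, so the two-step scheme and the function name are assumptions made for illustration only.

```python
MASK16 = 0xFFFF


def add32_on_16bit_datapath(a, b):
    """Add two 32-bit unsigned values using only 16-bit wide additions.

    Hypothetical model: the low halves are added first, and the carry out
    of that step is fed into the addition of the high halves.
    """
    lo = (a & MASK16) + (b & MASK16)
    carry = lo >> 16  # 0 or 1
    hi = ((a >> 16) & MASK16) + ((b >> 16) & MASK16) + carry
    # Any carry out of the high half is dropped, as in 32-bit wraparound.
    return ((hi & MASK16) << 16) | (lo & MASK16)


if __name__ == "__main__":
    print(hex(add32_on_16bit_datapath(0x0001FFFF, 0x00000001)))
```

Under this model a 32-bit add takes two passes through the narrower adder, which is one plausible way to trade peak performance for area and power, in line with the figures reported in the abstract.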
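In recent years, fields such as healthcare have seen rapidly growing demand for classifying massive amounts of electrocardiogram (ECG) data and human motion data. This paper therefore uses the XGBoost algorithm to classify two types of ECG data (sinus rhythm and atrial fibrillation) from the Physionet Cinc 2017 dataset and six types of human motion data (walking, jogging, sitting, waving, squatting, and jumping jacks) from the VeriSilicon IMU dataset, and implements both classification tasks on RISC-V hardware. After a series of data-processing steps, hyperparameter tuning with Bayesian optimization, and training, the ECG model predicts atrial fibrillation and sinus rhythm with accuracies of 86.5% and 94%, respectively, for an overall accuracy of 93%. For human motion data, the model predicts sitting, waving, walking, jumping jacks, jogging, and squatting with accuracies of 93.7%, 73.3%, 93%, 88.7%, 83.2%, and 89.4%, respectively, for an overall accuracy of 87%. Experiments comparing the XGBoost model with SVM (support vector machine) and GBDT (gradient boosting decision tree) models demonstrate the superiority of XGBoost for classifying ECG and human motion data. The classification of ECG and human motion data was then implemented on a Nuclei MCU200T development board based on the Hummingbird v2 E203 RISC-V processor core; experiments show that this hardware platform successfully classifies both kinds of data and outputs the corresponding results.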
Funding: the National Key Research and Development Project of China under Grant No. 2021YFB0300300, the National Natural Science Foundation of China under Grant Nos. 62090023, 61872374, 61672526 and 62172430, and the Natural Science Foundation of Hunan Province of China under Grant No. 2021JJ10052.