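Abstract: RISC-V, an emerging open-source reduced instruction set architecture, is one of the keys to processor development and innovation in the post-Moore era. Floating-point summation and dot product are fundamental building blocks of numerical computation and are widely used in many fields. The RISC-V architecture does not yet have summation and dot-product algorithms that combine high precision with high efficiency, because existing optimization schemes struggle to balance accuracy and efficiency: they either focus on the efficiency of low-precision algorithms or achieve high precision by sacrificing efficiency. This paper uses the RISC-V Vector instruction set extension (RVV) to design and implement efficient, high-precision summation and dot-product algorithms based on error-free transformation techniques. First, reduction instructions are avoided to prevent loss of accuracy, and RVV-based vectorized algorithms for both operations are implemented and optimized. Second, the register configuration parameters are optimized according to the data dependencies in the algorithms. Finally, the core steps of the algorithms are optimized in assembly to increase instruction-level parallelism and improve pipeline utilization. Experimental results show that, compared with the original algorithms for the two operations, the optimized algorithms improve computational efficiency by factors of 4.4 and 4.2, respectively. The optimized algorithms match the accuracy of the quad-precision algorithms in the multiple-precision library MPFR while running significantly faster, and their speed is comparable to that of OpenBLAS double-precision computation.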
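To make the idea of an error-free transformation concrete, the following is a minimal scalar sketch in Python. It is not the paper's RVV implementation: the abstract does not name the exact algorithms, so this assumes the classical TwoSum and TwoProduct transformations and the compensated Sum2/Dot2 schemes commonly built on them. All function names (`two_sum`, `split`, `two_product`, `sum2`, `dot2`) are illustrative, and the code assumes IEEE 754 double precision with round-to-nearest and no overflow or underflow.

```python
def two_sum(a, b):
    """TwoSum: return (s, e) where s is the rounded sum and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Veltkamp splitting: return (hi, lo) with a == hi + lo, each part fitting in about 26 bits."""
    c = 134217729.0 * a  # factor is 2**27 + 1 for IEEE 754 doubles
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_product(a, b):
    """TwoProduct: return (p, e) where p is the rounded product and p + e == a * b exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def sum2(xs):
    """Compensated summation: accumulate rounding errors separately, add them back at the end."""
    s = 0.0
    c = 0.0  # running sum of the rounding errors
    for x in xs:
        s, e = two_sum(s, x)
        c += e
    return s + c


def dot2(xs, ys):
    """Compensated dot product: track the errors of both the products and the running sum."""
    s = 0.0
    c = 0.0  # running sum of the product and summation errors
    for x, y in zip(xs, ys):
        p, ep = two_product(x, y)
        s, es = two_sum(s, p)
        c += ep + es
    return s + c
```

For the input `[1e16, 1.0, -1e16]`, the built-in `sum` returns `0.0` because the `1.0` is lost when it is added to `1e16`, whereas `sum2` returns `1.0`. The error terms are carried per element rather than folded through a single reduction, which is consistent with the abstract's remark that reduction instructions are avoided to preserve accuracy.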
Funding: the National Key Research and Development Project of China under Grant No. 2021YFB0300300; the National Natural Science Foundation of China under Grant Nos. 62090023, 61872374, 61672526, and 62172430; and the Natural Science Foundation of Hunan Province of China under Grant No. 2021JJ10052.
Abstract: Embedded and Internet of Things (IoT) devices impose extremely strict requirements on processor area and power consumption because of the constraints of their working environments. To reduce the overhead of the embedded processor as much as possible, this paper designs and implements a configurable 32-bit in-order RISC-V processor core, named RV16, that is built on a 16-bit data path and 16-bit units. The evaluation results show that, compared with a traditional 32-bit RISC-V processor with similar features, RV16 uses fewer hardware resources and consumes less power. The maximum performance of RV16 on the Dhrystone and CoreMark benchmarks is 0.92 DMIPS/MHz and 1.51 CoreMark/MHz, respectively, reaching 75% and 71% of the performance of traditional 32-bit processors. Moreover, a properly configured RV16 also consumes less energy when running a program than a traditional 32-bit processor.
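As a rough illustration of what executing 32-bit operations on a 16-bit data path involves, the sketch below models a 32-bit addition as two passes through a 16-bit adder, with the carry forwarded from the low half to the high half. The abstract does not describe RV16's microarchitecture, so this is a generic textbook scheme, not the paper's design. The names `add16` and `add32_on_16bit_datapath` are hypothetical.

```python
MASK16 = 0xFFFF


def add16(a, b, carry_in=0):
    """Model of a 16-bit adder: return (16-bit sum, carry out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16


def add32_on_16bit_datapath(a, b):
    """32-bit wrapping add computed as two 16-bit passes (low half first, then high half)."""
    lo, carry = add16(a & MASK16, b & MASK16)
    hi, _ = add16((a >> 16) & MASK16, (b >> 16) & MASK16, carry)  # final carry out is discarded
    return (hi << 16) | lo
```

A narrower data path of this kind saves adder and register-file width at the cost of extra passes per 32-bit operation, which is the general shape of the trade-off the abstract reports: lower area and power, at roughly 71% to 75% of the benchmark performance of a conventional 32-bit core.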