摘要
提出了一种新型的线性脉动阵列结构用来实现基于Montgomery算法的并行模乘运算,对于n位模乘运算,需要2n+11个时钟周期完成,为了减少每一周期内的运算量,在处理单元内部实现了三级流水线结构,使得每一周期的串行运算量仅为一级全加器,同时,由于处理单元间只有局部互连,连线延迟很小,于是这种新结构脉动阵列模乘器能在很高的频率下工作。另一个方面,每个处理单元结构简单,仅由4个全加器和14个触发器构成,对于n位模乘运算,总的规模约为46n+184个门。所以,它在速度和面积上都是优化的,适于VLSI的实现。作为核心运算部件,能有效地用于如RSA等许多公钥密码体制的加解密运算。对于0.8μmCMOS工艺,200MHz时钟是完全可行的,在仅使用一个模乘器条件下,512位模幂乘加解密运算速度能达到129kbit/s。
A novel systolic linear array modular multiplier is presented which ideally performs the parallel modular multiplication based on the algorithm of Montgomery. The total execution time for an n bit modular multiplication is 2n+11 clock cycles. To further increase the throughput the three stage pipeline architecture is adopted inside the processing element, so that every one bit result outputs at one clock cycle when the pipeline is filled. Each pipeline stage only contains the operation of an one bit full adder. Moreover, with the purely nearest neighbor communication, the interconnect delay is also very short. Therefore it can work at a high clock frequency. On the other hand, every processing element is simple, mainly consisting of four full adders and fourteen flip flops. For n bit modular multiplication, the cost of the hardware is 46 n +184 gates. So this novel linear systolic array for modular multiplication is a speed and area optimized system, suitable for the VLSI implementation. It can be used for modular exponentiation which is a kernel operation in many public key cryptosystems such as RSA. With clock frequency of 200MHz by using 0.8μm CMOS technology, the throughput can reach 129kb/s with a single modular multiplier chip.
出处
《清华大学学报(自然科学版)》
EI
CAS
CSCD
北大核心
1998年第3期11-15,共5页
Journal of Tsinghua University(Science and Technology)
基金
国家自然科学重点基金
关键词
脉动阵列
模乘运算
模幂乘运算
公钥密码体制
systolic array
modular multiplication
modular exponentiation
pipeline architecture
public key cryptosystem