Funding: Supported in part by the open project foundation of the State Key Laboratory of Cryptology; the National Natural Science Foundation of China (NSFC) under Grants No. 61272492, No. 61572521 and No. 61309008; and the Natural Science Foundation for Young of Shaanxi Province under Grant No. 2013JQ8013.
Abstract: Matrix multiplication plays a pivotal role in symmetric cipher algorithms, but it is one of their most complex and time-consuming operations, and its performance directly affects the efficiency of the cipher. Based on the characteristics of VLIW processors and of matrix multiplication in symmetric cipher algorithms, this paper extracts the reconfigurable elements, analyzes the principle of matrix multiplication, and designs a reconfigurable matrix multiplication architecture for a VLIW processor. It then proposes single instructions for matrix multiplication between a 4×1 and a 4×4 matrix, or between two 4×4 matrices, over GF(2^8); through instruction extension, these instructions can support operations of larger dimension. Experiments show that the designed instructions support matrix multiplication of different dimensions and greatly improve the processing speed of multiplication.
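As an illustration of the operation those instructions accelerate, the following is a minimal software sketch of 4×4 by 4×1 and 4×4 by 4×4 multiplication over GF(2^8). The abstract does not name a reduction polynomial, so the AES polynomial 0x11B is assumed here, and every function and constant name (`gf_mul`, `mat_vec_mul`, `mat_mat_mul`, `MIX_COLUMNS`) is hypothetical rather than taken from the paper.

```python
# Software sketch of matrix multiplication over GF(2^8).
# Assumption: the field is reduced by the AES polynomial; the paper's
# hardware may use a different, or a configurable, polynomial.
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b, poly=AES_POLY):
    """Multiply two GF(2^8) elements by shift-and-add, reducing modulo poly."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:            # degree reached 8: reduce
            a ^= poly
        b >>= 1
    return result


def mat_vec_mul(matrix, vector):
    """Multiply an n×n matrix by an n×1 column vector over GF(2^8)."""
    out = []
    for row in matrix:
        acc = 0
        for m, v in zip(row, vector):
            acc ^= gf_mul(m, v)
        out.append(acc)
    return out


def mat_mat_mul(a, b):
    """Multiply two matrices over GF(2^8), one column of b at a time."""
    columns = [mat_vec_mul(a, [row[j] for row in b]) for j in range(len(b[0]))]
    return [list(r) for r in zip(*columns)]  # transpose columns back to rows


# The AES MixColumns matrix, used only as a familiar example input.
MIX_COLUMNS = [
    [2, 3, 1, 1],
    [1, 2, 3, 1],
    [1, 1, 2, 3],
    [3, 1, 1, 2],
]

if __name__ == "__main__":
    column = [0xDB, 0x13, 0x53, 0x45]
    print([hex(x) for x in mat_vec_mul(MIX_COLUMNS, column)])
```

A single 4×4 by 4×1 product needs 16 field multiplications and 12 XORs in this form, which is the work a dedicated single instruction would replace.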
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61404175.
Abstract: An efficient and flexible implementation of block ciphers is critical to information security processing. Existing implementation methods, such as GPPs, FPGAs and cryptographic application-specific ASICs, provide a broad range of support, but they cannot achieve a good tradeoff between high-speed processing and flexibility. In this paper, we present a reconfigurable VLIW processor architecture targeted at block cipher processing, analyze the basic operations and storage characteristics of block ciphers, and propose a multi-cluster register-file structure for them. For operation elements shared among block ciphers, we apply reconfigurable technology to multiple cryptographic processing units and to the interconnection scheme. The proposed processor not only flexibly combines multiple basic cryptographic operations, but also supports dynamic configuration of the cryptographic processing units. It has been implemented in 0.18 μm CMOS technology; test results show that the frequency can reach 350 MHz and that power consumption is 420 mW. Ten block and hash ciphers were realized on the processor. The encryption throughput of the AES, DES, IDEA and SHA-1 algorithms is 1554 Mbps, 448 Mbps, 785 Mbps and 424 Mbps, respectively. The test results show that the processor's encryption performance is significantly higher than that of other designs.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61404175.
Abstract: Stream ciphers are an important branch of information security algorithms, and implementing them efficiently and flexibly is vital. Existing implementation methods, such as FPGAs, GPPs and ASICs, provide good support, but they cannot achieve a good tradeoff between high-speed processing and high flexibility: ASICs are fast but inflexible, GPPs are flexible but slow, and FPGAs offer both flexibility and speed but have very low resource utilization. This paper studies a stream cryptographic processor that can efficiently and flexibly implement a variety of stream cipher algorithms. By analyzing the structure model, processing characteristics and storage characteristics of stream ciphers, a reconfigurable VLIW-based stream cryptographic processor with special instructions is presented; it has a separate/clustered storage structure and is oriented to stream cipher operations. The proposed instruction structure effectively supports stream cipher processing with multiple data bit widths, parallelism among stream cipher operations of different data bit widths, and parallelism between branch control and stream cipher processing, with high instruction-level parallelism. The separate/clustered special bit registers, general register files and key register files satisfy cryptographic requirements. The proposed processor can therefore flexibly combine multiple basic stream cipher operations to implement complete stream cipher algorithms. It has been implemented in 0.18 μm CMOS technology; test results show that the frequency can reach 200 MHz and that power consumption is 310 mW. Ten stream ciphers were realized on the processor. The key stream generation throughput of the Grain-80, W7, MICKEY, ACHTERBAHN and Shrink algorithms is 100 Mbps, 66.67 Mbps, 66.67 Mbps, 50 Mbps and 800 Mbps, respectively. The test results show that the presented processor achieves a good tradeoff between high performance and flexibility for stream ciphers.
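Abstract: VLIW (Very Long Instruction Word) instructions contain many no-operation (NOP) slots, which causes severe code-size bloat; code compression is an effective remedy. VLIW code compression must solve three key problems: improving the compression ratio, reducing the performance impact of decompression, and relocating branch targets. Based on the characteristics of VLIW instructions on stream architectures, this paper proposes two-dimensional compression, which compresses VLIW code in both the vertical and the horizontal direction; horizontal decompression can proceed in parallel with code execution, and a stack register is used to cache loop entry addresses. Experimental results show that two-dimensional compression effectively solves the VLIW code bloat problem, reducing the area of the instruction memory by 36.48% and the area of the whole CISP system by 7.85%.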
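The abstract does not describe the encoding used, so the sketch below shows only a generic mask-based NOP elimination scheme for a single instruction word, which is one common way to compress VLIW code horizontally. It is not the paper's two-dimensional method; the `NOP` encoding and the names `compress_word` and `decompress_word` are assumptions made for illustration.

```python
# Generic horizontal NOP elimination for one VLIW word.
# Assumption: a slot value of 0 encodes a NOP; real encodings differ.
NOP = 0


def compress_word(slots):
    """Return (mask, ops): bit i of mask is set when slot i holds a real operation."""
    mask = 0
    ops = []
    for i, op in enumerate(slots):
        if op != NOP:
            mask |= 1 << i
            ops.append(op)
    return mask, ops


def decompress_word(mask, ops, width):
    """Rebuild the full-width word by re-inserting NOPs where the mask bit is clear."""
    remaining = iter(ops)
    return [next(remaining) if (mask >> i) & 1 else NOP for i in range(width)]


if __name__ == "__main__":
    word = [NOP, 7, NOP, 9]          # 4-slot word with two useful operations
    mask, ops = compress_word(word)
    print(bin(mask), ops)
    print(decompress_word(mask, ops, len(word)))
```

Because each word is rebuilt from its own mask and operation list, this kind of expansion can be done one word at a time in the fetch path.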
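Abstract: Cryptographic processors often adopt a clustered VLIW (Very Long Instruction Word) architecture, whose performance depends on the compiler implementation. Existing compiler back-end optimization schemes for general-purpose VLIW architectures are, to varying degrees, ill-suited to cryptographic processors. This paper therefore proposes a compiler back-end optimization method for cryptographic processors that performs cluster assignment, instruction scheduling and register allocation simultaneously. The method builds "definition-use" chains and intersects each variable's candidate register-type sets to determine its register type; evaluates the available resources in real time to perform priority-based instruction selection and cluster assignment based on balanced register pressure; and improves the linear scan algorithm to allocate registers in real time from each variable's list of "pending reference counts". Experimental results show that the method improves the performance of the generated code and, because the algorithm is non-heuristic, reduces compilation time.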
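The register-type step can be sketched as a set intersection over the instructions on a variable's definition-use chain. This is only a guess at the shape of that step: the register-type names, the preference order used to break ties, and the function `resolve_register_type` are all hypothetical and do not come from the paper.

```python
# Sketch of choosing a register type by intersecting candidate sets.
# Assumption: each instruction that defines or uses the variable reports
# the set of register types it can accept for that operand.


def resolve_register_type(candidate_sets, preference):
    """Return the first type in `preference` accepted by every instruction, else None."""
    if not candidate_sets:
        return None
    common = set.intersection(*(set(s) for s in candidate_sets))
    for reg_type in preference:
        if reg_type in common:
            return reg_type
    return None


if __name__ == "__main__":
    # One defining instruction and two uses of the same variable.
    chain = [{"general", "key"}, {"key", "bit"}, {"key"}]
    print(resolve_register_type(chain, ["general", "key", "bit"]))
```

An empty intersection returns `None` here; a real compiler would have to insert a move between register types in that case.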