Realizing a high-performance and energy-efficient circuit system is one of the critical tasks for circuit designers.Conventional researchers always concentrated on the tradeoffs between the energy and the performance ...Realizing a high-performance and energy-efficient circuit system is one of the critical tasks for circuit designers.Conventional researchers always concentrated on the tradeoffs between the energy and the performance in circuit and system design based on accurate computing.However,as video/image processing and machine learning algorithms are widespread,the technique of approximate computing in these applications has become a hot topic.The errors caused by approximate computing could be tolerated by these applications with specific processing or algorithms,and large improvements in performance or power savings could be achieved with some acceptable loss in final output quality.This paper presents a survey of approximate computing from arithmetic units design to high-level applications,in which we try to give researchers a comprehensive and insightful understanding of approximate computing.We believe that approximate computing will play an important role in the circuit and system design in the future,especially with the rapid development of artificial intelligence algorithms and their related applications.展开更多
Arithmetic Logic Unit(ALU) as one of the main parts of any computing hardware plays an important role in digital computers. In quantum computers which can be realized by reversible logics and circuits, reversible ALUs...Arithmetic Logic Unit(ALU) as one of the main parts of any computing hardware plays an important role in digital computers. In quantum computers which can be realized by reversible logics and circuits, reversible ALUs should be designed. In this paper, we proposed three different designs for reversible 1-bit ALUs using our proposed 3×3 and 4×4 reversible gates called MEB3 and MEB4(Moallem Ehsanpour Bolhasani) gates, respectively. The first proposed reversible ALU consists of six logical operations. The second proposed ALU consists of eight operations, two arithmetic, and six logical operations. And finally, the third proposed ALU consists of sixteen operations, four arithmetic operations, and twelve logical operations. Our proposed ALUs can be used to construct efficient quantum computers in nanotechnology, because the proposed designs are better than the existing designs in terms of quantum cost, constant input, reversible gates used, hardware complexity, and functions generated.展开更多
A design of a high-speed multi-core processor with compact size is a trending approach in the Integrated Circuits(ICs)fabrication industries.Because whenever device size comes down into narrow,designers facing many po...A design of a high-speed multi-core processor with compact size is a trending approach in the Integrated Circuits(ICs)fabrication industries.Because whenever device size comes down into narrow,designers facing many power den-sity issues should be reduced by scaling threshold voltage and supply voltage.Initially,Complementary Metal Oxide Semiconductor(CMOS)technology sup-ports power saving up to 32 nm gate length,but further scaling causes short severe channel effects such as threshold voltage swing,mobility degradation,and more leakage power(less than 32)at gate length.Hence,it directly affects the arithmetic logic unit(ALU),which suffers a significant power density of the scaled multi-core architecture.Therefore,it losses reliability features to get overheating and increased temperature.This paper presents a novel power mini-mization technique for active 4-bit ALU operations using Fin Field Effect Tran-sistor(FinFET)at 22 nm technology.Based on this,a diode is directly connected to the load transistor,and it is active only at the saturation region as a function.Thereby,the access transistor can cutoff of the leakage current,and sleep transis-tors control theflow of leakage current corresponding to each instant ALU opera-tion.The combination of transistors(access and sleep)reduces the leakage current from micro to nano-ampere.Further,the power minimization is achieved by con-necting the number of transistors(6T and 10T)of the FinFET structure to ALU with 22 nm technology.For simulation concerns,a Tanner(T-Spice)with 22 nm technology implements the proposed design,which reduces threshold vol-tage swing,supply power,leakage current,gate length delay,etc.As a result,it is quite suitable for the ALU architecture of a high-speed multi-core processor.展开更多
Two optimization technologies, namely, bypass and carry-control optimization, were demonstrated for enhancing the performance of a bit-slice Arithmetic Logic Unit (ALU) in 2n-bit Rapid Single-Flux-Quantum (RSFQ) micro...Two optimization technologies, namely, bypass and carry-control optimization, were demonstrated for enhancing the performance of a bit-slice Arithmetic Logic Unit (ALU) in 2n-bit Rapid Single-Flux-Quantum (RSFQ) microprocessors. These technologies can not only shorten the calculation time but also solve data hazards. Among them, the proposed bypass technology is applicable to any 2n-bit ALU, whether it is bit-serial, bit-slice or bit-parallel. The high performance bit-slice ALU was implemented using the 6 kA/cm^(2) Nb/AlOx/Nb junction fabrication process from Superconducting Electronics Facility of Shanghai Institute of Microsystem and Information Technology. It consists of 1693 Josephson junctions with an area of 2.46 0.81 mm^(2). All ALU operations of the MIPS32 instruction set are implemented, including two extended instructions, i.e., addition with carry (ADDC) and subtraction with borrow (SUBB). All the ALU operations were successfully obtained in SFQ testing based on OCTOPUX and the measured DC bias current margin can reach 86% - 104%. The ALU achieves a 100 utilization rate, regardless of carry/borrow read-after-write correlations between instructions.展开更多
Register transfer level mapping (RTLM) algorithm for technology mapping at RT level is presented,which supports current design methodologies using high level design and design reuse.The mapping rules implement a sou...Register transfer level mapping (RTLM) algorithm for technology mapping at RT level is presented,which supports current design methodologies using high level design and design reuse.The mapping rules implement a source ALU using target ALU.The source ALUs and the target ALUs are all represented by the general ALUs and the mapping rules are applied in the algorithm.The mapping rules are described in a table fashion.The graph clustering algorithm is a branch and bound algorithm based on the graph formulation of the mapping algorithm.The mapping algorithm suits well mapping of regularly structured data path.Comparisons are made between the experimental results generated by 1 greedy algorithm and graphclustering algorithm,showing the feasibility of presented algorithm.展开更多
Recently, security in embedded system arises attentions because of modern electronic devices need cau- tiously either exchange or communicate with the sensitive data. Although security is classical research topic in...Recently, security in embedded system arises attentions because of modern electronic devices need cau- tiously either exchange or communicate with the sensitive data. Although security is classical research topic in world- wide communication, the researchers still face the problems of how to deal with these resource constraint devices and en- hance the features of assurance and certification. Therefore, some computations of cryptographic algorithms are built on hardware platforms, such as field program gate arrays (FPGAs). The commonly used cryptographic algorithms for digital signature algorithm (DSA) are rivest-shamir-adleman (RSA) and elliptic curve cryptosystems (ECC) which based on the presumed difficulty of factoring large integers and the algebraic structure of elliptic curves over finite fields. Usu- ally, RSA is computed over GF(p), and ECC is computed over GF(p) or GF(2P). Moreover, embedded applications need advance encryption standard (AES) algorithms to pro- cess encryption and decryption procedures. In order to reuse the hardware resources and meet the trade-off between area and performance, we proposed a new triple functional arith- metic unit for computing high radix RSA and ECC operations over GF(p) and GF(2P), which also can be extended to support AES operations. A new high radix signed digital (SD) adder has been proposed to eliminate the carry propagations over GF(p). The proposed unified design took up 28.7% less hardware resources than implementing RSA, ECC, and AES individually, and the experimental results show that our proposed architecture can achieve 141.8 MHz using approxi- mately 5.5k CLBs on Virtex-5 FPGA.展开更多
基金supported by the Fundamental Research Funds for the Central Universities of China under Grant No.BLX202015Beijing Municipal Natural Science Foundation under Grant No.6222038the National Natural Science Foundation of China under Grant No.92164203.
文摘Realizing a high-performance and energy-efficient circuit system is one of the critical tasks for circuit designers.Conventional researchers always concentrated on the tradeoffs between the energy and the performance in circuit and system design based on accurate computing.However,as video/image processing and machine learning algorithms are widespread,the technique of approximate computing in these applications has become a hot topic.The errors caused by approximate computing could be tolerated by these applications with specific processing or algorithms,and large improvements in performance or power savings could be achieved with some acceptable loss in final output quality.This paper presents a survey of approximate computing from arithmetic units design to high-level applications,in which we try to give researchers a comprehensive and insightful understanding of approximate computing.We believe that approximate computing will play an important role in the circuit and system design in the future,especially with the rapid development of artificial intelligence algorithms and their related applications.
文摘Arithmetic Logic Unit(ALU) as one of the main parts of any computing hardware plays an important role in digital computers. In quantum computers which can be realized by reversible logics and circuits, reversible ALUs should be designed. In this paper, we proposed three different designs for reversible 1-bit ALUs using our proposed 3×3 and 4×4 reversible gates called MEB3 and MEB4(Moallem Ehsanpour Bolhasani) gates, respectively. The first proposed reversible ALU consists of six logical operations. The second proposed ALU consists of eight operations, two arithmetic, and six logical operations. And finally, the third proposed ALU consists of sixteen operations, four arithmetic operations, and twelve logical operations. Our proposed ALUs can be used to construct efficient quantum computers in nanotechnology, because the proposed designs are better than the existing designs in terms of quantum cost, constant input, reversible gates used, hardware complexity, and functions generated.
文摘A design of a high-speed multi-core processor with compact size is a trending approach in the Integrated Circuits(ICs)fabrication industries.Because whenever device size comes down into narrow,designers facing many power den-sity issues should be reduced by scaling threshold voltage and supply voltage.Initially,Complementary Metal Oxide Semiconductor(CMOS)technology sup-ports power saving up to 32 nm gate length,but further scaling causes short severe channel effects such as threshold voltage swing,mobility degradation,and more leakage power(less than 32)at gate length.Hence,it directly affects the arithmetic logic unit(ALU),which suffers a significant power density of the scaled multi-core architecture.Therefore,it losses reliability features to get overheating and increased temperature.This paper presents a novel power mini-mization technique for active 4-bit ALU operations using Fin Field Effect Tran-sistor(FinFET)at 22 nm technology.Based on this,a diode is directly connected to the load transistor,and it is active only at the saturation region as a function.Thereby,the access transistor can cutoff of the leakage current,and sleep transis-tors control theflow of leakage current corresponding to each instant ALU opera-tion.The combination of transistors(access and sleep)reduces the leakage current from micro to nano-ampere.Further,the power minimization is achieved by con-necting the number of transistors(6T and 10T)of the FinFET structure to ALU with 22 nm technology.For simulation concerns,a Tanner(T-Spice)with 22 nm technology implements the proposed design,which reduces threshold vol-tage swing,supply power,leakage current,gate length delay,etc.As a result,it is quite suitable for the ALU architecture of a high-speed multi-core processor.
基金Strategic Priority Research Program of Chinese Academy of Sciences,under Grant XDA18000000.
文摘Two optimization technologies, namely, bypass and carry-control optimization, were demonstrated for enhancing the performance of a bit-slice Arithmetic Logic Unit (ALU) in 2n-bit Rapid Single-Flux-Quantum (RSFQ) microprocessors. These technologies can not only shorten the calculation time but also solve data hazards. Among them, the proposed bypass technology is applicable to any 2n-bit ALU, whether it is bit-serial, bit-slice or bit-parallel. The high performance bit-slice ALU was implemented using the 6 kA/cm^(2) Nb/AlOx/Nb junction fabrication process from Superconducting Electronics Facility of Shanghai Institute of Microsystem and Information Technology. It consists of 1693 Josephson junctions with an area of 2.46 0.81 mm^(2). All ALU operations of the MIPS32 instruction set are implemented, including two extended instructions, i.e., addition with carry (ADDC) and subtraction with borrow (SUBB). All the ALU operations were successfully obtained in SFQ testing based on OCTOPUX and the measured DC bias current margin can reach 86% - 104%. The ALU achieves a 100 utilization rate, regardless of carry/borrow read-after-write correlations between instructions.
文摘Register transfer level mapping (RTLM) algorithm for technology mapping at RT level is presented,which supports current design methodologies using high level design and design reuse.The mapping rules implement a source ALU using target ALU.The source ALUs and the target ALUs are all represented by the general ALUs and the mapping rules are applied in the algorithm.The mapping rules are described in a table fashion.The graph clustering algorithm is a branch and bound algorithm based on the graph formulation of the mapping algorithm.The mapping algorithm suits well mapping of regularly structured data path.Comparisons are made between the experimental results generated by 1 greedy algorithm and graphclustering algorithm,showing the feasibility of presented algorithm.
基金This work was supported by National Natural Science Foundation of China (Grant No. 61173036) and the Fundamental Research Funds for Chinese Central Universities.
文摘Recently, security in embedded system arises attentions because of modern electronic devices need cau- tiously either exchange or communicate with the sensitive data. Although security is classical research topic in world- wide communication, the researchers still face the problems of how to deal with these resource constraint devices and en- hance the features of assurance and certification. Therefore, some computations of cryptographic algorithms are built on hardware platforms, such as field program gate arrays (FPGAs). The commonly used cryptographic algorithms for digital signature algorithm (DSA) are rivest-shamir-adleman (RSA) and elliptic curve cryptosystems (ECC) which based on the presumed difficulty of factoring large integers and the algebraic structure of elliptic curves over finite fields. Usu- ally, RSA is computed over GF(p), and ECC is computed over GF(p) or GF(2P). Moreover, embedded applications need advance encryption standard (AES) algorithms to pro- cess encryption and decryption procedures. In order to reuse the hardware resources and meet the trade-off between area and performance, we proposed a new triple functional arith- metic unit for computing high radix RSA and ECC operations over GF(p) and GF(2P), which also can be extended to support AES operations. A new high radix signed digital (SD) adder has been proposed to eliminate the carry propagations over GF(p). The proposed unified design took up 28.7% less hardware resources than implementing RSA, ECC, and AES individually, and the experimental results show that our proposed architecture can achieve 141.8 MHz using approxi- mately 5.5k CLBs on Virtex-5 FPGA.