Virtual memory management is always a very essential issue of the modern microprocessor design. A memory management unit (MMU) is designed to implement a virtual machine for user programs, and provides a management me...Virtual memory management is always a very essential issue of the modern microprocessor design. A memory management unit (MMU) is designed to implement a virtual machine for user programs, and provides a management mechanism between the operating system and user programs. This paper analyzes the tradeoffs considered in the MMU design of Unity 11 CPU of Peking University, and introduces in detail the solution of pure hardware table walking with two level page table organization. The implementation takes care of required operations and high performances needed by modern operating systems and low costs needed by embedded systems. This solution has been silicon proven, and successfully porting the Linux 2.4.17 kernel, the XWindow system, GNOME and most application software onto the Unity platform.展开更多
Microprocessor development emphasizes hardware and software co design. Hw/Sw co design is a modern technique aimed at shortening the time to market in designing the real time and embedded systems. Key feature of this ...Microprocessor development emphasizes hardware and software co design. Hw/Sw co design is a modern technique aimed at shortening the time to market in designing the real time and embedded systems. Key feature of this approach is simultaneous development of the program tools and the target processor to match software application. An effective co design flow must therefore support automatic software toolkits generation, without loss of optimizing efficiency. This has resulted in a paradigm shift towards a language based design methodology for microprocessor optimization and exploration. This paper proposes a formal grammar, UNI SPEC, which supports the automatic generation of assemblers, to describe the translation rules from assembly to binary. Based on UNI SPEC, it implements two typical applications, i.e., automatically generating the assembler and the test suites.展开更多
Recently, deep learning processors have become one of the most promising solutions of accelerating deep learning algorithms. Currently, the only method of programming the deep learning processors is through writing as...Recently, deep learning processors have become one of the most promising solutions of accelerating deep learning algorithms. Currently, the only method of programming the deep learning processors is through writing assembly instructions by bare hands, which costs a lot of programming efforts and causes very low efficiency. One solution is to integrate the deep learning processors as a new back-end into one prevalent high-level deep learning framework (e.g., TPU (tensor processing unit) is integrated into Tensorflow directly). However, this will obstruct other frameworks to profit from the programming interface, The alternative approach is to design a framework-independent low-level library for deep learning processors (e.g., the deep learning library for GPU, cuDNN). In this fashion, the library could be conveniently invoked in high-level programming frameworks and provides more generality. In order to allow more deep learning frameworks to gain benefits from this environment, we envision it as a low-level library which could be easily embedded into current high-level frameworks and provide high performance. Three major issues of designing such a library are discussed. The first one is the design of data structures. Data structures should be as few as possible while being able to support all possible operations. This will allow us to optimize the data structures easier without compromising the generality. The second one is the selection of operations, which should provide a rather wide range of operations to support various types of networks with high efficiency. The third is the design of the API, which should provide a flexible and user-friendly programming model and should be easy to be embedded into existing deep learning frameworks. Considering all the above issues, we propose DLPIib, a tensor-filter based library designed specific for deep learning processors. It contains two major data structures, tensor and filter, and a set of operators including basic neural network primitives and matrix/vector operations. It provides a descriptor-based API exposed as a C++ interface. The library achieves a speedup of 0.79x compared with the performance of hand-written assembly instructions.展开更多
Due to the decreasing threshold voltages, shrinking feature size, as well as the exponential growth of on-chip transistors, modern processors are increasingly vulnerable to soft errors. However, traditional mechanisms...Due to the decreasing threshold voltages, shrinking feature size, as well as the exponential growth of on-chip transistors, modern processors are increasingly vulnerable to soft errors. However, traditional mechanisms of soft error mitigation take actions to deal with soft errors only after they have been detected. Instead of the passive responses, this paper proposes a novel mechanism which proactively prevents from the occurrence of soft errors via architecture elasticity. In the light of a predictive model, we adapt the processor architectures h01istically and dynamically. The predictive model provides the ability to quickly and accurately predict the simulation target across different program execution phases on any architecture configurations by leveraging an artificial neural network model. Experimental results on SPEC CPU 2000 benchmarks show that our method inherently reduces the soft error rate by 33.2% and improves the energy efficiency by 18.3% as compared with the static configuration processor.展开更多
文摘Virtual memory management is always a very essential issue of the modern microprocessor design. A memory management unit (MMU) is designed to implement a virtual machine for user programs, and provides a management mechanism between the operating system and user programs. This paper analyzes the tradeoffs considered in the MMU design of Unity 11 CPU of Peking University, and introduces in detail the solution of pure hardware table walking with two level page table organization. The implementation takes care of required operations and high performances needed by modern operating systems and low costs needed by embedded systems. This solution has been silicon proven, and successfully porting the Linux 2.4.17 kernel, the XWindow system, GNOME and most application software onto the Unity platform.
文摘Microprocessor development emphasizes hardware and software co design. Hw/Sw co design is a modern technique aimed at shortening the time to market in designing the real time and embedded systems. Key feature of this approach is simultaneous development of the program tools and the target processor to match software application. An effective co design flow must therefore support automatic software toolkits generation, without loss of optimizing efficiency. This has resulted in a paradigm shift towards a language based design methodology for microprocessor optimization and exploration. This paper proposes a formal grammar, UNI SPEC, which supports the automatic generation of assemblers, to describe the translation rules from assembly to binary. Based on UNI SPEC, it implements two typical applications, i.e., automatically generating the assembler and the test suites.
基金This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61432016, 61472396, 61473275, 61522211, 61532016, 61521092, 61502446, 61672491, 61602441, and 61602446, the National Basic Research 973 Program of China under Grant No. 2015CB358800, and the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDB02040009.
文摘Recently, deep learning processors have become one of the most promising solutions of accelerating deep learning algorithms. Currently, the only method of programming the deep learning processors is through writing assembly instructions by bare hands, which costs a lot of programming efforts and causes very low efficiency. One solution is to integrate the deep learning processors as a new back-end into one prevalent high-level deep learning framework (e.g., TPU (tensor processing unit) is integrated into Tensorflow directly). However, this will obstruct other frameworks to profit from the programming interface, The alternative approach is to design a framework-independent low-level library for deep learning processors (e.g., the deep learning library for GPU, cuDNN). In this fashion, the library could be conveniently invoked in high-level programming frameworks and provides more generality. In order to allow more deep learning frameworks to gain benefits from this environment, we envision it as a low-level library which could be easily embedded into current high-level frameworks and provide high performance. Three major issues of designing such a library are discussed. The first one is the design of data structures. Data structures should be as few as possible while being able to support all possible operations. This will allow us to optimize the data structures easier without compromising the generality. The second one is the selection of operations, which should provide a rather wide range of operations to support various types of networks with high efficiency. The third is the design of the API, which should provide a flexible and user-friendly programming model and should be easy to be embedded into existing deep learning frameworks. Considering all the above issues, we propose DLPIib, a tensor-filter based library designed specific for deep learning processors. It contains two major data structures, tensor and filter, and a set of operators including basic neural network primitives and matrix/vector operations. It provides a descriptor-based API exposed as a C++ interface. The library achieves a speedup of 0.79x compared with the performance of hand-written assembly instructions.
基金supported by the National Science and Technology Major Project under Grant Nos.2009ZX01028-002-003,2009ZX01029-001-003the National Natural Science Foundation of China under Grant Nos.61221062,61100163,61133004,61232009,61222204,61221062,61303158+1 种基金the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No.XDA06010403the Ten Thousand Talent Program of China
文摘Due to the decreasing threshold voltages, shrinking feature size, as well as the exponential growth of on-chip transistors, modern processors are increasingly vulnerable to soft errors. However, traditional mechanisms of soft error mitigation take actions to deal with soft errors only after they have been detected. Instead of the passive responses, this paper proposes a novel mechanism which proactively prevents from the occurrence of soft errors via architecture elasticity. In the light of a predictive model, we adapt the processor architectures h01istically and dynamically. The predictive model provides the ability to quickly and accurately predict the simulation target across different program execution phases on any architecture configurations by leveraging an artificial neural network model. Experimental results on SPEC CPU 2000 benchmarks show that our method inherently reduces the soft error rate by 33.2% and improves the energy efficiency by 18.3% as compared with the static configuration processor.