The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks ...The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks with millions, or more, of vertices. The MATLAB language, with its mass of statistical functions, is a good choice to rapidly realize an algorithm prototype of complex networks. The performance of the MATLAB codes can be further improved by using graphic processor units (GPU). This paper presents the strategies and performance of the GPU implementation of a complex networks package, and the Jacket toolbox of MATLAB is used. Compared with some commercially available CPU implementations, GPU can achieve a speedup of, on average, 11.3x. The experimental result proves that the GPU platform combined with the MATLAB language is a good combination for complex network research.展开更多
Virtualization is the key technology of cloud computing. Network virtualization plays an important role in this field. Its performance is very relevant to network virtualizing. Nowadays its implementations are mainly ...Virtualization is the key technology of cloud computing. Network virtualization plays an important role in this field. Its performance is very relevant to network virtualizing. Nowadays its implementations are mainly based on the idea of Software Define Network (SDN). Open vSwitch is a sort of software virtual switch, which conforms to the OpenFlow protocol standard. It is basically deployed in the Linux kernel hypervisor. This leads to its performance relatively poor because of the limited system resource. In turn, the packet process throughput is very low.In this paper, we present a Cavium-based Open vSwitch implementation. The Cavium platform features with multi cores and couples of hard ac-celerators. It supports zero-copy of packets and handles packet more quickly. We also carry some experiments on the platform. It indicates that we can use it in the enterprise network or campus network as convergence layer and core layer device.展开更多
Network processors (NPs) are widely used for programmable and high-performance networks;however, the programs for NPs are less portable, the number of NP program developers is small, and the development cost is high. ...Network processors (NPs) are widely used for programmable and high-performance networks;however, the programs for NPs are less portable, the number of NP program developers is small, and the development cost is high. To solve these problems, this paper proposes an open, high-level, and portable programming language called “Phonepl”, which is independent from vendor-specific proprietary hardware and software but can be translated into an NP program with high performance especially in the memory use. A common NP hardware feature is that a whole packet is stored in DRAM, but the header is cached in SRAM. Phonepl has a hardware-independent abstraction of this feature so that it allows programmers mostly unconscious of this hardware feature. To implement the abstraction, four representations of packet data type that cover all the packet operations (including substring, concatenation, input, and output) are introduced. Phonepl have been implemented on Octeon NPs used in plug-ins for a network-virtualization environment called the VNode Infrastructure, and several packet-handling programs were evaluated. As for the evaluation result, the conversion throughput is close to the wire rate, i.e., 10 Gbps, and no packet loss (by cache miss) occurs when the packet size is 256 bytes or larger.展开更多
As the traditional RISC+ASIC/ASSP approach for network processor design can not meet the today’s requirements, this paper described an alternate approach, Reconfigurable Processing Architecture, to boost the performa...As the traditional RISC+ASIC/ASSP approach for network processor design can not meet the today’s requirements, this paper described an alternate approach, Reconfigurable Processing Architecture, to boost the performance to ASIC level while reserve the programmability of the traditional RISC based system. This paper covers both the hardware architecture and the software development environment architecture.展开更多
When it comes to decreasing margins and increasing energy effi-ciency in near-threshold and sub-threshold processors,timing error resilience may be viewed as a potentially lucrative alternative to examine.On the other...When it comes to decreasing margins and increasing energy effi-ciency in near-threshold and sub-threshold processors,timing error resilience may be viewed as a potentially lucrative alternative to examine.On the other hand,the currently employed approaches have certain restrictions,including high levels of design complexity,severe time constraints on error consolidation and propagation,and uncontaminated architectural registers(ARs).The design of near-threshold circuits,often known as NT circuits,is becoming the approach of choice for the construction of energy-efficient digital circuits.As a result of the exponentially decreased driving current,there was a reduction in performance,which was one of the downsides.Numerous studies have advised the use of NT techniques to chip multiprocessors as a means to preserve outstanding energy efficiency while minimising performance loss.Over the past several years,there has been a clear growth in interest in the development of artificial intelligence hardware with low energy consumption(AI).This has resulted in both large corporations and start-ups producing items that compete on the basis of varying degrees of performance and energy use.This technology’s ultimate goal was to provide levels of efficiency and performance that could not be achieved with graphics processing units or general-purpose CPUs.To achieve this objective,the technology was created to integrate several processing units into a single chip.To accomplish this purpose,the hardware was designed with a number of unique properties.In this study,an Energy Effi-cient Hyperparameter Tuned Deep Neural Network(EEHPT-DNN)model for Variation-Tolerant Near-Threshold Processor was developed.In order to improve the energy efficiency of artificial intelligence(AI),the EEHPT-DNN model employs several AI techniques.The notion focuses mostly on the repercussions of embedded technologies positioned at the network’s edge.The presented model employs a deep stacked sparse autoencoder(DSSAE)model with the objective of creating a variation-tolerant NT processor.The time-consuming method of modifying hyperparameters through trial and error is substituted with the marine predators optimization algorithm(MPO).This method is utilised to modify the hyperparameters associated with the DSSAE model.To validate that the proposed EEHPT-DNN model has a higher degree of functionality,a full simulation study is conducted,and the results are analysed from a variety of perspectives.This was completed so that the enhanced performance could be evaluated and analysed.According to the results of the study that compared numerous DL models,the EEHPT-DNN model performed significantly better than the other models.展开更多
Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. A...Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. Architecture-level power conscious design must go beyond low-level circuit design. Architectural power and performance tradeoff should be considered at the same time. Simulation is an efficient method to design modem network processor before making chip. In order to achieve the tradeoff between performance and power, the processor simulator is used to design the architecture of network processor. Using Netbeneh, Commubench benchmark and processor simulator-SimpleScalar, the performance and power of network processor are quantitatively evaluated. New performance tradeoff evaluation metric is proposed to analyze the architecture of network processor. Based on the high performance lnteI IXP 2800 Network processor eonfignration, optimized instruction fetch width and speed ,instruction issue width, instruction window size are analyzed and selected. Simulation resuits show that the tradeoff design method makes the usage of network processor more effectively. The optimal key parameters of network processor are important in architecture-level design. It is meaningful for the next generation network processor design.展开更多
This paper presents a new encryption embedded processor aimed at the application requirement of wireless sensor network (WSN). The new encryption embedded processor not only offers Rivest Shamir Adlemen (RSA), Adv...This paper presents a new encryption embedded processor aimed at the application requirement of wireless sensor network (WSN). The new encryption embedded processor not only offers Rivest Shamir Adlemen (RSA), Advanced Encryption Standard (AES), 3 Data Encryption Standard (3 DES) and Secure Hash Algorithm 1 (SHA - 1 ) security engines, but also involves a new memory encryption scheme. The new memory encryption scheme is implemented by a memory encryption cache (MEC), which protects the confidentiality of the memory by AES encryption. The experi- ments show that the new secure design only causes 1.9% additional delay on the critical path and cuts 25.7% power consumption when the processor writes data back. The new processor balances the performance overhead, the power consumption and the security and fully meets the wireless sensor environment requirement. After physical design, the new encryption embedded processor has been successfully tape-out.展开更多
A team of researchers from the University of Science and Technology of China(USTC)of the Chinese Academy of Sciences(CAS)and its partners have made significant advancements in random quantum circuit sampling with Zuch...A team of researchers from the University of Science and Technology of China(USTC)of the Chinese Academy of Sciences(CAS)and its partners have made significant advancements in random quantum circuit sampling with Zuchongzhi-3,a superconducting quantum computing prototype featuring 105 qubits and 182 couplers.展开更多
目前,多核实时系统中同步任务的节能调度研究主要针对的是同构多核处理器平台,而异构多核处理器架构能够更有效地发挥系统性能。将现有的研究直接应用于异构多核系统,在保证可调度性的情况下会导致能耗变高。对此,通过使用动态电压与频...目前,多核实时系统中同步任务的节能调度研究主要针对的是同构多核处理器平台,而异构多核处理器架构能够更有效地发挥系统性能。将现有的研究直接应用于异构多核系统,在保证可调度性的情况下会导致能耗变高。对此,通过使用动态电压与频率调节(Dynamic Voltage Frequency Scaling,DVFS)技术,研究异构多核实时系统中基于任务同步的节能调度问题,提出同步感知的最大能耗节省优先算法(Synchronization Aware-Largest Energy Saved First,SA-LESF)。该算法针对所有任务的速度配置进行迭代优化,直至所有任务均达到其最大限度节能的速度配置。此外,进一步提出基于动态松弛时间回收的同步感知最大能耗节省优先算法(Synchronization Aware-Largest Energy Saved First with Dynamic Reclamation,SA-LESF-DR)。该算法在保证实时任务可调度的同时,实施相应的回收策略,进一步降低系统能耗。实验结果表明,SA-LESF与SA-LESF-DR算法在能耗表现上具有优势,在相同任务集下,相比其他算法可节省高达30%的能耗。展开更多
基金Project supported by the Science Fund for Creative Research Groups of the National Natural Science Foundation of China (Grant No.60921062)the National Natural Science Foundation of China (Grant No.60873014)the Young Scientists Fund of the National Natural Science Foundation of China (Grant Nos.61003082 and 60903059)
文摘The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks with millions, or more, of vertices. The MATLAB language, with its mass of statistical functions, is a good choice to rapidly realize an algorithm prototype of complex networks. The performance of the MATLAB codes can be further improved by using graphic processor units (GPU). This paper presents the strategies and performance of the GPU implementation of a complex networks package, and the Jacket toolbox of MATLAB is used. Compared with some commercially available CPU implementations, GPU can achieve a speedup of, on average, 11.3x. The experimental result proves that the GPU platform combined with the MATLAB language is a good combination for complex network research.
文摘Virtualization is the key technology of cloud computing. Network virtualization plays an important role in this field. Its performance is very relevant to network virtualizing. Nowadays its implementations are mainly based on the idea of Software Define Network (SDN). Open vSwitch is a sort of software virtual switch, which conforms to the OpenFlow protocol standard. It is basically deployed in the Linux kernel hypervisor. This leads to its performance relatively poor because of the limited system resource. In turn, the packet process throughput is very low.In this paper, we present a Cavium-based Open vSwitch implementation. The Cavium platform features with multi cores and couples of hard ac-celerators. It supports zero-copy of packets and handles packet more quickly. We also carry some experiments on the platform. It indicates that we can use it in the enterprise network or campus network as convergence layer and core layer device.
文摘Network processors (NPs) are widely used for programmable and high-performance networks;however, the programs for NPs are less portable, the number of NP program developers is small, and the development cost is high. To solve these problems, this paper proposes an open, high-level, and portable programming language called “Phonepl”, which is independent from vendor-specific proprietary hardware and software but can be translated into an NP program with high performance especially in the memory use. A common NP hardware feature is that a whole packet is stored in DRAM, but the header is cached in SRAM. Phonepl has a hardware-independent abstraction of this feature so that it allows programmers mostly unconscious of this hardware feature. To implement the abstraction, four representations of packet data type that cover all the packet operations (including substring, concatenation, input, and output) are introduced. Phonepl have been implemented on Octeon NPs used in plug-ins for a network-virtualization environment called the VNode Infrastructure, and several packet-handling programs were evaluated. As for the evaluation result, the conversion throughput is close to the wire rate, i.e., 10 Gbps, and no packet loss (by cache miss) occurs when the packet size is 256 bytes or larger.
文摘As the traditional RISC+ASIC/ASSP approach for network processor design can not meet the today’s requirements, this paper described an alternate approach, Reconfigurable Processing Architecture, to boost the performance to ASIC level while reserve the programmability of the traditional RISC based system. This paper covers both the hardware architecture and the software development environment architecture.
文摘When it comes to decreasing margins and increasing energy effi-ciency in near-threshold and sub-threshold processors,timing error resilience may be viewed as a potentially lucrative alternative to examine.On the other hand,the currently employed approaches have certain restrictions,including high levels of design complexity,severe time constraints on error consolidation and propagation,and uncontaminated architectural registers(ARs).The design of near-threshold circuits,often known as NT circuits,is becoming the approach of choice for the construction of energy-efficient digital circuits.As a result of the exponentially decreased driving current,there was a reduction in performance,which was one of the downsides.Numerous studies have advised the use of NT techniques to chip multiprocessors as a means to preserve outstanding energy efficiency while minimising performance loss.Over the past several years,there has been a clear growth in interest in the development of artificial intelligence hardware with low energy consumption(AI).This has resulted in both large corporations and start-ups producing items that compete on the basis of varying degrees of performance and energy use.This technology’s ultimate goal was to provide levels of efficiency and performance that could not be achieved with graphics processing units or general-purpose CPUs.To achieve this objective,the technology was created to integrate several processing units into a single chip.To accomplish this purpose,the hardware was designed with a number of unique properties.In this study,an Energy Effi-cient Hyperparameter Tuned Deep Neural Network(EEHPT-DNN)model for Variation-Tolerant Near-Threshold Processor was developed.In order to improve the energy efficiency of artificial intelligence(AI),the EEHPT-DNN model employs several AI techniques.The notion focuses mostly on the repercussions of embedded technologies positioned at the network’s edge.The presented model employs a deep stacked sparse autoencoder(DSSAE)model with the objective of creating a variation-tolerant NT processor.The time-consuming method of modifying hyperparameters through trial and error is substituted with the marine predators optimization algorithm(MPO).This method is utilised to modify the hyperparameters associated with the DSSAE model.To validate that the proposed EEHPT-DNN model has a higher degree of functionality,a full simulation study is conducted,and the results are analysed from a variety of perspectives.This was completed so that the enhanced performance could be evaluated and analysed.According to the results of the study that compared numerous DL models,the EEHPT-DNN model performed significantly better than the other models.
基金Sponsored by the National Defence Research Foundation of China(Grant No.413460303).
文摘Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. Architecture-level power conscious design must go beyond low-level circuit design. Architectural power and performance tradeoff should be considered at the same time. Simulation is an efficient method to design modem network processor before making chip. In order to achieve the tradeoff between performance and power, the processor simulator is used to design the architecture of network processor. Using Netbeneh, Commubench benchmark and processor simulator-SimpleScalar, the performance and power of network processor are quantitatively evaluated. New performance tradeoff evaluation metric is proposed to analyze the architecture of network processor. Based on the high performance lnteI IXP 2800 Network processor eonfignration, optimized instruction fetch width and speed ,instruction issue width, instruction window size are analyzed and selected. Simulation resuits show that the tradeoff design method makes the usage of network processor more effectively. The optimal key parameters of network processor are important in architecture-level design. It is meaningful for the next generation network processor design.
文摘This paper presents a new encryption embedded processor aimed at the application requirement of wireless sensor network (WSN). The new encryption embedded processor not only offers Rivest Shamir Adlemen (RSA), Advanced Encryption Standard (AES), 3 Data Encryption Standard (3 DES) and Secure Hash Algorithm 1 (SHA - 1 ) security engines, but also involves a new memory encryption scheme. The new memory encryption scheme is implemented by a memory encryption cache (MEC), which protects the confidentiality of the memory by AES encryption. The experi- ments show that the new secure design only causes 1.9% additional delay on the critical path and cuts 25.7% power consumption when the processor writes data back. The new processor balances the performance overhead, the power consumption and the security and fully meets the wireless sensor environment requirement. After physical design, the new encryption embedded processor has been successfully tape-out.
文摘A team of researchers from the University of Science and Technology of China(USTC)of the Chinese Academy of Sciences(CAS)and its partners have made significant advancements in random quantum circuit sampling with Zuchongzhi-3,a superconducting quantum computing prototype featuring 105 qubits and 182 couplers.
文摘目前,多核实时系统中同步任务的节能调度研究主要针对的是同构多核处理器平台,而异构多核处理器架构能够更有效地发挥系统性能。将现有的研究直接应用于异构多核系统,在保证可调度性的情况下会导致能耗变高。对此,通过使用动态电压与频率调节(Dynamic Voltage Frequency Scaling,DVFS)技术,研究异构多核实时系统中基于任务同步的节能调度问题,提出同步感知的最大能耗节省优先算法(Synchronization Aware-Largest Energy Saved First,SA-LESF)。该算法针对所有任务的速度配置进行迭代优化,直至所有任务均达到其最大限度节能的速度配置。此外,进一步提出基于动态松弛时间回收的同步感知最大能耗节省优先算法(Synchronization Aware-Largest Energy Saved First with Dynamic Reclamation,SA-LESF-DR)。该算法在保证实时任务可调度的同时,实施相应的回收策略,进一步降低系统能耗。实验结果表明,SA-LESF与SA-LESF-DR算法在能耗表现上具有优势,在相同任务集下,相比其他算法可节省高达30%的能耗。