Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at t...Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at this problem,a parallelization approach was proposed with six memory optimization schemes for CG,four schemes of them aiming at all kinds of sparse matrix-vector multiplication (SPMV) operation. Conducted on IBM QS20,the parallelization approach can reach up to 21 and 133 times speedups with size A and B,respectively,compared with single power processor element. Finally,the conclusion is drawn that the peak bandwidth of memory access on Cell BE can be obtained in SPMV,simple computation is more efficient on heterogeneous processors and loop-unrolling can hide local storage access latency while executing scalar operation on SIMD cores.展开更多
The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Co...The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Consequently,the single-core processor cannot meet the requirements of LTE system.This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing.The test results prove that this approach works quite well.展开更多
The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks ...The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks with millions, or more, of vertices. The MATLAB language, with its mass of statistical functions, is a good choice to rapidly realize an algorithm prototype of complex networks. The performance of the MATLAB codes can be further improved by using graphic processor units (GPU). This paper presents the strategies and performance of the GPU implementation of a complex networks package, and the Jacket toolbox of MATLAB is used. Compared with some commercially available CPU implementations, GPU can achieve a speedup of, on average, 11.3x. The experimental result proves that the GPU platform combined with the MATLAB language is a good combination for complex network research.展开更多
Virtualization is the key technology of cloud computing. Network virtualization plays an important role in this field. Its performance is very relevant to network virtualizing. Nowadays its implementations are mainly ...Virtualization is the key technology of cloud computing. Network virtualization plays an important role in this field. Its performance is very relevant to network virtualizing. Nowadays its implementations are mainly based on the idea of Software Define Network (SDN). Open vSwitch is a sort of software virtual switch, which conforms to the OpenFlow protocol standard. It is basically deployed in the Linux kernel hypervisor. This leads to its performance relatively poor because of the limited system resource. In turn, the packet process throughput is very low.In this paper, we present a Cavium-based Open vSwitch implementation. The Cavium platform features with multi cores and couples of hard ac-celerators. It supports zero-copy of packets and handles packet more quickly. We also carry some experiments on the platform. It indicates that we can use it in the enterprise network or campus network as convergence layer and core layer device.展开更多
Modern shared-memory multi-core processors typically have shared Level 2(L2)or Level 3(L3)caches.Cache bottlenecks and replacement strategies are the main problems of such architectures,where multiple cores try to acc...Modern shared-memory multi-core processors typically have shared Level 2(L2)or Level 3(L3)caches.Cache bottlenecks and replacement strategies are the main problems of such architectures,where multiple cores try to access the shared cache simultaneously.The main problem in improving memory performance is the shared cache architecture and cache replacement.This paper documents the implementation of a Dual-Port Content Addressable Memory(DPCAM)and a modified Near-Far Access Replacement Algorithm(NFRA),which was previously proposed as a shared L2 cache layer in a multi-core processor.Standard Performance Evaluation Corporation(SPEC)Central Processing Unit(CPU)2006 benchmark workloads are used to evaluate the benefit of the shared L2 cache layer.Results show improved performance of the multicore processor’s DPCAM and NFRA algorithms,corresponding to a higher number of concurrent accesses to shared memory.The new architecture significantly increases system throughput and records performance improvements of up to 8.7%on various types of SPEC 2006 benchmarks.The miss rate is also improved by about 13%,with some exceptions in the sphinx3 and bzip2 benchmarks.These results could open a new window for solving the long-standing problems with shared cache in multi-core processors.展开更多
Network processors (NPs) are widely used for programmable and high-performance networks;however, the programs for NPs are less portable, the number of NP program developers is small, and the development cost is high. ...Network processors (NPs) are widely used for programmable and high-performance networks;however, the programs for NPs are less portable, the number of NP program developers is small, and the development cost is high. To solve these problems, this paper proposes an open, high-level, and portable programming language called “Phonepl”, which is independent from vendor-specific proprietary hardware and software but can be translated into an NP program with high performance especially in the memory use. A common NP hardware feature is that a whole packet is stored in DRAM, but the header is cached in SRAM. Phonepl has a hardware-independent abstraction of this feature so that it allows programmers mostly unconscious of this hardware feature. To implement the abstraction, four representations of packet data type that cover all the packet operations (including substring, concatenation, input, and output) are introduced. Phonepl have been implemented on Octeon NPs used in plug-ins for a network-virtualization environment called the VNode Infrastructure, and several packet-handling programs were evaluated. As for the evaluation result, the conversion throughput is close to the wire rate, i.e., 10 Gbps, and no packet loss (by cache miss) occurs when the packet size is 256 bytes or larger.展开更多
As the traditional RISC+ASIC/ASSP approach for network processor design can not meet the today’s requirements, this paper described an alternate approach, Reconfigurable Processing Architecture, to boost the performa...As the traditional RISC+ASIC/ASSP approach for network processor design can not meet the today’s requirements, this paper described an alternate approach, Reconfigurable Processing Architecture, to boost the performance to ASIC level while reserve the programmability of the traditional RISC based system. This paper covers both the hardware architecture and the software development environment architecture.展开更多
Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. A...Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. Architecture-level power conscious design must go beyond low-level circuit design. Architectural power and performance tradeoff should be considered at the same time. Simulation is an efficient method to design modem network processor before making chip. In order to achieve the tradeoff between performance and power, the processor simulator is used to design the architecture of network processor. Using Netbeneh, Commubench benchmark and processor simulator-SimpleScalar, the performance and power of network processor are quantitatively evaluated. New performance tradeoff evaluation metric is proposed to analyze the architecture of network processor. Based on the high performance lnteI IXP 2800 Network processor eonfignration, optimized instruction fetch width and speed ,instruction issue width, instruction window size are analyzed and selected. Simulation resuits show that the tradeoff design method makes the usage of network processor more effectively. The optimal key parameters of network processor are important in architecture-level design. It is meaningful for the next generation network processor design.展开更多
This paper presents a new encryption embedded processor aimed at the application requirement of wireless sensor network (WSN). The new encryption embedded processor not only offers Rivest Shamir Adlemen (RSA), Adv...This paper presents a new encryption embedded processor aimed at the application requirement of wireless sensor network (WSN). The new encryption embedded processor not only offers Rivest Shamir Adlemen (RSA), Advanced Encryption Standard (AES), 3 Data Encryption Standard (3 DES) and Secure Hash Algorithm 1 (SHA - 1 ) security engines, but also involves a new memory encryption scheme. The new memory encryption scheme is implemented by a memory encryption cache (MEC), which protects the confidentiality of the memory by AES encryption. The experi- ments show that the new secure design only causes 1.9% additional delay on the critical path and cuts 25.7% power consumption when the processor writes data back. The new processor balances the performance overhead, the power consumption and the security and fully meets the wireless sensor environment requirement. After physical design, the new encryption embedded processor has been successfully tape-out.展开更多
Space-division multiplexing(SDM)utilizing uncoupled multi-core fibers(MCF)is considered a promising candidate for nextgeneration high-speed optical transmission systems due to its huge capacity and low inter-core cros...Space-division multiplexing(SDM)utilizing uncoupled multi-core fibers(MCF)is considered a promising candidate for nextgeneration high-speed optical transmission systems due to its huge capacity and low inter-core crosstalk.In this paper,we demonstrate a realtime high-speed SDM transmission system over a field-deployed 7-core MCF cable using commercial 400 Gbit/s backbone optical transport network(OTN)transceivers and a network management system.The transceivers employ a high noise-tolerant quadrature phase shift keying(QPSK)modulation format with a 130 Gbaud rate,enabled by optoelectronic multi-chip module(OE-MCM)packaging.The network management system can effectively manage and monitor the performance of the 7-core SDM OTN system and promptly report failure events through alarms.Our field trial demonstrates the compatibility of uncoupled MCF with high-speed OTN transmission equipment and network management systems,supporting its future deployment in next-generation high-speed terrestrial cable transmission networks.展开更多
The accuracy of present flatness predictive method is limited and it just belongs to software simulation. In order to improve it, a novel flatness predictive model via T-S cloud reasoning network implemented by digita...The accuracy of present flatness predictive method is limited and it just belongs to software simulation. In order to improve it, a novel flatness predictive model via T-S cloud reasoning network implemented by digital signal processor(DSP) is proposed. First, the combination of genetic algorithm(GA) and simulated annealing algorithm(SAA) is put forward, called GA-SA algorithm, which can make full use of the global search ability of GA and local search ability of SA. Later, based on T-S cloud reasoning neural network, flatness predictive model is designed in DSP. And it is applied to 900 HC reversible cold rolling mill. Experimental results demonstrate that the flatness predictive model via T-S cloud reasoning network can run on the hardware DSP TMS320 F2812 with high accuracy and robustness by using GA-SA algorithm to optimize the model parameter.展开更多
Due to 5G's stringent and uncertainty traffic requirements,open ecosystem would be one inevitable way to develop 5G.On the other hand,GPP based mobile communication becomes appealing recently attributed to its str...Due to 5G's stringent and uncertainty traffic requirements,open ecosystem would be one inevitable way to develop 5G.On the other hand,GPP based mobile communication becomes appealing recently attributed to its striking advantage in flexibility and re-configurability.In this paper,both the advantages and challenges of GPP platform are detailed analyzed.Furthermore,both GPP based software and hardware architectures for open 5G are presented and the performances of real-time signal processing and power consumption are also evaluated.The evaluation results indicate that turbo and power consumption may be another challengeable problem should be further solved to meet the requirements of realistic deployments.展开更多
After an introduction to the implementation of supervisory computer control (SCC) through networks and the relevant security issues, this paper centers on the core of network security design: intelligent front-end pro...After an introduction to the implementation of supervisory computer control (SCC) through networks and the relevant security issues, this paper centers on the core of network security design: intelligent front-end processor (FEP), encryption/decryption method and authentication protocol. Some other system-specific security measures are also proposed. Although these are examples only, the techniques discussed can also be used in and provide reference for other remote control systems.展开更多
Scheduling algorithm always plays an important role in the spatial architecture for the contradiction between the finite network bandwidth and the abundant execution resources. This article provides a simple method to...Scheduling algorithm always plays an important role in the spatial architecture for the contradiction between the finite network bandwidth and the abundant execution resources. This article provides a simple method to solve the contention for network resource in one of the spatial architecture, i.e. the tera-op, reliable, intelligently adaptive processing system(TRIPS) processor. The method improves the performance of network by increasing the bypass bandwidth which can transmit the data in the internal of every execution unit, and converting the proportion of remote communication by the deep scheduling algorithm. The deeply optimized algorithm is realized to verify the validity of the method, and the performance increase 9% for floating point spec2000 benchmark is got.展开更多
Network processing plays an important role in the development of Internet as more and more complicated applications are deployed throughout the network. With the advent of new platforms such as network processors (NPs...Network processing plays an important role in the development of Internet as more and more complicated applications are deployed throughout the network. With the advent of new platforms such as network processors (NPs) that incorporate novel architectures to speedup packet processing, there is an increasing need for an efficient method to facilitate the study of their performance. In this paper, we present a tool called SimNP, which provides a flexible platform for the simulation of a network processing system in order to provide information for workload characterization, architecture development, and application implementation. The simulator models several architectural features that are commonly employed by NPs, including multiple processing engines (PEs), integrated network interface and memory controller, and hardware accelerators. ARM instruction set is emulated and a simple memory model is provided so that applications implemented in high level programming language such as C can be easily compiled into an executable binary using a common compiler like gcc. Moreover, new features or new modules can also be easily added into this simulator. Experiments have shown that our simulator provides abundant information for the study of network processing systems.展开更多
This paper deals with an in-line network security processor (NSP) design that implements the Intemet Protocol Security (IPSec) protocol processing for the 10 Gbps Ethernet. The 10 Gbps high speed data transfer, th...This paper deals with an in-line network security processor (NSP) design that implements the Intemet Protocol Security (IPSec) protocol processing for the 10 Gbps Ethernet. The 10 Gbps high speed data transfer, the IPSec processing in- cluding the crypto-operation, the database query, and IPSec header processing are integrated in the design. The in-line NSP is implemented using 65 nm CMOS technology and the layout area is 2.5 mm^3 mm with 360 million gates. A configurable crossbar data transfer skeleton implementing an iSLIP scheduling algorithm is proposed, which enables simultaneous data transfer between the heterogeneous multiple cores. There are, in addition, a high speed input/output data buffering mechanism and design of high performance hardware structures for modules, wherein the transfer efficiency and the resource utilization are maximized and the IPSec protocol processing achieves 10 Gbps line speed. A high speed and low power hardware look-up method is proposed, which effectively reduces the area and power dissipation. The post simulation results demonstrate that the design gives a peak throughput for the Authentication Header (AH) transport mode of 10.06 Gbps with the average test packet length of 512 bytes under the clock rate of 250 MHz, and power dissipation less than 1 W is obtained. An FPGA prototype is constructed to verify the function of the design. A test bench is being set up for performance and function verification.展开更多
Established on the Intel Multi-Core Embedded platform, using 802.11 Wireless Network protocols as the communication medium, combining with Radio Frequency-Communication and Ultrasonic Ranging, imple-ment a mobile term...Established on the Intel Multi-Core Embedded platform, using 802.11 Wireless Network protocols as the communication medium, combining with Radio Frequency-Communication and Ultrasonic Ranging, imple-ment a mobile terminal system in an intellectualized building. It can provide its holder such functions: 1) Accurate Positioning 2) Intelligent Navigation 3) Video Monitoring 4) Wireless Communication. The inno-vative point for this paper is to apply the multi-core computing on the embedded system to promote its com-puting speed and give a real-time performance and apply this system into the indoor environment for the purpose of emergent event or rescuing.展开更多
Processors have been playing important roles in both communication infrastructure systems and terminals.In this paper,both application specific and general purpose processors for communications are discussed including...Processors have been playing important roles in both communication infrastructure systems and terminals.In this paper,both application specific and general purpose processors for communications are discussed including the roles,the history,the current situations,and the trends.One trend is that ASIPs(Application Specific Instruction-set Processors) are taking over ASICs(Application Specific Integrated Circuits) because of the increasing needs both on performance and compatibility of multi-modes.The trend opened opportunities for researchers crossing the boundary between communications and computer architecture.Another trend is the serverlization,i.e.,more infrastructure equipments are replaced by servers.The trend opened opportunities for researchers working towards high performance computing for communication,such as research on communication algorithm kernels and real time programming methods on servers.展开更多
基金Project(2008AA01A201) supported the National High-tech Research and Development Program of ChinaProjects(60833004, 60633050) supported by the National Natural Science Foundation of China
文摘Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at this problem,a parallelization approach was proposed with six memory optimization schemes for CG,four schemes of them aiming at all kinds of sparse matrix-vector multiplication (SPMV) operation. Conducted on IBM QS20,the parallelization approach can reach up to 21 and 133 times speedups with size A and B,respectively,compared with single power processor element. Finally,the conclusion is drawn that the peak bandwidth of memory access on Cell BE can be obtained in SPMV,simple computation is more efficient on heterogeneous processors and loop-unrolling can hide local storage access latency while executing scalar operation on SIMD cores.
文摘The Long Term Evolution (LTE) system imposes high requirements for dispatching delay.Moreover,very large air interface rate of LTE requires good processing capability for the devices processing the baseband signals.Consequently,the single-core processor cannot meet the requirements of LTE system.This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing.The test results prove that this approach works quite well.
基金Project supported by the Science Fund for Creative Research Groups of the National Natural Science Foundation of China (Grant No.60921062)the National Natural Science Foundation of China (Grant No.60873014)the Young Scientists Fund of the National Natural Science Foundation of China (Grant Nos.61003082 and 60903059)
文摘The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks with millions, or more, of vertices. The MATLAB language, with its mass of statistical functions, is a good choice to rapidly realize an algorithm prototype of complex networks. The performance of the MATLAB codes can be further improved by using graphic processor units (GPU). This paper presents the strategies and performance of the GPU implementation of a complex networks package, and the Jacket toolbox of MATLAB is used. Compared with some commercially available CPU implementations, GPU can achieve a speedup of, on average, 11.3x. The experimental result proves that the GPU platform combined with the MATLAB language is a good combination for complex network research.
文摘Virtualization is the key technology of cloud computing. Network virtualization plays an important role in this field. Its performance is very relevant to network virtualizing. Nowadays its implementations are mainly based on the idea of Software Define Network (SDN). Open vSwitch is a sort of software virtual switch, which conforms to the OpenFlow protocol standard. It is basically deployed in the Linux kernel hypervisor. This leads to its performance relatively poor because of the limited system resource. In turn, the packet process throughput is very low.In this paper, we present a Cavium-based Open vSwitch implementation. The Cavium platform features with multi cores and couples of hard ac-celerators. It supports zero-copy of packets and handles packet more quickly. We also carry some experiments on the platform. It indicates that we can use it in the enterprise network or campus network as convergence layer and core layer device.
文摘Modern shared-memory multi-core processors typically have shared Level 2(L2)or Level 3(L3)caches.Cache bottlenecks and replacement strategies are the main problems of such architectures,where multiple cores try to access the shared cache simultaneously.The main problem in improving memory performance is the shared cache architecture and cache replacement.This paper documents the implementation of a Dual-Port Content Addressable Memory(DPCAM)and a modified Near-Far Access Replacement Algorithm(NFRA),which was previously proposed as a shared L2 cache layer in a multi-core processor.Standard Performance Evaluation Corporation(SPEC)Central Processing Unit(CPU)2006 benchmark workloads are used to evaluate the benefit of the shared L2 cache layer.Results show improved performance of the multicore processor’s DPCAM and NFRA algorithms,corresponding to a higher number of concurrent accesses to shared memory.The new architecture significantly increases system throughput and records performance improvements of up to 8.7%on various types of SPEC 2006 benchmarks.The miss rate is also improved by about 13%,with some exceptions in the sphinx3 and bzip2 benchmarks.These results could open a new window for solving the long-standing problems with shared cache in multi-core processors.
文摘Network processors (NPs) are widely used for programmable and high-performance networks;however, the programs for NPs are less portable, the number of NP program developers is small, and the development cost is high. To solve these problems, this paper proposes an open, high-level, and portable programming language called “Phonepl”, which is independent from vendor-specific proprietary hardware and software but can be translated into an NP program with high performance especially in the memory use. A common NP hardware feature is that a whole packet is stored in DRAM, but the header is cached in SRAM. Phonepl has a hardware-independent abstraction of this feature so that it allows programmers mostly unconscious of this hardware feature. To implement the abstraction, four representations of packet data type that cover all the packet operations (including substring, concatenation, input, and output) are introduced. Phonepl have been implemented on Octeon NPs used in plug-ins for a network-virtualization environment called the VNode Infrastructure, and several packet-handling programs were evaluated. As for the evaluation result, the conversion throughput is close to the wire rate, i.e., 10 Gbps, and no packet loss (by cache miss) occurs when the packet size is 256 bytes or larger.
文摘As the traditional RISC+ASIC/ASSP approach for network processor design can not meet the today’s requirements, this paper described an alternate approach, Reconfigurable Processing Architecture, to boost the performance to ASIC level while reserve the programmability of the traditional RISC based system. This paper covers both the hardware architecture and the software development environment architecture.
基金Sponsored by the National Defence Research Foundation of China(Grant No.413460303).
文摘Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. Architecture-level power conscious design must go beyond low-level circuit design. Architectural power and performance tradeoff should be considered at the same time. Simulation is an efficient method to design modem network processor before making chip. In order to achieve the tradeoff between performance and power, the processor simulator is used to design the architecture of network processor. Using Netbeneh, Commubench benchmark and processor simulator-SimpleScalar, the performance and power of network processor are quantitatively evaluated. New performance tradeoff evaluation metric is proposed to analyze the architecture of network processor. Based on the high performance lnteI IXP 2800 Network processor eonfignration, optimized instruction fetch width and speed ,instruction issue width, instruction window size are analyzed and selected. Simulation resuits show that the tradeoff design method makes the usage of network processor more effectively. The optimal key parameters of network processor are important in architecture-level design. It is meaningful for the next generation network processor design.
文摘This paper presents a new encryption embedded processor aimed at the application requirement of wireless sensor network (WSN). The new encryption embedded processor not only offers Rivest Shamir Adlemen (RSA), Advanced Encryption Standard (AES), 3 Data Encryption Standard (3 DES) and Secure Hash Algorithm 1 (SHA - 1 ) security engines, but also involves a new memory encryption scheme. The new memory encryption scheme is implemented by a memory encryption cache (MEC), which protects the confidentiality of the memory by AES encryption. The experi- ments show that the new secure design only causes 1.9% additional delay on the critical path and cuts 25.7% power consumption when the processor writes data back. The new processor balances the performance overhead, the power consumption and the security and fully meets the wireless sensor environment requirement. After physical design, the new encryption embedded processor has been successfully tape-out.
文摘Space-division multiplexing(SDM)utilizing uncoupled multi-core fibers(MCF)is considered a promising candidate for nextgeneration high-speed optical transmission systems due to its huge capacity and low inter-core crosstalk.In this paper,we demonstrate a realtime high-speed SDM transmission system over a field-deployed 7-core MCF cable using commercial 400 Gbit/s backbone optical transport network(OTN)transceivers and a network management system.The transceivers employ a high noise-tolerant quadrature phase shift keying(QPSK)modulation format with a 130 Gbaud rate,enabled by optoelectronic multi-chip module(OE-MCM)packaging.The network management system can effectively manage and monitor the performance of the 7-core SDM OTN system and promptly report failure events through alarms.Our field trial demonstrates the compatibility of uncoupled MCF with high-speed OTN transmission equipment and network management systems,supporting its future deployment in next-generation high-speed terrestrial cable transmission networks.
基金Project(E2015203354)supported by Natural Science Foundation of Steel United Research Fund of Hebei Province,ChinaProject(ZD2016100)supported by the Science and the Technology Research Key Project of High School of Hebei Province,China+1 种基金Project(LJRC013)supported by the University Innovation Team of Hebei Province Leading Talent Cultivation,ChinaProject(16LGY015)supported by the Basic Research Special Breeding of Yanshan University,China
文摘The accuracy of present flatness predictive method is limited and it just belongs to software simulation. In order to improve it, a novel flatness predictive model via T-S cloud reasoning network implemented by digital signal processor(DSP) is proposed. First, the combination of genetic algorithm(GA) and simulated annealing algorithm(SAA) is put forward, called GA-SA algorithm, which can make full use of the global search ability of GA and local search ability of SA. Later, based on T-S cloud reasoning neural network, flatness predictive model is designed in DSP. And it is applied to 900 HC reversible cold rolling mill. Experimental results demonstrate that the flatness predictive model via T-S cloud reasoning network can run on the hardware DSP TMS320 F2812 with high accuracy and robustness by using GA-SA algorithm to optimize the model parameter.
基金funded in part by National Natural Science Foundation of China(grant NO.61471347)National S&T Mayor Project of the Ministry of S&T of China(grant NO.2016ZX03001020-003)+1 种基金key program for international S&T Cooperation Program of China(grant NO.2014DFA11640)Shanghai Natural Science Foundation(grant NO.16ZR1435100)
文摘Due to 5G's stringent and uncertainty traffic requirements,open ecosystem would be one inevitable way to develop 5G.On the other hand,GPP based mobile communication becomes appealing recently attributed to its striking advantage in flexibility and re-configurability.In this paper,both the advantages and challenges of GPP platform are detailed analyzed.Furthermore,both GPP based software and hardware architectures for open 5G are presented and the performances of real-time signal processing and power consumption are also evaluated.The evaluation results indicate that turbo and power consumption may be another challengeable problem should be further solved to meet the requirements of realistic deployments.
文摘After an introduction to the implementation of supervisory computer control (SCC) through networks and the relevant security issues, this paper centers on the core of network security design: intelligent front-end processor (FEP), encryption/decryption method and authentication protocol. Some other system-specific security measures are also proposed. Although these are examples only, the techniques discussed can also be used in and provide reference for other remote control systems.
文摘Scheduling algorithm always plays an important role in the spatial architecture for the contradiction between the finite network bandwidth and the abundant execution resources. This article provides a simple method to solve the contention for network resource in one of the spatial architecture, i.e. the tera-op, reliable, intelligently adaptive processing system(TRIPS) processor. The method improves the performance of network by increasing the bypass bandwidth which can transmit the data in the internal of every execution unit, and converting the proportion of remote communication by the deep scheduling algorithm. The deeply optimized algorithm is realized to verify the validity of the method, and the performance increase 9% for floating point spec2000 benchmark is got.
文摘Network processing plays an important role in the development of Internet as more and more complicated applications are deployed throughout the network. With the advent of new platforms such as network processors (NPs) that incorporate novel architectures to speedup packet processing, there is an increasing need for an efficient method to facilitate the study of their performance. In this paper, we present a tool called SimNP, which provides a flexible platform for the simulation of a network processing system in order to provide information for workload characterization, architecture development, and application implementation. The simulator models several architectural features that are commonly employed by NPs, including multiple processing engines (PEs), integrated network interface and memory controller, and hardware accelerators. ARM instruction set is emulated and a simple memory model is provided so that applications implemented in high level programming language such as C can be easily compiled into an executable binary using a common compiler like gcc. Moreover, new features or new modules can also be easily added into this simulator. Experiments have shown that our simulator provides abundant information for the study of network processing systems.
基金Project (No. 2011ZX01034-002-002-003) supported by the National Science and Technology Major Projects of the Ministry of Industry and Information Technology, China
文摘This paper deals with an in-line network security processor (NSP) design that implements the Intemet Protocol Security (IPSec) protocol processing for the 10 Gbps Ethernet. The 10 Gbps high speed data transfer, the IPSec processing in- cluding the crypto-operation, the database query, and IPSec header processing are integrated in the design. The in-line NSP is implemented using 65 nm CMOS technology and the layout area is 2.5 mm^3 mm with 360 million gates. A configurable crossbar data transfer skeleton implementing an iSLIP scheduling algorithm is proposed, which enables simultaneous data transfer between the heterogeneous multiple cores. There are, in addition, a high speed input/output data buffering mechanism and design of high performance hardware structures for modules, wherein the transfer efficiency and the resource utilization are maximized and the IPSec protocol processing achieves 10 Gbps line speed. A high speed and low power hardware look-up method is proposed, which effectively reduces the area and power dissipation. The post simulation results demonstrate that the design gives a peak throughput for the Authentication Header (AH) transport mode of 10.06 Gbps with the average test packet length of 512 bytes under the clock rate of 250 MHz, and power dissipation less than 1 W is obtained. An FPGA prototype is constructed to verify the function of the design. A test bench is being set up for performance and function verification.
文摘Established on the Intel Multi-Core Embedded platform, using 802.11 Wireless Network protocols as the communication medium, combining with Radio Frequency-Communication and Ultrasonic Ranging, imple-ment a mobile terminal system in an intellectualized building. It can provide its holder such functions: 1) Accurate Positioning 2) Intelligent Navigation 3) Video Monitoring 4) Wireless Communication. The inno-vative point for this paper is to apply the multi-core computing on the embedded system to promote its com-puting speed and give a real-time performance and apply this system into the indoor environment for the purpose of emergent event or rescuing.
基金The National High-Tech Research and Development Program of China(863 Program)2014AA01A705
文摘Processors have been playing important roles in both communication infrastructure systems and terminals.In this paper,both application specific and general purpose processors for communications are discussed including the roles,the history,the current situations,and the trends.One trend is that ASIPs(Application Specific Instruction-set Processors) are taking over ASICs(Application Specific Integrated Circuits) because of the increasing needs both on performance and compatibility of multi-modes.The trend opened opportunities for researchers crossing the boundary between communications and computer architecture.Another trend is the serverlization,i.e.,more infrastructure equipments are replaced by servers.The trend opened opportunities for researchers working towards high performance computing for communication,such as research on communication algorithm kernels and real time programming methods on servers.