By numerical propagation of the coupled Gross–Pitaevskii equations, the ground state phase of a SU(3) spin–orbit coupled Bose gas with nonlocal soft-core interactions has been investigated within the all parameter s...By numerical propagation of the coupled Gross–Pitaevskii equations, the ground state phase of a SU(3) spin–orbit coupled Bose gas with nonlocal soft-core interactions has been investigated within the all parameter space, showing strong dependence on the strength of SU(3) spin–orbit coupling, nonlocal soft-core interactions, spin-exchange interactions and Rydberg blockade radius. More specially, we also perform a detailed study of the dependence of soft-core interaction on the Rydberg blockade radius at the point of rotational symmetry breaking. Our results show that under the combined effects of such parameters, the ground state shows a threefold-degenerate magnetized state for ferromagnetic spin interaction, while a variety of lattice phases for antiferromagnetic spin interaction.展开更多
The implementation method of the IEEE 802.11 Medium Access Control (MAC) protocol is mainly based on DSP (Digital Signal Processor)/ ARM (Advanced Reduced instruction set computer Machine) processor or DSP/ARM IP (Int...The implementation method of the IEEE 802.11 Medium Access Control (MAC) protocol is mainly based on DSP (Digital Signal Processor)/ ARM (Advanced Reduced instruction set computer Machine) processor or DSP/ARM IP (Intellectual Property) core. This paper presents a method based on Nios II soft-core processor embedded in Altera’s Cyclone FPGA (Field Programmable Gate Array) and MicroC/OS-II RTOS (Real-Time Operation System). The benefits and drawbacks of above methods are compared, and then the method presented in this paper is described. The hardware and software partitioning are discussed; the hardware architecture is also illustrated and the MAC software programming is described in detail. The presented method has some advantages, such as low cost, easy-implementation and very suitable for the implementation of IEEE 802.11 MAC in research stage.展开更多
The historical significance of the Stern–Gerlach(SG)experiment lies in its provision of the initial evidence for space quantization.Over time,its sequential form has evolved into an elegant paradigm that effectively ...The historical significance of the Stern–Gerlach(SG)experiment lies in its provision of the initial evidence for space quantization.Over time,its sequential form has evolved into an elegant paradigm that effectively illustrates the fundamental principles of quantum theory.To date,the practical implementation of the sequential SG experiment has not been fully achieved.In this study,we demonstrate the capability of programmable quantum processors to simulate the sequential SG experiment.The specific parametric shallow quantum circuits,which are suitable for the limitations of current noisy quantum hardware,are given to replicate the functionality of SG devices with the ability to perform measurements in different directions.Surprisingly,it has been demonstrated that Wigner’s SG interferometer can be readily implemented in our sequential quantum circuit.With the utilization of the identical circuits,it is also feasible to implement Wheeler’s delayed-choice experiment.We propose the utilization of cross-shaped programmable quantum processors to showcase sequential experiments,and the simulation results demonstrate a strong alignment with theoretical predictions.With the rapid advancement of cloud-based quantum computing,such as BAQIS Quafu,it is our belief that the proposed solution is well-suited for deployment on the cloud,allowing for public accessibility.Our findings not only expand the potential applications of quantum computers,but also contribute to a deeper comprehension of the fundamental principles underlying quantum theory.展开更多
A notable portion of cachelines in real-world workloads exhibits inner non-uniform access behaviors.However,modern cache management rarely considers this fine-grained feature,which impacts the effective cache capacity...A notable portion of cachelines in real-world workloads exhibits inner non-uniform access behaviors.However,modern cache management rarely considers this fine-grained feature,which impacts the effective cache capacity of contemporary high-performance spacecraft processors.To harness these non-uniform access behaviors,an efficient cache replacement framework featuring an auxiliary cache specifically designed to retain evicted hot data was proposed.This framework reconstructs the cache replacement policy,facilitating data migration between the main cache and the auxiliary cache.Unlike traditional cacheline-granularity policies,the approach excels at identifying and evicting infrequently used data,thereby optimizing cache utilization.The evaluation shows impressive performance improvement,especially on workloads with irregular access patterns.Benefiting from fine granularity,the proposal achieves superior storage efficiency compared with commonly used cache management schemes,providing a potential optimization opportunity for modern resource-constrained processors,such as spacecraft processors.Furthermore,the framework complements existing modern cache replacement policies and can be seamlessly integrated with minimal modifications,enhancing their overall efficacy.展开更多
A team of researchers from the University of Science and Technology of China(USTC)of the Chinese Academy of Sciences(CAS)and its partners have made significant advancements in random quantum circuit sampling with Zuch...A team of researchers from the University of Science and Technology of China(USTC)of the Chinese Academy of Sciences(CAS)and its partners have made significant advancements in random quantum circuit sampling with Zuchongzhi-3,a superconducting quantum computing prototype featuring 105 qubits and 182 couplers.展开更多
The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks ...The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks with millions, or more, of vertices. The MATLAB language, with its mass of statistical functions, is a good choice to rapidly realize an algorithm prototype of complex networks. The performance of the MATLAB codes can be further improved by using graphic processor units (GPU). This paper presents the strategies and performance of the GPU implementation of a complex networks package, and the Jacket toolbox of MATLAB is used. Compared with some commercially available CPU implementations, GPU can achieve a speedup of, on average, 11.3x. The experimental result proves that the GPU platform combined with the MATLAB language is a good combination for complex network research.展开更多
Virtualization is the key technology of cloud computing. Network virtualization plays an important role in this field. Its performance is very relevant to network virtualizing. Nowadays its implementations are mainly ...Virtualization is the key technology of cloud computing. Network virtualization plays an important role in this field. Its performance is very relevant to network virtualizing. Nowadays its implementations are mainly based on the idea of Software Define Network (SDN). Open vSwitch is a sort of software virtual switch, which conforms to the OpenFlow protocol standard. It is basically deployed in the Linux kernel hypervisor. This leads to its performance relatively poor because of the limited system resource. In turn, the packet process throughput is very low.In this paper, we present a Cavium-based Open vSwitch implementation. The Cavium platform features with multi cores and couples of hard ac-celerators. It supports zero-copy of packets and handles packet more quickly. We also carry some experiments on the platform. It indicates that we can use it in the enterprise network or campus network as convergence layer and core layer device.展开更多
There are varieties of embedded systems in the world. It is a big challenge to optimize the instruction sets of System on Chips (SoCs) according to different systems' working environments. The idea of programmable...There are varieties of embedded systems in the world. It is a big challenge to optimize the instruction sets of System on Chips (SoCs) according to different systems' working environments. The idea of programmable instruction set is an effective method to gain embedded system's re-configurability. This letter presents a logic module for Java processor to be capable of using programmable instruction set. Cost (area, power, and timing) of the module is trivial. Such module is also reusable for other embedded system solutions besides Java systems.展开更多
This paper reviews some of our recent works on phase behaviors of particulate systems with a soft-core interaction potential. The potential is purely repulsive and bounded, i.e., it is finite even when two particles c...This paper reviews some of our recent works on phase behaviors of particulate systems with a soft-core interaction potential. The potential is purely repulsive and bounded, i.e., it is finite even when two particles completely overlap. The one-sided linear spring (harmonic) potential is one of the representatives. This model system has been successively employed to study the jamming transition, i.e., the formation of rigid and disordered packings of hard particles, and establish the jamming physics. This is actually based on the "hard" aspect of the potential, because at low densities and when particle overlap is tiny the potential resembles the hard sphere limit. At high densities, the potential exhibits its "soft" aspect: with the increase of density, there are successive reentrant crystallizations with many types of solid phases. Taking advantage of the dual nature of the potential, we investigate the criticality of the jamming transition from different perspectives, extend the jamming scenario to high densities, reveal the novel density evolution of two-dimensional melting, and find unexpected formation of quasicrystals. It is surprising that such a simple potential can exhibit so rich and unexpected phenomena in phase transitions. The phase behaviors discussed in this paper are also highly regarded in polymer science, which may thus shed light on our understanding of polymeric systems or inspire new ideas in studies of polymers.展开更多
Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at t...Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at this problem,a parallelization approach was proposed with six memory optimization schemes for CG,four schemes of them aiming at all kinds of sparse matrix-vector multiplication (SPMV) operation. Conducted on IBM QS20,the parallelization approach can reach up to 21 and 133 times speedups with size A and B,respectively,compared with single power processor element. Finally,the conclusion is drawn that the peak bandwidth of memory access on Cell BE can be obtained in SPMV,simple computation is more efficient on heterogeneous processors and loop-unrolling can hide local storage access latency while executing scalar operation on SIMD cores.展开更多
Network processors (NPs) are widely used for programmable and high-performance networks;however, the programs for NPs are less portable, the number of NP program developers is small, and the development cost is high. ...Network processors (NPs) are widely used for programmable and high-performance networks;however, the programs for NPs are less portable, the number of NP program developers is small, and the development cost is high. To solve these problems, this paper proposes an open, high-level, and portable programming language called “Phonepl”, which is independent from vendor-specific proprietary hardware and software but can be translated into an NP program with high performance especially in the memory use. A common NP hardware feature is that a whole packet is stored in DRAM, but the header is cached in SRAM. Phonepl has a hardware-independent abstraction of this feature so that it allows programmers mostly unconscious of this hardware feature. To implement the abstraction, four representations of packet data type that cover all the packet operations (including substring, concatenation, input, and output) are introduced. Phonepl have been implemented on Octeon NPs used in plug-ins for a network-virtualization environment called the VNode Infrastructure, and several packet-handling programs were evaluated. As for the evaluation result, the conversion throughput is close to the wire rate, i.e., 10 Gbps, and no packet loss (by cache miss) occurs when the packet size is 256 bytes or larger.展开更多
Processors have been playing important roles in both communication infrastructure systems and terminals.In this paper,both application specific and general purpose processors for communications are discussed including...Processors have been playing important roles in both communication infrastructure systems and terminals.In this paper,both application specific and general purpose processors for communications are discussed including the roles,the history,the current situations,and the trends.One trend is that ASIPs(Application Specific Instruction-set Processors) are taking over ASICs(Application Specific Integrated Circuits) because of the increasing needs both on performance and compatibility of multi-modes.The trend opened opportunities for researchers crossing the boundary between communications and computer architecture.Another trend is the serverlization,i.e.,more infrastructure equipments are replaced by servers.The trend opened opportunities for researchers working towards high performance computing for communication,such as research on communication algorithm kernels and real time programming methods on servers.展开更多
In order to eliminate the energy waste caused by the traditional static hardware multithreaded processor used in real-time embedded system working in the low workload situation, the energy efficiency of the hardware m...In order to eliminate the energy waste caused by the traditional static hardware multithreaded processor used in real-time embedded system working in the low workload situation, the energy efficiency of the hardware multithread is discussed and a novel dynamic multithreaded architecture is proposed. The proposed architecture saves the energy wasted by removing idle threads without manipulation on the original architecture, fulfills a seamless switching mechanism which protects active threads and avoids pipeline stall during power mode switching. The report of an implemented dynamic multithreaded processor with 45 nm process from synthesis tool indicates that the area of dynamic multithreaded architecture is only 2.27% higher than the static one in achieving dynamic power dissipation, and consumes 1.3% more power in the same peak performance.展开更多
基金Project supported by the National Natural Science Foundation of China (Grant Nos. 12175129, 12475004, 12175027, and 12005125)the Key Research Program of Frontier Sciences of the Chinese Academy of Sciences (Grant No. ZDBSLY7016)+3 种基金the Shaanxi Fundamental Science Research Project for Mathematics and Physics (Grant No. 22JSY034)the Key Research and Development Projects of Shaanxi Province, China (Grant No. 2024GX-YBXM-564)the Scientific Research Program Funded by Shaanxi Provincial Education Department (Grant No. 23JP020)the Youth Innovation Team of Shaanxi Universities。
文摘By numerical propagation of the coupled Gross–Pitaevskii equations, the ground state phase of a SU(3) spin–orbit coupled Bose gas with nonlocal soft-core interactions has been investigated within the all parameter space, showing strong dependence on the strength of SU(3) spin–orbit coupling, nonlocal soft-core interactions, spin-exchange interactions and Rydberg blockade radius. More specially, we also perform a detailed study of the dependence of soft-core interaction on the Rydberg blockade radius at the point of rotational symmetry breaking. Our results show that under the combined effects of such parameters, the ground state shows a threefold-degenerate magnetized state for ferromagnetic spin interaction, while a variety of lattice phases for antiferromagnetic spin interaction.
文摘The implementation method of the IEEE 802.11 Medium Access Control (MAC) protocol is mainly based on DSP (Digital Signal Processor)/ ARM (Advanced Reduced instruction set computer Machine) processor or DSP/ARM IP (Intellectual Property) core. This paper presents a method based on Nios II soft-core processor embedded in Altera’s Cyclone FPGA (Field Programmable Gate Array) and MicroC/OS-II RTOS (Real-Time Operation System). The benefits and drawbacks of above methods are compared, and then the method presented in this paper is described. The hardware and software partitioning are discussed; the hardware architecture is also illustrated and the MAC software programming is described in detail. The presented method has some advantages, such as low cost, easy-implementation and very suitable for the implementation of IEEE 802.11 MAC in research stage.
基金supported by Beijing Academy of Quantum Information Sciencessupported by the State Key Laboratory of Low Dimensional Quantum Physics+2 种基金the Start-up Fund provided by Tsinghua Universitythe financial support provided by the National Natural Science Foundation of China(Grant No.92065113)the Anhui Initiative in Quantum Information Technologies。
文摘The historical significance of the Stern–Gerlach(SG)experiment lies in its provision of the initial evidence for space quantization.Over time,its sequential form has evolved into an elegant paradigm that effectively illustrates the fundamental principles of quantum theory.To date,the practical implementation of the sequential SG experiment has not been fully achieved.In this study,we demonstrate the capability of programmable quantum processors to simulate the sequential SG experiment.The specific parametric shallow quantum circuits,which are suitable for the limitations of current noisy quantum hardware,are given to replicate the functionality of SG devices with the ability to perform measurements in different directions.Surprisingly,it has been demonstrated that Wigner’s SG interferometer can be readily implemented in our sequential quantum circuit.With the utilization of the identical circuits,it is also feasible to implement Wheeler’s delayed-choice experiment.We propose the utilization of cross-shaped programmable quantum processors to showcase sequential experiments,and the simulation results demonstrate a strong alignment with theoretical predictions.With the rapid advancement of cloud-based quantum computing,such as BAQIS Quafu,it is our belief that the proposed solution is well-suited for deployment on the cloud,allowing for public accessibility.Our findings not only expand the potential applications of quantum computers,but also contribute to a deeper comprehension of the fundamental principles underlying quantum theory.
文摘A notable portion of cachelines in real-world workloads exhibits inner non-uniform access behaviors.However,modern cache management rarely considers this fine-grained feature,which impacts the effective cache capacity of contemporary high-performance spacecraft processors.To harness these non-uniform access behaviors,an efficient cache replacement framework featuring an auxiliary cache specifically designed to retain evicted hot data was proposed.This framework reconstructs the cache replacement policy,facilitating data migration between the main cache and the auxiliary cache.Unlike traditional cacheline-granularity policies,the approach excels at identifying and evicting infrequently used data,thereby optimizing cache utilization.The evaluation shows impressive performance improvement,especially on workloads with irregular access patterns.Benefiting from fine granularity,the proposal achieves superior storage efficiency compared with commonly used cache management schemes,providing a potential optimization opportunity for modern resource-constrained processors,such as spacecraft processors.Furthermore,the framework complements existing modern cache replacement policies and can be seamlessly integrated with minimal modifications,enhancing their overall efficacy.
文摘A team of researchers from the University of Science and Technology of China(USTC)of the Chinese Academy of Sciences(CAS)and its partners have made significant advancements in random quantum circuit sampling with Zuchongzhi-3,a superconducting quantum computing prototype featuring 105 qubits and 182 couplers.
基金Project supported by the Science Fund for Creative Research Groups of the National Natural Science Foundation of China (Grant No.60921062)the National Natural Science Foundation of China (Grant No.60873014)the Young Scientists Fund of the National Natural Science Foundation of China (Grant Nos.61003082 and 60903059)
文摘The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks with millions, or more, of vertices. The MATLAB language, with its mass of statistical functions, is a good choice to rapidly realize an algorithm prototype of complex networks. The performance of the MATLAB codes can be further improved by using graphic processor units (GPU). This paper presents the strategies and performance of the GPU implementation of a complex networks package, and the Jacket toolbox of MATLAB is used. Compared with some commercially available CPU implementations, GPU can achieve a speedup of, on average, 11.3x. The experimental result proves that the GPU platform combined with the MATLAB language is a good combination for complex network research.
文摘Virtualization is the key technology of cloud computing. Network virtualization plays an important role in this field. Its performance is very relevant to network virtualizing. Nowadays its implementations are mainly based on the idea of Software Define Network (SDN). Open vSwitch is a sort of software virtual switch, which conforms to the OpenFlow protocol standard. It is basically deployed in the Linux kernel hypervisor. This leads to its performance relatively poor because of the limited system resource. In turn, the packet process throughput is very low.In this paper, we present a Cavium-based Open vSwitch implementation. The Cavium platform features with multi cores and couples of hard ac-celerators. It supports zero-copy of packets and handles packet more quickly. We also carry some experiments on the platform. It indicates that we can use it in the enterprise network or campus network as convergence layer and core layer device.
基金Supported by the Guangzhou Key Technology R&D Program (No.2007Z2-D0011)
文摘There are varieties of embedded systems in the world. It is a big challenge to optimize the instruction sets of System on Chips (SoCs) according to different systems' working environments. The idea of programmable instruction set is an effective method to gain embedded system's re-configurability. This letter presents a logic module for Java processor to be capable of using programmable instruction set. Cost (area, power, and timing) of the module is trivial. Such module is also reusable for other embedded system solutions besides Java systems.
基金the support from the National Natural Science Foundation of China (Nos. 11734014, 11574278, 21325418, 11074228, and 91027001)the National Basic Research Program of China (973 Program) (No. 2012CB821500)+1 种基金CAS 100Talent Program (No. 2030020004)Fundamental Research Funds for the Central Universities (Nos. 2340000034 and 2030020028)
文摘This paper reviews some of our recent works on phase behaviors of particulate systems with a soft-core interaction potential. The potential is purely repulsive and bounded, i.e., it is finite even when two particles completely overlap. The one-sided linear spring (harmonic) potential is one of the representatives. This model system has been successively employed to study the jamming transition, i.e., the formation of rigid and disordered packings of hard particles, and establish the jamming physics. This is actually based on the "hard" aspect of the potential, because at low densities and when particle overlap is tiny the potential resembles the hard sphere limit. At high densities, the potential exhibits its "soft" aspect: with the increase of density, there are successive reentrant crystallizations with many types of solid phases. Taking advantage of the dual nature of the potential, we investigate the criticality of the jamming transition from different perspectives, extend the jamming scenario to high densities, reveal the novel density evolution of two-dimensional melting, and find unexpected formation of quasicrystals. It is surprising that such a simple potential can exhibit so rich and unexpected phenomena in phase transitions. The phase behaviors discussed in this paper are also highly regarded in polymer science, which may thus shed light on our understanding of polymeric systems or inspire new ideas in studies of polymers.
基金Project(2008AA01A201) supported the National High-tech Research and Development Program of ChinaProjects(60833004, 60633050) supported by the National Natural Science Foundation of China
文摘Developing parallel applications on heterogeneous processors is facing the challenges of 'memory wall',due to limited capacity of local storage,limited bandwidth and long latency for memory access. Aiming at this problem,a parallelization approach was proposed with six memory optimization schemes for CG,four schemes of them aiming at all kinds of sparse matrix-vector multiplication (SPMV) operation. Conducted on IBM QS20,the parallelization approach can reach up to 21 and 133 times speedups with size A and B,respectively,compared with single power processor element. Finally,the conclusion is drawn that the peak bandwidth of memory access on Cell BE can be obtained in SPMV,simple computation is more efficient on heterogeneous processors and loop-unrolling can hide local storage access latency while executing scalar operation on SIMD cores.
文摘Network processors (NPs) are widely used for programmable and high-performance networks;however, the programs for NPs are less portable, the number of NP program developers is small, and the development cost is high. To solve these problems, this paper proposes an open, high-level, and portable programming language called “Phonepl”, which is independent from vendor-specific proprietary hardware and software but can be translated into an NP program with high performance especially in the memory use. A common NP hardware feature is that a whole packet is stored in DRAM, but the header is cached in SRAM. Phonepl has a hardware-independent abstraction of this feature so that it allows programmers mostly unconscious of this hardware feature. To implement the abstraction, four representations of packet data type that cover all the packet operations (including substring, concatenation, input, and output) are introduced. Phonepl have been implemented on Octeon NPs used in plug-ins for a network-virtualization environment called the VNode Infrastructure, and several packet-handling programs were evaluated. As for the evaluation result, the conversion throughput is close to the wire rate, i.e., 10 Gbps, and no packet loss (by cache miss) occurs when the packet size is 256 bytes or larger.
基金The National High-Tech Research and Development Program of China(863 Program)2014AA01A705
文摘Processors have been playing important roles in both communication infrastructure systems and terminals.In this paper,both application specific and general purpose processors for communications are discussed including the roles,the history,the current situations,and the trends.One trend is that ASIPs(Application Specific Instruction-set Processors) are taking over ASICs(Application Specific Integrated Circuits) because of the increasing needs both on performance and compatibility of multi-modes.The trend opened opportunities for researchers crossing the boundary between communications and computer architecture.Another trend is the serverlization,i.e.,more infrastructure equipments are replaced by servers.The trend opened opportunities for researchers working towards high performance computing for communication,such as research on communication algorithm kernels and real time programming methods on servers.
基金supported partially by the National High Technical Research and Development Program of China (863 Program) under Grants No. 2011AA040101, No. 2008AA01Z134the National Natural Science Foundation of China under Grants No. 61003251, No. 61172049, No. 61173150+2 种基金the Doctoral Fund of Ministry of Education of China under Grant No. 20100006110015Beijing Municipal Natural Science Foundation under Grant No. Z111100054011078the 2012 Ladder Plan Project of Beijing Key Laboratory of Knowledge Engineering for Materials Science under Grant No. Z121101002812005
文摘In order to eliminate the energy waste caused by the traditional static hardware multithreaded processor used in real-time embedded system working in the low workload situation, the energy efficiency of the hardware multithread is discussed and a novel dynamic multithreaded architecture is proposed. The proposed architecture saves the energy wasted by removing idle threads without manipulation on the original architecture, fulfills a seamless switching mechanism which protects active threads and avoids pipeline stall during power mode switching. The report of an implemented dynamic multithreaded processor with 45 nm process from synthesis tool indicates that the area of dynamic multithreaded architecture is only 2.27% higher than the static one in achieving dynamic power dissipation, and consumes 1.3% more power in the same peak performance.