330 articles found
Efficient cache replacement framework based on access hotness for spacecraft processors
1
Authors: GAO Xin, NIAN Jiawei, LIU Hongjin, YANG Mengfei. 《中国空间科学技术(中英文)》, CSCD, PKU Core, 2024, No. 2, pp. 74-88, 15 pages
A notable portion of cachelines in real-world workloads exhibits inner non-uniform access behaviors. However, modern cache management rarely considers this fine-grained feature, which impacts the effective cache capacity of contemporary high-performance spacecraft processors. To harness these non-uniform access behaviors, an efficient cache replacement framework featuring an auxiliary cache specifically designed to retain evicted hot data was proposed. This framework reconstructs the cache replacement policy, facilitating data migration between the main cache and the auxiliary cache. Unlike traditional cacheline-granularity policies, the approach excels at identifying and evicting infrequently used data, thereby optimizing cache utilization. The evaluation shows impressive performance improvement, especially on workloads with irregular access patterns. Benefiting from fine granularity, the proposal achieves superior storage efficiency compared with commonly used cache management schemes, providing a potential optimization opportunity for modern resource-constrained processors, such as spacecraft processors. Furthermore, the framework complements existing modern cache replacement policies and can be seamlessly integrated with minimal modifications, enhancing their overall efficacy.
Keywords: spacecraft processors; cache management; replacement policy; storage efficiency; memory hierarchy; microarchitecture
Proposal for sequential Stern-Gerlach experiment with programmable quantum processors
2
Authors: 胡孟军, 缪海兴, 张永生. 《Chinese Physics B》, SCIE, EI, CAS, CSCD, 2024, No. 2, pp. 131-136, 6 pages
The historical significance of the Stern–Gerlach (SG) experiment lies in its provision of the initial evidence for space quantization. Over time, its sequential form has evolved into an elegant paradigm that effectively illustrates the fundamental principles of quantum theory. To date, the practical implementation of the sequential SG experiment has not been fully achieved. In this study, we demonstrate the capability of programmable quantum processors to simulate the sequential SG experiment. Specific parametric shallow quantum circuits, which suit the limitations of current noisy quantum hardware, are given to replicate the functionality of SG devices with the ability to perform measurements in different directions. Surprisingly, it has been demonstrated that Wigner’s SG interferometer can be readily implemented in our sequential quantum circuit. With identical circuits, it is also feasible to implement Wheeler’s delayed-choice experiment. We propose the use of cross-shaped programmable quantum processors to showcase sequential experiments, and the simulation results demonstrate a strong alignment with theoretical predictions. With the rapid advancement of cloud-based quantum computing, such as BAQIS Quafu, we believe that the proposed solution is well-suited for deployment on the cloud, allowing for public accessibility. Our findings not only expand the potential applications of quantum computers, but also contribute to a deeper comprehension of the fundamental principles underlying quantum theory.
Keywords: sequential Stern-Gerlach; quantum circuit; quantum processor
Speeding up the MATLAB complex networks package using graphic processors (Cited by: 1)
3
Authors: 张百达, 唐玉华, 吴俊杰, 李鑫. 《Chinese Physics B》, SCIE, EI, CAS, CSCD, 2011, No. 9, pp. 460-467, 8 pages
The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks with millions, or more, of vertices. The MATLAB language, with its mass of statistical functions, is a good choice to rapidly realize an algorithm prototype of complex networks. The performance of the MATLAB codes can be further improved by using graphics processing units (GPUs). This paper presents the strategies and performance of the GPU implementation of a complex networks package, using the Jacket toolbox of MATLAB. Compared with some commercially available CPU implementations, the GPU can achieve a speedup of, on average, 11.3x. The experimental result shows that the GPU platform combined with the MATLAB language is a good combination for complex network research.
Keywords: complex networks; graphics processing unit; MATLAB; Jacket Toolbox
SDN-Based Switch Implementation on Network Processors (Cited by: 1)
4
Authors: Yunchun Li, Guodong Wang. 《Communications and Network》, 2013, No. 3, pp. 434-437, 4 pages
Virtualization is the key technology of cloud computing. Network virtualization plays an important role in this field, and its performance matters greatly to virtualized networks. Nowadays its implementations are mainly based on the idea of Software-Defined Networking (SDN). Open vSwitch is a software virtual switch that conforms to the OpenFlow protocol standard. It is typically deployed in a Linux kernel hypervisor, so its performance is relatively poor because of the limited system resources. In turn, the packet processing throughput is very low. In this paper, we present a Cavium-based Open vSwitch implementation. The Cavium platform features multiple cores and a number of hardware accelerators. It supports zero-copy of packets and handles packets more quickly. We also carried out some experiments on the platform. The results indicate that it can be used in an enterprise network or campus network as a convergence layer or core layer device.
Keywords: SDN; Open vSwitch; network processors; OpenFlow
High-Level Portable Programming Language for Optimized Memory Use of Network Processors
5
Author: Yasusi Kanada. 《Communications and Network》, 2015, No. 1, pp. 55-69, 15 pages
Network processors (NPs) are widely used for programmable and high-performance networks; however, the programs for NPs are less portable, the number of NP program developers is small, and the development cost is high. To solve these problems, this paper proposes an open, high-level, and portable programming language called “Phonepl”, which is independent of vendor-specific proprietary hardware and software but can be translated into an NP program with high performance, especially in memory use. A common NP hardware feature is that a whole packet is stored in DRAM, but the header is cached in SRAM. Phonepl has a hardware-independent abstraction of this feature, so programmers can remain mostly unaware of it. To implement the abstraction, four representations of the packet data type that cover all the packet operations (including substring, concatenation, input, and output) are introduced. Phonepl has been implemented on Octeon NPs used in plug-ins for a network-virtualization environment called the VNode Infrastructure, and several packet-handling programs were evaluated. In the evaluation, the conversion throughput is close to the wire rate, i.e., 10 Gbps, and no packet loss (by cache miss) occurs when the packet size is 256 bytes or larger.
Keywords: network processors; portability; high-level language; hardware independence; memory usage; DRAM; SRAM; network virtualization
Export Processors to be Pooled
6
《China's Foreign Trade》, 2000, No. 5, p. 45, 1 page
Keywords: Export processors to be Pooled
Processors
7
《个人电脑》, 2004, No. 1, p. 110, 1 page
Keywords: microprocessor; chipset; motherboard; AMD; ATHLON 64; processors
Processors
8
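《个人电脑》, 2004, No. 1, p. 89, 1 page
In 2003, the processor market saw one major launch after another: Pentium 4 processors based on Hyper-Threading technology, the Pentium 4 EE, the AMD Athlon 64, and the AMD Athlon 64 FX series. These new products arrived so quickly that it was hard to keep up.
Keywords: processors; microprocessor; architecture; CPU; nanoscale manufacturing process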
LOGIC STRUCTURE OF PROGRAMMABLE INSTRUCTIONS FOR JAVA PROCESSORS (Cited by: 2)
9
Authors: Chen Zhirui, Tan Hongzhou. 《Journal of Electronics(China)》, 2009, No. 5, pp. 711-714, 4 pages
There is a wide variety of embedded systems in the world. It is a big challenge to optimize the instruction sets of Systems on Chips (SoCs) according to different systems' working environments. A programmable instruction set is an effective way to make an embedded system reconfigurable. This letter presents a logic module that enables a Java processor to use a programmable instruction set. The cost (area, power, and timing) of the module is trivial. The module is also reusable for other embedded system solutions besides Java systems.
Keywords: programmable instructions; Java processor; System on Chips (SoCs)
Multi-core optimization for conjugate gradient benchmark on heterogeneous processors
10
Authors: 邓林, 窦勇. 《Journal of Central South University》, SCIE, EI, CAS, 2011, No. 2, pp. 490-498, 9 pages
Developing parallel applications on heterogeneous processors faces the challenge of the 'memory wall', due to limited capacity of local storage, limited bandwidth, and long latency for memory access. Aiming at this problem, a parallelization approach was proposed with six memory optimization schemes for CG, four of which target all kinds of sparse matrix-vector multiplication (SPMV) operations. Conducted on IBM QS20, the parallelization approach can reach up to 21 and 133 times speedups with size A and B, respectively, compared with a single power processor element. Finally, the conclusion is drawn that the peak bandwidth of memory access on Cell BE can be obtained in SPMV, that simple computation is more efficient on heterogeneous processors, and that loop-unrolling can hide local storage access latency while executing scalar operations on SIMD cores.
Keywords: multi-core processor; NAS; parallelization; CG; memory optimization
Trends of Communication Processors
11
Authors: LIU Dake, CAI Zhaoyun, WANG Wei. 《China Communications》, SCIE, CSCD, 2016, No. 1, pp. 1-16, 16 pages
Processors have been playing important roles in both communication infrastructure systems and terminals. In this paper, both application-specific and general-purpose processors for communications are discussed, including their roles, history, current situation, and trends. One trend is that ASIPs (Application Specific Instruction-set Processors) are taking over from ASICs (Application Specific Integrated Circuits) because of the increasing needs for both performance and multi-mode compatibility. This trend has opened opportunities for researchers crossing the boundary between communications and computer architecture. Another trend is serverlization, i.e., more infrastructure equipment is being replaced by servers. This trend has opened opportunities for researchers working towards high performance computing for communication, such as research on communication algorithm kernels and real-time programming methods on servers.
Keywords: ASIP; baseband processor; network processor; application processor; server processor
Dynamic Power Dissipation Control Method for Real-Time Processors Based on Hardware Multithreading
12
Authors: 罗新强, 齐悦, 王磊, 王沁. 《China Communications》, SCIE, CSCD, 2013, No. 5, pp. 156-166, 11 pages
To eliminate the energy wasted when a traditional static hardware multithreaded processor in a real-time embedded system runs under a low workload, the energy efficiency of hardware multithreading is discussed and a novel dynamic multithreaded architecture is proposed. The proposed architecture saves the wasted energy by removing idle threads without modifying the original architecture, and it provides a seamless switching mechanism that protects active threads and avoids pipeline stalls during power mode switching. The synthesis tool report for a dynamic multithreaded processor implemented in a 45 nm process indicates that the area of the dynamic multithreaded architecture is only 2.27% higher than that of the static one while achieving dynamic power dissipation control, and that it consumes 1.3% more power at the same peak performance.
Keywords: dynamic power dissipation control; real-time processor; hardware multithread; low power design; energy efficiency
Broadband unidirectional visible imaging using wafer-scale nano-fabrication of multi-layer diffractive optical processors
13
Authors: Che-Yung Shen, Paolo Batoni, Xilin Yang, Jingxi Li, Kun Liao, Jared Stack, Jeff Gardner, Kevin Welch, Aydogan Ozcan. 《Light(Science & Applications)》, 2025, No. 9, pp. 2821-2838, 18 pages
We present a broadband and polarization-insensitive unidirectional imager that operates in the visible part of the spectrum, where image formation occurs in one direction, while in the opposite direction it is blocked. This approach is enabled by deep learning-driven diffractive optical design with wafer-scale nano-fabrication using high-purity fused silica to ensure optical transparency and thermal stability. Our design achieves unidirectional imaging across three visible wavelengths (covering red, green, and blue parts of the spectrum), and we experimentally validated this broadband unidirectional imager by creating high-fidelity images in the forward direction and generating weak, distorted output patterns in the backward direction, in alignment with our numerical simulations. This work demonstrates wafer-scale production of diffractive optical processors, featuring 16 levels of nanoscale phase features distributed across two axially aligned diffractive layers for visible unidirectional imaging. This approach facilitates mass-scale production of ~0.5 billion nanoscale phase features per wafer, supporting high-throughput manufacturing of hundreds to thousands of multi-layer diffractive processors suitable for large apertures and parallel processing of multiple tasks. Beyond broadband unidirectional imaging in the visible spectrum, this study establishes a pathway for artificial-intelligence-enabled diffractive optics with versatile applications, signaling a new era in optical device functionality with industrial-level, massively scalable fabrication.
Keywords: deep learning; wafer scale fabrication; multi layer diffractive optical processors; broadband imaging; unidirectional imaging; polarization insensitive; high purity fused silica; diffractive optical design
Optimizing frequency allocation for superconducting quantum processors with frequency-tunable qubits
14
Authors: Bi-Ying Wang, Wuxin Liu, Xiangyu Chen, Shu Xu, Jiangyu Cui, Man-Hong Yung. 《Science China(Physics,Mechanics & Astronomy)》, 2025, No. 2, pp. 23-36, 14 pages
As superconducting quantum processors scale, a key challenge is maintaining high coherence times and fidelity control over numerous qubits. We propose an automatic frequency allocation method for frequency-tunable qubits that equally considers coherence-limited fidelity and crosstalk-induced control errors during the allocation process. By employing a weighted average of the objective functions for coherence time and crosstalk, we numerically calculate gate fidelity to establish an open-loop optimization for determining suitable weight factors. This results in an efficient objective function for frequency optimization. We apply our method to frequency-tunable transmon qubits with tunable couplers, both theoretically and experimentally. The numerical results demonstrate significant advantages, including substantial reductions in gate errors and faster operation times, especially at higher qubit counts. Experimentally, our approach successfully achieves approximately 99.9% single-qubit fidelity on a nine-qubit chip.
Keywords: frequency allocation; transmon qubits; superconducting processors
Parallel Implementation of Radiation Hydrodynamics Coupled with Particle Transport on Software Infrastructure JASMIN
15
Authors: REN Jian, WEI Junxia, CAO Xiaolin. 《计算物理》, PKU Core, 2025, No. 5, pp. 608-618, 11 pages
In this work, we present a parallel implementation of radiation hydrodynamics coupled with particle transport, utilizing the software infrastructure JASMIN (J Adaptive Structured Meshes applications INfrastructure), which encapsulates high-performance technology for the numerical simulation of complex applications. Two serial codes, radiation hydrodynamics RH2D and particle transport Sn2D, have been integrated into RHSn2D on the JASMIN infrastructure, which can efficiently use thousands of processors to simulate complex multi-physics phenomena. Moreover, the non-conforming processors strategy protects RHSn2D against serious load imbalance between radiation hydrodynamics and particle transport in large-scale parallel simulations. Numerical results show that RHSn2D achieves a parallel efficiency of 17.1% using 90720 cells on 8192 processors compared with 256 processors on the same problem.
Keywords: processors strategy; parallel performance; radiation hydrodynamics; particle transport; multi-physics models; software infrastructure
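The abstract above reports parallel efficiency on 8192 processors relative to a 256-processor baseline but does not state the formula. A minimal sketch, assuming the usual strong-scaling definition (measured speedup divided by the ideal speedup for the same problem size); the function name and the timings in the usage note are hypothetical and are not taken from the paper:

```python
def relative_parallel_efficiency(base_procs, base_time, procs, time):
    """Strong-scaling efficiency of a run relative to a baseline run.

    Assumed definition (not stated in the abstract):
        efficiency = (base_time / time) / (procs / base_procs)
    """
    if base_procs <= 0 or procs <= 0:
        raise ValueError("processor counts must be positive")
    if base_time <= 0 or time <= 0:
        raise ValueError("wall-clock times must be positive")
    speedup = base_time / time          # measured speedup over the baseline
    ideal_speedup = procs / base_procs  # perfect scaling would reach this
    return speedup / ideal_speedup


# Hypothetical timings: 320 s on 256 processors, 40 s on 8192 processors.
efficiency = relative_parallel_efficiency(256, 320.0, 8192, 40.0)
print(f"{efficiency:.1%}")
```

Under this assumed definition, an efficiency of 17.1% at 32 times the processor count would correspond to a measured speedup of about 5.5 over the 256-processor run.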
High Performance General-Purpose Microprocessors: Past and Future (Cited by: 5)
16
Authors: 胡伟武, 侯锐, 肖俊华, 章隆宾. 《Journal of Computer Science & Technology》, SCIE, EI, CSCD, 2006, No. 5, pp. 631-640, 10 pages
Looking back, processor architecture has improved by spiraling from simple to complex and from complex to simple. Nowadays we are facing another shift from complex to simple, and new innovative architectures will emerge to utilize the continuously increasing transistor budgets. The growing importance of wire delays, changing workloads, power consumption, and design/verification complexity will drive the forthcoming era of Chip Multiprocessors (CMPs). Furthermore, typical CMP projects from both industry and academia are investigated. By examining in depth some primary theoretical and implementation problems of CMPs, the great challenges and opportunities for future CMPs are presented and discussed. Finally, the Godson series microprocessors designed in China are introduced.
Keywords: high performance general-purpose microprocessor; instruction level parallelism; data level parallelism; thread level parallelism; chip multiprocessors; Godson processor
PARAMETRIC BOUNDS FOR LPT SCHEDULING ON UNIFORM PROCESSORS (Cited by: 2)
17
Author: 陈礴. 《Acta Mathematicae Applicatae Sinica》, SCIE, CSCD, 1991, No. 1, pp. 67-73, 7 pages
The nonpreemptive assignment of independent tasks to a system of m uniform processors is examined with the objective of minimizing the makespan. Using r_m, the ratio of the fastest speed to the slowest speed of the system, as a parameter, we assess the performance of the LPT (largest processing time) schedule with respect to optimal schedules. It is shown that the worst-case bound for the ratio of the two schedule lengths is between ...
Keywords: LPT NG Pr PARAMETRIC BOUNDS FOR LPT SCHEDULING ON UNIFORM processors
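The LPT rule named in the abstract above can be illustrated with a short sketch. It assumes the common formulation for uniform processors: tasks are taken in order of decreasing size, and each one goes to the processor on which it would finish earliest. The function name, task sizes, and speeds are hypothetical, and the paper's exact rule and bounds are not reproduced here:

```python
def lpt_schedule(task_sizes, speeds):
    """Greedy LPT assignment on uniform processors.

    task_sizes: amount of work per task; speeds: work per unit time per
    processor. Returns (assignment, makespan), where assignment[i] lists
    the task sizes placed on processor i.
    """
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need at least one processor with positive speed")
    loads = [0.0] * len(speeds)          # total work assigned so far
    assignment = [[] for _ in speeds]
    for size in sorted(task_sizes, reverse=True):  # largest task first
        # Pick the processor where this task would complete earliest.
        best = min(range(len(speeds)),
                   key=lambda i: (loads[i] + size) / speeds[i])
        loads[best] += size
        assignment[best].append(size)
    makespan = max(load / speed for load, speed in zip(loads, speeds))
    return assignment, makespan


# Two processors with speed ratio r_m = 2 (fastest / slowest).
assignment, makespan = lpt_schedule([7, 5, 4, 3, 1], [2.0, 1.0])
print(assignment, makespan)  # [[7, 4, 3], [5, 1]] 7.0
```

In this toy instance the total work is 20 and the total speed is 3, so no schedule can finish before 20/3 ≈ 6.67; the greedy result of 7.0 is close to that lower bound.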
Fault Tolerance Mechanism in Chip Many-Core Processors (Cited by: 1)
18
Authors: 张磊, 韩银和, 李华伟, 李晓维. 《Tsinghua Science and Technology》, SCIE, EI, CAS, 2007, No. S1, pp. 169-174, 6 pages
As semiconductor technology advances, there will be billions of transistors on a single chip. Chip many-core processors are emerging to take advantage of these greater transistor densities to deliver greater performance. Effective fault tolerance techniques are essential to improve the yield of such complex chips. In this paper, a core-level redundancy scheme called N+M is proposed to improve N-core processors’ yield by providing M spare cores. In such an architecture, topology is an important factor because it greatly affects the processors’ performance. The concept of logical topology and a topology reconfiguration problem are introduced, with the aim of transparently providing the target topology with the lowest performance degradation in the presence of faulty cores on chip. A row rippling and column stealing (RRCS) algorithm is also proposed. Results show that RRCS can give solutions with an average degradation of 13.8% and negligible computing time.
Keywords: chip many-core processors; yield; fault tolerance; reconfiguration; network-on-chip
Taxonomy of Data Prefetching for Multicore Processors (Cited by: 1)
19
Authors: Surendra Byna, 陈勇, 孙贤和. 《Journal of Computer Science & Technology》, SCIE, EI, CSCD, 2009, No. 3, pp. 405-417, 13 pages
Data prefetching is an effective data access latency hiding technique to mask the CPU stall caused by cache misses and to bridge the performance gap between processor and memory. With hardware and/or software support, data prefetching brings data closer to a processor before it is actually needed. Many prefetching techniques have been developed for single-core processors. Recent developments in processor technology have brought multicore processors into the mainstream. While some of the single-core prefetching techniques are directly applicable to multicore processors, numerous novel strategies have been proposed in the past few years to take advantage of multiple cores. This paper aims to provide a comprehensive review of the state-of-the-art prefetching techniques, and proposes a taxonomy that classifies various design concerns in developing a prefetching strategy, especially for multicore processors. We compare various existing methods through analysis as well.
Keywords: taxonomy of prefetching strategies; multicore processors; data prefetching; memory hierarchy
Energy Efficient Block-Partitioned Multicore Processors for Parallel Applications
20
Authors: 祁轩, 朱大开. 《Journal of Computer Science & Technology》, SCIE, EI, CSCD, 2011, No. 3, pp. 418-433, 16 pages
Due to the increasing power consumption in modern computing systems, energy management has become an important research area in the last decade. Recently, multicore has emerged as an energy-efficient architecture that exploits parallelism in modern applications. However, as the number of cores on a single chip continues to increase, effectively managing the energy efficiency of multicore-based systems has become a grand challenge. In this paper, based on the voltage island and dynamic voltage and frequency scaling (DVFS) techniques, we investigate the energy efficiency of block-partitioned multicore processors, where cores are grouped into blocks with the cores on one block sharing a DVFS-enabled power supply. Depending on the number of cores on each block, we study both symmetric and asymmetric block configurations. We develop a system-level power model (which can support various power management techniques) and derive both block- and system-wide energy-efficient frequencies for systems with block-partitioned multicore processors. Based on the power model, we prove that, for embarrassingly parallel applications, having all cores on a single block can achieve the same energy savings as the individual block configuration (where each core forms a single block and has its own power supply). However, for applications with limited degrees of parallelism, we show the superiority of the buddy-asymmetric block configuration, where the number of required blocks (and power supplies) is logarithmically related to the number of cores on the chip, in that it can achieve the same amount of energy savings as the individual block configuration. The energy efficiency of different block configurations is further evaluated through extensive simulations with both synthetic applications and a real-life application.
Keywords: multicore processors; dynamic voltage and frequency scaling (DVFS); voltage islands; parallel applications