169 articles found
Cooperative Computing Techniques for a Deeply Fused and Heterogeneous Many-Core Processor Architecture (Cited by: 13)
1
Authors: 郑方, 李宏亮, 吕晖, 过锋, 许晓红, 谢向辉 《Journal of Computer Science & Technology》 SCIE EI CSCD 2015, No. 1, pp. 145-162 (18 pages)
Due to advances in semiconductor techniques, many-core processors have been widely used in high performance computing. However, many applications still cannot be carried out efficiently due to the memory wall, which has become a bottleneck in many-core processors. In this paper, we present a novel heterogeneous many-core processor architecture named deeply fused many-core (DFMC) for high performance computing systems. DFMC integrates management processing elements (MPEs) and computing processing elements (CPEs), which are heterogeneous processor cores for different application features with a unified ISA (instruction set architecture), a unified execution model, and shared memory that supports cache coherence. The DFMC processor can alleviate the memory wall problem by combining a series of cooperative computing techniques of CPEs, such as multi-pattern data stream transfer, an efficient register-level communication mechanism, and a fast hardware synchronization technique. These techniques are able to improve on-chip data reuse and optimize memory access performance. This paper illustrates an implementation of a full system prototype based on FPGA with four MPEs and 256 CPEs. Our experimental results show that the effect of the cooperative computing techniques of CPEs is significant, with DGEMM (double-precision matrix multiplication) achieving an efficiency of 94%, FFT (fast Fourier transform) obtaining a performance of 207 GFLOPS and FDTD (finite-difference time-domain) obtaining a performance of 27 GFLOPS.
Keywords: heterogeneous many-core processor; data stream transfer; register-level communication mechanism; hardware synchronization technique; processor prototype
Typhoon Case Comparison Analysis Between Heterogeneous Many-Core and Homogenous Multicore Supercomputing Platforms
2
Authors: LIU Xin, YU Xiaolin, ZHAO Haoran, HAN Qiqi, ZHANG Jie, WANG Chengzhi, MA Weiwei, XU Da 《Journal of Ocean University of China》 SCIE CAS CSCD 2023, No. 2, pp. 324-334 (11 pages)
In this paper, a typical experiment is carried out based on a high-resolution air-sea coupled model, namely, the coupled ocean-atmosphere-wave-sediment transport (COAWST) model, on both heterogeneous many-core (SW) and homogenous multicore (Intel) supercomputing platforms. We construct a hindcast of Typhoon Lekima on both the SW and Intel platforms, compare the simulation results between these two platforms and compare the key elements of the atmospheric and ocean modules to reanalysis data. The comparative experiment in this typhoon case indicates that the domestic many-core computing platform and general cluster yield almost no differences in the simulated typhoon path and intensity, and the differences in surface pressure (PSFC) in the WRF model and sea surface temperature (SST) in the short-range forecast are very small, whereas a major difference can be identified at high latitudes after the first 10 days. Further heat budget analysis verifies that the differences in SST after 10 days are mainly caused by shortwave radiation variations, as influenced by subsequently generated typhoons in the system. These typhoons generated in the hindcast after the first 10 days attain obviously different trajectories between the two platforms.
Keywords: heterogeneous many-core supercomputing platform; homogenous multicore supercomputing platform; comparison analysis; typhoon case
Optimization Task Scheduling Using Cooperation Search Algorithm for Heterogeneous Cloud Computing Systems (Cited by: 2)
3
Authors: Ahmed Y. Hamed, M. Kh. Elnahary, Faisal S. Alsubaei, Hamdy H. El-Sayed 《Computers, Materials & Continua》 SCIE EI 2023, No. 1, pp. 2133-2148 (16 pages)
Cloud computing has taken over the high-performance distributed computing area, and it currently provides on-demand services and resource pooling over the web. As a result of constantly changing user service demand, the task scheduling problem has emerged as a critical analytical topic in cloud computing. The primary goal of scheduling tasks is to distribute tasks to available processors to construct the shortest possible schedule without breaching precedence restrictions. Assignments and schedules of tasks substantially influence system operation in a heterogeneous multiprocessor system. The diverse processes inside the heuristic-based task scheduling method will result in varying makespan in the heterogeneous computing system. As a result, an intelligent scheduling algorithm should efficiently determine the priority of every subtask based on the resources necessary to lower the makespan. This research introduces a novel, efficient task scheduling method for cloud computing systems, based on the cooperation search algorithm, to tackle the task scheduling problem in heterogeneous cloud computing. The basic idea of this method is to use the advantages of meta-heuristic algorithms to get the optimal solution. We assess our algorithm's performance by running it through three scenarios with varying numbers of tasks. The findings demonstrate that the suggested technique beats the existing methods New Genetic Algorithm (NGA), Genetic Algorithm (GA), Whale Optimization Algorithm (WOA), Gravitational Search Algorithm (GSA), and Hybrid Heuristic and Genetic (HHG) by 7.9%, 2.1%, 8.8%, 7.7%, and 3.4%, respectively, in terms of makespan.
Keywords: heterogeneous processors; cooperation search algorithm; task scheduling; cloud computing
MT-office: parallel password recovery program for office on domestic heterogeneous multi-core processor
4
Authors: Yongtao Luo, Bo Yang, Jie Liu, Ruibo Wang, Jinmin Wen, Tiaojie Xiao, Xuguang Chen, Chunye Gong 《CCF Transactions on High Performance Computing》 2023, No. 3, pp. 231-244 (14 pages)
With the improvement of security awareness, more advanced and secure encryption algorithms are applied to Microsoft Office in order to guarantee information security. People also set more complex encryption passwords. However, once the initial password is forgotten, the encrypted information needs to be retrieved. The conventional brute force cracking methods and password recovery programs can hardly meet the actual deciphering needs. To this end, we develop a distributed parallel password recovery program (MT-Office) for Microsoft Office on the domestic heterogeneous multi-core processor (MT-3000). MT-Office takes full advantage of the multi-core and heterogeneous features of MT-3000, and is optimized and improved in both vectorization and global computing. At the same time, MT-Office provides multiple recovery strategies in password generation to improve the recovery efficiency. Compared with other platforms (e.g., Intel platforms and FT platforms), the MT-3000 heterogeneous platform can achieve a 60×–218× speedup. For Office2010, we perform a strong scalability test on the new-generation supercomputer in the National Supercomputer Center in Tianjin. MT-Office not only scales to 65,536 acceleration clusters on this system with good scalability, but also achieves an almost linear speedup. For Office2007, compared with other password recovery programs, MT-Office can achieve a 2.5×–131.1× speedup. MT-Office can thus better exploit the advantages of MT-3000: it has good scalability and parallelism, deciphers faster, and can be applied in practical engineering.
Keywords: Office password recovery; heterogeneous multi-core processor; SIMD; heterogeneous computing
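An Energy-Saving Scheduling Algorithm Based on Task Synchronization for Heterogeneous Multi-Core Real-Time Systems
5
Authors: 赵小松, 黄超, 李鉴, 康玉龙 《计算机科学》 PKU Core 2026, No. 1, pp. 241-251 (11 pages)
Current research on energy-saving scheduling of synchronized tasks in multi-core real-time systems mainly targets homogeneous multi-core processor platforms, whereas heterogeneous multi-core architectures can exploit system performance more effectively. Applying existing methods directly to heterogeneous multi-core systems leads to higher energy consumption when schedulability is guaranteed. To address this, the paper uses dynamic voltage and frequency scaling (DVFS) to study energy-saving scheduling based on task synchronization in heterogeneous multi-core real-time systems, and proposes the Synchronization Aware-Largest Energy Saved First (SA-LESF) algorithm. The algorithm iteratively optimizes the speed configuration of all tasks until every task reaches the speed configuration that saves the most energy. In addition, the Synchronization Aware-Largest Energy Saved First with Dynamic Reclamation (SA-LESF-DR) algorithm, based on dynamic slack reclamation, is proposed. While keeping real-time tasks schedulable, it applies a corresponding reclamation strategy to further reduce system energy consumption. Experimental results show that SA-LESF and SA-LESF-DR perform well in energy consumption, saving up to 30% energy compared with other algorithms on the same task sets.
Keywords: real-time systems; heterogeneous multi-core processor; task synchronization; energy-saving scheduling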
Performance evaluation and analysis of sparse matrix and graph kernels on heterogeneous processors
6
Authors: Feng Zhang, Weifeng Liu, Ningxuan Feng, Jidong Zhai, Xiaoyong Du 《CCF Transactions on High Performance Computing》 2019, No. 2, pp. 131-143 (13 pages)
Heterogeneous processors integrate very distinct compute resources such as CPUs and GPUs into the same chip, and thus can exploit the advantages and avoid the disadvantages of those compute units. In this work, we evaluate and analyze eight sparse matrix and graph kernels on an AMD CPU-GPU heterogeneous processor by using 956 sparse matrices. Five characteristics, i.e., load balancing, indirect addressing, memory reallocation, atomic operations, and dynamic characteristics, are our major considerations. The experimental results show that although the CPU and GPU parts access the same DRAM, very different performance behaviors are observed. For example, though the GPU part in general outperforms the CPU part, it cannot achieve the best performance in all cases given by the CPU part. Moreover, the bandwidth utilization of atomic operations on heterogeneous processors can be much higher than on a high-end discrete GPU.
Keywords: heterogeneous processor; performance analysis; sparse matrix computation
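swDaCe: Design and Implementation of a Data-Centric Parallel Programming Model on Shenwei Many-Core Processors
7
Authors: 沈沛祺, 陈俊仕, 安虹 《小型微型计算机系统》 PKU Core 2026, No. 3, pp. 751-759 (9 pages)
High-performance scientific computing is a core application area of supercomputers, covering key tasks such as particle simulation and climate analysis. However, as Moore's law gradually loses force, supercomputer architectures are becoming increasingly heterogeneous and complex, making scientific applications harder to develop and optimize. To address this problem, this paper proposes and implements swDaCe, a data-centric parallel programming model built on the new-generation Shenwei supercomputing platform. By decoupling dataflow-graph optimization from the original program, the model lets programmers describe computation logic in Python and ultimately generates high-performance C++ code adapted to the Shenwei many-core architecture. The paper also proposes a set of dataflow optimizations for the Shenwei architecture, including task mapping onto slave cores, vectorized parallelism, and DMA memory-access optimization, to make full use of the compute capability of Shenwei many-core processors. Experimental results show that code generated by swDaCe achieves significant performance gains in typical applications such as sparse matrix computation, with a speedup of more than 25× on a single core group, validating the framework's effectiveness on the Shenwei architecture.
Keywords: new-generation Sunway platform; heterogeneous many-core processor; dataflow programming; parallel computing; sparse matrix multiplication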
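Design of a Digital Lock-In Amplifier Based on the AG32 Heterogeneous Processor
8
Authors: 刘国福, 柳革命, 李岩, 刘婵娟 《仪表技术》 2026, No. 1, pp. 13-16, 77 (5 pages)
Lock-in amplifiers are widely used in precision measurement because of their excellent noise rejection. To meet field requirements for portability, low cost, and small size, an integrated hybrid digital lock-in amplifier is designed on the domestic AG32 series heterogeneous dual-core (RISC-V + FPGA) processor. The design uses the AG32's peripheral resources to simplify the system structure, its FPGA resources to improve frequency-measurement accuracy, and its RISC-V processor to extend system functionality. Experiments show that at a signal-to-noise ratio of 1, over signal frequencies from 1 Hz to 10 kHz, the absolute value of the relative amplitude error is ≤1.25% and the absolute phase error is ≤0.5°; at a signal-to-noise ratio of 0.1, the absolute value of the relative amplitude error is ≤4.50% and the absolute phase error is ≤2.0°. The results offer a new technical approach for fields such as vector voltage measurement and spectrum analysis.
Keywords: digital lock-in amplifier; heterogeneous dual-core processor; RISC-V (fifth-generation reduced instruction set architecture); field-programmable gate array (FPGA)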
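Note: the abstract above does not describe the demodulation algorithm itself. As a rough illustration of how a digital lock-in amplifier generally recovers amplitude and phase, the following is a minimal textbook sketch, not the authors' AG32 design. The function name `lock_in`, the sample rate, and the test signal are all assumptions made for this example.

```python
import math

def lock_in(samples, f_ref, f_s):
    """Estimate amplitude and phase (degrees) of the f_ref component.

    Generic dual-phase demodulation: multiply by cosine and sine references,
    then average. Assumes the record spans a whole number of reference cycles.
    """
    n = len(samples)
    i_sum = 0.0
    q_sum = 0.0
    for k, x in enumerate(samples):
        theta = 2.0 * math.pi * f_ref * k / f_s
        i_sum += x * math.cos(theta)  # in-phase product
        q_sum += x * math.sin(theta)  # quadrature product
    i_avg = 2.0 * i_sum / n           # equals A*cos(phi) for x = A*cos(wt + phi)
    q_avg = 2.0 * q_sum / n           # equals -A*sin(phi)
    amplitude = math.hypot(i_avg, q_avg)
    phase_deg = math.degrees(math.atan2(-q_avg, i_avg))
    return amplitude, phase_deg

# Hypothetical test signal: 100 Hz tone (A = 1.5, phase = 30 degrees)
# buried under a larger 50 Hz interferer, sampled at 10 kHz for 1 s.
F_S = 10_000
signal = [
    1.5 * math.cos(2 * math.pi * 100 * k / F_S + math.radians(30.0))
    + 3.0 * math.cos(2 * math.pi * 50 * k / F_S)
    for k in range(F_S)
]
amp, phase = lock_in(signal, f_ref=100.0, f_s=F_S)
```

Averaging over whole reference cycles is what rejects the off-frequency interferer here; a real instrument would typically use a low-pass filter instead of a block average.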
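Construction of a Violin Music Style Recognition System Based on Multi-Source Data Integration
9
Authors: 杨阳冰, 林广乐 《山西师范大学学报(自然科学版)》 2026, No. 1, pp. 24-29 (6 pages)
The study uses dual processors to allocate the work ordering and the execution of work programs, so that music plays smoothly with stable visualization, and reads and collects signal data through an audio reading module and an audio acquisition module. Lamps of multiple colors are operated alternately at different light durations and light wavelengths to achieve a light-mixing effect. Multi-source data integration is then used to correct and fill outliers, the multi-source heterogeneous data are calibrated, and the audio data are denoised using a decomposition method. Finally, after recognition and classification results are obtained from the feature values of the multi-source data, violin music style recognition is achieved by analyzing melody fragments, note sequences, and rhythm. Experiments show that, when recognizing violin music styles, the proposed system's maximum time is about 11.2 s, far lower than that of other systems; the highest data-integration accuracy is 98.96%; the precision of recognizing violin melody fragments, note sequences, and rhythm reaches 99.76%; and the system is the best in multi-source data integration precision and the most reliable.
Keywords: dual processors; violin music; style recognition; system construction; multi-source heterogeneous data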
An automatic mapping technique for OpenACC kernel code based on deeply fused and heterogeneous many-core architecture
10
Authors: Libo Zhang, Xingquan Mao, Hongtao You, Long Gu, Xiaocheng Jiang 《CCF Transactions on High Performance Computing》 2020, No. 4, pp. 323-331 (9 pages)
OpenACC has become a popular programming interface for many-core application programming. Internationally, a lot of research has been done on OpenACC for CPU+GPU heterogeneous many-core architectures. Among these efforts, the PGI OpenACC compiler developed by NVIDIA is the most advanced one. But there is little research on OpenACC for the Home Grown Heterogeneous Many-Core (HGHM) Architecture, which is different from a GPU. This paper proposes an automatic mapping technique for OpenACC kernel code, based on the OpenACC compiler, to a heterogeneous and deeply fused many-core architecture. Our approach uses the static analysis and feedback dynamic analysis of the compiler to perform the automatic mapping of the program's parallel kernel code to many-core devices, and it greatly improves the transformation quality of the compiler. Experimental results show that this technique can greatly improve the efficiency of using OpenACC to port applications to a heterogeneous and fused many-core system without impacting program acceleration performance.
Keywords: supercomputer; heterogeneous many-core; fused; OpenACC; data layout; automatic mapping
Parallel programming models for heterogeneous many-cores: a comprehensive survey
11
Authors: Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang 《CCF Transactions on High Performance Computing》 2020, No. 4, pp. 382-400 (19 pages)
Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedded systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to match the underlying heterogeneous platform. In this article, we provide a comprehensive survey of parallel programming models for heterogeneous many-core architectures and review the compiling techniques for improving programmability and portability. We examine various software optimization techniques for minimizing the communication overhead between heterogeneous computing devices. We provide a road map for a wide variety of different research areas. We conclude with a discussion on open issues in the area and potential research directions. This article provides both an accessible introduction to the fast-moving area of heterogeneous programming and a detailed bibliography of its main achievements.
Keywords: heterogeneous computing; many-core architectures; parallel programming models
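A New Multi-Type DAG Scheduling Method for Heterogeneous Multi-Core Platforms (Cited by: 3)
12
Authors: 左俊杰, 肖锋, 黄姝娟, 沈超, 郝鹏涛, 陈磊 《计算机应用研究》 PKU Core 2025, No. 2, pp. 514-518 (5 pages)
In heterogeneous multi-core environments, tasks are constrained by processor type and can execute only on specific processors. Existing scheduling methods usually model this with a multi-type DAG (directed acyclic graph) task model, but they often ignore the communication overhead between different cores or fail to consider the correspondence between processors and nodes, which leads to high scheduling time overhead, underused processor resources, and low task efficiency. To address these problems, the PNIF (processor-node impact factor) algorithm is proposed. It introduces two proportional factors that strongly affect node priority and incorporates them into the node-priority calculation to determine the task execution order. Experimental results show that PNIF improves schedule length over PEFT, HEFT, and CPOP by an average of 5.902%, 19.402%, and 25.831%, respectively, effectively shortening the overall schedule length and improving processor utilization.
Keywords: heterogeneous multi-core processor; multi-type DAG tasks; task scheduling; impact factor; PNIF algorithm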
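Note: the abstract does not give the PNIF priority formula, so it is not reproduced here. The sketch below only illustrates the general setting it describes: list scheduling of a multi-type DAG where each task may run only on certain processors and a communication cost applies when a predecessor ran on a different processor. It is a plain earliest-finish-time greedy pass; the task set, costs, and the name `list_schedule` are hypothetical.

```python
def list_schedule(tasks, order, n_procs):
    """Greedy earliest-finish-time placement (not the PNIF algorithm).

    tasks: name -> {"cost": {proc: run_time}, "preds": {pred: comm_time}}
           A processor missing from "cost" cannot run that task.
    order: task names in a priority order that respects precedence.
    Returns (placement, makespan), placement[name] = (proc, start, finish).
    """
    proc_free = [0.0] * n_procs
    placed = {}
    for name in order:
        best = None
        for proc, run_time in tasks[name]["cost"].items():
            ready = 0.0
            for pred, comm in tasks[name]["preds"].items():
                p_proc, _, p_finish = placed[pred]
                # Communication cost is paid only across processors.
                ready = max(ready, p_finish + (0.0 if p_proc == proc else comm))
            start = max(ready, proc_free[proc])
            finish = start + run_time
            if best is None or finish < best[2]:
                best = (proc, start, finish)
        placed[name] = best
        proc_free[best[0]] = best[2]
    makespan = max(finish for _, _, finish in placed.values())
    return placed, makespan

# Hypothetical four-task DAG on two processors: A -> {B, C} -> D.
example_tasks = {
    "A": {"cost": {0: 2.0}, "preds": {}},                  # runs only on proc 0
    "B": {"cost": {0: 3.0, 1: 4.0}, "preds": {"A": 1.0}},
    "C": {"cost": {1: 2.0}, "preds": {"A": 1.0}},          # runs only on proc 1
    "D": {"cost": {0: 1.0, 1: 1.0}, "preds": {"B": 1.0, "C": 1.0}},
}
placement, makespan = list_schedule(example_tasks, ["A", "B", "C", "D"], n_procs=2)
```

Algorithms such as HEFT, PEFT, CPOP, and PNIF differ mainly in how they compute the priority order passed in as `order`; here it is simply supplied by hand.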
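An Adaptive Operator Parallel Partitioning Method for Heterogeneous Embedded Chips in the Intelligent Internet of Things (Cited by: 2)
13
Authors: 林政, 刘思聪, 郭斌, 丁亚三, 於志文 《计算机科学》 PKU Core 2025, No. 2, pp. 299-309 (11 pages)
With the continued improvement in quality of life and rapid technological progress, mobile devices such as smartphones have become widespread worldwide. Against this background, deploying and applying deep neural networks on mobile devices has become a research hotspot. Deep neural networks have driven significant progress in mobile applications while also placing higher demands on energy-efficiency management for battery-powered mobile devices. The rise of heterogeneous processors in today's mobile devices brings new challenges for optimizing energy efficiency: distributing computation across processors to parallelize and accelerate deep neural networks does not necessarily reduce energy consumption and may even increase it. To address this problem, an energy-efficiency-optimized adaptive parallel computing scheduling system for deep neural networks is proposed. The system comprises a runtime energy profiler and an online operator-partitioning executor, and dynamically adjusts operator allocation according to changing device conditions, optimizing the energy efficiency of computation on the heterogeneous processors of mobile devices while maintaining high responsiveness. Experimental results show that, compared with baseline methods, the system reduces the average energy consumption and average latency of deep neural networks on mobile devices by 5.19% and 9.0%, respectively, and the maximum energy consumption and maximum latency by 18.35% and 21.6%, respectively.
Keywords: deep neural network; mobile devices; energy-efficiency optimization; heterogeneous processors; energy consumption prediction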
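A High-Performance Automatic Synchronization Primitive Insertion Method for Ascend Processors
14
Authors: 李帅江, 张馨元, 赵家程, 田行辉, 石曦予, 徐晓忻, 崔慧敏 《计算机研究与发展》 PKU Core 2025, No. 8, pp. 1962-1978 (17 pages)
Instruction-level parallelism (ILP) is a classic hard problem in processor architecture research. Domain-specific architectures, represented by Ascend, expose more pipeline details to upper-layer software, and the compiler or programmer explicitly controls synchronization between pipelines to optimize ILP. However, physical synchronization resources between pipelines are limited, which restricts ILP gains. To address this, a high-performance automatic synchronization primitive insertion method for Ascend processors is proposed; it decouples the insertion of synchronization primitives from the selection of physical synchronization resources by introducing the abstraction of "virtual synchronization resources". First, a heuristic algorithm inserts virtual synchronization primitives on complex control-flow graphs. Then, techniques such as virtual synchronization primitive merging map the virtual synchronization resources onto a limited number of physical synchronization resources, while redundant synchronization primitives are removed according to the partial order among instructions, subject to program correctness and strict hardware resource limits. Experiments with instruction-level and operator-level benchmarks on the Ascend 910A platform show that programs with automatically inserted synchronization primitives remain correct and achieve overall performance close to or on par with programs whose synchronization primitives were inserted manually by expert programmers.
Keywords: Ascend processor; synchronization primitive; heterogeneous programming; domain-specific architecture; automatic insertion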
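A Large-Scale Exact Diagonalization Method for the New-Generation Tianhe Supercomputing System
15
Authors: 李彪, 刘杰, 王庆林 《计算机研究与发展》 PKU Core 2025, No. 6, pp. 1347-1362 (16 pages)
Exact diagonalization is a numerical method widely used in quantum physics, condensed matter physics, and related fields, and is the most direct numerical method for obtaining the ground state of a quantum system. Starting only from the symmetry of the Hamiltonian matrix, and using a matrix-free method, a hierarchical communication model, and a data-level parallel algorithm adapted to the MT-3000, this work proposes a heterogeneous parallel algorithm for very large sparse Hamiltonian matrix-vector multiplication on the new-generation Tianhe supercomputing system, enabling large-scale exact diagonalization based on the one-dimensional Hubbard model. The parallel algorithm was tested on the new-generation Tianhe system: at a matrix dimension of 140 billion, the strong-scaling efficiency of 8192 processes relative to 256 processes is 55.27%; with weak scaling to a matrix dimension of 730 billion, the weak-scaling efficiency of 13740 processes relative to 64 processes remains above 51.25%.
Keywords: exact diagonalization; Hubbard model; heterogeneous parallel computing; MT-3000 processor; quantum many-body system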
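Note: the abstract reports scaling efficiencies without stating how they are computed. The sketch below uses the conventional definitions (strong scaling: speedup divided by the increase in process count; weak scaling: baseline time divided by scaled time), which is an assumption about the paper's method. The function names are hypothetical, and no timings from the paper are used.

```python
def strong_scaling_efficiency(t_base, p_base, t_scaled, p_scaled):
    """Fixed problem size: achieved speedup over ideal speedup."""
    speedup = t_base / t_scaled
    return speedup / (p_scaled / p_base)

def weak_scaling_efficiency(t_base, t_scaled):
    """Problem size grows with process count: ideal run time stays constant."""
    return t_base / t_scaled

def implied_speedup(efficiency, p_base, p_scaled):
    """Speedup implied by a reported strong-scaling efficiency."""
    return efficiency * (p_scaled / p_base)

# Under the conventional definition, 55.27% efficiency going from
# 256 to 8192 processes (32x more) implies roughly a 17.7x speedup.
speedup_8192 = implied_speedup(0.5527, 256, 8192)
```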
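A Hardware and Software Design Method for Board-Level Heterogeneous Inter-Core Multi-Mode Communication (Cited by: 2)
16
Authors: 李锐, 杜彬, 王远波 《汽车电器》 2025, No. 6, pp. 100-102 (3 pages)
With the rapid development of Internet of Vehicles technology and the growing complexity of in-vehicle electronic control units, a traditional single processor can hardly meet the increasingly complex and diverse demands of data exchange and processing. This paper proposes a board-level heterogeneous inter-core multi-mode communication mechanism, designs a hardware platform that integrates a highly real-time MCU and a high-performance SoC, and describes the hardware structure of the heterogeneous multi-mode communication. On this basis, it proposes a layered, loosely coupled, highly cohesive, lightweight component-based software design and explains the communication mechanisms of the driver, interface, network, protocol, transport, and application layers. The mechanism improves computing efficiency in heterogeneous multi-core environments while optimizing processor performance and improving the quality of transmitted data.
Keywords: Internet of Vehicles; inter-core communication; MCU; SoC; heterogeneous processor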
Fault Tolerance Mechanism in Chip Many-Core Processors (Cited by: 1)
17
Authors: 张磊, 韩银和, 李华伟, 李晓维 《Tsinghua Science and Technology》 SCIE EI CAS 2007, No. S1, pp. 169-174 (6 pages)
As semiconductor technology advances, there will be billions of transistors on a single chip. Chip many-core processors are emerging to take advantage of these greater transistor densities to deliver greater performance. Effective fault tolerance techniques are essential to improve the yield of such complex chips. In this paper, a core-level redundancy scheme called N+M is proposed to improve N-core processors' yield by providing M spare cores. In such an architecture, topology is an important factor because it greatly affects the processors' performance. The concept of logical topology and a topology reconfiguration problem are introduced; the aim is to transparently provide the target topology with the lowest performance degradation in the presence of faulty cores on-chip. A row rippling and column stealing (RRCS) algorithm is also proposed. Results show that RRCS can give solutions with an average degradation of 13.8% in negligible computing time.
Keywords: chip many-core processors; yield; fault tolerance; reconfiguration; network-on-chip
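Note: the abstract does not state the yield model used. To show why M spare cores raise yield, the sketch below assumes a simple binomial model in which every core is good independently with the same probability and the chip is usable when at least N of its N+M cores are good. That model, the function name `chip_yield`, and the example probability are assumptions; the model also ignores the topology reconfiguration constraints that the paper addresses.

```python
from math import comb

def chip_yield(n_required, m_spare, p_core_good):
    """Probability that at least n_required of n_required + m_spare cores work.

    Assumes independent, identically distributed core defects (binomial model).
    """
    total = n_required + m_spare
    return sum(
        comb(total, k) * p_core_good**k * (1.0 - p_core_good) ** (total - k)
        for k in range(n_required, total + 1)
    )

# Hypothetical example: 4 required cores, each good with probability 0.9.
yield_no_spare = chip_yield(4, 0, 0.9)   # all 4 cores must work
yield_one_spare = chip_yield(4, 1, 0.9)  # any 4 of 5 cores may work
```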
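Design of a Dynamic Scheduling Coprocessor for Heterogeneous Multi-Core Systems (Cited by: 1)
18
Authors: 曾树铭, 倪伟 《合肥工业大学学报(自然科学版)》 PKU Core 2025, No. 2, pp. 185-195 (11 pages)
To study the potential of heterogeneous multi-processor systems on chip (MPSoC) in compute-intensive parallel tasks, this paper designs and implements a dynamic scheduling coprocessor for heterogeneous multi-core systems that suits coarse-grained data characteristics and targets task-level parallel applications. It adopts mechanisms including on-chip caching, multi-level write-back management of task outputs, automatic task mapping, and out-of-order execution of communication tasks. Experimental results show that the coprocessor not only meets basic design goals such as task-level out-of-order execution but also has very low scheduling overhead; compared with a scheduler based on the dynamic scoreboard algorithm, the time to run multiple sub-aperture range compression algorithms is reduced by up to 17.13%. The results demonstrate that the coprocessor can effectively optimize task scheduling in the target scenario.
Keywords: dynamic scheduling; hardware scheduler; heterogeneous multi-core system; task-level parallelism; programming model; on-chip cache; network-on-chip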
Parallelization and sustainability of distributed genetic algorithms on many-core processors
19
Authors: Yuji Sato, Mikiko Sato 《International Journal of Intelligent Computing and Cybernetics》 EI 2014, No. 1, pp. 2-23 (22 pages)
Purpose: The purpose of this paper is to propose a fault-tolerant technology for increasing the durability of application programs when evolutionary computation is performed by fast parallel processing on many-core processors such as graphics processing units (GPUs) and multi-core processors (MCPs). Design/methodology/approach: For distributed genetic algorithm (GA) models, the paper proposes a method where an island's ID number is added to the header of data transferred by this island for use in fault detection. Findings: The paper has shown that the processing time of the proposed idea is practically negligible in applications and also shown that an optimal solution can be obtained even with a single stuck-at fault or a transient fault, and that increasing the number of parallel threads makes the system less susceptible to faults. Originality/value: The study described in this paper is a new approach to increase the sustainability of application programs using distributed GA on GPUs and MCPs.
Keywords: evolutionary computation; genetic algorithms; fault identification; many-core processors; parallelization
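Architecture Design of a Semiconductor Acceleration Unit for Artificial Intelligence
20
Author: 孙彦德 《电子工业专用设备》 2025, No. 3, pp. 70-74 (5 pages)
An efficient semiconductor acceleration unit architecture suited to deep learning and large language models is designed. A complete accelerator architecture is built by designing the parallel compute unit structure, establishing a multi-level on-chip memory hierarchy, optimizing dataflow transfer, and implementing heterogeneous system interconnect and power management. Experimental results show that, in an 8 nm process, the architecture achieves a compute density of 3.8 TOPS/mm² and a power efficiency of 12.5 TOPS/W, and supports efficient processing of typical neural network models such as ResNet-50. The study confirms that the proposed acceleration unit architecture can meet the computing needs of modern artificial intelligence applications and has significant practical value.
Keywords: semiconductor technology; AI accelerator architecture; parallel computing optimization; neural network processor; on-chip memory system; heterogeneous computing; power management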