Abstract
Using distributed storage structures to overcome "memory wall" problems in array processors, such as on-chip access latency, has become a mainstream research direction. To address interconnection inside the distributed storage clusters of an array processor, an Intra-cluster Full-Switch architecture is designed that has a simple circuit structure, high utilization efficiency, and low latency. It lets the 16 processing elements (PEs) in a cluster access the memory cells in parallel. Experimental results show that, in the conflict-free case, the design reaches a maximum frequency of 223 MHz and a peak access bandwidth of 7.42 GB/s. Test results further show that the Full-Switch architecture has a lower average access latency than a row-column two-stage switch interconnect. Finally, the Canny edge detection algorithm was parallelized on this architecture for 256×256 and 512×512 images; compared with the processing time of a CPU+GPU platform, it achieves speedups of 2.84 and 2.91 times, respectively.
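To illustrate what "conflict-free parallel access" means for a full switch, here is a minimal behavioral sketch, not the authors' circuit. It assumes that each of the 16 PEs issues at most one request per round, that every PE can reach every memory bank directly (a full crossbar), that each bank is single-ported and grants one request per cycle, and that conflicts are resolved by fixed priority (lowest PE index first). The bank numbering, the arbitration policy, and the function name `full_switch_cycles` are hypothetical.

```python
from collections import defaultdict

NUM_PES = 16  # cluster size stated in the abstract; everything else is assumed


def full_switch_cycles(requests):
    """Simulate one round of requests through a hypothetical full switch.

    requests[pe] is the target bank index for that PE, or None if idle.
    Assumptions: any PE reaches any bank in one hop, each bank is
    single-ported (one grant per cycle), fixed priority by PE index.
    Returns (cycles, grant_log), where grant_log[c] lists the
    (pe, bank) pairs granted in cycle c.
    """
    if len(requests) != NUM_PES:
        raise ValueError("expected one request slot per PE")

    # Queue the requesting PEs per bank, in ascending PE order.
    pending = defaultdict(list)
    for pe, bank in enumerate(requests):
        if bank is not None:
            pending[bank].append(pe)

    cycles = 0
    grant_log = []
    while any(pending.values()):
        cycles += 1
        granted = []
        # Every bank with waiting requests serves its highest-priority PE.
        for bank in sorted(pending):
            if pending[bank]:
                granted.append((pending[bank].pop(0), bank))
        grant_log.append(granted)
    return cycles, grant_log


if __name__ == "__main__":
    # Conflict-free: each PE targets a distinct bank, so all finish together.
    print(full_switch_cycles(list(range(NUM_PES)))[0])
    # Worst case: all PEs target the same bank and are serialized.
    print(full_switch_cycles([0] * NUM_PES)[0])
```

Under these assumptions, the cycle count for a round equals the largest number of requests aimed at any single bank. The conflict-free case, which the abstract's peak-bandwidth figure refers to, therefore completes in one cycle. The sketch does not model the row-column two-stage switch used as the baseline.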
Authors
JIANG Lin (蒋林)
LIU Peng (刘鹏)
SHAN Rui (山蕊)
LIU Yang (刘阳)
JIANG Lin; LIU Peng; SHAN Rui; LIU Yang (Integrated Circuit Design Laboratory, Xi'an University of Science and Technology, Xi'an 710054, China; School of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, China; School of Computer, Xi'an University of Posts and Telecommunications, Xi'an 710121, China)
Source
Journal of Xi'an University of Science and Technology (《西安科技大学学报》)
Indexed in: CAS
Peking University Core Journal (北大核心)
2018, No. 4, pp. 656-662 (7 pages)
Funding
National Natural Science Foundation of China (61772417, 61272120, 61634004, 61602377)
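Natural Science Foundation of Shaanxi Province (2015JM6326)
Shaanxi Province Science and Technology Coordinated Innovation Project (2016KTZDGY02-04-02)
Natural Science Research Program of the Shaanxi Provincial Department of Education (17JK0689)
Key Research and Development Program of Shaanxi Province (2017GY-060)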
Keywords
array processor
distributed storage
access delay
parallel access