A High-Performance Fetch Pipeline Based on the VLIW DSP Architecture

Cited by: 1

Abstract: Taking a very long instruction word (VLIW) digital signal processor (DSP) as the platform, this paper addresses the drawbacks of existing methods for improving the efficiency of a single-thread fetch pipeline and proposes a high-performance fetch pipeline architecture. The architecture detects and cancels invalid fetches and bypasses missing fetches, which reduces unnecessary cache accesses and fetch pipeline stall cycles. It also introduces dedicated hardware that supports compiler-scheduled software-pipelined loops, which improves instruction-level parallelism and reduces code storage space. The idle cycles freed in the single-thread fetch pipeline in this way amount to about 46.34%. Experimental results show that, compared with the fetch pipeline before optimization, code storage space is reduced by about 11.93%, execution cycles are shortened by about 8.67% on average, the number of cache accesses drops by about 12.84%, instruction cache stall cycles are shortened by about 7.86%, and the processor's single-thread instruction throughput increases by about 11.7% on average.
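The invalid-fetch cancellation and loop-buffer ideas in the abstract can be illustrated with a toy cycle model. This is a minimal sketch under stated assumptions, not the authors' design: it assumes an in-order front end that generates one sequential fetch-packet address per cycle, keeps `depth` requests in flight ahead of the cache-access stage, and tags each request with a redirect epoch so that requests generated before a taken branch are recognized as invalid. The function `simulate`, its parameters, and the rule that a short backward branch captures the loop body into a loop buffer are all hypothetical; the abstract does not describe the paper's actual detection logic or its software-pipelining support hardware.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FetchStats:
    cache_accesses: int = 0  # requests that actually probed the I-cache
    invalid: int = 0         # wrong-path requests detected at the cache stage
    useful: int = 0          # correct-path fetch packets delivered
    buffer_hits: int = 0     # requests served by the loop buffer instead of the cache


def simulate(branches: dict[int, int], start: int, n_useful: int, depth: int,
             squash_invalid: bool, loop_buffer_size: int = 0) -> FetchStats:
    """Hypothetical toy fetch front end (illustration only, not the paper's design).

    branches: fetch-packet address -> branch target, taken when that packet is delivered.
    depth: number of requests in flight ahead of the cache-access stage.
    """
    stats = FetchStats()
    epoch = 0                # bumped on every redirect
    pc = start               # next sequential fetch-packet address
    buffered = range(0)      # addresses currently held in the loop buffer
    pipe: deque = deque()    # in-flight requests: (address, epoch at generation)
    while stats.useful < n_useful:
        if len(pipe) == depth:
            addr, req_epoch = pipe.popleft()
            valid = req_epoch == epoch  # generated after the latest redirect?
            if not valid:
                stats.invalid += 1
            if valid or not squash_invalid:
                # Serviced request: loop buffer if it holds addr, otherwise the I-cache.
                if addr in buffered:
                    stats.buffer_hits += 1
                else:
                    stats.cache_accesses += 1
            if valid:
                stats.useful += 1
                target = branches.get(addr)
                if target is not None:
                    epoch += 1  # all older in-flight requests become invalid
                    pc = target
                    # Capture a short backward loop body into the loop buffer.
                    if target <= addr and addr - target + 1 <= loop_buffer_size:
                        buffered = range(target, addr + 1)
        pipe.append((pc, epoch))  # generate the next sequential request
        pc += 1
    return stats


if __name__ == "__main__":
    loop = {3: 0}  # packets 0..3 form a loop body
    print(simulate(loop, 0, 8, 3, squash_invalid=False))
    # FetchStats(cache_accesses=10, invalid=2, useful=8, buffer_hits=0)
    print(simulate(loop, 0, 8, 3, squash_invalid=True))
    # FetchStats(cache_accesses=8, invalid=2, useful=8, buffer_hits=0)
    print(simulate(loop, 0, 12, 3, squash_invalid=True, loop_buffer_size=4))
    # FetchStats(cache_accesses=4, invalid=4, useful=12, buffer_hits=8)
```

In this model, each taken branch leaves `depth - 1` wrong-path requests in flight once the pipe is full; cancelling them removes exactly those cache accesses, and once the loop body is buffered, later iterations stop touching the I-cache. The percentages reported in the abstract come from the authors' experiments and are not reproduced by this toy model.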

Authors: Yang Hui, Chen Shuming, Wan Jianghua

Affiliation: College of Computer, National University of Defense Technology

Source: Journal of National University of Defense Technology, 2011, No. 4, pp. 102-106 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University core journal list.

Funding: National Science and Technology Major Project (2009ZX01034-001-006)

Keywords: digital signal processor; invalid instruction fetch; software pipeline; loop buffer

Chinese Library Classification: TP368.1 [Automation and Computer Technology: Computer System Architecture]