A High-Performance Fetch Pipeline Based on the VLIW DSP Architecture    Times cited: 1
Abstract: Using a very long instruction word (VLIW) digital signal processor (DSP) as the platform, this paper addresses the drawbacks of existing methods for improving the efficiency of a single-thread fetch pipeline and proposes a high-performance fetch pipeline architecture. The architecture detects and squashes invalid fetches and bypasses missed fetches, which reduces unnecessary cache accesses and fetch-pipeline stall cycles. It also introduces dedicated hardware that supports compiler-scheduled software pipelining of loops, which raises instruction-level parallelism and reduces code storage space. The idle cycles freed in the single-thread fetch pipeline as a result reach about 46.34%. Experimental results show that, compared with the fetch pipeline before optimization, code size is reduced by about 11.93%, execution cycles are shortened by about 8.67%, the number of cache accesses falls by about 12.84%, instruction-cache stall cycles are shortened by about 7.86%, and the processor's single-thread instruction throughput increases by about 11.7% on average.
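The abstract does not describe the hardware in detail, so the following is only a toy, trace-driven counter that gives intuition for why squashing invalid fetches and replaying loops from a buffer both cut instruction-cache traffic. It is a hypothetical sketch, not the authors' design: the names `simulate`, `loop_trace` and `FetchStats`, the fixed `wrong_path_depth` (sequential packets already in flight before a taken branch redirects fetch), and the capture rule for the loop buffer (capture a backward loop whose body fits on its first taken back-edge) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class FetchStats:
    cache_accesses: int = 0   # I-cache lookups actually performed
    wasted_accesses: int = 0  # lookups spent on invalid (wrong-path) packets
    squashed: int = 0         # invalid fetches cancelled before reaching the cache
    buffer_hits: int = 0      # packets supplied by the loop buffer instead of the cache


def simulate(trace, wrong_path_depth=2, squash_invalid=False, loop_buffer_size=0):
    """Count I-cache traffic for a dynamic trace of (addr, next_addr) fetch packets.

    Hypothetical model: after a taken branch, `wrong_path_depth` sequential
    packets are already in flight before fetch is redirected (invalid fetches).
    A backward loop whose body fits in `loop_buffer_size` packets is captured on
    its first taken back-edge and replayed from the buffer afterwards.
    """
    stats = FetchStats()
    loop = None  # (start, end) addresses of the buffered loop body, if any
    for addr, next_addr in trace:
        if loop is not None and loop[0] <= addr <= loop[1]:
            # The buffer sequences the body itself: no cache access, no wrong path.
            stats.buffer_hits += 1
            continue
        loop = None  # fetch has left the buffered loop (or none was active)
        stats.cache_accesses += 1
        if next_addr != addr + 1:  # taken branch
            if squash_invalid:
                stats.squashed += wrong_path_depth
            else:
                stats.cache_accesses += wrong_path_depth
                stats.wasted_accesses += wrong_path_depth
            if next_addr <= addr and addr - next_addr + 1 <= loop_buffer_size:
                loop = (next_addr, addr)  # capture the backward loop just fetched
    return stats


def loop_trace(body_start, body_len, iterations):
    """Hypothetical trace: straight-line code, then a loop run `iterations` times."""
    trace = [(a, a + 1) for a in range(body_start)]
    end = body_start + body_len - 1
    for i in range(iterations):
        for a in range(body_start, end + 1):
            taken = a == end and i < iterations - 1  # back-edge taken except on exit
            trace.append((a, body_start if taken else a + 1))
    trace.append((end + 1, end + 2))  # first packet after the loop
    return trace


t = loop_trace(body_start=2, body_len=4, iterations=10)
for cfg in ({}, {"squash_invalid": True}, {"squash_invalid": True, "loop_buffer_size": 4}):
    print(simulate(t, **cfg).cache_accesses)  # prints 61, then 43, then 7
```

The absolute numbers are illustrative only and do not reproduce the paper's measurements; they merely show the direction of each effect. Squashing removes the wrong-path lookups after every taken back-edge, and the loop buffer removes the lookups for every iteration after the first, provided the body fits in the buffer.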
Source: Journal of National University of Defense Technology, 2011, No. 4, pp. 102-106 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Science and Technology Major Project (2009ZX01034-001-006)
Keywords: digital signal processor; invalid instruction fetch; software pipeline; loop buffer