Automatic Generation of Pipeline Parallel Code in Distributed Memory System

Cited by: 4

Abstract: Parallel loops fall into two kinds, DOALL and DOACROSS. DOACROSS loops carry data dependencies across iterations, so executing them in parallel requires communication support. When the compiler can determine the dependencies of a DOACROSS loop precisely, it can generate pipeline parallel code for the loop to improve performance. This paper discusses the automatic generation of pipeline parallel code, including the construction of the data dependence relation graph and the pipeline relation graph, the criteria for deciding whether a loop can be pipelined, and the automatic generation of the pipeline code itself. Experimental results show that pipeline parallelization achieves a good speedup.
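To make the abstract's terms concrete, here is a minimal sketch of pipelining a DOACROSS loop nest. It is not the paper's generated code. It assumes a hypothetical recurrence a[i][j] = a[i-1][j] + a[i][j-1], a partition of rows into contiguous bands, and Python threads with queues standing in for distributed-memory processes and messages. The names sequential, pipelined, workers and block are illustrative only. Because of Python's global interpreter lock, the sketch shows the communication schedule rather than any speedup.

```python
import queue
import threading


def sequential(n, m):
    # DOACROSS loop nest: each element depends on its upper and left neighbor.
    a = [[1] * m for _ in range(n)]
    for i in range(1, n):
        for j in range(1, m):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def pipelined(n, m, workers=3, block=2):
    # Rows 1..n-1 are split into contiguous bands, one band per worker.
    bounds = [1 + (n - 1) * w // workers for w in range(workers + 1)]
    # links[w] carries boundary-row segments from worker w-1 to worker w.
    links = [queue.Queue() for _ in range(workers + 1)]
    bands = [None] * workers

    def stage(w):
        lo, hi = bounds[w], bounds[w + 1]
        halo = [1] * m  # private copy of row lo-1; row 0 is all ones
        local = [[1] * m for _ in range(hi - lo)]  # private band, no sharing
        for start in range(1, m, block):
            stop = min(start + block, m)
            if w > 0:
                # "Receive": block until the upstream worker finishes this
                # column block of its last row.
                halo[start:stop] = links[w].get()
            prev = halo
            for row in local:
                for j in range(start, stop):
                    row[j] = prev[j] + row[j - 1]
                prev = row
            # "Send": forward the finished segment so the downstream worker
            # can start on it while this worker moves to the next block.
            links[w + 1].put(prev[start:stop])
        bands[w] = local

    threads = [threading.Thread(target=stage, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [[1] * m] + [row for band in bands for row in band]


if __name__ == "__main__":
    print(pipelined(6, 7) == sequential(6, 7))
```

The block parameter sets the pipeline granularity in this sketch: smaller blocks let downstream workers start sooner but produce more messages, while larger blocks reduce message count at the cost of a longer pipeline fill time.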

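Authors: Gong Xuerong, Lu Linsheng, Zhao Rongcai

Affiliations: Department of Computer Science and Technology, PLA Information Engineering University; Jiangnan Institute of Computing Technology

Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD and the Peking University Core Journals list; 2008, No. 11, pp. 77-79 (3 pages)

Funding: Key project supported by the National Ministries and Commissions Scientific Research Fund

Keywords: pipeline parallel; data dependence relation graph; pipeline relation graph; pipeline communication

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]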
