Implementation of sparse LU decomposition using FPGAs

Original title: 稀疏矩阵LU分解的FPGA实现


Abstract: This paper studies the numerical computation of sparse LU decomposition, the most time-consuming step in solving sparse linear systems with the direct method, and presents a parallel sparse LU decomposition algorithm that exposes more parallelism through dynamic dependence detection. A hardware structure that implements this parallel algorithm on field programmable gate arrays (FPGAs) is also proposed. The structure does not rely on the sparsity structure information of the decomposition factors; the data structures of the factors are generated dynamically. Compared with related work, the new hardware structure is more general. Experimental results show that it outperforms software implementations on general-purpose processors.
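To make the two ideas in the abstract concrete, the following is a minimal software sketch of sparse LU decomposition in which the factor structure is generated dynamically (fill-in entries are created only when an update produces them) and a row is updated only when a dependence on the current pivot row is detected at run time. It is an illustration under stated assumptions, not the authors' parallel algorithm or hardware structure: the dict-of-dicts storage, the absence of pivoting, and the names `sparse_lu` and `to_dense_product` are all hypothetical choices made for this example.

```python
def sparse_lu(rows, n):
    """Right-looking LU without pivoting on a sparse matrix.

    rows[i][j] holds the nonzero A[i][j] (dict of dicts, hypothetical layout).
    Returns (L, U) in the same layout, with a unit diagonal stored in L.
    """
    # Copy the input so the caller's matrix is left untouched.
    U = {i: dict(rows.get(i, {})) for i in range(n)}
    L = {i: {i: 1.0} for i in range(n)}
    for k in range(n):
        pivot = U[k].get(k, 0.0)
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot at step %d (no pivoting)" % k)
        for i in range(k + 1, n):
            a_ik = U[i].pop(k, 0.0)
            if a_ik == 0.0:
                # Dependence check at run time: row i has no entry in
                # column k, so it does not depend on pivot row k.
                continue
            factor = a_ik / pivot
            L[i][k] = factor
            for j, u_kj in U[k].items():
                if j > k:
                    # May create a new entry (fill-in) in row i.
                    U[i][j] = U[i].get(j, 0.0) - factor * u_kj
    return L, U


def to_dense_product(L, U, n):
    """Multiply L and U and return the result as a dense list of lists."""
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, l_ik in L[i].items():
            for j, u_kj in U[k].items():
                out[i][j] += l_ik * u_kj
    return out


if __name__ == "__main__":
    A = {0: {0: 4.0, 2: 1.0}, 1: {0: 2.0, 1: 3.0}, 2: {1: 6.0, 2: 5.0}}
    L, U = sparse_lu(A, 3)
    print(L)
    print(U)
    print(to_dense_product(L, U, 3))
```

In this sketch, rows that skip the update for a given pivot are independent of that pivot step, which is the kind of information a dynamic scheduler could use to run updates concurrently; the sketch itself runs sequentially.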

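Authors: Wu Guiming (邬贵明), Wang Sen (王森), Xie Xianghui (谢向辉), Dou Yong (窦勇)

Affiliations: State Key Laboratory of Mathematical Engineering and Advanced Computing; Jiangnan Institute of Computing Technology; School of Computer Science, National University of Defense Technology

Source: Chinese High Technology Letters (《高技术通讯》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2013, No. 8, pp. 789-796 (8 pages)

Funding: Supported by the National Natural Science Foundation of China (61125201)

Keywords: sparse matrix; LU decomposition; parallel algorithm; field programmable gate array (FPGA); task parallelism

Classification code: O241.6 [Science: Computational Mathematics]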


References (13)

1. Wang X, Ziavras S. Parallel LU factorization of sparse matrices on FPGA-based configurable computing engines. Concurrency and Computation: Practice and Experience, 2004, 16(4): 319-343.
2. Chagnon T, Johnson J, Vachranukunkiet P, et al. Sparse LU decomposition using FPGA. In: Proceedings of the 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing, Trondheim, Norway, 2008.
3. Vachranukunkiet P. Power-flow computation using Field Programmable Gate Arrays: [Ph.D dissertation]. Drexel University, 2007.
4. Chagnon T. Architectural support for direct sparse LU algorithms: [Master dissertation]. Drexel University, 2010.
5. Nagel L. SPICE2: a computer program to simulate semiconductor circuits: [Ph.D dissertation]. University of California, Berkeley, 1975.
6. Kapre N, DeHon A. Parallelizing sparse matrix solve for SPICE circuit simulation using FPGAs. In: Proceedings of the 2009 IEEE International Conference on Field-Programmable Technology, Sydney, Australia, 2009. 190-198.
7. Demmel J. Applied Numerical Linear Algebra. The Society for Industrial and Applied Mathematics, 1997.
8. Wu G, Dou Y, Lei Y, et al. A fine-grained pipeline implementation of the LINPACK benchmark on FPGAs. In: Proceedings of the 2009 IEEE Symposium on Field-Programmable Custom Computing Machines, Napa, California, USA, 2009. 183-190.
9. Kurzak J, Dongarra J. Fully dynamic scheduler for numerical computing on multicore processors. University of Tennessee LAPACK Working Note #220, 2010.
10. Fu C, Jiao X, Yang T. Efficient sparse LU factorization with partial pivoting on distributed memory architectures. IEEE Transactions on Parallel and Distributed Systems, 1998, 9(2): 109-125.

