Funding: Supported jointly by the National Key Research and Development Program of China under grant No. 2022YFB4500303 and the National Natural Science Foundation of China (NSFC) under grants No. 62072198, 61825202, and 61929103.
Abstract: Apache Spark is the most popular in-memory processing framework for big data applications deployed in data centers. As a CPU-only parallel programming framework, Spark can meet growing demand for computing resources by adding nodes to a cluster. However, it cannot exploit the powerful GPU/FPGA/xPU-based accelerators that are increasingly common in data centers. In this paper, we present a CPU+FPGA heterogeneous computing architecture to accelerate typical Spark operators, such as K-means, PageRank, and sorting. We apply the classic divide-and-conquer paradigm to accelerate these Spark operators with multiple FPGA processing units, and shuffle intermediate results directly to destination servers for aggregation over the FPGAs' RDMA network. Our architecture thus shortens the critical path of data transmission by offloading the Spark shuffle operation to FPGAs. Moreover, we exploit pipelining, loop unrolling, FPGA BRAM partitioning, and a dataflow execution model to maximize the task and data parallelism of Spark operators on each FPGA. Experimental results show that our FPGA-accelerated Spark architecture improves the performance of Spark operators by 3.5× to 112× compared with native Spark.
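The divide-and-conquer pattern mentioned in the abstract can be illustrated with a single K-means iteration in plain Python. This is only a minimal sketch under stated assumptions, not the paper's FPGA design: each "worker" stands in for one processing unit, the list of partial results stands in for the shuffled intermediate data, and all function names (`nearest`, `partial_sums`, `merge`, `kmeans_step`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def partial_sums(chunk: Sequence[Point], centroids: Sequence[Point]):
    """Divide step: one worker's per-cluster coordinate sums and point counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        k = nearest(point, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += point[d]
    return sums, counts


def merge(partials, centroids: Sequence[Point]) -> List[Point]:
    """Conquer step: aggregate every worker's partial result into new centroids."""
    dim = len(centroids[0])
    new_centroids: List[Point] = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            new_centroids.append(tuple(old))  # empty cluster: keep old centroid
            continue
        new_centroids.append(
            tuple(sum(sums[k][d] for sums, _ in partials) / total for d in range(dim))
        )
    return new_centroids


def kmeans_step(points: Sequence[Point], centroids: Sequence[Point], num_workers: int):
    """Run one K-means iteration with the data split across num_workers chunks."""
    chunks = [points[i::num_workers] for i in range(num_workers)]
    partials = [partial_sums(chunk, centroids) for chunk in chunks]
    return merge(partials, centroids)


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
print(kmeans_step(points, [(0.0, 0.0), (10.0, 10.0)], num_workers=2))
# prints [(0.0, 1.0), (10.0, 11.0)]
```

Because each worker emits only per-cluster sums and counts, the data that must be moved for aggregation is proportional to the number of clusters rather than the number of points, which is the property that makes a direct shuffle of intermediate results attractive.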
Funding: Supported by the CRUK Convergence Science Centre at The Institute of Cancer Research, London, and Imperial College London (A26234), and by funding from the Cancer Research UK programme grant C33589/A19727.
Abstract: This study introduces a methodology for forecasting accelerator performance in Particle Physics algorithms. Accelerating an application can require significant engineering effort in prototyping and in measuring the speedup, and may still end in disappointing accelerator performance. The proposed methodology uses performance modelling and forecasting to predict the potential speedup and to identify promising acceleration candidates before any significant programming investment. By predicting worst-case scenarios, the methodology helps developers decide whether an application can benefit from acceleration, thus optimising effort. A Monte Carlo simulation example demonstrates the effectiveness of the proposed methodology. The results show that the methodology provides a reasonable estimate for GPUs and that, for FPGAs, the predictions are extremely accurate, within 2% of the realised execution time.
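The idea of forecasting a speedup before committing to an implementation can be sketched with a simple Amdahl-style estimate. The abstract does not give the paper's actual model, so everything here is an assumption for illustration: the function name `forecast_speedup`, its parameters, and the choice to treat data transfer as a fixed additive cost.

```python
def forecast_speedup(
    cpu_total_s: float,
    kernel_fraction: float,
    kernel_speedup: float,
    transfer_s: float = 0.0,
) -> float:
    """Estimate end-to-end speedup from offloading one kernel (hypothetical model).

    cpu_total_s      measured CPU-only run time in seconds
    kernel_fraction  share of that time spent in the offloaded kernel (0..1)
    kernel_speedup   assumed speedup of the kernel on the accelerator
    transfer_s       assumed host-to-device and device-to-host transfer time
    """
    if cpu_total_s <= 0:
        raise ValueError("cpu_total_s must be positive")
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    if transfer_s < 0:
        raise ValueError("transfer_s must be non-negative")

    kernel_s = cpu_total_s * kernel_fraction   # time that can be accelerated
    host_s = cpu_total_s - kernel_s            # time that stays on the CPU
    accelerated_s = host_s + kernel_s / kernel_speedup + transfer_s
    return cpu_total_s / accelerated_s


# Worst-case style check: pessimistic kernel speedup and transfer cost.
estimate = forecast_speedup(100.0, kernel_fraction=0.8, kernel_speedup=10.0, transfer_s=2.0)
print(f"forecast speedup: {estimate:.2f}x")
```

Feeding pessimistic values for `kernel_speedup` and `transfer_s` gives a lower bound under this simplified model; if even that bound exceeds 1 by a useful margin, the kernel is a plausible candidate for acceleration.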