Fund: Supported by the National Key R&D Program of China (No. 2022ZD0119001), the National Natural Science Foundation of China (No. 61834005), the Shaanxi Province Key R&D Plan (Nos. 2022GY-027 and 2021GY-029), and the Key Scientific Research Project of Shaanxi Department of Education (No. 22JY060).
Abstract: Graph computing has become pervasive in many applications because of its capacity to represent complex relationships among different objects in the big data era. However, general-purpose architectures are computationally inefficient for graph algorithms, while dedicated architectures provide high efficiency but lack flexibility. To address these challenges, this paper proposes ParaGraph, a reduced instruction set computer-five (RISC-V) based software-hardware co-designed graph computing accelerator that processes graph algorithms in parallel, and establishes a performance evaluation model to assess the efficiency of co-acceleration. ParaGraph performs parallel processing of typical graph algorithms on the hardware side, while the software side carries out overall functional control through custom-designed instructions. ParaGraph is verified on an XCVU440 field-programmable gate array (FPGA) board together with E203, a RISC-V processor. Compared with current mainstream graph computing accelerators, ParaGraph consumes 7.94% less block RAM (BRAM) resource than ThunderGP. Its power consumption is reduced by 86.90%, 24.90%, and 76.38% compared with ThunderGP, HitGraph, and GraphS, respectively. The power efficiency of the connected components (CC) and degree centrality (DC) algorithms is improved by an average of 6.50 times over ThunderGP, 2.51 times over HitGraph, and 3.99 times over GraphS. The software-hardware co-design acceleration performance indicator H/W.Cap reaches 13.02 for CC and 14.02 for DC.
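The abstract states that the software side drives the accelerator through custom-designed RISC-V instructions, but it does not disclose their encoding. The C sketch below shows one plausible way such an instruction could be issued from a host core like the E203 using the GNU assembler's .insn directive; the use of the custom-0 opcode space (0x0b), the funct fields, the operand meanings, and the name graph_launch are all illustrative assumptions, not ParaGraph's actual interface, and the code compiles only with a RISC-V GCC toolchain.

#include <stdint.h>

/* Hypothetical software-side launch of a graph kernel via a custom R-type
 * instruction placed in the RISC-V custom-0 opcode space (0x0b).  The
 * encoding and the meaning of the two source operands are assumptions
 * made for illustration. */
static inline uint32_t graph_launch(uint32_t desc_addr, uint32_t algo_id)
{
    uint32_t status;
    __asm__ volatile(
        /* .insn r <opcode>, <funct3>, <funct7>, rd, rs1, rs2 */
        ".insn r 0x0b, 0x0, 0x00, %0, %1, %2"
        : "=r"(status)
        : "r"(desc_addr), "r"(algo_id));
    return status;  /* assumed convention: 0 = accepted, nonzero = busy */
}

The custom-0 and custom-1 opcode ranges are reserved in the RISC-V base ISA for vendor extensions, which is why co-designed accelerators of this kind typically place their control instructions there rather than in the standard opcode space.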
Fund: Supported by the National Science and Technology Major Project (No. 2022ZD0119001), the National Natural Science Foundation of China (No. 61834005), the Shaanxi Key Research and Development Project (No. 2022GY-027), and the Key Scientific Research Project of Shaanxi Department of Education (No. 22JY060).
Abstract: Due to the diversity of graph computing applications, the power-law distribution of graph data, and the high memory-access-to-computation ratio, traditional architectures face significant challenges of poor flexibility, imbalanced workload distribution, and inefficient memory access when executing graph computing tasks. To address these challenges, a graph computing accelerator, GraphApp, based on a reconfigurable processing element (PE) array is proposed. GraphApp uses 16 reconfigurable PEs for parallel computation and employs tiled data: by dividing the data into appropriately sized tiles, load balancing is achieved and the overall efficiency of parallel computation is improved. In addition, it preprocesses graph data with the compressed sparse columns independently (CSCI) compression format to alleviate the low memory access efficiency caused by the high memory-access-to-computation ratio. Finally, GraphApp is evaluated with the triangle counting (TC) and depth-first search (DFS) algorithms. Performance is analyzed by comparing the execution time of these algorithms on GraphApp against two representative graph frameworks, Ligra and GraphBIG, on six datasets from the Stanford Network Analysis Project (SNAP) database. The results show that GraphApp achieves a maximum performance improvement of 30.86% over Ligra and 20.43% over GraphBIG on the same datasets.
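The abstract names the CSCI (compressed sparse columns independently) format but does not give its layout, so no attempt is made to reproduce it here. As a point of reference, the short C program below builds the standard compressed-sparse-column (CSC) baseline that a CSC-style variant such as CSCI presumably builds on: edges are grouped by destination vertex, so each vertex's in-neighbors sit in one contiguous slice, the property that tile-based PE arrays exploit for regular memory access. The toy graph and array names are illustrative only.

#include <stdio.h>

/* Toy CSC layout for a directed graph with 4 vertices and 5 edges:
 *   0->1, 0->2, 1->2, 2->3, 3->0
 * Edges are grouped by destination (column), so the in-neighbors of
 * vertex v occupy row_idx[col_ptr[v] .. col_ptr[v+1]-1]. */
int main(void)
{
    int col_ptr[] = {0, 1, 2, 4, 5};  /* in-edge offsets per destination vertex */
    int row_idx[] = {3, 0, 0, 1, 2};  /* source vertex of each in-edge          */
    int n = 4;

    for (int v = 0; v < n; v++) {
        printf("in-neighbors of %d:", v);
        for (int e = col_ptr[v]; e < col_ptr[v + 1]; e++)
            printf(" %d", row_idx[e]);
        printf("\n");
    }
    return 0;
}

In a tiled scheme like the one the abstract describes, the column range 0..n-1 could additionally be split into per-PE tiles so that each of the 16 PEs walks a disjoint slice of col_ptr, which is what keeps the workload balanced across the array.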