期刊文献+

面向超级计算机系统的大规模图遍历优化 被引量:2

Optimization of large-scale graph traversal for supercomputers
在线阅读 下载PDF
导出
摘要 现实中的数据问题通常被抽象为图。在大数据时代,图数据趋于复杂,这是因为数据量大幅提升,所需要的计算规模迅速增长。大规模的图数据问题对超算平台的存储运算能力具有广泛需求,并对此提出了更高的要求。为了高效地处理大规模图数据,发挥天河超级计算机实验平台的图处理能力,基于现实世界中图结构的小世界性和无尺度性特征,面向评测超级计算机图处理能力的重要基准Graph500,提出一种主要应用于大规模图的图遍历优化方法。这一方法结合了天河平台的体系结构特征,在图结构上应用了顶点排序和优先缓存策略,即将图中顶点按度数从高到低排序,令程序在图遍历阶段优先访问高度数邻居顶点,并将部分关键高度数顶点缓存至天河系统核组内的高速缓存中,以此来减少Graph500基准程序中的无效访存,降低进程间的通信开销,提高访存带宽利用率,从而有效地提升Graph500基准测试程序在天河平台上的性能。面向天河超级计算机系统实验平台提出的应用顶点排序与优先缓存优化方法的VS-Graph500程序,其加速的效果显著,可扩展性好。当图测试规模为2^(37)时,全系统稳定测试性能为2547.13 GTEPS,超过2020年11月Graph500国际排名榜上第7名的数据。 In the big data era,with the significant development of graph data,the demand for computing resources is growing rapidly.Supercomputers are applied to process large-scale graph data,which puts forward higher requirements for the storage and computing capabilities of supercomputers.In order to efficiently process large-scale graph data and evaluate the graph processing capabilities of the Tianhe supercomputer,in this paper we propose a graph traversal optimization technique for improving the efficiency of the benchmark program of Graph500,an important benchmark for evaluating graph processing capabilities of supercomputer.The technique mainly adopts the vertex sorting and priority caching strategy,where the vertices in the graph are sorted by degree in a descending order and some key vertices are stored in the cache of the core group of the Tianhe system.Therefore,this technique cuts down on invalid memory access and reduces the communication overhead between processes for maximizing the usage of the bandwidth for the supercomputer system.In order to validate graph traversal based on vertex sorting and buffering,an optimized graph500 version named VS-graph500 is customized for the Tianhe supercomputer,experimental results demonstrate that the VS-graph500 has a significant acceleration and good scalability in the supercomputers testing system,and attains a stable testing performance at 2547.13EGTEPS when the graph testing scale is 37,which is superior to the 7th in Graph500 list in June 2020.
作者 谭雯 甘新标 白皓 肖调杰 陈旭光 雷书梦 刘杰 TAN Wen;GAN Xinbiao;BAI Hao;XIAO Tiaojie;CHEN Xuguang;LEI Shumeng;LIU Jie(College of Computer Science and Technology,National University of Defense Technology,Changsha 410073,China;College of General Education,Information College of Hunan,Changsha 410217,China)
出处 《西安电子科技大学学报》 EI CAS CSCD 北大核心 2021年第6期84-95,共12页 Journal of Xidian University
基金 国家重点研究与发展计划(2018YFB0204301) 国家自然科学基金(61902411) 国家数值风洞项目(NNW2019ZT6-B21,NNW2019ZT6-B20,NNW2019ZT5-A10) PDL基金(6142110190206,6142110180203) 湖南省自然科学基金(2020JJ4669)。
关键词 Graph500基准 图结构 顶点排序 优先缓存 超级计算机系统 Graph500 graph structures vertex sorting buffer storage supercomputers
  • 相关文献

参考文献3

二级参考文献28

  • 1Ramey C. Tile-gxl00 manycore processor: Acceleration interfaces and architecture [OL]. San Jose, CA: Tilera Corporation, 2011 [2014-10-25]. http://www, hotchips, org/ wp eontent/uploads/hc archives/hc23/HC23. 18. 2-security/ HC23, 18, 220-TILE-GX100 Ramey Tilera-e. pdf.
  • 2Mitsuhisa S. Feasibility study on future HPC infrastructure [OL]. Tsukuba, Janpan: University of Tsukuba, 2014 [2014-10-25]. http://www, ccs. tsukuba, ac. jpjfiles/exreview/FS-ccs eval-2014. pdf.
  • 3Gwennup L. Adapteva: More flops, less watts lOLl. Mountain View, CA: The Linley Group, 2011 [2014-10-25]. http://www, adapteva, com/wp content/uploads/2011/ 06/adapteva mpr. pdf.
  • 4Dinechin B D, Ayrignac R, Beaucamps P E, et al. A clustered manycore processor architecture for embedded and accelerated applications [C] //Proc of the 17th IEEE Conf on High Performance Extreme Computing. Piscataway, NJ: IEEE, 2013: 1-6.
  • 5Merrill D G, Grimshaw A S. Revisiting sorting for GPGPU stream architectures [C] //Proc of the 19th Int Conf on Parallel Architectures and Compilation Techniques. New York: ACM, 2010: 545-546.
  • 6Davidson A, Tarjan D, Garland M, et al. Efficient parallel merge sort for fixed and variable length keys [C] //Proc of Innovative Parallel Computing. Piscataway, NJ: IEEE, 2012, 1-9.
  • 7Satish N, Kim C, Chhugani J, et al. Fast sort on CPUs, GPUs and Intel MIC architectures [OL]. Santa Clara, CA: Intel I.abs, 2010 [2014-10-25]. http://www, intel, corn/ content/www/us/en/research/intel-labs-radix-sort mic report. html.
  • 8Tian X, Kamil R, Reiii S. Register level sort algorithm on multi-core SIMD processors [C]//Proc of the 3rd Workshop on Irregular Applications: Architecture and Algorithms. New York: ACM, 2013: No 9.
  • 9Sengupta S, Harris M, Zhang Yao, et al. Scan primitives for GPU computing [C] //Proc of the 22nd ACM SIGGRAPH/ EUROGRAPHICS Symp on Graphics hardware. Aire-la Ville, Switzerland: Eurographics Association, 2002: 97-106.
  • 10Satish N, Harris M, Garland M. Designing efficient sorting algorithms for manycore GPUs, NVR-2008 001 [R]. Santa Clara, CA: NVIDIA Corporation, 2008.

共引文献17

同被引文献17

引证文献2

二级引证文献2

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部