The bulk synchronous parallel (BSP) model is very user friendly for coding and debugging parallel graph algorithms. However, existing BSP-based distributed graphprocessing frameworks, such as Pregel, GPS and Giraph,...The bulk synchronous parallel (BSP) model is very user friendly for coding and debugging parallel graph algorithms. However, existing BSP-based distributed graphprocessing frameworks, such as Pregel, GPS and Giraph, routinely suffer from high communication costs. These high communication costs mainly stem from the fine-grained message-passing communication model. In order to address this problem, we propose a new computation model with low communication costs, called LCC-BSE We use this model to design and implement a high-performance distributed graphprocessing framework called LCC-Graph. This framework eliminates high communication costs in existing distributed graph-processing frameworks. Moreover, LCC-Graph also balances the computation workloads among all compute nodes by optimizing graph partitioning, significantly reducing the computation time for each superstep. Evaluation of LCC-Graph on a 32-node cluster, driven by real-world graph datasets, shows that it significantly outperforms existing distributed graph-processing frameworks in terms of runtime, particularly when the system is supported by a highbandwidth network. For example, LCC-Graph achieves an order of magnitude performance improvement over GPS and GraphLab.展开更多
文摘The bulk synchronous parallel (BSP) model is very user friendly for coding and debugging parallel graph algorithms. However, existing BSP-based distributed graphprocessing frameworks, such as Pregel, GPS and Giraph, routinely suffer from high communication costs. These high communication costs mainly stem from the fine-grained message-passing communication model. In order to address this problem, we propose a new computation model with low communication costs, called LCC-BSE We use this model to design and implement a high-performance distributed graphprocessing framework called LCC-Graph. This framework eliminates high communication costs in existing distributed graph-processing frameworks. Moreover, LCC-Graph also balances the computation workloads among all compute nodes by optimizing graph partitioning, significantly reducing the computation time for each superstep. Evaluation of LCC-Graph on a 32-node cluster, driven by real-world graph datasets, shows that it significantly outperforms existing distributed graph-processing frameworks in terms of runtime, particularly when the system is supported by a highbandwidth network. For example, LCC-Graph achieves an order of magnitude performance improvement over GPS and GraphLab.