Bipartite-Oriented Distributed Graph Partitioning for Big Learning

Rong Chen; Jia-Xin Shi; Hai-Bo Chen; Bin-Yu Zang

doi:10.1007/s11390-015-1501-x

Rong Chen, Jia-Xin Shi, Hai-Bo Chen, Bin-Yu Zang. Bipartite-Oriented Distributed Graph Partitioning for Big Learning[J]. Journal of Computer Science and Technology, 2015, 30(1): 20-29. DOI: 10.1007/s11390-015-1501-x

Citation:

Bipartite-Oriented Distributed Graph Partitioning for Big Learning

Abstract

Abstract

Many machine learning and data mining (MLDM) problems like recommendation, topic modeling, and medical diagnosis can be modeled as computing on bipartite graphs. However, most distributed graph-parallel systems are oblivious to the unique characteristics in such graphs and existing online graph partitioning algorithms usually cause excessive replication of vertices as well as significant pressure on network communication. This article identifies the challenges and opportunities of partitioning bipartite graphs for distributed MLDM processing and proposes BiGraph, a set of bipartite-oriented graph partitioning algorithms. BiGraph leverages observations such as the skewed distribution of vertices, discriminated computation load and imbalanced data sizes between the two subsets of vertices to derive a set of optimal graph partitioning algorithms that result in minimal vertex replication and network communication. BiGraph has been implemented on PowerGraph and is shown to have a performance boost up to 17.75X (from 1.16X) for four typical MLDM algorithms, due to reducing up to 80% vertex replication, and up to 96% network traffic.

FullText(HTML)

References (21)

Relative Articles

Supplements (0)

Cited By

Bipartite-Oriented Distributed Graph Partitioning for Big Learning

Abstract

Catalog

Export File

Citation

Format

Content