Accelerating Iterative Big Data Computing Through MPI

Fan Liang; Xiaoyi Lu

doi:10.1007/s11390-015-1522-5

Fan Liang, Xiaoyi Lu. Accelerating Iterative Big Data Computing Through MPI[J]. Journal of Computer Science and Technology, 2015, 30(2): 283-294. DOI: 10.1007/s11390-015-1522-5

Abstract

Popular systems such as Hadoop and Spark cannot achieve satisfactory performance when running iterative big data applications, because they overlap computation and communication inefficiently. The pipelining of computation, data movement, and data management plays a key role in current distributed data computing systems. In this paper, we first analyze the overhead of the shuffle operation in Hadoop and Spark when running the PageRank workload. We then propose DataMPI-Iteration, an MPI-based library for iterative big data computing that uses an event-driven pipeline and an in-memory shuffle design to better overlap computation and communication. Our performance evaluation shows that DataMPI-Iteration achieves 9X to 21X speedup over Apache Hadoop and 2X to 3X speedup over Apache Spark for PageRank and K-means.
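As a rough illustration of what overlapping computation and communication through an in-memory shuffle can mean for PageRank, the Python sketch below runs each iteration as a pipeline. Hypothetical mapper threads push every (destination, rank share) message into per-reducer in-memory queues as soon as it is computed, and reducer threads aggregate those messages while the mappers are still running. This is not DataMPI-Iteration's API or implementation: threads and `queue.Queue` stand in for MPI processes and message passing, the function names (`pagerank_step`, `pagerank`) are made up for illustration, and the graph is assumed to have no dangling nodes and to list every destination as a key.

```python
import queue
import threading
from collections import defaultdict

DAMPING = 0.85
_DONE = object()  # end-of-stream marker each mapper sends to each reducer


def pagerank_step(graph, ranks, num_mappers=2, num_reducers=2):
    """One PageRank iteration with a pipelined in-memory shuffle (illustrative only).

    Assumes every node has at least one outgoing edge and every
    destination node is also a key of `graph`.
    """
    inboxes = [queue.Queue() for _ in range(num_reducers)]  # in-memory shuffle buffers
    partials = [defaultdict(float) for _ in range(num_reducers)]
    nodes = sorted(graph)

    def mapper(my_nodes):
        # Compute each contribution and send it immediately instead of
        # materializing the whole map output first.
        for src in my_nodes:
            targets = graph[src]
            share = ranks[src] / len(targets)
            for dst in targets:
                inboxes[hash(dst) % num_reducers].put((dst, share))
        for box in inboxes:
            box.put(_DONE)

    def reducer(r):
        # Aggregate messages as they arrive, concurrently with the mappers.
        done = 0
        while done < num_mappers:
            msg = inboxes[r].get()
            if msg is _DONE:
                done += 1
            else:
                dst, share = msg
                partials[r][dst] += share

    threads = [threading.Thread(target=reducer, args=(r,)) for r in range(num_reducers)]
    threads += [threading.Thread(target=mapper, args=(nodes[m::num_mappers],))
                for m in range(num_mappers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    n = len(graph)
    incoming = defaultdict(float)
    for part in partials:
        for node, total in part.items():
            incoming[node] += total
    return {v: (1 - DAMPING) / n + DAMPING * incoming[v] for v in graph}


def pagerank(graph, iterations=20):
    """Iterate pagerank_step starting from a uniform rank vector."""
    ranks = {v: 1 / len(graph) for v in graph}
    for _ in range(iterations):
        ranks = pagerank_step(graph, ranks)
    return ranks
```

For example, `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}, iterations=30)` returns ranks that sum to 1. Because reducers start aggregating before any mapper has finished, the shuffle overlaps with computation; in a disk-based shuffle, by contrast, reduce-side aggregation cannot complete until the map output has been materialized and fetched.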
