Fan Liang, Xiaoyi Lu. Accelerating Iterative Big Data Computing Through MPI[J]. Journal of Computer Science and Technology, 2015, 30(2): 283-294. DOI: 10.1007/s11390-015-1522-5

Accelerating Iterative Big Data Computing Through MPI

  • Popular systems such as Hadoop and Spark cannot achieve satisfactory performance on iterative big data applications because they overlap computation and communication inefficiently. The pipeline of computation, data movement, and data management plays a key role in current distributed data computing systems. In this paper, we first analyze the overhead of the shuffle operation in Hadoop and Spark when running the PageRank workload. We then propose DataMPI-Iteration, an MPI-based library for iterative big data computing that uses an event-driven pipeline and an in-memory shuffle design to better overlap computation and communication. Our performance evaluation shows that DataMPI-Iteration can achieve 9X to 21X speedup over Apache Hadoop and 2X to 3X speedup over Apache Spark for PageRank and K-means.
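The idea of overlapping computation and communication through an event-driven, in-memory shuffle can be illustrated with a small single-process sketch of iterative PageRank. This is not DataMPI-Iteration's implementation or API. Threads stand in for MPI processes, and `queue.Queue` objects stand in for in-memory shuffle buffers. The names `pagerank`, `pagerank_iteration`, `DAMPING`, and `num_partitions` are hypothetical. The sketch also assumes integer vertex IDs and a graph with no dangling (zero out-degree) vertices. Each reducer aggregates contributions as soon as they arrive, while the map side is still producing them, instead of waiting for a complete shuffle phase.

```python
import queue
import threading

# Hypothetical sketch, NOT DataMPI-Iteration's API: threads stand in for MPI
# processes and queue.Queue stands in for in-memory shuffle buffers.

DAMPING = 0.85
_DONE = object()  # sentinel marking the end of a partition's input stream


def pagerank_iteration(graph, ranks, num_partitions=2):
    """One PageRank iteration with a pipelined, in-memory shuffle.

    Assumes integer vertex IDs and no dangling vertices.
    """
    n = len(graph)
    buffers = [queue.Queue() for _ in range(num_partitions)]
    sums = [dict() for _ in range(num_partitions)]

    def reducer(pid):
        # Event-driven consumer: folds in each contribution as soon as it
        # arrives, overlapping with the map loop that is still producing.
        while True:
            item = buffers[pid].get()
            if item is _DONE:
                break
            dest, contrib = item
            sums[pid][dest] = sums[pid].get(dest, 0.0) + contrib

    threads = [threading.Thread(target=reducer, args=(p,))
               for p in range(num_partitions)]
    for t in threads:
        t.start()

    # Map side: each vertex sends rank / out_degree to every neighbor,
    # routed to the partition that owns the destination vertex.
    for src, neighbors in graph.items():
        assert neighbors, "sketch assumes no dangling vertices"
        share = ranks[src] / len(neighbors)
        for dest in neighbors:
            buffers[dest % num_partitions].put((dest, share))

    # Signal end of input to every reducer, then wait for them to finish.
    for p in range(num_partitions):
        buffers[p].put(_DONE)
    for t in threads:
        t.join()

    # Apply the damped PageRank update using the aggregated contributions.
    return {v: (1 - DAMPING) / n + DAMPING * sums[v % num_partitions].get(v, 0.0)
            for v in graph}


def pagerank(graph, iterations=20, num_partitions=2):
    """Iterate the pipelined step, starting from a uniform distribution."""
    ranks = {v: 1.0 / len(graph) for v in graph}
    for _ in range(iterations):
        ranks = pagerank_iteration(graph, ranks, num_partitions)
    return ranks


if __name__ == "__main__":
    # Tiny example graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0}
    example = {0: [1, 2], 1: [2], 2: [0]}
    print({v: round(r, 4) for v, r in pagerank(example, iterations=50).items()})
```

Because each reducer reads from a single producer in FIFO order, the result does not depend on thread scheduling or on the number of partitions. A real MPI-based design would replace the queues with non-blocking message passing between processes.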