We use cookies to improve your experience with our site.
Yue-Wen Wu, Yuan-Jia Xu, Heng Wu, Lin-Gang Su, Wen-Bo Zhang, Hua Zhong. Apollo: Rapidly Picking the Optimal Cloud Configurations for Big Data Analytics Using a Data-Driven Approach[J]. Journal of Computer Science and Technology, 2021, 36(5): 1184-1199. DOI: 10.1007/s11390-021-0232-4
Citation: Yue-Wen Wu, Yuan-Jia Xu, Heng Wu, Lin-Gang Su, Wen-Bo Zhang, Hua Zhong. Apollo: Rapidly Picking the Optimal Cloud Configurations for Big Data Analytics Using a Data-Driven Approach[J]. Journal of Computer Science and Technology, 2021, 36(5): 1184-1199. DOI: 10.1007/s11390-021-0232-4

Apollo: Rapidly Picking the Optimal Cloud Configurations for Big Data Analytics Using a Data-Driven Approach

  • Big data analytics applications are increasingly deployed on cloud computing infrastructures, and it is still a big challenge to pick the optimal cloud configurations in a cost-effective way. In this paper, we address this problem with a high accuracy and a low overhead. We propose Apollo, a data-driven approach that can rapidly pick the optimal cloud configurations by reusing data from similar workloads. We first classify 12 typical workloads in BigDataBench by characterizing pairwise correlations in our offline benchmarks. When a new workload comes, we run it with several small datasets to rank its key characteristics and get its similar workloads. Based on the rank, we then limit the search space of cloud configurations through a classification mechanism. At last, we leverage a hierarchical regression model to measure which cluster is more suitable and use a local search strategy to pick the optimal cloud configurations in a few extra tests. Our evaluation on 12 typical workloads in HiBench shows that compared with state-of-the-art approaches, Apollo can improve up to 30% search accuracy, while reducing as much as 50% overhead for picking the optimal cloud configurations.
  • loading

Catalog

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return