
U2CMigration: A User-Unaware Container Live Migration Method and System Based on Predictive Analysis of Memory Dirty Pages

U2CMigration: User-Unaware Container Migration with Predictive Analysis of Memory Dirty Pages

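  • Abstract:
    Background: Container live migration is a key technology for maintaining service continuity and resource efficiency in cloud and edge datacenters. However, traditional pre-copy based migration performs poorly when the memory dirty-page load changes dynamically. During migration, dirty pages are rewritten frequently, which leads to multiple lengthy stop-and-copy iterations, greatly prolongs migration time, and severely degrades application performance. The unpredictability of memory behavior across workloads adds further complexity. Existing methods tend to adopt generic assumptions about container memory dirty pages and fail to capture workload-specific memory patterns and the characteristics of system-level metrics. As a result, they often hit efficiency bottlenecks in practice, such as excessive migration time and downtime, which limits their effectiveness in real deployments.
    Objective: This study aims to improve the performance of container live migration in cloud and edge datacenters by developing a migration strategy based on two-phase prediction of memory dirty pages. The strategy manages dynamically changing container memory dirty pages and helps cloud and edge service operators accurately characterize workload-specific memory behavior and system-level metrics, thereby minimizing migration duration and downtime and addressing the limitations of traditional pre-copy based migration.
    Methods: This study proposes U2CMigration, a container live migration strategy based on memory dirty-page prediction that manages dirty pages with a two-phase prediction method. In the first phase, data shift analysis predicts stable memory pages. In the second phase, an attention-based model predicts unstable memory pages; the model captures spatio-temporal features and their correlation with system-level metrics. Based on these predictions, the study designs a migration strategy that optimizes the number of stop-and-copy iterations so that as few memory pages as possible are dirty during migration. An open-source prototype was built on CRIU (Checkpoint/Restore In Userspace) and evaluated extensively on Alibaba Cloud with a range of typical container workloads. The experiments show that U2CMigration has a significant advantage over existing state-of-the-art methods in reducing migration duration and downtime.
    Results: The experiments show that U2CMigration substantially improves container live migration performance. In prototype experiments on Alibaba Cloud, the prediction strategy significantly reduced migration duration and downtime across a variety of containerized workloads. Specifically, U2CMigration reduced migration duration by 26.1% to 47.9% and downtime by 21.3% to 32.6%, outperforming existing state-of-the-art migration strategies.
    Conclusions: U2CMigration focuses on optimizing container live migration in cloud and edge datacenters. It provides a migration method that predicts dynamically changing memory dirty pages and designs an efficient migration procedure around those predictions. The method comprises a two-phase memory dirty-page prediction model and a migration strategy that determines the optimal stop-and-copy iteration so as to minimize the number of dirty pages during migration; it was validated on Alibaba Cloud, and the experiments confirm the effectiveness of the predictive analysis. The study deepens the understanding of the dynamic behavior of container memory pages and offers a practical tool for improving migration efficiency. The work has been open-sourced on GitHub as a resource for further research and development on container live migration optimization. Future directions include extending U2CMigration to support GPU containers, evaluating the performance of concurrent migrations, and applying the two-phase memory prediction method to server memory and I/O optimization, anomaly detection, and failure prevention.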
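The sensitivity of pre-copy migration to the dirty-page rate can be illustrated with a simplified model. The sketch below is a textbook-style pre-copy loop, not U2CMigration's implementation: it assumes a constant dirty rate and a constant transfer bandwidth (both in pages per second), and the function name, parameters, and threshold are hypothetical.

```python
def precopy_rounds(total_pages, dirty_rate, bandwidth,
                   max_rounds=10, stop_threshold=1000):
    """Simulate pre-copy rounds under constant rates (an assumption).

    Returns (pages_sent_per_round, pages_left_for_stop_and_copy).
    dirty_rate and bandwidth are integers in pages per second.
    """
    to_send = total_pages
    sent_per_round = []
    for _ in range(max_rounds):
        # Few enough pages remain: pause the container and copy the rest.
        if to_send <= stop_threshold:
            break
        sent_per_round.append(to_send)
        # Pages dirtied while this round was in flight must be re-sent.
        # Round time is to_send / bandwidth, so the re-dirtied count is
        # dirty_rate * to_send / bandwidth (integer arithmetic here).
        to_send = min(total_pages, dirty_rate * to_send // bandwidth)
    return sent_per_round, to_send


# Dirty rate well below bandwidth: the remaining set shrinks each round.
print(precopy_rounds(100_000, 20_000, 100_000))
# Dirty rate above bandwidth: rounds never converge within the cap.
print(precopy_rounds(1_000, 2_000, 1_000, max_rounds=3, stop_threshold=10))
```

In this model the remaining set shrinks by the ratio of dirty rate to bandwidth each round, so a workload whose dirty rate fluctuates can make a fixed stopping rule pause the container at a bad moment. This is the situation that dirty-page prediction is meant to avoid.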

     

    Abstract: Container live migration is a cornerstone of maintaining containerized workloads in cloud and edge datacenters, particularly for stateful applications. However, the de facto memory pre-copy based migration performs poorly for containers whose memory dirty pages change dynamically. Existing research often overlooks the dynamic nature of memory pages across workloads and their unpredictable relationship with system-level features, which leads to poorly chosen stop-and-copy iterations during container migration. This can prolong container migrations by tens of seconds and severely degrade application performance. To address these challenges, we introduce U2CMigration, a user-unaware container live migration strategy for containerized workloads. It employs a lightweight and autonomous two-phase prediction that analyzes container memory pages across various workloads. For stable memory pages (phase 1), we use data shift prediction. For unstable memory pages (phase 2), we develop an attention-based prediction method that jointly considers the spatio-temporal characteristics of memory pages and system-level features. Guided by the dirty-page predictions, we further develop a container live migration strategy that selects the stop-and-copy iteration with the minimum number of memory dirty pages. We have implemented an open-source prototype of U2CMigration (https://doi.org/10.57760/sciencedb.32136) based on the CRIU (checkpoint/restore in userspace) project. Extensive prototype experiments demonstrate that U2CMigration reduces container migration duration by 26.1%–47.9% and downtime by 21.3%–32.6% compared with state-of-the-art solutions.
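The final decision step, picking the stop-and-copy iteration with the fewest dirty pages, can be sketched as follows. This is a minimal illustration that assumes a predictor has already produced one dirty-page count per candidate iteration. The function name and the sample numbers are hypothetical, and the abstract does not specify the paper's actual decision rule beyond minimizing dirty pages.

```python
def choose_stop_iteration(predicted_dirty_pages):
    """Return (iteration_index, predicted_pages) with the fewest dirty pages.

    predicted_dirty_pages[i] is the predicted number of dirty pages if the
    container were paused at pre-copy iteration i (hypothetical input).
    Ties resolve to the earliest iteration, which also shortens migration.
    """
    if not predicted_dirty_pages:
        raise ValueError("at least one predicted iteration is required")
    best = min(range(len(predicted_dirty_pages)),
               key=lambda i: predicted_dirty_pages[i])
    return best, predicted_dirty_pages[best]


# Made-up predictions for five candidate iterations.
iteration, pages = choose_stop_iteration([5200, 3100, 900, 1400, 900])
print(iteration, pages)  # 2 900
```

Preferring the earliest iteration on ties is a design choice of this sketch: with equal predicted downtime, stopping sooner shortens the total migration.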

     
