信息年龄敏感的联邦学习

徐殷; 肖明军; 吴晨; 吴杰; 周津锐; 孙贺

doi:10.1007/s11390-024-3914-x

摘要:

研究背景 联邦学习是一种新兴且有前景的分布式机器学习范式，通过一个中央服务器的协调，促进了全局模型的协同训练，可能涉及大量的客户端。典型的联邦学习过程通常跨足多个轮次，直到达到令人满意的全局模型为止。一方面，联邦学习可以通过在本地保留客户端的训练数据来有效保护客户端的隐私。另一方面，由于只有本地模型参数而不是完整数据集被传输到服务器，联邦学习可以显著减少通信开销。由于这些独特的优势，联邦学习的多个高效且经济有效的工业应用正在涌现。在许多应用场景中，特别是涉及流数据的场景，数据随时间连续生成。参与联邦学习时，客户端有强烈的动机尽可能使用新鲜的数据集来训练其本地模型。因此，客户端将不可避免地在提供新鲜数据集方面花费一些额外的成本，但服务器的总预算通常是有限的。

目的在引入了广泛使用的“信息年龄”（Age-of-Information，AoI）指标来表示数据集的新鲜程度后，我们的研究目标是在信息年龄敏感的联邦学习系统中设计一个客户端选择算法，同时确保服务器的预算不大于预先设定的阈值。

方法我们首先为信息年龄敏感的联邦学习系统推导了一个模型收敛上界。这个上界表明全局模型的损失与本地数据集的新鲜度呈正相关。基于这一洞察，我们将选择最优客户端以最小化全局模型损失的问题转化为一个等价问题：选择具有最小平均AoI值的客户端。随后，我们将此问题构建为一个“无休止多臂赌博机”问题，其中每个客户端被视为一个手臂，客户端的本地数据集的AoI值被视为相应的状态。为了解决这个问题，我们提出了基于Whittle索引的客户端选择算法，称为WICS。其中，我们在每个联邦学习轮次中计算每个客户端的Whittle索引值。基于这些索引值，我们采用贪心策略选择适当的客户端，同时满足服务器端的预算约束。

结果 WICS算法实现了服务器可以选择一些客户端提供新鲜数据集进行本地模型训练，以便在预算约束下最小化全局模型的损失。同时，WICS算法可以实现几乎最优的客户端选择性能。

结论全面的理论分析结果表明，本地数据集的高AoI值对全局模型的收敛是有害的。我们观察到，所提出的WICS算法在模型准确性和损失方面的性能均优于比较算法。同时，我们的WICS算法在损失函数为凸或非凸时均能良好运行。此外，我们对所提出算法在更实际场景中的可能扩展进行了各种讨论，然后指出了未来深入研究的潜在方向。

Abstract: Federated learning (FL) is an emerging privacy-preserving distributed computing paradigm, enabling numerous clients to collaboratively train machine learning models without the necessity of transmitting clients’ private datasets to the central server. Unlike most existing research where the local datasets of clients are assumed to be unchanged over time throughout the whole FL process, our study addresses such scenarios in this paper where clients’ datasets need to be updated periodically, and the server can incentivize clients to employ as fresh as possible datasets for local model training. Our primary objective is to design a client selection strategy to minimize the loss of the global model for FL loss within a constrained budget. To this end, we introduce the concept of ‘‘Age of Information’’ (AoI) to quantitatively assess the freshness of local datasets and conduct a theoretical analysis of the convergence bound in our AoI-aware FL system. Based on the convergence bound, we further formulate our problem as a restless multi-armed bandit (RMAB) problem. Next, we relax the RMAB problem and apply the Lagrangian Dual approach to decouple it into multiple subproblems. Finally, we propose a Whittle’s Index Based Client Selection (WICS) algorithm to determine the set of selected clients. In addition, comprehensive simulations substantiate that the proposed algorithm can effectively reduce training loss and enhance the learning accuracy compared with some state-of-the-art methods.

信息年龄敏感的联邦学习

Age-of-Information-Aware Federated Learning