Recent advances in multi-modal learning have been driven by foundation models with millions or billions of parameters, which typically require large amounts of data for fine-tuning. However, collecting and centralizing training data from diverse sectors is often infeasible due to privacy regulations. Federated Learning (FL) has emerged as a promising solution: it allows multiple clients to collaboratively train neural networks without sharing their local data. To reduce client-side computation and communication overheads, previous work has applied Parameter-efficient Fine-tuning (PEFT) methods to FL, optimizing and communicating only a small fraction of the model parameters during federated communication. Most of these works, however, focus on a single modality and ignore the data heterogeneity across clients. In this study, we propose Federated Dual-Adapter Teacher (FedDAT), a fine-tuning framework designed for heterogeneous multi-modal FL. FedDAT addresses data heterogeneity with a Dual-Adapter Teacher (DAT) that regularizes client local updates, and applies Mutual Knowledge Distillation (MKD) for efficient knowledge transfer. FedDAT is the first approach to enable distributed fine-tuning of foundation models for a range of heterogeneous Vision-Language tasks. In extensive experiments on four multi-modal FL benchmarks with different types of data heterogeneity, FedDAT outperforms existing centralized PEFT methods adapted for FL.
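The abstract does not spell out FedDAT's objective, but the general idea of Mutual Knowledge Distillation between two model branches can be sketched as a symmetric KL-divergence term over their softened predictions. The sketch below is a rough illustration under that assumption; the function names, temperature value, and loss weighting are illustrative choices, not the paper's actual implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_distillation_loss(branch_a_logits, branch_b_logits, temperature=2.0):
    """Symmetric (mutual) distillation: each branch distills from the other.

    Returns the sum of both directed KL terms, scaled by T^2 as is
    conventional for temperature-softened distillation losses.
    """
    p = softmax(branch_a_logits, temperature)
    q = softmax(branch_b_logits, temperature)
    return (temperature ** 2) * (kl_divergence(p, q) + kl_divergence(q, p))

# Hypothetical example: logits from a client's local adapter branch and a
# teacher branch for the same input.
local_logits = [2.0, 0.5, -1.0]
teacher_logits = [1.5, 0.8, -0.5]
loss = mutual_distillation_loss(local_logits, teacher_logits)
```

Because the loss is symmetric, knowledge flows in both directions, which matches the "mutual" aspect of MKD; how FedDAT combines this with the dual-adapter regularization is detailed in the paper itself.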
FedDAT: A Method for Fine-tuning Foundation Models in Multi-Modal Heterogeneous Federated Learning. (arXiv:2308.12305v1 [cs.LG])
by instadatahelp | Aug 26, 2023 | AI Blogs