Machine learning (ML) is transforming solutions across industries and unlocking new insights from data. ML algorithms analyze large datasets, identify patterns, and make predictions based on those patterns. Distributed training is commonly used when the dataset or model is too large for a single instance to handle. The Amazon SageMaker SDK provides native support for distributed training and offers example notebooks for popular frameworks.
However, there are cases where data is decentralized across multiple accounts or regions due to security and privacy regulations. In such scenarios, federated learning (FL) is a suitable approach to train a generalized model using decentralized data. FL allows multiple separate training sessions to run in parallel across different boundaries, such as geographical locations, and aggregates the results to build a global model. This is in contrast to centralized ML techniques where datasets are merged for training.
In cloud-based approaches, distributed training typically occurs within a single region and account, with the dataset split into smaller subsets and processed by different nodes in a training cluster. In federated learning, training takes place across multiple accounts or regions, with each account or region running its own training instances. The training data remains decentralized throughout the process, and each dataset is accessed only by its respective training session.
There are several open-source frameworks available for federated learning, including FATE, Flower, PySyft, OpenFL, FedML, NVFlare, and TensorFlow Federated. When selecting an FL framework, factors such as model category support, ML framework compatibility, and device/OS support should be considered. Flower is a lightweight and customizable FL framework that supports large-scale FL experiments and heterogeneous device scenarios. It is language-agnostic and can incorporate emerging algorithms, training strategies, and communication protocols.
Flower FL operates on a server-client architecture, where edge clients communicate with the server over gRPC. Virtual clients consume minimal resources when inactive, loading the model and data into memory only during training or evaluation. The Flower server sends configuration dictionaries to Flower clients via gRPC; the clients deserialize them and use them for training. Flower allows customization of the learning process through its strategy abstraction, which defines parameter initialization, how clients contribute, and the details of training and evaluation. It implements several federated averaging algorithms, such as FedAvg.
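To make the aggregation step concrete, the following is a minimal sketch of FedAvg-style weighted averaging in plain Python. The function name `fed_avg` and the flat-list representation of parameters are illustrative simplifications, not Flower's API; Flower's built-in FedAvg strategy performs an equivalent weighted average over NumPy arrays.

```python
# Minimal sketch of federated averaging (FedAvg-style aggregation).
# `fed_avg` and `client_updates` are illustrative names, not Flower's API.

def fed_avg(client_updates):
    """Average model parameters, weighted by each client's sample count.

    `client_updates` is a list of (parameters, num_examples) tuples, where
    `parameters` is a flat list of floats standing in for model weights.
    """
    total_examples = sum(n for _, n in client_updates)
    num_params = len(client_updates[0][0])
    aggregated = [0.0] * num_params
    for params, n in client_updates:
        weight = n / total_examples
        for i, p in enumerate(params):
            aggregated[i] += weight * p
    return aggregated

# Two clients: one trained on 100 examples, one on 300.
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
print(fed_avg(updates))  # [2.5, 3.5]
```

Weighting by sample count means clients with more local data contribute proportionally more to the global model, which is the standard FedAvg behavior.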
A federated learning architecture built on SageMaker and the Flower framework communicates over bi-directional gRPC streams. Flower clients receive instructions as byte arrays, deserialize them, and perform training on local data. The results are then serialized and communicated back to the server. In this architecture, SageMaker notebook instances act as the Flower server and Flower clients, residing in different accounts but within the same region. The server defines the strategies and global parameters, which are serialized and sent to the client over VPC peering. The client starts a SageMaker training job based on the server configuration and sends back the updated parameters. The server evaluates the aggregated parameters and produces accuracy metrics using a testing dataset.
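The byte-array exchange above can be sketched with NumPy serialization. Flower ships comparable helpers (`ndarrays_to_parameters` and `parameters_to_ndarrays`); the function names below are illustrative stand-ins, not Flower's actual implementation.

```python
import io

import numpy as np

# Sketch of serializing model parameters to byte arrays for gRPC transport.
# The helper names are illustrative, not Flower's API.

def ndarray_to_bytes(arr: np.ndarray) -> bytes:
    """Serialize a NumPy array to bytes, preserving dtype and shape."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)
    return buf.getvalue()

def bytes_to_ndarray(data: bytes) -> np.ndarray:
    """Deserialize bytes back into a NumPy array."""
    return np.load(io.BytesIO(data), allow_pickle=False)

weights = np.array([[0.1, 0.2], [0.3, 0.4]])
payload = ndarray_to_bytes(weights)       # what travels over the wire
restored = bytes_to_ndarray(payload)
assert np.array_equal(weights, restored)
```

Because dtype and shape are preserved in the payload, the receiving side can reconstruct the parameters without any out-of-band schema.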
To implement federated learning using SageMaker, networking and VPC peering configurations are set up to enable communication between the server and client accounts. Cross-account access settings are established using IAM roles to delegate access across accounts for starting SageMaker training jobs. Federated learning client code is implemented in the client account using the Flower package and SageMaker managed training, while the server code is implemented in the server account using the Flower package.
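For the cross-account access step, the client account's role must trust the server account so the server can assume it and start SageMaker training jobs. The snippet below builds an illustrative IAM trust policy; the account ID is a placeholder, not a value from the original setup.

```python
import json

# Illustrative cross-account trust policy for the client account's role.
# The account ID below is a placeholder, not a real account.
SERVER_ACCOUNT_ID = "111122223333"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # Allow principals in the server account to assume this role.
            "Principal": {"AWS": f"arn:aws:iam::{SERVER_ACCOUNT_ID}:root"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

The server side would then call `sts:AssumeRole` on this role (for example, via boto3's STS client) to obtain temporary credentials before invoking the SageMaker training API in the client account.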
Overall, federated learning offers a decentralized approach to training ML models when data cannot be centralized. SageMaker provides the infrastructure and tools to implement federated learning using the Flower framework, enabling efficient and secure training across multiple accounts and regions.