This is a guest blog post co-written by Ben Veasey, Jeremy Anderson, Jordan Knight, and June Li from Travelers.

Satellite and aerial imagery has become crucial in fields such as precision agriculture, insurance risk assessment, urban development, and disaster response. However, training machine learning models to interpret this data is challenging and time-consuming because it requires human annotation. Self-supervised learning (SSL) can overcome this bottleneck: models are trained on large amounts of unlabeled image data to learn image representations that can then be applied to downstream tasks like image classification or segmentation. This approach generalizes better to unseen data and reduces the need for labeled data.

In this blog post, we demonstrate how to train self-supervised vision transformers on overhead imagery using Amazon SageMaker. This framework, developed in collaboration with the Amazon Machine Learning Solutions Lab (now known as the Generative AI Innovation Center), aims to support and enhance aerial imagery model use cases at Travelers. Our solution is based on the DINO algorithm and uses the SageMaker distributed data parallel library (SMDDP) to distribute data across multiple GPU instances. After pre-training is complete, the DINO image representations can be transferred to a variety of downstream tasks, resulting in improved model performance within the Travelers Data & Analytics space.

The solution follows a two-step process: pre-training vision transformers on unlabeled imagery, then transferring the learned representations to supervised downstream tasks. This post walks through the solution using satellite images from the BigEarthNet-S2 dataset. The prerequisites are access to a SageMaker notebook instance and an Amazon Simple Storage Service (Amazon S3) bucket.
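As a concrete sketch of the data layout this kind of walkthrough assumes, a small helper can assemble the S3 URIs used for the unlabeled training data, the processed metadata, and the model checkpoints. The bucket name and prefix below are illustrative assumptions, not values from the original post:

```python
def s3_paths(bucket: str, prefix: str = "bigearthnet-s2") -> dict:
    """Build the S3 URIs used for DINO pre-training inputs and outputs.

    `bucket` and `prefix` are placeholders; substitute your own S3 bucket.
    """
    base = f"s3://{bucket}/{prefix}"
    return {
        "train": f"{base}/train",              # unlabeled images for SSL pre-training
        "metadata": f"{base}/metadata",        # processed BigEarthNet-S2 metadata files
        "checkpoints": f"{base}/checkpoints",  # DINO model checkpoints
    }

paths = s3_paths("my-sagemaker-bucket")
print(paths["train"])  # s3://my-sagemaker-bucket/bigearthnet-s2/train
```

Keeping the data, metadata, and checkpoints under one prefix makes it easy to point a SageMaker training job at the bucket and to resume from checkpoints later.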
We first prepare the BigEarthNet-S2 dataset for DINO training and evaluation: downloading the dataset, processing the metadata files, and uploading the data to an S3 bucket.

The next section focuses on training DINO models with SageMaker. We discuss the modifications made to the original DINO code, such as creating a custom PyTorch dataset class for loading BigEarthNet-S2 images and adding support for SMDDP. The post provides the code for training DINO models on BigEarthNet-S2 with SageMaker, including the configuration of training hyperparameters and the creation of a SageMaker PyTorch Estimator.

The post concludes by emphasizing the importance of training on multiple GPUs or instances for better performance, and offers tips for reducing training time, such as checkpointing and distributed training.
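To make the Estimator step concrete, here is a hedged sketch of how the configuration might be assembled. Every name below (entry point, instance type, hyperparameter values, S3 paths) is an illustrative assumption rather than the exact configuration from the walkthrough; the resulting kwargs would be passed to `sagemaker.pytorch.PyTorch`:

```python
def estimator_config(bucket: str, instance_count: int = 2) -> dict:
    """Assemble keyword arguments for a sagemaker.pytorch.PyTorch Estimator.

    All values here are illustrative assumptions, not the exact settings
    used in the original walkthrough.
    """
    return {
        "entry_point": "main_dino.py",      # adapted DINO training script (assumed name)
        "instance_type": "ml.p3.16xlarge",  # multi-GPU instance (assumed)
        "instance_count": instance_count,
        "framework_version": "1.12",
        "py_version": "py38",
        "hyperparameters": {
            "arch": "vit_small",            # DINO ViT backbone (assumed)
            "patch_size": 16,
            "epochs": 100,
            "batch_size_per_gpu": 32,
        },
        # Enable the SageMaker distributed data parallel library (SMDDP)
        "distribution": {"smdistributed": {"dataparallel": {"enabled": True}}},
        # Persist checkpoints to S3 so interrupted jobs can resume
        "checkpoint_s3_uri": f"s3://{bucket}/dino/checkpoints",
    }

cfg = estimator_config("my-sagemaker-bucket")
# estimator = sagemaker.pytorch.PyTorch(role="<your-sagemaker-role-arn>", **cfg)
# estimator.fit({"train": "s3://my-sagemaker-bucket/bigearthnet-s2/train"})
```

The `distribution` argument is how SageMaker enables SMDDP for a PyTorch Estimator, and `checkpoint_s3_uri` supports the checkpointing tip above: checkpoints written there survive the training instance, so a job can be resumed rather than restarted.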
Using Amazon SageMaker to Train Self-Supervised Vision Transformers on Overhead Imagery
by instadatahelp | Aug 26, 2023 | AI Blogs