Generative AI (GenAI) has the potential to revolutionize creative advertising. Using pretrained GenAI models with textual prompts, such as sentences describing scenes and objects, a wide variety of novel images can be created, including product shots. This technique has shown promising results with the emergence of diffusion-based models like Stable Diffusion, Midjourney, and DALL-E 2.

However, using these models in production requires constant refinement to ensure consistent outputs. This often involves creating numerous sample images and clever prompt engineering, which can be challenging at scale. In this post, we explore how GenAI can be harnessed to generate captivating and innovative advertisements on a large scale, particularly when dealing with extensive image catalogs.

One key technique within GenAI-based image generation is inpainting, which fills selected (masked) regions of an image with newly generated content. Here it is used to seamlessly create image backgrounds and reduce unwanted image artifacts. By leveraging the power of GenAI, specifically through inpainting, visually stunning and engaging content can be produced. This post delves into the practical implementation of this technique using Amazon SageMaker endpoints, which enable efficient deployment of the GenAI models driving this creative process.

To strike a balance between creative freedom and efficient production, we propose generating a multitude of realistic images with minimal supervision. The idea is to split the inpainting process into layers, each driven by a different prompt. The process can be summarized as follows: first, a general scene prompt is used to create a background image, and the object is placed on it at a random position. Then, a layer is inpainted over the lower mid-section of the object, based on the surface where it lies. Finally, a layer similar to the background is inpainted over the upper mid-section of the object. This approach improves the realism of the object by scaling and positioning it relative to the background environment, in line with human expectations.
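The layer split described above can be sketched with NumPy. This is a minimal illustration, not the post's actual code: it assumes the product mask is a boolean array, and the function names (`split_mask_layers`, `place_object`) are hypothetical.

```python
import numpy as np

def split_mask_layers(mask: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a binary product mask into lower and upper mid-sections,
    measured from the vertical extent of the object itself."""
    rows = np.where(mask.any(axis=1))[0]
    mid = (rows.min() + rows.max() + 1) // 2  # vertical midpoint of the object
    lower = np.zeros_like(mask)
    upper = np.zeros_like(mask)
    lower[mid:] = mask[mid:]  # lower layer: blended with the surface it lies on
    upper[:mid] = mask[:mid]  # upper layer: blended with the background
    return lower, upper

def place_object(background: np.ndarray, obj: np.ndarray, mask: np.ndarray,
                 rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Paste the masked object at a random offset on the background;
    return the composited image and the mask in background coordinates."""
    bh, bw = background.shape[:2]
    oh, ow = obj.shape[:2]
    y = int(rng.integers(0, bh - oh + 1))
    x = int(rng.integers(0, bw - ow + 1))
    out = background.copy()
    region = out[y:y + oh, x:x + ow]
    region[mask] = obj[mask]  # copy only the object's pixels
    full_mask = np.zeros((bh, bw), dtype=bool)
    full_mask[y:y + oh, x:x + ow] = mask
    return out, full_mask
```

The two sub-masks partition the original mask, so each inpainting pass can target its own region with its own prompt.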

The proposed solution involves the following data flow:

1. The Segment Anything Model (SAM) and the Stable Diffusion Inpainting model are hosted on SageMaker endpoints.
2. A background prompt is used to generate a background image with the Stable Diffusion model.
3. A base product image is passed through SAM to generate a mask; the anti-mask is derived as the pixel-wise inverse of the mask.
4. The generated background image, the mask, the foreground prompts, and the negative prompts are passed to the Stable Diffusion Inpainting model to produce an intermediate background image.
5. Similarly, the generated background image, the anti-mask, the foreground prompts, and the negative prompts are passed to the Stable Diffusion Inpainting model to produce an intermediate foreground image.
6. The final output is obtained by combining the intermediate foreground and background images.
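The endpoint calls in this data flow can be sketched with the boto3 SageMaker runtime client. The endpoint names and the request payload schema below are assumptions for illustration; the actual input contract depends on the model container deployed by the notebooks.

```python
import json

# Hypothetical endpoint names -- substitute the names created by your stack.
SD_INPAINT_ENDPOINT = "sd-inpainting-endpoint"
SAM_ENDPOINT = "sam-endpoint"

def build_inpaint_payload(image_b64: str, mask_b64: str, prompt: str,
                          negative_prompt: str, steps: int = 50) -> bytes:
    """Serialize one inpainting request. The key names are assumptions
    about the hosted model's input schema."""
    return json.dumps({
        "image": image_b64,
        "mask_image": mask_b64,
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "num_inference_steps": steps,
    }).encode("utf-8")

def invoke(runtime, endpoint_name: str, payload: bytes) -> dict:
    """Call a SageMaker endpoint via the boto3 runtime client."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=payload,
    )
    return json.loads(response["Body"].read())
```

With `runtime = boto3.client("sagemaker-runtime")`, steps 4 and 5 become two `invoke` calls that differ only in the mask passed in (mask versus anti-mask) and the prompts.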

To implement this solution, certain prerequisites are required, including an AWS account with access to AWS CloudFormation, SageMaker, and Amazon S3. An ml.g5.2xlarge instance, backed by an NVIDIA A10G GPU, is used to host the models. The provided AWS CloudFormation template sets up the SageMaker notebooks needed to deploy the endpoints and run inference.

To mask the area of interest of the product, an image of the object and a mask delineating its contour are needed. This can be done using labeling tools like Amazon SageMaker Ground Truth, or automatically with the Segment Anything Model (SAM). SAM is an advanced segmentation model that generates high-quality masks for objects within images. It uses a deep learning model trained on an extensive dataset to accurately identify and segment objects, providing precise boundaries and pixel-level masks. The SAM model is hosted on a SageMaker endpoint using the provided notebook.
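A sketch of the SAM step and the anti-mask derivation follows. The request/response schema for the SAM endpoint is an assumption; only the anti-mask inversion is fixed by the data flow above (the anti-mask is the pixel-wise inverse of the mask).

```python
import base64
import json
import numpy as np

def request_mask(runtime, endpoint_name: str, image_bytes: bytes) -> np.ndarray:
    """Ask the hosted SAM endpoint for a segmentation mask. The payload and
    response keys here are illustrative assumptions, not the model's
    documented contract."""
    payload = json.dumps({"image": base64.b64encode(image_bytes).decode()})
    resp = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=payload,
    )
    body = json.loads(resp["Body"].read())
    # Assumed convention: 255 = object pixels, 0 = background pixels
    return np.array(body["mask"], dtype=np.uint8)

def derive_anti_mask(mask: np.ndarray) -> np.ndarray:
    """The anti-mask is simply the pixel-wise inverse of the SAM mask."""
    return 255 - mask
```

Inverting twice returns the original mask, so the same helper serves both inpainting passes.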

Inpainting is then used to create a generated image by combining the SAM-generated mask with user prompts. It seamlessly fills in the missing or masked regions of an image, blending them with the surrounding content. The Stable Diffusion Inpainting model is hosted on a SageMaker endpoint using the provided notebook.
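The final step of the data flow, combining the intermediate foreground and background images, can be sketched as a mask-guided composite. This assumes both intermediate results are RGB NumPy arrays of the same size and that the mask uses 255 for object pixels; the function name is hypothetical.

```python
import numpy as np

def composite(foreground: np.ndarray, background: np.ndarray,
              mask: np.ndarray) -> np.ndarray:
    """Combine the two intermediate inpainting results: take foreground
    pixels where the mask is set, background pixels elsewhere."""
    sel = (mask > 127)[..., None]  # broadcast the 2-D mask over RGB channels
    return np.where(sel, foreground, background)
```

A soft (feathered) mask with alpha blending would give smoother seams; the hard selection above is the simplest version of the idea.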

Overall, this solution enables the generation of captivating and personalized images for creative advertising at scale, leveraging the power of GenAI and inpainting techniques. The detailed implementation steps and prerequisites are provided in the post, along with the necessary code and resources.