In today’s business landscape, many customers are enthusiastic about large language models (LLMs) and how generative AI can revolutionize their operations. However, integrating such solutions and models into regular business operations is not a simple task. In this article, we will explore how to operationalize generative AI applications using MLOps principles, resulting in foundation model operations (FMOps). We will also delve into the most common generative AI use case, text-to-text applications, and into LLM operations (LLMOps), a subset of FMOps.
To begin, let’s briefly introduce MLOps principles and discuss their main differentiators compared to FMOps and LLMOps in terms of processes, people, model selection and evaluation, data privacy, and model deployment. These principles apply to customers who use pre-built models, create their own foundation models, or fine-tune existing models. Our approach is applicable to both open-source and proprietary models.
MLOps, as defined in our previous post on the MLOps foundation roadmap for enterprises with Amazon SageMaker, refers to the combination of people, processes, and technology to efficiently operationalize machine learning (ML) solutions. To achieve this, various teams and personas need to collaborate, as illustrated in the figure below.
The teams involved in MLOps are as follows:
1. Advanced analytics team (data lake and data mesh): Data engineers are responsible for preparing and ingesting data from multiple sources, building ETL pipelines to curate and catalog the data, and preparing historical data for ML use cases. These data owners focus on providing access to their data to multiple business units or teams.
2. Data science team: Data scientists focus on creating the best model based on predefined key performance indicators (KPIs) by working in notebooks. After the research phase, they collaborate with ML engineers to create automations for building and deploying models into production using CI/CD pipelines.
3. Business team: A product owner defines the business case, requirements, and KPIs to evaluate model performance. Other business stakeholders use the inference results (predictions) to drive decisions.
4. Platform team: Architects are responsible for the overall cloud architecture of the business and how different services are connected. Security SMEs review the architecture based on security policies and needs. MLOps engineers provide a secure environment for data scientists and ML engineers to operationalize ML use cases. They standardize CI/CD pipelines, user and service roles, container creation, model consumption, testing, and deployment methodology based on business and security requirements.
5. Risk and compliance team: Auditors assess data, code, and model artifacts to ensure compliance with regulations, such as data privacy.
Note that a single person can cover multiple personas depending on the scale of the business and the maturity of its MLOps practice. Each persona requires dedicated environments to perform their respective processes, as shown in the figure below.
The environments involved in MLOps are as follows:
1. Platform administration: The platform team has access to create AWS accounts and link users and data.
2. Data: The data layer, known as the data lake or data mesh, is where data engineers, data owners, and business stakeholders prepare, interact with, and visualize data.
3. Experimentation: Data scientists use a sandbox or experimentation environment to test new libraries and ML techniques for proof of concept.
4. Model build, model test, model deployment: Data scientists and ML engineers collaborate in this environment to automate and move research to production.
5. ML governance: This environment stores, reviews, and audits all model and code artifacts.
The following diagram illustrates the reference architecture discussed in the MLOps foundation roadmap for enterprises with Amazon SageMaker. Each business unit has its own development, preproduction, and production accounts for operationalizing ML use cases. These accounts retrieve data from a centralized or decentralized data lake or data mesh. All models and code automation are stored in a centralized tooling account using a model registry. The infrastructure code for these accounts is versioned in a shared service account, allowing the platform team to abstract, templatize, maintain, and reuse it for onboarding new teams to the MLOps platform.
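The centralized model registry in the tooling account can be driven programmatically from the CI/CD pipelines. As a minimal sketch, the helper below assembles the request dictionary a deployment pipeline might pass to SageMaker's `create_model_package` API (via `boto3`); the group name, container image URI, and S3 artifact path are hypothetical placeholders, not values from this architecture:

```python
# Minimal sketch: assembling a SageMaker Model Registry request.
# The group name, image URI, and S3 path below are hypothetical placeholders.

def build_model_package_request(group_name, image_uri, model_data_url,
                                approval_status="PendingManualApproval"):
    """Assemble the arguments for sagemaker.create_model_package."""
    return {
        "ModelPackageGroupName": group_name,
        # Approval status gates promotion from dev to preprod/prod accounts.
        "ModelApprovalStatus": approval_status,
        "InferenceSpecification": {
            "Containers": [
                {"Image": image_uri, "ModelDataUrl": model_data_url}
            ],
            "SupportedContentTypes": ["application/json"],
            "SupportedResponseMIMETypes": ["application/json"],
        },
    }

request = build_model_package_request(
    group_name="churn-model-group",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/inference:latest",
    model_data_url="s3://example-bucket/models/model.tar.gz",
)
# A pipeline in the tooling account would then call something like:
#   boto3.client("sagemaker").create_model_package(**request)
```

Keeping the request assembly in a plain function like this lets the platform team unit test and templatize it independently of AWS credentials.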
Generative AI, on the other hand, requires additional capabilities or extensions to the existing MLOps domain. One of these new concepts is the foundation model (FM). FMs are trained on massive amounts of data and have a vast number of parameters, enabling them to predict the next best answer across many generative AI use cases. FMs underpin a wide range of applications, including text-to-text, text-to-image, and text-to-audio or text-to-video generation.
To operationalize generative AI use cases, we need to include the following in the MLOps domain:
1. FM operations (FMOps): This encompasses the productionization of generative AI solutions, including any type of use case.
2. LLM operations (LLMOps): This is a subset of FMOps that focuses on productionizing LLM-based solutions, such as text-to-text applications.
The figure below illustrates the overlap of these use cases. Compared to classic ML and MLOps, FMOps and LLMOps differ in four main categories, which we will cover in the following sections: people and process, selection and adaptation of FM, evaluation and monitoring of FM, and data privacy and model deployment. Monitoring will be covered in a separate post.
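Evaluation is one area where FMOps diverges from classic MLOps: generated text rarely matches a reference answer exactly, so overlap-based scores are a common starting point before more sophisticated methods. The following is a simplified illustration (not any specific library's implementation) of a token-overlap F1 score between a generated answer and a reference answer:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated text and a reference text."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Count how many tokens the two texts share (with multiplicity).
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

score = token_f1("the model was deployed to production",
                 "the model is deployed in production")
```

In practice, fine-tuners typically combine such automatic scores with human review, because surface overlap does not capture factual correctness or tone.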
To simplify the description of the processes, we can categorize the main generative AI user types, as shown in the figure below:
1. Providers: These users build FMs from scratch and offer them as a product to other users (fine-tuners and consumers). They possess expertise in end-to-end ML, natural language processing (NLP), and data science, as well as large teams for data labeling and editing.
2. Fine-tuners: These users retrain (fine-tune) FMs from providers to meet specific requirements. They orchestrate the deployment of the model as a service for consumers. Fine-tuners require strong expertise in end-to-end ML, data science, and knowledge of model deployment and inference. They also need domain knowledge for tuning, including prompt engineering.
3. Consumers: These users interact with generative AI services from providers or fine-tuners through text prompting or a visual interface to perform desired actions. Consumers do not require ML expertise, but they should have an understanding of the service capabilities. Prompt engineering is necessary for better results.
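For consumers, prompt engineering typically amounts to assembling instructions, context, and a few worked examples into a single text prompt. A minimal sketch follows; the template wording is illustrative and not tied to any specific model or service:

```python
def build_prompt(instruction: str,
                 examples: list[tuple[str, str]],
                 query: str) -> str:
    """Assemble a few-shot text prompt from an instruction,
    (question, answer) examples, and the user's query."""
    parts = [instruction, ""]
    for question, answer in examples:
        parts.append(f"Q: {question}")
        parts.append(f"A: {answer}")
        parts.append("")
    # The trailing "A:" invites the model to complete the answer.
    parts.append(f"Q: {query}")
    parts.append("A:")
    return "\n".join(parts)

prompt = build_prompt(
    instruction="Answer each question in one short sentence.",
    examples=[("What is MLOps?",
               "MLOps combines people, processes, and technology to operationalize ML.")],
    query="What is FMOps?",
)
```

The resulting string would then be sent to a provider's or fine-tuner's text-to-text endpoint; iterating on the instruction and examples is the consumer's main lever for improving results.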
MLOps is primarily required for providers and fine-tuners, while consumers can use DevOps and AppDev principles to create generative AI applications. We have also observed movement between user types: providers may become fine-tuners to support specific verticals, and consumers may become fine-tuners to achieve more accurate results.
In conclusion, integrating generative AI applications into business operations requires the application of MLOps principles, with additional considerations for FMOps and LLMOps. By following the operationalization journey for different generative AI user types, businesses can effectively leverage the power of generative AI to transform their operations.