What is MLOps?
Machine learning operations (MLOps) is the practice of creating new machine learning (ML) and deep learning (DL) models and running them through a repeatable, automated workflow that deploys them to production.
An MLOps pipeline provides a variety of services to data science teams, including model version control, continuous integration and continuous delivery (CI/CD), model service catalogs for models in production, infrastructure management, monitoring of live model performance, security, and governance.
This is part of an extensive series of guides about AI Technology.
In this article:
- Key Benefits of MLOps
- DevOps vs MLOps
- Implementing MLOps in Your Organization
- How to Succeed with MLOps: 12 Essential Best Practices
- Stay Ahead of the ML Curve with Run:ai
Key Benefits of MLOps
MLOps is the critical missing link that allows IT to support the highly specialized infrastructure requirements of ML workloads. The cyclical, highly automated MLOps approach:
- Reduces the time and complexity of moving models into production.
- Enhances communications and collaboration across teams that are often siloed: data science, development, operations.
- Streamlines the interface between R&D processes and infrastructure in general, and operationalizes the use of specialized hardware accelerators (such as GPUs) in particular.
- Operationalizes model issues critical to long-term application health, such as versioning, tracking, and monitoring.
- Makes it easier to monitor and understand ML infrastructure and compute costs at all stages, from development to production.
- Standardizes the ML process and makes it more auditable for regulation and governance purposes.
DevOps vs MLOps
MLOps was inspired by DevOps, and the two approaches are inherently similar. However, there are a few ways in which MLOps differs significantly from DevOps:
- MLOps is experimental in nature - most of the activity of data science teams relates to experimentation. Teams constantly change features of their models to achieve better performance, while also managing an evolving codebase.
- Hybrid teams - data science teams include both developers (machine learning engineers) and data scientists or researchers who analyze data and develop models and algorithms. The latter might not be experienced at software development.
- Continuous testing (CT) - in addition to the regular testing stages of a DevOps pipeline, such as unit tests, functional tests and integration tests, an MLOps pipeline must also continually test the model itself - training it and validating its performance against a known dataset (see the sketch after this list).
- Automatic retraining - in most cases, a pre-trained model cannot be used as-is in production. The model needs to be retrained and deployed on an ongoing basis. This requires automating the process data scientists go through to train and validate their models.
- Performance degradation - unlike regular software systems, even if a model is working perfectly, performance can degrade over time. This can happen due to unexpected characteristics of data consumed by the model, differences between training and inference pipelines, and unknown biases which can grow with each feedback loop.
- Data monitoring - it is not sufficient only to monitor a model as a software system. MLOps teams also need to monitor the data and predictions, to see when the model needs to be refreshed or rolled back.
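To make the continuous-testing idea concrete, here is a minimal sketch of a CT gate, assuming scikit-learn and one of its bundled example datasets; the accuracy metric and the 0.85 threshold are illustrative choices, not part of any specific MLOps product.

```python
# Minimal continuous-testing (CT) gate: retrain the model and validate it
# against a known dataset before the pipeline is allowed to promote it.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

ACCURACY_THRESHOLD = 0.85  # assumed quality bar, chosen for illustration


def continuous_test() -> bool:
    # stand-in for the "known dataset"; a real pipeline would pull a curated validation set
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    score = accuracy_score(y_val, model.predict(X_val))
    print(f"validation accuracy: {score:.3f}")
    return score >= ACCURACY_THRESHOLD


if __name__ == "__main__":
    if not continuous_test():
        raise SystemExit("model failed the CT gate; blocking promotion to production")
```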
Implementing MLOps in Your Organization
This discussion is based on a framework by Google Cloud. Below is Google’s process for implementing MLOps in your organization and moving from “MLOps Level 0”, in which machine learning is completely manual, to “MLOps Level 2”, in which you have a fully automated MLOps pipeline.
MLOps Level 0: Manual Process
At this level of maturity, a team is able to build useful ML/DL models, but has a completely manual process for deploying them to production. The ML pipeline looks like this:
- All steps in the pipeline are manual or based on experimental code executed in Jupyter Notebooks - including data analysis, preparation, training, and validation.
- Data scientists work separately from engineers who deploy the final prediction service. The data science team provides a trained model to ML engineers, who are responsible for making it available as an API at low latency. The differences between experimental environments and production environments can lead to training-serving skew.
- Models are not frequently released. The assumption is that the data science team has finished working on the model and it can now be deployed to production.
- There is no CI/CD because the model is not planned to change on a regular basis. So at this level of maturity, there is no consideration of automated building of model code (CI) or automated deployment of a prediction service to production (CD).
- There is no regular monitoring of model performance - under the assumption that the model will deliver consistent performance with new data.
MLOps Level 1: ML Pipeline Automation
At this level of maturity, there is an understanding that the model needs to be managed in a CI/CD pipeline, and training/validation needs to be performed continuously on new data. The ML Pipeline now evolves to look like this:
- Experiments can happen much faster, due to orchestration of the entire ML process. Data scientists can formulate a hypothesis, test it, and rapidly move the resulting model to production.
- The model is continuously tested and re-trained with fresh data, based on feedback from live model performance.
- The same setup is used in the experimental environment as in the production environment, to eliminate training-serving skew.
- All components used to build and train the model are reusable and shareable across multiple pipelines.
MLOps Level 2: Full CI/CD Pipeline Automation
At this highest level of MLOps maturity, new experiments are seamlessly deployed to production with minimal involvement of engineers. A data scientist can easily create a new ML pipeline and automatically build, test, and deploy it to a target environment.
A fully automated CI/CD pipeline works like this:
- Teams come up with new models and experiments, and generate source code that describes their efforts.
- Source code is automatically built by the CI engine, which runs automated tests. It generates artifacts that can be deployed at later stages.
- The pipeline deploys the artifacts to the target environment, which now has a fully functional new version of the model.
- The pipeline executes automatically based on a trigger, and the result is pushed to a model registry.
- The trained model is deployed and enables live predictions with low latency.
- The pipeline collects statistics on live model performance. Data scientists can evaluate this data and, based on model performance, start a new experiment cycle (back to step 1).
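As a rough illustration of the flow above, here is a framework-agnostic Python sketch; every function is a hypothetical stub standing in for a real CI engine, training system, model registry, and serving stack, not an actual API.

```python
# Hypothetical skeleton of a level-2 pipeline; each stub maps to real tooling in practice.

def checkout_source(trigger: str) -> str:
    print(f"checking out experiment source for trigger: {trigger}")
    return "src-rev-abc123"          # placeholder revision id


def build_and_test(source: str) -> str:
    print(f"CI: building {source} and running automated tests")
    return "pipeline-artifact"       # placeholder deployable artifact


def train_and_register(artifact: str) -> str:
    print(f"training model from {artifact} and pushing it to the model registry")
    return "model:v42"               # placeholder registry entry


def deploy_and_monitor(model_version: str) -> None:
    print(f"deploying {model_version} for low-latency serving")
    print(f"collecting live performance statistics for {model_version}")


def run_pipeline(trigger: str) -> None:
    source = checkout_source(trigger)              # 1. new experiment code
    artifact = build_and_test(source)              # 2. CI build and tests
    model_version = train_and_register(artifact)   # 3-4. automated training + registry
    deploy_and_monitor(model_version)              # 5-6. serving and live statistics


run_pipeline(trigger="new-training-data-arrived")
```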
How to Succeed with MLOps: 12 Essential Best Practices
1. Automate Model Deployment
Automating model deployment is essential for MLOps as it streamlines the process of integrating trained machine learning models into production environments. This is crucial for the following reasons:
- Consistency: Automated deployment processes help ensure that models are consistently deployed following predefined standards and best practices, reducing the risk of errors and inconsistencies that may arise from manual deployment.
- Faster time-to-market: Automation shortens the time it takes to deploy a model from development to production, enabling businesses to benefit from the insights generated by the model more quickly.
- Seamless updates: Automating model deployment allows for more frequent and seamless updates, ensuring that production models are always using the latest trained versions. This is particularly important when dealing with dynamic data or rapidly evolving business needs.
2. Keep the First Model Simple and Build the Right Infrastructure
Starting with a simple model and focusing on the infrastructure is crucial for success in MLOps. This approach has several advantages:
- Faster iteration: A simple model allows teams to iterate quickly, which is crucial in the early stages of a project. This helps identify potential issues and opportunities for improvement, allowing the team to refine the model and infrastructure more effectively.
- Easier debugging: Simple models are easier to understand and debug. This is important because it's often difficult to tell whether an issue is due to the model or the infrastructure. By starting simple, teams can more easily identify and fix problems in the infrastructure.
- Scalability: Building a robust and scalable infrastructure from the beginning ensures that the system can handle more complex models and larger datasets as the project grows. This approach saves time and resources, as the team won't need to overhaul the infrastructure later.
- Integration: A well-designed infrastructure makes it easier to integrate new models or components, allowing for smoother collaboration between data scientists and engineers. This, in turn, accelerates the development and deployment of new features and improvements.
To build a robust infrastructure, consider the following components:
- Data ingestion: Set up a reliable and efficient pipeline to collect, preprocess, and store data for training and evaluation.
- Model training: Develop a system that allows for easy and efficient training of models, along with the ability to track experiments and results.
- Model deployment: Create a reliable deployment pipeline that ensures models can be easily updated, rolled back, or replaced in production.
- Monitoring and logging: Implement monitoring and logging solutions to track model performance, resource usage, and any errors or issues that may arise.
- Security and compliance: Ensure that the infrastructure complies with relevant regulations and follows best practices for data security and privacy.
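One way to apply the “simple first model” advice is to start with a trivial baseline that exercises the ingestion, training, and evaluation path end to end, so infrastructure problems surface before model complexity does. The sketch below assumes scikit-learn and a synthetic dataset, both of which are illustrative stand-ins.

```python
# A deliberately simple baseline model used to exercise the pipeline end to end.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# stand-in for the data-ingestion step
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# simplest possible model: always predict the most frequent class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("baseline accuracy:", accuracy_score(y_val, baseline.predict(X_val)))
# any later, more complex model must beat this number to justify its complexity
```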
3. Enable Shadow Deployment
Shadow deployment is a technique used in MLOps where a new version of a machine learning model is deployed alongside the current production model without affecting the live system. The new model processes the same input data as the production model but does not influence the final output or decisions made by the system.
The role of shadow deployment in MLOps:
- Validation: Shadow deployments allow teams to evaluate the performance and behavior of the new model in a production-like environment, without disrupting the live system.
- Risk mitigation: By running the new model in parallel with the production model, teams can identify and fix potential issues before fully rolling out the updated model, minimizing the risk of unexpected problems.
- Performance comparison: Shadow deployments enable the comparison of the new model's performance against the current production model, ensuring the new version provides tangible improvements before being fully deployed.
Here are general steps of how this process can work:
- Infrastructure: Set up an infrastructure that supports running multiple models concurrently, processing the same input data without interfering with each other.
- Data routing: Route input data to both the production and shadow models, allowing them to process the data independently.
- Model outputs: Collect and store the outputs of both models separately, ensuring that only the production model's output is used for making decisions in the live system.
- Monitoring and evaluation: Monitor and compare the performance of both models, using predefined metrics to evaluate if the shadow model meets the desired criteria for full deployment.
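The routing logic at the heart of shadow deployment can be sketched in a few lines. Both “models” below are hypothetical stand-ins; the essential point is that the shadow prediction is logged for offline comparison but never returned to the caller.

```python
# Shadow deployment sketch: the production output drives decisions, the shadow output is only logged.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow")


def production_model(features: dict) -> float:
    return 0.72          # placeholder for the live model's prediction


def shadow_model(features: dict) -> float:
    return 0.68          # placeholder for the candidate model's prediction


def predict(features: dict) -> float:
    live_score = production_model(features)
    try:
        shadow_score = shadow_model(features)
        # store both outputs for later comparison; shadow failures never affect callers
        log.info("live=%s shadow=%s features=%s", live_score, shadow_score, features)
    except Exception:
        log.exception("shadow model failed; ignoring")
    return live_score    # only the production model's output is used by the live system


print(predict({"age": 42, "plan": "pro"}))
```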
4. Ensure Data Labeling is Strictly Controlled
Data labeling is crucial for supervised machine learning, as it provides the ground truth for training and evaluation. A controlled and consistent process for data labeling ensures high-quality labeled data and reduces the risk of introducing biases or errors into the model. Here are important practices to consider:
- Develop clear labeling guidelines: Establish comprehensive and unambiguous labeling instructions to minimize inconsistencies and errors among human annotators.
- Train and assess annotators: Provide training and support to annotators to help them understand the labeling guidelines and task requirements. Periodically assess their performance and provide feedback to ensure high-quality annotations.
- Use multiple annotators: Assign multiple annotators to label the same data, and use consensus or other techniques to aggregate their inputs (see the sketch after this list). This can help reduce individual biases and improve the overall quality of labeled data.
- Monitor and audit the labeling process: Regularly monitor the labeling process and conduct quality audits to identify and address issues. Use feedback loops to iteratively refine the guidelines and improve annotator performance.
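As a simple illustration of aggregating multiple annotators’ inputs, here is a majority-vote sketch; the item IDs and labels are illustrative, and real projects often use richer agreement measures than the raw share of agreeing annotators shown here.

```python
# Majority-vote aggregation of labels from several annotators.
from collections import Counter


def majority_label(labels: list[str]) -> tuple[str, float]:
    """Return the consensus label and the share of annotators who agree with it."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


annotations = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "dog", "dog"],
}
for item_id, labels in annotations.items():
    label, agreement = majority_label(labels)
    # low agreement is a signal to revisit the guidelines or re-label the item
    print(item_id, label, f"agreement={agreement:.2f}")
```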
5. Use Sanity Checks for External Data Sources
Data quality plays a crucial role in the performance of ML models. Ensuring data sanity checks for all external data sources helps prevent issues related to data quality, inconsistencies, and errors.
- Data validation: Implement data validation checks to ensure that incoming data adheres to predefined data formats, types, and constraints.
- Detect anomalies: Develop strategies to identify and handle missing values, outliers, and duplicate records in the data. This will help maintain data quality and improve model performance.
- Monitor data drift: Regularly monitor data sources for changes in data distribution, which can impact model performance. Establish automated processes to detect data drift and trigger alerts for necessary action.
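A minimal pandas sketch of such sanity checks is shown below, covering schema, missing values, duplicates, and a crude drift alarm against a trusted reference sample. The column names, thresholds, and the use of a two-sample KS test for drift are illustrative assumptions.

```python
# Sanity checks for an incoming batch from an external data source.
import pandas as pd
from scipy.stats import ks_2samp

EXPECTED_COLUMNS = {"user_id": "int64", "amount": "float64"}  # assumed schema


def sanity_check(batch: pd.DataFrame, reference: pd.DataFrame) -> list[str]:
    problems = []
    # schema validation: columns and dtypes
    for col, dtype in EXPECTED_COLUMNS.items():
        if col not in batch.columns:
            problems.append(f"missing column: {col}")
        elif str(batch[col].dtype) != dtype:
            problems.append(f"{col} has dtype {batch[col].dtype}, expected {dtype}")
    # anomalies: missing values and duplicate records
    if batch.isna().any().any():
        problems.append("batch contains missing values")
    if batch.duplicated().any():
        problems.append("batch contains duplicate rows")
    # crude drift alarm: compare the 'amount' distribution against a reference sample
    if "amount" in batch.columns and ks_2samp(batch["amount"], reference["amount"]).pvalue < 0.01:
        problems.append("possible drift in 'amount' distribution")
    return problems


batch = pd.DataFrame({"user_id": [1, 2, 2], "amount": [10.0, 12.5, 12.5]})
reference = pd.DataFrame({"amount": [9.0, 11.0, 10.5, 12.0]})
print(sanity_check(batch, reference))
```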
6. Write Reusable Scripts for Data Cleaning and Merging
Data preparation is a time-consuming process, often involving data cleaning, transformation, and merging from multiple sources. Writing reusable scripts for these tasks can improve efficiency and maintain consistency across projects. Here are recommended practices:
- Modularize code: Break down data preparation tasks into smaller, independent functions that can be easily reused and combined. This enables faster development, simplifies debugging, and improves code readability.
- Standardize data operations: Create standardized functions and libraries for common data operations such as data cleansing, imputation, and feature engineering. This promotes reusability, reduces duplication, and ensures consistent data handling across projects.
- Automate data preparation: Develop automated pipelines for data preparation tasks to minimize manual intervention and reduce the potential for errors. This can improve efficiency and make it easier to maintain and update data processes.
- Version control for scripts: Use version control systems to manage changes in data preparation scripts, ensuring that the latest and most accurate version is always used. This can help prevent inconsistencies and errors caused by using outdated or incorrect scripts.
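To illustrate the modular approach, here is a small sketch of reusable preparation functions; the column names and the median-imputation strategy are illustrative assumptions, not a prescription.

```python
# Small, reusable data-preparation functions that can be unit-tested and composed into pipelines.
import pandas as pd


def drop_exact_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    return df.drop_duplicates().reset_index(drop=True)


def impute_median(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    df = df.copy()
    for col in columns:
        df[col] = df[col].fillna(df[col].median())
    return df


def merge_sources(users: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    return users.merge(orders, on="user_id", how="left")


users = pd.DataFrame({"user_id": [1, 2], "age": [34, None]})
orders = pd.DataFrame({"user_id": [1, 1, 2], "amount": [10.0, 10.0, 7.5]})
clean = impute_median(drop_exact_duplicates(users), columns=["age"])
print(merge_sources(clean, orders))
```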
7. Enable Parallel Training Experiments
Parallel training experiments allow running multiple machine learning model training jobs simultaneously. This approach is used to speed up the process of model development and optimization by exploring different model architectures, hyperparameters, or data preprocessing techniques concurrently.
Parallel training experiments can help organizations iterate more quickly, identify better model configurations, and make the most efficient use of their available computing resources. The benefits of parallel training experiments include:
- Accelerated model development: Running multiple experiments at the same time allows data scientists and engineers to quickly test different configurations and identify the most promising ones, reducing the overall time required to develop a high-performing model.
- Efficient resource utilization: Parallel training enables organizations to make better use of their available computing resources, such as GPUs or CPU cores, as they can distribute the training workload across these resources.
- Improved model performance: By exploring a broader range of model configurations in parallel, teams can increase the likelihood of finding a high-performing model that meets their specific needs and requirements.
- Experiment management: With multiple experiments running concurrently, it becomes essential to use tools and processes that can effectively track, compare, and analyze the results of different runs, facilitating data-driven decision-making and model optimization.
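A minimal way to run experiments in parallel is to fan configurations out over a process pool; the sketch below assumes scikit-learn, a bundled example dataset, and a toy hyperparameter grid, all chosen for illustration.

```python
# Run several training experiments concurrently and pick the best configuration.
from concurrent.futures import ProcessPoolExecutor

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def run_experiment(n_estimators: int) -> tuple[int, float]:
    X, y = load_breast_cancer(return_X_y=True)
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    score = cross_val_score(model, X, y, cv=3).mean()
    return n_estimators, score


if __name__ == "__main__":
    configs = [10, 50, 100, 200]                     # one experiment per configuration
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_experiment, configs))
    best = max(results, key=lambda r: r[1])          # keep the best-scoring run
    print("all runs:", results)
    print("best configuration:", best)
```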
8. Evaluate Training Using Simple, Understandable Metrics
A well-defined, easily measurable, and understandable metric is critical for tracking the performance of a machine learning model. This allows teams to evaluate the model's progress and make informed decisions on how to improve it. Here are some recommendations for choosing and using an effective metric:
- Alignment with business objectives: The metric should be closely related to the ultimate goal of the project. This alignment ensures that improvements in the metric translate to tangible benefits for the organization.
- Interpretability: Choose a metric that is easy to understand and communicate. This helps both technical and non-technical stakeholders to make sense of the model's performance and its impact on the business.
- Trade-offs: In some cases, multiple metrics may be needed to capture different aspects of the model's performance. Consider the trade-offs between these metrics when optimizing the model.
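As a small illustration, the sketch below tracks three widely understood metrics for a binary classifier; the labels are made up, and which metric matters most depends on the business objective (for example, recall is often the priority in fraud detection).

```python
# A handful of simple, interpretable metrics for a binary classification model.
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("precision:", precision_score(y_true, y_pred))  # share of flagged cases that were real
print("recall:   ", recall_score(y_true, y_pred))     # share of real cases that were caught
print("f1:       ", f1_score(y_true, y_pred))         # single figure balancing both
```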
9. Automate Hyper-Parameter Optimization
Hyperparameter optimization (HPO) is the process of finding the best set of hyperparameters for a given machine learning model. Hyperparameters are external configuration values that cannot be learned by the model during training but have a significant impact on its performance. Examples of hyperparameters include learning rate, batch size, and regularization strength for a neural network, or the depth and number of trees in a random forest.
There are several approaches to HPO, including:
- Grid search: A predefined set of hyperparameter values is exhaustively tested, and the best configuration is chosen based on a chosen evaluation metric.
- Random search: Sampling hyperparameter values randomly from a predefined search space. This is generally more efficient than grid search (see the sketch after this list).
- Bayesian optimization: Using a probabilistic model to guide the search for optimal hyperparameters, keeping track of past evaluations and using this information to explore the search space more efficiently.
- Genetic algorithms: Searching the hyperparameter space by maintaining a population of candidate solutions that evolve over time.
- Gradient-based optimization: For some models, hyperparameters can be optimized using gradient-based methods, which leverage the gradient of the loss function with respect to the hyperparameters.
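As an example of the random-search approach listed above, here is a minimal scikit-learn sketch; the model, search space, and evaluation budget are illustrative assumptions.

```python
# Automated hyperparameter optimization via random search.
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(50, 300),   # number of trees
        "max_depth": randint(2, 16),        # maximum tree depth
    },
    n_iter=20,          # evaluation budget: 20 randomly sampled configurations
    cv=3,
    random_state=0,
)
search.fit(X, y)
print("best params:", search.best_params_)
print("best CV score:", search.best_score_)
```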
Incorporating HPO into an MLOps pipeline can have significant benefits, including:
- Improved model performance: Optimizing hyperparameters can lead to substantial improvements in model performance, allowing organizations to achieve higher predictive accuracy or better generalization to unseen data.
- Increased efficiency: Automated HPO can reduce the time and effort required for manual hyperparameter tuning, freeing up resources for other tasks and accelerating the overall development process.
- Consistency and reproducibility: By automating the HPO process and integrating it into a well-defined MLOps pipeline, organizations can ensure consistent and reproducible results, making it easier to track and compare the performance of different models and configurations.
- Continuous improvement: As new data is collected or as the underlying problem domain evolves, an automated HPO process can be easily incorporated into a continuous integration and deployment (CI/CD) pipeline, enabling the model to adapt and improve over time.
10. Continuously Monitor the Behavior of Deployed Models
Continuous monitoring of deployed models is a vital aspect of MLOps, as it ensures that machine learning models maintain their performance and reliability in production environments. The importance of continuous monitoring lies in:
- Detecting model drift: As data distributions change over time, the model's performance may degrade. Continuous monitoring allows for early detection of these changes, prompting retraining or updating the model to maintain its effectiveness.
- Identifying issues: Continuous monitoring helps detect anomalies, errors, or performance issues in real-time, allowing teams to quickly address and resolve problems.
- Maintaining trust: By consistently tracking model performance, organizations can ensure that stakeholders trust the model's results and decisions, which is crucial for widespread adoption.
- Compliance and auditing: Continuous monitoring provides a record of model performance and usage, helping organizations maintain compliance with regulations and facilitating audits.
Continuous monitoring in MLOps typically involves:
- Performance metrics: Collecting and analyzing key performance metrics (e.g., precision, recall, or F1 score) at regular intervals to evaluate the model's effectiveness.
- Data quality: Monitoring input data for anomalies, missing values, or distribution shifts that could impact the model's performance.
- Resource usage: Tracking the usage of system resources (e.g., CPU, memory, or storage) to ensure the infrastructure can support the deployed model without issues.
- Alerts and notifications: Setting up automated alerts and notifications to inform relevant stakeholders when predefined thresholds are crossed, signaling potential issues or the need for intervention.
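The core of such a monitoring loop can be sketched in a few lines: compute a live metric over a recent window of predictions and raise an alert when it crosses a threshold. The metric, the 0.80 threshold, and the alerting channel below are illustrative assumptions.

```python
# Periodic monitoring check over a recent window of predictions.
from sklearn.metrics import f1_score

F1_ALERT_THRESHOLD = 0.80   # assumed minimum acceptable live performance


def send_alert(message: str) -> None:
    # placeholder for a real notification channel (email, chat, paging, ...)
    print(f"ALERT: {message}")


def monitoring_check(y_true: list[int], y_pred: list[int]) -> None:
    live_f1 = f1_score(y_true, y_pred)
    print(f"live F1 over the latest window: {live_f1:.3f}")
    if live_f1 < F1_ALERT_THRESHOLD:
        send_alert(f"model F1 dropped to {live_f1:.3f}; consider retraining or rollback")


# ground-truth labels collected after the fact for a recent window (illustrative values)
monitoring_check(y_true=[1, 0, 1, 1, 0, 1], y_pred=[1, 0, 0, 1, 0, 0])
```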
11. Enforce Fairness and Privacy
Ensuring fairness and privacy in ML models is critical to prevent unintended biases, discrimination, and violations of user privacy. Here are several ways to integrate fairness and privacy more effectively into the MLOps lifecycle:
- Assess fairness: Evaluate the model's fairness using appropriate metrics, and analyze its performance across different demographic groups to identify and mitigate potential biases (a simple check is sketched after this list).
- Use privacy-preserving techniques: Implement techniques such as differential privacy and federated learning to protect sensitive user data and maintain privacy during model training and inference.
- Regularly review policies: Stay updated on relevant laws, regulations, and ethical guidelines to ensure compliance with fairness and privacy requirements. Periodically review and update internal policies and practices accordingly.
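As one simple fairness check, the sketch below compares the model’s positive-prediction rate across demographic groups (a demographic-parity view); the group names, predictions, and the 0.1 gap threshold are illustrative assumptions, and real assessments typically combine several fairness metrics.

```python
# Compare positive-prediction rates across groups as a basic demographic-parity check.
import pandas as pd

results = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "prediction": [1, 1, 0, 1, 0, 0],      # the model's binary decisions
})

rates = results.groupby("group")["prediction"].mean()
gap = rates.max() - rates.min()
print(rates)
print(f"demographic parity gap: {gap:.2f}")
if gap > 0.1:
    print("gap exceeds the assumed 0.1 threshold; investigate potential bias")
```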
12. Improve Communication and Alignment Between Teams
Effective collaboration and communication between cross-functional teams, such as data scientists, engineers, and business stakeholders, are essential for successful MLOps. This ensures that everyone is on the same page and working towards a common goal.
- Establish clear objectives: Clearly define the problem statement, scope, and desired outcomes for ML projects. Regularly communicate these objectives and any changes to all team members.
- Maintain documentation: Document every step of the ML project, including data sources, data preparation, feature engineering, model selection, evaluation metrics, and deployment strategies. This enables knowledge sharing, improves team collaboration, and simplifies onboarding new team members.
- Hold regular meetings: Schedule regular meetings and discussions for team members to share updates, challenges, and insights. Encourage open communication and feedback to ensure everyone's contributions are acknowledged and incorporated.
- Use version control: Use version control systems like Git for code, data, and models to track changes, manage collaboration, and maintain a single source of truth.
Stay Ahead of the ML Curve with Run:ai
In today’s highly competitive economy, enterprises are looking to artificial intelligence in general, and machine learning and deep learning in particular, to transform big data into actionable insights. These insights can help them better address their target audiences, improve their decision-making processes, and streamline their supply chains and production processes, to mention just a few of the many use cases. In order to stay ahead of the curve and capture the full value of ML, however, companies must strategically embrace MLOps.
Run:ai’s AI/ML virtualization platform is an important enabler for Machine Learning Operations teams. Focusing on deep learning neural network models that are particularly compute-intensive, Run:ai creates a pool of shared GPU and other compute resources that are provisioned dynamically to meet the needs of jobs in process. By abstracting workloads from the underlying infrastructure, organizations can embrace MLOps and allow data scientists to focus on models, while letting IT teams gain control and real-time visibility of compute resources across multiple sites, both on-premises and in the cloud.
See for yourself how Run:ai can operationalize your data science projects, accelerating their journey from research to production.