Machine Learning Operations (ML Ops)

Machine Learning Operations (or ML Ops) is a set of practices that standardize and streamline the deployment of machine learning (ML) models into a production or live environment. As mentioned before, it combines the machine learning knowledge needed to create the models, the DevOps practices that control the software lifecycle, and the data engineering needed to handle constantly changing data. It therefore draws on all of these disciplines, as shown below:

  • Continuous deployment (CD) no longer refers to a single software package or service, but to a system (an ML training pipeline) that should automatically deploy another service (the model prediction service).
  • Continuous training (CT) is unique to ML systems and refers to automatically retraining and serving models.
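
To make continuous training more concrete, here is a minimal sketch of a retraining trigger. It is not taken from any particular MLOps tool: the `RetrainPolicy` class, the `should_retrain` function, and both thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RetrainPolicy:
    """Hypothetical thresholds; a real system would tune these."""
    min_new_rows: int = 1000   # retrain once this much new data has arrived
    max_error: float = 0.15    # or once the live error exceeds this value


def should_retrain(new_rows: int, live_error: float, policy: RetrainPolicy) -> bool:
    """Return True when either the data trigger or the error trigger fires."""
    return new_rows >= policy.min_new_rows or live_error > policy.max_error


policy = RetrainPolicy()
print(should_retrain(new_rows=250, live_error=0.08, policy=policy))  # False
print(should_retrain(new_rows=250, live_error=0.21, policy=policy))  # True
```

In practice a check like this would run on a schedule or on data arrival, and a positive result would start the training pipeline rather than just print a value.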

ML pipelines

Bearing the previous information in mind, the following diagram shows one of the most established formalizations of the sequence of steps, or pipeline, needed to deploy an ML model. It was proposed by ThoughtWorks in [3], who call it Continuous Delivery for Machine Learning (CD4ML):

  1. Data scientists build the model by experimenting with different approaches to find the best candidate. This is done by writing code that trains a given model with the data prepared in the previous step. The model is selected using error analysis, error measurement, and model performance metrics.
  2. Developers or DevOps engineers deploy the model into a production environment. This is done by packaging the model and sending it to the desired environment, in the cloud or on edge devices. The model can be embedded in an application, served as an API (known as model serving), deployed as a Docker container, or saved in a standardized format (e.g., TensorFlow SavedModel, PMML, PFA, or ONNX), among other strategies [1].
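
The two steps above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the toy data, the `MeanModel` and `LinearModel` classes, and the JSON artifact layout are all made up here, and JSON stands in for a real standardized format such as ONNX or PMML.

```python
import json
from statistics import mean

# Toy data following y = 2x + 1 (invented for this example).
train = [(0, 1), (1, 3), (2, 5), (3, 7)]
validation = [(4, 9), (5, 11)]


class MeanModel:
    """Baseline candidate: always predicts the mean training target."""
    def fit(self, rows):
        self.value = mean(y for _, y in rows)
        return self

    def predict(self, x):
        return self.value


class LinearModel:
    """Candidate: least-squares line through the training points."""
    def fit(self, rows):
        x_bar = mean(x for x, _ in rows)
        y_bar = mean(y for _, y in rows)
        self.slope = (sum((x - x_bar) * (y - y_bar) for x, y in rows)
                      / sum((x - x_bar) ** 2 for x, _ in rows))
        self.intercept = y_bar - self.slope * x_bar
        return self

    def predict(self, x):
        return self.slope * x + self.intercept


def mse(model, rows):
    """Mean squared error of a model on held-out rows."""
    return mean((model.predict(x) - y) ** 2 for x, y in rows)


MODEL_TYPES = {"mean": MeanModel, "linear": LinearModel}

# Step 1: train every candidate and keep the one with the lowest validation error.
candidates = {name: cls().fit(train) for name, cls in MODEL_TYPES.items()}
scores = {name: mse(model, validation) for name, model in candidates.items()}
best_name = min(scores, key=scores.get)

# Step 2: package the winner as a portable artifact (model type + parameters).
artifact = json.dumps({"model": best_name, "params": vars(candidates[best_name])})

# On the serving side: rebuild the model from the artifact and predict.
loaded = json.loads(artifact)
restored = MODEL_TYPES[loaded["model"]]()
vars(restored).update(loaded["params"])
print(best_name, restored.predict(10))
```

Storing only the model type and its parameters keeps the artifact independent of the training code's in-memory objects, which is the same idea behind the standardized formats listed above.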

MLOps levels

As seen previously, the ML code is only the first of many steps needed for a model to contribute effectively in live, real-world systems:

  • Level 1: continuous training of the model by automating the ML pipeline. This is a good fit for models that need to be retrained because of new data, but it is not sufficient for rapidly testing other ML ideas or new pipeline components.
  • Level 2: a robust, automated CI/CD system. This is needed when you want to give data scientists a rapid way to explore feature engineering, model architecture, and hyperparameters.
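
As a rough picture of what "automating the ML pipeline" at Level 1 can mean, the sketch below chains a few steps and only deploys when an evaluation gate passes. Every name here (`validate_data`, `train`, `evaluate`, `deploy_if_good`, `run_pipeline`) is hypothetical, and the "model" is just the mean of the data, standing in for real training code.

```python
def validate_data(ctx: dict) -> dict:
    """Fail fast when there is nothing to train on."""
    if not ctx["rows"]:
        raise ValueError("no training data")
    return ctx


def train(ctx: dict) -> dict:
    """Stand-in training step: the 'model' is the mean of the targets."""
    ctx["model"] = sum(ctx["rows"]) / len(ctx["rows"])
    return ctx


def evaluate(ctx: dict) -> dict:
    """Record the worst absolute error of the model on the data."""
    ctx["error"] = max(abs(y - ctx["model"]) for y in ctx["rows"])
    return ctx


def deploy_if_good(ctx: dict) -> dict:
    """Evaluation gate: mark the model as deployed only if the error is acceptable."""
    ctx["deployed"] = ctx["error"] <= ctx["max_error"]
    return ctx


PIPELINE = [validate_data, train, evaluate, deploy_if_good]


def run_pipeline(rows: list, max_error: float) -> dict:
    """Run every step in order, passing a shared context along."""
    ctx = {"rows": rows, "max_error": max_error}
    for step in PIPELINE:
        ctx = step(ctx)
    return ctx


print(run_pipeline([1.0, 1.2, 0.8], max_error=0.5)["deployed"])  # True
print(run_pipeline([0.0, 10.0], max_error=0.5)["deployed"])      # False
```

Because the steps are plain functions in a list, swapping or adding a component is a small code change, which is where the automated CI/CD system of Level 2 comes in: it would test and release such changes to the pipeline itself.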

MLOps benefits and costs

According to Forbes, the ML Ops industry will be worth around 4 billion by 2025 [5], which makes sense given the number of benefits it brings:

  • ML model scalability and management
  • ML model health and governance
  • Helps handle the unpredictability and quality of data
  • Facilitates collaboration using CI/CD
  • Testing: data and model validation
  • Monitoring: ML Ops systems require continuous monitoring and auditing for accuracy. This calls for different types of monitoring: memory usage monitoring during predictions, model performance monitoring, and infrastructure monitoring.
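
As one small example of the monitoring item, the sketch below wraps a prediction call and records its latency and peak memory using only the Python standard library. The `predict` function is a placeholder model and `monitored_predict` is a hypothetical helper, not part of any monitoring product.

```python
import time
import tracemalloc


def predict(x: float) -> float:
    """Placeholder model; a real service would call the deployed model here."""
    return 2 * x + 1


def monitored_predict(x: float):
    """Run one prediction and return it with latency and peak-memory stats."""
    # Note: this starts and stops tracing globally, so it assumes nothing
    # else in the process is using tracemalloc at the same time.
    tracemalloc.start()
    started = time.perf_counter()
    try:
        y = predict(x)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    latency_s = time.perf_counter() - started
    return y, {"latency_s": latency_s, "peak_bytes": peak_bytes}


y, stats = monitored_predict(3)
print(y, stats)
```

In a live system these numbers would typically be sent to a metrics store and compared against alert thresholds, alongside model performance and infrastructure metrics.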



The Data Analysis Bureau

We are a Data Science and Data Engineering Innovation Agency specialising in Machine Learning.