ML-Ops: Operationalizing a Machine Learning Model, end to end

What does it mean to deploy a Machine Learning Model?

Abhishek Dabas
Aug 27, 2021

As the Machine Learning (ML) community continues to grow, we want to deploy and serve our models better. ML deployment faces all the general issues in the deployment lifecycle of any software application, plus an additional set of ML-specific issues. To build a machine learning solution for a specific use case, we perform data collection, feature engineering, model building, and model evaluation. Once we have a prototype of the ML model, we want to put it into production, which includes the serving infrastructure and monitoring. The ML pipeline needs effective automation of the whole process so that it keeps serving accurate predictions over time and can scale. Code bugs, system failures, human error, and missing logs are some of the faults that can appear along the way. For example, new movies from different genres are released every week, so a recommendation system needs periodic retraining to capture the new information and signals. The point is, we need to know how often we actually need retraining and whether something new is coming in that needs to be captured. Recognizing, prioritizing, and rewarding this effort is important for the health of a successful ML lifecycle. The goal of MLOps is to close the gap between the development and production of an ML application and help ML projects achieve better quality, reliability, and maintainability.

Learning is an iterative process. A human learns a new skill over a period of time, trying and failing multiple times until a certain level of precision is achieved. Machine learning is similarly iterative: you start with preprocessing your data and a simple baseline model, then improve performance iteration by iteration. Once we are past the testing phase and have a model that is ready to be put into production, we want to check the deployment constraints (finalizing tools, choosing a metric, setting up the codebase, etc.). It is important to understand that an ML model is helpful to an organization only when it continuously feeds insights to its users with a certain precision, so planning the deployment lifecycle carefully is essential. The complications do not end there: once deployed, we also want an "auditing framework" in which we plan for and check where our system can go wrong, including checks on bias and fairness, performance drift, and so on. Let's discuss these challenges and the solutions that can help across the machine learning deployment lifecycle.

Challenges with the deployment of an ML model:

  1. CACE: Changing Anything Changes Everything
    An ML pipeline consists of many small components (data validation, feature selection, modeling, validation). Each component performs its action independently but depends on the output of the previous step. Changing something in one component changes the behavior of the whole pipeline. For example, a change in hyperparameters, input data, or sampling methods changes the performance of the model, and such changes can produce very different predictions.
  2. Data Drift:
    Data is the heart of an ML model. Data defines how the ML pipeline should behave, and a change in the data distribution is often one of the hardest things for us to see. Because we live in a dynamic environment, the distribution of the data changes over time. For example, if a new feature appears in the input data over the course of time, we need to retrain our model to capture these signals (a minimal drift-detection sketch follows this list).
  3. Concept Drift:
    Concept drift is a change in the relationship between input and output data over time, usually because the data-generating process itself has changed. For example, many credit card fraud detection models ran into trouble after the pandemic started, because normal human spending behavior changed.
  4. Model Degradation:
    Models are a result of code and data, both of which evolve over time. When the data shifts, the model's performance degrades (which is normal). In practice, the model is either retrained fully, which takes a lot of time, or a new model is stacked on top of the previous one to correct its output. This process can lead to multiple layers of stacking, redundant effort, and increased cost.
  5. Data Centric VS Model Centric approach:
    There are two approaches to improving ML performance: the data-centric approach (hold the model fixed and improve the quality of the data) and the model-centric approach (hold the data fixed and iteratively improve the model). There is a tradeoff between the two: sometimes we need complex deep learning models to pick up complex signals in the data, but without good-quality data even the most complex model will not be very effective. It is important to figure out the right balance between these approaches.
  6. Intermediate Data Storage:
    When a big machine learning pipeline is created, each component depends on the output of the previous components, so data has to be shared and stored in an intermediate space. At this stage it becomes important to structure the pipeline in a way that transfers data efficiently across the components.
  7. Feedback loops:
    It is important to make sure that the training dataset is not influenced by the output of the currently deployed model, because in that case the model is learning from its own biases and will probably amplify them. To make sure the ML model keeps learning about the environment and stays accurate, we need to keep feeding it fresh, independently collected examples (see the exploration-traffic sketch after this list).
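
To make the data drift challenge more concrete, here is a minimal sketch of what a drift check might look like, assuming we keep a snapshot of the training-time feature distributions and compare it against a recent window of production data with a two-sample Kolmogorov–Smirnov test. The dataframe names, feature names, and the 0.05 threshold are illustrative assumptions, not a prescribed setup.

```python
# Minimal data-drift check: compare the training-time distribution of each
# numeric feature against a recent window of production data using a
# two-sample Kolmogorov-Smirnov test. Feature names, file paths, and the
# 0.05 threshold are hypothetical placeholders.
import pandas as pd
from scipy.stats import ks_2samp

def detect_drift(reference: pd.DataFrame, current: pd.DataFrame,
                 features: list[str], alpha: float = 0.05) -> dict[str, bool]:
    """Return {feature: True} for features whose distribution appears shifted."""
    drifted = {}
    for col in features:
        # Null hypothesis: both samples come from the same distribution.
        stat, p_value = ks_2samp(reference[col].dropna(), current[col].dropna())
        drifted[col] = p_value < alpha  # small p-value => likely drift
    return drifted

# Hypothetical usage with two data windows:
# reference = pd.read_parquet("training_snapshot.parquet")
# current = pd.read_parquet("last_7_days_of_requests.parquet")
# print(detect_drift(reference, current, ["age", "watch_time", "genre_count"]))
```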
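
For the feedback-loop challenge, one common mitigation is to serve a small, random slice of traffic independently of the model, so that some future training examples are not filtered through the model's own predictions. The sketch below assumes a recommender; the `model.recommend` interface, the logging format, and the 5% exploration rate are all hypothetical.

```python
# Sketch of breaking a feedback loop in a recommender: a small fraction of
# requests gets a random (model-independent) item, and the source of each
# recommendation is logged so the training pipeline can filter or reweight
# examples later. All names and the exploration rate are illustrative.
import random

EXPLORATION_RATE = 0.05  # fraction of traffic served without the model

def serve_recommendation(user_id: str, candidate_items: list[str], model) -> dict:
    if random.random() < EXPLORATION_RATE:
        item = random.choice(candidate_items)             # model-independent choice
        source = "exploration"                            # unbiased fresh example
    else:
        item = model.recommend(user_id, candidate_items)  # assumed model API
        source = "model"                                  # influenced by current model
    # Logging the source lets the next training set avoid self-reinforcement.
    return {"user_id": user_id, "item": item, "source": source}
```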

Best Practices:

  1. Modular Design:
    Building a machine learning pipeline from modular components makes it easier to test and to reproduce. These components can exist as libraries that build up the fragments of the ML pipeline, e.g. one component for loading data, one for feature selection and engineering, and others for training and validating models (see the pipeline sketch after this list). Modularity also supports transfer learning, since what one model version has learned can be carried over to the next.
  2. Continuous Experimentation, Learning, and Deployment:
    ML pipelines operate in a dynamic environment, which is why data and models continuously change over time. The ML pipeline should adapt to continuous changes and upgrades to maintain the accuracy of its predictions. As mentioned above, even a small change, such as the addition of a feature to the input dataset, can break the whole pipeline and require retraining. ML pipelines are therefore best automated: when the pipeline detects changes, a trigger schedules automatic retraining. Over time, ML pipelines need repeated experimentation (e.g. A/B testing) and the creation of models from scratch or retraining with new data to stay relevant in a dynamic environment.
  3. Monitoring Performance:
    In order to detect changes in the behavior of our ML pipeline, we need to monitor the performance of the model along with other properties such as bias and fairness. ML pipelines always carry the risk of degradation and drift. "Statistical process control" can be used to detect changes or deviations in performance (a minimal monitor is sketched after this list). Humans are very good at inferring from visuals and absorbing summarized information, so dashboards and visual summaries are an important part of performance monitoring.
  4. Version Control:
    Version control has been a pillar of the evolution of software engineering, and it is just as important and effective for tracking changes in the ML pipeline over time. It also makes switching between different versions of the model easy when required.
  5. Logging:
    Writing out intermediate data files helps to compare and track the performance of machine learning experiments quantitatively. This can include multiple metric log files, which make it easy to track improvements and to detect data or concept drift (the MLflow sketch after this list shows one way to log runs and version models).
  6. Human in Loop:
    AI models cannot be completely trusted without proper checks. A human in the loop during the training and testing stages of the pipeline gives better results with more confidence. The combination of humans and machine learning creates a more effective learning cycle, and with proper interaction between humans and machines, the performance of the pipeline can be improved.
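
Here is a minimal sketch of the modular-design idea: each pipeline stage is a small, independently testable function with an explicit input/output contract, mirroring the components named above. The use of scikit-learn, the CSV path, and the column names are assumptions for illustration only.

```python
# A minimal modular pipeline: each stage is a small, independently testable
# function with explicit inputs and outputs, so one component can be swapped
# or re-run without touching the rest. Library and column choices are
# illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def load_data(path: str) -> pd.DataFrame:
    return pd.read_csv(path)

def build_features(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    features = df.drop(columns=["label"])
    return features.fillna(0), df["label"]

def train_model(X: pd.DataFrame, y: pd.Series) -> RandomForestClassifier:
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    return model.fit(X, y)

def evaluate(model, X: pd.DataFrame, y: pd.Series) -> float:
    return accuracy_score(y, model.predict(X))

if __name__ == "__main__":
    df = load_data("training_data.csv")  # hypothetical dataset
    X, y = build_features(df)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    model = train_model(X_train, y_train)
    print(f"held-out accuracy: {evaluate(model, X_test, y_test):.3f}")
```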
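
The monitoring and continuous-retraining practices can be sketched together in the statistical-process-control style mentioned above: keep a rolling window of a live metric and raise a retraining trigger when the latest value falls outside control limits derived from the window. The window length and the 3-sigma rule are conventional, illustrative choices, and `schedule_retraining` is a placeholder for whatever trigger a team actually uses.

```python
# SPC-style monitor: track a live metric (e.g. daily accuracy against delayed
# ground truth) and flag values outside mean +/- 3 standard deviations of a
# rolling baseline window. Window size and 3-sigma limits are illustrative.
from collections import deque
from statistics import mean, stdev

class MetricMonitor:
    def __init__(self, window_size: int = 30, n_sigmas: float = 3.0):
        self.history = deque(maxlen=window_size)
        self.n_sigmas = n_sigmas

    def observe(self, value: float) -> bool:
        """Record a new metric value; return True if it is out of control."""
        if len(self.history) >= 2:
            mu, sigma = mean(self.history), stdev(self.history)
            out_of_control = sigma > 0 and abs(value - mu) > self.n_sigmas * sigma
        else:
            out_of_control = False  # not enough history yet
        self.history.append(value)
        return out_of_control

# Hypothetical usage inside a daily monitoring job:
# monitor = MetricMonitor()
# if monitor.observe(todays_accuracy):
#     schedule_retraining()  # placeholder for the team's retraining trigger
```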
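
Finally, a hedged sketch of the logging and version-control practices using MLflow (one of the tools listed below): parameters, metrics, and the trained model are recorded per run so experiments stay comparable and model versions can be promoted or rolled back. The experiment name, toy data, and metric values are placeholders, and registering a model assumes a tracking backend that supports the model registry.

```python
# Logging an experiment run and registering a model version with MLflow.
# Toy data stands in for the real training set; names are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

mlflow.set_experiment("movie-recommender")  # hypothetical experiment name

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
    mlflow.log_param("n_estimators", 100)  # hyperparameters of this run
    mlflow.log_metric("val_accuracy", accuracy_score(y_val, model.predict(X_val)))
    # Registering the model assigns it a version number that can later be
    # compared, promoted, or rolled back (requires a registry-capable backend).
    mlflow.sklearn.log_model(model, "model", registered_model_name="movie-recommender")
```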

Existing tools for MLOps:

  1. TensorFlow Extended (TFX): An end-to-end platform for deploying production ML pipelines with TensorFlow.
  2. Torchserve: A flexible and easy-to-use tool for serving PyTorch models.
  3. AWS SageMaker: Prepare, build, train, and deploy high-quality machine learning (ML) models quickly.
  4. MLflow: Open source platform for the machine learning lifecycle.
  5. Kubeflow: Making deployments of ML workflows on Kubernetes simple, portable and scalable.
  6. Cortex: Machine learning model serving infrastructure.
  7. Seldon.io: Take your ML projects from POC to production with maximum efficiency and minimal risk.
  8. BentoML: Open-source platform for high-performance ML model serving.

Resources:

  1. Hidden Technical Debt in Machine Learning Systems (nips.cc)
  2. https://docs.microsoft.com/en-us/azure/architecture/example-scenario/mlops/mlops-maturity-model
  3. https://arxiv.org/ftp/arxiv/papers/2010/2010.02013.pdf
  4. https://github.com/kelvins/awesome-mlops
