Lifecycle of ML Model Deployments to Production

What does it mean to deploy a machine learning model to production?

As technology leaders, we invest in data science and machine learning engineering to improve the performance of the organization.

Fundamentally, we are solving business problems systematically through data-driven technology solutions. This is especially true when the problem is recurring at scale and must be addressed continuously.

Common examples are the problems of maximizing conversion rates, minimizing customer churn, maximizing gross margins, and many more. As data scientists, we set out to help solve these problems through various forms of analytics.

For a given business KPI, we create an end-to-end Jupyter notebook that culminates with a trained machine learning model. We then create a PowerPoint presentation for our stakeholders with a snapshot of predictions, explainability reports, insights, and suggested next steps.

Did we actually solve the business problem?

No, because it’s not enough to produce a single prediction at one point in time. We must build an automated system that returns accurate predictions continuously: today, tomorrow, and every day thereafter.

Furthermore, even if the initial predictions and correlations are accurate, there is no guarantee they will hold next month (this is the known problem of drift, which we will explore further in future articles). Therefore, this ML system must also adapt as the business evolves over time.

Building and deploying the entire ML system to a production environment is the true definition of a successful model deployment.

This production deployment of a machine learning model happens in 4 major stages:

  1. Build and train an accurate ML model
  2. Build, test, and deploy an entire training pipeline infrastructure to a production environment through CI/CD
  3. Build, test, and deploy an entire inference pipeline infrastructure to a production environment through CI/CD
  4. Build, test, and deploy a custom software application that utilizes the inferences in specific ways within the larger corporate IT prod environment (this is the actual solution to the business problem)

Let’s look at a real example from my team.

We began by training a machine learning model to predict conversion rates. The mean average precision (MAP) objective metric met requirements by exceeding the human-performance baseline, SHAP values clearly explained the model’s predictions, and we decided to deploy it to production.

This initial success marked completion of stage 1.
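The article does not give the exact MAP numbers or gating logic, so as a rough sketch, here is what such a stage-1 promotion check might look like in plain Python. The baseline value and the sample data are illustrative, not from the project:

```python
def average_precision(labels, scores):
    """Average precision for one ranking: mean of the precision values at
    each true positive, with candidates ranked by descending score."""
    ranked = [label for _, label in sorted(zip(scores, labels), key=lambda p: -p[0])]
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Hypothetical gate: promote to stage 2 only if the model beats the human baseline.
HUMAN_BASELINE_AP = 0.60  # illustrative value, not reported in the article

labels = [1, 0, 1, 0, 0, 1]             # toy relevance labels
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2] # toy model scores
ap = average_precision(labels, scores)
print(f"AP={ap:.3f}, promote={ap > HUMAN_BASELINE_AP}")
```

In practice MAP would be averaged over many queries or cohorts, but the promotion decision has the same shape: a metric computed on held-out data compared against a fixed baseline.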

For the second stage, we built a serverless microservice training pipeline infrastructure in AWS and deployed it to a production environment through a cross-account CI/CD pipeline:

[Architecture diagram: serverless training pipeline with drift monitoring, deployed via cross-account CI/CD]

As you can see from the solution architecture diagram, the core training pipeline comes with a drift monitoring workflow. This Step Function is invoked once per day to monitor concept drift, data drift, latent drift, and trigger re-training if needed. We also display the rolling metrics in a separate QuickSight dashboard.
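The article does not name the drift statistic used by the daily Step Function, so as one hedged sketch, here is a drift check built on the Population Stability Index (PSI), a common choice for data drift; the threshold and function names are assumptions, not the team’s actual implementation:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (training) sample and a
    fresh production sample; higher values indicate more distribution drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
        # Small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def daily_drift_check(train_sample, prod_sample, threshold=0.2):
    """Hypothetical Step Function task: True means trigger re-training.
    A PSI above ~0.2 is conventionally read as significant drift."""
    return psi(train_sample, prod_sample) > threshold
```

A daily invocation would pull yesterday’s feature values, compare them to the training snapshot per feature, and fan out to the re-training workflow when any check fires.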

Note the use of CloudFormation for defining infrastructure-as-code. There is no manual “ClickOps” in the console for the production deployment. It’s all fully automated DevOps, and any changes will only make it to prod through git commits & CI/CD.

Once the entire training pipeline for the ML model landed in the production environment, we proceeded to build an inference pipeline for stage three:

[Architecture diagram: inference pipeline invoking the deployed model endpoint]

You can see how the inference pipeline utilizes the machine learning model, deployed as an endpoint, to generate inferences concurrently. A training pipeline and an inference pipeline work together: the former trains and monitors the model, and the latter utilizes it.

The inference pipeline will request predictions from the deployed model, whether real-time or batch transform, in response to events or schedules.
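The split between real-time requests and batch transforms can be sketched as a small routing decision. This is a minimal illustration, not the team’s actual code; the field names and the size cutoff are assumptions:

```python
def route_inference(request):
    """Hypothetical router: small, latency-sensitive event payloads go to the
    real-time endpoint; large or scheduled jobs go to batch transform."""
    records = request["records"]
    if request.get("trigger") == "schedule" or len(records) > 100:
        return {"mode": "batch", "count": len(records)}
    return {"mode": "real-time", "count": len(records)}

# A single user event stays real-time; a nightly scoring run goes to batch.
print(route_inference({"trigger": "event", "records": [{"user": 1}]}))
print(route_inference({"trigger": "schedule", "records": [{"user": i} for i in range(500)]}))
```

In an AWS setup like the one described, the real-time branch would call the SageMaker endpoint and the batch branch would launch a batch transform job, with the router itself living in a Lambda behind the event and schedule triggers.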

Once the entire inference pipeline was actively generating inferences in the production environment, we proceeded to build a custom software application for stage four:

[Architecture diagram: “job of the day” recommendation application built on the inference pipeline]

This is a special application of a job recommendations user experience that texts “the job of the day” to distinct user cohorts. The business value of this application is increasing user engagement and conversion rates for filling jobs.

This final solution also comes with immutable event logging to a Hive-style partitioned S3 bucket. A separate Step Function is invoked periodically to crawl the paths, create tables for Athena queries, and visualize the production metrics.
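The article doesn’t show the bucket layout, but a Hive-style partitioned key for immutable event logs typically looks like the sketch below; the prefix, partition scheme, and event-type names are illustrative assumptions:

```python
import uuid
from datetime import datetime, timezone

def event_log_key(event_type, now=None, prefix="event-logs"):
    """Build an immutable, Hive-style partitioned S3 key (year=/month=/day=)
    so a crawler can register partitions for Athena queries.
    Layout and field names are illustrative, not from the article."""
    now = now or datetime.now(timezone.utc)
    return (f"{prefix}/type={event_type}/year={now:%Y}/month={now:%m}/day={now:%d}/"
            f"{uuid.uuid4()}.json")

key = event_log_key("job_text_sent", now=datetime(2023, 5, 1, tzinfo=timezone.utc))
print(key)
```

Because each object gets a unique name and is never overwritten, the log is append-only, and the `key=value` path segments let Athena prune partitions so queries only scan the days they need.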

Going from stage 1 to stage 4 for this project took months. It was hard and deeply fulfilling to see through a true model deployment to production.

It’s an amazing feeling to see conversion rates improve, revenue increase, and other KPIs move in the desired direction. This is the whole point of investing in data science and machine learning engineering.

And this is only the beginning – even for this one ML model. There are multiple possible business applications of the same underlying model. Texting the job of the day is only one of them, and an entire roadmap has emerged as a result.

We recommend choosing ML projects where the trained models may have multiple inference applications. I always ask myself, “What are the 20% of ML models that will produce 80% of the business value?”

How do you and your teams build enterprise ML solutions? What is the project lifecycle from research to production?

Let us know in the comments 👇🏻
