Featured

How To Deploy Lambda Functions As Docker Containers Through CI/CD

How do you deploy Lambda functions as Docker containers through CI/CD?


CloudFormation provides us with two options for Lambda deployments:

  1. Zip the code, copy it to S3, and pass the S3 path into the CF template
  2. Containerize the code, push it to Elastic Container Registry (ECR), and pass the ECR image URI into the CF template
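The container-based option (2) can be sketched as a CloudFormation resource. This is a minimal, illustrative fragment: the function name, the ECR repository name (`my-repo`), and the execution role (assumed defined elsewhere in the template) are all placeholders.

```yaml
# Minimal sketch of option 2: a Lambda function deployed from a container image.
# MyFunction, my-repo, and LambdaExecutionRole are hypothetical placeholders.
Resources:
  MyFunction:
    Type: AWS::Lambda::Function
    Properties:
      PackageType: Image            # container image instead of a Zip package
      Code:
        ImageUri: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/my-repo:latest"
      Role: !GetAtt LambdaExecutionRole.Arn
      MemorySize: 512
      Timeout: 30
```

In a CI/CD pipeline, the image URI would typically be passed in as a template parameter after the `docker push` step rather than hard-coded, so each build can pin its own image tag.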
Continue reading “How To Deploy Lambda Functions As Docker Containers Through CI/CD”
Featured

Infrastructure-as-Code for Machine Learning Pipelines in AWS

We all start our AWS journey in the console. We do everything there.

We manually create and configure Lambda functions, Step Functions, IAM roles, S3 buckets, EMR clusters, and any other service we need as we implement a machine learning solution.

Continue reading “Infrastructure-as-Code for Machine Learning Pipelines in AWS”
Featured

Drift Monitoring for Machine Learning Models in AWS

We have trained a machine learning model that meets or exceeds the performance targets defined by our business requirements.

We have deployed this model to production after converting our Jupyter notebook into a scalable end-to-end training pipeline, including CI/CD and infrastructure-as-code.
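As a minimal illustration of the kind of check drift monitoring performs, a two-sample Kolmogorov–Smirnov test can compare a feature's training-time distribution against recent production traffic. The arrays and threshold below are synthetic stand-ins, not the AWS tooling the post covers.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_values, prod_values, p_threshold=0.01):
    """Flag drift when the KS test rejects 'same distribution'."""
    statistic, p_value = ks_2samp(train_values, prod_values)
    return p_value < p_threshold

rng = np.random.default_rng(42)
baseline = rng.normal(loc=0.0, scale=1.0, size=5_000)   # training-time feature
shifted = rng.normal(loc=0.8, scale=1.0, size=5_000)    # production feature with a mean shift

print(feature_drifted(baseline, baseline[:2_500]))  # comparing a sample against its own subset
print(feature_drifted(baseline, shifted))           # comparing against the shifted feature
```

In practice this check would run per feature on a schedule, with alerts wired to the monitoring service so drift triggers retraining rather than a manual investigation.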

Continue reading “Drift Monitoring for Machine Learning Models in AWS”

3 Degrees of Automation for Production Machine Learning Solutions

Have you released a machine learning solution to production, only to find yourself pulling KPI metrics manually every day to keep updating stakeholders on results?

Or, have you found yourself manually updating Lambda code in the AWS console to quickly fix a production bug a few hours into the release?

Both of these common scenarios illustrate different degrees of automation for production machine learning solutions.

The dream is to have fully automated end-to-end ML solutions requiring minimal (if any) developer intervention throughout the course of operations. This is a core principle of the AWS Well-Architected Framework’s Operational Excellence pillar.

Continue reading “3 Degrees of Automation for Production Machine Learning Solutions”

How To Deploy Serverless Containers For ML Pipelines Using ECS Fargate

“Should we use Kubernetes or go serverless first for new software solutions?”

This is a common question among technology teams across the world. Based on a recent LinkedIn survey, the answer seems to be an even split between the two approaches, with most people flexible based on the project.

Common arguments in favor of Kubernetes include portability, scalability, low latency, low cost, open-source support, and DevOps maturity.

Common arguments in favor of serverless include simplicity, maintainability, shorter lead times, developer experience, talent and skill-set availability, native integration with other cloud services, and existing commitment to the cloud.

Is there a way to combine the best of both worlds and create cloud-native, serverless container-based solutions?

Continue reading “How To Deploy Serverless Containers For ML Pipelines Using ECS Fargate”

5 Pillars of Architecture Design for Production ML Software Solutions

Creating a machine learning software system is like constructing a building. If the foundation is not solid, structural problems can undermine the integrity and function of the building.

MLOps considerations, such as systematically building, training, deploying, and monitoring machine learning models, are only a subset of all the elements required for end-to-end production software solutions.

This is because a machine learning model is not deployed to production in a vacuum. It is integrated within a larger software application, which itself is integrated within a larger business process with the goal of achieving specific business outcomes.

Continue reading “5 Pillars of Architecture Design for Production ML Software Solutions”

Lifecycle of ML Model Deployments to Production

What does it mean to deploy a machine learning model to production?

As technology leaders, we invest in data science and machine learning engineering to improve the performance of the organization.

Fundamentally, we are solving business problems systematically through data-driven technology solutions. This is especially true when the problem is recurring at scale and must be addressed continuously.

Continue reading “Lifecycle of ML Model Deployments to Production”

Custom ML Model Evaluation For Production Deployments

My team and I built a cloud-native recommender system that matches open jobs and people who are looking for work.

We trained machine learning models to power the system, following the tried-and-true process:

  1. Set up an end-to-end data science workflow in a Jupyter notebook
  2. Use domain knowledge to create the feature space through feature engineering
  3. Train parallel models through hyperparameter tuning jobs
  4. Evaluate final model performance on the holdout test set using the appropriate objective metric
  5. Convert the Jupyter notebook into scalable training pipeline components within a serverless microservice architecture
  6. Deploy the solution using infrastructure-as-code through a modular CI/CD pipeline
  7. Monitor model performance on production traffic
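Steps 3 and 4 of the workflow above can be sketched with scikit-learn as a generic stand-in (the post itself uses AWS hyperparameter tuning jobs): train parallel models across a parameter grid, then score the winner on a holdout test set with the objective metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the engineered feature space from step 2.
X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

# Hold out a test set that tuning never sees (used only in step 4).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 3: train parallel models across a hyperparameter grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, 8]},
    scoring="roc_auc",
    n_jobs=-1,
)
search.fit(X_train, y_train)

# Step 4: evaluate the final model on the holdout set with the objective metric.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(f"holdout AUC: {test_auc:.3f}")
```

The key discipline the sketch preserves is that the holdout set plays no role in model selection, so the step-4 metric is an unbiased estimate of production performance.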
Continue reading “Custom ML Model Evaluation For Production Deployments”

Microservice Architecture for Machine Learning Solutions in AWS

Why adopt a microservice strategy when building production machine learning solutions?

Suppose your data science team produced an end-to-end Jupyter notebook, culminating in a trained machine learning model. This model meets performance KPIs in a development environment, and the next logical step is to deploy it in a production environment to maximize its business value.

We all go through this transition as we take machine learning projects from research to production. This is typically a hand-off from a data scientist to a machine learning engineer, although on my team it's the same person: a properly trained, full-stack ML engineer.

Continue reading “Microservice Architecture for Machine Learning Solutions in AWS”

Shadow Deployments of Machine Learning Models in AWS

From a business leadership standpoint, it always feels risky to deploy a new machine learning model within a production application.

  • “What if the model makes wrong predictions, thereby affecting the stable business operations?”
  • “Will our users be negatively impacted by inaccurate model predictions?”
  • “How do we minimize the revenue impact of false positives or false negatives in production?”

These are fair questions, and it’s our job to address them, have a plan to minimize risk, and give our business leaders and stakeholders confidence.
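One way to address these concerns is a shadow deployment: production traffic is sent to both models, but only the incumbent model's prediction is returned to users, while the challenger's output is logged for offline comparison. A minimal sketch, in which the model objects and their `predict` method are hypothetical placeholders:

```python
import logging

logger = logging.getLogger("shadow")

def predict_with_shadow(request, live_model, shadow_model):
    """Serve the live model's prediction; log the shadow model's for comparison only."""
    live_prediction = live_model.predict(request)
    try:
        # A shadow failure must never affect the user-facing response.
        shadow_prediction = shadow_model.predict(request)
        logger.info(
            "shadow comparison",
            extra={"live": live_prediction, "shadow": shadow_prediction},
        )
    except Exception:
        logger.exception("shadow model failed")
    return live_prediction  # users only ever see the live model's output
```

Because the challenger never influences responses, wrong predictions carry no revenue or user impact, which directly answers the risk questions above.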

Continue reading “Shadow Deployments of Machine Learning Models in AWS”

Monitoring & Reliability of Production ML Workloads in AWS

My team and I released a new machine learning solution for our users this week.

There is nothing more exciting than seeing all our business KPIs exceed targets. After all, business value is the reason we build, deploy, and scale ML solutions.

Continue reading “Monitoring & Reliability of Production ML Workloads in AWS”