How To Deploy Lambda Functions As Docker Containers Through CI/CD

How do you deploy Lambda functions as Docker containers through CI/CD?


CloudFormation gives us two options for Lambda deployments:

  1. Zip the code, copy it to S3, and pass in the S3 path into the CF template
  2. Containerize the code, push it to Elastic Container Registry (ECR), and pass in the ECR image URI into the CF template
Continue reading “How To Deploy Lambda Functions As Docker Containers Through CI/CD”

Infrastructure-as-Code for Machine Learning Pipelines in AWS

We all start our AWS journey in the console. We do everything there.

We manually create and configure Lambda functions, Step Functions, IAM roles, S3 buckets, EMR clusters, and any other service we need as we implement a machine learning solution.

Continue reading “Infrastructure-as-Code for Machine Learning Pipelines in AWS”

Drift Monitoring for Machine Learning Models in AWS

We have trained a machine learning model that meets or exceeds performance metrics as a function of business requirements.

We have deployed this model to production after converting our Jupyter notebook into a scalable end-to-end training pipeline, including CI/CD and infrastructure-as-code.

Continue reading “Drift Monitoring for Machine Learning Models in AWS”

Continuous Training of Machine Learning Models in Production

Is continuous training (CT) a machine learning operations (MLOps) best practice?

It depends on what we mean by CT.

Suppose it means continuously invoking training pipelines to ensure models ‘stay fresh’ as new production data lands in the data lake.

The training pipeline workflow could be executed automatically on schedule once per month, once per week, once per day, multiple times per day, or continuously in a loop.

This is one way to prevent stale prod models, but running ML training pipelines has a cost. The more frequently we run them, the higher the cumulative cost over a given time window.

Imagine continuously provisioning Spark clusters and EC2 instances, running massive hyperparameter tuning (HPT) jobs, logging metadata to S3 or DynamoDB, and more, with all of that resource utilization run time adding up.

This form of ‘naive’ continuous training is an AWS Well-Architected Framework anti-pattern, specifically from a Cost Optimization pillar standpoint.

Is there a better way?

Consider event-driven ‘discrete’ (re)training in response to data/concept drift events:


My team and I deploy machine learning models to a production environment through CI/CD as CloudFormation stacks (check out our article on infrastructure-as-code to learn more). We deploy two Step Function serverless workflows for each model moving to prod, one for the main training pipeline and one for drift monitoring.

The drift monitoring Step Function is invoked once per day per production model. It extracts an inference dataset from our data lake containing the latest records with ground truth and proceeds to batch transform inference.

Next, we slice this ‘drift dataset’ and perform a weighted sum evaluation based on prod model performance on each slice (check out our article titled “Custom ML Model Evaluation For Production Deployments” to learn more).

If the final drift dataset score is X% below the production model’s score, we automatically invoke the training pipeline Step Function for re-training. This training pipeline execution may or may not yield a better model, but we have done our job in monitoring, measuring, and re-training with purpose.
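The trigger logic above can be sketched in plain Python. Everything here is illustrative: the slice names, the equal weights, and the 5% threshold stand in for the unspecified X%:

```python
def weighted_drift_score(slice_scores: dict, slice_weights: dict) -> float:
    """Weighted sum of the prod model's performance across drift dataset slices."""
    return sum(slice_scores[s] * slice_weights[s] for s in slice_scores)

def should_retrain(prod_score: float, drift_score: float, threshold_pct: float) -> bool:
    """True when the drift score falls more than threshold_pct below the prod score."""
    return drift_score < prod_score * (1 - threshold_pct / 100)

# Illustrative slices and weights
scores = {"slice_a": 0.78, "slice_b": 0.71}
weights = {"slice_a": 0.5, "slice_b": 0.5}

drift_score = weighted_drift_score(scores, weights)             # 0.745
retrain = should_retrain(0.80, drift_score, threshold_pct=5.0)  # True: 0.745 < 0.76
```

In our setup, a True here is the event that makes the drift monitoring Step Function start an execution of the training pipeline Step Function.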

Why do we monitor drift on a batch basis once per day?

In our domain, analytics reveals that drift never happens faster than daily or weekly. Fine-tune drift monitoring frequency based on your domain, because running it frequently also comes with a cost.

This intentional, purpose-driven (re)training minimizes CT cost and sheds light on the nature of the prediction problem’s domain.

In some domains, drift and re-training may happen on a daily frequency. In others, it may be once per week. In some cases, once per month or longer. Or, as my team and I learned, there could also be drift seasonality where it’s faster at certain times of the year and slower in others.

All these learnings led us to evolve from schedule-based continuous training to event-driven discrete training for all production ML models.

This article is meant to provide a different perspective on continuous training and various best practices to consider. We recommend adopting the approach that works best for you, your team, and the specific prediction problems you are solving.

Do you prefer schedule-based continuous training or event-driven discrete training for production ML models? Let us know in the comments!

Subscribe to our weekly LinkedIn newsletter: Machine Learning In Production

Reach out if you need help:

  • Maximizing the business value of your data to improve core business KPIs
  • Deploying & monetizing your ML models in production
  • Building Well-Architected production ML software solutions
  • Implementing cloud-native MLOps
  • Training your teams to systematically take models from research to production
  • Identifying new DS/ML opportunities in your company or taking existing projects to the next level

Would you like me to speak at your event? Email me at info@carloslaraai.com

Follow our blog: https://gradientgroup.ai/blog/

Unit Testing Data Validation Microservices for Production ML Pipelines

Unit testing is a vital element of production software engineering. After all, how do we know for sure that our code always returns the expected result regardless of input?

Unit testing is especially important in production machine learning because model training and pre-processing functions do not always throw exceptions when they should. Instead, the errors are often “absorbed” and ML pipeline execution seems to complete successfully.

For example, if a neural network layer expects a 3D tensor as input, and we have a bug where two of the dimensions are in the wrong order, training will complete “successfully” and yield a reasonable model (sad but true according to Andrej Karpathy, Senior Director of AI at Tesla).

Or, if an entire feature column comes in null due to a broken data engineering pipeline, df.fillna(…) will naively handle it without us knowing about the root problem. Validating upstream data quality systematically is paramount before deploying machine learning models to production.

In our previous article, we covered how to validate upstream data quality as the 2nd step of a machine learning training pipeline. This Validator microservice is essential before proceeding to the Transformer microservice for pre-processing and feature engineering:


As we write data validation functions, a best practice is to structure the code so that we can unit test each function independently, incrementally as we go.

The data validation microservice is special in the sense that each function contains assert statements to validate upstream data quality. However, these assert statements are not unit tests. Real unit testing requires several test cases per function to verify different inputs produce success or failure as expected.
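A toy illustration of that distinction, using a hypothetical stand-in validator (not one of our production functions): the assert inside the function is the data quality check, while the unit test exercises it with inputs that must pass and inputs that must make it fail.

```python
def validate_not_all_null(values) -> None:
    """Toy data validation function: fail if a column is entirely null."""
    assert any(v is not None for v in values), "column is entirely null"

def expect_assertion_error(fn, *args) -> bool:
    """Return True only if fn raises AssertionError, i.e. the validator 'correctly failed'."""
    try:
        fn(*args)
    except AssertionError:
        return True
    return False

# Success case: healthy input must pass silently
validate_not_all_null([1, None, 3])

# Failure case: an all-null column must trip the assert
assert expect_assertion_error(validate_not_all_null, [None, None, None])
```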

These are the steps my team and I follow to unit test each data validation function:

  1. Make it succeed with the expected input dataset quality
  2. Brainstorm the most important test cases where we would want the data validation microservice to fail and notify upstream data engineers
  3. Make it fail correctly with a “low quality” input dataset for each test case to confirm the function does catch the data quality issues as expected
  4. Expand the unit test suite incrementally as new errors or edge case bugs appear in production (similar to the incremental improvement of error handling workflows)

We begin by extracting a high quality training dataset snapshot from our transactional data lake and subsampling a relatively small number of records:

df = dataset.extract()

test_df = df.sample(withReplacement=True, fraction=0.01, seed=1)

baseline_counts = test_df.count()

Next, we add a few test cases per data validation function (showing a subset for simplicity). Each test case is a “fake” PySpark DataFrame that has been transformed to lower data quality for the given test:

data_type_error = test_df.withColumn('StringFeature', test_df.StringFeature.cast(IntegerType()))

missing_column_error = test_df.drop('StringFeature')

extra_column_error = test_df.withColumn('ExtraBaseFeature', f.lit(10))

different_order = test_df.select("Label","Feature_2","Feature_3", "Feature_4", ... , "Feature_1")

schema_test_cases = {
    'data_type_error': data_type_error,
    'missing_column_error': missing_column_error,
    'extra_column_error': extra_column_error,
    'different_order': different_order
}

constant_column_error = test_df.withColumn('Label', f.lit(1))
categorical_column_error = test_df.withColumn('StringFeature', f.lit('WA'))

useful_test_cases = {
    'constant_column_error': constant_column_error,
    'categorical_column_error': categorical_column_error
}

empty_num_col_error = test_df.withColumn('DecimalFeature', f.lit(None).cast(StringType()))

empty_array_col_error = test_df.withColumn('ArrayFeature', f.lit(None).cast(StringType()))

empty_test_cases = {
    'empty_num_col_error': empty_num_col_error,
    'empty_array_col_error': empty_array_col_error
}

negative_decimal_error = test_df.withColumn('DecimalFeature', f.abs(f.rand(seed=1)) * -10.0)

negative_int_error = test_df.withColumn('IntegerFeature', f.round(f.abs(f.rand(seed=1)) * -10.0, 0).cast(IntegerType()))

invalid_dates_error = test_df.withColumn('DateFeature', f.date_sub(f.col('DateFeature'), 3650))

invalid_labels_error = test_df.withColumn('Label', f.round(f.abs(f.rand(seed=1)) * 10.0, 0).cast(IntegerType()))

numeric_test_cases = {
    'negative_decimal_error': negative_decimal_error,
    'negative_int_error': negative_int_error,
    'invalid_dates_error': invalid_dates_error,
    'invalid_labels_error': invalid_labels_error
}

Finally, we execute all unit tests per data validation function and verify they all “fail correctly”:

print("Verifying input dataset schema...\n")

for name, test_case_df in schema_test_cases.items():
    try:
        verify_input_schema(test_case_df)
    except AssertionError:
        print(f'verify_input_schema correctly failed with {name}')

print("Verifying useful columns...\n")

for name, test_case_df in useful_test_cases.items():
    try:
        verify_useful_columns(test_case_df)
    except AssertionError:
        print(f'verify_useful_columns correctly failed with {name}')

print("Verifying no empty columns...\n")

for name, test_case_df in empty_test_cases.items():
    try:
        verify_no_empty_columns(test_case_df, baseline_counts)
    except AssertionError:
        print(f'verify_no_empty_columns correctly failed with {name}')

print("Verifying numeric feature ranges...\n")

for name, test_case_df in numeric_test_cases.items():
    try:
        verify_numeric_features(test_case_df, baseline_counts)
    except AssertionError:
        print(f'verify_numeric_features correctly failed with {name}')

We run unit tests for a given ML pipeline microservice during the test phase of our cross-account CI/CD pipeline, based on git commit events. The build phase containerizes the microservice, the test phase runs unit tests & integration testing, and the deploy phase pushes the code to production:


Make sure each unit test is meaningful and useful. Write as many as possible (within reason) during the brainstorming step and prune as you go. We recommend establishing a unit test suite baseline and iteratively enhancing it as needed.

I like adding unit testing to the acceptance criteria of technical tasks in a 2-week sprint so that it is implemented naturally during development. This way my team and I develop the habit and continuously release well-tested, high-quality code.

We will cover unit testing of more “traditional” functions, such as feature engineering functions and window functions in a future article.

What do you think about this unit testing approach for data validation microservices? Is there anything you would do differently? Let us know in the comments!


Testing ML Microservices for Production Deployments

How do we ensure machine learning pipeline components produce the exact result we expect, especially prior to production deployments?

We could sanity check by inspecting a few output records by hand, but how do we know for sure that all output records are correct every time? This manual, stage 1 automation “ClickOps” approach is not scalable, consistent, or reliable.

We covered previously how to convert an end-to-end Jupyter notebook into a serverless microservice architecture by modularizing and decoupling ML pipeline components:


Before deploying these ML microservices to production, we need a systematic and automated testing process to verify each component works as expected.

Let’s split testing requirements into 2 major categories:

  • Data Validation
  • Unit Testing

We will focus on testing individual ML pipeline components and for now exclude integration testing, end-to-end testing, etc. Furthermore, we will use a training pipeline for this example, but the same concepts apply to inference pipelines.

The first components of a training pipeline are usually:

  1. Data extraction microservice to read data from a data lake
  2. Data validation microservice to identify upstream data issues early
  3. Data transformation microservice for pre-processing and feature engineering (this component’s output becomes the input for the model training microservice)

The first component extracts a training dataset composed of “base” features (which will be used to engineer more predictive features) using PySpark. In general, production training pipelines require distributed computing environments to handle big data.

This training dataset may be extracted from a data lake, a data warehouse, a relational database, by joining data from multiple streaming sources, etc. These upstream data pipelines are built by data engineers, such as this serverless delta lake (check out the AWS YouTube channel’s “This Is My Architecture” series to learn more about it):


Before proceeding to pre-processing and feature engineering, it’s a good idea to test this training dataset and verify that upstream data pipelines are producing data as expected. As many of us have experienced, broken data pipelines are among the most common sources of bugs in ML pipelines.

Start with a baseline understanding based on domain knowledge and early exploratory data analysis (EDA) of the relevant features. Common questions include:

  • Does the extracted dataset schema match the expected schema?
  • Does each column contain distinct values to allow a machine learning model to learn from it?
  • Do any columns contain all null values?
  • Are numeric feature ranges within expected statistical bounds?
  • Are distinct categorical feature values what we expect, or are there missing or new categories coming in?

This is a sample of what the data validation microservice code might look like in a PySpark script:

def verify_input_schema(df) -> None:
    """Verify the input dataset's schema matches the expected base feature schema."""
    assert df.schema == StructType([
        StructField("Feature_1", IntegerType(), True),
        StructField("Feature_2", IntegerType(), True),
        StructField("Feature_3", IntegerType(), True),
        StructField("Feature_4", DateType(), True),
        StructField("Feature_5", StringType(), True),
        StructField("Feature_6", StringType(), True),
        StructField("Feature_7", StringType(), True),
        StructField("Feature_8", DecimalType(9,2), True),
        StructField("Feature_9", ArrayType(StringType(), True), True),
        StructField("Feature_10", ArrayType(StringType(), True), True),
        StructField("Feature_11", StringType(), True),
        StructField("Feature_12", StringType(), True),
        StructField("Feature_13", ArrayType(StringType(), True), True),
        StructField("Feature_14", DecimalType(9,2), True),
        StructField("Feature_15", DecimalType(9,2), True),
        StructField("Label", IntegerType(), True)
    ])
    print("Input schema is correct.\n")

def verify_useful_columns(df) -> None:
    """Verify all columns contain distinctive information for ML models."""
    output = df.drop(*(col for col in df.columns if df.select(col).distinct().count() == 1))
    assert len(output.columns) == len(df.columns)
    print("All columns contain distinct values.\n")

def verify_no_empty_columns(df, baseline_counts) -> None:
    """Verify we get no empty columns in the input dataset.

    A percentage of null values per feature would make more sense based on known
    statistics, such as 3 standard deviations outside the mean (pending implementation).
    """
    for column in df.columns:
        output_counts = df.filter(df[column].isNull()).count()
        assert baseline_counts > output_counts
    print("No empty columns.\n")

def verify_numeric_features(df, baseline_counts) -> None:
    """Verify numeric feature ranges are valid based on domain knowledge.

    Fail if more than 0.5% of rows have negative values (it would indicate an
    upstream data problem). The binary classification label can only have 2
    distinct values, 1 and 0, with no nulls.
    """
    numeric_columns = ["Num_Feature_1", "Num_Feature_2", "Num_Feature_3",
                       "Num_Feature_4", "Num_Feature_5", "Num_Feature_6"]
    for column in numeric_columns:
        output_counts = df.filter(f.col(column) < 0).count()
        ratio = output_counts / baseline_counts * 100
        assert ratio < 0.5
    date_columns = ["Date_Feature_1"]
    for date in date_columns:
        output_counts = df.filter(f.col(date) < '2015-01-01').count()
        ratio = output_counts / baseline_counts * 100
        assert ratio < 0.5
    label = "Label"
    label_counts = df.groupBy(label).count()
    assert label_counts.count() == 2
    print("Numeric feature ranges are valid.")

print("Verifying input dataset schema...\n")
verify_input_schema(test_df)

print("Verifying useful columns...\n")
verify_useful_columns(test_df)

print("Verifying no empty columns...\n")
verify_no_empty_columns(test_df, baseline_counts)

print("Verifying numeric feature ranges...\n")
verify_numeric_features(test_df, baseline_counts)

# print("Verifying distinct categorical features...\n")
# verify_categorical_features(df, baseline)
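The percentage-based null check noted above as pending implementation could be sketched as follows. This is a stdlib-only sketch: the 3% tolerance is an assumed default, and null_counts stands in for the per-column df.filter(df[column].isNull()).count() results.

```python
def verify_null_percentage(null_counts: dict, total_rows: int, max_null_pct: float = 3.0) -> None:
    """Fail if any column exceeds the allowed percentage of null values."""
    for column, nulls in null_counts.items():
        ratio = nulls / total_rows * 100
        assert ratio <= max_null_pct, f"{column} is {ratio:.2f}% null (limit {max_null_pct}%)"
    print("Null percentages within tolerance.\n")

# Illustrative counts: 1.2% and 0% null both pass under the assumed 3% tolerance
verify_null_percentage({"Feature_1": 12, "Feature_2": 0}, total_rows=1000)
```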

You may use PySpark, TensorFlow Data Validation, or regular Python depending on your use case requirements and preferences. The Data Validation microservice, as well as all other ML pipeline microservices, are usually Docker containers that are built, tested, and deployed independently through CI/CD pipelines.

To illustrate the decoupled nature of the training pipeline microservice architecture, we could replace the AWS Glue Data Validation job with a TensorFlow Data Validation serverless container using ECS Fargate:


From the code above, if any assertion fails, the data validation microservice fails. This stops the training pipeline execution and no issues make it into production. This is the first systematic quality gate between data engineering and machine learning engineering in the staging environment of the deployment process.

Why would we let the Data Validation component cause the entire training pipeline execution to fail? Couldn’t the pre-processing and feature engineering “Transformer” microservice handle nulls, impute, clip outliers, and more?

If there is a fundamental problem with the way the data is generated upstream, naively handling it through pre-processing would not solve the root problem; it would only mask a symptom. Worse, the training pipeline would probably complete successfully while yielding an inferior model due to the hidden data quality issues.

The Data Validation microservice is vital for detecting potential upstream data pipeline problems, alerting the data engineering team accordingly, and stopping production deployments until further investigation. We use the learnings from these experiences to improve our data engineering and data validation processes.

The functions above must be unit tested, as well. After all, if we have a data validation function that checks if a column contains only null values, how do we know the function itself always behaves exactly that way?

We will cover unit testing in the next article, specifically for common functions in ML microservices.

Feel free to check out the various links throughout this article to learn more about those topics.

How do you test and validate datasets for machine learning pipelines early in the process before they reach production? Let us know in the comments!


How To Drive Revenue Growth Through Production ML Solutions

For any organization, 20% of the AI/ML use cases drive 80% of the business value.

How do we identify this 20%?

Always start with specific business outcomes. Forget about machine learning at the beginning and let such a solution (if any) emerge naturally out of a systematic discovery process.

In 2021, my team and I deployed a machine learning solution to production with the goal of increasing gross revenue above a baseline number. The initial release took place in a relatively small market to prove the business value, learn the operational challenges, fix critical bugs, and validate several hypotheses.

In that initially small market, we increased gross revenue from a $67,000/month baseline to $124,000/month post-release. As we scale, the projected new revenue exceeds $1 million/month. And this is just 1 of our production ML solutions.

This ROI success story is extremely rare today, and we want to show you the process we followed to achieve this business outcome.

1) Identify the company’s key strategic initiatives

Every organization has quarterly and yearly goals defined by the executive leadership team. Goals typically include gross revenue growth, improved profitability, increased customer retention, customer acquisition, etc. KPIs beyond revenue and profit are company-specific.

Find out the top 3-5 KPI objectives and assess the opportunity to improve each one individually. In the on-demand staffing industry, where we connect people and work at scale, my team and I started with gross revenue growth:

“How to increase on-demand gross revenue in the US?”

Note how we frame the KPI choice as a concise and specific business question.

2) Break up the business question into components


The revenue growth question is too complex to tackle directly. The best practice is to break up the question into its components, quantify them, and identify the areas of highest impact to focus on first. Domain knowledge of the business is vital for proper disaggregation.

Revenue naturally breaks up into supply and demand. The first decision is which side of the marketplace to focus on. To answer this, we ask a key question:

“Are we meeting all existing customer demand?”

In the on-demand staffing industry, we quantify this question through a KPI called fill rate. Upon investigation, my team and I found fill rate was well below 100%. We decided it made the most sense to focus on filling the existing demand. Therefore, we focused on supply (workers) instead of demand (customers).

Within worker supply, the 2 components we could focus on are recruiting & onboarding new workers, or increasing the utilization of existing workers (hours worked per week). We treat capacity per worker as a constant, fixed at 40 hours per week.
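The two KPIs in play reduce to simple ratios; a sketch with illustrative numbers (the 40-hour weekly capacity is the constant mentioned above):

```python
def fill_rate(filled_hours: float, requested_hours: float) -> float:
    """Share of existing customer demand that was actually staffed."""
    return filled_hours / requested_hours

def utilization(hours_worked: float, capacity_hours: float = 40.0) -> float:
    """Share of a worker's weekly capacity actually worked."""
    return hours_worked / capacity_hours

fill_rate(3200, 4000)  # 0.8 -> 20% of demand goes unfilled
utilization(24)        # 0.6 -> 16 idle hours per worker per week
```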

3) Develop hypotheses and validate them through data analysis

Before proceeding, we ask another key question:

“What are the worker utilization statistics across US segments?”

If median worker utilization is already close to 100%, it might be a better idea to recruit new workers or increase retention. If it’s well below 100%, we have an opportunity to increase worker engagement to drive increased utilization.

Note that we are addressing gross revenue growth indirectly by focusing on the worker utilization KPI. This makes our efforts more targeted and simpler.

Validating our hypotheses required a combination of SQL data analysis (especially OLTP queries) and subject matter expert (SME) interviews. Ultimately, the OLTP schema is the business truth.

We drafted a list of initial hypotheses and created corresponding epics/stories to be completed over the next 2-week sprint. Upon completion, we validated:

  • Median worker utilization was well below 100%
  • 80% of the revenue is generated by 20% of the workers (Pareto principle)
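The Pareto finding is easy to verify once per-worker revenue comes back from the SQL analysis; a hedged sketch with made-up numbers:

```python
def top_share(revenues, top_frac: float = 0.2) -> float:
    """Fraction of total revenue generated by the top `top_frac` of workers."""
    ordered = sorted(revenues, reverse=True)
    k = max(1, int(len(ordered) * top_frac))
    return sum(ordered[:k]) / sum(ordered)

# Illustrative, heavily skewed per-worker revenues
revenues = [400, 380, 50, 40, 30, 25, 20, 20, 20, 15]
top_share(revenues)  # 0.78 -> top 20% of workers generate ~78% of revenue
```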

After synthesizing all results and aligning with business leaders/stakeholders, we concluded that increasing median associate utilization was the most viable way to drive revenue growth.

4) Determine whether the problem is a candidate for a machine learning solution

The disaggregation diagram above is simplified, but the logic trees went a lot deeper. At those deeper levels of detail, we collaborated with business and technology teams to brainstorm possible solutions to increase median associate utilization.

After going through steps 4) to 18), we ultimately built a fully serverless machine learning solution in AWS to text the “job of the day” to workers using a recommendations engine:


Feel free to go back to the beginning of this article to review the business results from this solution. It may all sound easy, but it was a herculean effort spanning several months and teams. On the verge of failure multiple times, we continued pushing through the walls until we succeeded in providing an ROI to the company.

I still find it amazing how we started with a question (“How to increase on-demand gross revenue in the US?”) that had nothing to do with AI/ML, and systematically converged on a production solution generating monthly cash flows.

This was a simplified view of our AI/ML adoption methodology to show you the approach we take for any machine learning solution. Did you find it valuable or insightful? Let us know in the comments!

Feel free to message me if you’d like to learn more about the entire process beyond the 4 initial steps covered here.


3 Degrees of Automation for Production Machine Learning Solutions

Have you released a machine learning solution to production, only to find yourself pulling KPI metrics manually every day to keep updating stakeholders on results?

Or, have you found yourself manually updating Lambda code in the AWS console to quickly fix a production bug a few hours into the release?

Both of these common scenarios represent examples of the degrees of automation for production machine learning solutions.

The dream is to have fully automated end-to-end ML solutions requiring minimal (if any) developer intervention throughout the course of operations. This is a core principle of the AWS Well-Architected Framework’s Operational Excellence pillar.

Continue reading “3 Degrees of Automation for Production Machine Learning Solutions”

How To Deploy Serverless Containers For ML Pipelines Using ECS Fargate

“Should we use Kubernetes or go serverless first for new software solutions?”

This is a common question among technology teams across the world. Based on a recent LinkedIn survey, the answer seems to be an even split between the two approaches, with most people flexible based on the project.

Common arguments in favor of Kubernetes include portability, scalability, low latency, low cost, open-source support, and DevOps maturity.

Common arguments in favor of serverless include simplicity, maintainability, shorter lead times, developer experience, talent / skill set availability, native integration with other cloud services, and existing commitment to the cloud.

Is there a way to combine the best of both worlds and create cloud-native, serverless container-based solutions?

Continue reading “How To Deploy Serverless Containers For ML Pipelines Using ECS Fargate”

5 Pillars of Architecture Design for Production ML Software Solutions

Creating a machine learning software system is like constructing a building. If the foundation is not solid, structural problems can undermine the integrity and function of the building.

MLOps considerations, such as systematically building, training, deploying, and monitoring machine learning models, are only a subset of all the elements required for end-to-end production software solutions.

This is because a machine learning model is not deployed to production in a vacuum. It is integrated within a larger software application, which itself is integrated within a larger business process with the goal of achieving specific business outcomes.

Continue reading “5 Pillars of Architecture Design for Production ML Software Solutions”

Lifecycle of ML Model Deployments to Production

What does it mean to deploy a machine learning model to production?

As technology leaders, we invest in data science and machine learning engineering to improve the performance of the organization.

Fundamentally, we are solving business problems systematically through data-driven technology solutions. This is especially true when the problem is recurring at scale and must be addressed continuously.

Continue reading “Lifecycle of ML Model Deployments to Production”

Custom ML Model Evaluation For Production Deployments

My team and I built a cloud-native recommender system that matches open jobs and people who are looking for work.

We trained machine learning models to power the system, following the tried-and-true process:

  1. Set up an end-to-end data science workflow in a Jupyter notebook
  2. Use domain knowledge to create the feature space through feature engineering
  3. Train parallel models through hyperparameter tuning jobs
  4. Evaluate final model performance on the holdout test set using the appropriate objective metric
  5. Convert the Jupyter notebook into scalable training pipeline components within a serverless microservice architecture
  6. Deploy the solution using infrastructure-as-code through a modular CI/CD pipeline
  7. Monitor model performance on production traffic
Continue reading “Custom ML Model Evaluation For Production Deployments”

Microservice Architecture for Machine Learning Solutions in AWS

Why adopt a microservice strategy when building production machine learning solutions?

Suppose your data science team produced an end-to-end Jupyter notebook, culminating in a trained machine learning model. This model meets performance KPIs in a development environment, and the next logical step is to deploy it in a production environment to maximize its business value.

We all go through this transition as we take machine learning projects from research to production. This is typically a hand-off from a data scientist to a machine learning engineer, although in my team it’s the same, properly trained, full-stack ML engineer.

Continue reading “Microservice Architecture for Machine Learning Solutions in AWS”