CI/CD Pipelines for Machine Learning Solutions in AWS

All machine learning projects within AWS begin in a development environment – usually SageMaker Studio notebooks, or Glue notebooks (when a cluster is needed during PySpark development). Both environments support CodeCommit for source control. Suppose we developed an end-to-end data science workflow in a Jupyter notebook using a dataset extracted from S3, Redshift, Aurora, or…

Infrastructure-as-Code for Machine Learning Pipelines in AWS

We all start our AWS journey in the console. We do everything there. We manually create and configure Lambda functions, Step Functions, IAM roles, S3 buckets, EMR clusters, and any other service we need as we implement a machine learning solution. We use source control for occasional commits to the dev branch to keep track…

Drift Monitoring for Machine Learning Models in AWS

We have trained a machine learning model that meets or exceeds the performance metrics set by business requirements. We have deployed this model to production after converting our Jupyter notebook into a scalable end-to-end training pipeline, including CI/CD and infrastructure-as-code. This deployment could be a SageMaker endpoint for live inference, or a Lambda function…

Delta Lake for Machine Learning Pipelines in AWS

Machine learning pipelines begin with data extraction – whether training or inference. After all, we need a dataset to begin any ML workflow. Most of us begin by querying OLTP/OLAP tables from an on-premises relational database, such as SQL Server. When our query completes, we save the results locally as CSV and then upload the…

How To Scope Out A Dataset From Scratch (Enterprise ML)

Every machine learning solution requires a dataset that encapsulates the business problem to be solved. A machine learning system will ingest this dataset, learn its complex patterns/relationships, and output a set of business predictions that help solve a specific business problem. This sounds great, but how do you acquire this dataset?

AI/ML Product Management Fundamentals

How do you build products that leverage machine learning? Machine learning uses data to answer valuable business questions. Answering these questions should create tangible business value. This could be increased revenue, decreased costs, increased retention rate, increased operational efficiency, etc. Therefore, always focus on the business impact of artificial intelligence when…

Machine Learning Product Success Metrics

When you are building an AI/ML product, it’s paramount that you define clear success metrics from the beginning. These metrics will help guide the AI product development lifecycle and ensure that your team converges on the right product that solves business problems/user needs. There are two ways to assess AI/ML product success: 1) Business outcomes…

How To Identify Unknown Features In Machine Learning

What is a feature in machine learning? A feature is a measurable property or characteristic of an event you want to predict. But what happens if you have missing or unknown features for the event you want to predict? What if these features are crucially important to make accurate predictions? Let’s look at a concrete…

Bias In Artificial Intelligence

You may have heard the term “bias” in artificial intelligence. It usually refers to machine learning algorithms that make biased predictions. Biased predictions are a sign of underperforming machine learning models that were not trained with the proper datasets. Most people know that the performance of a machine learning model is directly proportional to the…

The Most Important Element Of AI Adoption

What is the most important element that will determine the success or failure of an AI/ML project? Most people, including technical professionals in the field, would think it’s the datasets: quality, quantity, and a data engineering pipeline to produce them. This is because machine learning algorithms perform only as well as the data used to…