Why adopt a microservice strategy when building production machine learning solutions?
Suppose your data science team produced an end-to-end Jupyter notebook, culminating in a trained machine learning model. This model meets performance KPIs in a development environment, and the next logical step is to deploy it in a production environment to maximize its business value.
We all go through this transition as we take machine learning projects from research to production. This is typically a hand-off from a data scientist to a machine learning engineer, although on my team, the same properly trained, full-stack ML engineer handles both roles.
The most direct approach is to convert all pandas code into PySpark so that it can handle datasets of any size. But the result can still be a single PySpark script, running on an EMR cluster or serverlessly in a Glue ETL job.
Inevitably, a machine learning engineer will commit a change to the script that introduces a bug into the workflow and causes the execution to fail. Testing will not catch every bug, especially in ML pipelines, where bugs are often subtle and insidious.
Because one large script contains the code for every step in the ML workflow, it may not be immediately clear why the error occurred. As the code base grows, debugging becomes increasingly challenging and time-consuming.
Furthermore, if this solution is running in production, that one change has broken the entire workload. The production system is now down while we identify the root cause and fix it.
To introduce changes to a machine learning pipeline with minimal or no interruption to the existing production workload, adopt a microservice architecture instead of a monolithic one.
This approach replaces one large resource with multiple small resources and helps reduce the impact of a single failure on the overall workload.
Service-oriented architecture (SOA) is the practice of making software components reusable through service interfaces. Rather than building a monolithic application, where all functionality runs in a single runtime, you break the application into separate components.
Microservices extend this by making components that are single-purpose and reusable. When building your architecture, divide components along business boundaries or logical domains. For ML training pipelines, these components may include:
- Data Extraction
- Data Validation
- Data Transformation
- Model Training
- Model Evaluation
- Model Explainability
- Model Deployment
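To make this decomposition concrete, here is a minimal Python sketch of a few of these components written as single-purpose units that share one explicit input/output contract (a plain dict). The component functions, the toy data, and the least-squares "model" are hypothetical stand-ins; in a real pipeline each unit would be packaged and deployed on its own rather than living in one file.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The shared contract: every component takes a dict and returns a dict.
Payload = Dict[str, object]


@dataclass
class Component:
    name: str
    run: Callable[[Payload], Payload]


def extract_data(payload: Payload) -> Payload:
    # Hypothetical stand-in for reading raw records from a data source.
    rows = [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.1}, {"x": 3.0, "y": 5.9}]
    return {**payload, "rows": rows}


def validate_data(payload: Payload) -> Payload:
    # Fail inside this component if the data violates its expected schema.
    rows = payload["rows"]
    if not rows:
        raise ValueError("no rows extracted")
    for row in rows:
        if set(row) != {"x", "y"}:
            raise ValueError(f"unexpected columns: {sorted(row)}")
    return payload


def train_model(payload: Payload) -> Payload:
    # Toy model: least-squares slope through the origin, w = sum(x*y) / sum(x*x).
    rows = payload["rows"]
    w = sum(r["x"] * r["y"] for r in rows) / sum(r["x"] ** 2 for r in rows)
    return {**payload, "weight": w}


def evaluate_model(payload: Payload) -> Payload:
    # Mean squared error of the toy model on the same rows.
    rows, w = payload["rows"], payload["weight"]
    mse = sum((r["y"] - w * r["x"]) ** 2 for r in rows) / len(rows)
    return {**payload, "mse": mse}


PIPELINE: List[Component] = [
    Component("data-extraction", extract_data),
    Component("data-validation", validate_data),
    Component("model-training", train_model),
    Component("model-evaluation", evaluate_model),
]


def run_pipeline(components: List[Component], payload: Payload) -> Payload:
    for component in components:
        try:
            payload = component.run(payload)
        except Exception as exc:
            # Name the failing component so the error is scoped to one unit.
            raise RuntimeError(f"component '{component.name}' failed: {exc}") from exc
    return payload


result = run_pipeline(PIPELINE, {})
print(f"weight={result['weight']:.3f} mse={result['mse']:.4f}")
```

Because each step is wrapped individually, a failure surfaces as, for example, `component 'data-validation' failed: no rows extracted` instead of a stack trace buried somewhere in one long script.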
A microservice strategy enables distributed development, improves scalability, and simplifies change management. It also lets CI/CD deploy each component to production individually, instead of redeploying the entire solution in an all-or-nothing fashion every time the code changes (see “Modular Deployments in AWS Cross-Account CI/CD Pipelines” to learn more).
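One way a CI/CD pipeline can decide what to redeploy is to look at which files a commit touched. The sketch below assumes a hypothetical repository layout with one directory per component under `components/` and shared code under `common/`; the list of changed paths could come from `git diff --name-only HEAD~1 HEAD`.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set

# Hypothetical component directory names, one per microservice.
COMPONENTS = {
    "data-extraction", "data-validation", "data-transformation",
    "model-training", "model-evaluation", "model-explainability", "model-deployment",
}


def components_to_deploy(changed_files: Iterable[str]) -> Set[str]:
    """Map changed file paths to the set of components that need a new deployment."""
    targets: Set[str] = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if parts and parts[0] == "common":
            # Shared code changed: conservatively redeploy every component.
            return set(COMPONENTS)
        if len(parts) >= 2 and parts[0] == "components" and parts[1] in COMPONENTS:
            targets.add(parts[1])
    return targets


print(sorted(components_to_deploy(["components/model-training/train.py", "README.md"])))
```

Treating any change under `common/` as a trigger for every component trades a few extra deployments for the guarantee that no component keeps running stale shared code.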
Two popular options for building serverless microservices on AWS with Docker containers are Lambda and Fargate. Lambda functions are meant for lightweight components with small to moderate memory requirements and runtimes under 15 minutes, Lambda's maximum timeout. If your containers need more memory or run longer, run them as ECS tasks on Fargate.
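The Lambda-or-Fargate decision can be written down as a simple rule of thumb. This sketch uses Lambda's documented limits (a 900-second maximum timeout and 10,240 MB maximum memory); the 80% headroom `margin` is an arbitrary, hypothetical choice meant to keep components whose runtime varies between runs from occasionally hitting the hard timeout.

```python
# Documented AWS Lambda limits.
LAMBDA_MAX_TIMEOUT_S = 900
LAMBDA_MAX_MEMORY_MB = 10_240


def choose_compute(peak_memory_mb: int, expected_runtime_s: float, margin: float = 0.8) -> str:
    """Return "lambda" if the component fits comfortably within Lambda limits, else "fargate".

    `margin` is a hypothetical safety factor, not an AWS setting.
    """
    fits_runtime = expected_runtime_s <= LAMBDA_MAX_TIMEOUT_S * margin
    fits_memory = peak_memory_mb <= LAMBDA_MAX_MEMORY_MB * margin
    return "lambda" if fits_runtime and fits_memory else "fargate"


print(choose_compute(peak_memory_mb=512, expected_runtime_s=60))        # a quick validation step
print(choose_compute(peak_memory_mb=16_000, expected_runtime_s=3_600))  # a long training job
```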
If you are dealing with big data but do not want to build and maintain your own Spark containers, Glue is an excellent alternative. You can mix and match these services to build the suite of serverless microservices for your ML solution, then orchestrate their execution with Step Functions.
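Here is a sketch of what such an orchestration might look like: a Python script that builds an Amazon States Language definition chaining a Glue job, a Fargate task, and a Lambda function. Every job, cluster, task-definition, function, and subnet name is a hypothetical placeholder. Each step gets its own `Retry` and `Catch`, which is one way to implement per-component error handling.

```python
import json
from typing import Any, Dict

# Retry each failing step twice, waiting 10 s and then 20 s.
RETRY = [{"ErrorEquals": ["States.ALL"], "IntervalSeconds": 10, "MaxAttempts": 2, "BackoffRate": 2.0}]


def task(resource: str, parameters: Dict[str, Any], next_state: str) -> Dict[str, Any]:
    """A Task state with its own retry policy and a catch that routes to a failure state."""
    return {
        "Type": "Task",
        "Resource": resource,
        "Parameters": parameters,
        "ResultPath": None,  # serialized as null: discard task output, pass the input through
        "Retry": RETRY,
        "Catch": [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "PipelineFailed"}],
        "Next": next_state,
    }


definition = {
    "Comment": "Training pipeline built from independent serverless components",
    "StartAt": "DataTransformation",
    "States": {
        "DataTransformation": task(  # big-data step on Glue
            "arn:aws:states:::glue:startJobRun.sync",
            {"JobName": "data-transformation-job"},
            "ModelTraining",
        ),
        "ModelTraining": task(  # heavy, long-running step on Fargate
            "arn:aws:states:::ecs:runTask.sync",
            {
                "LaunchType": "FARGATE",
                "Cluster": "ml-pipeline-cluster",
                "TaskDefinition": "model-training",
                "NetworkConfiguration": {"AwsvpcConfiguration": {"Subnets": ["subnet-00000000"]}},
            },
            "ModelEvaluation",
        ),
        "ModelEvaluation": task(  # lightweight step on Lambda
            "arn:aws:states:::lambda:invoke",
            {"FunctionName": "model-evaluation", "Payload.$": "$"},
            "PipelineSucceeded",
        ),
        "PipelineSucceeded": {"Type": "Succeed"},
        "PipelineFailed": {"Type": "Fail", "Error": "PipelineError", "Cause": "A pipeline component failed"},
    },
}

print(json.dumps(definition, indent=2))
```

The printed JSON could be supplied as the `--definition` of `aws stepfunctions create-state-machine`. Because `ResultPath` is `null` on each task, every step receives the original execution input; a real pipeline would more likely pass S3 locations of intermediate artifacts from step to step.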
This is a sample architecture of how my team and I build serverless microservice training pipelines, and how changes flow from git commits to production updates:
This architecture gave us the ability to extend, test, and deploy changes to pipeline components individually, frequently, and in small batches. Specialized error handling per component lets us catch errors on a small scale, address them quickly, and fix them efficiently.
Inference pipelines, the custom software solutions that use the trained models, are also built out of microservices. Our culture is serverless-first, and we use serverless services whenever possible. Managing our own EC2 instances or compute layer is always a last resort for us, but you can certainly build solutions out of traditional, self-managed microservices if needed.
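As an illustration of a single-purpose inference microservice, here is a minimal Lambda handler in Python. It assumes an API Gateway proxy-style event, where the request JSON arrives as a string in `event["body"]`, and it uses a hypothetical linear "model" in place of real trained artifacts, which you would typically load from storage such as S3.

```python
import json
from typing import Any, Dict, List

# Loaded once per execution environment, at import time, not on every request.
MODEL_WEIGHTS: List[float] = [0.5, -1.0, 2.0]  # hypothetical weights
MODEL_BIAS = 0.1  # hypothetical bias


def predict(features: List[float]) -> float:
    if len(features) != len(MODEL_WEIGHTS):
        raise ValueError(f"expected {len(MODEL_WEIGHTS)} features, got {len(features)}")
    return MODEL_BIAS + sum(w * x for w, x in zip(MODEL_WEIGHTS, features))


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    try:
        body = json.loads(event.get("body") or "{}")
        score = predict([float(x) for x in body["features"]])
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed input becomes a client error instead of crashing the service.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    return {"statusCode": 200, "body": json.dumps({"prediction": score})}


print(handler({"body": json.dumps({"features": [1, 2, 3]})}, None))
```

Keeping the model at module scope, outside `handler`, means warm invocations in the same execution environment reuse it instead of reloading it on every request.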
What is your approach to taking a machine learning project from research to production? Do you adopt a microservice architecture for ML pipelines? Let us know in the comments!
If you need help implementing cloud-native MLOps, Well-Architected production ML software solutions, or training/inference pipelines, want to monetize your ML models in production, have specific solution architecture questions, or would just like us to review your solution architecture and provide feedback based on your goals, contact us or send me a message, and we will be happy to help.
Subscribe to this blog: https://gradientgroup.ai/blog/
Follow me on LinkedIn: https://linkedin.com/in/carloslaraai