When it comes to creating business value, what is the real difference between data science and machine learning engineering?
Data science helps answer a specific question, such as “Why do we have an X% customer churn rate month over month?”
This is highly valuable because data scientists help shed light on the root cause of a business problem and propose potential solutions. However, it only yields a one-time result: the answer to the question at one point in time.
On the other hand, machine learning engineering builds a production system that delivers ongoing answers to recurring questions at scale, such as “What are the top N product recommendations for a given user?”
This is especially important when the answers to a recurring question change over time (see “Drift Monitoring for ML Models” to learn more).
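To make the contrast concrete, here is a minimal sketch with hypothetical function names and toy data (none of it comes from a real system). The first function computes a one-time answer, assuming churn rate is defined as customers lost during the month divided by customers at the start of the month (one common definition). The second function is a tiny stand-in for a recurring, production-style answer: given a user, it returns the top N items from an assumed in-memory table of precomputed user-item scores.

```python
import heapq


def monthly_churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """One-time, data-science-style answer: churn rate for a single month.

    Assumes churn = customers lost during the month / customers at month start.
    """
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


def top_n_recommendations(user_id: str, scores: dict, n: int = 3) -> list:
    """Recurring, production-style answer: top-N items for any user, on demand.

    `scores` is a hypothetical mapping of user_id -> {item_id: score}.
    """
    user_scores = scores.get(user_id, {})
    # heapq.nlargest returns the n (item, score) pairs with the highest scores
    best = heapq.nlargest(n, user_scores.items(), key=lambda kv: kv[1])
    return [item for item, _ in best]


if __name__ == "__main__":
    # Toy example data (hypothetical)
    scores = {"u1": {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.5}}
    print(monthly_churn_rate(1000, 50))               # prints 0.05
    print(top_n_recommendations("u1", scores, n=2))   # prints ['a', 'c']
```

In a real system, the second function would sit behind a service, read scores produced by a regularly retrained model, and be monitored for drift. That is where the ML engineering work described below comes in.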
Data science and ML engineering are not mutually exclusive; they are sequential. Data science first (foundation in dev), ML engineering second (scale in prod).
Top data scientists follow a methodology to systematically answer business questions. These are common steps:
- Break down the business problem/question into its components
- Identify the components that can be materially impacted and yield the highest ROI within a reasonable time frame
- Develop hypotheses for root-cause analysis
- Perform data analysis to quantify components and validate/invalidate hypotheses (this is usually a combination of SQL queries and domain expert interviews)
- Refine hypotheses and determine whether machine learning is the right solution to help address/solve the problem
- Convert the core business question into a machine learning question and execute an end-to-end ML workflow in a dev environment
- Present results, conclusions, and next steps to business leaders and stakeholders (statistics, plots, [engineered] features with the highest SHAP values, solution recommendations, etc.)
If the business question is recurring at scale, machine learning engineers will then take the output from data scientists (typically an end-to-end Jupyter notebook) and convert it into a scalable production software solution. These are common elements that ML engineers work on:
- Scalability, Extensibility, Modularity, & Testability
- Consistency & Reproducibility
- Logging & Monitoring
- [Serverless] Microservices Architecture
- Infrastructure-As-Code & Configuration Management
- Continuous Integration & Continuous Deployment/Delivery (CI/CD)
- Versioning & Rollback
- Fault Tolerance & Failure Recovery
- Containerization & Container Orchestration
- Model Deployment, Model Drift Monitoring, Model [Re]Training, Model Evaluation, Model Explainability
- Multi-Model Management & Model Registry
- Pipeline Metadata Management
- Experiment Tracking & Management
- Feature Store
Clearly, ML engineering = software engineering with a focus on ML (see “Why Software Engineering Is King In Enterprise ML & DE” to learn more).
Machine learning engineering extracts maximum business value from data science by realizing its full potential in production.
Data science and machine learning engineering are both important for an organization. Both teams work together to achieve a business outcome, aligned on the same mission.
When allocating headcount budget, ask yourself “Do we need data scientists or machine learning engineers?” and base the answer on product management’s top business priorities for DS/ML and the lifecycle stage of each project. Work with your tech lead to gain additional clarity and verify the teams’ needs (sometimes the answer may be “we need more data engineers”).
As a rule of thumb, have 2 machine learning engineers for every 1 data scientist on your team. Alternatively, build a team of “full-stack” ML engineers who can take on either DS or MLE stories/tasks based on priorities. My team loves the latter because it offers constant variety in the work and in skill acquisition.
What do you see as the biggest difference between data science and ML engineering? Comment below!
If you need help implementing AWS Well-Architected production machine learning solutions, training/inference pipelines, MLOps, or if you would like us to review your solution architecture and provide feedback, contact us or send me a message and we will be happy to help you.
Written by Carlos Lara, Director of Data Science & Machine Learning Engineering
Follow Carlos on LinkedIn: https://www.linkedin.com/in/carloslaraai/