Machine learning operations (MLOps) comprises the formal processes and requirements that govern activities within a data science project and facilitate its success. MLOps project management enables a team to motivate the project, define the problem, scope the work, and formalize communication. Efficient project execution, the focus of this article, reduces or eliminates common project failure modes through workflow, development, testing, and experimentation activities:
- Project workflows act as guardrails to help plan, organize, and implement data science projects.
- Project development structures the sequence of steps to generate software components.
- Project testing ensures code, data, and models perform as expected.
- Project experimentation verifies expected outcomes and discovers areas for improvement.
Project Workflow for MLOps
Implementing an MLOps workflow provides a framework for rapidly improving team and code efficiency, which benefits both team members and stakeholders. Team members benefit because the workflow requires code to be modular and replicable, which supports changes in project requirements. Stakeholders benefit because the workflow enables continual project evaluation and team collaboration.
Validate data
Initially, workflows validate data before subsequent training and prediction tasks. This requires carefully selecting data vendors in coordination with relevant internal departments such as procurement, legal, and IT. Additionally, external data sources should be thoroughly investigated for accuracy, cost, and security. Streamlining data evaluation and ingestion improves the validation process and helps expedite subsequent workflow steps.
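As a concrete illustration, an ingestion-time validation step can be sketched as a schema and range check. This is a minimal, hedged example: the field names, types, and the age range below are hypothetical, not part of any particular project's schema.

```python
# Minimal sketch of an ingestion-time data check, assuming records arrive
# as dicts. The schema and range rules here are hypothetical examples.
EXPECTED_SCHEMA = {"user_id": int, "age": int, "score": float}

def validate_record(record: dict) -> list[str]:
    """Return a list of validation errors for one record (empty if clean)."""
    errors = []
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"bad type for {field}: {type(record[field]).__name__}")
    # Example range rule: ages outside a plausible band are flagged.
    if isinstance(record.get("age"), int) and not (0 <= record["age"] <= 120):
        errors.append("age out of range")
    return errors
```

Running such checks before training or prediction tasks means downstream steps only ever see records that passed validation.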
Automate training jobs
A robust data engineering pipeline should be incorporated into the model training pipeline to ensure that feature creation, transformation, and manipulation are consistent. Additionally, the training pipeline should assess different algorithms and their associated parameters to determine the best-performing model.
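The algorithm-and-parameter assessment can be sketched as a simple search loop that trains each candidate and keeps the best scorer. The toy threshold classifier below is a stand-in for real estimators from your ML library of choice; only the selection pattern is the point.

```python
# Hedged sketch of model selection inside a training pipeline. The
# "models" are toy threshold classifiers; in practice these would be
# real estimators swept over an algorithm/parameter grid.
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def threshold_model(threshold):
    return lambda xs: [1 if x >= threshold else 0 for x in xs]

def select_best(train_x, train_y, thresholds):
    """Evaluate each candidate parameter and keep the best scorer."""
    best = None
    for t in thresholds:
        preds = threshold_model(t)(train_x)
        score = accuracy(preds, train_y)
        if best is None or score > best[1]:
            best = (t, score)
    return best  # (best_parameter, best_score)
```

Automating this loop as a pipeline stage ensures every training run evaluates candidates the same way instead of relying on ad hoc manual comparisons.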
Track experiments
Data science projects often require many experiments to obtain a target model for production. To track these experiments properly, all software and hardware dependencies must be recorded. The data and models within experiments can be managed with a feature store and model registry, respectively, which ensures they can be reused for additional experimentation and eventually transitioned to a production environment. Finally, the downstream consumption of results must be measured to understand how and where they support key performance indicators (KPIs).
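The dependency-recording idea can be sketched as an append-only run log that snapshots the software and platform context alongside each experiment's parameters and metrics. This is a minimal local-file sketch; real projects would use a dedicated tracking server or model registry rather than this hypothetical helper.

```python
import json
import platform
import sys
import time

# Hedged sketch of experiment tracking: one JSON line per run, capturing
# parameters, metrics, and the software/hardware context they ran under.
def log_experiment(path, params, metrics):
    record = {
        "timestamp": time.time(),
        "python": sys.version.split()[0],   # software dependency snapshot
        "platform": platform.platform(),    # hardware/OS context
        "params": params,
        "metrics": metrics,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Because each line is self-describing, any past run can later be compared or reproduced without guessing which environment produced it.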
Project Development for MLOps
Data science development creates and assembles all software components while enforcing proper guidelines for code style and best practices. The development team is encouraged to follow cost and resource guidelines to ensure high-quality software is produced while remaining on time and on budget. Integrating security assessments as a part of the development process reduces the risk and maintenance burden for downstream integration, testing, deployment, monitoring, and maintenance activities and is critical for long-term project success.
Select relevant tools
Development activities should utilize tools that support the project definition and scope. Determining which tools are permitted by the production environment, customer, or industry before development commences helps limit the possibility of future integration failures. In addition, it is beneficial for teams to establish standard criteria for evaluating and selecting tools, both to reduce the time spent between planning and development and to streamline the transition of team members between projects. Tool-selection criteria should prioritize flexible tools that can accommodate changes to project requirements.
Create maintainable code
It is imperative that team members create maintainable code during development to ease the response to changing project requirements. Using identical, reproducible technical components in development and production environments, such as containers or virtual machines, helps prevent introducing system-level bugs. Additionally, following language-specific best practices for project structure, syntax, and code comments/docstrings assists with code reviews, personnel changes, and refactoring.
Ensure data availability and accuracy
A prerequisite for successful code development is to ensure data availability and accuracy. Providing the ability to view and extract metadata allows developers to test their code with different variations of data inputs. Furthermore, storing and versioning artifacts from data processing in a feature store gives both developers and data scientists the ability to leverage identical, validated inputs, which helps verify downstream results. Data access should be governed by proper access controls so that datasets can be reused securely.
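The "identical, validated inputs" property can be sketched with content-addressed versioning: identical feature sets always resolve to the same version id, so a developer and a data scientist referencing that id are guaranteed the same data. This is an assumption about how such a store might work internally, not any specific product's API.

```python
import hashlib
import json

# Hedged sketch of a versioned feature store keyed by content hash.
# Identical inputs always hash to the same version id, so consumers
# referencing a version id get byte-identical features.
class FeatureStore:
    def __init__(self):
        self._versions = {}

    def put(self, features: dict) -> str:
        """Store a feature set and return its content-derived version id."""
        payload = json.dumps(features, sort_keys=True).encode()
        version = hashlib.sha256(payload).hexdigest()[:12]
        self._versions[version] = features
        return version

    def get(self, version: str) -> dict:
        return self._versions[version]
```

Pinning pipeline runs to a version id rather than a mutable table name is what makes downstream results verifiable after the fact.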
Leverage source control
All data science projects should leverage source control to version code, data, and model artifacts, which ensures the versions of all dependencies are identical in development and production environments. Backing up data and models as a part of the CI/CD pipeline must be accomplished prior to deployment. In addition, project managers should instill in team members the importance of properly maintaining all project artifacts.
Project Testing for MLOps
Testing data science components gives team members and stakeholders confidence that the project can achieve the target business objectives. Testing also enhances the quality of the code and the results produced during the project. Critically, testing catches most errors before code reaches a production environment.
Write unit tests
Writing robust unit tests is the first step to ensure a data science solution is thoroughly tested. Implementing a variety of error-detection tests for data-validation methods establishes a high benchmark for data quality, which enhances all downstream processing. Testing feature-creation methods on a representative sample of input data and verifying that model training and inferencing methods support edge cases help ensure deployed models are reliable and will perform well.
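A unit test for a feature-creation method, including the edge cases noted above, might look like the following. The `normalize` helper is hypothetical and defined here purely so the test has something concrete to exercise; the pattern of asserting both normal and degenerate inputs is the point.

```python
import math

# Hedged example of unit-testing a hypothetical feature-creation helper,
# covering normal input, a constant column, and invalid edge cases.
def normalize(values):
    """Scale values to [0, 1]; empty input and NaN are treated as errors."""
    if not values or any(math.isnan(v) for v in values):
        raise ValueError("invalid input")
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def test_normalize():
    assert normalize([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
    assert normalize([3.0, 3.0]) == [0.0, 0.0]      # constant column edge case
    for bad in ([], [1.0, float("nan")]):           # invalid-input edge cases
        try:
            normalize(bad)
            raise AssertionError("expected ValueError")
        except ValueError:
            pass
```

Tests of this shape would typically live in a test suite run by a framework such as pytest on every commit.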
Perform integration and stress tests
Integration and stress tests should be performed on the pipeline before transitioning code or models to production to confirm they can scale for additional data and models. The team must ensure every stage of the pipeline is tested individually and collectively. In addition, it is necessary to determine the computation requirements for model training and inferencing for a variety of input sizes. Manually reviewing results once a data science pipeline is established in a production environment allows the project team to gauge if the automated testing is sufficient.
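Determining computation requirements across input sizes can be sketched as a timing sweep over growing batches. The `run_pipeline` function below is a placeholder stand-in for the real training or inference entry point, so the numbers it produces are illustrative only.

```python
import time

# Hedged sketch of a stress test: measure end-to-end latency across
# growing input sizes to estimate how the pipeline scales.
def run_pipeline(batch):
    return [x * 2 for x in batch]  # placeholder for the real workload

def stress_profile(sizes):
    """Return {input_size: elapsed_seconds} for each requested size."""
    profile = {}
    for n in sizes:
        batch = list(range(n))
        start = time.perf_counter()
        run_pipeline(batch)
        profile[n] = time.perf_counter() - start
    return profile
```

Plotting or tabulating the resulting profile shows whether latency grows roughly linearly with input size or degrades sharply, which informs capacity planning before production.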
Validate models
Models need to be validated before transitioning them into production by evaluating their relevance and correctness with sample input data. It is also important to test for model staleness in production, which requires continuously monitoring bias, variance, and drift. Non-functional requirements such as security, fairness, and interpretability are additional testing considerations.
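A staleness monitor can be sketched as a simple drift heuristic: flag a feature when the mean of live values moves more than a few standard deviations from the training baseline. This is one basic heuristic among many, not a complete statistical drift test.

```python
from statistics import mean, stdev

# Hedged sketch of a drift check: compare the live mean of a feature
# against its training baseline, in units of training standard deviations.
def drifted(train_values, live_values, threshold=3.0):
    """Return True when live data has shifted beyond `threshold` sigmas."""
    mu, sigma = mean(train_values), stdev(train_values)
    if sigma == 0:
        return mean(live_values) != mu
    return abs(mean(live_values) - mu) / sigma > threshold
```

Running such a check on a schedule against production inputs gives an early, cheap signal that a model may be going stale before accuracy metrics visibly degrade.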
Project Experimentation for MLOps
Experimentation is a critical activity in data science projects that tests hypotheses, challenges assumptions, and compiles performance metrics to determine how well models align with business objectives. As a precursor to transitioning models to production, these activities identify model infrastructure requirements such as performance and scalability. Also, experiments serve to gauge the accuracy and interpretability of model predictions by using real-world production data.
Evaluate existing algorithms
Active data science projects likely have existing algorithms that should first be evaluated to determine a basis for experimentation. Algorithm complexity is a primary element to explore; it is important to build simple models first to avoid models that are too intricate for the target use case or disallowed for compliance reasons. Another essential experiment is to select and tune hyperparameters for existing algorithms to ensure they operate with the best configuration possible. Incorporating KPI targets when evaluating model performance metrics keeps experiments focused on improving business objectives.
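The "simple models first" principle can be made operational with a trivial baseline that any candidate must beat before its extra complexity is justified. The majority-class baseline and the acceptance margin below are illustrative choices, not prescribed values.

```python
from collections import Counter

# Hedged sketch of "simple models first": a majority-class baseline that
# a candidate model must beat by a margin before adopting it.
def majority_baseline(labels):
    """Return a model that always predicts the most common training label."""
    most_common = Counter(labels).most_common(1)[0][0]
    return lambda _x: most_common

def beats_baseline(candidate_acc, labels, margin=0.02):
    baseline = majority_baseline(labels)
    base_acc = sum(baseline(None) == y for y in labels) / len(labels)
    return candidate_acc >= base_acc + margin
```

Gating candidates this way keeps experiments honest: a complex model that cannot clearly outperform the baseline is not worth its maintenance and compliance cost.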
Explore and compare
Existing data sources or algorithms can become degraded or obsolete, which necessitates exploring alternative solutions. It is important to compare model performance with different combinations of input features when evaluating new algorithms. In addition, investigating ensemble learning can enhance performance metrics by combining the best aspects of multiple models.
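The simplest form of ensemble learning, averaging several models' predictions, can be sketched in a few lines. The `models` here are stand-in callables for illustration; real ensembles would combine trained estimators, possibly with learned weights.

```python
# Hedged sketch of an averaging ensemble: combining predictions by mean
# often reduces variance relative to any single model.
def ensemble_predict(models, x):
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)
```

Comparing the ensemble's performance metrics against each member model's is itself an experiment worth tracking, since averaging only helps when the members make partially independent errors.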
Share insights and risks
Although conducting initial experiments is important for identifying a good algorithm, the overall goal of experimentation is to compare approaches continuously to ascertain the best one. This can be accomplished by setting a cadence for delivering insights so that results arrive on time and fulfill business objectives. To reduce bottlenecks and achieve better solutions, experimenters should share insights early and often, offer feedback, spark new ideas, and gauge progress. Finally, if experimentation yields beneficial results, it is essential to evaluate the operational risks associated with promoting the workflow to production.
Investing in Efficiency
Efficient data science project execution requires proper implementation of workflows, development, testing, and experimentation. Workflows validate data before training and prediction tasks, automate training jobs, and track experiments. Development activities utilize tools that support the project definition and scope, create maintainable code, ensure data availability and accuracy, and implement source control to version code, data, and model artifacts. Testing requires writing comprehensive unit tests, testing for model staleness and correctness, and performing integration and stress tests on the pipeline to ensure it can scale for additional data and models. Experimentation evaluates existing algorithms, explores alternative data sources and algorithms, and compares approaches to ascertain the best one. Together, these activities are an investment in reliable and established processes, which yields business value by enabling a project team to streamline work and rapidly deliver value to customers.
Benjamin Johnson, Ph.D.
DevIQ, May 2023
This article is part of DevIQ's series on MLOps.