There is a plethora of open-source and paid ML tools available in the market to implement MLOps in your project or organisation. In this article, we will discuss how to choose a tool that is ideal for you. Let us explore the following tools in detail and understand how to implement them.
Before we dive in to understand the tools, let us look at the key phases of Machine Learning Operations.
The key phases of MLOps are
Data gathering
Data analysis
Data transformation/preparation
Model training & development
Model validation
Model serving
Model monitoring
Model re-training
Some of the well-known tools available are open-source tools like MLflow, DVC, TensorBoard, Guild AI, Kubeflow, etc., and some are paid tools like Neptune, Comet, Valohai, SageMaker, etc.
How to evaluate an experiment tracking tool?
There is no one answer to the question “what is the best experiment tracking tool?”. We need to identify our requirements and evaluate each tool's functionality against them. While selecting, we have to consider whether the tool covers all the MLOps tasks we care about:
Data and Pipeline Versioning: Version control for datasets, features and their transformations
Model and Experiment Versioning: Tracking candidate model architectures and the performance of model training runs
Hyperparameter Tuning: Systematic optimization of hyperparameter values for a given model
Model Deployment and Monitoring: Managing which model is deployed in production and tracking its ongoing performance
While selecting the tool, a Data Scientist or a Researcher should consider checking the following
If the tool comes with a web UI or it is console-based
If they can integrate the tool with preferred model training frameworks
What metadata can be logged, displayed, and compared (code, text, audio, video, etc.)
Can they easily compare multiple runs? If so, in what format – only table, or also charts
If organizing and searching through experiments are user-friendly
If metadata structures and dashboards can be customized
How easy it is to collaborate with other team members – is it just about sharing a link to the experiment, or do screenshots have to be used as a workaround?
As an ML Engineer, you should check if the tool lets you
Easily reproduce and re-run experiments
Track and search through experiment lineage (data/models/experiments used downstream)
Save, fetch, and cache datasets for experiments
Integrate it with your CI/CD pipeline
Easily collaborate and share work with your colleagues
An ML team lead should look at
General business-related aspects like pricing model, security, and support
How much infrastructure the tool requires, how easy it is to integrate it into your current workflow
Is the product delivered as commercial software, open-source software, or a managed cloud service?
What collaboration, sharing, and review features it has
Let us look at some of the open source MLOps tools –
1. MLflow
MLflow is an open-source platform that helps manage the whole machine learning lifecycle. This includes experimentation, model storage, reproducibility, and deployment. Each of these four elements is represented by MLflow components.
MLflow Tracking – An API and UI for logging parameters, code versions, metrics, and artifacts while running machine learning code, and for later comparing and visualizing the results
MLflow Projects – Packages ML code in a reusable, reproducible form so it can be shared with other data scientists or transferred to production
MLflow Models – Managing and deploying models from different ML libraries to a spread of model serving and inference platforms
MLflow Model Registry – A central model store to collaboratively manage the complete lifecycle of an MLflow Model, including model versioning, stage transitions, and annotations
The MLflow Tracking component consists of an API and UI that support logging various metadata (including parameters, code versions, metrics, and output files) and later visualizing the results.
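For illustration, a minimal sketch of logging a run with the MLflow Tracking API (the parameter names, values, and file name are placeholders):

import mlflow

# Each run groups the parameters, metrics, and artifacts of one experiment
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)   # a hyperparameter
    mlflow.log_metric("accuracy", 0.92)       # an evaluation metric
    mlflow.log_artifact("model.pkl")          # an output file produced by the run

Running mlflow ui afterwards serves the Tracking UI, where these runs can be compared and visualized.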
Main advantages of MLflow are
Focus on the whole lifecycle of the machine learning process
Strong and big community of users that provides community support
Open interface that can be integrated with any ML library or language
MLflow Projects provides great tools for reproducibility, extensibility, and experimentation
MLflow can work with any ML library, algorithm, deployment tool or language
2. Kubeflow
Kubeflow is a full-fledged open source MLOps tool that makes the orchestration and deployment of Machine Learning workflows easier. Kubeflow provides dedicated services and integration for various phases of Machine Learning, including training, pipeline creation, and management of Jupyter notebooks. It integrates with various frameworks and also handles TensorFlow training jobs easily.
The Kubeflow project is dedicated to making deployments of ML workflows on Kubernetes simple, portable, and scalable. It provides components for each stage in the ML lifecycle, from exploration to training and deployment.
Kubeflow takes advantage of multiple cloud native technologies including Istio, Knative, and Tekton. It leverages core Kubernetes primitives such as storage classes, deployments, services and custom resources. With Istio and Knative, Kubeflow gets the capabilities such as traffic splitting, blue/green deployments, canary releases, scale to zero, and auto-scaling. Tekton brings the ability to build images natively within the platform.
The key advantage of using Kubeflow is that it hides away the complexity involved in containerizing the code required for data preparation, training, tuning, and deploying machine learning models.
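As a sketch of what this looks like in practice, here is a minimal pipeline defined with the Kubeflow Pipelines SDK (kfp, v2-style API; the component logic and names are placeholders):

from kfp import dsl, compiler

@dsl.component
def train(learning_rate: float) -> float:
    # Placeholder for real training logic; each component runs in its own container
    return 1.0 / learning_rate

@dsl.pipeline(name="demo-pipeline")
def demo_pipeline(learning_rate: float = 0.01):
    train(learning_rate=learning_rate)

# Compile to a YAML spec that Kubeflow Pipelines can execute on Kubernetes
compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")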
Main advantages
A user interface (UI) for managing and tracking experiments, jobs, and runs.
An end-to-end open-source platform
Built-in Notebook server service
3. Data Version Control (DVC)
DVC is an open-source tool, written in Python, for Data Science and Machine Learning projects. It adopts a Git-like model to provide management and versioning of datasets and machine learning models. DVC is a simple command-line tool that makes machine learning projects shareable and reproducible.
DVC Studio is part of the DVC group of tools powered by iterative.ai. DVC Studio provides a visual interface for ML projects, created to help users track experiments, visualize them, and collaborate on them with the team.
The DVC Studio application can be accessed online or hosted on-premises.
The platform has been built to make ML models shareable and reproducible. DVC is designed to handle large files, datasets, machine learning models, metrics, and code. DVC can version and organize huge amounts of information and store it in a well-organized, accessible way. It emphasizes data and pipeline versioning and management.
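Datasets tracked this way can also be fetched programmatically; a minimal sketch using DVC's Python API (the repository URL, file path, and revision are hypothetical):

import dvc.api

# Open a specific version of a DVC-tracked dataset, pinned to a Git revision
with dvc.api.open(
    "data/train.csv",                            # path tracked by DVC
    repo="https://github.com/example/project",   # project repository
    rev="v1.0",                                  # Git tag/commit = data version
) as f:
    header = f.readline()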
Main advantages
DVC Studio is a visual interface that can be connected to GitHub, GitLab, or Bitbucket
It extracts metadata (model metrics and hyperparameters) from JSON files and presents them in a nice UI
Applies existing software engineering stack for ML teams.
Possibility to use different types of storage – it’s storage agnostic
Reproducibility, by consistently maintaining the combination of input data, configuration, and the code that was originally used to run an experiment
It connects ML steps into a DAG and runs the full pipeline end-to-end
4. Guild AI
Guild AI is an experiment tracking system for machine learning. It is equipped with features that let you analyze, visualize, and diff runs, automate pipelines, tune hyperparameters with AutoML, and do scheduling, parallel processing, and remote training.
Guild AI also comes with multiple integrated tools for comparing experiments, such as:
Guild Compare – A curses-based application that lets you view runs in a spreadsheet format, including flags and scalar values
Guild View – A web-based application that lets you view runs and compare results
Guild Diff – A command that lets you compare two runs
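Because Guild AI requires no code changes, a plain script is enough to track; in this sketch, Guild treats the module-level globals as tunable flags and parses "key: value" output lines as scalars (all values are placeholders):

# train.py – no Guild imports needed
learning_rate = 0.01
epochs = 10

loss = 1.0 / (learning_rate * epochs)  # stand-in for real training
print(f"loss: {loss}")                 # captured by Guild as a scalar

A run such as guild run train.py learning_rate=0.05 then records the flags and outputs for later comparison.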
Main advantages
No need to change the code, it runs scripts written in any language or framework
Doesn’t require additional software or systems like databases or containers
Strong and big community of users that provide community support
It helps in run analysis, visualization, and diffing, pipeline automation, hyperparameter tuning with AutoML, and scheduling
It uses parallel processing for faster training of models
5. Sacred + Omniboard
Sacred is open-source software that allows machine learning researchers to configure, organize, log, and reproduce experiments. It is designed to do all the tedious overhead work that you need to do around your actual experiment in order to:
Keep track of all the parameters of your experiment
Easily run your experiment for different settings
Save configurations for individual runs in a database
Reproduce your results
Sacred doesn’t come with its own UI, but there are a few dashboarding tools that you can connect to it, such as Omniboard.
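A minimal sketch of a Sacred experiment; values declared in the config function are captured automatically, and the MongoObserver stores each run in MongoDB, where a dashboard like Omniboard can read it (the experiment name and values are placeholders):

from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment("demo")
ex.observers.append(MongoObserver())  # saves runs to a local MongoDB

@ex.config
def config():
    learning_rate = 0.01  # tracked automatically as a parameter
    epochs = 10

@ex.automain
def run(learning_rate, epochs):
    return learning_rate * epochs  # stand-in for real training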
Main advantages
Possibility to connect it to the preferred UI.
Possibility to track any model training developed with any Python library
Extensive experiment parameter customization options
6. Pachyderm
Pachyderm is an enterprise-grade, open-source data science platform that makes it possible for its users to control an end-to-end machine learning cycle, from data lineage, through building and tracking experiments, to scalability options.
The software is available in three different versions:
Community – free and source-available version of Pachyderm built and backed by a community of experts;
Enterprise Edition – a complete version-controlled platform that can be deployed on the Kubernetes infrastructure of users’ choice;
Hub Edition – Hosted and managed version of Pachyderm.
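As a sketch of Pachyderm's data-versioning model, using the python_pachyderm client against a locally reachable cluster (the repo and file names are hypothetical):

import python_pachyderm

client = python_pachyderm.Client()   # defaults to a local cluster
client.create_repo("images")

# Every write happens inside a commit, so the data itself is versioned
with client.commit("images", "master") as commit:
    client.put_file_bytes(commit, "/hello.txt", b"hello pachyderm")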
Main advantages
Possibility to adapt the software version to your own needs
Established and backed by a strong community of experts
Pachyderm can efficiently schedule massively parallel workloads
Incremental Processing: Pachyderm understands how your data has changed and is smart enough to only process the new data
Version Control: Pachyderm version controls your data as it’s processed. You can always ask the system how data has changed, see a diff, and, if something doesn’t look right, revert
7. TensorBoard
TensorBoard is the visualization toolkit for TensorFlow, so it’s often the first choice of TensorFlow users. TensorBoard offers a suite of features for the visualization and debugging of machine learning models. Users can track experiment metrics like loss and accuracy, visualize the model graph, project embeddings to a lower-dimensional space, and much more.
TensorBoard provides the visualization and other features for machine learning experimentation:
Tracking and visualizing metrics such as loss and accuracy
Visualizing the model graph (ops and layers)
Viewing histograms of weights, biases, or other tensors as they change over time
Projecting embeddings to a lower dimensional space
Displaying images, text, and audio data
Profiling TensorFlow programs
There’s also TensorBoard.dev that lets you upload and share your ML experiment results.
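A minimal sketch of logging scalars for TensorBoard with the TensorFlow 2 summary API (the log directory and values are placeholders):

import tensorflow as tf

writer = tf.summary.create_file_writer("logs/run1")
with writer.as_default():
    for step in range(100):
        loss = 1.0 / (step + 1)                    # stand-in for a training loss
        tf.summary.scalar("loss", loss, step=step)

Launching tensorboard --logdir logs then renders the metrics in the browser.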
Main advantages
Well-developed features related to working with images, e.g. TensorBoard’s Projector that allows you to visualize any vector representation like word embeddings and images
The What-If Tool (WIT), that’s an easy-to-use interface for expanding understanding of black-box classification and regression ML models
Strong and big community of users that provide community support
8. ClearML
ClearML is an open-source platform, a suite of tools to streamline your ML workflow, supported by the team behind Allegro AI. The suite includes model training logging and tracking, ML pipelines management and data processing, data management, orchestration, and deployment.
All these features are reflected in 5 ClearML modules
ClearML – Python package for integrating ClearML into your existing code base
ClearML Server – stores experiment, model, and workflow data, and supports the Web UI experiment manager
ClearML Agent – the MLOps orchestration agent, enabling experiment and workflow reproducibility and scalability
ClearML Data – provides data management and versioning on top of file systems/object storage
ClearML Session – allows you to launch remote instances of Jupyter Notebooks and VSCode
ClearML is integrated with many frameworks and libraries, including model training, hyperparameter optimization, and plotting tools, as well as storage solutions.
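A minimal sketch of instrumenting a script with the ClearML Python package (the project, task name, and values are placeholders):

from clearml import Task

# Registers the run with the ClearML Server and enables auto-logging
task = Task.init(project_name="demo", task_name="experiment-1")

task.connect({"learning_rate": 0.01, "epochs": 10})  # log hyperparameters
task.get_logger().report_scalar(
    title="loss", series="train", value=0.42, iteration=1
)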
Main advantages
ClearML Web UI that lets you track and visualize experiments
An option to work with tasks in Offline Mode, in which all information is saved in a local folder
Multiple users collaboration enabled by the ClearML Server
Comparison of open source tools available
We can see that MLflow and Pachyderm focus on the entire MLOps lifecycle, while TensorBoard and Guild AI focus on experiment management. This is one of the most important criteria for selecting an MLOps tool. Pachyderm and Kubeflow are not easy to integrate and are not lightweight.
The main focus of Data Version Control (DVC) is data, model, and code versioning. MLflow and TensorBoard do not provide this feature. We can log artifacts in all the tools except TensorBoard. Logging audio and video is either limited or not available across these tools. Resource monitoring is available in TensorBoard, Kubeflow, and Pachyderm. This comparison is based on experiment tracking features.
On the UI side, only Pachyderm and Kubeflow provide user management options. Features like view sharing and run comparison are provided by most of them. Customizable dashboards are still a feature that is limited or not provided by these tools. Reports are only provided by Pachyderm and Kubeflow.
All the above tools can be deployed on-premise. All tools except TensorBoard can fetch experiments via an API. Pachyderm, Kubeflow, and DVC are scalable to millions of runs and have dedicated user support.
While selecting a tool, we should also review the available integrations and check whether the packages used in the model are supported.
Some of the paid tools available in the market are
1. Neptune
Neptune is a metadata store for any MLOps workflow. It was built for both research and production teams that run a lot of experiments. It lets you monitor, visualize, and compare thousands of ML models in one place. Neptune supports experiment tracking, model registry, and model monitoring and it’s designed in a way that enables easy collaboration.
Users can create projects within the app, work on them together, and share UI links with each other (or even with external stakeholders). All this functionality makes Neptune the link between all members of the ML team. Neptune is available in the cloud version and can be deployed on-premise. It’s also integrated with 25+ other tools and libraries, including multiple model training and hyperparameter optimization tools.
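A minimal sketch with the neptune client (the project name is a placeholder, and the API token is assumed to be configured in the environment):

import neptune

run = neptune.init_run(project="my-workspace/demo")  # placeholder project
run["parameters/learning_rate"] = 0.01               # free-form metadata tree
run["train/loss"].append(0.42)                       # a series of values
run.stop()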
Main advantages
Possibility to log and display all metadata types including parameters, model weights, images, HTML, audio, video etc
Flexible metadata structure that allows you to organize training and production metadata the way you want to
Easy to navigate web UI that allows you to compare experiments and create customized dashboards
2. Weights & Biases
It is a machine learning platform built for experiment tracking, dataset versioning, and model management. For the experiment tracking part, its main focus is to help Data Scientists track every part of the model training process, visualize models and compare experiments.
It is also available in the cloud and as an on-premise tool. In terms of integrations, Weights & Biases supports multiple other frameworks and libraries, including Keras, PyTorch, TensorFlow, Fastai, Scikit-learn, and more.
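A minimal sketch with the wandb client (the project name and values are placeholders):

import wandb

run = wandb.init(project="demo", config={"learning_rate": 0.01, "epochs": 10})
for epoch in range(run.config.epochs):
    wandb.log({"loss": 1.0 / (epoch + 1)})  # stand-in training metric
run.finish()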
Main advantages
A user-friendly and interactive dashboard that is a central place of all experiments in the app. It allows users to organize and visualize results of their model training process.
Hyperparameter search and model optimization
Diffing of logged datasets
3. Comet
Comet is an ML platform that helps data scientists track, compare, explain and optimize experiments and models across the model’s entire lifecycle, i.e. from training to production. In terms of experiment tracking, data scientists can register datasets, code changes, experimentation history, and models.
Comet is available for teams, individuals, academics, organizations, and anyone who wants to easily visualize experiments, facilitate work and run experiments. It can be used as a hosted platform or deployed on-premise.
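A minimal sketch with the comet_ml client (the API key is assumed to be configured, and the project name and values are placeholders):

from comet_ml import Experiment

experiment = Experiment(project_name="demo")    # API key read from config/env
experiment.log_parameter("learning_rate", 0.01)
experiment.log_metric("loss", 0.42, step=1)
experiment.end()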
Main advantages
Extensive comparison features—code, hyperparameters, metrics, predictions, dependencies, system metrics and more
Dedicated modules for vision, audio, text, and tabular data that allow for easy identification of issues with the dataset
Fully-customizable experiment table within the web-based user interface
4. Polyaxon
Polyaxon is a platform for reproducible and scalable machine learning and deep learning applications. It includes a wide range of features from tracking and optimization of experiments to model management, run orchestration, and regulatory compliance. The main goal of its developers is to maximize the results and productivity while saving costs.
In terms of experiment tracking, Polyaxon allows you to automatically record key model metrics, hyperparameters, visualizations, artifacts and resources as well as version control code and data. To later display the logged metadata, you can use Polyaxon UI or integrate it with another board, e.g. TensorBoard.
Polyaxon can be deployed on-premise or on a cloud provider of your choice. It also supports major ML and DL libraries, such as TensorFlow, Keras, or Scikit-learn.
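A minimal sketch of in-cluster tracking with the polyaxon client (this assumes the code runs inside a Polyaxon-managed job; values are placeholders):

from polyaxon import tracking

tracking.init()                           # attaches to the current run
tracking.log_inputs(learning_rate=0.01)   # hyperparameters
tracking.log_metrics(loss=0.42, step=1)   # training metrics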
Main advantages
Polyaxon UI that’s represented by the Runs Dashboard with visualization capabilities, collaboration features, and extendable interface
Collaboration features and project management tools
Scalable solution – Offers different plans from open source to cloud and enterprise
5. Valohai
Valohai is an MLOps platform that automates everything from data extraction to model deployment. The team behind this tool says that Valohai “offers Kubeflow-like machine orchestration and MLflow-like experiment tracking without any setup”. Although experiment tracking is not the main focus of this platform, it provides some functionality such as experiments comparison, version control, model lineage, and traceability.
Valohai is compatible with any language or framework and also many different tools and apps. It can be set up on any cloud vendor or in an on-premise setup. The software is also teamwork-oriented and has many features that facilitate it.
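One documented Valohai pattern is emitting metadata as JSON lines on standard output, which the platform collects per execution; a minimal sketch (values are placeholders):

import json

for epoch in range(10):
    loss = 1.0 / (epoch + 1)   # stand-in for real training
    # Valohai parses JSON objects printed to stdout as run metadata
    print(json.dumps({"epoch": epoch, "loss": loss}))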
Main advantages
Significant acceleration of the model building process
Focused on the entire lifecycle of machine learning
Since it’s a platform built mainly for enterprises, privacy and security are their driving principles
6. Verta.ai
Verta is an enterprise MLOps platform. Its main features can be summarized in four words: track, collaborate, deploy and monitor. These functionalities are reflected in Verta’s main products: Experiment Management, Model Registry, Model Deployment, and Model Monitoring. The software has been created to facilitate the management of the entire machine learning lifecycle.
The Experiment Management component allows you to track and visualize ML experiments, log various metadata, search through and compare experiments, ensure model reproducibility, collaborate on ML projects within a team, and much more.
Verta supports many popular ML frameworks including TensorFlow, PyTorch, XGBoost, ONNX and more. It is available as an open-source service, SaaS and Enterprise.
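A minimal sketch with the verta client (the host, project, and values are placeholders):

from verta import Client

client = Client("https://app.verta.ai")   # placeholder host
proj = client.set_project("Demo Project")
expt = client.set_experiment("Baseline")
run = client.set_experiment_run("run-1")
run.log_hyperparameter("learning_rate", 0.01)
run.log_metric("accuracy", 0.92)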
Main advantages
Possibility to build customizable dashboards and visualize the modeling results
Collaboration features and user management
Scalable solution that covers multiple steps of the MLOps pipeline
7. SageMaker Studio
SageMaker Studio is part of the AWS platform. It allows data scientists and developers to prepare, build, train, and deploy high-quality machine learning (ML) models. It claims to be the first integrated development environment (IDE) for ML. It has four components: prepare, build, train & tune, deploy & manage. The experiment tracking functionality is covered by the third one, train & tune. Users can log, organize and compare experiments, automate hyperparameter tuning, and debug training runs.
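A minimal sketch with the experiments module of the SageMaker Python SDK (experiment and run names are placeholders, and AWS credentials are assumed to be configured):

from sagemaker.experiments import Run

with Run(experiment_name="demo-experiment", run_name="run-1") as run:
    run.log_parameter("learning_rate", 0.01)
    run.log_metric(name="loss", value=0.42, step=1)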
Main advantages
Built-in debugger and a profiler that lets you identify and reduce training errors and performance bottlenecks
Possibility to track thousands of experiments
Integration with a wide range of Amazon tools for ML related tasks
After comparing all the available tools, we can see that different tools offer different features; users can make a checklist of required features and then select the tool that matches best.
Paid tools like Neptune offer more features than open-source tools; for example, they can also log audio and video files, which none of the open-source platforms provide. Neptune also provides a flexible metadata structure that allows us to organize training and production metadata the way we want.
Most paid tools provide a feature to create customized dashboards, which helps in faster tracking of experiments. Valohai helps in automating everything from data extraction to model deployment.
Tools like Neptune, Comet, and Weights & Biases provide more integration options than most open-source tools.
In conclusion, the range of available options is broad and diverse. Whether you are looking for an open-source or an enterprise solution, there are many options for an experiment tracking framework.
Check out the earlier articles in this series to understand how to install MLflow and implement MLOps using it.