Features are crucial in helping machine learning models process and understand data, both during training and in production. Feature extraction and storage are among the most important yet often overlooked aspects of machine learning solutions. When building a single machine learning model, feature extraction may seem straightforward, but it becomes complicated as teams scale.
In a large organization with dozens of data science teams building machine learning models, each team would otherwise need to process its own datasets and extract the corresponding features, which is computationally expensive and nearly impossible to scale. A feature store reduces this redundant computation.
Another key challenge for high-performing machine learning teams is building mechanisms around reusable features. In this context, feature stores are becoming prevalent in modern machine learning solutions. A feature store serves as a repository of features that can be used for the training and evaluation of machine learning models. In this article, we will discuss Feast, an open source feature store for machine learning models. It abstracts many of the fundamental building blocks of feature extraction, transformation, and discovery that are omnipresent in machine learning applications.
Importance of Feast as a Feature Store
Models need consistent access to data
Machine Learning (ML) systems built on traditional data infrastructure are often coupled to databases, object stores, streams, and files. As a result of this coupling, any change in data infrastructure may break dependent ML systems. Feast decouples your models from your data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval. It also provides a consistent means of referencing feature data for retrieval, ensuring that models remain portable when moving from training to serving.
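As a rough sketch of what this data access layer looks like in code, the snippet below retrieves features purely by name through a FeatureStore object. The repository path, the driver_hourly_stats feature view, and the driver_id entity are illustrative placeholders, and the API names follow recent Feast releases; verify them against your installed version.

```python
from feast import FeatureStore

# Connect to a feature repository; the path is illustrative and points to
# the directory containing feature_store.yaml.
store = FeatureStore(repo_path=".")

# Features are referenced by name ("<feature_view>:<feature>") rather than by
# the database, stream, or file that produced them.
feature_refs = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:avg_daily_trips",
]

# Serving-time lookup goes through the same abstraction, so swapping the
# underlying storage does not change the model code.
online_features = store.get_online_features(
    features=feature_refs,
    entity_rows=[{"driver_id": 1001}],
).to_dict()
print(online_features)
```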
Deploying new features into production is difficult
Members of ML teams may have different objectives. Data scientists, for example, aim to deploy features into production as soon as possible, while engineers want to ensure that production systems remain stable. These differing objectives can create organizational friction that slows time-to-market for new features. Feast addresses this friction by providing both a centralized registry to which data scientists can publish features and a battle-hardened serving layer. Together, they enable non-engineering teams to ship features into production with minimal oversight.
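To make this concrete, here is a minimal sketch of what publishing a feature definition to the registry can look like. The file source, entity, and feature view names are illustrative, and the class and argument names follow recent Feast releases; running feast apply against a repository containing such definitions records them in the central registry.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Illustrative offline source; in practice this could be a warehouse table
# or a stream instead of a parquet file.
driver_stats_source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

# The entity describes the key on which features are joined and looked up.
driver = Entity(name="driver", join_keys=["driver_id"])

# The feature view groups related features and ties them to the source.
driver_hourly_stats = FeatureView(
    name="driver_hourly_stats",
    entities=[driver],
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    source=driver_stats_source,
)
```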
Models need point-in-time correct data
ML models in production require a view of data that is consistent with the data on which they were trained; otherwise, the accuracy of these models can be compromised. Despite this need, many data science projects suffer from inconsistencies introduced when future feature values leak into models during training. Feast solves this data-leakage problem by providing point-in-time correct feature retrieval when exporting feature datasets for model training.
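Here is a sketch of what this looks like with Feast's historical retrieval API, assuming the illustrative driver_hourly_stats feature view from above: the entity dataframe carries the label timestamps, and Feast joins each row only against feature values known at or before that time.

```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # illustrative local repository

# Each row records when the label was observed; Feast uses this timestamp to
# exclude any feature value produced after that moment, preventing leakage.
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": [
            datetime(2023, 4, 1, 10, 0),
            datetime(2023, 4, 1, 11, 0),
        ],
    }
)

# Export a point-in-time correct training dataset as a pandas DataFrame.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()
print(training_df.head())
```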
Features aren’t reused across projects
Different teams within an organization are often unable to reuse features across projects. Feast addresses this problem by enabling feature reuse through a centralized registry. This registry allows multiple teams working on different projects not only to contribute features, but also to reuse them.
Advantages of Feast Feature Store
Bridges gap between teams
Feast enables data scientists to track and share features, backed by a version-control repository. It bridges the gap between data scientists and data & ML engineers.
Model Training-Serving Consistency
Feast enables feature consistency between model training and serving. This addresses the common mismatch between the development and production versions of machine learning models.
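One way to see this in practice, assuming the same illustrative repository as above: training and serving code share a single list of feature references, and materialization copies the latest offline values into the online store so that serving reads values consistent with the training export. Method names follow recent Feast releases.

```python
from datetime import datetime

from feast import FeatureStore

store = FeatureStore(repo_path=".")  # illustrative local repository

# One shared list of feature references is passed to both the training export
# (get_historical_features) and the serving lookup (get_online_features), so
# both paths resolve to the same registered definitions.
features = [
    "driver_hourly_stats:conv_rate",
    "driver_hourly_stats:avg_daily_trips",
]

# Copy the latest values from the offline store into the online store.
store.materialize_incremental(end_date=datetime.utcnow())

# The serving-time lookup now returns values consistent with training data.
online = store.get_online_features(
    features=features,
    entity_rows=[{"driver_id": 1001}],
).to_dict()
print(online)
```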
Feature Discovery
Feast enables the exploration and discovery of features. This allows for a deeper understanding of features and their specifications, more feature reuse between teams and projects, and faster experimentation.
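For example, the registry can be queried programmatically to see which features already exist before creating new ones. A minimal sketch, assuming a local feature repository:

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # illustrative local repository

# Enumerate registered feature views and their features, so teams can
# discover and reuse existing features instead of re-deriving them.
for fv in store.list_feature_views():
    print(fv.name, [f.name for f in fv.features])
```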
Implementation-agnostic
The output from a feature store is implementation-agnostic. No matter which algorithm or framework we use, the application or model receives data in a consistent format.
Time Saving
A feature store saves the time that would otherwise be spent recomputing features, leaving more time for building new models.
The figure below shows how a feature store and its components help reduce computation time.
So, this was an overview of Feast as a feature store.
In our next article, we will look at how to set up and integrate Feast with MLflow.
Stay tuned.
Check out the earlier articles in this series to understand how to install MLflow and implement MLOps using MLflow.