
[MLOps Basics]: How to setup and integrate Feast with MLflow

By admin, Oct 8, 2021

In this article, we will look at how to set up and integrate the Feast Feature Store with MLflow.
We will first set up the Feast Feature Store in a local environment. We will take a few features from a telecom churn dataset, store them in a Parquet file, define a new FeatureView in the Feast repository, and retrieve them using Feast.

The key steps involved are:
1. Install the Feast Feature Store package
2. Prepare the dataset for the Feast Feature Store
3. Define features in the Feast repository
4. Retrieve values from the Feature Store
5. Integrate the Feature Store with MLflow

Now let us understand each of these steps in detail.

The first step is to install the Feast Feature Store package:

pip install feast

Then we will create a feature repository using Feast and apply it:

feast init feature_repo
cd feature_repo
feast apply

The Feast repository is created and contains an example dataset and example feature definitions, which feast apply registers.
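
As a rough sketch (the exact file names vary across Feast versions), the generated repository looks something like this:

feature_repo/
├── feature_store.yaml   # repository configuration: project name, registry, provider
├── example.py           # example feature definitions registered by feast apply
└── data/                # example Parquet data used by example.py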

Preparing the dataset for the Feast Feature Store

The Feast Feature Store works with time-series features, so every dataset must contain a timestamp in addition to the entity id. Multiple observations of the same entity may exist as long as they have different timestamps.

In our example, we will use the telecom churn dataset. The time when the dataset was obtained will be used as the observation date, and the DataFrame index will be turned into the entity id.
When the repository is ready, the dataset can be stored in the data directory as a Parquet file.

Dataset:

import pandas as pd
from datetime import datetime

data = pd.read_csv('/customer_churn_telecom.csv')

# turn the index into a column and use it as the entity id
data.reset_index(level=0, inplace=True)
data = data.rename(columns={'index': 'id'})

# add the observation timestamp required by Feast
data['observation_date'] = datetime(2021, 7, 9, 10, 0, 0)
data.to_parquet('/content/feature_repo/data/Churn.parquet')
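
Before registering the file with Feast, we can optionally read it back to confirm it has the entity id and timestamp columns Feast expects (a quick sanity check, not part of the original steps):

import pandas as pd

check = pd.read_parquet('/content/feature_repo/data/Churn.parquet')
print(check[['id', 'observation_date']].head())
print(check.dtypes)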

Defining features in the Feast repository

In the next step, we will prepare a Python file describing the FeatureView. It must define the data input location, the entity identifier, and the available feature columns.

from datetime import timedelta
from feast import Entity, Feature, FeatureView, FileSource, ValueType

# the offline source: our Parquet file and its event timestamp column
observations = FileSource(
    path="/content/feature_repo/data/Churn.parquet",
    event_timestamp_column="observation_date",
)

# the entity: which column identifies a customer
customer = Entity(name="id", value_type=ValueType.INT64, description="identifier")

# the feature view: column names, types, entity, and source combined
observations_view = FeatureView(
    name="observations",
    entities=["id"],
    ttl=timedelta(days=0),
    features=[
        Feature(name="Partner", dtype=ValueType.FLOAT),
        Feature(name="SeniorCitizen", dtype=ValueType.INT64),
        Feature(name="tenure", dtype=ValueType.INT64),
        Feature(name="gender", dtype=ValueType.STRING),
    ],
    online=False,
    input=observations,
    tags={},
)

The code above does three things:

1. It defines the feature source location, in this case a path on the local file system. Note that the FileSource also requires the column containing the event timestamp.
2. The Entity object describes which column contains the entity identifier.
3. Finally, we define the FeatureView, which combines the available column names (and types) with the entity identifier and the data location. We have only historical data in our example, so we set the online parameter to False.

When we have the FeatureView definition, we can reload the repository and use the new feature:

feast apply

Retrieving values from the Feature Store

To retrieve the values, we must specify the entity ids and the desired observation time:

import pandas as pd
from datetime import datetime
from feast import FeatureStore

# entity dataframe: which ids we want features for, and at what time
entity_df = pd.DataFrame.from_dict(
    {
        "id": range(0, 100),
        "event_timestamp": datetime(2021, 7, 11, 10, 0, 0),
    }
)

store = FeatureStore(repo_path="/content/feature_repo")
training_df = store.get_historical_features(
    entity_df=entity_df,
    feature_refs=[
        "observations:gender",
        "observations:tenure",
    ],
).to_df()

training_df

We are fetching two features (gender and tenure) from the telecom churn data already stored in the Feature Store.

Feast joins the requested columns with the given entity_df DataFrame, so when data is not available, we get the entity_df values joined with nulls or NaNs.
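
For example, a quick check on the result (a minimal sketch using the training_df fetched above):

# rows where the point-in-time join found no feature values for an id
missing = training_df[training_df["tenure"].isna()]
print(f"{len(missing)} of {len(training_df)} ids had no feature data at the requested time")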

In this way, we can store features in the Feature Store and retrieve them for some other project.

Feature Store integration with MLflow

The Feast Feature Store can also be used with MLflow, which helps with feature management. We can take features from different experiments and reuse them in other projects.

We can structure and populate the data layer with relevant data that can be used for training models and feature generation. We will store features in a Parquet file in the local environment, though S3 storage can also be used. The Feature Store can be surfaced in the MLflow UI with the help of the MLflow Projects module, where we can create a pipeline and integrate the Feature Store repository into the MLflow workload.
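
As a minimal sketch of that idea (the run name and logged parameters below are illustrative, not part of Feast's or MLflow's required setup), we can wrap the Feast fetch and training step in a tracked MLflow run so the feature references used by each experiment are visible in the tracking UI:

import mlflow

# record which feature repository and feature references this experiment used
with mlflow.start_run(run_name="feast_training"):
    mlflow.log_param("feature_repo", "/content/feature_repo")
    mlflow.log_param("feature_refs", "observations:gender, observations:tenure")
    # ... fetch features with store.get_historical_features(...) and train here ...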

After integrating the Feast Feature Store with MLflow, we can fetch the historical features used in any project with the code below and use them to train a model with different libraries.

import mlflow
import pandas as pd
from datetime import datetime
from feast import FeatureStore

if __name__ == "__main__":
    mlflow.set_experiment("/mlflow/Customer_Acquisition")

    # Read the csv file
    path = r"C:\Users\Downloads\customer-acquisition\bank_marketing.csv"
    data = pd.read_csv(path)

    # entity dataframe: ids and observation time for the historical fetch
    entity_df = pd.DataFrame.from_dict(
        {
            "id": range(0, 7032),
            "event_timestamp": datetime(2021, 7, 11, 10, 0, 0),
        }
    )

    store = FeatureStore(repo_path=r"C:\Users\Downloads\Machine-Learning\feature_store")

    training_df = store.get_historical_features(
        entity_df=entity_df,
        feature_refs=[
            "observations:gender",
            "observations:tenure",
        ],
    ).to_df()

In the code above, we fetch features from the customer churn data that we saved earlier and use them in a customer acquisition model.

In this way, we can fetch features from the Feature Store and use them to train the model. After training the model, we log the metrics to MLflow from the Python file.
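
A minimal sketch of that training-and-logging step (the churn label column, model choice, and metric below are illustrative assumptions, not from the original pipeline):

import mlflow
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# assumption: a 'label' column has been joined into training_df alongside the Feast features
X = pd.get_dummies(training_df[["gender", "tenure"]])  # one-hot encode the string feature
y = training_df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run():
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", acc)  # appears on the MLflow tracking UI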

A Feature Store with MLflow will help teams to:

1. Immediately find the stored features in experiments by checking the MLflow tracking UI.
2. Efficiently fetch features, as they are stored in a well-organized way, which leaves more time for building new models.
3. Reuse features between different projects, saving time.
4. Process different datasets and extract the corresponding features in a consistent format.

So, this is how we can set up and integrate Feast with MLflow.
In our next articles in this series, we will learn about other basic concepts of MLOps, such as model drift and the tools and technologies for MLOps.

Stay tuned.

Check out the earlier articles in this series to understand how to install MLflow and implement MLOps using MLflow.

Author

Data engineering team
GainInsights

info@gain-insights.com
