In our earlier articles, we explored the MLOps deployment process on AWS and Azure. In this article, we cover how ML models can be deployed on Google Cloud Platform (GCP) using MLflow. Let's look at the four-step process involved in the implementation:
1. Creating an MLflow Docker Image
MLflow does not have an official Docker distribution, so we will build our own image. We could simply install MLflow via pip on our server VMs, but containerizing it improves reproducibility.
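The article does not include the Dockerfile itself; a minimal sketch that would satisfy the build command below might look like this (the base image and the extra pip packages are assumptions — the MySQL driver and GCS client are needed later for the backend and artifact stores):

```dockerfile
# Minimal MLflow image (assumed contents; not from the original article).
FROM python:3.8-slim

# MLflow itself, plus the driver for the Cloud SQL (MySQL) backend store
# and the client for the Google Cloud Storage artifact store.
RUN pip install --no-cache-dir mlflow==1.14.1 pymysql google-cloud-storage

# Default tracking-server port; the actual `mlflow server` command is passed at `docker run` time.
EXPOSE 5000
```

No ENTRYPOINT is set, since the startup script in the next section passes the full `mlflow server …` command to `docker run`.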
docker build -t mlflow:1.14.1 .
2. Setting up Google Cloud infrastructure
In this section, we'll use the Docker image created in the previous section to start the tracking server. We'll also use Google Cloud SQL as the backend store and Google Cloud Storage as the artifact store of our tracking server. This will ensure our tracking server VM is stateless and can be upgraded without losing state.
Let us begin with infrastructure provisioning.
Service Account
gcloud iam service-accounts create mlflow-tracking-sa --description="Service Account to run the MLflow tracking server" --display-name="MLflow tracking SA"
Artifact Bucket
gsutil mb gs://<BUCKET_NAME>
Cloud SQL
gcloud sql instances create mlflow-backend --tier=db-f1-micro --region=us-central1 --root-password=<ROOT_PASSWORD> --storage-type=SSD
gcloud sql databases create mlflow --instance=mlflow-backend
IAM
gsutil iam ch 'serviceAccount:mlflow-tracking-sa@<PROJECT_ID>.iam.gserviceaccount.com:roles/storage.admin' gs://<BUCKET_NAME>
gcloud projects add-iam-policy-binding <PROJECT_ID> --member='serviceAccount:mlflow-tracking-sa@<PROJECT_ID>.iam.gserviceaccount.com' --role=roles/cloudsql.editor
Startup Script
The script below can be used to initialize the VM instance.
#!/bin/bash
MLFLOW_IMAGE=kaysush/mlflow:1.14.1
CLOUD_SQL_PROXY_IMAGE=gcr.io/cloudsql-docker/gce-proxy:1.19.1
MYSQL_INSTANCE=<PROJECT_ID>:us-central1:mlflow-backend

echo 'Starting Cloud SQL Proxy'
docker run -d --name mysql --net host -p 3306:3306 $CLOUD_SQL_PROXY_IMAGE /cloud_sql_proxy -instances $MYSQL_INSTANCE=tcp:0.0.0.0:3306

echo 'Starting mlflow-tracking server'
docker run -d --name mlflow-tracking --net host -p 5000:5000 $MLFLOW_IMAGE mlflow server --backend-store-uri mysql+pymysql://root:<ROOT_PASSWORD>@localhost/mlflow --default-artifact-root gs://<BUCKET_NAME>/mlflow_artifacts/ --host 0.0.0.0

echo 'Altering IPTables'
iptables -A INPUT -p tcp --dport 5000 -j ACCEPT
Upload this script as start_mlflow_tracking.sh to gs://<BUCKET_NAME>/scripts/start_mlflow_tracking.sh.
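The article does not show the VM provisioning command itself. Assuming the service account and uploaded startup script above, a tracking-server instance could be created along these lines (the zone, machine type, and Container-Optimized OS image are assumptions; COS is used here so Docker is preinstalled):

```shell
gcloud compute instances create mlflow-tracking-server \
  --zone=us-central1-a \
  --machine-type=e2-small \
  --image-family=cos-stable --image-project=cos-cloud \
  --service-account=mlflow-tracking-sa@<PROJECT_ID>.iam.gserviceaccount.com \
  --scopes=cloud-platform \
  --metadata=startup-script-url=gs://<BUCKET_NAME>/scripts/start_mlflow_tracking.sh
```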
Once all of the components are deployed, the MLflow tracking UI can be accessed using the external IP of the instance.
3. Track an ML Experiment
After the infrastructure is all set, the next step is to run an example experiment and track it. We will use MLflow's scikit-learn example for it. Let's clone the GitHub repository.
git clone https://github.com/mlflow/mlflow
cd mlflow/examples/LR_telecom
This example uses conda to set up the virtual environment. One issue with this example is that it assumes a file-based artifact store by default and hence does not declare a dependency on google-cloud-storage. The conda.yaml file needs to be modified as below before creating the virtual environment:
name: tutorial
channels:
  - defaults
dependencies:
  - python=3.6
  - pip
  - pip:
      - scikit-learn==0.23.2
      - mlflow>=1.0
      - google-cloud-storage==1.36.2
Now, let’s create the virtual environment:
conda env create -f conda.yaml
conda activate tutorial
Now we will run train.py, which fits a logistic regression model on customer churn data. There is nothing unusual in this code except the few places where we use mlflow.log_metric and mlflow.log_model to track our run.
Before we trigger this run, we need to set the MLFLOW_TRACKING_URI environment variable to http://<TRACKING_SERVER_IP>:5000.
export MLFLOW_TRACKING_URI=http://<TRACKING_SERVER_IP>:5000
python train.py
If everything is set up correctly, you will see the run's output in the console.
4. Serving Model as REST API
Once a run is successfully registered, we can serve the model as a REST API and score records using JSON payloads. Go to the MLflow tracking UI and get the run ID of your experiment run.
mlflow models serve -m runs:/6b8a350a5ec1417a94a3d45d1796739d/model
MLflow will create a conda environment, install all the necessary dependencies, and then start a server on localhost.
The following output would appear if the serving is successful.
2021/03/11 11:04:44 INFO mlflow.models.cli: Selected backend for flavor 'python_function'
2021/03/11 11:05:08 INFO mlflow.utils.conda: === Creating conda environment mlflow-45bbec5b524b8308f4a03a7d2b96160e8715b9e1 ===
Collecting package metadata (repodata.json): done
Solving environment: done
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: \ Ran pip subprocess with arguments:
2021/03/11 11:13:15 INFO mlflow.pyfunc.backend: === Running command 'conda activate mlflow-45bbec5b524b8308f4a03a7d2b96160e8715b9e1 & waitress-serve --host=127.0.0.1 --port=5000 --ident=mlflow mlflow.pyfunc.scoring_server.wsgi:app'
INFO:waitress:Serving on http://127.0.0.1:5000
We will use a sample payload to test our endpoint. The server returns a prediction of whether each customer will churn. Since we send in two observations, we will get two values back.
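The article omits the payload itself; against an MLflow 1.x scoring server, a request in pandas-split format could look like this (the feature names and values here are assumptions, not the example's actual schema):

```shell
curl -X POST http://127.0.0.1:5000/invocations \
  -H 'Content-Type: application/json; format=pandas-split' \
  -d '{
        "columns": ["tenure", "monthly_charges", "total_charges", "contract_type"],
        "data": [[12, 70.5, 846.0, 0],
                 [48, 99.9, 4795.2, 1]]
      }'
```

The response is a JSON array with one prediction per observation.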
This completes the process of using MLflow to deploy on GCP and remotely track ML experiments. Explore the earlier articles to understand how to install MLflow and implement MLOps using MLflow.
Stay tuned for more articles on MLOps.