MLOps deployment into AWS Fargate: I

Vaibhav Satpathy
5 min read · Jan 7, 2021

Welcome to this three-part tutorial on end-to-end MLOps, covering training, tracking, deploying and inferencing.

Part 1: Setup MLflow on AWS EC2

Part 2: MLOps deployment on AWS Fargate: I

Part 3: MLOps deployment on AWS Fargate: II

Getting on with Part 2…

You can find the GitHub repository for all the code used here!

There are many Machine Learning engineers around the globe who excel at developing state-of-the-art models for their use cases. But most of them face a huge hurdle when it comes to deploying those models into a cloud environment, so that they can be accessed by the world as API endpoints.

The biggest challenge for ML engineers is the see-saw battle with the DevOps side of the table. ML developers are great at building models, and DevOps engineers are outstanding at setting up infrastructure on the cloud, but when the two have to work together without detailed knowledge of each other's domain, it becomes a tedious task.

To solve this issue, the world has been catching onto the trend of MLOps, which combines the best of DevOps, ML and Data Engineering. Using MLOps, organisations are able to deliver their state-of-the-art models to the world with ease. According to some research, organisations used to take nearly 1–6 months for the end-to-end process of training, tracking, inferencing, deploying and versioning; MLOps has helped these firms bring that time down to mere weeks.

Today we will train a simple image classifier using TensorFlow and Python, followed by the necessary infrastructure setup on AWS for deploying the trained model and making the most of it.

Pre-requisites:

AWS account
Basic knowledge of Python
Basic knowledge of the cloud

That’s all we need to learn and develop MLOps skills for the world.

1. Training the model

As it is an image classifier, we need sample data to train it on. For this purpose we will use Intel's image classification dataset.

https://www.kaggle.com/puneet6060/intel-image-classification

I. The first step is to import all the necessary packages to run the script

import os
import random
import numpy as np
import mlflow
from mlflow import pyfunc
import argparse
import json
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import Input, Conv2D, Dense, Flatten

II. After importing the packages, we set our tracking URI and cloud storage location

tracking_uri = "http://testuser:test@ec2-18-220-228-243.us-east-2.compute.amazonaws.com"
s3_bucket = "s3://docuedge-mlflow-bucket"

III. Next we declare a simple model architecture
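
The original snippet is not embedded here, so as a minimal sketch: a small CNN built only from the imports listed above (Input, Conv2D, Dense, Flatten). The layer sizes, the 150×150 input (the Intel dataset's image size) and the six-class softmax head are assumptions for illustration:

```python
from tensorflow.keras.layers import Input, Conv2D, Dense, Flatten
from tensorflow.keras.models import Model

def build_model(input_shape=(150, 150, 3), num_classes=6):
    # Small CNN: strided convolutions downsample in place of pooling,
    # keeping to the layers imported at the top of the script
    inputs = Input(shape=input_shape)
    x = Conv2D(16, 3, strides=2, activation="relu")(inputs)
    x = Conv2D(32, 3, strides=2, activation="relu")(x)
    x = Flatten()(x)
    x = Dense(64, activation="relu")(x)
    outputs = Dense(num_classes, activation="softmax")(x)
    return Model(inputs, outputs)
```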

IV. Next we create a data generator
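
The generator itself is not reproduced here; one way to sketch it, in the spirit of the imports above (load_img, numpy, random), is a hand-rolled batch generator. The loader is injected as an argument so the batching logic stays independent of disk I/O; all names are illustrative:

```python
import random
import numpy as np

def default_loader(path, target_size=(150, 150)):
    # Read an image from disk and scale pixel values to [0, 1]
    from tensorflow.keras.preprocessing.image import load_img
    return np.asarray(load_img(path, target_size=target_size)) / 255.0

def data_generator(samples, batch_size=32, loader=default_loader):
    """samples: list of (image_path, integer_label) tuples."""
    while True:
        random.shuffle(samples)
        # Drop the last partial batch so every yield has a fixed shape
        for i in range(0, len(samples) - batch_size + 1, batch_size):
            batch = samples[i : i + batch_size]
            images = np.stack([loader(path) for path, _ in batch])
            labels = np.array([label for _, label in batch])
            yield images, labels
```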

V. Set up a helper function to read all the meta data
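
As a sketch of what such a helper could look like (the file layout is an assumption: a JSON label map plus one sub-directory per class, as in the Intel dataset):

```python
import json
import os

def read_metadata(meta_path):
    # Load the label map, e.g. {"buildings": 0, "forest": 1, ...}
    with open(meta_path) as f:
        return json.load(f)

def list_samples(data_dir, label_map):
    # Walk each class sub-directory and pair image paths with labels
    samples = []
    for class_name, label in label_map.items():
        class_dir = os.path.join(data_dir, class_name)
        for fname in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, fname), label))
    return samples
```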

VI. Finally we compile all of the above and trigger the training script via the CLI
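
A sketch of that entry point, assuming argparse flags of our own choosing and the MLflow server from Part 1; the flag names and the run body are illustrative, not the article's exact script:

```python
import argparse

def parse_args(argv=None):
    # CLI flags for the training run; names are illustrative
    parser = argparse.ArgumentParser(description="Train the image classifier")
    parser.add_argument("--data-dir", required=True, help="Path to the Intel image dataset")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument(
        "--tracking-uri",
        default="http://testuser:test@ec2-18-220-228-243.us-east-2.compute.amazonaws.com",
    )
    return parser.parse_args(argv)

def main():
    args = parse_args()
    import mlflow  # requires the MLflow server from Part 1 to be reachable
    mlflow.set_tracking_uri(args.tracking_uri)
    with mlflow.start_run():
        mlflow.log_params({"epochs": args.epochs, "batch_size": args.batch_size})
        # Build, compile and fit the model here, then log it so the
        # artefacts land in the configured S3 bucket, e.g.:
        # mlflow.keras.log_model(model, "model")

if __name__ == "__main__":
    main()
```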

Now the model has been trained and the metrics are ready for visualisation at the tracking URI. The model and its supporting artefacts are uploaded to the cloud.

2. Infrastructure Setup

I. First, once we log in to our AWS dashboard, we need to navigate to the Target Groups dashboard and create a Target Group.

II. While creating a Target Group we need to make sure that we select the VPC on which we have our security groups and subnets configured.

III. The protocol version for the Target Group is to be set to HTTP1, and the target type to IP.

IV. In the next step we need to register the targets, which are the IP addresses under the subnets of the VPC in which you are creating the new Target Group.

Make sure to remember the VPC, subnets, security groups and Load Balancer, as we will need them to define our task definitions later.
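
Steps I–IV above can also be scripted with boto3. A sketch, assuming AWS credentials and a region are configured; the names and VPC/IP values are placeholders:

```python
def target_group_params(name, vpc_id, port=80):
    # Mirrors the console choices above: HTTP1 protocol version, IP target type
    return {
        "Name": name,
        "Protocol": "HTTP",
        "ProtocolVersion": "HTTP1",
        "Port": port,
        "VpcId": vpc_id,
        "TargetType": "ip",
    }

def create_target_group(params):
    import boto3  # assumes AWS credentials and region are configured
    client = boto3.client("elbv2")
    resp = client.create_target_group(**params)
    return resp["TargetGroups"][0]["TargetGroupArn"]

def register_ip_targets(target_group_arn, ip_addresses):
    # Step IV: register the subnet IP addresses as targets
    import boto3
    client = boto3.client("elbv2")
    client.register_targets(
        TargetGroupArn=target_group_arn,
        Targets=[{"Id": ip} for ip in ip_addresses],
    )
```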

As we plan on deploying the trained model and its artefacts on Fargate, we need to set up a Load Balancer to be able to view the Swagger docs, i.e. the API suite.

V. Open the Load Balancer dashboard by navigating from the EC2 dashboard in AWS

VI. Next we need to create a load balancer, so select an ALB (Application Load Balancer)

VII. Fill in the name of your load balancer and select the VPC and its corresponding subnets (2 minimum) that were used for setting up the Target Group.

VIII. As you plan on communicating via HTTP(S) protocols, you will need to add an SSL certificate from ACM (AWS Certificate Manager)

IX. The next step involves configuring your router and adding target groups. We can add the Target Group we just created above, or, if you have an existing Target Group that allows HTTP and HTTPS, you can add that instead.

If you have selected the default VPC but the Target Groups are on another VPC, the complete setup won't return the desired results.
Please be careful while selecting the VPCs, subnets, Target Groups and security groups.

X. Review and launch your ALB (Application Load Balancer)
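
Steps V–X can likewise be sketched with boto3. This is illustrative, not the console flow verbatim; the ARNs, subnet IDs and security group IDs are placeholders, and real calls need configured AWS credentials:

```python
def create_alb(name, subnet_ids, security_group_ids):
    # Application Load Balancer across at least two subnets, as in step VII
    import boto3  # assumes AWS credentials and region are configured
    client = boto3.client("elbv2")
    resp = client.create_load_balancer(
        Name=name,
        Subnets=subnet_ids,
        SecurityGroups=security_group_ids,
        Scheme="internet-facing",
        Type="application",
        IpAddressType="ipv4",
    )
    return resp["LoadBalancers"][0]["LoadBalancerArn"]

def https_listener_params(alb_arn, target_group_arn, certificate_arn):
    # HTTPS listener with the ACM certificate (step VIII), forwarding
    # to the Target Group created earlier (step IX)
    return {
        "LoadBalancerArn": alb_arn,
        "Protocol": "HTTPS",
        "Port": 443,
        "Certificates": [{"CertificateArn": certificate_arn}],
        "DefaultActions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }
    # Pass to boto3.client("elbv2").create_listener(**params) to launch
```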

Congratulations, you have successfully set up the infrastructure on AWS, and your model has been trained, with its artefacts waiting on S3 to be utilised for deployment.

See you in the next part !!!
