How to use MLflow to manage the Machine Learning lifecycle

In this repo, I experiment with MLflow to:

track machine learning experiments based on:
- metrics
- hyper-parameters
- source scripts executing the run
- code version
- notes & comments
compare runs with one another
set up a tracking server locally and on AWS
deploy your model using MLflow Models
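
As a rough illustration of the tracking items above, a run might be logged along these lines. This is a minimal sketch, not the repo's `train.py`: the function name `log_run`, its `tracker` argument and the example values are hypothetical, while `start_run`, `log_params`, `log_metrics` and `set_tag` are calls from MLflow's Python tracking API.

```python
def log_run(params, metrics, note, tracker=None):
    """Log one run: hyper-parameters, metrics and a free-text note."""
    if tracker is None:
        # Imported lazily so the sketch can be read without MLflow installed.
        import mlflow as tracker

    with tracker.start_run():
        tracker.log_params(params)    # hyper-parameters, e.g. {"max_depth": 5}
        tracker.log_metrics(metrics)  # metrics, e.g. {"auc": 0.81}
        # "mlflow.note.content" is the tag the UI displays as the run's note.
        tracker.set_tag("mlflow.note.content", note)
```

Calling `log_run({"max_depth": 5}, {"auc": 0.81}, "baseline")` (made-up values) would create a single run. MLflow normally records the source script, and the git commit when it can detect one, on its own.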

Quickstart locally

To execute the code:

Install pipenv to run a virtual environment with mlflow (it's cleaner this way)

pip install pipenv

Clone the project

git clone git@github.com:ahmedbesbes/mlflow.git

Install the dependencies

cd mlflow/
pipenv install

Start a tracking server locally

mlflow ui

Launch the training script (or any other code that logs to MLflow)

python train.py

Visit http://localhost:5000 to check the runs in the MLflow UI

Launch a tracking server on AWS

If you work in a team of developers or data scientists, you can spin up a shared tracking server where everyone logs their runs

1. Prepare an EC2 machine and an S3 bucket

Create an IAM user on AWS. Get its credentials, namely the Access key ID and the Secret access key
With this same user, create an S3 bucket to store future artifacts and give it a name. Mine is mlflow-artifact-store-demo, but you cannot reuse it, since bucket names must be unique
Launch an EC2 instance: it doesn't have to be big. A t2.micro, eligible for the free tier, does the job perfectly
Configure the security group of this instance to accept inbound HTTP traffic on port 5000

SSH into your EC2 instance:

install pip

sudo apt update
sudo apt install python3-pip

install pipenv

sudo pip3 install pipenv
sudo pip3 install virtualenv

export PATH=$PATH:/home/[your_user]/.local/bin/

now with pipenv, install the dependencies to run the mlflow server
```
pipenv install mlflow
pipenv install awscli
pipenv install boto3
```
On the EC2 machine, configure the AWS CLI with the user's credentials so that the tracking server can access S3 and display the artifacts in the UI.

Run aws configure, then follow the prompts to enter the credentials
Start an MLflow server on the EC2 instance, setting the host to 0.0.0.0 and --default-artifact-root to the S3 bucket
```
mlflow server -h 0.0.0.0  \
              --default-artifact-root s3://mlflow-artifact-store-demo
```
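
Once the server is running and the security group accepts inbound traffic on port 5000, it can be handy to confirm from your local machine that the port is reachable. The helper below is a minimal sketch using only the standard library; the name `is_port_open` is hypothetical, and the host you pass would be your own instance's public DNS.

```python
import socket


def is_port_open(host, port=5000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

If `is_port_open("<your-ec2-public-dns>")` returns False, the security group rule and the `-h 0.0.0.0` option are probably the first things to check.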

2. Set AWS credentials and change the tracking URI

Set the AWS credentials as environment variables so that the code uploads artifacts to the S3 bucket

export AWS_ACCESS_KEY_ID=<your-aws-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-aws-secret-access-key>
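
Before launching a training run, you could check from Python that both variables are actually set in the current environment. This is only a sketch: `REQUIRED_AWS_VARS` and `missing_aws_vars` are hypothetical names, not part of the repo or of any AWS library.

```python
import os

# The two variables exported above.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

An empty list from `missing_aws_vars()` means both credentials are visible to the process that will upload the artifacts.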

Change the tracking URI to the public DNS of your EC2 machine, followed by port 5000

In my case the tracking URI was: http://ec2-35-180-45-108.eu-west-3.compute.amazonaws.com:5000/
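
One way to point your code at the remote server, sketched below, is to build the URI from the public DNS and expose it through the `MLFLOW_TRACKING_URI` environment variable, which MLflow reads. The helper `build_tracking_uri` is a hypothetical name, and the DNS shown is the one from my own instance, so replace it with yours.

```python
import os


def build_tracking_uri(public_dns, port=5000):
    """Build the tracking URI from the EC2 public DNS and the server port."""
    return f"http://{public_dns}:{port}"


# Example DNS taken from this README; use your own instance's public DNS.
uri = build_tracking_uri("ec2-35-180-45-108.eu-west-3.compute.amazonaws.com")

# MLflow picks this variable up when no tracking URI is set in code.
os.environ["MLFLOW_TRACKING_URI"] = uri

# Alternatively, inside the training script: mlflow.set_tracking_uri(uri)
```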

Everything should now be in place: after running the script locally, you can inspect the metrics in the UI running on the remote server

By clicking on a specific run, you can see its artifacts uploaded to S3.

These artifacts are indeed stored in the S3 bucket.

Slides

French version
English version (coming soon)

