
[RFC] Reduce TorchServe Docker Image Size #2411

🚀 The feature

  • Reduce the TorchServe CPU image size by 25% by using slim as the base image
  • Refactor the TorchServe Dockerfile to support slim-based CPU & GPU Docker images, and set up a Docker CI GitHub Action to test these images

TorchServe Docker image sizes have gone up with every release.

The main reasons are:

  • PyTorch and its dependencies
    • PyTorch GPU binaries have increased in size considerably
  • OpenJDK has increased in size

The solution is to use a lightweight base image instead of Ubuntu. After experiments with alpine and slim, slim seems to be the best option.

CPU Image

The following snippet sets up a base image with Python 3.9 slim and OpenJDK 17:

# Start from the official Python slim image
ARG PYTHON_VERSION=3.9
ARG BASE_IMAGE=python:$PYTHON_VERSION-slim
FROM ${BASE_IMAGE} as compile-image

# Copy the JDK in from the official slim OpenJDK image and register it as the system java
COPY --from=openjdk:17.0.1-jdk-slim /usr/local/openjdk-17 /usr/local/openjdk-17
ENV JAVA_HOME /usr/local/openjdk-17
RUN update-alternatives --install /usr/bin/java java /usr/local/openjdk-17/bin/java 1
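For reference, a minimal sketch of building this base image (the tag name is illustrative; both build args have defaults and are shown only to make the knobs explicit):

docker build --build-arg PYTHON_VERSION=3.9 \
    --build-arg BASE_IMAGE=python:3.9-slim -t ts-slim-base .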

GPU Image

Replacing the Nvidia runtime image with the Nvidia base image reduces the GPU Docker image size by 3 GB, as shown in the Motivation section below.

To optimize further, we would need to replace the base image with slim.
Nvidia doesn't officially provide a slim-based image.

The simplest approach for GPU images would be to use the Nvidia Ubuntu base image, which would keep the code simple.

The problem with this approach is inconsistency:

  • the CPU image would be based on slim (Debian)
  • the GPU image would be based on Ubuntu

To make this consistent, we would need to create and maintain a slim-based Nvidia base image.

The Dockerfile would be as follows

# Adapted from NVIDIA's CUDA base image Dockerfile, rebased onto python slim (Debian)
FROM python:3.9-slim as base
FROM base as base-amd64
ENV NVARCH x86_64
ENV NVIDIA_REQUIRE_CUDA "cuda>=11.7 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=510,driver<511 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511 brand=nvidiartx,driver>=510,driver<511 brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511 brand=titanrtx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511"
ENV NV_CUDA_CUDART_VERSION 11.7.60-1
ENV NV_CUDA_COMPAT_PACKAGE cuda-compat-11-7
FROM base as base-arm64
ENV NVARCH sbsa
ENV NVIDIA_REQUIRE_CUDA "cuda>=11.7"
ENV NV_CUDA_CUDART_VERSION 11.7.60-1
FROM base-amd64
RUN apt-get update && apt-get install -y --no-install-recommends \
    gnupg2 curl ca-certificates && \
    curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/debian11/${NVARCH}/3bf863cc.pub | apt-key add - && \
    echo "deb https://developer.download.nvidia.com/compute/cuda/repos/debian11/${NVARCH} /" > /etc/apt/sources.list.d/cuda.list && \
    apt-get purge --autoremove -y curl \
    && rm -rf /var/lib/apt/lists/*
ENV CUDA_VERSION 11.7.0
# For libraries in the cuda-compat-* package: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
RUN apt-get update && apt-get install -y --no-install-recommends \
    cuda-cudart-11-7=${NV_CUDA_CUDART_VERSION} \
    ${NV_CUDA_COMPAT_PACKAGE} \
    && rm -rf /var/lib/apt/lists/*
# Required for nvidia-docker v1
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf \
    && echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

Maintaining this for every version of CUDA would be a huge challenge.

Possible Solution:

Write a script that takes Nvidia's base Dockerfile and modifies the template to use slim.
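A rough sketch of such a script, assuming Nvidia's base Dockerfile has been fetched locally as Dockerfile.nvidia (file name hypothetical); the sed patterns would need to track Nvidia's actual template layout:

#!/bin/sh
# Regenerate a slim-based CUDA base Dockerfile from Nvidia's template.
# Slim is Debian-based, so the CUDA apt repo must also be switched to debian11.
CUDA_REPO=debian11

sed \
    -e 's|^FROM ubuntu:.*|FROM python:3.9-slim|' \
    -e "s|/compute/cuda/repos/ubuntu[0-9]*/|/compute/cuda/repos/${CUDA_REPO}/|g" \
    Dockerfile.nvidia > Dockerfile.gpu-base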

Either of the above solutions would require TorchServe to have 2 base Dockerfiles (Dockerfile.cpu & Dockerfile.gpu).

Testing

To test the above changes, we would need Docker CI.
Creating new infra to test Docker would be a lot of work.
The easier solution would be to:

  • Build a Docker image
  • Re-use the regression scripts we already have in test/regression_tests.py

This means we need a new Dockerfile that runs the following on container start:

CMD ["python", "test/regression_tests.py"]

The proposal to re-organise the Dockerfiles is as follows.

To build images for production:

Dockerfile.cpu -> Dockerfile.runtime -> Dockerfile
Dockerfile.gpu -> Dockerfile.runtime -> Dockerfile

To build dev images for the above case, we would just pass the dev flag to the docker build command in steps 2 & 3.

To build images for Docker CI:

Dockerfile.cpu -> Dockerfile.runtime -> Dockerfile.ci
Dockerfile.gpu -> Dockerfile.runtime -> Dockerfile.ci
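As a sketch, the CPU CI chain could be wired together by tagging each stage and feeding it to the next build (tag names are illustrative; Dockerfile.ci below expects the runtime image to be tagged pytorch/torchserve-base:latest):

docker build -f Dockerfile.cpu -t ts-slim-base .
docker build -f Dockerfile.runtime \
    --build-arg BASE_IMAGE=ts-slim-base --build-arg PYTHON_VERSION=3.9 \
    -t pytorch/torchserve-base:latest .
docker build -f Dockerfile.ci \
    --build-arg PYTHON_VERSION=3.9 --build-arg BRANCH_NAME=master \
    -t ts-ci:cpu .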

Sample contents

Dockerfile.runtime

ARG PYTHON_VERSION
ARG BASE_IMAGE
FROM ${BASE_IMAGE} AS compile-image
# Re-declare the args so they are visible inside this build stage
# (BASE_IMAGE is needed again for the cuda check further down)
ARG PYTHON_VERSION
ARG BASE_IMAGE
ENV PYTHONUNBUFFERED TRUE

RUN apt-get update && apt-get install -y \
    curl \
    git

# Make the virtual environment and "activating" it by adding it first to the path.
# From here on the python$PYTHON_VERSION interpreter is used and the packages
# are installed in /home/venv which is what we need for the "runtime-image"
RUN python$PYTHON_VERSION -m venv /home/venv
ENV PATH="/home/venv/bin:$PATH"

RUN python -m pip install -U pip setuptools

# USE_CUDA is only useful for the cuda env; set it with ENV so it persists
# across layers (a plain `RUN export` is lost when its layer ends)
ENV USE_CUDA=1

ARG CUDA_VERSION=""

RUN git clone --depth 1 https://github.com/pytorch/serve.git

WORKDIR "serve"

RUN \
    if echo "$BASE_IMAGE" | grep -q "cuda:"; then \
        # Install CUDA version specific binary when CUDA version is specified as a build arg
        if [ "$CUDA_VERSION" ]; then \
            python ./ts_scripts/install_dependencies.py --cuda $CUDA_VERSION; \
        # Fall back to the default (CPU) binary when no CUDA version is given on a CUDA base image
        else \
            python ./ts_scripts/install_dependencies.py; \
        fi; \
    # Install the CPU binary
    else \
        python ./ts_scripts/install_dependencies.py; \
    fi

# Make sure the latest version of torchserve is published before running this
RUN python -m pip install --no-cache-dir torchserve torch-model-archiver torch-workflow-archiver
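For illustration, the conditional above is driven purely by the build args (tags are illustrative, and the cu117-style identifier assumes install_dependencies.py accepts that format):

# CPU build: BASE_IMAGE has no "cuda:" substring, so the CPU branch runs
docker build -f Dockerfile.runtime \
    --build-arg PYTHON_VERSION=3.9 --build-arg BASE_IMAGE=python:3.9-slim \
    -t ts-runtime:cpu .

# GPU build: the "cuda:" substring in BASE_IMAGE selects the CUDA branch
docker build -f Dockerfile.runtime \
    --build-arg PYTHON_VERSION=3.9 \
    --build-arg BASE_IMAGE=nvidia/cuda:11.7.0-base-ubuntu20.04 \
    --build-arg CUDA_VERSION=cu117 -t ts-runtime:gpu .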

Dockerfile.ci

# Final image for docker regression tests
FROM pytorch/torchserve-base:latest AS runtime-image
# Re-state ARG PYTHON_VERSION to make it active in this build stage (uses the default defined at the top)
ARG PYTHON_VERSION
ARG BRANCH_NAME
ENV PYTHONUNBUFFERED TRUE

RUN --mount=type=cache,target=/var/cache/apt \
    apt-get update && \
    apt-get upgrade -y && \
    apt-get install software-properties-common -y && \
    add-apt-repository -y ppa:deadsnakes/ppa && \
    apt-get remove -y python-pip python3-pip && \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
    python$PYTHON_VERSION \
    python3-distutils \
    python$PYTHON_VERSION-dev \
    python$PYTHON_VERSION-venv \
    # using openjdk-17-jdk due to circular dependency(ca-certificates) bug in openjdk-17-jre-headless debian package
    # https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1009905
    openjdk-17-jdk \
    build-essential \
    wget \
    numactl \
    nodejs \
    npm \
    zip \
    unzip \
    && npm install -g newman newman-reporter-htmlextra markdown-link-check \
    && rm -rf /var/lib/apt/lists/*

ENV PATH="/home/venv/bin:$PATH"

RUN python -m pip install --no-cache-dir -r https://raw.githubusercontent.com/pytorch/serve/$BRANCH_NAME/requirements/developer.txt

RUN mkdir /home/serve
ENV RUN_IN_DOCKER True

WORKDIR /home/serve
CMD ["python", "test/regression_tests.py"]

Motivation, pitch

CPU

The CPU Docker image size has grown over time.

TorchServe 0.6.0 CPU Docker image size:

pytorch/torchserve               0.6.0-cpu                           af91330a97bd   11 months ago       1.49GB

TorchServe 0.8.0 nightly Docker image size:

pytorch/torchserve-nightly       latest-cpu                          dceecc667a8a   13 hours ago        2GB

We did an analysis and the main reasons are:

  • JDK size went up by 280 MB (JDK 11 to JDK 17)
  • PyTorch and its dependencies have grown in size (170 MB)

Currently, there isn't much we can do about PyTorch and its dependencies.

The solution is to use a lightweight base image.

After experiments with alpine and slim, slim images work well for TorchServe.

The current solution uses slim as the base image. The size is comparable to the v0.6.0 image size:

pytorch/torchserve-slim          latest                              ee9912ac1c50   About an hour ago   1.5GB

GPU

0.6.0 GPU image size

pytorch/torchserve           0.6.0-gpu                           fb6d4b85847d   11 months ago   4.49GB

0.8.0 Nightly GPU Image

pytorch/torchserve-nightly   latest-gpu                          4595b0ca83a3   12 hours ago    8.4GB

The main contributors are:

  • 2 GB from the increase in PyTorch binaries and their dependencies (torch, torchvision, torchaudio)

3.9G	/home/ubuntu/anaconda3/envs/pytorch_2.0
1.9G	/home/ubuntu/anaconda3/envs/pytorch_1.11

  • 1.04 GB from the Nvidia runtime image

nvidia/cuda                  11.7.0-cudnn8-runtime-ubuntu20.04   6e0488db6af9   5 months ago    2.92GB
nvidia/cuda                  10.2-cudnn8-runtime-ubuntu18.04     9134b931c303   5 months ago    1.88GB
nvidia/cuda                  11.7.0-base-ubuntu20.04     3790a37af140   5 months ago    211MB

  • 280 MB from JDK 11 to JDK 17

Solution

  1. Use the Nvidia base image instead of the runtime image (https://github.com/NVIDIA/nvidia-docker/wiki/CUDA)

This reduces the image size by 3 GB:

pytorch/ts-base              latest-gpu                          70dac4f0a335   3 hours ago     5.23GB
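Concretely, the change is just the base image in the GPU Dockerfile's FROM line, using the tags listed above:

# Before: runtime image with cuDNN bundled (2.92 GB base)
FROM nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04

# After: base image with only the minimal CUDA runtime (211 MB base)
FROM nvidia/cuda:11.7.0-base-ubuntu20.04

This works because the PyTorch pip wheels ship their own CUDA and cuDNN libraries, so the heavier runtime image largely duplicates them.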

Proposed order of tasks

  • Implement the Docker regression suite
    • Split Dockerfile into Dockerfile.runtime & Dockerfile.ci/Dockerfile
    • Update regression tests to make them work in Docker
  • Implement and integrate Dockerfile.cpu (with slim)
  • Implement and integrate Dockerfile.gpu (with slim)
  • Update Dockerfile.dev to work with the above changes

Alternatives

No response

Additional context

No response

Tasks
- [ ] https://github.com/pytorch/serve/pull/2392
