
[RFC] Reduce TorchServe Docker Image Size #2411

🚀 The feature

  • Reduce the TorchServe CPU image size by 25% by using slim as the base image
  • Refactor the TorchServe Dockerfile to support slim-based CPU & GPU Docker images, and set up a Docker CI GitHub Action to test these images

TorchServe Docker image sizes have gone up with every release.

The main reasons are:

  • PyTorch and its dependencies
    • PyTorch GPU binaries have increased in size considerably
  • OpenJDK has increased in size

The solution is to use a lightweight base image instead of Ubuntu. After experiments with alpine and slim, slim seems to be the best option.

CPU Image

The following snippet sets up a base image with Python 3.9 slim and OpenJDK 17:

# Start from the official Python slim image
ARG PYTHON_VERSION=3.9
ARG BASE_IMAGE=python:$PYTHON_VERSION-slim
FROM ${BASE_IMAGE} as compile-image

# Copy the JDK in from the official slim OpenJDK image and register it as the system java
COPY --from=openjdk:17.0.1-jdk-slim /usr/local/openjdk-17 /usr/local/openjdk-17
ENV JAVA_HOME /usr/local/openjdk-17
RUN update-alternatives --install /usr/bin/java java /usr/local/openjdk-17/bin/java 1
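For reference, a minimal sketch of building this base image (the tag name is illustrative; both build args have defaults and are shown only to make the knobs explicit):

docker build --build-arg PYTHON_VERSION=3.9 \
    --build-arg BASE_IMAGE=python:3.9-slim -t ts-slim-base .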

GPU Image

Replacing the Nvidia runtime image with the Nvidia base image reduces the GPU Docker image size by 3 GB, as shown in the Motivation section below.

To optimize further, we would need to replace the base image with slim.
Nvidia doesn't officially provide a slim-based image.

The simplest approach for GPU images would be to use the Nvidia Ubuntu base image, which would keep the code simple.

The problem with this approach is inconsistency:

  • the CPU image would be based on slim (Debian)
  • the GPU image would be based on Ubuntu

To make this consistent, we would need to create and maintain a slim-based Nvidia base image.

The Dockerfile would be as follows

# Adapted from NVIDIA's CUDA base image Dockerfile, rebased onto python slim (Debian)
FROM python:3.9-slim as base
FROM base as base-amd64
ENV NVARCH x86_64
ENV NVIDIA_REQUIRE_CUDA "cuda>=11.7 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=510,driver<511 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511 brand=nvidiartx,driver>=510,driver<511 brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511 brand=titanrtx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511"
ENV NV_CUDA_CUDART_VERSION 11.7.60-1
ENV NV_CUDA_COMPAT_PACKAGE cuda-compat-11-7
FROM base as base-arm64
ENV NVARCH sbsa
ENV NVIDIA_REQUIRE_CUDA "cuda>=11.7"
ENV NV_CUDA_CUDART_VERSION 11.7.60-1
FROM base-amd64
RUN apt-get update && apt-get install -y --no-install-recommends \
    gnupg2 curl ca-certificates && \
    curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/debian11/${NVARCH}/3bf863cc.pub | apt-key add - && \
    echo "deb https://developer.download.nvidia.com/compute/cuda/repos/debian11/${NVARCH} /" > /etc/apt/sources.list.d/cuda.list && \
    apt-get purge --autoremove -y curl \
    && rm -rf /var/lib/apt/lists/*
ENV CUDA_VERSION 11.7.0
# For libraries in the cuda-compat-* package: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
RUN apt-get update && apt-get install -y --no-install-recommends \
    cuda-cudart-11-7=${NV_CUDA_CUDART_VERSION} \
    ${NV_CUDA_COMPAT_PACKAGE} \
    && rm -rf /var/lib/apt/lists/*
# Required for nvidia-docker v1
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf \
    && echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

Maintaining this for every version of CUDA would be a huge challenge.

Possible Solution:

Write a script that takes Nvidia's base Dockerfile and modifies the template to use slim.
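A rough sketch of such a script, assuming Nvidia's base Dockerfile has been fetched locally as Dockerfile.nvidia (file name hypothetical); the sed patterns would need to track Nvidia's actual template layout:

#!/bin/sh
# Regenerate a slim-based CUDA base Dockerfile from Nvidia's template.
# Slim is Debian-based, so the CUDA apt repo must also be switched to debian11.
CUDA_REPO=debian11

sed \
    -e 's|^FROM ubuntu:.*|FROM python:3.9-slim|' \
    -e "s|/compute/cuda/repos/ubuntu[0-9]*/|/compute/cuda/repos/${CUDA_REPO}/|g" \
    Dockerfile.nvidia > Dockerfile.gpu-base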

Either of the above solutions would require TorchServe to have 2 base Dockerfiles (Dockerfile.cpu & Dockerfile.gpu).

Testing

To test the above changes, we would need Docker CI.
Creating new infra to test Docker would be a lot of work.
The easier solution would be to:

  • Build a Docker image
  • Re-use the regression scripts we already have in test/regression_tests.py

This means we need a new Dockerfile that runs the following on container start:

CMD ["python", "test/regression_tests.py"]

The proposal to re-organise the Dockerfiles is as follows.

To build images for production:

Dockerfile.cpu -> Dockerfile.runtime -> Dockerfile
Dockerfile.gpu -> Dockerfile.runtime -> Dockerfile

To build dev images for the above case, we would just pass the dev flag to the docker build command in steps 2 & 3.

To build images for Docker CI:

Dockerfile.cpu -> Dockerfile.runtime -> Dockerfile.ci
Dockerfile.gpu -> Dockerfile.runtime -> Dockerfile.ci
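As a sketch, the CPU CI chain could be wired together by tagging each stage and feeding it to the next build (tag names are illustrative; Dockerfile.ci below expects the runtime image to be tagged pytorch/torchserve-base:latest):

docker build -f Dockerfile.cpu -t ts-slim-base .
docker build -f Dockerfile.runtime \
    --build-arg BASE_IMAGE=ts-slim-base --build-arg PYTHON_VERSION=3.9 \
    -t pytorch/torchserve-base:latest .
docker build -f Dockerfile.ci \
    --build-arg PYTHON_VERSION=3.9 --build-arg BRANCH_NAME=master \
    -t ts-ci:cpu .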

Sample contents

Dockerfile.runtime

ARG PYTHON_VERSION
ARG BASE_IMAGE
FROM ${BASE_IMAGE} AS compile-image
# Re-declare the args so they are visible inside this build stage
# (BASE_IMAGE is needed again for the cuda check further down)
ARG PYTHON_VERSION
ARG BASE_IMAGE
ENV PYTHONUNBUFFERED TRUE

RUN apt-get update && apt-get install -y \
    curl \
    git

# Make the virtual environment and "activating" it by adding it first to the path.
# From here on the python$PYTHON_VERSION interpreter is used and the packages
# are installed in /home/venv which is what we need for the "runtime-image"
RUN python$PYTHON_VERSION -m venv /home/venv
ENV PATH="/home/venv/bin:$PATH"

RUN python -m pip install -U pip setuptools

# USE_CUDA is only useful for the cuda env; set it with ENV so it persists
# across layers (a plain `RUN export` is lost when its layer ends)
ENV USE_CUDA=1

ARG CUDA_VERSION=""

RUN git clone --depth 1 https://github.com/pytorch/serve.git

WORKDIR "serve"

RUN \
    if echo "$BASE_IMAGE" | grep -q "cuda:"; then \
        # Install CUDA version specific binary when CUDA version is specified as a build arg
        if [ "$CUDA_VERSION" ]; then \
            python ./ts_scripts/install_dependencies.py --cuda $CUDA_VERSION; \
        # Fall back to the default (CPU) binary when no CUDA version is given on a CUDA base image
        else \
            python ./ts_scripts/install_dependencies.py; \
        fi; \
    # Install the CPU binary
    else \
        python ./ts_scripts/install_dependencies.py; \
    fi

# Make sure the latest version of torchserve is published before running this
RUN python -m pip install --no-cache-dir torchserve torch-model-archiver torch-workflow-archiver
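For illustration, the conditional above is driven purely by the build args (tags are illustrative, and the cu117-style identifier assumes install_dependencies.py accepts that format):

# CPU build: BASE_IMAGE has no "cuda:" substring, so the CPU branch runs
docker build -f Dockerfile.runtime \
    --build-arg PYTHON_VERSION=3.9 --build-arg BASE_IMAGE=python:3.9-slim \
    -t ts-runtime:cpu .

# GPU build: the "cuda:" substring in BASE_IMAGE selects the CUDA branch
docker build -f Dockerfile.runtime \
    --build-arg PYTHON_VERSION=3.9 \
    --build-arg BASE_IMAGE=nvidia/cuda:11.7.0-base-ubuntu20.04 \
    --build-arg CUDA_VERSION=cu117 -t ts-runtime:gpu .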

Dockerfile.ci

# Final image for docker regression tests
FROM pytorch/torchserve-base:latest AS runtime-image
# Re-state ARG PYTHON_VERSION to make it active in this build stage (uses the default defined at the top)
ARG PYTHON_VERSION
ARG BRANCH_NAME
ENV PYTHONUNBUFFERED TRUE

RUN --mount=type=cache,target=/var/cache/apt \
    apt-get update && \
    apt-get upgrade -y && \
    apt-get install software-properties-common -y && \
    add-apt-repository -y ppa:deadsnakes/ppa && \
    apt-get remove -y python-pip python3-pip && \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
    python$PYTHON_VERSION \
    python3-distutils \
    python$PYTHON_VERSION-dev \
    python$PYTHON_VERSION-venv \
    # using openjdk-17-jdk due to circular dependency(ca-certificates) bug in openjdk-17-jre-headless debian package
    # https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1009905
    openjdk-17-jdk \
    build-essential \
    wget \
    numactl \
    nodejs \
    npm \
    zip \
    unzip \
    && npm install -g newman newman-reporter-htmlextra markdown-link-check \
    && rm -rf /var/lib/apt/lists/*

ENV PATH="/home/venv/bin:$PATH"

RUN python -m pip install --no-cache-dir -r https://raw.githubusercontent.com/pytorch/serve/$BRANCH_NAME/requirements/developer.txt

RUN mkdir /home/serve
ENV RUN_IN_DOCKER True

WORKDIR /home/serve
CMD ["python", "test/regression_tests.py"]

Motivation, pitch

CPU

The CPU Docker image size has grown over time.

TorchServe 0.6.0 CPU Docker image size:

pytorch/torchserve               0.6.0-cpu                           af91330a97bd   11 months ago       1.49GB

TorchServe 0.8.0 nightly Docker image size:

pytorch/torchserve-nightly       latest-cpu                          dceecc667a8a   13 hours ago        2GB

We did an analysis and the main reasons are:

  • JDK size went up by 280 MB (JDK 11 to JDK 17)
  • PyTorch and its dependencies have grown in size (170 MB)

Currently, there isn't much we can do about PyTorch and its dependencies.

The solution is to use a lightweight base image.

After experiments with alpine and slim, slim images work well for TorchServe.

The current solution uses slim as the base image. The size is comparable to the v0.6.0 image size:

pytorch/torchserve-slim          latest                              ee9912ac1c50   About an hour ago   1.5GB

GPU

0.6.0 GPU image size

pytorch/torchserve           0.6.0-gpu                           fb6d4b85847d   11 months ago   4.49GB

0.8.0 Nightly GPU Image

pytorch/torchserve-nightly   latest-gpu                          4595b0ca83a3   12 hours ago    8.4GB

The main contributors are:

  • 2 GB from the increase in PyTorch binaries and their dependencies (torch, torchvision, torchaudio)

3.9G	/home/ubuntu/anaconda3/envs/pytorch_2.0
1.9G	/home/ubuntu/anaconda3/envs/pytorch_1.11

  • 1.04 GB from the Nvidia runtime image

nvidia/cuda                  11.7.0-cudnn8-runtime-ubuntu20.04   6e0488db6af9   5 months ago    2.92GB
nvidia/cuda                  10.2-cudnn8-runtime-ubuntu18.04     9134b931c303   5 months ago    1.88GB
nvidia/cuda                  11.7.0-base-ubuntu20.04     3790a37af140   5 months ago    211MB

  • 280 MB from JDK 11 to JDK 17

Solution

  1. Use the Nvidia base image instead of the runtime image (https://github.com/NVIDIA/nvidia-docker/wiki/CUDA)

This reduces the image size by 3 GB:

pytorch/ts-base              latest-gpu                          70dac4f0a335   3 hours ago     5.23GB
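Concretely, the change is just the base image in the GPU Dockerfile's FROM line, using the tags listed above:

# Before: runtime image with cuDNN bundled (2.92 GB base)
FROM nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04

# After: base image with only the minimal CUDA runtime (211 MB base)
FROM nvidia/cuda:11.7.0-base-ubuntu20.04

This works because the PyTorch pip wheels ship their own CUDA and cuDNN libraries, so the heavier runtime image largely duplicates them.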

Proposed order of tasks

  • Implement the Docker regression suite
    • Split Dockerfile into Dockerfile.runtime & Dockerfile.ci/Dockerfile
    • Update regression tests to make them work in Docker
  • Implement and integrate Dockerfile.cpu (with slim)
  • Implement and integrate Dockerfile.gpu (with slim)
  • Update Dockerfile.dev to work with the above changes

Alternatives

No response

Additional context

No response

Tasks
- [ ] https://github.com/pytorch/serve/pull/2392
