## Description

### 🚀 The feature

- Reduce the TorchServe CPU image size by 25% by using slim as the base image
- Refactor the TorchServe Dockerfile to support slim-based CPU & GPU Docker images, and set up a Docker CI GitHub Action to test these images
TorchServe Docker image sizes have grown with every release. The main reasons are:

- PyTorch and its dependencies
- PyTorch GPU binaries have increased considerably in size
- OpenJDK has increased in size

The solution is to use a lightweight base image instead of Ubuntu. After experiments with alpine and slim, slim turned out to be the best option.
#### CPU Image

The following sets up a base image with Python 3.9 slim and OpenJDK 17:

```dockerfile
ARG PYTHON_VERSION=3.9
ARG BASE_IMAGE=python:$PYTHON_VERSION-slim
FROM ${BASE_IMAGE} AS compile-image

# Copy OpenJDK 17 from the official slim JDK image instead of apt-installing it
COPY --from=openjdk:17.0.1-jdk-slim /usr/local/openjdk-17 /usr/local/openjdk-17
ENV JAVA_HOME /usr/local/openjdk-17
RUN update-alternatives --install /usr/bin/java java /usr/local/openjdk-17/bin/java 1
```
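A build invocation for this image could look like the following; the Dockerfile name and image tag here are assumptions, not the project's actual conventions:

```
# Hypothetical build command for the slim CPU image;
# PYTHON_VERSION can be overridden at build time.
docker build -f Dockerfile.cpu --build-arg PYTHON_VERSION=3.9 -t torchserve:latest-cpu-slim .
```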
#### GPU Image

Replacing the Nvidia runtime image with the Nvidia base image reduces the GPU Docker image size by 3 GB, as shown here. To optimize further, we would need to replace the base image with slim, but Nvidia doesn't officially provide a slim-based image.

The simplest approach for GPU images would be to use the Nvidia Ubuntu base image, which would keep the code simple. The problem with that approach:

- the CPU image would be based on slim (Debian)
- the GPU image would be based on Ubuntu

To make this consistent, we would need to create and maintain a slim-based Nvidia base image. The Dockerfile would look as follows:
```dockerfile
FROM python:3.9-slim as base

FROM base as base-amd64
ENV NVARCH x86_64
ENV NVIDIA_REQUIRE_CUDA "cuda>=11.7 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=510,driver<511 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511 brand=nvidiartx,driver>=510,driver<511 brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511 brand=titanrtx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511"
ENV NV_CUDA_CUDART_VERSION 11.7.60-1
ENV NV_CUDA_COMPAT_PACKAGE cuda-compat-11-7

FROM base as base-arm64
ENV NVARCH sbsa
ENV NVIDIA_REQUIRE_CUDA "cuda>=11.7"
ENV NV_CUDA_CUDART_VERSION 11.7.60-1

FROM base-amd64

RUN apt-get update && apt-get install -y --no-install-recommends \
    gnupg2 curl ca-certificates && \
    curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/debian11/${NVARCH}/3bf863cc.pub | apt-key add - && \
    echo "deb https://developer.download.nvidia.com/compute/cuda/repos/debian11/${NVARCH} /" > /etc/apt/sources.list.d/cuda.list && \
    apt-get purge --autoremove -y curl \
    && rm -rf /var/lib/apt/lists/*

ENV CUDA_VERSION 11.7.0

# For libraries in the cuda-compat-* package: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
RUN apt-get update && apt-get install -y --no-install-recommends \
    cuda-cudart-11-7=${NV_CUDA_CUDART_VERSION} \
    ${NV_CUDA_COMPAT_PACKAGE} \
    && rm -rf /var/lib/apt/lists/*

# Required for nvidia-docker v1
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf \
    && echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf

ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
```
Maintaining this for every CUDA version would be a huge challenge.

Possible solution: write a script that takes the base Dockerfile from Nvidia and modifies the template to use slim.
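At its simplest, such a script could be a `sed` rewrite of the `FROM` line in Nvidia's upstream Dockerfile; the file names below are hypothetical, and the two-line template stands in for the real upstream file:

```shell
# Hedged sketch: derive a slim-based GPU Dockerfile from Nvidia's upstream
# template by rewriting its FROM line. File names are hypothetical.
printf 'FROM ubuntu:20.04 as base\nENV NVARCH x86_64\n' > Dockerfile.nvidia.tmpl

# Swap the Ubuntu base for python slim, keeping the stage alias intact
sed 's|^FROM ubuntu:[0-9.]*|FROM python:3.9-slim|' Dockerfile.nvidia.tmpl > Dockerfile.gpu.slim

head -n 1 Dockerfile.gpu.slim   # -> FROM python:3.9-slim as base
```

A real script would also need to track upstream changes to package versions and repo URLs, which is the maintenance burden noted above.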
Either of the above solutions would require TorchServe to maintain two base Dockerfiles (Dockerfile.cpu & Dockerfile.gpu).
#### Testing

To test the above changes, we need a Docker CI. Creating new infrastructure to test Docker would be a lot of work. The easier solution is to:

- build a Docker image
- re-use the regression scripts we already have in `test/regression_tests.py`

This means we need a new Dockerfile that runs the following on container start:

```dockerfile
CMD ["python", "test/regression_tests.py"]
```
The proposal to re-organise the Dockerfiles is as follows.

To build images for production:

- Dockerfile.cpu -> Dockerfile.runtime -> Dockerfile
- Dockerfile.gpu -> Dockerfile.runtime -> Dockerfile

To build dev images for the above case, we would just pass the dev flag to the docker build command in steps 2 & 3.

To build images for Docker CI:

- Dockerfile.cpu -> Dockerfile.runtime -> Dockerfile.ci
- Dockerfile.gpu -> Dockerfile.runtime -> Dockerfile.ci
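Driving these chains could look like the commands below; the image tags, build-arg names, and the dev flag are assumptions for illustration only:

```
# Hypothetical three-step CPU production build
docker build -f Dockerfile.cpu -t ts-base:cpu .
docker build -f Dockerfile.runtime --build-arg BASE_IMAGE=ts-base:cpu -t ts-runtime:cpu .
docker build -f Dockerfile --build-arg BASE_IMAGE=ts-runtime:cpu -t torchserve:latest-cpu .

# For a dev image, the dev flag would be passed in steps 2 & 3, e.g.:
#   docker build -f Dockerfile.runtime --build-arg BUILD_TYPE=dev ...
```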
Sample contents:

**Dockerfile.runtime**
```dockerfile
ARG PYTHON_VERSION
ARG BASE_IMAGE
FROM ${BASE_IMAGE} AS compile-image
# Re-declare so these build args are visible inside this build stage
ARG PYTHON_VERSION
ARG BASE_IMAGE
ENV PYTHONUNBUFFERED TRUE

# git is needed for the clone below
RUN apt-get update && apt-get install -y \
    curl \
    git

# Make the virtual environment and "activate" it by adding it first to the path.
# From here on the python$PYTHON_VERSION interpreter is used and the packages
# are installed in /home/venv, which is what we need for the "runtime-image"
RUN python$PYTHON_VERSION -m venv /home/venv
ENV PATH="/home/venv/bin:$PATH"
RUN python -m pip install -U pip setuptools

# This is only useful for cuda env
RUN export USE_CUDA=1
ARG CUDA_VERSION=""

RUN git clone --depth 1 https://github.com/pytorch/serve.git
WORKDIR "serve"

RUN \
    if echo "$BASE_IMAGE" | grep -q "cuda:"; then \
        # Install the CUDA-version-specific binary when CUDA_VERSION is given as a build arg
        if [ "$CUDA_VERSION" ]; then \
            python ./ts_scripts/install_dependencies.py --cuda $CUDA_VERSION; \
        # Fall back to the default install on a CUDA base image
        else \
            python ./ts_scripts/install_dependencies.py; \
        fi; \
    # Install the CPU binary
    else \
        python ./ts_scripts/install_dependencies.py; \
    fi

# Make sure the latest version of torchserve is uploaded before running this
RUN python -m pip install --no-cache-dir torchserve torch-model-archiver torch-workflow-archiver
```
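The branching in the install step above can be exercised as plain shell outside Docker; the helper function name below is hypothetical:

```shell
# Mirrors the RUN branch in Dockerfile.runtime: choose the install command
# based on the BASE_IMAGE and CUDA_VERSION build args.
select_install_cmd() {
    base_image="$1"
    cuda_version="$2"
    if echo "$base_image" | grep -q "cuda:"; then
        if [ -n "$cuda_version" ]; then
            # CUDA base image with an explicit CUDA version
            echo "python ./ts_scripts/install_dependencies.py --cuda $cuda_version"
        else
            # CUDA base image without a version: default install
            echo "python ./ts_scripts/install_dependencies.py"
        fi
    else
        # CPU base image
        echo "python ./ts_scripts/install_dependencies.py"
    fi
}

select_install_cmd "nvidia/cuda:11.7.0-base-ubuntu20.04" "cu117"
# -> python ./ts_scripts/install_dependencies.py --cuda cu117
```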
**Dockerfile.ci**
```dockerfile
# Final image for docker regression tests
FROM pytorch/torchserve-base:latest AS runtime-image
# Re-declare ARG PYTHON_VERSION to make it active in this build stage (uses the default defined at the top)
ARG PYTHON_VERSION
ARG BRANCH_NAME
ENV PYTHONUNBUFFERED TRUE
RUN --mount=type=cache,target=/var/cache/apt \
    apt-get update && \
    apt-get upgrade -y && \
    apt-get install software-properties-common -y && \
    add-apt-repository -y ppa:deadsnakes/ppa && \
    apt remove -y python-pip python3-pip && \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
    python$PYTHON_VERSION \
    python3-distutils \
    python$PYTHON_VERSION-dev \
    python$PYTHON_VERSION-venv \
    # using openjdk-17-jdk due to a circular dependency (ca-certificates) bug in the openjdk-17-jre-headless debian package
    # https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1009905
    openjdk-17-jdk \
    build-essential \
    wget \
    numactl \
    nodejs \
    npm \
    zip \
    unzip \
    && npm install -g newman newman-reporter-htmlextra markdown-link-check \
    && rm -rf /var/lib/apt/lists/* \
    && cd /tmp
ENV PATH="/home/venv/bin:$PATH"
RUN python -m pip install --no-cache-dir -r https://raw.githubusercontent.com/pytorch/serve/$BRANCH_NAME/requirements/developer.txt
RUN mkdir /home/serve
ENV RUN_IN_DOCKER True
WORKDIR /home/serve
CMD ["python", "test/regression_tests.py"]
```
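Running the CI image would then reduce to a single `docker run`; the image tag, and the idea of mounting a serve checkout at `/home/serve` (the `WORKDIR` above), are assumptions:

```
# Hypothetical invocation of the regression-test container
git clone --depth 1 https://github.com/pytorch/serve.git
docker run --rm -v "$(pwd)/serve:/home/serve" pytorch/torchserve-ci:latest-cpu
```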
### Motivation, pitch

#### CPU

The CPU Docker image size has grown over time.

TorchServe 0.6.0 CPU Docker image size:

```
pytorch/torchserve    0.6.0-cpu    af91330a97bd    11 months ago    1.49GB
```

TorchServe 0.8.0 nightly Docker image size:

```
pytorch/torchserve-nightly    latest-cpu    dceecc667a8a    13 hours ago    2GB
```

We did an analysis, and the main reasons are:

- JDK size went up by 280 MB (JDK 11 to JDK 17)
- PyTorch and its dependencies have grown in size (170 MB)

Currently, there isn't much we can do about PyTorch and its dependencies. The solution is to use a lightweight base image. After experiments with alpine and slim, slim images work well for TorchServe.

The current solution uses slim as the base image. The size is comparable to the v0.6.0 image size:

```
pytorch/torchserve-slim    latest    ee9912ac1c50    About an hour ago    1.5GB
```
#### GPU

0.6.0 GPU image size:

```
pytorch/torchserve    0.6.0-gpu    fb6d4b85847d    11 months ago    4.49GB
```

0.8.0 nightly GPU image size:

```
pytorch/torchserve-nightly    latest-gpu    4595b0ca83a3    12 hours ago    8.4GB
```

The main contributors are:

- 2 GB from the increase in PyTorch binaries and their dependencies (torch, torchvision, torchaudio)

```
3.9G    /home/ubuntu/anaconda3/envs/pytorch_2.0
1.9G    /home/ubuntu/anaconda3/envs/pytorch_1.11
```

- 1.04 GB from the Nvidia runtime image

```
nvidia/cuda    11.7.0-cudnn8-runtime-ubuntu20.04    6e0488db6af9    5 months ago    2.92GB
nvidia/cuda    10.2-cudnn8-runtime-ubuntu18.04      9134b931c303    5 months ago    1.88GB
nvidia/cuda    11.7.0-base-ubuntu20.04              3790a37af140    5 months ago    211MB
```

- 280 MB from JDK 11 to JDK 17

Solution: use the Nvidia base image instead of the runtime image (https://github.com/NVIDIA/nvidia-docker/wiki/CUDA). This reduces image size by 3 GB:

```
pytorch/ts-base    latest-gpu    70dac4f0a335    3 hours ago    5.23GB
```
#### Proposed order of tasks

1. Implement Docker regression suite
2. Split Dockerfile into Dockerfile.runtime & Dockerfile.ci/Dockerfile
3. Update regression tests to make them work on Docker
4. Implement and integrate Dockerfile.cpu (with slim)
5. Implement and integrate Dockerfile.gpu (with slim)
6. Update Dockerfile.dev to work with the above changes
### Alternatives

No response

### Additional context

No response
### Tasks
- [ ] https://github.com/pytorch/serve/pull/2392