Turn-key solution to checkpoint/restore applications running in Linux containers

twosigma/fastfreeze

Introduction

FastFreeze enables checkpoint/restore for applications running in Linux containers. It uploads/downloads checkpoint images to AWS S3 and Google Storage, provides a friendly CLI to job systems, and does not require elevated privileges (such as CAP_SYS_ADMIN).

The primary use-case of FastFreeze is to make long-running and resource-intensive applications resilient to failure. This is useful for a variety of reasons, such as reducing compute waste or lowering application completion time. This makes Google's preemptible VM and Amazon Spot VM offerings more attractive. We are exploring other use-cases such as JVM memory ballooning, warm boots, and Jupyter integration.

FastFreeze is powered by the CRIU engine.

Usage in a nutshell

  1. Start the application via FastFreeze with the run command in an empty Linux container (e.g., Kubernetes, Docker).

    fastfreeze run --image-url s3://fastfreeze-images/job-1234.ff -- app.sh [args...]
    
  2. Checkpoint the application with the checkpoint command. This persists the state of the application into the AWS S3 location we provided at step 1. The application is terminated upon successful checkpoint.

    fastfreeze checkpoint
    
  3. Restore the application by running the same command as in step 1, possibly on another machine. The run command checks if the image is present. If so, it restores the application. If not, it runs the application from scratch. This makes FastFreeze ideal to integrate with existing job systems that retry commands until the job succeeds.

    # same as step 1
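
The retry behavior that step 3 relies on can be sketched as a small wrapper. This is only an illustration of what a job system might do; `retry_until_success` is a hypothetical helper and is not part of FastFreeze.

```shell
# Retry a command until it exits with status 0, at most max_attempts times.
# retry_until_success is a hypothetical helper, not a FastFreeze command.
retry_until_success() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run the command exactly as given and stop at the first success.
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}
```

A job could then be launched as `retry_until_success 5 fastfreeze run --image-url s3://fastfreeze-images/job-1234.ff -- app.sh`. Because the run command restores from the image when one exists, each retry would resume from the last checkpoint rather than start over.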
    

Features

FastFreeze includes the following high-level features:

  • Unprivileged: FastFreeze does not need privileges like CAP_SYS_ADMIN to operate. We use a modified version of CRIU to accomplish this. In addition, we use set_ns_last_pid to control PIDs by cycling through PIDs at a rate of 100,000/s by essentially doing a fork bomb, until we reach the PID that we desire.

  • Fast: FastFreeze uses criu-image-streamer to perform checkpoints at speeds of up to 15GB/s, given enough CPU and network bandwidth. This makes Google's preemptible VM and Amazon Spot VM offerings more attractive. FastFreeze can checkpoint and evacuate large applications (e.g., using 30GB of memory) within tight eviction deadlines (~30 seconds).

  • Low overhead: FastFreeze needs less than 100MB of memory to perform a checkpoint or a restore. This memory headroom must be reserved in the container in addition to what the application uses. Note that the standard AWS S3 uploader (aws s3) tends to use a lot of memory (500MB) because it is written in Python and uses generous buffers. However, when using Google Cloud Storage, FastFreeze uses a bundled fast and memory efficient uploader, gcsthin, which uses 10MB of memory.

  • Compression: Checkpoint images can be compressed on the fly with lz4 or zstd. Setting the --cpu-budget option when checkpointing provides ways to control the compression algorithm. Compression is parallelized for optimal performance.

  • Encryption: Checkpoint images can be encrypted on the fly with openssl. Setting the --passphrase-file option enables encryption using AES-256-CBC.

  • CPUID virtualization: FastFreeze enables CPUID virtualization with libvirtcpuid. This enables the migration of applications within a heterogeneous datacenter. For example, an application started on a machine that supports transactional memory can be migrated to a host that does not.

  • Time virtualization: FastFreeze uses libvirttime to virtualize time in userspace, offsetting CLOCK_MONOTONIC when migrating to other machines. This feature is crucial for Java programs. Note that there is a time namespace available in the kernel, but FastFreeze does not use it as it requires CAP_SYS_ADMIN.

  • File system: FastFreeze checkpoints and restores the files used by the application, such as logs and other temporary files. These files are not automatically detected. The user must specify the paths (files or directories) to preserve via the --preserve-path option.

  • Metrics: FastFreeze can be configured to emit metrics to an external service to collect checkpoint/restore stats. This is helpful to track the SLA of FastFreeze.
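
For the encryption feature, a passphrase file has to exist before it can be passed to --passphrase-file. The following is a minimal sketch of one way to create it; `make_passphrase_file` is a hypothetical helper, and the choice of 32 random bytes (256 bits) encoded in base64 is an assumption, not a format mandated by FastFreeze.

```shell
# Write a random passphrase with 256 bits of entropy to the given file.
# make_passphrase_file is a hypothetical helper, not a FastFreeze command.
make_passphrase_file() {
    out=$1
    (
        # Create the file readable by its owner only.
        umask 077
        # 32 random bytes = 256 bits, base64-encoded onto a single line.
        head -c 32 /dev/urandom | base64 > "$out"
    ) || return 1
    # Tighten permissions in case the file already existed.
    chmod 600 "$out"
}
```

For instance, `make_passphrase_file /dev/shm/ff-passphrase` (an illustrative path) writes the file to a ramdisk, which can then be given to both the run and checkpoint commands with --passphrase-file.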

Non-root limitations

FastFreeze does not use privileged operations. This creates the following drawbacks:

  • FastFreeze must run within a Linux container. FastFreeze has the ability to create its own via --container name, or use an existing one (e.g., Kubernetes, Docker). This guarantees that there are no PID conflicts. The file system must remain unchanged when migrating an application to a different container. This is typically achieved by using the same container image.

  • The network connections are dropped upon restore. We rely on the application to be tolerant to network failures and reconnect to needed services.

  • The /proc/self/exe symlink is not restored and will point to the criu binary. When using gdb to attach to a restored program, one must pass the real executable path to gdb as follows: gdb -p PID /path/to/exe.

  • Controlling PIDs without CAP_SYS_ADMIN can be slow if /proc/sys/kernel/pid_max is high. We recommend setting a value lower than 100,000.

  • Memory mapped files that have been deleted are not supported.

  • As FastFreeze assumes operating within a Linux container, it does not checkpoint/restore cgroups, seccomp, and user capabilities. We also do not support System V IPC. Create an issue if you need IPC support.
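
The pid_max recommendation above can be checked before starting a job. The sketch below is an illustration only; `check_pid_max` is a hypothetical helper, and the time estimate simply divides pid_max by the 100,000 PIDs per second rate quoted in the Features section. By that estimate, a pid_max of 4194304 would take about 42 seconds to cycle through in the worst case.

```shell
# Report whether pid_max is below the recommended ceiling of 100,000.
# check_pid_max is a hypothetical helper; an optional argument overrides
# the value read from /proc/sys/kernel/pid_max.
check_pid_max() {
    limit=100000   # recommended upper bound from the text above
    rate=100000    # PIDs cycled per second, as quoted in the Features section
    value=${1:-$(cat /proc/sys/kernel/pid_max)}
    if [ "$value" -ge "$limit" ]; then
        # Integer division: a rough worst-case estimate in whole seconds.
        echo "pid_max=$value: reaching a PID may take roughly $((value / rate))s; consider a value below $limit"
        return 1
    fi
    echo "pid_max=$value: ok"
}
```

Calling `check_pid_max` with no argument inspects the current machine; passing a number evaluates that value instead.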

Supported Applications

FastFreeze supports most Linux applications, with some restrictions:

  • GPUs and external devices are not supported.

  • Applications that rely on host-dependent environment variables (like hostname, or job id) may have issues when migrated to a new host. Avoid relying on such variables, or caching host-dependent information.

  • Applications that use ptrace cannot be checkpointed (e.g., running under strace).

  • Due to CPUID virtualization, only x86 64-bit applications running with GNU libc are supported. In practice, that means no musl libc, so no Alpine Docker images.

  • Secure binaries are not supported. For example, an application that runs a script with sudo is a problem.

  • On some systems, AppArmor can prevent the execution of certain applications such as man, because we relocate the system ld.so to /var/tmp/fastfreeze/run, which may not be in the white-listed paths of executable mmap files. This is not an issue in practice.

  • FastFreeze only supports a single application execution at a time within a container. An application can nevertheless be comprised of many processes and threads. To run two instances of FastFreeze, one must use two separate containers.

Non-features

  • Checkpoint images are not managed by FastFreeze. Pruning old images is not in the scope of FastFreeze.

Usage for running on a regular machine

Installation

FastFreeze is distributed in a self-contained 5MB package that can be extracted anywhere.

# might be needed to download and extract the archive
sudo apt-get install -y curl xz-utils

# Select the installation location. You may pick something like your home
# directory or /opt
cd ~

# This creates a fastfreeze directory in the current directory
curl -SL https://github.com/twosigma/fastfreeze/releases/download/v1.3.0/fastfreeze-v1.3.0.tar.xz | tar xJf -

# Optionally, you can make a fastfreeze symlink in ~/bin or /usr/local/bin for easy access.
ln -s $(pwd)/fastfreeze/fastfreeze ~/bin

# Confirm fastfreeze is working
fastfreeze/fastfreeze run sleep 60

Usage for running in Docker / Kubernetes

Installation

FastFreeze is distributed in a self-contained 5MB package that prefers to be extracted in /opt/fastfreeze (see more details below for why).

The following shows an example of the installation of FastFreeze in a Debian Docker image.

FROM debian:10

RUN apt-get update
RUN apt-get install -y curl xz-utils

RUN set -ex; \
  curl -SL https://github.com/twosigma/fastfreeze/releases/download/v1.3.0/fastfreeze-v1.3.0.tar.xz | \
    tar xJf - -C /opt; \
  ln -s /opt/fastfreeze/fastfreeze /usr/local/bin; \
  fastfreeze install

The install command overrides the system loader /lib64/ld-linux-x86-64.so.2, and creates /var/tmp/fastfreeze where files such as logs are kept. Note that replacing the system loader is useful even when not doing CPUID virtualization. It facilitates the injection of the time virtualization library into all processes.

Enabling ptrace for CRIU

CRIU, the checkpointing engine, needs ptrace access on the machine. When denied, the error Unable to interrupt task: 1000 (Operation not permitted) will be shown.

To enable ptrace, there are two options:

  • Disable YAMA restrictions by setting /proc/sys/kernel/yama/ptrace_scope to 0 on the host where FastFreeze runs. See man ptrace(2) for more information.
  • Grant cap_sys_ptrace to the CRIU binary by adding the following to the Dockerfile. Note that this makes the CRIU binary secure, disabling LD_LIBRARY_PATH tricks. This means that the FastFreeze installation must be located in /opt/fastfreeze, as we fall back on RPATH ELF tricks in this case. You'll also need to provide CAP_SYS_PTRACE to the running container.
  RUN apt-get install -y libcap2-bin
  RUN setcap cap_sys_ptrace+eip /opt/fastfreeze/criu
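
To find out which of the two options applies on a given host, the current YAMA setting can be inspected first. This is a minimal sketch; `ptrace_scope_status` is a hypothetical helper, and treating a missing ptrace_scope file as "no YAMA restriction" is an assumption about kernels built without YAMA.

```shell
# Describe the YAMA ptrace restriction level.
# ptrace_scope_status is a hypothetical helper; an optional argument
# overrides the value read from /proc/sys/kernel/yama/ptrace_scope.
ptrace_scope_status() {
    scope=${1:-$(cat /proc/sys/kernel/yama/ptrace_scope 2>/dev/null)}
    case "$scope" in
        # Assumption: an absent file means YAMA is not enabled at all.
        ""|0) echo "unrestricted" ;;
        # Any other value limits ptrace in some way.
        *) echo "restricted" ;;
    esac
}
```

When the result is `restricted`, either set ptrace_scope to 0 on the host or use the setcap approach shown above.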

Tutorial

You may try out FastFreeze with the following:

# First, save the previously suggested Dockerfile from the Installation section
# in the current directory
$ cat > Dockerfile

# Then, build the docker image
$ docker build . -t fastfreeze

# 1) Run the application for the first time
# In the following, we don't need cap_sys_ptrace, but it makes Docker
# relax its seccomp filters on kcmp(), which CRIU needs.
$ docker run \
  --rm -it \
  --user nobody \
  --cap-add=cap_sys_ptrace \
  --name ff \
  --mount type=bind,source=/tmp,target=/tmp \
  fastfreeze:latest \
  fastfreeze run --image-url file:/tmp/ff-test -- \
    bash -c 'for i in $(seq 100); do echo $i; sleep 1; done'

# The application is running. We should see on the terminal:
#   [ff.run] (0.001s) Time is Sat, 15 Aug 2020 05:21:41 +0000
#   [ff.run] (0.001s) Host is 44f6ce3d5b4a
#   [ff.run] (0.001s) Invocation ID is Jg9qyV
#   [ff.run] (0.012s) Fetching image manifest for file:/tmp/ff-test
#   [ff.run] (0.014s) Image manifest not found, running application from scratch
#   [ff.run] (0.030s) Application is ready, started from scratch
#   1
#   2
#   3
#   4

# 2) In another terminal, we invoke the checkpoint command
$ docker exec ff fastfreeze checkpoint

# We should see:
#   [ff.checkpoint] (0.000s) Time is Sat, 15 Aug 2020 05:21:54 +0000
#   [ff.checkpoint] (0.000s) Host is 44f6ce3d5b4a
#   [ff.checkpoint] (0.000s) Invocation ID is aaNN7y
#   [ff.checkpoint] (0.000s) Checkpointing application to file:/tmp/ff-test (num_shards=4 compressor=Lz4 prefix=aaNN7y)
#   tar: Removing leading `/' from member names
#   [ff.checkpoint] (0.014s) Uncompressed image size is 1 MiB, rate: 132 MiB/s
#   [ff.checkpoint] (0.017s) Checkpoint to file:/tmp/ff-test complete. Took 0.0s

# The first terminal should show:
#   [ff.run] (13.012s) Exiting with exit_code=137: Application caught fatal signal SIGKILL
#
# The application is now checkpointed. We can inspect the image in /tmp/ff-test
# We see that the image is split into 4 different pieces. This split is
# used to parallelize checkpointing, improving performance.
$ ls -lh /tmp/ff-test

#   total 116K
#   -rw-r--r-- 1 nobody nogroup 22K Aug 15 05:21 aaNN7y-1.ffs
#   -rw-r--r-- 1 nobody nogroup 19K Aug 15 05:21 aaNN7y-2.ffs
#   -rw-r--r-- 1 nobody nogroup 42K Aug 15 05:21 aaNN7y-3.ffs
#   -rw-r--r-- 1 nobody nogroup 23K Aug 15 05:21 aaNN7y-4.ffs
#   -rw-r--r-- 1 nobody nogroup  82 Aug 15 05:21 manifest.json

# 3) We restore the application by running the same command as in 1)
$ docker run \
  --rm -it \
  --user nobody \
  --cap-add=cap_sys_ptrace \
  --name ff \
  --mount type=bind,source=/tmp,target=/tmp \
  fastfreeze:latest \
  fastfreeze run --image-url file:/tmp/ff-test -- \
    bash -c 'for i in $(seq 100); do echo $i; sleep 1; done'

# We see in the terminal:
#  [ff.run] (0.000s) Time is Sat, 15 Aug 2020 05:29:53 +0000
#  [ff.run] (0.000s) Host is 4259e670e092
#  [ff.run] (0.000s) Invocation ID is V0qRYI
#  [ff.run] (0.015s) Fetching image manifest for file:/tmp/ff-test
#  [ff.run] (0.017s) Restoring application
#  [ff.run] (0.126s) Uncompressed image size is 1 MiB, rate: 134 MiB/s
#  [ff.run] (0.157s) Application is ready, restore took 0.2s
#  5
#  6
#  7
#  8

In this example, we used the local file system to store the checkpoint image, but in practice one would use something like AWS S3, or GCS.

Detailed Usage

Below is a synopsis of the available FastFreeze commands.

USAGE:
    fastfreeze <SUBCOMMAND>

SUBCOMMANDS:
    run           Run application. If a checkpoint image exists, the application is
                  restored. Otherwise, the application is run from scratch
    checkpoint    Perform a checkpoint of the running application
    extract       Extract a FastFreeze image to local disk
    wait          Wait for checkpoint or restore to finish
    install       Install FastFreeze in the specified directory

run

Run application. If a checkpoint image exists, the application is restored. Otherwise, the application is run from scratch

USAGE:
    fastfreeze run [OPTIONS] --image-url <url> [--] [app-args]...

OPTIONS:
        --image-url <url>          Image URL. S3, GCS and local filesystem are supported:
                                    * s3://bucket_name/image_path
                                    * gs://bucket_name/image_path
                                    * file:image_path
        --on-app-ready <cmd>       Shell command to run once the application is running
        --passphrase-file <file>   Provide a file containing the passphrase to be used for encrypting or
                                   decrypting the image. For security concerns, using a ramdisk like
                                   /dev/shm to store the passphrase file is preferable
        --preserve-path <path>...  Dir/file to include in the checkpoint image.
                                   May be specified multiple times.
                                   Multiple paths can also be specified colon separated
        --no-restore               Always run the app from scratch. Useful to ignore a faulty image
        --allow-bad-image-version  Allow restoring of images that don't match the version we expect
        --leave-stopped            Leave application stopped after restore, useful for debugging.
                                   Has no effect when running the app from scratch
    -v, --verbose                  Verbosity. Can be repeated

ARGS:
    <app-args>...    Application arguments, used when running the app from scratch. Ignored during restore

ENVS:
    FF_APP_PATH               The PATH to use for the application
    FF_APP_LD_LIBRARY_PATH    The LD_LIBRARY_PATH to use for the application
    FF_APP_VIRT_CPUID_MASK    The CPUID mask to use. See libvirtcpuid documentation for more details
    FF_APP_INJECT_<VAR_NAME>  Additional environment variables to inject to the application and its children.
                              For example, FF_APP_INJECT_LD_PRELOAD=/opt/lib/libx.so
    FF_METRICS_RECORDER       When specified, FastFreeze invokes the specified program to report metrics.
                              The metrics are formatted in JSON and passed as first argument
    CRIU_OPTS                 Additional arguments to pass to CRIU, whitespace separated
    S3_CMD                    Command to access AWS S3. Defaults to 'aws s3'
    GS_CMD                    Command to access Google Storage. Defaults to 'gcs_streamer'

EXIT CODES:
    171          A failure happened during restore, or while fetching the image manifest.
                 Retrying with --no-restore will avoid that failure
    170          A failure happened before the application was ready
    128+sig_nr   The application caught a fatal signal corresponding to `sig_nr`
    exit_code    The application exited with `exit_code`
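
A job wrapper may want to react differently to each of these exit codes. The sketch below maps a code to a short label following the table above; `explain_exit_code` and the label names are hypothetical. The codes 171 and 170 are tested before the 128+sig_nr rule, since both are also greater than 128.

```shell
# Map an exit code of `fastfreeze run` to a short label.
# explain_exit_code and its labels are hypothetical, for illustration only.
explain_exit_code() {
    code=$1
    case "$code" in
        0)   echo "success" ;;
        171) echo "restore-failed" ;;   # retrying with --no-restore avoids it
        170) echo "startup-failed" ;;   # failure before the app was ready
        *)
            if [ "$code" -gt 128 ]; then
                # 128+sig_nr: recover the signal number.
                echo "signal-$((code - 128))"
            else
                echo "app-exit-$code"
            fi
            ;;
    esac
}
```

For example, the exit code 137 seen in the tutorial maps to `signal-9`, which is SIGKILL.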

checkpoint

Perform a checkpoint of the running application

USAGE:
    fastfreeze checkpoint [OPTIONS]

OPTIONS:
        --leave-running            Leave application running after checkpoint
        --image-url <image-url>    Image URL, defaults to the value used during the run command
        --preserve-path <path>...  Dir/file to include in the image in addition to the ones specified during the
                                   run command. May be specified multiple times. Multiple paths can also be specified
                                   colon separated
        --num-shards <num-shards>  Level of parallelism. Split the image in multiple shards [default: 4]
        --cpu-budget <cpu-budget>  Amount of CPU at disposal. Possible values are [low, medium, high]. Currently,
                                   `low` skips compression, `medium` uses lz4, and `high` uses zstd [default: medium]
        --passphrase-file <file>   Enable image encryption. This points to a file containing a passphrase
                                   used to encrypt the image. The passphrase should contain at least 256
                                   bits of entropy

    -v, --verbose                  Verbosity. Can be repeated

ENVS:
    FF_METRICS_RECORDER  When specified, FastFreeze invokes the specified program to report metrics.
                         The metrics are formatted in JSON and passed as first argument
    CRIU_OPTS            Additional arguments to pass to CRIU, whitespace separated
    S3_CMD               Command to access AWS S3. Defaults to 'aws s3'
    GS_CMD               Command to access Google Storage. Defaults to 'gcs_streamer'
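
As an illustration of FF_METRICS_RECORDER, the following sketch of a recorder script appends each JSON payload it receives to a log file. The `record_metrics` function, the `METRICS_LOG` variable, and the default log path are all hypothetical; the only behavior taken from the help text above is that the metrics arrive as JSON in the first argument.

```shell
#!/bin/sh
# Hypothetical metrics recorder: append the JSON payload to a log file,
# one record per line. METRICS_LOG is an assumed variable, not a FastFreeze one.
record_metrics() {
    log_file=${METRICS_LOG:-/tmp/ff-metrics.jsonl}
    printf '%s\n' "$1" >> "$log_file"
}

# When invoked as a program, record the first argument if there is one.
if [ "$#" -gt 0 ]; then
    record_metrics "$1"
fi
```

Saved as an executable file, its path could then be exported as FF_METRICS_RECORDER before invoking the run or checkpoint command.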

extract

Extract a FastFreeze image to local disk

USAGE:
    fastfreeze extract [OPTIONS] --image-url <image-url>

OPTIONS:
    -i, --image-url <image-url>      Image URL, which can also be a regular local path
    -o, --output-dir <output-dir>    Output directory where to extract the image.
                                     Defaults to the last path component of image-url
        --allow-bad-image-version    Allow restoring of images that don't match the version we expect
        --passphrase-file <file>     Provide a file containing the passphrase to be used for decrypting the image
    -v, --verbose                    Verbosity. Can be repeated

ENVS:
    S3_CMD   Command to access AWS S3. Defaults to 'aws s3'
    GS_CMD   Command to access Google Storage. Defaults to 'gcs_streamer'

wait

Wait for checkpoint or restore to finish

USAGE:
    fastfreeze wait [OPTIONS]

OPTIONS:
    -t, --timeout <timeout>    Fail after some specified number of seconds. Decimals are allowed
    -v, --verbose              Verbosity. Can be repeated

install

Install FastFreeze, mostly to setup virtualization

USAGE:
    fastfreeze install [OPTIONS]

OPTIONS:
    -v, --verbose    Verbosity. Can be repeated

Acknowledgments

License

FastFreeze is licensed under the Apache 2.0 license.