zincware/paraffin

paraffin

Paraffin, whose name derives from the Latin phrase "parum affinis" ("little related"), is a Python package for running DVC stages in parallel. DVC does not currently support this directly, and Paraffin provides a workaround. For more details, refer to the DVC documentation on parallel stage execution.

Warning

paraffin is still very experimental. Do not use it for production workflows.

Installation

Install Paraffin via pip:

pip install paraffin

Usage

To run up to 4 DVC stages in parallel, use:

paraffin -n 4 <stage names>

If you have installed dash (pip install dash), you can also access the dashboard by running:

paraffin --dashboard <stage names>

For more information, run:

paraffin --help
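
To illustrate what "up to n stages in parallel" means for a dependency graph, here is a minimal Python sketch. It is not Paraffin's implementation: the function `run_parallel`, the `graph` mapping, and the `run_stage` callback are hypothetical names used only for this example. The sketch assumes a stage may start as soon as all of its dependencies have finished, with at most `max_workers` stages running at once.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_parallel(graph, run_stage, max_workers=4):
    """Run stages in dependency order, at most max_workers at a time.

    graph maps each stage name to the set of stages it depends on.
    run_stage is any callable that takes a stage name (hypothetical hook).
    Returns the stage names in the order they finished.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    finished = []
    running = {}  # future -> stage name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every stage whose dependencies are all done.
            for stage in sorter.get_ready():
                running[pool.submit(run_stage, stage)] = stage
            # Block until at least one running stage completes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any error from the stage
                stage = running.pop(future)
                sorter.done(stage)  # unlocks stages that depend on it
                finished.append(stage)
    return finished
```

In a real setup, `run_stage` could for instance shell out to DVC for a single stage; that choice is left open here because it depends on the project.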

Labels

You can run paraffin in multiple processes (e.g. on different hardware with a shared file system). To specify where a stage should run, you can assign labels to each worker.

paraffin --labels GPU # on a GPU node
paraffin --labels CPU intel # on a CPU node

To specify which labels each stage requires, create a paraffin.yaml file as follows:

labels:
    GPU_TASK:
        - GPU
    CPU_TASK:
        - CPU
    SPECIAL_CPU_TASK:
        - CPU
        - intel

Stages that are not listed in paraffin.yaml can run on any available worker.
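
The following Python sketch shows one plausible reading of the label rules above. It assumes a worker may run a stage only if the worker carries every label listed for that stage; this matching rule is an assumption for illustration, not a statement of how Paraffin implements it. The function `eligible_workers` and the worker names are hypothetical.

```python
def eligible_workers(stage, stage_labels, workers):
    """Return the sorted names of workers that may run `stage`.

    stage_labels mirrors the `labels` section of paraffin.yaml.
    workers maps a hypothetical worker name to its set of labels.
    Assumption: a worker qualifies if it has all labels the stage lists.
    """
    required = set(stage_labels.get(stage, []))
    # A stage without an entry has no requirements, so any worker qualifies.
    return sorted(
        name for name, labels in workers.items() if required <= set(labels)
    )


stage_labels = {
    "GPU_TASK": ["GPU"],
    "CPU_TASK": ["CPU"],
    "SPECIAL_CPU_TASK": ["CPU", "intel"],
}
# Two workers, started with the labels from the commands above.
workers = {"gpu-node": {"GPU"}, "cpu-node": {"CPU", "intel"}}

for stage in ["GPU_TASK", "SPECIAL_CPU_TASK", "UNLISTED_TASK"]:
    print(stage, "->", eligible_workers(stage, stage_labels, workers))
```

Under this assumption, a stage absent from paraffin.yaml has an empty requirement set and therefore matches every worker.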

Tip

If you are building Python-based workflows with DVC, consider trying our other project ZnTrack for a more Pythonic way to define workflows.