dvc-patterns/pipelines/parallel-stages at main · mnrozhkov/dvc-patterns

Parallel Stages (training)

Example: pipelines/parallel-stages

In Data Version Control (DVC), "Parallel Stages" is a design pattern in which independent stages of a pipeline run concurrently rather than sequentially. It applies when stages do not depend on each other's outputs: running them at the same time shortens the overall runtime of the pipeline. A single dvc repro call executes stages one after another, so in this example the concurrency comes from launching separate dvc repro processes.

graph TD;
    data --> train_rf["Random Forest"];
    data --> train_lr["Linear Regression"];
    train_rf --> evaluate;
    train_lr --> evaluate;
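
To make "independent of each other" concrete, here is a minimal Python sketch that groups the stages of the graph above into levels. The edge list is copied from the diagram; the `parallel_groups` helper is hypothetical and not part of DVC or this repository. Stages that land in the same level have no dependency on one another, so they are candidates for running in parallel.

```python
from collections import defaultdict

# Edges copied from the diagram above: (upstream stage, downstream stage).
EDGES = [
    ("data", "train_rf"),
    ("data", "train_lr"),
    ("train_rf", "evaluate"),
    ("train_lr", "evaluate"),
]


def parallel_groups(edges):
    """Group stages into levels; stages within one level do not depend on each other."""
    upstream = defaultdict(set)
    stages = set()
    for src, dst in edges:
        upstream[dst].add(src)
        stages.update((src, dst))

    done, groups = set(), []
    while len(done) < len(stages):
        # A stage is ready once all of its upstream stages have finished.
        ready = sorted(s for s in stages - done if upstream[s] <= done)
        if not ready:
            raise ValueError("cycle detected in stage graph")
        groups.append(ready)
        done.update(ready)
    return groups


if __name__ == "__main__":
    for level, group in enumerate(parallel_groups(EDGES)):
        print(level, group)
```

For this graph the middle level contains both training stages, which matches the parallel step in the Run section.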

Run

dvc repro data                                          # Prepare features
dvc repro -s train_rf & dvc repro -s train_lr; wait     # Train both models in parallel and wait for both to finish
dvc repro -s train_lr -f                                # Force a re-run of Linear Regression only
dvc repro evaluate                                      # Run downstream stages
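
The parallel training step can also be driven from a script, which makes it easier to check that both stages succeeded before evaluation starts. The following is a sketch only: `run_in_parallel` and `train_models_then_evaluate` are hypothetical helpers, the stage names are taken from the commands above, and `dvc` is assumed to be on the PATH.

```python
import subprocess
import sys


def run_in_parallel(commands):
    """Start every command at once, wait for all of them, and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


def train_models_then_evaluate():
    """Train both models concurrently, then run evaluation if both succeeded."""
    # -s (--single-item) reproduces only the named stage, as in the commands above.
    codes = run_in_parallel([
        ["dvc", "repro", "-s", "train_rf"],
        ["dvc", "repro", "-s", "train_lr"],
    ])
    if any(codes):
        sys.exit(f"training failed, exit codes: {codes}")
    subprocess.run(["dvc", "repro", "evaluate"], check=True)
```

Calling `train_models_then_evaluate()` from the project directory would mirror the second and fourth commands above. Unlike the shell one-liner, it stops before evaluation if either training stage exits with a non-zero code.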

Notes:

This example assumes that parallel stages are running on the same machine.

This pattern can be applied to any stage of a pipeline, not just training.
