mlrepa/dvc-8-multiple-datasets

Pattern: Multiple datasets for validation

Install

```shell
python3 -m venv .venv
echo 'export PYTHONPATH=.' >> .venv/bin/activate
source .venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
```

Run

1. Define the list of datasets to run the evaluation on in `params.yaml`. Example:

   ```yaml
   evaluate:
     datasets: ['micro', 'customer-1', 'customer-2']
     ...
   ```

2. Set up the `evaluate` stage with the `foreach` templating syntax in `dvc.yaml`, so that DVC generates one stage per dataset with `${item}` replaced by the dataset name:

   ```yaml
   evaluate:
     foreach: ${evaluate.datasets}
     do:
       cmd: python stages/evaluate.py --config=params.yaml --dataset=${item}
       deps:
       ...
       params:
       ...
       metrics:
       - ${reports_dir}/${evaluate.metrics_dir}/metrics_${item}.json:
           cache: false
   ```

3. (Optional) Collect the per-dataset metrics into a single `metrics.json` file and run metric value range checks, each in a separate stage:

   ```yaml
   collect_metrics:
     cmd: python stages/collect_metrics.py --config=params.yaml
     deps:
     ...
     - ${reports_dir}/${evaluate.metrics_dir}
     params:
     ...
     metrics:
     - ${reports_dir}/${collect_metrics.metrics}:
         cache: false

   check_metrics:
     cmd: python stages/check_metrics.py --config=params.yaml
     deps:
     ...
     - ${reports_dir}/${collect_metrics.metrics}
     params:
     ...
     plots:
     - ${reports_dir}/${check_metrics.report}:
         cache: false
   ```
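
Each generated `evaluate` stage calls `stages/evaluate.py` with `--config` and `--dataset`. The script itself is not shown in this README; the following is a hypothetical outline of its input/output handling only, assuming `params.yaml` has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that the evaluation produces a flat dict of scores. The helper names `parse_args`, `metrics_path`, and `write_metrics` are illustrative, not the repository's API.

```python
import argparse
import json
from pathlib import Path

# Hypothetical sketch of the I/O side of stages/evaluate.py; the real script is not shown.


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the two flags passed by the templated `cmd` in dvc.yaml."""
    parser = argparse.ArgumentParser(description="Evaluate a model on one dataset")
    parser.add_argument("--config", required=True, help="path to params.yaml")
    parser.add_argument("--dataset", required=True, help="one entry of evaluate.datasets")
    return parser.parse_args(argv)


def metrics_path(config: dict, dataset: str) -> Path:
    """Mirror ${reports_dir}/${evaluate.metrics_dir}/metrics_${item}.json from dvc.yaml."""
    return Path(config["reports_dir"]) / config["evaluate"]["metrics_dir"] / f"metrics_{dataset}.json"


def write_metrics(config: dict, dataset: str, scores: dict) -> Path:
    """Write one dataset's scores to the metrics file declared for that stage."""
    # Sanity check: only accept datasets listed in params.yaml.
    if dataset not in config["evaluate"]["datasets"]:
        raise ValueError(f"dataset {dataset!r} is not listed in evaluate.datasets")
    path = metrics_path(config, dataset)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(scores, indent=2))
    return path
```

A `__main__` block would call `parse_args()`, load the file passed as `--config`, compute the scores for `--dataset`, and hand them to `write_metrics`. Writing to exactly the path declared under `metrics:` is what lets DVC track one metrics file per dataset.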
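
The optional `collect_metrics` stage depends on the whole `${reports_dir}/${evaluate.metrics_dir}` directory. A minimal sketch of the merge step follows, assuming each per-dataset file is a JSON object and that the combined file should be keyed by dataset name; both are assumptions, and `collect_metrics` here is a hypothetical function, not the repository's script.

```python
import json
from pathlib import Path

# Hypothetical sketch of stages/collect_metrics.py; the real script is not shown.


def collect_metrics(metrics_dir, output) -> dict:
    """Merge every metrics_<dataset>.json in metrics_dir into one JSON object keyed by dataset."""
    collected = {}
    for path in sorted(Path(metrics_dir).glob("metrics_*.json")):
        dataset = path.stem[len("metrics_"):]  # "metrics_customer-1" -> "customer-1"
        collected[dataset] = json.loads(path.read_text())
    output = Path(output)
    output.parent.mkdir(parents=True, exist_ok=True)
    output.write_text(json.dumps(collected, indent=2, sort_keys=True))
    return collected
```

The `metrics_*.json` pattern does not match a file named `metrics.json`, so the merged output is not re-collected even if it ends up in the same directory.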
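
For the `check_metrics` stage, the README only states that metric values are range-checked and that the result is written as a plots file. One possible shape is sketched below, assuming hypothetical `[min, max]` bounds per metric name (which could, for example, live in `params.yaml`) and a JSON array of records as the report format, a layout `dvc plots` can read. The function names and record fields are illustrative.

```python
import json
from pathlib import Path

# Hypothetical sketch of stages/check_metrics.py; bounds and report layout are assumptions.


def check_metrics(collected: dict, ranges: dict) -> list:
    """Return one record per (dataset, metric) saying whether the value lies in [min, max]."""
    rows = []
    for dataset, metrics in sorted(collected.items()):
        for metric, (low, high) in sorted(ranges.items()):
            value = metrics.get(metric)
            # A missing metric counts as a failed check.
            passed = value is not None and low <= value <= high
            rows.append({"dataset": dataset, "metric": metric, "value": value,
                         "min": low, "max": high, "passed": passed})
    return rows


def write_report(rows: list, report) -> None:
    """Write the records as a JSON array, one object per check."""
    report = Path(report)
    report.parent.mkdir(parents=True, exist_ok=True)
    report.write_text(json.dumps(rows, indent=2))
```

If a failed check should stop the run, the script could exit with a non-zero status when any record has `"passed": false`; DVC aborts the run when a stage command fails.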

Run the pipeline

```shell
dvc exp run
```
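
When the pipeline runs, DVC expands the `foreach` stage into one stage per entry of `evaluate.datasets`; for a list of strings, DVC's documentation names each generated stage `<stage>@<item>`, for example `evaluate@micro`. The sketch below only approximates that naming and the `${item}` substitution to show which commands get run; it is not DVC's implementation, and `expand_foreach` is a hypothetical helper.

```python
# Hypothetical approximation of DVC's foreach expansion over a list of strings.


def expand_foreach(stage: str, items: list, cmd_template: str) -> dict:
    """Map each generated stage name (<stage>@<item>) to its command with ${item} substituted."""
    return {f"{stage}@{item}": cmd_template.replace("${item}", item) for item in items}


if __name__ == "__main__":
    template = "python stages/evaluate.py --config=params.yaml --dataset=${item}"
    for name, cmd in expand_foreach("evaluate", ["micro", "customer-1", "customer-2"], template).items():
        print(f"{name}: {cmd}")
```

A single generated stage can also be reproduced on its own by name, e.g. `dvc repro evaluate@customer-1`.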
