Configuration lives under cfg/<mode>/..., where <mode> is one of:

- sweep – hyperparameter exploration (multiple non-seed dimensions allowed; seed must be scalar)
- experiment – focused evaluation (only the seed may vary; other hyperparameters must be scalar)
Directory pattern for a leaf config:
cfg/<mode>/<dataset>/<model>/<trainer>/cfg.yaml
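For example, a hierarchy for the mnist dataset might look like the sketch below (mlp and sgd are hypothetical placeholders for a model and trainer; intermediate cfg.yaml files are optional ancestors that get merged):

```text
cfg/
  cfg.yaml                 # global defaults
  sweep/
    cfg.yaml               # defaults for all sweep configs
    mnist/
      cfg.yaml             # dataset-level defaults
      mlp/
        sgd/
          cfg.yaml         # leaf config for this dataset/model/trainer
```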
Each leaf cfg.yaml is merged with its ancestor cfg.yaml files (walking upward to the cfg/ root) to form a base configuration. Merging is shallow: child keys overwrite parent keys.
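A minimal sketch of that merge, assuming PyYAML and a root-first application order (the function name and file handling are illustrative, not the project's actual code):

```python
from pathlib import Path
import yaml

def load_base_config(leaf_dir: Path, cfg_root: Path) -> dict:
    """Shallow-merge cfg.yaml files from the cfg/ root down to a leaf directory."""
    # Collect cfg.yaml files walking upward from the leaf to the cfg/ root...
    chain = []
    d = leaf_dir
    while True:
        f = d / "cfg.yaml"
        if f.exists():
            chain.append(f)
        if d == cfg_root or d == d.parent:
            break
        d = d.parent
    # ...then apply them root-first, so child keys overwrite parent keys.
    merged: dict = {}
    for f in reversed(chain):
        merged.update(yaml.safe_load(f.read_text()) or {})
    return merged
```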
Expansion writes resolved per‑trial configurations to:
out/log/<mode>/<dataset>/<model>/<trainer>/trial_###/cfg.yaml
Trials are the Cartesian product of list-valued hyperparameters, excluding those on a structural allowlist (currently only batch_metrics). Special handling for seed:

- seed given as a list -> becomes a sweep over seeds (each trial gets one scalar value)
- seed given as a scalar -> required if no seed sweep is desired
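A sketch of how the trial grid could be derived under these rules (itertools-based; names are illustrative, not the project's actual implementation):

```python
import itertools

STRUCTURAL_ALLOWLIST = {"batch_metrics"}  # list-valued but never a sweep axis

def expand_trials(base_cfg: dict) -> list[dict]:
    """Return one resolved config per point in the sweep grid."""
    # Every list-valued key outside the allowlist is a sweep axis; a list-valued
    # seed is just another axis, so each trial ends up with one scalar seed.
    axes = {k: v for k, v in base_cfg.items()
            if isinstance(v, list) and k not in STRUCTURAL_ALLOWLIST}
    scalars = {k: v for k, v in base_cfg.items() if k not in axes}
    if not axes:
        return [scalars]
    keys = list(axes)
    return [{**scalars, **dict(zip(keys, combo))}
            for combo in itertools.product(*(axes[k] for k in keys))]
```

With a base config like {"lr": [0.1, 0.01], "seed": [0, 1]} (keys illustrative), this would yield four trials, each with one scalar lr and one scalar seed.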
Mode rules enforced during expansion:

- sweep mode: seed must be scalar; any other list hyperparameters are allowed.
- experiment mode: only seed may be list-valued; all other hyperparameters must be scalar (aside from the structural allowlist, e.g. batch_metrics).
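For illustration (the lr key is hypothetical; only seed and batch_metrics appear elsewhere in this document), configs that satisfy these rules might look like:

```yaml
# cfg/sweep/.../cfg.yaml (valid: non-seed lists allowed, seed pinned to a scalar)
lr: [0.1, 0.01, 0.001]
seed: 0
---
# cfg/experiment/.../cfg.yaml (valid: only seed is list-valued)
lr: 0.01
seed: [0, 1, 2]
```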
Re-running expansion is idempotent: resolved trial cfg.yaml files are regenerated (so parent edits propagate). A trial is considered complete when batch_log.csv exists in the same trial_### directory.
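In other words, completion is a plain file-existence check; a minimal sketch (helper name is illustrative):

```python
from pathlib import Path

def is_trial_complete(trial_dir: Path) -> bool:
    # A trial counts as complete once batch_log.csv exists in its trial_### directory.
    return (trial_dir / "batch_log.csv").exists()
```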
Summary:

- Hierarchical inheritance (leaf overrides parents).
- Lists -> sweep axes (except the structural allowlist).
- A list-form seed defines a seed axis; otherwise a scalar seed is required.
- Mode-specific constraints (see above) validated at expansion time.
- Resolved configs live only under out/log/... and drive all training & analysis.
Programmatic expansion + execution:
from src.run import run_trials
# All modes (sweep + experiment) across all datasets
run_trials()
# Only sweep mode for mnist
run_trials(datasets=["mnist"], modes=["sweep"])
During execution each trial directory accumulates logs (e.g. batch_log.csv). The dashboard and analysis scripts read directly from out/log.
This project is managed with uv. Always invoke Python modules and tests via uv run -m
so the correct environment and dependency resolution are used.
Examples:
# Run sweep
uv run -m run.sweep
# Run experiment
uv run -m run.experiment
# Run full test suite
uv run -m pytest
Invoking python directly or executing files as scripts is discouraged; prefer the module form above so that imports resolve consistently.
The dashboard provides fast, interactive plots of training metrics across trials. Launch it with:
uv run -m run.dashboard
It serves dashboard/index.html and reads logs directly from out/log/.../trial_###/.
- Epoch aggregation: If epoch_log.csv is missing for a trial, the server synthesizes it on demand from batch_log.csv, with one row per (epoch, mode). Reductions (a sketch follows this list):
  - loss, acc, and other numeric metrics: mean across batches (unweighted)
  - lr, iter_budget: last value within the epoch
  - grad_norm, error_min/med/max, iter_min/med/max: median within the epoch
  - time: end-of-epoch time (max of batch times)
  - n_batches is included for diagnostics
- Caching & freshness: The generated epoch_log.csv is reused for subsequent views. If batch_log.csv is newer, the server regenerates epoch_log.csv automatically. Writes are atomic to avoid partial files.
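A sketch of the on-demand aggregation and freshness check with pandas, assuming the column names listed above and a temp-file-then-rename write for atomicity (function names and the exact column handling are illustrative, not the server's actual code):

```python
import os
from pathlib import Path

import pandas as pd

LAST_COLS = {"lr", "iter_budget"}
MEDIAN_COLS = {"grad_norm", "error_min", "error_med", "error_max",
               "iter_min", "iter_med", "iter_max"}

def synthesize_epoch_log(batch_csv: Path) -> pd.DataFrame:
    """Collapse batch_log.csv into one row per (epoch, mode)."""
    df = pd.read_csv(batch_csv)
    rows = []
    for (epoch, mode), g in df.groupby(["epoch", "mode"]):
        row = {"epoch": epoch, "mode": mode, "n_batches": len(g)}  # diagnostics
        for col in g.columns:
            if col in ("epoch", "mode") or not pd.api.types.is_numeric_dtype(g[col]):
                continue
            if col == "time":
                row[col] = g[col].max()        # end-of-epoch time
            elif col in LAST_COLS:
                row[col] = g[col].iloc[-1]     # last value within the epoch
            elif col in MEDIAN_COLS:
                row[col] = g[col].median()     # median within the epoch
            else:
                row[col] = g[col].mean()       # unweighted mean across batches
        rows.append(row)
    return pd.DataFrame(rows)

def ensure_epoch_log(trial_dir: Path) -> Path:
    """Regenerate epoch_log.csv only when batch_log.csv is newer; write atomically."""
    batch = trial_dir / "batch_log.csv"
    epoch = trial_dir / "epoch_log.csv"
    if epoch.exists() and epoch.stat().st_mtime >= batch.stat().st_mtime:
        return epoch  # cached copy is still fresh
    tmp = trial_dir / "epoch_log.csv.tmp"
    synthesize_epoch_log(batch).to_csv(tmp, index=False)
    os.replace(tmp, epoch)  # atomic rename avoids readers seeing partial files
    return epoch
```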
The UI prefers epoch_log.csv (small, fast) and falls back to batch_log.csv if needed. No manual preprocessing is required.