multi-env evals config #734
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    def is_hub_env(env_id: str) -> bool:
        """Check if env_id refers to a Hub environment (has owner/ prefix)."""
        return "/" in env_id and not env_id.startswith("./") and not env_id.startswith("/")
Mismatch between is_hub_env and parse_env_id causes unhandled crash
Medium Severity
The is_hub_env function accepts any string containing / (that doesn't start with ./ or /), but parse_env_id requires exactly two parts when split by /. An input like "a/b/c" passes is_hub_env but causes parse_env_id to raise an unhandled ValueError. Both check_hub_env_installed and install_from_hub call parse_env_id after is_hub_env returns True without catching this exception, causing the CLI to crash with a traceback instead of a helpful error message.
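One possible way to close the gap is to make the predicate as strict as the parser, so that any string accepted by `is_hub_env` is guaranteed to parse. The sketch below is only an illustration: `parse_env_id` here is a hypothetical stand-in written from the description above (exactly two parts when split on `/`), not the PR's actual implementation, and it ignores any version suffix the real IDs may carry.

```python
def is_hub_env(env_id: str) -> bool:
    """Stricter variant: accept only 'owner/name' with exactly one '/'."""
    # Local paths are never Hub environments.
    if env_id.startswith(("./", "/")):
        return False
    owner, sep, name = env_id.partition("/")
    # Require a separator, non-empty halves, and no further '/' in the name.
    return bool(sep) and bool(owner) and bool(name) and "/" not in name


def parse_env_id(env_id: str) -> tuple[str, str]:
    """Hypothetical stand-in for the parser described in the review."""
    parts = env_id.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'owner/name', got {env_id!r}")
    return parts[0], parts[1]
```

With this variant, `"a/b/c"` is rejected up front instead of reaching the parser. An alternative with the same user-facing effect would be to keep the loose predicate and catch `ValueError` at the call sites, turning it into a CLI error message.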
Description
This PR implements evaluating multiple environments in parallel via `vf-eval`. For more details, check the updated docs.
This PR is mainly concerned with the config system. Cosmetic updates will be shipped separately, e.g. see #735.
Examples
By default, we still evaluate a single env with no changes to the interface
To configure multi-environment evaluation, specify a comma-separated list of env ids
Note that all environments use their default configuration. Since CLI arguments apply to all environments, values can only be changed for all environments at once. For more fine-grained configuration, see below.
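As a rough sketch of the behaviour described here (not the PR's code), a comma-separated argument can be split into env ids and the same CLI overrides attached to each one. The function names, the placeholder env ids, and the `num_examples` key are all hypothetical.

```python
def split_env_ids(arg: str) -> list[str]:
    """Split a comma-separated CLI argument into env ids, ignoring blanks."""
    ids = [part.strip() for part in arg.split(",")]
    ids = [env_id for env_id in ids if env_id]
    if not ids:
        raise ValueError("expected at least one environment id")
    return ids


def build_configs(arg: str, shared_overrides: dict) -> list[dict]:
    """Attach the same CLI overrides to every env (no per-env control)."""
    return [{"env_id": env_id, **shared_overrides} for env_id in split_env_ids(arg)]


# Hypothetical usage: both envs receive the same override.
configs = build_configs("env-a, env-b", {"num_examples": 5})
```

Because the overrides are shared, changing a value for one environment only requires a different mechanism, which is what the TOML config provides.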
To configure multi-environment evaluation with (potentially) different arguments for each environment, specify a path to a TOML config file
Type of Change
Testing
`uv run pytest` locally.
Checklist
Additional Notes
Note
Enables running multiple environments in one invocation with config-driven control, plus install utilities and extensive tests/docs.
- `[[eval]]` TOML configs (`configs/eval/*.toml`) parsed by `load_toml_config()` with validation and precedence; CLI positional becomes `env_id_or_config` (single env or `.toml` path)
- `EvalRunConfig` and `run_evaluations()` to execute multiple `EvalConfig`s concurrently; event-loop lag monitoring moved to the multi-run flow
- `check_hub_env_installed()` before running
- `verifiers.utils.install_utils` (Hub/local/repo installers, package checks, ID parsing) and `vf-install` rewritten to use it
- `docs/evaluation.md` expanded with multi-env usage, TOML schema, and configuration precedence
- `EvalRunConfig`; minor casting/log-level tweaks in `rlm_env.py` and `async_utils.py`

Written by Cursor Bugbot for commit 9252a96.
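The concurrent execution mentioned in the summary can be pictured with a small `asyncio` sketch. This is not the PR's `run_evaluations()`: the coroutine names, the result shape, and the placeholder env ids are hypothetical, and the real evaluation work is replaced by a no-op await.

```python
import asyncio


async def run_one(env_id: str) -> dict:
    """Placeholder for a single evaluation; real work would happen here."""
    await asyncio.sleep(0)  # yield to the event loop, standing in for I/O
    return {"env_id": env_id, "ok": True}


async def run_all(env_ids: list[str]) -> list[dict]:
    """Run one evaluation per env concurrently; results keep input order."""
    return await asyncio.gather(*(run_one(env_id) for env_id in env_ids))


results = asyncio.run(run_all(["env-a", "env-b"]))
```

`asyncio.gather` returns results in the order the awaitables were passed, so each result can be matched back to its config regardless of which evaluation finishes first.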