Skip to content

Conversation

@willccbb
Copy link
Member

@willccbb willccbb commented Jan 16, 2026

Description

Allows passing a Callable[[], Dataset] in place of Dataset for dataset / eval_dataset, lazy-loads when get_dataset / get_eval_dataset is first called.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Adds optional lazy dataset loading to environments, enabling deferred construction of Datasets via a DatasetBuilder callable.

  • Core: Environment now accepts Dataset | DatasetBuilder for dataset/eval_dataset, with new build_dataset()/build_eval_dataset() and formatting via _format_dataset_source; get_*_dataset triggers builds; preserves eager build for raw Datasets. Stores map_kwargs and validates message_type earlier.
  • Group/Trainer: EnvGroup concatenation now builds sub-env datasets via env.build_dataset()/build_eval_dataset(); RL orchestrator filters with env.get_dataset() instead of accessing env.dataset directly.
  • Types/Exports: Add vf.DatasetBuilder type alias and export from verifiers.__init__.
  • Docs: Add "Lazy Loading with DatasetBuilder" section to docs/environments.md and environments/AGENTS.md; expand RLMEnv details.
  • Example env: Refactor environments/alphabet_sort to use get_dataset_builder, extract helpers, and return builder-backed SortingEnv; bump version to 0.1.11.
  • Misc: Minor typing/casts in data_utils.py; small eval input simplification.

Written by Cursor Bugbot for commit 714d8b0. This will update automatically on new commits. Configure here.

@willccbb willccbb requested a review from mikasenghaas January 16, 2026 22:51
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Copy link
Member

@mikasenghaas mikasenghaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, i like the pattern a lot. can you confirm my understanding: directly passing Dataset will eagerly build it, this means for such envs we will still "double" load datasets on e.g. the prime-rl orchestrator (which needs access to the dataset for sampling reasons) + on each env worker. but with this pattern we can use the builder class in which case the env workers would never have to build the dataset and it only exists on the orchestrator

im still a little worried abt the arbirary code execution part of load_environment which may have unpredictable side-effects (e.g. wiki-search creating a locked local db instance) but im not sure there's anything we can do abt it? maybe just move such init into a special space (analogous to setup_state but not on a per-rollout basis but globally?). in this world, setting up global state in load_environment would just be undefined behavior?

@willccbb
Copy link
Member Author

For now this doesn't deduplicate behavior, we can handle that at env client/server level.

It's just allowing deferral of dataset-specific preprocessing to whenever get_dataset is first called, so usage patterns which only call it for one of many replicas (like only getting data from a master worker + sending rollout requests to all workers) only need one loading total.

@mikasenghaas mikasenghaas mentioned this pull request Jan 19, 2026
19 tasks
@willccbb willccbb merged commit ea7eaa8 into main Jan 20, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants