Add the Reasoning Gym set of environments #326
zafstojano wants to merge 14 commits into meta-pytorch:main from
Conversation
Integrate the reasoning_gym library to provide single-step reasoning tasks. Each episode presents one question from a configurable dataset; the agent submits an answer and receives a score (0.0 to 1.0).

Features:
- Single-step episodes: reset() provides the question, step() validates the answer
- Dataset persistence: the dataset is reused across resets until the config changes
- Flexible configuration: supports simple and composite datasets
- Concurrent sessions: multiple clients can connect simultaneously

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace EchoEnv template content with accurate documentation for the Reasoning Gym environment. Updates include:
- Single-step reasoning task workflow
- Dataset configuration (simple and composite)
- Dataset persistence behavior
- Correct action/observation models (answer, score, question)
- Reward structure (score-based, not length-based)
- Use cases for LLM evaluation and agent training

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
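The action/observation models named in this commit can be sketched roughly as follows. This is a hypothetical reconstruction from the description above (field names and defaults are assumptions, not the actual source):

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class ReasoningGymAction:
    answer: str                      # the agent's answer to the current question

@dataclass
class ReasoningGymObservation:
    question: Optional[str] = None   # populated on reset()
    score: Optional[float] = None    # populated on step(), in [0.0, 1.0]
    correct_answer: Optional[str] = None
    dataset_metadata: dict = field(default_factory=dict)
    done: bool = False               # True once the answer has been scored

obs = ReasoningGymObservation(question="How many legs do 4 cats have?")
print(obs.done)   # False until the answer is scored
```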
Show how to access the dataset_metadata field in the Quick Start example, demonstrating the full observation interface.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add comprehensive test suite with 26 tests covering environment behavior, models, client, and integration workflows
- Fix imports in server files to support both Docker (direct import) and local testing (relative import)
- Fix minor formatting issue in docstring

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Greptile Overview

Greptile Summary

Added Reasoning Gym environment integration to OpenEnv, providing 100+ single-step reasoning tasks with verifiable rewards.

Key Implementation Details:
Architecture Alignment:
Design Philosophy:

Confidence Score: 5/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant Client as ReasoningGymEnv<br/>(Client)
participant WS as WebSocket<br/>Connection
participant Server as FastAPI<br/>Server
participant Env as ReasoningGymEnvironment
participant RG as reasoning_gym<br/>Library
Note over Client,RG: Initial Setup & First Episode
Client->>Server: Connect (WebSocket)
Server->>Env: Create environment instance
Client->>WS: reset(dataset_name='leg_counting',<br/>seed=42, size=10)
WS->>Server: Forward reset request
Server->>Env: reset(...)
Env->>RG: create_dataset('leg_counting',<br/>seed=42, size=10)
RG-->>Env: Dataset instance
Env->>Env: Create iterator from dataset
Env->>Env: Get next question from iterator
Env-->>Server: ReasoningGymObservation<br/>(question, done=False)
Server-->>WS: Serialize observation
WS-->>Client: StepResult with question
Note over Client,RG: Agent Answers Question
Client->>WS: step(ReasoningGymAction(answer="4"))
WS->>Server: Forward step request
Server->>Env: step(action)
Env->>RG: score_answer(answer, entry)
RG-->>Env: score (0.0-1.0)
Env-->>Server: ReasoningGymObservation<br/>(score, correct_answer, done=True)
Server-->>WS: Serialize observation
WS-->>Client: StepResult with score
Note over Client,RG: Next Question (Reuse Dataset)
Client->>WS: reset() [no params]
WS->>Server: Forward reset request
Server->>Env: reset()
Env->>Env: Reuse existing dataset
Env->>Env: Get next question from iterator
Note over Env: If iterator exhausted,<br/>wrap around to start
Env-->>Server: ReasoningGymObservation<br/>(question, done=False)
Server-->>WS: Serialize observation
WS-->>Client: StepResult with question
Note over Client,RG: New Dataset Configuration
Client->>WS: reset(dataset_name='composite',<br/>dataset_specs=[...], seed=99, size=30)
WS->>Server: Forward reset request
Server->>Env: reset(...)
Env->>RG: create_dataset('composite',<br/>datasets=specs, seed=99, size=30)
RG-->>Env: New dataset instance
Env->>Env: Create new iterator
Env->>Env: Get first question
Env-->>Server: ReasoningGymObservation<br/>(question, done=False)
Server-->>WS: Serialize observation
WS-->>Client: StepResult with question
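The dataset-reuse and iterator-wraparound behavior in the diagram above can be sketched with a toy, self-contained environment. A list of dicts stands in for a reasoning_gym dataset; the class and attribute names are illustrative, not taken from the actual server code:

```python
class ToyEnv:
    def __init__(self):
        self._dataset = None
        self._config = None
        self._it = None

    def reset(self, **config):
        # Rebuild the dataset only when a new config is supplied,
        # mirroring the "reuse existing dataset" branch in the diagram.
        if config and config != self._config:
            self._config = config
            self._dataset = [{"question": f"q{i}", "answer": str(i)}
                             for i in range(config.get("size", 3))]
            self._it = iter(self._dataset)
        try:
            entry = next(self._it)
        except StopIteration:
            # Iterator exhausted: wrap around to the start.
            self._it = iter(self._dataset)
            entry = next(self._it)
        return entry["question"]

env = ToyEnv()
print(env.reset(size=2))  # 'q0' (new dataset created)
print(env.reset())        # 'q1' (dataset reused)
print(env.reset())        # 'q0' (iterator wrapped around)
```

The key design point is that a bare `reset()` advances the existing iterator instead of regenerating data, so repeated episodes walk through the dataset deterministically.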
tagging @burtenshaw @Darktex for visibility :)
This looks good. Have you also deployed it to the HF Hub and updated the environments page?
…v into feat/reasoning-gym-env
I have added the env card. Regarding deploying to HF Spaces, I have an issue. First, building the image with openenv build (logs collapsed). Even though I successfully push the env from the CLI, I get a build error on the Space (error output collapsed). Any help is appreciated.
Summary
Hey there, I am one of the core contributors of Reasoning Gym, a suite of 100+ environments with verifiable rewards. I would be really happy to contribute this set of procedural data generators to OpenEnv!
Since these are all single-step environments, I went with the following design philosophy:
- env.reset(...) creates an environment with the passed arguments and yields the first sample.
- Each episode terminates after a single step with done=True. This time, simply calling env.reset() with no arguments will yield a new generated sample from the previously instantiated environment.
- If env.reset(...) is called with new dataset configs, it will re-instantiate a new dataset and continue yielding data from there.

Type of Change
Alignment Checklist
Before submitting, verify:
- Read .claude/docs/PRINCIPLES.md and this PR aligns with our principles
- Read .claude/docs/INVARIANTS.md and no invariants are violated
- Ran /pre-submit-pr (or bash .claude/hooks/lint.sh and tests) and addressed all issues

RFC Status
Test Plan
After building the Docker image, I created a small script to test out the calls to the environment.
Sample script
Script Output
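The script and its output are collapsed above. A hypothetical minimal interaction, following the flow described in this PR, might look like the following. A stub stands in for the real WebSocket-backed client, and all names here are assumptions inferred from the description rather than the actual code:

```python
# Stub standing in for the real ReasoningGymEnv client, which would talk
# to the Dockerized FastAPI server over a WebSocket connection.
class StubReasoningGymEnv:
    def reset(self, dataset_name=None, seed=None, size=None):
        # Real client: sends a reset request and returns the observation.
        return {"question": "How many legs do 3 spiders have?", "done": False}

    def step(self, answer):
        # Real client: sends the answer and receives a score in [0.0, 1.0].
        score = 1.0 if answer.strip() == "24" else 0.0
        return {"score": score, "correct_answer": "24", "done": True}

env = StubReasoningGymEnv()
obs = env.reset(dataset_name="leg_counting", seed=42, size=10)
print(obs["question"])
result = env.step(answer="24")
print(result["score"])   # 1.0
```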
Claude Code Review