Add TextArena and Connect4 rubric examples (RFC 004)#341
Darktex wants to merge 1 commit into `feat/rubrics-core`
Conversation
Demonstrates rubric integration patterns with two environments:

TextArena (Wordle):
- WordleRubric composite with greens, yellows, repetitions, correct
- Migrates from legacy RewardProvider to Rubric pattern
- Full backwards compatibility via get_reward_signals()

Connect4:
- Connect4WinLossRubric trajectory rubric for terminal games
- Demonstrates exponential discounting for credit assignment
- Shows reset() integration with environment lifecycle

30 tests covering both environment rubrics.
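The Wordle side of the PR is built around a composite rubric. The following is a minimal, hypothetical sketch of that pattern, not the PR's actual implementation: the `Rubric` base class is a stand-in for the RFC 004 interface, the observation shape (a `feedback` string of `G`/`Y`/`X` marks) is assumed, and the repetitions sub-rubric is omitted for brevity.

```python
class Rubric:
    """Stand-in for the RFC 004 Rubric interface (assumed shape)."""

    def __call__(self, action, observation):
        raise NotImplementedError


class GreensRubric(Rubric):
    def __call__(self, action, observation):
        # Fraction of letters marked green, e.g. feedback "GYXXG" -> 0.4.
        feedback = observation["feedback"]
        return feedback.count("G") / len(feedback)


class YellowsRubric(Rubric):
    def __call__(self, action, observation):
        feedback = observation["feedback"]
        return feedback.count("Y") / len(feedback)


class CorrectRubric(Rubric):
    def __call__(self, action, observation):
        # 1.0 only when the entire word was guessed correctly.
        feedback = observation["feedback"]
        return 1.0 if feedback == "G" * len(feedback) else 0.0


class WordleRubric(Rubric):
    """Composite rubric: weighted sum of named sub-rubric scores."""

    def __init__(self, weights=None):
        self.subs = {
            "greens": GreensRubric(),
            "yellows": YellowsRubric(),
            "correct": CorrectRubric(),
        }
        self.weights = weights or {name: 1.0 for name in self.subs}

    def __call__(self, action, observation):
        return sum(self.weights[name] * sub(action, observation)
                   for name, sub in self.subs.items())

    def get_reward_signals(self, action, observation):
        # Per-signal dict, mirroring what the legacy RewardProvider exposed,
        # which is how backwards compatibility is preserved.
        return {name: sub(action, observation) for name, sub in self.subs.items()}
```

The key design point is that `get_reward_signals()` keeps the old multi-signal surface while `__call__` collapses the same sub-rubrics into a single scalar for training.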
Greptile Overview

Greptile Summary

This PR demonstrates successful rubric integration patterns for two environments per RFC 004.

Key Changes
Architecture Alignment

The implementation correctly follows RFC 004 patterns:
Both implementations respect the "rewards inside environment" principle (INVARIANTS.md), keeping reward computation server-side within the environment boundary.

Confidence Score: 5/5
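The "rewards inside environment" principle means the training loop never calls the rubric directly; the environment invokes it during `step()` and wires `reset()` into its own lifecycle. A hypothetical sketch of that wiring (class and method names here are illustrative, not the PR's actual API):

```python
class Connect4Environment:
    """Illustrative environment with optional rubric support (assumed API)."""

    def __init__(self, rubric=None):
        # Rubric is optional: the environment works with or without one.
        self.rubric = rubric
        self.board = None

    def reset(self):
        if self.rubric is not None:
            self.rubric.reset()  # clear any buffered trajectory state
        self.board = [[None] * 7 for _ in range(6)]
        return {"board": self.board, "done": False}

    def step(self, action):
        observation = self.apply_move(action)
        # Reward computation stays inside the environment boundary:
        # the rubric is called server-side, never by the training loop.
        reward = self.rubric(action, observation) if self.rubric else 0.0
        return observation, reward

    def apply_move(self, action):
        # Game logic elided; returns a terminal observation for illustration.
        return {"board": self.board, "done": True, "outcome": "draw"}
```

Making the rubric optional keeps existing callers of the Connect4 environment working unchanged, which matches the "Add optional rubric support" change listed below.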
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Training as Training Loop
    participant Env as Environment
    participant Rubric as Rubric
    participant Game as Game Logic
    Note over Training,Game: Episode Start
    Training->>Env: reset()
    Env->>Rubric: reset()
    Note over Rubric: Clear trajectory buffer
    Env->>Game: Initialize game state
    Env-->>Training: Initial observation
    Note over Training,Game: Game Loop
    loop Until done
        Training->>Env: step(action)
        Env->>Game: Apply action
        Game-->>Env: New game state
        Env->>Rubric: __call__(action, observation)
        alt Not Done (intermediate step)
            Rubric->>Rubric: Append to trajectory
            Rubric-->>Env: 0.0 (intermediate reward)
        else Done (terminal step)
            Rubric->>Rubric: Append to trajectory
            Rubric->>Rubric: score_trajectory()
            Note over Rubric: Compute final score<br/>(win=1.0, loss=0.0, draw=0.5)
            Rubric-->>Env: Final score
        end
        Env-->>Training: Observation with reward
    end
    Note over Training,Game: Episode Complete
    Training->>Rubric: compute_step_rewards()
    Note over Rubric: Apply discounting:<br/>r_t = gamma^(T-1-t) * final_score
    Rubric-->>Training: Per-step rewards for training
```
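The flow in the diagram can be sketched as a small trajectory rubric. This is a hypothetical reconstruction from the diagram, not the code in `envs/connect4_env/rubrics.py`: the observation keys (`done`, `outcome`) and the default `gamma` are assumptions.

```python
class Connect4WinLossRubric:
    """Trajectory rubric for terminal games (sketch based on the diagram).

    Buffers steps during the episode, scores the full trajectory at the
    terminal step (win=1.0, loss=0.0, draw=0.5), then exposes discounted
    per-step rewards for credit assignment.
    """

    def __init__(self, gamma=0.95):
        self.gamma = gamma
        self.trajectory = []
        self.final_score = 0.0

    def reset(self):
        # Called from the environment's reset(): clear the trajectory buffer.
        self.trajectory = []
        self.final_score = 0.0

    def __call__(self, action, observation):
        self.trajectory.append((action, observation))
        if not observation.get("done"):
            return 0.0  # intermediate steps carry no immediate reward
        self.final_score = self.score_trajectory(observation)
        return self.final_score

    def score_trajectory(self, terminal_observation):
        # "win" | "loss" | "draw" keys are assumed for illustration.
        outcome = terminal_observation["outcome"]
        return {"win": 1.0, "loss": 0.0, "draw": 0.5}[outcome]

    def compute_step_rewards(self):
        # Exponential discounting back from the terminal step:
        #   r_t = gamma^(T-1-t) * final_score
        T = len(self.trajectory)
        return [self.gamma ** (T - 1 - t) * self.final_score for t in range(T)]
```

For example, with `gamma=0.5` and a two-step winning episode, the per-step rewards come out as `[0.5, 1.0]`: later moves receive more credit for the terminal win.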
@kashif @sergiopaniego this is relevant to TRL examples
Summary
Demonstrates rubric integration patterns with two environments per RFC 004:
TextArena (Wordle)
- `WordleRubric` composite rubric with greens, yellows, repetitions, and correct sub-rubrics
- Migrates from the legacy `RewardProvider` to the `Rubric` pattern
- Full backwards compatibility via `get_reward_signals()`

Connect4

- `Connect4WinLossRubric` trajectory rubric for terminal games
- Demonstrates exponential discounting for credit assignment
- `reset()` integration with the environment lifecycle

Changes
New files
- `envs/textarena_env/rubrics.py` - Wordle rubric implementation
- `envs/connect4_env/rubrics.py` - Connect4 trajectory rubric
- `tests/envs/test_textarena_rubrics.py` - 18 tests
- `tests/envs/test_connect4_rubrics.py` - 12 tests

Modified files
- `envs/textarena_env/server/environment.py` - Use rubric instead of RewardProvider
- `envs/connect4_env/server/connect4_environment.py` - Add optional rubric support

Test plan
Dependencies
This PR depends on #340 (Rubric base system).