Feature/dmcontrol env #319

Open
mreso wants to merge 4 commits into meta-pytorch:main from mreso:feature/dmcontrol_env

@mreso mreso commented Jan 22, 2026

Summary

Adds dm_control_env, a new OpenEnv environment wrapping https://github.com/google-deepmind/dm_control for MuJoCo-based continuous control tasks.

Key features:

  • Supports 40+ environments across 18 domains (cartpole, walker, humanoid, cheetah, hopper, quadruped, etc.)
  • Dynamic environment switching via reset(domain_name="...", task_name="...")
  • Optional visual observations (base64-encoded PNG rendering)
  • macOS compatibility with threading-safe async methods
  • Full client-server architecture following OpenEnv patterns
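For the visual observations, a consumer would need to turn the base64 string back into PNG bytes. The following is a minimal sketch under stated assumptions: the helper name `decode_png_observation` is hypothetical, and how the client actually exposes the encoded frame is not shown in this PR description. Only the Python standard library is used.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed first 8 bytes of every PNG file


def decode_png_observation(pixels_b64: str) -> bytes:
    """Decode a base64 string and check that the payload starts like a PNG.

    Hypothetical helper: the real observation field name is an assumption.
    """
    raw = base64.b64decode(pixels_b64, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded payload is not a PNG image")
    return raw


# Stand-in payload: a PNG signature plus filler bytes, not a real rendered frame.
fake_frame = base64.b64encode(PNG_SIGNATURE + b"demo").decode("ascii")
png_bytes = decode_png_observation(fake_frame)
```

The returned bytes could then be written to a `.png` file or handed to an image library; the signature check only guards against obviously wrong payloads, not corrupt images.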

Files added:

  • envs/dm_control_env/client.py - WebSocket client with from_direct() factory
  • envs/dm_control_env/models.py - Pydantic models (DMControlAction, DMControlObservation, DMControlState)
  • envs/dm_control_env/server/ - FastAPI server and Environment implementation
  • envs/dm_control_env/examples/ - Control examples for cartpole, hopper, and quadruped
  • envs/dm_control_env/README.md - Documentation with usage examples

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • New environment
  • Refactoring

Alignment Checklist

Before submitting, verify:

  • I have read .claude/docs/PRINCIPLES.md and this PR aligns with our principles
  • I have checked .claude/docs/INVARIANTS.md and no invariants are violated
  • I have run /pre-submit-pr (or bash .claude/hooks/lint.sh and tests) and addressed all issues

RFC Status

  • Not required (bug fix, docs, minor refactoring)
  • RFC exists: #___
  • RFC needed (will create before merge)

Test Plan

cd envs/dm_control_env

  • Test client-server mode:
    PYTHONPATH=src:envs uvicorn envs.dm_control_env.server.app:app --port 8765

In another terminal:

python -c "from dm_control_env import DMControlEnv; c = DMControlEnv('http://localhost:8765'); print(c.reset())"
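Before running the client one-liner, it may help to confirm that the server is actually listening. This is a hedged sketch, not part of the PR: the `/health` path is taken from the sequence diagram in the Greptile summary and is assumed to return 200 when the server is up; `health_url` and `is_healthy` are hypothetical helper names.

```python
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health-check URL (the /health path is an assumption)."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True only if the server answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts are all OSError.
        return False
```

For the test plan above, `is_healthy("http://localhost:8765")` would be the corresponding call.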

Claude Code Review

Alignment Review Report

Automated Checks

  • Lint: PASS - 77 files already formatted
  • Debug code: CLEAN - No debugger statements found in dm_control_env

Open RFCs Context
┌───────────────────────┬─────────────┬──────────────────────────────────────────────┐
│ RFC │ Status │ Relevance │
├───────────────────────┼─────────────┼──────────────────────────────────────────────┤
│ 000-project-phases.md │ Implemented │ Design principles - foundational │
├───────────────────────┼─────────────┼──────────────────────────────────────────────┤
│ 001-abstractions.md │ Implemented │ Environment/Client abstractions │
├───────────────────────┼─────────────┼──────────────────────────────────────────────┤
│ 002-env-spec.md │ Implemented │ Environment specification │
├───────────────────────┼─────────────┼──────────────────────────────────────────────┤
│ 003-mcp-support.md │ Implemented │ MCP integration (not used by dm_control_env) │
└───────────────────────┴─────────────┴──────────────────────────────────────────────┘
No Draft or In Review RFCs that would conflict with dm_control_env.

Tier 1: Fixes Required

None identified. The dm_control_env code passes all automated checks.

Tier 2: Alignment Discussion

Principle Conflicts

None identified. The dm_control_env implementation follows OpenEnv principles:
┌────────────────────────────┬────────┬────────────────────────────────────────────────────────────────────┐
│ Principle │ Status │ Evidence │
├────────────────────────────┼────────┼────────────────────────────────────────────────────────────────────┤
│ Gymnasium-style API │ ✅ │ Uses reset(), step(), state │
├────────────────────────────┼────────┼────────────────────────────────────────────────────────────────────┤
│ Container isolation │ ✅ │ Has server/Dockerfile │
├────────────────────────────┼────────┼────────────────────────────────────────────────────────────────────┤
│ Type safety with generics │ ✅ │ Environment[DMControlAction, DMControlObservation, DMControlState] │
├────────────────────────────┼────────┼────────────────────────────────────────────────────────────────────┤
│ Pydantic serialization │ ✅ │ All models extend Action, Observation, State │
├────────────────────────────┼────────┼────────────────────────────────────────────────────────────────────┤
│ Rewards inside environment │ ✅ │ Reward from dm_control passed through, not computed externally │
├────────────────────────────┼────────┼────────────────────────────────────────────────────────────────────┤
│ Client-server separation │ ✅ │ client.py does not import from server/ │
└────────────────────────────┴────────┴────────────────────────────────────────────────────────────────────┘
RFC Conflicts

None identified. The dm_control_env is a standard environment implementation that:

  • Does not introduce new core APIs
  • Does not change existing interfaces
  • Follows established patterns from echo_env
  • Does not require MCP support (uses standard Gym-like API only)

Per RFC README: "You generally don't need an RFC for new example environments (unless they introduce new patterns)." dm_control_env follows existing patterns.

Invariant Check
┌──────────────────────────┬────────┬────────────────────────────────────┐
│ Invariant │ Status │ Notes │
├──────────────────────────┼────────┼────────────────────────────────────┤
│ Gymnasium API signatures │ ✅ │ Standard reset(), step(), state │
├──────────────────────────┼────────┼────────────────────────────────────┤
│ Generic type safety │ ✅ │ Proper generic types used │
├──────────────────────────┼────────┼────────────────────────────────────┤
│ Pydantic serialization │ ✅ │ All wire types are Pydantic models │
├──────────────────────────┼────────┼────────────────────────────────────┤
│ Agent isolation │ ✅ │ No MCP tools exposing reset/step │
├──────────────────────────┼────────┼────────────────────────────────────┤
│ Container isolation │ ✅ │ Dockerfile provided │
├──────────────────────────┼────────┼────────────────────────────────────┤
│ Client-server separation │ ✅ │ No cross-imports │
├──────────────────────────┼────────┼────────────────────────────────────┤
│ Rewards in environment │ ✅ │ Uses dm_control's native reward │
└──────────────────────────┴────────┴────────────────────────────────────┘
Summary

  • 0 mechanical issues to fix
  • 0 alignment points for human review
  • 0 RFC conflicts to discuss

Verdict: READY FOR REVIEW - The dm_control_env follows all OpenEnv principles and invariants. It is a standard environment implementation without any architectural deviations.

mreso added 3 commits January 21, 2026 14:59
Add quadruped example
Added hopper examples
Align example cli
Fix libglx installation in docker and enable exiting with ctrl + c inside docker
Add screenshots
Increase random forces
- Rename directory from dmcontrol_env to dm_control_env
- Update all internal import paths and module references
- Add screenshots to README (cartpole.png, quadruped.png)
- Update examples with consistent CLI args (--visual, --headless, --task)
- Increase random force magnitude in hopper/quadruped examples
@meta-cla meta-cla bot added the CLA Signed label Jan 22, 2026

greptile-apps bot commented Jan 22, 2026

Greptile Summary

This PR adds dm_control_env, a new OpenEnv environment wrapping Google DeepMind's dm_control library to provide access to 40+ MuJoCo-based continuous control tasks across 18 domains (cartpole, walker, humanoid, cheetah, hopper, quadruped, etc.).

Key Features Implemented:

  • Full client-server architecture following OpenEnv patterns with WebSocket communication
  • Dynamic environment switching via reset(domain_name="...", task_name="...") without restarting the server
  • Optional visual observations (base64-encoded PNG rendering) controlled by render flag
  • macOS compatibility with threading-safe async methods (MuJoCo crashes when run in background threads on macOS, so synchronous fallback is used)
  • Proper reward passthrough from dm_control's native reward computation (not externally computed)
  • Type-safe generics with Pydantic models extending OpenEnv base types
  • Concurrent session support enabled (SUPPORTS_CONCURRENT_SESSIONS = True)
  • Comprehensive documentation with three interactive examples (cartpole, hopper, quadruped) demonstrating OpenEnv step/observation pattern
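The macOS fallback described above can be illustrated roughly as follows. This is a sketch of the general pattern only, assuming the platform is detected with `platform.system()`; it is not the PR's code, and `blocking_step` is a hypothetical stand-in for a blocking simulator call. On macOS the call stays on the calling thread; elsewhere it is offloaded to a worker thread.

```python
import asyncio
import platform
import threading
from typing import Optional, Tuple


def blocking_step(action: float) -> Tuple[float, int]:
    """Stand-in for a blocking simulator call; also reports the thread it ran on."""
    return action * 2.0, threading.get_ident()


async def step_async(action: float, system: Optional[str] = None) -> Tuple[float, int]:
    """Run the blocking call inline on macOS, otherwise in a worker thread."""
    if (system or platform.system()) == "Darwin":
        # Synchronous fallback: stay on the caller's thread.
        return blocking_step(action)
    # Other platforms: keep the event loop free by using a worker thread.
    return await asyncio.to_thread(blocking_step, action)
```

The `system` parameter exists only so both branches can be exercised on any machine; the trade-off of the inline branch is that the event loop is blocked for the duration of the call.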

Implementation Quality:

  • Follows all OpenEnv principles from PRINCIPLES.md (Gymnasium API, container isolation, type safety, rewards in environment)
  • No invariant violations found - proper client-server separation, no MCP tools exposing reset/step to agents
  • Extensive fallback import handling for flexible deployment contexts
  • Proper error handling with helpful macOS-specific guidance for MuJoCo/OpenGL issues
  • Multi-stage Dockerfile with appropriate MuJoCo/OpenGL dependencies

Additional Improvements:

  • Applied exec to CMD in Dockerfiles across multiple environments (echo_env, repl_env, textarena_env, unity_env, websearch_env) for proper SIGINT/SIGTERM signal handling

Confidence Score: 5/5

  • This PR is safe to merge with no identified issues
  • Score reflects exemplary adherence to all OpenEnv principles and invariants, comprehensive testing as evidenced by the automated alignment review, proper handling of platform-specific issues (macOS threading), clean architecture with no client-server boundary violations, and high-quality documentation. The implementation follows established patterns from existing environments and introduces no new architectural concerns.
  • No files require special attention

Important Files Changed

  • envs/dm_control_env/client.py - WebSocket client implementing DMControlEnv with proper type safety, flexible import handling, and from_direct() factory for embedded server; follows OpenEnv patterns correctly
  • envs/dm_control_env/models.py - Pydantic models for Action/Observation/State extending core OpenEnv types; includes comprehensive list of 40+ available environments
  • envs/dm_control_env/server/dm_control_environment.py - Environment implementation wrapping dm_control.suite with dynamic environment switching, macOS threading workarounds, and proper reward passthrough from dm_control
  • envs/dm_control_env/server/app.py - FastAPI application using create_app factory with concurrent session support enabled
  • envs/dm_control_env/server/Dockerfile - Multi-stage Docker build with MuJoCo/OpenGL dependencies, proper exec usage for signal handling
  • envs/dm_control_env/pyproject.toml - Package configuration with dm_control and mujoco dependencies, optional interactive/dev dependencies

Sequence Diagram

sequenceDiagram
    participant Client as DMControlEnv Client
    participant WS as WebSocket Connection
    participant Server as FastAPI Server
    participant Env as DMControlEnvironment
    participant DMC as dm_control.suite
    
    Note over Client,DMC: Initialization
    Client->>Server: HTTP GET /health
    Server-->>Client: 200 OK
    Client->>Server: WebSocket Connect
    Server->>Env: Create Environment Instance
    Env->>DMC: Load domain/task
    DMC-->>Env: Environment ready
    Server-->>Client: WebSocket Connected
    
    Note over Client,DMC: Reset Episode
    Client->>WS: reset(domain_name, task_name, render=True)
    WS->>Server: WebSocket message
    Server->>Env: reset_async()
    Env->>DMC: reset()
    DMC-->>Env: TimeStep (observations, reward)
    Env->>DMC: render() [if render=True]
    DMC-->>Env: RGB pixels
    Env-->>Server: DMControlObservation (obs, pixels, reward, done)
    Server-->>WS: JSON response
    WS-->>Client: StepResult[DMControlObservation]
    
    Note over Client,DMC: Step Loop
    loop Until done
        Client->>WS: step(DMControlAction)
        WS->>Server: WebSocket message
        Server->>Env: step_async(action)
        Env->>DMC: step(action_array)
        DMC-->>Env: TimeStep (observations, reward, done)
        Env->>DMC: render() [if render enabled]
        DMC-->>Env: RGB pixels
        Env-->>Server: DMControlObservation
        Server-->>WS: JSON response
        WS-->>Client: StepResult[DMControlObservation]
    end
    
    Note over Client,DMC: State Query
    Client->>Server: HTTP GET /state
    Server->>Env: state property
    Env-->>Server: DMControlState (domain, task, specs)
    Server-->>Client: JSON response
    
    Note over Client,DMC: Cleanup
    Client->>Server: WebSocket Close
    Server->>Env: close()
    Env->>DMC: close()
    DMC-->>Env: Cleanup complete
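
The reset and step-loop phases of the diagram can be sketched against an in-process stand-in. Everything here is an assumption for illustration: `StandInEnv`, `Observation`, and `run_episode` are hypothetical names, the real client returns `StepResult[DMControlObservation]` over a WebSocket, and where it exposes `reward` and `done` is not confirmed by this page.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Observation:
    reward: float
    done: bool


class StandInEnv:
    """Hypothetical in-process stand-in; not the DMControlEnv client."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.task: Optional[Tuple[str, str]] = None
        self.steps = 0

    def reset(self, domain_name: str, task_name: str) -> Observation:
        # Switching domain/task happens on reset, mirroring the PR description.
        self.task = (domain_name, task_name)
        self.steps = 0
        return Observation(reward=0.0, done=False)

    def step(self, action: List[float]) -> Observation:
        self.steps += 1
        return Observation(reward=1.0, done=self.steps >= self.horizon)


def run_episode(env: StandInEnv, domain_name: str, task_name: str) -> float:
    """Reset, then step with a zero action until done; return the summed reward."""
    obs = env.reset(domain_name=domain_name, task_name=task_name)
    total = 0.0
    while not obs.done:
        obs = env.step([0.0])
        total += obs.reward
    return total
```

Calling `run_episode` twice on the same object with different domain and task names mirrors the dynamic switching the PR describes, without restarting anything.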

greptile-apps bot commented Jan 22, 2026

Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

zkwentz commented Jan 22, 2026

Nice! Will take a closer look on desktop in a few hours.

@burtenshaw (Collaborator)

Hey @mreso. Thanks for this, and sorry for going quiet on it. Some high-level changes, please:

  • Can you also deploy it to the HF Hub?
  • Then update the environments page in the docs
  • I would remove all of the exec uvicorn changes in other envs and open a separate PR for those.

Thanks.

