Add some more options to play_w_env.py #126

joschu · 2025-11-24T07:48:55Z

No description provided.

claude · 2025-11-24T07:49:12Z

Claude finished @joschu's task —— View job

Code Review Complete ✓

I've reviewed the changes to play_w_env.py. Overall, this is a solid enhancement that adds useful flexibility to the interactive environment testing tool. Here's my detailed feedback:

Summary of Changes

This PR adds two new optional parameters to customize the interactive play experience:

multiline: Enables multi-line input mode (default: True)
show_observation: Controls whether to display the full observation at each step (default: False)

Positive Aspects ✓

Backwards Compatible: All changes use default parameters, so existing code continues to work without modification
Clean Implementation: The multiline input logic is well-structured with clear termination condition (two consecutive blank lines)
User Experience: The multiline mode adds helpful instructions and properly handles input collection
Consistent API: Parameters are threaded cleanly through play_env → ManualPolicy → get_async_input

Issues & Suggestions

1. Minor Formatting Inconsistency (tinker_cookbook/rl/play_w_env.py:61)

action_str = await get_async_input(colored("Your action:", "yellow"), multiline=self.multiline)

The prompt changed from "Your action: " (with trailing space) to "Your action:" (no trailing space). This is fine for multiline mode since it prints on its own line, but in single-line mode this makes the input appear directly after the colon without spacing.

Suggestion: Consider restoring the trailing space for single-line mode:

prompt_text = "Your action:" if self.multiline else "Your action: "
action_str = await get_async_input(colored(prompt_text, "yellow"), multiline=self.multiline)

2. Observation Display Control (tinker_cookbook/rl/play_w_env.py:54-59)

The show_observation flag is a good addition for environments where the observation might be verbose or redundant. However, I'm curious about the default choice:

Current default: show_observation=False (observation hidden by default)
Previous behavior: Observation always shown

Question: Was the decision to hide observations by default intentional? For debugging purposes (which is the stated goal in the docstring), showing observations might be more helpful as the default behavior. Consider whether show_observation=True would be a better default to maintain previous behavior.

3. Multiline Input Edge Case

The multiline logic is solid, but there's a subtle behavior worth noting:

if line == "" and prev_line == "":
    if lines and lines[-1] == "":
        lines.pop()
    break

This correctly removes the first blank line before terminating. However, prev_line starts as None, so a user can't terminate immediately with two blank lines at the start—they must enter at least one character. This is probably fine (prevents accidental empty inputs), but worth documenting if it's intentional.

4. Documentation Gap

The module docstring at the top mentions the example, but doesn't document the new parameters. Consider adding a brief note:

"""
To help you debug your environment, you can use the play_env function to play as the policy by typing in your responses in an environment interactively.

Options:
- multiline=True: Enable multi-line input (terminate with two blank lines)
- show_observation=False: Display the full observation at each step

We include an example of playing the Twenty Questions environment in the main function.
...
"""

Alignment with Repository Conventions

Based on CLAUDE.md guidelines:

✓ Type hints: Properly typed parameters
✓ Async patterns: Correctly uses async/await throughout
✓ Minimal changes: Focused on the specific feature without unnecessary refactoring
✓ No over-engineering: Simple, direct implementation

Testing Recommendations

Since this is a utility for manual testing, automated tests might be challenging. However, consider:

Testing the multiline input logic with various input patterns (empty inputs, single line in multiline mode, etc.)
Verifying both show_observation=True and False modes work correctly
Ensuring the existing Twenty Questions example in main() still works as expected

Final Recommendation

Approve with minor suggestions. The changes are well-implemented and provide useful flexibility. The only concerns are:

Minor formatting issue with the prompt spacing (easy fix)
Question about the show_observation=False default (may want to reconsider)

These are optional improvements—the code is functional as-is and ready to merge if the defaults align with your intended use cases.

joschu · 2025-11-24T07:52:46Z

@claude please make the following changes (based on your review):

implement your suggestion re whitespace
keep show_observation=True as default
add a comment, don't bother changing
no need to document show_observation, this only there for an obscure reason. include the doc for multitline though

claude · 2025-11-24T07:53:00Z

Claude finished @joschu's task —— View job

Changes Implemented ✓

All requested changes have been completed and pushed to the branch:

Fix whitespace: Add trailing space for single-line mode prompt
Change show_observation default from False to True
Add documentation for multiline parameter in module docstring
Commit and push changes

Summary of Changes

1. Prompt spacing fix (line 64-65)

prompt_text = "Your action:" if self.multiline else "Your action: "
action_str = await get_async_input(colored(prompt_text, "yellow"), multiline=self.multiline)

Now single-line mode includes a trailing space for better readability.

2. show_observation default changed to True (lines 50 & 87)

ManualPolicy.__init__: show_observation: bool = True
play_env: show_observation: bool = True

This maintains the previous behavior of showing observations by default.

3. Documentation added (lines 4-5)

Options:
- multiline=True: Enable multi-line input mode (terminate with two blank lines)

As requested, only documented the multiline parameter (not show_observation).

|

…tion default, add multiline docs - Add trailing space to prompt in single-line mode for better UX - Change show_observation default from False to True - Add documentation for multiline parameter in module docstring Co-authored-by: John Schulman <[email protected]>

.

bffb271

github-actions bot and others added 2 commits November 24, 2025 07:54

.

7196a36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add some more options to play_w_env.py #126

Add some more options to play_w_env.py #126

Uh oh!

joschu commented Nov 24, 2025

Uh oh!

claude bot commented Nov 24, 2025 •

edited

Loading

Uh oh!

joschu commented Nov 24, 2025

Uh oh!

claude bot commented Nov 24, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add some more options to play_w_env.py #126

Are you sure you want to change the base?

Add some more options to play_w_env.py #126

Uh oh!

Conversation

joschu commented Nov 24, 2025

Uh oh!

claude bot commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Complete ✓

Summary of Changes

Positive Aspects ✓

Issues & Suggestions

1. Minor Formatting Inconsistency (tinker_cookbook/rl/play_w_env.py:61)

2. Observation Display Control (tinker_cookbook/rl/play_w_env.py:54-59)

3. Multiline Input Edge Case

4. Documentation Gap

Alignment with Repository Conventions

Testing Recommendations

Final Recommendation

Uh oh!

joschu commented Nov 24, 2025

Uh oh!

claude bot commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes Implemented ✓

Summary of Changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude bot commented Nov 24, 2025 •

edited

Loading

claude bot commented Nov 24, 2025 •

edited

Loading