Gymnasium support #192

Status: Draft. Wants to merge 30 commits into base: master.

Commits (30):
- a413077: Adds gymnasium to the setup (prabhatnagarajan, Apr 15, 2023)
- 35a420c: gym -> gymnasium (prabhatnagarajan, Apr 15, 2023)
- e1d7ead: modifies calls to env step to use truncations (prabhatnagarajan, Apr 15, 2023)
- c804fe3: some Atari changes (prabhatnagarajan, Apr 15, 2023)
- 5daca4c: Makes more env modifications (prabhatnagarajan, Apr 15, 2023)
- 85fe46f: Fixes some observations, and uses new Gym AtariEnv properly (prabhatnagarajan, Apr 16, 2023)
- c7d62f7: makes some evaluator updates (prabhatnagarajan, Apr 16, 2023)
- b51ae32: Gets evaluations working by modifying RandomizeAction class (prabhatnagarajan, Apr 16, 2023)
- ffdc311: fixes setup (prabhatnagarajan, Apr 17, 2023)
- 07c464b: Adds a generic GymWrapper (prabhatnagarajan, Apr 18, 2023)
- 675c978: Shifts Pendulum version in example to v1 since v0 is deprecated (prabhatnagarajan, Apr 23, 2023)
- 85c38e1: Adds Q value computation to DDQN (and by extension DDQN) (prabhatnagarajan, Apr 27, 2023)
- 02048ae: removes filelock from setup (prabhatnagarajan, May 10, 2023)
- 98a4efc: removes all required items (prabhatnagarajan, May 10, 2023)
- 4818545: fixes setup (prabhatnagarajan, Jun 24, 2023)
- e3c867c: merges with master (prabhatnagarajan, Dec 28, 2023)
- 3f98ef7: does gymnasium all to gymnasium atari (prabhatnagarajan, Dec 28, 2023)
- b145609: Fixes multiprocessvector_env step (prabhatnagarajan, Apr 2, 2024)
- 0c770b5: Multiprocess fixes (brett-daley, Apr 2, 2024)
- e82663e: Merge pull request #6 from brett-daley/gymnasium_support (prabhatnagarajan, Apr 2, 2024)
- 04c1dd5: OpenAI -> Farama Foundation (prabhatnagarajan, Apr 3, 2024)
- c408b08: Makes modifications for gymnasium imports, etc. (prabhatnagarajan, Apr 3, 2024)
- 03f203f: Removes continuing_time_limit now that gymnasium has truncation (prabhatnagarajan, Apr 3, 2024)
- 5614323: Remove Monitor (prabhatnagarajan, Apr 3, 2024)
- 4b0494e: Removes things from __init__ (prabhatnagarajan, Apr 6, 2024)
- e92cc63: Moves gym folder in examples to gymnasium (prabhatnagarajan, Apr 6, 2024)
- 420dddb: Fixes some imports and some tests (prabhatnagarajan, Apr 10, 2024)
- 74198b9: Fixes Randomize Action Wrapper (prabhatnagarajan, May 7, 2024)
- 3237561: merges with main (prabhatnagarajan, Jul 26, 2024)
- 6f0eac6: Merge branch 'master' into gymnasium_support (prabhatnagarajan, Aug 4, 2024)
Files changed:
2 changes: 1 addition & 1 deletion .gitignore
@@ -5,4 +5,4 @@ build/
dist/
.idea/
results/
examples/gym/results/
examples/gymnasium/results/
2 changes: 1 addition & 1 deletion .pfnci/run.sh
@@ -75,7 +75,7 @@ main() {
# pytest does not run with attrs==19.2.0 (https://github.com/pytest-dev/pytest/issues/3280) # NOQA
"${PYTHON}" -m pip install \
'pytest==4.1.1' 'attrs==19.1.0' 'pytest-xdist==1.26.1' \
'gym[atari,classic_control]==0.19.0' 'optuna' 'zipp==1.0.0' 'pybullet==2.8.1' 'jupyterlab==2.1.5' 'traitlets==5.1.1' 'pyglet==1.5.27'
'gymnasium[atari,classic_control]==0.19.0' 'optuna' 'zipp==1.0.0' 'pybullet==2.8.1' 'jupyterlab==2.1.5' 'traitlets==5.1.1'

git config --global user.email "[email protected]"
git config --global user.name "Your Name"
12 changes: 6 additions & 6 deletions README.md
@@ -30,7 +30,7 @@ Refer to [Installation](http://pfrl.readthedocs.io/en/latest/install.html) for m

## Getting started

You can try [PFRL Quickstart Guide](examples/quickstart/quickstart.ipynb) first, or check the [examples](examples) ready for Atari 2600 and Open AI Gym.
You can try [PFRL Quickstart Guide](examples/quickstart/quickstart.ipynb) first, or check the [examples](examples) ready for Atari 2600 and Farama Foundation's gymnasium.

For more information, you can refer to [PFRL's documentation](http://pfrl.readthedocs.io/en/latest/index.html).

@@ -64,9 +64,9 @@ Following algorithms have been implemented in PFRL:
- [ACER (Actor-Critic with Experience Replay)](https://arxiv.org/abs/1611.01224)
- examples: [[atari]](examples/atari/train_acer_ale.py)
- [Categorical DQN](https://arxiv.org/abs/1707.06887)
- examples: [[atari]](examples/atari/train_categorical_dqn_ale.py) [[general gym]](examples/gym/train_categorical_dqn_gym.py)
- examples: [[atari]](examples/atari/train_categorical_dqn_ale.py) [[general gymnasium]](examples/gymnasium/train_categorical_dqn_gymnasium.py)
- [DQN (Deep Q-Network)](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) (including [Double DQN](https://arxiv.org/abs/1509.06461), [Persistent Advantage Learning (PAL)](https://arxiv.org/abs/1512.04860), Double PAL, [Dynamic Policy Programming (DPP)](http://www.jmlr.org/papers/volume13/azar12a/azar12a.pdf))
- examples: [[atari reproduction]](examples/atari/reproduction/dqn) [[atari]](examples/atari/train_dqn_ale.py) [[atari (batched)]](examples/atari/train_dqn_batch_ale.py) [[flickering atari]](examples/atari/train_drqn_ale.py) [[general gym]](examples/gym/train_dqn_gym.py)
- examples: [[atari reproduction]](examples/atari/reproduction/dqn) [[atari]](examples/atari/train_dqn_ale.py) [[atari (batched)]](examples/atari/train_dqn_batch_ale.py) [[flickering atari]](examples/atari/train_drqn_ale.py) [[general gymnasium]](examples/gymnasium/train_dqn_gymnasium.py)
- [DDPG (Deep Deterministic Policy Gradients)](https://arxiv.org/abs/1509.02971) (including [SVG(0)](https://arxiv.org/abs/1510.09142))
- examples: [[mujoco reproduction]](examples/mujoco/reproduction/ddpg)
- [IQN (Implicit Quantile Networks)](https://arxiv.org/abs/1806.06923)
@@ -76,7 +76,7 @@ Following algorithms have been implemented in PFRL:
- [Rainbow](https://arxiv.org/abs/1710.02298)
- examples: [[atari reproduction]](examples/atari/reproduction/rainbow) [[Slime volleyball]](examples/slimevolley/)
- [REINFORCE](http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf)
- examples: [[general gym]](examples/gym/train_reinforce_gym.py)
- examples: [[general gymnasium]](examples/gymnasium/train_reinforce_gymnasium.py)
- [SAC (Soft Actor-Critic)](https://arxiv.org/abs/1812.05905)
- examples: [[mujoco reproduction]](examples/mujoco/reproduction/soft_actor_critic) [[Atlas walk]](examples/atlas/)
- [TRPO (Trust Region Policy Optimization)](https://arxiv.org/abs/1502.05477) with [GAE (Generalized Advantage Estimation)](https://arxiv.org/abs/1506.02438)
Expand All @@ -92,14 +92,14 @@ Following useful techniques have been also implemented in PFRL:
- [Dueling Network](https://arxiv.org/abs/1511.06581)
- examples: [[Rainbow]](examples/atari/reproduction/rainbow) [[DQN/DoubleDQN/PAL]](examples/atari/train_dqn_ale.py)
- [Normalized Advantage Function](https://arxiv.org/abs/1603.00748)
- examples: [[DQN]](examples/gym/train_dqn_gym.py) (for continuous-action envs only)
- examples: [[DQN]](examples/gymnasium/train_dqn_gymnasium.py) (for continuous-action envs only)
- [Deep Recurrent Q-Network](https://arxiv.org/abs/1507.06527)
- examples: [[DQN]](examples/atari/train_drqn_ale.py)


## Environments

Environments that support the subset of OpenAI Gym's interface (`reset` and `step` methods) can be used.
Environments that support the subset of Farama Foundation's gymnasium's interface (`reset` and `step` methods) can be used.

## Contributing

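The Environments section in the README diff above states the interface PFRL expects after this PR: gymnasium-style `reset` and `step`. A minimal sketch of an environment that satisfies it (illustrative only; the class name, spaces, and ten-step limit are made up, not part of the diff):

```python
import gymnasium
import numpy as np


class MinimalEnv(gymnasium.Env):
    """Bare-bones example of the gymnasium reset/step contract."""

    observation_space = gymnasium.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
    action_space = gymnasium.spaces.Discrete(2)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        self._t = 0
        return np.zeros(1, dtype=np.float32), {}  # (observation, info)

    def step(self, action):
        self._t += 1
        obs = np.zeros(1, dtype=np.float32)
        reward = float(action)
        terminated = False          # true task end (success or failure)
        truncated = self._t >= 10   # time-limit cutoff reported by the env itself
        return obs, reward, terminated, truncated, {}
```

Because truncation is now reported by the environment itself, the PR can drop `continuing_time_limit` (commit 03f203f).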
2 changes: 1 addition & 1 deletion examples/README.md
@@ -3,7 +3,7 @@
- `atari`: examples for general Atari games
- `atari/reproduction`: examples with benchmark scores for reproducing published results on Atari
- `atlas`: training an Atlas robot to walk
- `gym`: examples for OpenAI Gym environments
- `gymnasium`: examples for OpenAI gymnasium environments
- `grasping`: examples for a Bullet-based robotic grasping environment
- `mujoco/reproduction`: examples with benchmark scores for reproducing published results on MuJoCo tasks
- `quickstart`: a quickstart guide of PFRL
6 changes: 3 additions & 3 deletions examples/atari/train_acer_ale.py
@@ -4,8 +4,8 @@
# Prevent numpy from using multiple threads
os.environ["OMP_NUM_THREADS"] = "1"

import gym # NOQA:E402
import gym.wrappers # NOQA:E402
import gymnasium # NOQA:E402
Review thread on this import:

Member: I'm not familiar with gymnasium much, but is it recommended to write `import gymnasium as gym`? Do you know of any article about gymnasium's coding conventions?

Contributor (author): I'm not sure. The examples they give often write what you have, but I think it's just to sell the simplicity of their transition from gym to gymnasium. I also think the distinction can help for clarity, so people are reminded which API is being used.

Contributor (author): Hmm, it does seem that even internally in their code they use gym, as you say.

Member: Thanks. Anyway, I agree with your opinion: "the distinction can help for clarity."

import gymnasium.wrappers # NOQA:E402
import numpy as np # NOQA:E402
from torch import nn # NOQA:E402

@@ -91,7 +91,7 @@ def main():
args.outdir = experiments.prepare_output_dir(args, args.outdir)
print("Output files are saved in {}".format(args.outdir))

n_actions = gym.make(args.env).action_space.n
n_actions = gymnasium.make(args.env).action_space.n

input_to_hidden = nn.Sequential(
nn.Conv2d(4, 16, 8, stride=4),
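For reference, these are the two import conventions debated in the review thread above (hypothetical snippets, not taken from the diff):

```python
# Aliasing, as many gymnasium examples do, keeps old gym-style code unchanged:
import gymnasium as gym
env = gym.make("CartPole-v1")

# Using the full module name, as this PR does, makes the API in use explicit:
import gymnasium
env = gymnasium.make("CartPole-v1")
```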
6 changes: 3 additions & 3 deletions examples/atari/train_drqn_ale.py
@@ -11,8 +11,8 @@
"""
import argparse

import gym
import gym.wrappers
import gymnasium
import gymnasium.wrappers
import numpy as np
import torch
from torch import nn
@@ -193,7 +193,7 @@ def make_env(test):
# Randomize actions like epsilon-greedy in evaluation as well
env = pfrl.wrappers.RandomizeAction(env, args.eval_epsilon)
if args.monitor:
env = gym.wrappers.Monitor(
env = gymnasium.wrappers.Monitor(
env, args.outdir, mode="evaluation" if test else "training"
)
if args.render:
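A caveat on the hunk above: gymnasium itself does not provide a `Monitor` wrapper, and a later commit in this PR (5614323, "Remove Monitor") drops these calls. If video recording is still wanted, gymnasium's `RecordVideo` wrapper is the usual substitute; a hedged sketch, assuming a recent gymnasium release (the env ID and folder name are placeholders):

```python
import gymnasium

# render_mode="rgb_array" lets the wrapper capture frames.
env = gymnasium.make("CartPole-v1", render_mode="rgb_array")
# RecordVideo writes episode videos to the given folder, roughly what gym's Monitor used to do.
env = gymnasium.wrappers.RecordVideo(env, video_folder="videos")

obs, info = env.reset()
```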
4 changes: 2 additions & 2 deletions examples/atari/train_ppo_ale.py
@@ -1,4 +1,4 @@
"""An example of training PPO against OpenAI Gym Atari Envs.
"""An example of training PPO against OpenAI gymnasium Atari Envs.

This script is an example of training a PPO agent on Atari envs.

@@ -25,7 +25,7 @@
def main():
parser = argparse.ArgumentParser()
parser.add_argument(
"--env", type=str, default="BreakoutNoFrameskip-v4", help="Gym Env ID."
"--env", type=str, default="BreakoutNoFrameskip-v4", help="gymnasium Env ID."
)
parser.add_argument(
"--gpu", type=int, default=0, help="GPU device ID. Set to -1 to use CPUs only."
18 changes: 9 additions & 9 deletions examples/atlas/train_soft_actor_critic_atlas.py
@@ -4,8 +4,8 @@
import logging
import sys

import gym
import gym.wrappers
import gymnasium
import gymnasium.wrappers
import numpy as np
import torch
from torch import distributions, nn
@@ -17,16 +17,16 @@

def make_env(args, seed, test):
if args.env.startswith("Roboschool"):
# Check gym version because roboschool does not work with gym>=0.15.6
# Check gymnasium version because roboschool does not work with gymnasium>=0.15.6
from distutils.version import StrictVersion

gym_version = StrictVersion(gym.__version__)
if gym_version >= StrictVersion("0.15.6"):
raise RuntimeError("roboschool does not work with gym>=0.15.6")
gymnasium_version = StrictVersion(gymnasium.__version__)
if gymnasium_version >= StrictVersion("0.15.6"):
raise RuntimeError("roboschool does not work with gymnasium>=0.15.6")
import roboschool # NOQA
env = gym.make(args.env)
env = gymnasium.make(args.env)
# Unwrap TimiLimit wrapper
assert isinstance(env, gym.wrappers.TimeLimit)
assert isinstance(env, gymnasium.wrappers.TimeLimit)
env = env.env
# Use different random seeds for train and test envs
env_seed = 2**32 - 1 - seed if test else seed
@@ -59,7 +59,7 @@ def main():
"--env",
type=str,
default="RoboschoolAtlasForwardWalk-v1",
help="OpenAI Gym env to perform algorithm on.",
help="OpenAI gymnasium env to perform algorithm on.",
)
parser.add_argument(
"--num-envs", type=int, default=4, help="Number of envs run in parallel."
30 changes: 15 additions & 15 deletions examples/grasping/train_dqn_batch_grasping.py
@@ -2,8 +2,8 @@
import functools
import os

import gym
import gym.spaces
import gymnasium
import gymnasium.spaces
import numpy as np
import torch
from torch import nn
@@ -13,7 +13,7 @@
from pfrl.q_functions import DiscreteActionValueHead


class CastAction(gym.ActionWrapper):
class CastAction(gymnasium.ActionWrapper):
"""Cast actions to a given type."""

def __init__(self, env, type_):
@@ -24,14 +24,14 @@ def action(self, action):
return self.type_(action)


class TransposeObservation(gym.ObservationWrapper):
class TransposeObservation(gymnasium.ObservationWrapper):
"""Transpose observations."""

def __init__(self, env, axes):
super().__init__(env)
self._axes = axes
assert isinstance(env.observation_space, gym.spaces.Box)
self.observation_space = gym.spaces.Box(
assert isinstance(env.observation_space, gymnasium.spaces.Box)
self.observation_space = gymnasium.spaces.Box(
low=env.observation_space.low.transpose(*self._axes),
high=env.observation_space.high.transpose(*self._axes),
dtype=env.observation_space.dtype,
@@ -41,7 +41,7 @@ def observation(self, observation):
return observation.transpose(*self._axes)


class ObserveElapsedSteps(gym.Wrapper):
class ObserveElapsedSteps(gymnasium.Wrapper):
"""Observe the number of elapsed steps in an episode.

A new observation will be a tuple of an original observation and an integer
@@ -52,10 +52,10 @@ def __init__(self, env, max_steps):
super().__init__(env)
self._max_steps = max_steps
self._elapsed_steps = 0
self.observation_space = gym.spaces.Tuple(
self.observation_space = gymnasium.spaces.Tuple(
(
env.observation_space,
gym.spaces.Discrete(self._max_steps + 1),
gymnasium.spaces.Discrete(self._max_steps + 1),
)
)

@@ -64,13 +64,13 @@ def reset(self):
return self.env.reset(), self._elapsed_steps

def step(self, action):
observation, reward, done, info = self.env.step(action)
observation, reward, terminated, truncated, info = self.env.step(action)
self._elapsed_steps += 1
assert self._elapsed_steps <= self._max_steps
return (observation, self._elapsed_steps), reward, done, info
return (observation, self._elapsed_steps), reward, terminated, truncated, info


class RecordMovie(gym.Wrapper):
class RecordMovie(gymnasium.Wrapper):
"""Record MP4 videos using pybullet's logging API."""

def __init__(self, env, dirname):
@@ -87,7 +87,7 @@ def reset(self):
pybullet.STATE_LOGGING_VIDEO_MP4,
os.path.join(self._dirname, "{}.mp4".format(self._episode_idx)),
)
return obs
return obs, {}


class GraspingQFunction(nn.Module):
@@ -243,7 +243,7 @@ def main():
max_episode_steps = 8

def make_env(idx, test):
from pybullet_envs.bullet.kuka_diverse_object_gym_env import ( # NOQA
from pybullet_envs.bullet.kuka_diverse_object_gymnasium_env import ( # NOQA
KukaDiverseObjectEnv,
)

@@ -263,7 +263,7 @@ def make_env(idx, test):
# Disable file caching to keep memory usage small
env._p.setPhysicsEngineParameter(enableFileCaching=False)
assert env.observation_space is None
env.observation_space = gym.spaces.Box(
env.observation_space = gymnasium.spaces.Box(
low=0, high=255, shape=(84, 84, 3), dtype=np.uint8
)
# (84, 84, 3) -> (3, 84, 84)
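The `ObserveElapsedSteps.step` change above shows the pattern applied throughout this PR: gymnasium splits the old `done` flag into `terminated` and `truncated`, so `step` returns a 5-tuple. A minimal caller-side sketch of the migration (illustrative, not from the diff):

```python
import gymnasium

env = gymnasium.make("CartPole-v1")

# old gym:    obs, reward, done, info = env.step(action)
# gymnasium:  obs, reward, terminated, truncated, info = env.step(action)
obs, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    # An episode ends either because the task finished or because it was cut short.
    done = terminated or truncated
```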
15 changes: 0 additions & 15 deletions examples/gym/README.md

This file was deleted.

15 changes: 15 additions & 0 deletions examples/gymnasium/README.md
@@ -0,0 +1,15 @@
# Examples for OpenAI gymnasium environments

- `train_categorical_dqn_gymnasium.py`: CategoricalDQN for discrete action action spaces
- `train_dqn_gymnasium.py`: DQN for both discrete action and continuous action spaces
- `train_reinforce_gymnasium.py`: REINFORCE for both discrete action and continuous action spaces (only for episodic envs)

## How to run

```
python train_categorical_dqn_gymnasium.py [options]
python train_dqn_gymnasium.py [options]
python train_reinforce_gymnasium.py [options]
```

Specify `--help` or read code for options.
examples/gymnasium/train_categorical_dqn_gymnasium.py (renamed from examples/gym/train_categorical_dqn_gym.py)
@@ -1,16 +1,16 @@
"""An example of training Categorical DQN against OpenAI Gym Envs.
"""An example of training Categorical DQN against OpenAI gymnasium Envs.

This script is an example of training a CategoricalDQN agent against OpenAI
Gym envs. Only discrete spaces are supported.
gymnasium envs. Only discrete spaces are supported.

To solve CartPole-v0, run:
python train_categorical_dqn_gym.py --env CartPole-v0
python train_categorical_dqn_gymnasium.py --env CartPole-v0
"""

import argparse
import sys

import gym
import gymnasium
import torch

import pfrl
@@ -66,7 +66,7 @@ def main():
print("Output files are saved in {}".format(args.outdir))

def make_env(test):
env = gym.make(args.env)
env = gymnasium.make(args.env)
env_seed = 2**32 - 1 - args.seed if test else args.seed
env.seed(env_seed)
# Cast observations to float32 because our model uses float32
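One more migration detail relevant to the `env.seed(env_seed)` call in the hunk above: gymnasium environments are seeded through `reset` rather than a separate `seed` method. A hedged sketch of the newer pattern (not part of this diff):

```python
import gymnasium

env = gymnasium.make("CartPole-v1")
# Seeding moved into reset(); gymnasium envs no longer expose env.seed(...).
obs, info = env.reset(seed=0)
# The action space can be seeded separately for reproducible sampling.
env.action_space.seed(0)
```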
examples/gymnasium/train_dqn_gymnasium.py (renamed from examples/gym/train_dqn_gym.py)
@@ -1,24 +1,24 @@
"""An example of training DQN against OpenAI Gym Envs.
"""An example of training DQN against OpenAI gymnasium Envs.

This script is an example of training a DQN agent against OpenAI Gym envs.
This script is an example of training a DQN agent against OpenAI gymnasium envs.
Both discrete and continuous action spaces are supported. For continuous action
spaces, A NAF (Normalized Advantage Function) is used to approximate Q-values.

To solve CartPole-v0, run:
python train_dqn_gym.py --env CartPole-v0
To solve CartPole-v1, run:
python train_dqn_gymnasium.py --env CartPole-v1

To solve Pendulum-v0, run:
python train_dqn_gym.py --env Pendulum-v0
To solve Pendulum-v1, run:
python train_dqn_gymnasium.py --env Pendulum-v1
"""

import argparse
import os
import sys

import gym
import gymnasium
import numpy as np
import torch.optim as optim
from gym import spaces
from gymnasium import spaces

import pfrl
from pfrl import experiments, explorers
@@ -42,7 +42,7 @@ def main():
" If it does not exist, it will be created."
),
)
parser.add_argument("--env", type=str, default="Pendulum-v0")
parser.add_argument("--env", type=str, default="Pendulum-v1")
parser.add_argument("--seed", type=int, default=0, help="Random seed [0, 2 ** 32)")
parser.add_argument("--gpu", type=int, default=0)
parser.add_argument("--final-exploration-steps", type=int, default=10**4)
@@ -100,7 +100,7 @@ def clip_action_filter(a):
return np.clip(a, action_space.low, action_space.high)

def make_env(idx=0, test=False):
env = gym.make(args.env)
env = gymnasium.make(args.env)
# Use different random seeds for train and test envs
process_seed = int(process_seeds[idx])
env_seed = 2**32 - 1 - process_seed if test else process_seed