
Commit

Merge pull request #33 from Toni-SM/develop
Develop
Toni-SM authored Oct 3, 2022
2 parents 1e10737 + 77029f8 commit 840e36f
Showing 149 changed files with 9,137 additions and 2,768 deletions.
32 changes: 32 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,38 @@

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [0.8.0] - 2022-10-03
### Added
- AMP agent for physics-based character animation
- Manual trainer
- Gaussian model mixin
- Support for creating shared models
- Parameter `role` to model methods
- Wrapper compatibility with the new OpenAI Gym environment API (by @JohannLange)
- Internal library colored logger
- Migrate checkpoints/models from other RL libraries to skrl models/agents
- Configuration parameter `store_separately` to agent configuration dict
- Save/load agent modules (models, optimizers, preprocessors)
- Set random seed and configure deterministic behavior for reproducibility (see the sketch after this list)
- Benchmark results for Isaac Gym and Omniverse Isaac Gym on the GitHub discussion page
- Franka Emika real-world example
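
A minimal sketch of the reproducibility utility added in this release (assuming it is exposed as `skrl.utils.set_seed`; the seed value is illustrative):

```python
from skrl.utils import set_seed

# seed the random number generators (Python, NumPy and PyTorch) for reproducible runs
set_seed(42)
```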

### Changed
- Models implementation as Python mixin [**breaking change**] (see the sketch after this list)
- Multivariate Gaussian model (`GaussianModel` until 0.7.0) to `MultivariateGaussianMixin`
- Trainer's `cfg` parameter position and default values
- Show training/evaluation display progress using `tqdm` (by @JohannLange)
- Update Isaac Gym and Omniverse Isaac Gym examples
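
Under the mixin-based API, a model is defined by combining `Model` with a mixin such as `GaussianMixin`, as the updated examples in this commit show. A minimal sketch (the class name and layer sizes are illustrative):

```python
import torch
import torch.nn as nn

from skrl.models.torch import Model, GaussianMixin


class Policy(GaussianMixin, Model):
    def __init__(self, observation_space, action_space, device, clip_actions=False,
                 clip_log_std=True, min_log_std=-20, max_log_std=2):
        # initialize both bases explicitly (replaces subclassing GaussianModel from <= 0.7.0)
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std)

        self.net = nn.Sequential(nn.Linear(self.num_observations, 64),
                                 nn.ReLU(),
                                 nn.Linear(64, self.num_actions))
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def compute(self, states, taken_actions, role):
        # the new `role` parameter identifies the caller (e.g. "policy"), enabling shared models
        return self.net(states), self.log_std_parameter
```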

### Fixed
- Missing recursive arguments during model weights initialization
- Tensor dimension when computing preprocessor parallel variance
- Models' clip tensors dtype to `float32`

### Removed
- Parameter `inference` from model methods
- Configuration parameter `checkpoint_policy_only` from agent configuration dict

## [0.7.0] - 2022-07-11
### Added
- A2C agent
27 changes: 23 additions & 4 deletions CONTRIBUTING.md
@@ -3,13 +3,15 @@ First of all, **thank you**... For what? Because you are dedicating some time to

<hr>

### I don't want to contribute (for now), I just want to ask a question!
### I just want to ask a question!

If you have a question, please do not open an issue. Instead, use the following resources (you will get a faster response):

- [skrl's GitHub discussions](https://github.com/Toni-SM/skrl/discussions), a place to ask questions and discuss about the project

- [Isaac Gym's forum](https://forums.developer.nvidia.com/c/agx-autonomous-machines/isaac/isaac-gym/322), , a place to post your questions, find past answers, or just chat with other members of the community about Isaac Gym topics
- [Isaac Gym's forum](https://forums.developer.nvidia.com/c/agx-autonomous-machines/isaac/isaac-gym/322), a place to post your questions, find past answers, or just chat with other members of the community about Isaac Gym topics

- [Omniverse Isaac Sim's forum](https://forums.developer.nvidia.com/c/agx-autonomous-machines/isaac/simulation/69), a place to post your questions, find past answers, or just chat with other members of the community about Omniverse Isaac Sim/Gym topics

### I have found a (good) bug. What can I do?

@@ -21,10 +23,16 @@ Open an issue on [skrl's GitHub issues](https://github.com/Toni-SM/skrl/issues)
- A link to the source code of the library that you are using (some problems may be due to the use of older versions. If possible, always use the latest version)
- Any other information that you think may be useful or help to reproduce/describe the problem

Note: Changes that are cosmetic in nature (code formatting, removing whitespace, etc.) or that correct grammatical, spelling or typo errors, and that do not add anything substantial to the functionality of the library will generally not be accepted as a pull request

### I want to contribute, but I don't know how

There is a [board](https://github.com/users/Toni-SM/projects/2/views/8) containing relevant future implementations, which can be a good starting place to identify contributions. Please consider the following points:

#### Notes about contributing

- Try to **communicate your change first** to [discuss](https://github.com/Toni-SM/skrl/discussions) the implementation if you want to add a new feature or change an existing one
- Modify only the minimum amount of code required and the files needed to make the change
- Changes that are cosmetic in nature (code formatting, removing whitespace, etc.) or that correct grammatical, spelling or typo errors, and that do not add anything substantial to the functionality of the library will generally not be accepted as a pull request

#### Coding conventions

**skrl** is designed with a focus on modularity, readability, simplicity and transparency of algorithm implementation. The file system structure groups components according to their functionality. Library components only inherit (and must inherit) from a single base class (no multilevel or multiple inheritance) that provides a uniform interface and implements common functionality that is not tied to the implementation details of the algorithms
@@ -39,6 +47,17 @@ Read the code a little bit and you will understand it at first glance... Also
- Capitalize (the first letter) and omit any trailing punctuation
- Write it in the imperative tense
- Aim for about 50 (or 72) characters
- Add import statements at the top of each module as follows:

```
function annotation (e.g. typing)
# insert an empty line
python libraries and other libraries (e.g. gym, numpy, time, etc.)
# insert an empty line
machine learning framework modules (e.g. torch, torch.nn)
# insert an empty line
skrl components
```
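
For example, a module following this convention might begin as follows (an illustrative sketch; the specific modules are not a requirement):

```python
from typing import Optional, Tuple

import gym
import numpy as np

import torch
import torch.nn as nn

from skrl.models.torch import Model, DeterministicMixin
```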

<hr>

6 changes: 3 additions & 3 deletions README.md
@@ -1,10 +1,10 @@
<p align="center">
<img width="300rem" src="docs/source/_static/data/skrl-up-transparent.png">
<img width="300rem" src="https://raw.githubusercontent.com/Toni-SM/skrl/main/docs/source/_static/data/skrl-up-transparent.png">
</p>
<h2 align="center" style="border-bottom: 0 !important;">SKRL - Reinforcement Learning library</h2>
<br>

**skrl** is an open-source modular library for Reinforcement Learning written in Python (using [PyTorch](https://pytorch.org/)) and designed with a focus on readability, simplicity, and transparency of algorithm implementation. In addition to supporting the [OpenAI Gym](https://www.gymlibrary.ml) and [DeepMind](https://github.com/deepmind/dm_env) environment interfaces, it allows loading and configuring [NVIDIA Isaac Gym](https://developer.nvidia.com/isaac-gym/) and [NVIDIA Omniverse Isaac Gym](https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/tutorial_gym_isaac_gym.html) environments, enabling agents' simultaneous training by scopes (subsets of environments among all available environments), which may or may not share resources, in the same run
**skrl** is an open-source modular library for Reinforcement Learning written in Python (using [PyTorch](https://pytorch.org/)) and designed with a focus on readability, simplicity, and transparency of algorithm implementation. In addition to supporting the [OpenAI Gym](https://www.gymlibrary.dev) and [DeepMind](https://github.com/deepmind/dm_env) environment interfaces, it allows loading and configuring [NVIDIA Isaac Gym](https://developer.nvidia.com/isaac-gym/) and [NVIDIA Omniverse Isaac Gym](https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/tutorial_gym_isaac_gym.html) environments, enabling agents' simultaneous training by scopes (subsets of environments among all available environments), which may or may not share resources, in the same run
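
For instance, any supported environment can be wrapped through a single entry point (a minimal sketch; the environment id is illustrative):

```python
import gym

from skrl.envs.torch import wrap_env

# wrap a Gym environment so it exposes the library's uniform interface
env = wrap_env(gym.make("Pendulum-v1"))

print(env.num_envs, env.device, env.observation_space, env.action_space)
```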

<br>

@@ -14,7 +14,7 @@ https://skrl.readthedocs.io/en/latest/

<br>

> **Note:** This project is under **active continuous development**. Please make sure you always have the latest version
> **Note:** This project is under **active continuous development**. Please make sure you always have the latest version. Visit the [develop](https://github.com/Toni-SM/skrl/tree/develop) branch or its [documentation](https://skrl.readthedocs.io/en/develop) to access the latest updates to be released.
<br>

2 changes: 2 additions & 0 deletions docs/requirements.txt
@@ -5,3 +5,5 @@ sphinx-tabs==3.2.0
gym
torch
tensorboard
tqdm
packaging
1 change: 1 addition & 0 deletions docs/source/_static/imgs/manual_trainer.svg
2 changes: 1 addition & 1 deletion docs/source/_static/imgs/model_gaussian.svg
100644 → 100755
1 change: 1 addition & 0 deletions docs/source/_static/imgs/model_multivariate_gaussian.svg
2 changes: 1 addition & 1 deletion docs/source/_static/imgs/rl_schema.svg
32 changes: 17 additions & 15 deletions docs/source/examples/deepmind/dm_manipulation_stack_sac.py
@@ -4,21 +4,21 @@
import torch.nn as nn

# Import the skrl components to build the RL system
from skrl.models.torch import GaussianModel, DeterministicModel
from skrl.models.torch import Model, GaussianMixin, DeterministicMixin
from skrl.memories.torch import RandomMemory
from skrl.agents.torch.sac import SAC, SAC_DEFAULT_CONFIG
from skrl.trainers.torch import SequentialTrainer
from skrl.envs.torch import wrap_env


# Define the models (stochastic and deterministic models) for the SAC agent using the helper classes
# Define the models (stochastic and deterministic models) for the SAC agent using the mixins.
# - StochasticActor (policy): takes as input the environment's observation/state and returns an action
# - Critic: takes the state and action as input and provides a value to guide the policy
class StochasticActor(GaussianModel):
class StochasticActor(GaussianMixin, Model):
def __init__(self, observation_space, action_space, device, clip_actions=False,
clip_log_std=True, min_log_std=-20, max_log_std=2):
super().__init__(observation_space, action_space, device, clip_actions,
clip_log_std, min_log_std, max_log_std)
Model.__init__(self, observation_space, action_space, device)
GaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std)

self.features_extractor = nn.Sequential(nn.Conv2d(3, 32, kernel_size=8, stride=3),
nn.ReLU(),
@@ -40,7 +40,7 @@ def __init__(self, observation_space, action_space, device, clip_actions=False,

self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

def compute(self, states, taken_actions):
def compute(self, states, taken_actions, role):
# The dm_control.manipulation tasks have as observation/state spec a `collections.OrderedDict` object as follows:
# OrderedDict([('front_close', BoundedArray(shape=(1, 84, 84, 3), dtype=dtype('uint8'), name='front_close', minimum=0, maximum=255)),
# ('jaco_arm/joints_pos', Array(shape=(1, 6, 2), dtype=dtype('float64'), name='jaco_arm/joints_pos')),
@@ -83,9 +83,10 @@ def compute(self, states, taken_actions):
input["jaco_arm/joints_pos"].view(states.shape[0], -1),
input["jaco_arm/joints_vel"].view(states.shape[0], -1)], dim=-1))), self.log_std_parameter

class Critic(DeterministicModel):
def __init__(self, observation_space, action_space, device, clip_actions = False):
super().__init__(observation_space, action_space, device, clip_actions)
class Critic(DeterministicMixin, Model):
def __init__(self, observation_space, action_space, device, clip_actions=False):
Model.__init__(self, observation_space, action_space, device)
DeterministicMixin.__init__(self, clip_actions)

self.features_extractor = nn.Sequential(nn.Conv2d(3, 32, kernel_size=8, stride=3),
nn.ReLU(),
@@ -105,7 +106,7 @@ def __init__(self, observation_space, action_space, device, clip_actions = False
nn.ReLU(),
nn.Linear(32, 1))

def compute(self, states, taken_actions):
def compute(self, states, taken_actions, role):
# map the observations/states to the original space.
# See the explanation above (StochasticActor.compute)
input = self.tensor_to_space(states, self.observation_space)
@@ -133,11 +134,12 @@ def compute(self, states, taken_actions):
# Instantiate the agent's models (function approximators).
# SAC requires 5 models, visit its documentation for more details
# https://skrl.readthedocs.io/en/latest/modules/skrl.agents.sac.html#spaces-and-models
models_sac = {"policy": StochasticActor(env.observation_space, env.action_space, device, clip_actions=True),
"critic_1": Critic(env.observation_space, env.action_space, device),
"critic_2": Critic(env.observation_space, env.action_space, device),
"target_critic_1": Critic(env.observation_space, env.action_space, device),
"target_critic_2": Critic(env.observation_space, env.action_space, device)}
models_sac = {}
models_sac["policy"] = StochasticActor(env.observation_space, env.action_space, device, clip_actions=True)
models_sac["critic_1"] = Critic(env.observation_space, env.action_space, device)
models_sac["critic_2"] = Critic(env.observation_space, env.action_space, device)
models_sac["target_critic_1"] = Critic(env.observation_space, env.action_space, device)
models_sac["target_critic_2"] = Critic(env.observation_space, env.action_space, device)

# Initialize the models' parameters (weights and biases) using a Gaussian distribution
for model in models_sac.values():
33 changes: 18 additions & 15 deletions docs/source/examples/deepmind/dm_suite_cartpole_swingup_ddpg.py
@@ -5,42 +5,44 @@
import torch.nn.functional as F

# Import the skrl components to build the RL system
from skrl.models.torch import DeterministicModel
from skrl.models.torch import Model, DeterministicMixin
from skrl.memories.torch import RandomMemory
from skrl.agents.torch.ddpg import DDPG, DDPG_DEFAULT_CONFIG
from skrl.resources.noises.torch import OrnsteinUhlenbeckNoise
from skrl.trainers.torch import SequentialTrainer
from skrl.envs.torch import wrap_env


# Define the models (deterministic models) for the DDPG agent using a helper class
# and programming with two approaches (layer by layer and torch.nn.Sequential class).
# Define the models (deterministic models) for the DDPG agent using mixins
# and programming with two approaches (torch functional and torch.nn.Sequential class).
# - Actor (policy): takes as input the environment's observation/state and returns an action
# - Critic: takes the state and action as input and provides a value to guide the policy
class DeterministicActor(DeterministicModel):
def __init__(self, observation_space, action_space, device, clip_actions = False):
super().__init__(observation_space, action_space, device, clip_actions)
class DeterministicActor(DeterministicMixin, Model):
def __init__(self, observation_space, action_space, device, clip_actions=False):
Model.__init__(self, observation_space, action_space, device)
DeterministicMixin.__init__(self, clip_actions)

self.linear_layer_1 = nn.Linear(self.num_observations, 400)
self.linear_layer_2 = nn.Linear(400, 300)
self.action_layer = nn.Linear(300, self.num_actions)

def compute(self, states, taken_actions):
def compute(self, states, taken_actions, role):
x = F.relu(self.linear_layer_1(states))
x = F.relu(self.linear_layer_2(x))
return torch.tanh(self.action_layer(x))

class DeterministicCritic(DeterministicModel):
def __init__(self, observation_space, action_space, device, clip_actions = False):
super().__init__(observation_space, action_space, device, clip_actions)
class DeterministicCritic(DeterministicMixin, Model):
def __init__(self, observation_space, action_space, device, clip_actions=False):
Model.__init__(self, observation_space, action_space, device)
DeterministicMixin.__init__(self, clip_actions)

self.net = nn.Sequential(nn.Linear(self.num_observations + self.num_actions, 400),
nn.ReLU(),
nn.Linear(400, 300),
nn.ReLU(),
nn.Linear(300, 1))

def compute(self, states, taken_actions):
def compute(self, states, taken_actions, role):
return self.net(torch.cat([states, taken_actions], dim=1))


Expand All @@ -58,10 +60,11 @@ def compute(self, states, taken_actions):
# Instantiate the agent's models (function approximators).
# DDPG requires 4 models, visit its documentation for more details
# https://skrl.readthedocs.io/en/latest/modules/skrl.agents.ddpg.html#spaces-and-models
models_ddpg = {"policy": DeterministicActor(env.observation_space, env.action_space, device, clip_actions=True),
"target_policy": DeterministicActor(env.observation_space, env.action_space, device, clip_actions=True),
"critic": DeterministicCritic(env.observation_space, env.action_space, device),
"target_critic": DeterministicCritic(env.observation_space, env.action_space, device)}
models_ddpg = {}
models_ddpg["policy"] = DeterministicActor(env.observation_space, env.action_space, device, clip_actions=True)
models_ddpg["target_policy"] = DeterministicActor(env.observation_space, env.action_space, device, clip_actions=True)
models_ddpg["critic"] = DeterministicCritic(env.observation_space, env.action_space, device)
models_ddpg["target_critic"] = DeterministicCritic(env.observation_space, env.action_space, device)

# Initialize the models' parameters (weights and biases) using a Gaussian distribution
for model in models_ddpg.values():
82 changes: 82 additions & 0 deletions docs/source/examples/gym/gym_cartpole_cem.py
@@ -0,0 +1,82 @@
import gym

import torch.nn as nn
import torch.nn.functional as F

# Import the skrl components to build the RL system
from skrl.models.torch import Model, CategoricalMixin
from skrl.memories.torch import RandomMemory
from skrl.agents.torch.cem import CEM, CEM_DEFAULT_CONFIG
from skrl.trainers.torch import SequentialTrainer
from skrl.envs.torch import wrap_env


# Define the model (categorical model) for the CEM agent using a mixin
# - Policy: takes as input the environment's observation/state and returns an action
class Policy(CategoricalMixin, Model):
def __init__(self, observation_space, action_space, device, unnormalized_log_prob=True):
Model.__init__(self, observation_space, action_space, device)
CategoricalMixin.__init__(self, unnormalized_log_prob)

self.linear_layer_1 = nn.Linear(self.num_observations, 64)
self.linear_layer_2 = nn.Linear(64, 64)
self.output_layer = nn.Linear(64, self.num_actions)

def compute(self, states, taken_actions, role):
x = F.relu(self.linear_layer_1(states))
x = F.relu(self.linear_layer_2(x))
return self.output_layer(x)


# Load and wrap the Gym environment.
# Note: the environment version may change depending on the gym version
try:
env = gym.make("CartPole-v0")
except gym.error.DeprecatedEnv as e:
env_id = [spec.id for spec in gym.envs.registry.all() if spec.id.startswith("CartPole-v")][0]
print("CartPole-v0 not found. Trying {}".format(env_id))
env = gym.make(env_id)
env = wrap_env(env)

device = env.device


# Instantiate a RandomMemory (without replacement) as experience replay memory
memory = RandomMemory(memory_size=1000, num_envs=env.num_envs, device=device, replacement=False)


# Instantiate the agent's model (function approximator).
# CEM requires 1 model, visit its documentation for more details
# https://skrl.readthedocs.io/en/latest/modules/skrl.agents.cem.html#spaces-and-models
models_cem = {}
models_cem["policy"] = Policy(env.observation_space, env.action_space, device)

# Initialize the models' parameters (weights and biases) using a Gaussian distribution
for model in models_cem.values():
model.init_parameters(method_name="normal_", mean=0.0, std=0.1)


# Configure and instantiate the agent.
# Only modify some of the default configuration, visit its documentation to see all the options
# https://skrl.readthedocs.io/en/latest/modules/skrl.agents.cem.html#configuration-and-hyperparameters
cfg_cem = CEM_DEFAULT_CONFIG.copy()
cfg_cem["rollouts"] = 1000
cfg_cem["learning_starts"] = 100
# log to TensorBoard every 1000 timesteps and write checkpoints every 5000 timesteps
cfg_cem["experiment"]["write_interval"] = 1000
cfg_cem["experiment"]["checkpoint_interval"] = 5000

agent_cem = CEM(models=models_cem,
memory=memory,
cfg=cfg_cem,
observation_space=env.observation_space,
action_space=env.action_space,
device=device)


# Configure and instantiate the RL trainer
cfg_trainer = {"timesteps": 100000, "headless": True}
trainer = SequentialTrainer(env=env, agents=[agent_cem], cfg=cfg_trainer)

# start training
trainer.train()