
Commit

Merge pull request #33 from Toni-SM/develop
Develop
Toni-SM authored Oct 3, 2022
2 parents 1e10737 + 77029f8 commit 840e36f
Showing 149 changed files with 9,137 additions and 2,768 deletions.
32 changes: 32 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,38 @@

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [0.8.0] - 2022-10-03
### Added
- AMP agent for physics-based character animation
- Manual trainer
- Gaussian model mixin
- Support for creating shared models
- Parameter `role` to model methods
- Wrapper compatibility with the new OpenAI Gym environment API (by @JohannLange)
- Internal library colored logger
- Migrate checkpoints/models from other RL libraries to skrl models/agents
- Configuration parameter `store_separately` to agent configuration dict
- Save/load agent modules (models, optimizers, preprocessors)
- Set random seed and configure deterministic behavior for reproducibility (see the sketch after this list)
- Benchmark results for Isaac Gym and Omniverse Isaac Gym on the GitHub discussion page
- Franka Emika real-world example
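
A minimal sketch of the reproducibility utility added in this release (assuming it is exposed as `skrl.utils.set_seed`; the seed value is illustrative):

```python
from skrl.utils import set_seed

# seed the random number generators (Python, NumPy and PyTorch) for reproducible runs
set_seed(42)
```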

### Changed
- Models implementation as Python mixin [**breaking change**] (see the sketch after this list)
- Multivariate Gaussian model (`GaussianModel` until 0.7.0) to `MultivariateGaussianMixin`
- Trainer's `cfg` parameter position and default values
- Show training/evaluation display progress using `tqdm` (by @JohannLange)
- Update Isaac Gym and Omniverse Isaac Gym examples
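
Under the mixin-based API, a model is defined by combining `Model` with a mixin such as `GaussianMixin`, as the updated examples in this commit show. A minimal sketch (the class name and layer sizes are illustrative):

```python
import torch
import torch.nn as nn

from skrl.models.torch import Model, GaussianMixin


class Policy(GaussianMixin, Model):
    def __init__(self, observation_space, action_space, device, clip_actions=False,
                 clip_log_std=True, min_log_std=-20, max_log_std=2):
        # initialize both bases explicitly (replaces subclassing GaussianModel from <= 0.7.0)
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std)

        self.net = nn.Sequential(nn.Linear(self.num_observations, 64),
                                 nn.ReLU(),
                                 nn.Linear(64, self.num_actions))
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def compute(self, states, taken_actions, role):
        # the new `role` parameter identifies the caller (e.g. "policy"), enabling shared models
        return self.net(states), self.log_std_parameter
```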

### Fixed
- Missing recursive arguments during model weights initialization
- Tensor dimension when computing preprocessor parallel variance
- Models' clip tensors dtype to `float32`

### Removed
- Parameter `inference` from model methods
- Configuration parameter `checkpoint_policy_only` from agent configuration dict

## [0.7.0] - 2022-07-11
### Added
- A2C agent
27 changes: 23 additions & 4 deletions CONTRIBUTING.md
@@ -3,13 +3,15 @@ First of all, **thank you**... For what? Because you are dedicating some time to

<hr>

### I don't want to contribute (for now), I just want to ask a question!
### I just want to ask a question!

If you have a question, please do not open an issue. Instead, use the following resources (you will get a faster response):

- [skrl's GitHub discussions](https://github.com/Toni-SM/skrl/discussions), a place to ask questions and discuss about the project

- [Isaac Gym's forum](https://forums.developer.nvidia.com/c/agx-autonomous-machines/isaac/isaac-gym/322), , a place to post your questions, find past answers, or just chat with other members of the community about Isaac Gym topics
- [Isaac Gym's forum](https://forums.developer.nvidia.com/c/agx-autonomous-machines/isaac/isaac-gym/322), a place to post your questions, find past answers, or just chat with other members of the community about Isaac Gym topics

- [Omniverse Isaac Sim's forum](https://forums.developer.nvidia.com/c/agx-autonomous-machines/isaac/simulation/69), a place to post your questions, find past answers, or just chat with other members of the community about Omniverse Isaac Sim/Gym topics

### I have found a (good) bug. What can I do?

@@ -21,10 +23,16 @@ Open an issue on [skrl's GitHub issues](https://github.com/Toni-SM/skrl/issues)
- A link to the source code of the library that you are using (some problems may be due to the use of older versions. If possible, always use the latest version)
- Any other information that you think may be useful or help to reproduce/describe the problem

Note: Changes that are cosmetic in nature (code formatting, removing whitespace, etc.) or that correct grammatical, spelling or typo errors, and that do not add anything substantial to the functionality of the library will generally not be accepted as a pull request

### I want to contribute, but I don't know how

There is a [board](https://github.com/users/Toni-SM/projects/2/views/8) containing relevant future implementations, which can be a good starting place to identify contributions. Please consider the following points:

#### Notes about contributing

- Try to **communicate your change first** to [discuss](https://github.com/Toni-SM/skrl/discussions) the implementation if you want to add a new feature or change an existing one
- Modify only the minimum amount of code required and the files needed to make the change
- Changes that are cosmetic in nature (code formatting, removing whitespace, etc.) or that correct grammatical, spelling or typo errors, and that do not add anything substantial to the functionality of the library will generally not be accepted as a pull request

#### Coding conventions

**skrl** is designed with a focus on modularity, readability, simplicity and transparency of algorithm implementation. The file system structure groups components according to their functionality. Library components only inherit (and must inherit) from a single base class (no multilevel or multiple inheritance) that provides a uniform interface and implements common functionality that is not tied to the implementation details of the algorithms
@@ -39,6 +47,17 @@ Read the code a little bit and you will understand it at first glance... Also
- Capitalize (the first letter) and omit any trailing punctuation
- Write it in the imperative tense
- Aim for about 50 (or 72) characters
- Add import statements at the top of each module as follows:

```
function annotation (e.g. typing)
# insert an empty line
python libraries and other libraries (e.g. gym, numpy, time, etc.)
# insert an empty line
machine learning framework modules (e.g. torch, torch.nn)
# insert an empty line
skrl components
```
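
For example, a module following this convention might begin as follows (an illustrative sketch; the specific modules are not a requirement):

```python
from typing import Optional, Tuple

import gym
import numpy as np

import torch
import torch.nn as nn

from skrl.models.torch import Model, DeterministicMixin
```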

<hr>

6 changes: 3 additions & 3 deletions README.md
@@ -1,10 +1,10 @@
<p align="center">
<img width="300rem" src="docs/source/_static/data/skrl-up-transparent.png">
<img width="300rem" src="https://raw.githubusercontent.com/Toni-SM/skrl/main/docs/source/_static/data/skrl-up-transparent.png">
</p>
<h2 align="center" style="border-bottom: 0 !important;">SKRL - Reinforcement Learning library</h2>
<br>

**skrl** is an open-source modular library for Reinforcement Learning written in Python (using [PyTorch](https://pytorch.org/)) and designed with a focus on readability, simplicity, and transparency of algorithm implementation. In addition to supporting the [OpenAI Gym](https://www.gymlibrary.ml) and [DeepMind](https://github.com/deepmind/dm_env) environment interfaces, it allows loading and configuring [NVIDIA Isaac Gym](https://developer.nvidia.com/isaac-gym/) and [NVIDIA Omniverse Isaac Gym](https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/tutorial_gym_isaac_gym.html) environments, enabling agents' simultaneous training by scopes (subsets of environments among all available environments), which may or may not share resources, in the same run
**skrl** is an open-source modular library for Reinforcement Learning written in Python (using [PyTorch](https://pytorch.org/)) and designed with a focus on readability, simplicity, and transparency of algorithm implementation. In addition to supporting the [OpenAI Gym](https://www.gymlibrary.dev) and [DeepMind](https://github.com/deepmind/dm_env) environment interfaces, it allows loading and configuring [NVIDIA Isaac Gym](https://developer.nvidia.com/isaac-gym/) and [NVIDIA Omniverse Isaac Gym](https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/tutorial_gym_isaac_gym.html) environments, enabling agents' simultaneous training by scopes (subsets of environments among all available environments), which may or may not share resources, in the same run
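
For instance, any supported environment can be wrapped through a single entry point (a minimal sketch; the environment id is illustrative):

```python
import gym

from skrl.envs.torch import wrap_env

# wrap a Gym environment so it exposes the library's uniform interface
env = wrap_env(gym.make("Pendulum-v1"))

print(env.num_envs, env.device, env.observation_space, env.action_space)
```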

<br>

@@ -14,7 +14,7 @@ https://skrl.readthedocs.io/en/latest/

<br>

> **Note:** This project is under **active continuous development**. Please make sure you always have the latest version
> **Note:** This project is under **active continuous development**. Please make sure you always have the latest version. Visit the [develop](https://github.com/Toni-SM/skrl/tree/develop) branch or its [documentation](https://skrl.readthedocs.io/en/develop) to access the latest updates to be released.
<br>

2 changes: 2 additions & 0 deletions docs/requirements.txt
@@ -5,3 +5,5 @@ sphinx-tabs==3.2.0
gym
torch
tensorboard
tqdm
packaging
1 change: 1 addition & 0 deletions docs/source/_static/imgs/manual_trainer.svg
2 changes: 1 addition & 1 deletion docs/source/_static/imgs/model_gaussian.svg
100644 → 100755
1 change: 1 addition & 0 deletions docs/source/_static/imgs/model_multivariate_gaussian.svg
2 changes: 1 addition & 1 deletion docs/source/_static/imgs/rl_schema.svg
32 changes: 17 additions & 15 deletions docs/source/examples/deepmind/dm_manipulation_stack_sac.py
@@ -4,21 +4,21 @@
import torch.nn as nn

# Import the skrl components to build the RL system
from skrl.models.torch import GaussianModel, DeterministicModel
from skrl.models.torch import Model, GaussianMixin, DeterministicMixin
from skrl.memories.torch import RandomMemory
from skrl.agents.torch.sac import SAC, SAC_DEFAULT_CONFIG
from skrl.trainers.torch import SequentialTrainer
from skrl.envs.torch import wrap_env


# Define the models (stochastic and deterministic models) for the SAC agent using the helper classes
# Define the models (stochastic and deterministic models) for the SAC agent using the mixins.
# - StochasticActor (policy): takes as input the environment's observation/state and returns an action
# - Critic: takes the state and action as input and provides a value to guide the policy
class StochasticActor(GaussianModel):
class StochasticActor(GaussianMixin, Model):
def __init__(self, observation_space, action_space, device, clip_actions=False,
clip_log_std=True, min_log_std=-20, max_log_std=2):
super().__init__(observation_space, action_space, device, clip_actions,
clip_log_std, min_log_std, max_log_std)
Model.__init__(self, observation_space, action_space, device)
GaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std)

self.features_extractor = nn.Sequential(nn.Conv2d(3, 32, kernel_size=8, stride=3),
nn.ReLU(),
@@ -40,7 +40,7 @@ def __init__(self, observation_space, action_space, device, clip_actions=False,

self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

def compute(self, states, taken_actions):
def compute(self, states, taken_actions, role):
# The dm_control.manipulation tasks have as observation/state spec a `collections.OrderedDict` object as follows:
# OrderedDict([('front_close', BoundedArray(shape=(1, 84, 84, 3), dtype=dtype('uint8'), name='front_close', minimum=0, maximum=255)),
# ('jaco_arm/joints_pos', Array(shape=(1, 6, 2), dtype=dtype('float64'), name='jaco_arm/joints_pos')),
@@ -83,9 +83,10 @@ def compute(self, states, taken_actions):
input["jaco_arm/joints_pos"].view(states.shape[0], -1),
input["jaco_arm/joints_vel"].view(states.shape[0], -1)], dim=-1))), self.log_std_parameter

class Critic(DeterministicModel):
def __init__(self, observation_space, action_space, device, clip_actions = False):
super().__init__(observation_space, action_space, device, clip_actions)
class Critic(DeterministicMixin, Model):
def __init__(self, observation_space, action_space, device, clip_actions=False):
Model.__init__(self, observation_space, action_space, device)
DeterministicMixin.__init__(self, clip_actions)

self.features_extractor = nn.Sequential(nn.Conv2d(3, 32, kernel_size=8, stride=3),
nn.ReLU(),
@@ -105,7 +106,7 @@ def __init__(self, observation_space, action_space, device, clip_actions = False
nn.ReLU(),
nn.Linear(32, 1))

def compute(self, states, taken_actions):
def compute(self, states, taken_actions, role):
# map the observations/states to the original space.
# See the explanation above (StochasticActor.compute)
input = self.tensor_to_space(states, self.observation_space)
@@ -133,11 +134,12 @@ def compute(self, states, taken_actions):
# Instantiate the agent's models (function approximators).
# SAC requires 5 models, visit its documentation for more details
# https://skrl.readthedocs.io/en/latest/modules/skrl.agents.sac.html#spaces-and-models
models_sac = {"policy": StochasticActor(env.observation_space, env.action_space, device, clip_actions=True),
"critic_1": Critic(env.observation_space, env.action_space, device),
"critic_2": Critic(env.observation_space, env.action_space, device),
"target_critic_1": Critic(env.observation_space, env.action_space, device),
"target_critic_2": Critic(env.observation_space, env.action_space, device)}
models_sac = {}
models_sac["policy"] = StochasticActor(env.observation_space, env.action_space, device, clip_actions=True)
models_sac["critic_1"] = Critic(env.observation_space, env.action_space, device)
models_sac["critic_2"] = Critic(env.observation_space, env.action_space, device)
models_sac["target_critic_1"] = Critic(env.observation_space, env.action_space, device)
models_sac["target_critic_2"] = Critic(env.observation_space, env.action_space, device)

# Initialize the models' parameters (weights and biases) using a Gaussian distribution
for model in models_sac.values():
33 changes: 18 additions & 15 deletions docs/source/examples/deepmind/dm_suite_cartpole_swingup_ddpg.py
@@ -5,42 +5,44 @@
import torch.nn.functional as F

# Import the skrl components to build the RL system
from skrl.models.torch import DeterministicModel
from skrl.models.torch import Model, DeterministicMixin
from skrl.memories.torch import RandomMemory
from skrl.agents.torch.ddpg import DDPG, DDPG_DEFAULT_CONFIG
from skrl.resources.noises.torch import OrnsteinUhlenbeckNoise
from skrl.trainers.torch import SequentialTrainer
from skrl.envs.torch import wrap_env


# Define the models (deterministic models) for the DDPG agent using a helper class
# and programming with two approaches (layer by layer and torch.nn.Sequential class).
# Define the models (deterministic models) for the DDPG agent using mixins
# and programming with two approaches (torch functional and torch.nn.Sequential class).
# - Actor (policy): takes as input the environment's observation/state and returns an action
# - Critic: takes the state and action as input and provides a value to guide the policy
class DeterministicActor(DeterministicModel):
def __init__(self, observation_space, action_space, device, clip_actions = False):
super().__init__(observation_space, action_space, device, clip_actions)
class DeterministicActor(DeterministicMixin, Model):
def __init__(self, observation_space, action_space, device, clip_actions=False):
Model.__init__(self, observation_space, action_space, device)
DeterministicMixin.__init__(self, clip_actions)

self.linear_layer_1 = nn.Linear(self.num_observations, 400)
self.linear_layer_2 = nn.Linear(400, 300)
self.action_layer = nn.Linear(300, self.num_actions)

def compute(self, states, taken_actions):
def compute(self, states, taken_actions, role):
x = F.relu(self.linear_layer_1(states))
x = F.relu(self.linear_layer_2(x))
return torch.tanh(self.action_layer(x))

class DeterministicCritic(DeterministicModel):
def __init__(self, observation_space, action_space, device, clip_actions = False):
super().__init__(observation_space, action_space, device, clip_actions)
class DeterministicCritic(DeterministicMixin, Model):
def __init__(self, observation_space, action_space, device, clip_actions=False):
Model.__init__(self, observation_space, action_space, device)
DeterministicMixin.__init__(self, clip_actions)

self.net = nn.Sequential(nn.Linear(self.num_observations + self.num_actions, 400),
nn.ReLU(),
nn.Linear(400, 300),
nn.ReLU(),
nn.Linear(300, 1))

def compute(self, states, taken_actions):
def compute(self, states, taken_actions, role):
return self.net(torch.cat([states, taken_actions], dim=1))


Expand All @@ -58,10 +60,11 @@ def compute(self, states, taken_actions):
# Instantiate the agent's models (function approximators).
# DDPG requires 4 models, visit its documentation for more details
# https://skrl.readthedocs.io/en/latest/modules/skrl.agents.ddpg.html#spaces-and-models
models_ddpg = {"policy": DeterministicActor(env.observation_space, env.action_space, device, clip_actions=True),
"target_policy": DeterministicActor(env.observation_space, env.action_space, device, clip_actions=True),
"critic": DeterministicCritic(env.observation_space, env.action_space, device),
"target_critic": DeterministicCritic(env.observation_space, env.action_space, device)}
models_ddpg = {}
models_ddpg["policy"] = DeterministicActor(env.observation_space, env.action_space, device, clip_actions=True)
models_ddpg["target_policy"] = DeterministicActor(env.observation_space, env.action_space, device, clip_actions=True)
models_ddpg["critic"] = DeterministicCritic(env.observation_space, env.action_space, device)
models_ddpg["target_critic"] = DeterministicCritic(env.observation_space, env.action_space, device)

# Initialize the models' parameters (weights and biases) using a Gaussian distribution
for model in models_ddpg.values():
82 changes: 82 additions & 0 deletions docs/source/examples/gym/gym_cartpole_cem.py
@@ -0,0 +1,82 @@
import gym

import torch.nn as nn
import torch.nn.functional as F

# Import the skrl components to build the RL system
from skrl.models.torch import Model, CategoricalMixin
from skrl.memories.torch import RandomMemory
from skrl.agents.torch.cem import CEM, CEM_DEFAULT_CONFIG
from skrl.trainers.torch import SequentialTrainer
from skrl.envs.torch import wrap_env


# Define the model (categorical model) for the CEM agent using a mixin
# - Policy: takes as input the environment's observation/state and returns an action
class Policy(CategoricalMixin, Model):
def __init__(self, observation_space, action_space, device, unnormalized_log_prob=True):
Model.__init__(self, observation_space, action_space, device)
CategoricalMixin.__init__(self, unnormalized_log_prob)

self.linear_layer_1 = nn.Linear(self.num_observations, 64)
self.linear_layer_2 = nn.Linear(64, 64)
self.output_layer = nn.Linear(64, self.num_actions)

def compute(self, states, taken_actions, role):
x = F.relu(self.linear_layer_1(states))
x = F.relu(self.linear_layer_2(x))
return self.output_layer(x)


# Load and wrap the Gym environment.
# Note: the environment version may change depending on the gym version
try:
env = gym.make("CartPole-v0")
except gym.error.DeprecatedEnv as e:
env_id = [spec.id for spec in gym.envs.registry.all() if spec.id.startswith("CartPole-v")][0]
print("CartPole-v0 not found. Trying {}".format(env_id))
env = gym.make(env_id)
env = wrap_env(env)

device = env.device


# Instantiate a RandomMemory (without replacement) as experience replay memory
memory = RandomMemory(memory_size=1000, num_envs=env.num_envs, device=device, replacement=False)


# Instantiate the agent's model (function approximator).
# CEM requires 1 model, visit its documentation for more details
# https://skrl.readthedocs.io/en/latest/modules/skrl.agents.cem.html#spaces-and-models
models_cem = {}
models_cem["policy"] = Policy(env.observation_space, env.action_space, device)

# Initialize the models' parameters (weights and biases) using a Gaussian distribution
for model in models_cem.values():
model.init_parameters(method_name="normal_", mean=0.0, std=0.1)


# Configure and instantiate the agent.
# Only modify some of the default configuration, visit its documentation to see all the options
# https://skrl.readthedocs.io/en/latest/modules/skrl.agents.cem.html#configuration-and-hyperparameters
cfg_cem = CEM_DEFAULT_CONFIG.copy()
cfg_cem["rollouts"] = 1000
cfg_cem["learning_starts"] = 100
# log to TensorBoard every 1000 timesteps and write checkpoints every 5000 timesteps
cfg_cem["experiment"]["write_interval"] = 1000
cfg_cem["experiment"]["checkpoint_interval"] = 5000

agent_cem = CEM(models=models_cem,
memory=memory,
cfg=cfg_cem,
observation_space=env.observation_space,
action_space=env.action_space,
device=device)


# Configure and instantiate the RL trainer
cfg_trainer = {"timesteps": 100000, "headless": True}
trainer = SequentialTrainer(env=env, agents=[agent_cem], cfg=cfg_trainer)

# start training
trainer.train()