Merge pull request #176 from utiasDSL/dev-sb3

Stable-baselines3 2.0 examples
utiasDSL · Nov 26, 2023 · b1bac77 · b1bac77
2 parents c91605b + b155e60
commit b1bac77
Show file tree

Hide file tree

Showing 36 changed files with 498 additions and 1,681 deletions.
diff --git a/.gitignore b/.gitignore
@@ -4,8 +4,6 @@ examples/debug.py
 examples/test.py
 
 # NumPy saves and videos
-files/logs/*.npy
-files/videos/*.mp4
 results/
 tmp/
 

diff --git a/README.md b/README.md
@@ -4,7 +4,7 @@ This is a minimalist refactoring of the original `gym-pybullet-drones` repositor
 
 > **NOTE**: if you prefer to access the original codebase, presented at IROS in 2021, please `git checkout [paper|master]` after cloning the repo, and refer to the corresponding `README.md`'s.
 
-<img src="files/readme_images/helix.gif" alt="formation flight" width="350"> <img src="files/readme_images/helix.png" alt="control info" width="450">
+<img src="gym_pybullet_drones/assets/helix.gif" alt="formation flight" width="350"> <img src="gym_pybullet_drones/assets/helix.png" alt="control info" width="450">
 
 ## Installation
 
@@ -24,18 +24,27 @@ pip3 install -e . # if needed, `sudo apt install build-essential` to install `gc
 
 ## Use
 
-### PID position control example
+### PID control examples
 
 ```sh
 cd gym_pybullet_drones/examples/
-python3 pid.py
+python3 pid.py # position and velocity reference
+python3 pid_velocity.py # desired velocity reference
 ```
 
-### Stable-baselines3 PPO RL example
+### Downwash effect example
 
 ```sh
 cd gym_pybullet_drones/examples/
-python3 learn.py
+python3 downwash.py
+```
+
+### Stable-baselines3 PPO RL 3'-training examples
+
+```sh
+cd gym_pybullet_drones/examples/
+python learn.py # task: single drone hover at z == 1
+python learn.py --multiagent true # task: 2-drone hover at z == 1.2 and 0.7
 ```
 
 ### Betaflight SITL example (Ubuntu only)
@@ -96,13 +105,14 @@ If you wish, please cite our [IROS 2021 paper](https://arxiv.org/abs/2103.02142)
 - C. Karen Liu and Dan Negrut (2020) [*The Role of Physics-Based Simulators in Robotics*](https://www.annualreviews.org/doi/pdf/10.1146/annurev-control-072220-093055)
 - Yunlong Song, Selim Naji, Elia Kaufmann, Antonio Loquercio, and Davide Scaramuzza (2020) [*Flightmare: A Flexible Quadrotor Simulator*](https://arxiv.org/pdf/2009.00563.pdf)
 
-## TODO
+## Core Team WIP
+
+- [ ] Multi-drone `crazyflie-firmware` SITL support (@spencerteetaert, @JacopoPan)
+
+## Desired Contributions/PRs
 
-- [ ] Add `crazyflie-firmware` SITL support @spencerteetaert
-- [ ] Add motor delay @JacopoPan / @spencerteetaert
-- [ ] Replace `rpy` with quaternions (and `ang_vel` with body rates) in `obs` @JacopoPan
-- [ ] Replace `BaseSingleAgentAviary` and `BaseMultiAgentAviary` with a single `RLAviary`, incl. PR #161 @JacopoPan
-- [ ] Add a multi-agent MDP with 2-drone chase through a gate @JacopoPan
+- [ ] Add motor delay by implementing a buffer in `BaseAviary._dynamics()`
+- [ ] Replace `rpy` with quaternions (and `ang_vel` with body rates) by editing `BaseAviary._updateAndStoreKinematicInformation()`, `BaseAviary._getDroneStateVector()`, and the `.computeObs()` methods of relevant subclasses
 
 -----
 > University of Toronto's [Dynamic Systems Lab](https://github.com/utiasDSL) / [Vector Institute](https://github.com/VectorInstitute) / University of Cambridge's [Prorok Lab](https://github.com/proroklab)
diff --git a/files/logs/foo.txt b/files/logs/foo.txt
diff --git a/files/readme_images/2020.gif b/files/readme_images/2020.gif
diff --git a/files/readme_images/dep.gif b/files/readme_images/dep.gif
diff --git a/files/readme_images/downwash.gif b/files/readme_images/downwash.gif
diff --git a/files/readme_images/downwash.png b/files/readme_images/downwash.png
diff --git a/files/readme_images/rgb.gif b/files/readme_images/rgb.gif
diff --git a/files/readme_images/seg.gif b/files/readme_images/seg.gif
diff --git a/gym_pybullet_drones/__init__.py b/gym_pybullet_drones/__init__.py
@@ -12,20 +12,10 @@
 
 register(
     id='hover-aviary-v0',
-    entry_point='gym_pybullet_drones.envs.single_agent_rl:HoverAviary',
+    entry_point='gym_pybullet_drones.envs:HoverAviary',
 )
 
 register(
-    id='flythrugate-aviary-v0',
-    entry_point='gym_pybullet_drones.envs.single_agent_rl:FlyThruGateAviary',
-)
-
-register(
-    id='flock-aviary-v0',
-    entry_point='gym_pybullet_drones.envs.multi_agent_rl:FlockAviary',
-)
-
-register(
-    id='leaderfollower-aviary-v0',
-    entry_point='gym_pybullet_drones.envs.multi_agent_rl:LeaderFollowerAviary',
+    id='multihover-aviary-v0',
+    entry_point='gym_pybullet_drones.envs:MultiHoverAviary',
 )
diff --git a/gym_pybullet_drones/assets/beta.txt → ...bullet_drones/assets/beta-presets-bak.txt b/gym_pybullet_drones/assets/beta.txt → ...bullet_drones/assets/beta-presets-bak.txt
diff --git a/gym_pybullet_drones/assets/beta.csv → gym_pybullet_drones/assets/beta-traj.csv b/gym_pybullet_drones/assets/beta.csv → gym_pybullet_drones/assets/beta-traj.csv
diff --git a/gym_pybullet_drones/assets/clone_bfs.sh b/gym_pybullet_drones/assets/clone_bfs.sh
@@ -18,7 +18,7 @@ cd ../../
 mkdir betaflight_sitl/
 cd betaflight_sitl/
 
-# Step 1: Clone and open betaflight's source:
+# Step 1: Clone and open betaflight's source (at the time of writing, branch `master`, future release 4.5)):
 git clone https://github.com/betaflight/betaflight temp/
 
 
@@ -27,8 +27,6 @@ git clone https://github.com/betaflight/betaflight temp/
 # from Betaflight's `SIMULATOR_BUILD`
 cd temp/
 sed -i "s/delayMicroseconds_real(50);/\/\/delayMicroseconds_real(50);/g" ./src/main/main.c
-sed -i "s/ret = udpInit(\&stateLink, NULL, 9003, true);/\/\/ret = udpInit(\&stateLink, NULL, PORT_STATE, true);/g" ./src/main/target/SITL/sitl.c
-sed -i "s/printf(\"start UDP server.../\/\/printf(\"start UDP server.../g" ./src/main/target/SITL/sitl.c
 
 # Prepare
 make arm_sdk_install 
@@ -47,7 +45,7 @@ for ((i = 0; i < desired_max_num_drones; i++)); do
     cp -r temp/ "bf${i}/"
     cd "bf${i}/"
 
-    # Step 3: Change the UDP ports used by each Betaflight SITL instancet
+    # Step 3: Change the UDP ports used by each Betaflight SITL instance
     replacement1="PORT_PWM_RAW    90${i}1"
     sed -i "s/$pattern1/$replacement1/g" ./src/main/target/SITL/sitl.c
     replacement2="PORT_PWM    90${i}2"

diff --git a/files/videos/ffmpeg_png2mp4.sh → gym_pybullet_drones/assets/ffmpeg_png2mp4.sh b/files/videos/ffmpeg_png2mp4.sh → gym_pybullet_drones/assets/ffmpeg_png2mp4.sh
diff --git a/files/readme_images/helix.gif → gym_pybullet_drones/assets/helix.gif b/files/readme_images/helix.gif → gym_pybullet_drones/assets/helix.gif
diff --git a/files/readme_images/helix.png → gym_pybullet_drones/assets/helix.png b/files/readme_images/helix.png → gym_pybullet_drones/assets/helix.png
diff --git a/gym_pybullet_drones/envs/BaseAviary.py b/gym_pybullet_drones/envs/BaseAviary.py
@@ -61,7 +61,7 @@ def __init__(self,
         gui : bool, optional
             Whether to use PyBullet's GUI.
         record : bool, optional
-            Whether to save a video of the simulation in folder `files/videos/`.
+            Whether to save a video of the simulation.
         obstacles : bool, optional
             Whether to add obstacles to the simulation.
         user_debug_gui : bool, optional
@@ -338,7 +338,6 @@ def step(self,
                                                           ) for i in range(self.NUM_DRONES)]
         #### Save, preprocess, and clip the action to the max. RPM #
         else:
-            self._saveLastAction(action)
             clipped_action = np.reshape(self._preprocessAction(action), (self.NUM_DRONES, 4))
         #### Repeat for as many as the aggregate physics steps #####
         for _ in range(self.PYB_STEPS_PER_CTRL):
@@ -466,7 +465,6 @@ def _housekeeping(self):
         self.GUI_INPUT_TEXT = -1*np.ones(self.NUM_DRONES)
         self.USE_GUI_RPM=False
         self.last_input_switch = 0
-        self.last_action = -1*np.ones((self.NUM_DRONES, 4))
         self.last_clipped_action = np.zeros((self.NUM_DRONES, 4))
         self.gui_input = np.zeros(4)
         #### Initialize the drones kinemaatic information ##########
@@ -916,23 +914,6 @@ def _normalizedActionToRPM(self,
 
     ################################################################################
 
-    def _saveLastAction(self,
-                        action
-                        ):
-        """Stores the most recent action into attribute `self.last_action`.
-
-        The last action can be used to compute aerodynamic effects.
-
-        Parameters
-        ----------
-        action : ndarray
-            Ndarray containing the current RPMs input for each drone.
-
-        """
-        self.last_action = np.reshape(action, (self.NUM_DRONES, 4))
-
-    ################################################################################
-
     def _showDroneLocalAxes(self,
                             nth_drone
                             ):

diff --git a/...vs/multi_agent_rl/BaseMultiagentAviary.py → gym_pybullet_drones/envs/BaseRLAviary.py b/...vs/multi_agent_rl/BaseMultiagentAviary.py → gym_pybullet_drones/envs/BaseRLAviary.py
@@ -2,19 +2,20 @@
 import numpy as np
 import pybullet as p
 from gymnasium import spaces
+from collections import deque
 
 from gym_pybullet_drones.envs.BaseAviary import BaseAviary
 from gym_pybullet_drones.utils.enums import DroneModel, Physics, ActionType, ObservationType, ImageType
 from gym_pybullet_drones.control.DSLPIDControl import DSLPIDControl
 
-class BaseMultiagentAviary(BaseAviary):
-    """Base multi-agent environment class for reinforcement learning."""
+class BaseRLAviary(BaseAviary):
+    """Base single and multi-agent environment class for reinforcement learning."""
 
     ################################################################################
 
     def __init__(self,
                  drone_model: DroneModel=DroneModel.CF2X,
-                 num_drones: int=2,
+                 num_drones: int=1,
                  neighbourhood_radius: float=np.inf,
                  initial_xyzs=None,
                  initial_rpys=None,
@@ -26,7 +27,7 @@ def __init__(self,
                  obs: ObservationType=ObservationType.KIN,
                  act: ActionType=ActionType.RPM
                  ):
-        """Initialization of a generic multi-agent RL environment.
+        """Initialization of a generic single and multi-agent RL environment.
 
         Attributes `vision_attributes` and `dynamics_attributes` are selected
         based on the choice of `obs` and `act`; `obstacles` is set to True 
@@ -54,27 +55,27 @@ def __init__(self,
         gui : bool, optional
             Whether to use PyBullet's GUI.
         record : bool, optional
-            Whether to save a video of the simulation in folder `files/videos/`.
+            Whether to save a video of the simulation.
         obs : ObservationType, optional
             The type of observation space (kinematic information or vision)
         act : ActionType, optional
             The type of action space (1 or 3D; RPMS, thurst and torques, waypoint or velocity with PID control; etc.)
 
         """
-        if num_drones < 2:
-            print("[ERROR] in BaseMultiagentAviary.__init__(), num_drones should be >= 2")
-            exit()
+        #### Create a buffer for the last 10 actions ###############
+        self.ACTION_BUFFER_SIZE = 10
+        self.action_buffer = deque(maxlen=self.ACTION_BUFFER_SIZE)
+        ####
         vision_attributes = True if obs == ObservationType.RGB else False
         self.OBS_TYPE = obs
         self.ACT_TYPE = act
-        self.EPISODE_LEN_SEC = 5
         #### Create integrated controllers #########################
         if act in [ActionType.PID, ActionType.VEL, ActionType.ONE_D_PID]:
             os.environ['KMP_DUPLICATE_LIB_OK']='True'
             if drone_model in [DroneModel.CF2X, DroneModel.CF2P]:
                 self.ctrl = [DSLPIDControl(drone_model=DroneModel.CF2X) for i in range(num_drones)]
             else:
-                print("[ERROR] in BaseMultiagentAviary.__init()__, no controller is available for the specified drone_model")
+                print("[ERROR] in BaseRLAviary.__init()__, no controller is available for the specified drone_model")
         super().__init__(drone_model=drone_model,
                          num_drones=num_drones,
                          neighbourhood_radius=neighbourhood_radius,
@@ -144,10 +145,14 @@ def _actionSpace(self):
         elif self.ACT_TYPE in [ActionType.ONE_D_RPM, ActionType.ONE_D_PID]:
             size = 1
         else:
-            print("[ERROR] in BaseMultiagentAviary._actionSpace()")
+            print("[ERROR] in BaseRLAviary._actionSpace()")
             exit()
         act_lower_bound = np.array([-1*np.ones(size) for i in range(self.NUM_DRONES)])
         act_upper_bound = np.array([+1*np.ones(size) for i in range(self.NUM_DRONES)])
+        #
+        for i in range(self.ACTION_BUFFER_SIZE):
+            self.action_buffer.append(np.zeros((self.NUM_DRONES,size)))
+        #
         return spaces.Box(low=act_lower_bound, high=act_upper_bound, dtype=np.float32)
 
     ################################################################################
@@ -179,6 +184,7 @@ def _preprocessAction(self,
             commanded to the 4 motors of each drone.
 
         """
+        self.action_buffer.append(action)
         rpm = np.zeros((self.NUM_DRONES,4))
         for k in range(action.shape[0]):
             target = action[k, :]
@@ -219,16 +225,16 @@ def _preprocessAction(self,
                 rpm[k,:] = np.repeat(self.HOVER_RPM * (1+0.05*target), 4)
             elif self.ACT_TYPE == ActionType.ONE_D_PID:
                 state = self._getDroneStateVector(k)
-                rpm, _, _ = self.ctrl[k].computeControl(control_timestep=self.CTRL_TIMESTEP,
+                res, _, _ = self.ctrl[k].computeControl(control_timestep=self.CTRL_TIMESTEP,
                                                         cur_pos=state[0:3],
                                                         cur_quat=state[3:7],
                                                         cur_vel=state[10:13],
                                                         cur_ang_vel=state[13:16],
                                                         target_pos=state[0:3]+0.1*np.array([0,0,target[0]])
                                                         )
-                rpm[k,:] = rpm
+                rpm[k,:] = res
             else:
-                print("[ERROR] in BaseMultiagentAviary._preprocessAction()")
+                print("[ERROR] in BaseRLAviary._preprocessAction()")
                 exit()
         return rpm
 
@@ -244,28 +250,34 @@ def _observationSpace(self):
 
         """
         if self.OBS_TYPE == ObservationType.RGB:
-            # return spaces.Dict({i: spaces.Box(low=0,
-            #                                   high=255,
-            #                                   shape=(self.IMG_RES[1], self.IMG_RES[0], 4), dtype=np.uint8
-            #                                   ) for i in range(self.NUM_DRONES)})
             return spaces.Box(low=0,
                               high=255,
                               shape=(self.NUM_DRONES, self.IMG_RES[1], self.IMG_RES[0], 4), dtype=np.uint8)
         elif self.OBS_TYPE == ObservationType.KIN:
-            ############################################################
-            #### OBS OF SIZE 20 (WITH QUATERNION AND RPMS)
-            #### Observation vector ### X        Y        Z       Q1   Q2   Q3   Q4   R       P       Y       VX       VY       VZ       WX       WY       WZ       P0            P1            P2            P3
-            # obs_lower_bound = np.array([-1,      -1,      0,      -1,  -1,  -1,  -1,  -1,     -1,     -1,     -1,      -1,      -1,      -1,      -1,      -1,      -1,           -1,           -1,           -1])
-            # obs_upper_bound = np.array([1,       1,       1,      1,   1,   1,   1,   1,      1,      1,      1,       1,       1,       1,       1,       1,       1,            1,            1,            1])          
-            # return spaces.Box( low=obs_lower_bound, high=obs_upper_bound, dtype=np.float32 )
             ############################################################
             #### OBS SPACE OF SIZE 12
-            obs_lower_bound = np.array([[-1,-1,0, -1,-1,-1, -1,-1,-1, -1,-1,-1] for i in range(self.NUM_DRONES)])
-            obs_upper_bound = np.array([[1,1,1, 1,1,1, 1,1,1, 1,1,1] for i in range(self.NUM_DRONES)])
+            #### Observation vector ### X        Y        Z       Q1   Q2   Q3   Q4   R       P       Y       VX       VY       VZ       WX       WY       WZ
+            lo = -np.inf
+            hi = np.inf
+            obs_lower_bound = np.array([[lo,lo,0, lo,lo,lo,lo,lo,lo,lo,lo,lo] for i in range(self.NUM_DRONES)])
+            obs_upper_bound = np.array([[hi,hi,hi,hi,hi,hi,hi,hi,hi,hi,hi,hi] for i in range(self.NUM_DRONES)])
+            #### Add action buffer to observation space ################
+            act_lo = -1
+            act_hi = +1
+            for i in range(self.ACTION_BUFFER_SIZE):
+                if self.ACT_TYPE in [ActionType.RPM, ActionType.VEL]:
+                    obs_lower_bound = np.hstack([obs_lower_bound, np.array([[act_lo,act_lo,act_lo,act_lo] for i in range(self.NUM_DRONES)])])
+                    obs_upper_bound = np.hstack([obs_upper_bound, np.array([[act_hi,act_hi,act_hi,act_hi] for i in range(self.NUM_DRONES)])])
+                elif self.ACT_TYPE==ActionType.PID:
+                    obs_lower_bound = np.hstack([obs_lower_bound, np.array([[act_lo,act_lo,act_lo] for i in range(self.NUM_DRONES)])])
+                    obs_upper_bound = np.hstack([obs_upper_bound, np.array([[act_hi,act_hi,act_hi] for i in range(self.NUM_DRONES)])])
+                elif self.ACT_TYPE in [ActionType.ONE_D_RPM, ActionType.ONE_D_PID]:
+                    obs_lower_bound = np.hstack([obs_lower_bound, np.array([[act_lo] for i in range(self.NUM_DRONES)])])
+                    obs_upper_bound = np.hstack([obs_upper_bound, np.array([[act_hi] for i in range(self.NUM_DRONES)])])
             return spaces.Box(low=obs_lower_bound, high=obs_upper_bound, dtype=np.float32)
             ############################################################
         else:
-            print("[ERROR] in BaseMultiagentAviary._observationSpace()")
+            print("[ERROR] in BaseRLAviary._observationSpace()")
 
     ################################################################################
 
@@ -291,35 +303,20 @@ def _computeObs(self):
                                           path=self.ONBOARD_IMG_PATH+"drone_"+str(i),
                                           frame_num=int(self.step_counter/self.IMG_CAPTURE_FREQ)
                                           )
-            return np.array([self.rgb[i] for i in range(self.NUM_DRONES)])
+            return np.array([self.rgb[i] for i in range(self.NUM_DRONES)]).astype('float32')
         elif self.OBS_TYPE == ObservationType.KIN:
-            ############################################################
-            #### OBS OF SIZE 20 (WITH QUATERNION AND RPMS)
-            # return {   i   : self._clipAndNormalizeState(self._getDroneStateVector(i)) for i in range(self.NUM_DRONES) }
             ############################################################
             #### OBS SPACE OF SIZE 12
             obs_12 = np.zeros((self.NUM_DRONES,12))
             for i in range(self.NUM_DRONES):
-                obs = self._clipAndNormalizeState(self._getDroneStateVector(i))
+                #obs = self._clipAndNormalizeState(self._getDroneStateVector(i))
+                obs = self._getDroneStateVector(i)
                 obs_12[i, :] = np.hstack([obs[0:3], obs[7:10], obs[10:13], obs[13:16]]).reshape(12,)
-            return np.array([obs_12[i, :] for i in range(self.NUM_DRONES)])
+            ret = np.array([obs_12[i, :] for i in range(self.NUM_DRONES)]).astype('float32')
+            #### Add action buffer to observation #######################
+            for i in range(self.ACTION_BUFFER_SIZE):
+                ret = np.hstack([ret, np.array([self.action_buffer[i][j, :] for j in range(self.NUM_DRONES)])])
+            return ret
             ############################################################
         else:
-            print("[ERROR] in BaseMultiagentAviary._computeObs()")
-
-    ################################################################################
-
-    def _clipAndNormalizeState(self,
-                               state
-                               ):
-        """Normalizes a drone's state to the [-1,1] range.
-
-        Must be implemented in a subclass.
-
-        Parameters
-        ----------
-        state : ndarray
-            Array containing the non-normalized state of a single drone.
-
-        """
-        raise NotImplementedError
+            print("[ERROR] in BaseRLAviary._computeObs()")
diff --git a/gym_pybullet_drones/envs/BetaAviary.py b/gym_pybullet_drones/envs/BetaAviary.py
@@ -59,7 +59,7 @@ def __init__(self,
         gui : bool, optional
             Whether to use PyBullet's GUI.
         record : bool, optional
-            Whether to save a video of the simulation in folder `files/videos/`.
+            Whether to save a video of the simulation.
         obstacles : bool, optional
             Whether to add obstacles to the simulation.
         user_debug_gui : bool, optional

diff --git a/gym_pybullet_drones/envs/CtrlAviary.py b/gym_pybullet_drones/envs/CtrlAviary.py
@@ -47,7 +47,7 @@ def __init__(self,
         gui : bool, optional
             Whether to use PyBullet's GUI.
         record : bool, optional
-            Whether to save a video of the simulation in folder `files/videos/`.
+            Whether to save a video of the simulation.
         obstacles : bool, optional
             Whether to add obstacles to the simulation.
         user_debug_gui : bool, optional