Commit

docs: update README

Gaiejj authored Aug 19, 2023
2 parents 427fd66 + 7a1b1f9 commit d679a91
Showing 10 changed files with 47 additions and 41 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -6,6 +6,8 @@

[![Organization](https://img.shields.io/badge/Organization-PKU--Alignment-blue)](https://github.com/PKU-Alignment)
[![License](https://img.shields.io/github/license/PKU-Alignment/OmniSafe?label=license)](#license)
[![codecov](https://codecov.io/gh/PKU-Alignment/Safe-Policy-Optimization/graph/badge.svg?token=KF0UM0UNXW)](https://codecov.io/gh/PKU-Alignment/Safe-Policy-Optimization)
[![Documentation Status](https://readthedocs.org/projects/safe-policy-optimization/badge/?version=latest)](https://safe-policy-optimization.readthedocs.io/en/latest/?badge=latest)

</div>

18 changes: 8 additions & 10 deletions docs/source/algorithms/comparision.rst
@@ -1,27 +1,25 @@
Trustworthy Implementation
==========================

To ensure that the implementation is trustworthy, we have compared our
implementation with open source implementations of the same algorithms.
To ensure that SafePO's implementation is trustworthy, we have compared
our algorithms' performance with open source implementations of the same algorithms.
As some of the algorithms cannot be found in open source, we selected
``PPOLag``, ``TRPOLag``, ``CPO`` and ``FOCOPS`` for comparison.
``PPO-Lag``, ``TRPOLag``, ``CPO`` and ``FOCOPS`` for comparison.

We have compared the following algorithms:

- ``PPOLag``: `OpenAI Baselines: Safety Starter Agents <https://github.com/openai/safety-starter-agents>`_
- ``PPO-Lag``: `OpenAI Baselines: Safety Starter Agents <https://github.com/openai/safety-starter-agents>`_
- ``TRPOLag``: `OpenAI Baselines: Safety Starter Agents <https://github.com/openai/safety-starter-agents>`_, `RL Safety Algorithms <https://github.com/SvenGronauer/RL-Safety-Algorithms>`_
- ``CPO``: `OpenAI Baselines: Safety Starter Agents <https://github.com/openai/safety-starter-agents>`_, `RL Safety Algorithms <https://github.com/SvenGronauer/RL-Safety-Algorithms>`_
- ``FOCOPS``: `Original Implementation <https://github.com/ymzhang01/focops>`_

We compared those algorithms in 14 tasks from `Safety-Gymnasium <https://github.com/PKU-Alignment/safety-gymnasium>`_,
We compared those algorithms in 12 tasks from `Safety-Gymnasium <https://github.com/PKU-Alignment/safety-gymnasium>`_,
they are:

- ``SafetyPointButton1-v0``
- ``SafetyPointCircle1-v0``
- ``SafetyPointGoal1-v0``
- ``SafetyPointPush1-v0``
- ``SafetyCarButton1-v0``
- ``SafetyCarCircle1-v0``
- ``SafetyCarGoal1-v0``
- ``SafetyCarPush1-v0``
- ``SafetyAntVelocity-v1``
@@ -35,11 +33,11 @@ The results are shown as follows.

.. tab-set::

.. tab-item:: PPOLag
.. tab-item:: PPO-Lag

.. raw:: html

<iframe src="https://wandb.ai/pku_rl/SafePO/reports/Comparison-of-PPOLag-s-Implementation--Vmlldzo1MTgxOTkx" style="border:none;width:90%; height:1000px" >
<iframe src="https://wandb.ai/pku_rl/SafePO/reports/Comparison-of-PPO-Lag-s-Implementation--Vmlldzo1MTgxOTkx" style="border:none;width:90%; height:1000px" >

.. raw:: html

@@ -49,7 +47,7 @@

.. raw:: html

<iframe src="https://wandb.ai/pku_rl/SafePO/reports/Comparison-of-TRPOLag-s-Implementation--Vmlldzo1MTgyMDAz" style="border:none;width:90%; height:1000px" >
<iframe src="https://wandb.ai/pku_rl/SafePO/reports/Comparison-of-TRPO-Lag-s-Implementation--Vmlldzo1MTgyMDAz" style="border:none;width:90%; height:1000px" >

.. raw:: html

2 changes: 1 addition & 1 deletion docs/source/algorithms/curve.rst
@@ -42,7 +42,7 @@ First order

</iframe>

.. tab-item:: PPOLag
.. tab-item:: PPO-Lag

.. raw:: html

4 changes: 2 additions & 2 deletions docs/source/algorithms/first_order.rst
@@ -1,5 +1,5 @@
First Order Projection
======================
First Order Projection Methods
==============================

Experiment Results
------------------
6 changes: 3 additions & 3 deletions docs/source/algorithms/lag.rst
@@ -1,12 +1,12 @@
Lagrangian Method
=================
Lagrangian Methods
==================

Experiment Results
------------------

.. tab-set::

.. tab-item:: PPOLag
.. tab-item:: PPO-Lag

.. raw:: html

2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -66,7 +66,7 @@ results (evaluation outcomes, training curves) in ``safepo/results``.

.. toctree::
:hidden:
:caption: ALGORITHM
:caption: ALGORITHMS

algorithms/curve
algorithms/lag
4 changes: 2 additions & 2 deletions docs/source/usage/eval.rst
@@ -1,5 +1,5 @@
Evaluating Trained Model
========================
Evaluating Trained Models
=========================

Model Evaluation
----------------
4 changes: 2 additions & 2 deletions docs/source/usage/implement.rst
@@ -33,9 +33,9 @@ Briefly, the ``PPO`` in SafePO has the following characteristics, which are also
Beyond the above characteristics, the ``PPO`` in SafePO also provides a pipeline for data collection and training.
You can customize new algorithms based on it.

Next, we provide a detailed example showing how to customize the ``PPO`` algorithm into the ``PPOLag`` algorithm.
Next, we provide a detailed example showing how to customize the ``PPO`` algorithm into the ``PPO-Lag`` algorithm.

Example: PPOLag
Example: PPO-Lag
----------------

The Lagrangian multiplier is a useful tool for controlling constraint violation in Safe RL algorithms.
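
As a concrete illustration of this idea, below is a minimal sketch of a Lagrangian multiplier update. It is an assumption-laden example for exposition only, not SafePO's actual implementation; the class name, hyperparameters, and simple gradient-based rule are placeholders.

.. code-block:: python

    # Minimal sketch of a Lagrangian multiplier update (illustrative only;
    # the names and hyperparameters are assumptions, not SafePO's real API).
    import torch


    class SimpleLagrange:
        def __init__(self, cost_limit: float, init_value: float = 0.0, lr: float = 0.035):
            self.cost_limit = cost_limit
            # The multiplier is a learnable scalar adjusted in the direction of
            # the observed constraint violation.
            self.multiplier_param = torch.nn.Parameter(
                torch.tensor(init_value), requires_grad=True
            )
            self.optimizer = torch.optim.Adam([self.multiplier_param], lr=lr)

        @property
        def multiplier(self) -> float:
            # Project onto the non-negative reals when the value is read.
            return max(self.multiplier_param.item(), 0.0)

        def update(self, mean_episode_cost: float) -> None:
            # Minimizing -lambda * (J_c - d) raises lambda when the observed cost
            # exceeds the limit d and lowers it otherwise.
            loss = -self.multiplier_param * (mean_episode_cost - self.cost_limit)
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()

In PPO-Lag-style algorithms the resulting multiplier is typically folded into the policy objective, for example by replacing the reward advantage with ``(adv_r - lam * adv_c) / (1.0 + lam)`` before computing the clipped surrogate loss.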
34 changes: 20 additions & 14 deletions docs/source/usage/make.rst
@@ -1,5 +1,5 @@
Efficient Command
=================
Efficient Commands
==================

To help users quickly reproduce our results,
we provide a command line tool for easy installation, benchmarking, and evaluation.
@@ -9,6 +9,11 @@ One line benchmark running

First, create a conda environment with Python 3.8.

.. code-block:: bash

    conda create -n safepo python=3.8
    conda activate safepo

Then, run the following command to install SafePO and run the full benchmark:

.. code-block:: bash
@@ -42,19 +47,19 @@ The terminal output would be like:
.. code-block:: bash
======= commands to run:
running python macpo.py --agent-conf 2x4 --scenario Ant --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 2000 --num-envs 1
running python mappo.py --agent-conf 2x4 --scenario Ant --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 2000 --num-envs 1
running python mappolag.py --agent-conf 2x4 --scenario Ant --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 2000 --num-envs 1
running python happo.py --agent-conf 2x4 --scenario Ant --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 2000 --num-envs 1
running python macpo.py --agent-conf 2x4 --scenario Ant --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 10000000
running python mappo.py --agent-conf 2x4 --scenario Ant --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 10000000
running python mappolag.py --agent-conf 2x4 --scenario Ant --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 10000000
running python happo.py --agent-conf 2x4 --scenario Ant --seed 0 --write-terminal False --experiment benchmark --headless True --total-steps 10000000
...
running python pcpo.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 2000 --num-envs 1 --steps-per-epoch 1000
running python ppo_lag.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 2000 --num-envs 1 --steps-per-epoch 1000
running python cup.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 2000 --num-envs 1 --steps-per-epoch 1000
running python focops.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 2000 --num-envs 1 --steps-per-epoch 1000
running python rcpo.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 2000 --num-envs 1 --steps-per-epoch 1000
running python trpo_lag.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 2000 --num-envs 1 --steps-per-epoch 1000
running python cpo.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 2000 --num-envs 1 --steps-per-epoch 1000
running python cppo_pid.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 2000 --num-envs 1 --steps-per-epoch 1000
running python pcpo.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 10000000
running python ppo_lag.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 10000000
running python cup.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 10000000
running python focops.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 10000000
running python rcpo.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 10000000
running python trpo_lag.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 10000000
running python cpo.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 10000000
running python cppo_pid.py --task SafetyAntVelocity-v1 --seed 0 --write-terminal False --experiment benchmark --total-steps 10000000
...
Plotting from...
==================================================
@@ -81,3 +86,4 @@
After 1 episodes evaluation, the focops in SafetyPointGoal1-v0 evaluation reward: 12.21±2.18, cost: 26.0±19.51, the result is saved in ./results/benchmark/eval_result.txt
Start evaluating cppo_pid in SafetyPointGoal1-v0
After 1 episodes evaluation, the cppo_pid in SafetyPointGoal1-v0 evaluation reward: 13.42±0.44, cost: 18.79±2.1, the result is saved in ./results/benchmark/eval_result.txt
...
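
For reference, each script listed in the benchmark output above can also be launched on its own with the same flags. A minimal sketch (the working directory, task, seed, and experiment name below are assumptions for illustration):

.. code-block:: bash

    # Run a single PPO-Lag experiment directly, reusing flags that appear in the
    # benchmark output above (task, seed, and step budget are illustrative).
    python ppo_lag.py --task SafetyPointGoal1-v0 --seed 0 --total-steps 10000000 --experiment single_run

Judging from the evaluation output above, results for an experiment named ``benchmark`` are written under ``./results/benchmark``; a different ``--experiment`` value would presumably be saved under the corresponding directory.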
12 changes: 6 additions & 6 deletions safepo/common/env.py
@@ -81,7 +81,7 @@ def create_env() -> Callable:

def make_sa_isaac_env(args, cfg, sim_params):
"""
Creates and returns a VecTaskPython environment for the single agent Shadow Hand task.
Creates and returns a VecTaskPython environment for the single agent Isaac Gym task.
Args:
args: Command-line arguments.
@@ -90,10 +90,10 @@ def make_sa_isaac_env(args, cfg, sim_params):
sim_params: Parameters for the simulation.
Returns:
env: VecTaskPython environment for the single agent Shadow Hand task.
env: VecTaskPython environment for the single agent Isaac Gym task.
Warning:
SafePO's single agent Shadow Hand task is not ready for use yet.
SafePO's single agent Isaac Gym task is not ready for use yet.
"""
# create native task and pass custom config
device_id = args.device_id
@@ -119,7 +119,7 @@ def make_sa_isaac_env(args, cfg, sim_params):

def make_ma_mujoco_env(scenario, agent_conf, seed, cfg_train):
"""
Creates and returns a multi-agent environment using Mujoco scenarios.
Creates and returns a multi-agent environment using MuJoCo scenarios.
Args:
args: Command-line arguments.
@@ -152,7 +152,7 @@ def init_env():

def make_ma_isaac_env(args, cfg, cfg_train, sim_params, agent_index):
"""
Creates and returns a multi-agent environment for the Shadow Hand task.
Creates and returns a multi-agent environment for the Isaac Gym task.
Args:
args: Command-line arguments.
@@ -162,7 +162,7 @@ def make_ma_isaac_env(args, cfg, cfg_train, sim_params, agent_index):
agent_index: Index of the agent within the multi-agent environment.
Returns:
env: A multi-agent environment for the Shadow Hand task.
env: A multi-agent environment for the Isaac Gym task.
"""
# create native task and pass custom config
device_id = args.device_id
