feat: update benchmark config
Gaiejj authored Aug 19, 2023
2 parents 669e688 + 5ec6dc7 commit 40b8052
Showing 6 changed files with 37 additions and 7 deletions.
6 changes: 3 additions & 3 deletions docs/source/algorithms/comparision.rst
@@ -4,12 +4,12 @@ Trustworthy Implementation
To ensure that SafePO's implementation is trustworthy, we have compared
our algorithms' performance with open source implementations of the same algorithms.
As some of the algorithms can not be found in open source, we selected
-``PPO-Lag``, ``TRPOLag``, ``CPO`` and ``FOCOPS`` for comparison.
+``PPO-Lag``, ``TRPO-Lag``, ``CPO`` and ``FOCOPS`` for comparison.

We have compared the following algorithms:

- ``PPO-Lag``: `OpenAI Baselines: Safety Starter Agents <https://github.com/openai/safety-starter-agents>`_
-- ``TRPOLag``: `OpenAI Baselines: Safety Starter Agents <https://github.com/openai/safety-starter-agents>`_, `RL Safety Algorithms <https://github.com/SvenGronauer/RL-Safety-Algorithms>`_
+- ``TRPO-Lag``: `OpenAI Baselines: Safety Starter Agents <https://github.com/openai/safety-starter-agents>`_, `RL Safety Algorithms <https://github.com/SvenGronauer/RL-Safety-Algorithms>`_
- ``CPO``: `OpenAI Baselines: Safety Starter Agents <https://github.com/openai/safety-starter-agents>`_, `RL Safety Algorithms <https://github.com/SvenGronauer/RL-Safety-Algorithms>`_
- ``FOCOPS``: `Original Implementation <https://github.com/ymzhang01/focops>`_

@@ -43,7 +43,7 @@ The results are shown as follows.

</iframe>

-.. tab-item:: TRPOLag
+.. tab-item:: TRPO-Lag

.. raw:: html

2 changes: 1 addition & 1 deletion docs/source/algorithms/curve.rst
@@ -87,7 +87,7 @@ Second order

</iframe>

-.. tab-item:: TRPOLag
+.. tab-item:: TRPO-Lag

.. raw:: html

2 changes: 1 addition & 1 deletion docs/source/algorithms/lag.rst
@@ -16,7 +16,7 @@ Experiment Results

</iframe>

-.. tab-item:: TRPOLag
+.. tab-item:: TRPO-Lag

.. raw:: html

15 changes: 15 additions & 0 deletions docs/source/usage/benchmark.rst
@@ -22,6 +22,21 @@ figures in the paper. You can also run the multi-agent benchmarking tools by run
After running the benchmarking tools, you can run the `plooting tools and evaluation tools <./eval.html>`_ to
show the results.

+.. note::
+
+    The ``Doggo`` agent is not included in the benchmarking tools because it needs 1e8 training steps to converge.
+    You can run the ``Doggo`` agent by running:
+
+    .. code-block:: bash
+
+        cd safepo/single_agent
+        python benchmark.py --tasks \
+        SafetyDoggoButton1-v0 SafetyDoggoButton2-v0 \
+        SafetyDoggoCircle1-v0 SafetyDoggoCircle2-v0 \
+        SafetyDoggoPush1-v0 SafetyDoggoPush2-v0 \
+        SafetyDoggoGoal1-v0 SafetyDoggoGoal2-v0 \
+        --workers 1 --total-steps 100000000
+
.. warning::

The default number of workers is 1. To run the benchmarking tools in parallel, you can increase the number of workers
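Reviewer note: the eight task IDs in the ``Doggo`` command added to ``benchmark.rst`` follow a regular robot/task/level pattern, and ``--total-steps 100000000`` is the 1e8 quoted in the note. The following is a minimal sketch of building that argument list programmatically. The helper names ``doggo_task_ids`` and ``doggo_command`` are hypothetical and not part of SafePO; only the ``--tasks``, ``--workers`` and ``--total-steps`` flags and the task IDs are taken from the diff.

```python
import shlex

# Assumption: task IDs follow the Safety<Robot><Task><Level>-v0 pattern
# seen in the documented command. These names are illustrative only.
DOGGO_TASKS = ["Button", "Circle", "Push", "Goal"]
LEVELS = ["1", "2"]
DOGGO_TOTAL_STEPS = 10**8  # the 1e8 training steps quoted in the note


def doggo_task_ids():
    """Return the eight Doggo task IDs in the order used in the docs."""
    return [
        f"SafetyDoggo{task}{level}-v0"
        for task in DOGGO_TASKS
        for level in LEVELS
    ]


def doggo_command(workers=1):
    """Build the benchmark.py argument list for the Doggo tasks."""
    return [
        "python", "benchmark.py",
        "--tasks", *doggo_task_ids(),
        "--workers", str(workers),
        "--total-steps", str(DOGGO_TOTAL_STEPS),
    ]


if __name__ == "__main__":
    # Print a shell-safe version of the command for copy and paste.
    print(shlex.join(doggo_command()))
```

Writing the step count as ``10**8`` keeps it an integer, which matters because the diff declares ``--total-steps`` with ``type=int``.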
15 changes: 15 additions & 0 deletions docs/source/usage/make.rst
@@ -42,6 +42,21 @@ The training logs would be saved in ``safepo/runs/benchmark``, while the evaluat
The default number of workers is 1. To run the benchmarking tools in parallel, you can increase the number of workers
by changing the `workers` configuration in `safepo/single_agent/benchmark.py` and `safepo/multi_agent/benchmark.py`.

+.. note::
+
+    The ``Doggo`` agent is not included in the benchmarking tools because it needs 1e8 training steps to converge.
+    You can run the ``Doggo`` agent by running:
+
+    .. code-block:: bash
+
+        cd safepo/single_agent
+        python benchmark.py --tasks \
+        SafetyDoggoButton1-v0 SafetyDoggoButton2-v0 \
+        SafetyDoggoCircle1-v0 SafetyDoggoCircle2-v0 \
+        SafetyDoggoPush1-v0 SafetyDoggoPush2-v0 \
+        SafetyDoggoGoal1-v0 SafetyDoggoGoal2-v0 \
+        --workers 1 --total-steps 100000000
+
The terminal output would be like:

.. code-block:: bash
4 changes: 2 additions & 2 deletions safepo/single_agent/benchmark.py
@@ -2,7 +2,7 @@
import shlex
import subprocess

-navi_robots = ['Car', 'Point', 'Racecar']
+navi_robots = ['Car', 'Point', 'Racecar', 'Ant']
navi_tasks = ['Button', 'Circle', 'Goal', 'Push']
diffculies = ['1', '2']
vel_robots = ['Ant', 'HalfCheetah', 'Hopper', 'Walker2d', 'Swimmer', 'Humanoid']
@@ -57,7 +57,7 @@ def parse_args():
"--experiment", type=str, default="benchmark", help="name of the experiment"
)
parser.add_argument(
-"--total-steps", type=int, default=1000000, help="total number of steps"
+"--total-steps", type=int, default=10000000, help="total number of steps"
)
parser.add_argument(
"--num-envs", type=int, default=10, help="number of environments to run in parallel"
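Reviewer note: adding ``'Ant'`` to ``navi_robots`` enlarges the default navigation sweep. The following is a rough sketch of the effect, assuming the script forms one task per robot/task/difficulty combination and names it ``Safety<Robot><Task><Level>-v0``. The diff does not show how the lists are combined, so both the combination rule and the helper ``navigation_task_ids`` are assumptions made for illustration.

```python
from itertools import product

# Lists copied from the benchmark.py hunk; the difficulty list is
# spelled `diffculies` in the source and renamed here for readability.
navi_robots_before = ["Car", "Point", "Racecar"]
navi_robots_after = ["Car", "Point", "Racecar", "Ant"]
navi_tasks = ["Button", "Circle", "Goal", "Push"]
difficulties = ["1", "2"]


def navigation_task_ids(robots):
    """Enumerate task IDs under the assumed Safety<Robot><Task><Level>-v0 pattern."""
    return [
        f"Safety{robot}{task}{level}-v0"
        for robot, task, level in product(robots, navi_tasks, difficulties)
    ]


before = navigation_task_ids(navi_robots_before)
after = navigation_task_ids(navi_robots_after)
print(len(before), len(after))  # 24 32
```

Under that assumption the sweep grows from 24 to 32 navigation tasks. The same commit also raises the default ``--total-steps`` from 1000000 to 10000000, so each run is ten times longer unless the flag is overridden.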
