feat: update benchmark config

PKU-Alignment · Aug 19, 2023 · 40b8052
2 parents 669e688 + 5ec6dc7

Showing 6 changed files with 37 additions and 7 deletions.
diff --git a/docs/source/algorithms/comparision.rst b/docs/source/algorithms/comparision.rst
@@ -4,12 +4,12 @@ Trustworthy Implementation
 To ensure that SafePO's implementation is trustworthy, we have compared 
 our algorithms' performance with open source implementations of the same algorithms.
 As some of the algorithms can not be found in open source, we selected
-``PPO-Lag``, ``TRPOLag``, ``CPO`` and ``FOCOPS`` for comparison. 
+``PPO-Lag``, ``TRPO-Lag``, ``CPO`` and ``FOCOPS`` for comparison. 
 
 We have compared the following algorithms:
 
 - ``PPO-Lag``: `OpenAI Baselines: Safety Starter Agents <https://github.com/openai/safety-starter-agents>`_
-- ``TRPOLag``: `OpenAI Baselines: Safety Starter Agents <https://github.com/openai/safety-starter-agents>`_, `RL Safety Algorithms <https://github.com/SvenGronauer/RL-Safety-Algorithms>`_
+- ``TRPO-Lag``: `OpenAI Baselines: Safety Starter Agents <https://github.com/openai/safety-starter-agents>`_, `RL Safety Algorithms <https://github.com/SvenGronauer/RL-Safety-Algorithms>`_
 - ``CPO``: `OpenAI Baselines: Safety Starter Agents <https://github.com/openai/safety-starter-agents>`_, `RL Safety Algorithms <https://github.com/SvenGronauer/RL-Safety-Algorithms>`_
 - ``FOCOPS``: `Original Implementation <https://github.com/ymzhang01/focops>`_
 
@@ -43,7 +43,7 @@ The results are shown as follows.
 
          </iframe>
 
-    .. tab-item:: TRPOLag
+    .. tab-item:: TRPO-Lag
 
       .. raw:: html
 

diff --git a/docs/source/algorithms/curve.rst b/docs/source/algorithms/curve.rst
@@ -87,7 +87,7 @@ Second order
 
          </iframe>
 
-    .. tab-item:: TRPOLag
+    .. tab-item:: TRPO-Lag
 
       .. raw:: html
 

diff --git a/docs/source/algorithms/lag.rst b/docs/source/algorithms/lag.rst
@@ -16,7 +16,7 @@ Experiment Results
 
          </iframe>
 
-    .. tab-item:: TRPOLag
+    .. tab-item:: TRPO-Lag
 
       .. raw:: html
 

diff --git a/docs/source/usage/benchmark.rst b/docs/source/usage/benchmark.rst
@@ -22,6 +22,21 @@ figures in the paper. You can also run the multi-agent benchmarking tools by run
 After running the benchmarking tools, you can run the `plooting tools and evaluation tools <./eval.html>`_  to
 show the results. 
 
+.. note::
+
+    The ``Doggo`` agent is not included in the benchmarking tools because it needs 1e8 training steps to converge.
+    You can run the ``Doggo`` agent by running:
+
+    .. code-block:: bash
+
+        cd safepo/single_agent
+        python benchmark.py --tasks \
+        SafetyDoggoButton1-v0 SafetyDoggoButton2-v0 \
+        SafetyDoggoCircle1-v0 SafetyDoggoCircle2-v0 \
+        SafetyDoggoPush1-v0 SafetyDoggoPush2-v0 \
+        SafetyDoggoGoal1-v0 SafetyDoggoGoal2-v0 \
+        --workers 1 --total-steps 100000000
+
 .. warning::
 
     The default number of workers is 1. To run the benchmarking tools in parallel, you can increase the number of workers

diff --git a/docs/source/usage/make.rst b/docs/source/usage/make.rst
@@ -42,6 +42,21 @@ The training logs would be saved in ``safepo/runs/benchmark``, while the evaluat
     The default number of workers is 1. To run the benchmarking tools in parallel, you can increase the number of workers
     by changing the `workers` configuration in `safepo/single_agent/benchmark.py` and `safepo/multi_agent/benchmark.py`.
 
+.. note::
+
+    The ``Doggo`` agent is not included in the benchmarking tools because it needs 1e8 training steps to converge.
+    You can run the ``Doggo`` agent by running:
+
+    .. code-block:: bash
+
+        cd safepo/single_agent
+        python benchmark.py --tasks \
+        SafetyDoggoButton1-v0 SafetyDoggoButton2-v0 \
+        SafetyDoggoCircle1-v0 SafetyDoggoCircle2-v0 \
+        SafetyDoggoPush1-v0 SafetyDoggoPush2-v0 \
+        SafetyDoggoGoal1-v0 SafetyDoggoGoal2-v0 \
+        --workers 1 --total-steps 100000000
+
 The terminal output would be like:
 
 .. code-block:: bash

diff --git a/safepo/single_agent/benchmark.py b/safepo/single_agent/benchmark.py
@@ -2,7 +2,7 @@
 import shlex
 import subprocess
 
-navi_robots = ['Car', 'Point', 'Racecar']
+navi_robots = ['Car', 'Point', 'Racecar', 'Ant']
 navi_tasks = ['Button', 'Circle', 'Goal', 'Push']
 diffculies = ['1', '2']
 vel_robots = ['Ant', 'HalfCheetah', 'Hopper', 'Walker2d', 'Swimmer', 'Humanoid']
@@ -57,7 +57,7 @@ def parse_args():
         "--experiment", type=str, default="benchmark", help="name of the experiment"
     )
     parser.add_argument(
-        "--total-steps", type=int, default=1000000, help="total number of steps"
+        "--total-steps", type=int, default=10000000, help="total number of steps"
     )
     parser.add_argument(
         "--num-envs", type=int, default=10, help="number of environments to run in parallel"