diff --git a/docs/source/algorithms/comparision.rst b/docs/source/algorithms/comparision.rst
index c76c755..c40a92e 100644
--- a/docs/source/algorithms/comparision.rst
+++ b/docs/source/algorithms/comparision.rst
@@ -4,12 +4,12 @@ Trustworthy Implementation
 To ensure that SafePO's implementation is trustworthy, we have compared our
 algorithms' performance with open source implementations of the same algorithms.
 As some of the algorithms can not be found in open source, we selected
-``PPO-Lag``, ``TRPOLag``, ``CPO`` and ``FOCOPS`` for comparison.
+``PPO-Lag``, ``TRPO-Lag``, ``CPO`` and ``FOCOPS`` for comparison.
 We have compared the following algorithms:
 
 - ``PPO-Lag``: `OpenAI Baselines: Safety Starter Agents `_
-- ``TRPOLag``: `OpenAI Baselines: Safety Starter Agents `_, `RL Safety Algorithms `_
+- ``TRPO-Lag``: `OpenAI Baselines: Safety Starter Agents `_, `RL Safety Algorithms `_
 - ``CPO``: `OpenAI Baselines: Safety Starter Agents `_, `RL Safety Algorithms `_
 - ``FOCOPS``: `Original Implementation `_
@@ -43,7 +43,7 @@ The results are shown as follows.
 
-    .. tab-item:: TRPOLag
+    .. tab-item:: TRPO-Lag
 
         .. raw:: html
diff --git a/docs/source/algorithms/curve.rst b/docs/source/algorithms/curve.rst
index 29e4b00..1cd785f 100644
--- a/docs/source/algorithms/curve.rst
+++ b/docs/source/algorithms/curve.rst
@@ -87,7 +87,7 @@ Second order
 
-    .. tab-item:: TRPOLag
+    .. tab-item:: TRPO-Lag
 
         .. raw:: html
diff --git a/docs/source/algorithms/lag.rst b/docs/source/algorithms/lag.rst
index bbd93b5..7bdde4c 100644
--- a/docs/source/algorithms/lag.rst
+++ b/docs/source/algorithms/lag.rst
@@ -16,7 +16,7 @@ Experiment Results
 
-    .. tab-item:: TRPOLag
+    .. tab-item:: TRPO-Lag
 
         .. raw:: html
diff --git a/docs/source/usage/benchmark.rst b/docs/source/usage/benchmark.rst
index ad9fba2..bad036d 100644
--- a/docs/source/usage/benchmark.rst
+++ b/docs/source/usage/benchmark.rst
@@ -22,6 +22,21 @@ figures in the paper.
 You can also run the multi-agent benchmarking tools by run
 
 After running the benchmarking tools, you can run the `plooting tools and evaluation tools <./eval.html>`_ to show the results.
 
+.. note::
+
+    The ``Doggo`` agent is not included in the benchmarking tools because it needs 1e8 training steps to converge.
+    You can run the ``Doggo`` agent by running:
+
+    .. code-block:: bash
+
+        cd safepo/single_agent
+        python benchmark.py --tasks \
+            SafetyDoggoButton1-v0 SafetyDoggoButton2-v0 \
+            SafetyDoggoCircle1-v0 SafetyDoggoCircle2-v0 \
+            SafetyDoggoPush1-v0 SafetyDoggoPush2-v0 \
+            SafetyDoggoGoal1-v0 SafetyDoggoGoal2-v0 \
+            --workers 1 --total-steps 100000000
+
 .. warning::
 
     The default number of workers is 1. To run the benchmarking tools in parallel, you can increase the number of workers
diff --git a/docs/source/usage/make.rst b/docs/source/usage/make.rst
index e0f19a6..3afe08a 100644
--- a/docs/source/usage/make.rst
+++ b/docs/source/usage/make.rst
@@ -42,6 +42,21 @@ The training logs would be saved in ``safepo/runs/benchmark``, while the evaluat
     The default number of workers is 1. To run the benchmarking tools in parallel, you can increase the number of workers
     by changing the `workers` configuration in `safepo/single_agent/benchmark.py` and `safepo/multi_agent/benchmark.py`.
 
+.. note::
+
+    The ``Doggo`` agent is not included in the benchmarking tools because it needs 1e8 training steps to converge.
+    You can run the ``Doggo`` agent by running:
+
+    .. code-block:: bash
+
+        cd safepo/single_agent
+        python benchmark.py --tasks \
+            SafetyDoggoButton1-v0 SafetyDoggoButton2-v0 \
+            SafetyDoggoCircle1-v0 SafetyDoggoCircle2-v0 \
+            SafetyDoggoPush1-v0 SafetyDoggoPush2-v0 \
+            SafetyDoggoGoal1-v0 SafetyDoggoGoal2-v0 \
+            --workers 1 --total-steps 100000000
+
 The terminal output would be like:
 
 .. code-block:: bash
diff --git a/safepo/single_agent/benchmark.py b/safepo/single_agent/benchmark.py
index 3bd5229..571eb17 100644
--- a/safepo/single_agent/benchmark.py
+++ b/safepo/single_agent/benchmark.py
@@ -2,7 +2,7 @@
 import shlex
 import subprocess
 
-navi_robots = ['Car', 'Point', 'Racecar']
+navi_robots = ['Car', 'Point', 'Racecar', 'Ant']
 navi_tasks = ['Button', 'Circle', 'Goal', 'Push']
 diffculies = ['1', '2']
 vel_robots = ['Ant', 'HalfCheetah', 'Hopper', 'Walker2d', 'Swimmer', 'Humanoid']
@@ -57,7 +57,7 @@ def parse_args():
     parser.add_argument(
         "--experiment", type=str, default="benchmark", help="name of the experiment"
     )
     parser.add_argument(
-        "--total-steps", type=int, default=1000000, help="total number of steps"
+        "--total-steps", type=int, default=10000000, help="total number of steps"
     )
     parser.add_argument(
         "--num-envs", type=int, default=10, help="number of environments to run in parallel"