
Fetch Environments with SAC + Action Repeat
============================================

Using regular Soft Actor-Critic (SAC) with an action repeat length of 3, we achieve strong performance on the Fetch environments introduced here: https://openai.com/blog/ingredients-for-robotics-research/
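For reference, action repeat simply means applying each sampled action for several consecutive environment steps. A minimal sketch of this as a gym-style wrapper (illustrative only; not necessarily the exact wrapper used for these runs, and it assumes the classic 4-tuple ``step`` API):

.. code-block:: python

    import gym


    class ActionRepeat(gym.Wrapper):
        """Repeat each agent action ``repeat`` times, summing the rewards."""

        def __init__(self, env, repeat=3):
            super().__init__(env)
            self.repeat = repeat

        def step(self, action):
            total_reward, done, info = 0.0, False, {}
            for _ in range(self.repeat):
                obs, reward, done, info = self.env.step(action)
                total_reward += reward
                if done:
                    break
            return obs, total_reward, done, info


    # e.g. env = ActionRepeat(gym.make("FetchReach-v1"), repeat=3)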

In some environments, the sample efficiency is better than that of DDPG+HER (Hindsight Experience Replay) reported in https://arxiv.org/pdf/1802.09464.pdf, despite our implementation not using HER.

We use 38 parallel rollouts and run for 4.75M timesteps in total, just like in the Fetch report. We searched around ``unroll_length``, which dictates how frequently the policy and Q functions are trained. An unroll length of 8 (50 (episode length) // [3 (action repeat length) * 2]), which essentially doubles the number of training steps relative to the report, gave the best sample efficiency and final performance; the arithmetic is sketched below. All other hyperparameters were chosen to match the report as closely as possible, although we do not use an action L2 norm penalty or observation normalization.
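For concreteness, the arithmetic behind that choice looks like the following (a sketch only; the variable names are ours, not identifiers from this codebase):

.. code-block:: python

    episode_length = 50    # Fetch episodes last 50 environment steps
    action_repeat = 3      # each agent action is applied for 3 environment steps
    agent_steps_per_episode = episode_length // action_repeat   # 50 // 3 == 16

    # Train the policy and Q functions every `unroll_length` agent steps;
    # the extra factor of 2 in the denominator roughly doubles the number of
    # training steps compared to the report, per the note above.
    unroll_length = episode_length // (action_repeat * 2)       # 50 // 6 == 8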

Note that due to the action repeat length of 3, the timesteps on the plots below should be multiplied by 3 to obtain the real number of environment steps taken, and the videos play at a 3x speedup.

.. image:: ../images/sac_act_repeat_fetch_reach.png
.. image:: ../images/sac_act_repeat_fetch_reach.gif

.. image:: ../images/sac_act_repeat_fetch_push.png
.. image:: ../images/sac_act_repeat_fetch_push.gif

.. image:: ../images/sac_act_repeat_fetch_pick_and_place.png
.. image:: ../images/sac_act_repeat_fetch_pick_and_place.gif

.. image:: ../images/sac_act_repeat_fetch_slide.png
.. image:: ../images/sac_act_repeat_fetch_slide.gif