
How to reproduce 0.44m/s "Restricted"? #1

Open
YouJiacheng opened this issue Nov 2, 2023 · 3 comments
Labels: question (Further information is requested), resolved

Comments

@YouJiacheng

Dear author:
Thanks a lot for inventing APRL and open-sourcing an official implementation.

I have a question about the performance of "Restricted".
Fig. 3 in [46] (Demonstrating a Walk in the Park: Learning to Walk in 20 Minutes with Model-Free Reinforcement Learning) showed that the robot can move on flat ground at an average speed of 0.06 m/s after 20 min.
Fig. 6 in the APRL paper showed that the robot can move on flat ground at an average speed of 0.44 m/s after 20 min.

That is roughly a 7x difference. I have noticed that the APRL paper used the Go1 while [46] used the A1, and that different velocity measurements might have been applied (tracking camera vs. Kalman filter). I would like to know whether there are any other differences between "Restricted" and [46].

Thanks!

@realquantumcookie
Owner

realquantumcookie commented Nov 2, 2023

Hi there,
Thank you for your question. The differences between "Restricted" and [46] are:

  1. The restricted method shares the same action space and a similar observation space as [46], but has slightly different reward shaping. The only difference in the observation space is that we used normalized foot contact forces (in the restricted method) instead of the binary foot contact observations used in [46], because our Go1 robot's foot contact sensors are not very reliable (see the first sketch after this list).
  2. The velocity measurement for the restricted method comes from a tracking camera, while the velocity measurement for [46] comes from a Kalman filter combining information from (1) forward kinematics and (2) the onboard accelerometer. This measurement is used in both the observation and the reward function.
  3. Please look at our project website for the reward function. The main change affecting the learning speed is that we scaled up the velocity reward and used a near-quadratic term for it. This kind of reward shaping makes the algorithm pick up reward signals earlier in training and thus makes training faster (see the second sketch after this list).
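
As a rough illustration of point 1, here is a minimal sketch contrasting binary contact flags with normalized contact forces; the contact threshold and normalization constant are hypothetical placeholders, not values from the APRL implementation.

```python
import numpy as np

# Hypothetical placeholders, not values from the APRL implementation.
CONTACT_THRESHOLD_N = 5.0    # force above which a foot counts as "in contact"
MAX_FOOT_FORCE_N = 200.0     # rough normalization constant

def binary_contacts(foot_forces_n: np.ndarray) -> np.ndarray:
    """Binary foot-contact observation, in the spirit of [46]."""
    return (foot_forces_n > CONTACT_THRESHOLD_N).astype(np.float32)

def normalized_contact_forces(foot_forces_n: np.ndarray) -> np.ndarray:
    """Normalized foot-contact-force observation, as described in point 1."""
    return np.clip(foot_forces_n / MAX_FOOT_FORCE_N, 0.0, 1.0)

# Example: raw contact forces (N) for the four feet.
forces = np.array([0.0, 12.0, 80.0, 190.0])
print(binary_contacts(forces))            # -> [0. 1. 1. 1.]
print(normalized_contact_forces(forces))  # -> [0.   0.06 0.4  0.95]
```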
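
And as a rough illustration of point 3, here is a sketch comparing a clipped linear velocity reward with a scaled, near-quadratic one; the target speed, scale, and exponent are hypothetical placeholders, and the actual reward terms are documented on the project website.

```python
import numpy as np

# Hypothetical placeholders; the actual APRL reward is on the project website.
TARGET_SPEED = 1.0   # m/s
SCALE = 10.0         # scales up the velocity term relative to other rewards
EXPONENT = 1.8       # "near-quadratic" growth in forward velocity

def linear_velocity_reward(v_x: float) -> float:
    """Clipped linear velocity reward, roughly in the spirit of [46]."""
    return float(np.clip(v_x, 0.0, TARGET_SPEED))

def near_quadratic_velocity_reward(v_x: float) -> float:
    """Scaled, near-quadratic velocity reward as described in point 3: the
    scale makes the velocity term a larger share of the total reward, and
    the exponent makes the reward grow faster as the robot speeds up."""
    v = float(np.clip(v_x, 0.0, TARGET_SPEED))
    return SCALE * v ** EXPONENT

for v in (0.06, 0.2, 0.44):
    print(f"v={v:.2f} m/s  linear={linear_velocity_reward(v):.3f}  "
          f"near-quadratic={near_quadratic_velocity_reward(v):.3f}")
```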

@YouJiacheng
Author

Thank you for your comprehensive and in-depth explanation!

realquantumcookie added the question and resolved labels on Nov 3, 2023
@realquantumcookie
Owner

Hi @YouJiacheng, I will leave this issue open in case other people have questions similar to yours.
