fix: gate foot_clearance_reward by command and body movement to avoid rewarding standing still #80
The original implementation of `foot_clearance_reward()` encourages the agent to stand still: with all feet on the ground the clearance error is zero, which maximizes the reward (since exp(-0) = 1). This PR adds command and current-velocity weights that gate the reward, so it only pays out when the robot is commanded to move and is actually moving, rather than standing still.
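A minimal sketch of the gating idea (not the PR's exact code), assuming PyTorch tensors and hypothetical names `foot_z`, `commands`, and `base_lin_vel`:

```python
import torch

def foot_clearance_reward(
    foot_z: torch.Tensor,        # (num_envs, num_feet) foot heights
    target_height: float,        # desired swing-foot clearance
    commands: torch.Tensor,      # (num_envs, 3) commanded [vx, vy, yaw_rate]
    base_lin_vel: torch.Tensor,  # (num_envs, 3) current base linear velocity
    std: float = 0.05,
) -> torch.Tensor:
    # Original term: exp(-error) equals 1 when the error is 0, i.e. a robot
    # that never lifts its feet off the ground gets the maximum reward.
    error = torch.sum(torch.square(foot_z - target_height), dim=1)
    clearance = torch.exp(-error / std)

    # Gate by the commanded velocity AND the body's actual velocity, so a
    # robot that is told to stand still (or refuses to move) earns nothing.
    cmd_weight = torch.norm(commands[:, :2], dim=1).clamp(max=1.0)
    vel_weight = torch.norm(base_lin_vel[:, :2], dim=1).clamp(max=1.0)
    return clearance * cmd_weight * vel_weight
```

Multiplying by both weights (rather than either one alone) ensures the reward neither pays a stationary robot under a nonzero command nor rewards drift when no motion is commanded.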
This wrong implementation is the main reason the examples in this repo require a huge number of iterations just to learn to walk. By contrast, the official Isaac Lab examples take only about 500 iterations with 4096 envs to walk on complex terrain (though they include base_lin_vel in the policy observation).