
lower bounded target q #1330

Open · wants to merge 8 commits into base: pytorch
Conversation

le-horizon (Contributor):

Minimal change for a lower-bounded value target (for episodic return, goal-distance return, and n-step bootstrapped return).
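
The idea, as described, is to clamp the usual TD value target from below by a return that is known to be achievable (the episodic return, a goal-distance return, or the best n-step bootstrapped return). A minimal PyTorch sketch of that clamping, with hypothetical names and shapes rather than the actual ALF code:

```python
import torch

def lower_bounded_value_target(rewards, next_values, lower_bound, gamma=0.99):
    # Standard one-step TD target: r_t + gamma * V(s_{t+1}).
    td_target = rewards + gamma * next_values
    # Clamp from below by an achievable return, e.g. the observed episodic
    # return-to-go, a goal-distance return, or an n-step bootstrapped return.
    return torch.maximum(td_target, lower_bound)
```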

@@ -118,6 +118,72 @@ def action_importance_ratio(action_distribution,
return importance_ratio, importance_ratio_clipped


def generalized_advantage_estimation(rewards,
le-horizon (Author):

I didn't change generalized_advantage_estimation; I only moved it closer to action_importance_ratio, so git messed up the diff.

le-horizon mentioned this pull request on May 17, 2022
improve_w_goal_return: Use the return calculated from the distance to hindsight
goals. Only supports batch_length == 2 (one-step TD).
improve_w_nstep_bootstrap: Look ahead 2 to n steps, and take the largest
bootstrapped return as a lower bound on the value target of the 1st step.
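
My reading of improve_w_nstep_bootstrap from the description above, as a sketch (tensor layout and names are assumptions, not the code in this PR):

```python
import torch

def max_bootstrapped_return(rewards, values, gamma=0.99, n=5):
    """Largest k-step bootstrapped return for the first step, over k = 2..n.

    rewards[i] is r_{i+1}; values[i] is V(s_{i+1}), so values[k] bootstraps
    the k-step return. Requires len(rewards) >= n and len(values) >= n + 1.
    """
    ret = rewards[0]  # discounted reward sum, starts with r_1
    best = None
    for k in range(2, n + 1):
        ret = ret + gamma ** (k - 1) * rewards[k - 1]
        g_k = ret + gamma ** k * values[k]  # k-step bootstrapped return
        best = g_k if best is None else torch.maximum(best, g_k)
    return best
```

The resulting maximum would then serve as the lower bound on the first step's value target, e.g. via torch.maximum as in the earlier sketch.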
le-horizon (Author):

add formula

le-horizon (Author):

Done.

Le Horizon added 2 commits June 27, 2022 10:57
le-horizon (Author) left a comment:

@emailweixu and @hnyu, I've separated out the HER-related logic (the part that moves batch_info fields into alg_info, and the loss calculation) into a new file, her_algorithms.

Let me know how you like this version.

Thanks,
Le
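
To illustrate the kind of separation described here (all names below are placeholders, not the actual her_algorithms interface), a sketch in which a HER wrapper copies the relevant batch_info fields into alg_info and the loss lower-bounds the value target with the hindsight goal return:

```python
from dataclasses import dataclass, replace

import torch
import torch.nn.functional as F


@dataclass
class BatchInfo:
    # Field produced by hindsight relabeling in the replay buffer (placeholder).
    goal_return: torch.Tensor = None


@dataclass
class AlgInfo:
    # Fields consumed by the loss (placeholders).
    value_target: torch.Tensor = None
    goal_return: torch.Tensor = None


def attach_her_info(alg_info: AlgInfo, batch_info: BatchInfo) -> AlgInfo:
    # Move the HER-related batch_info field into alg_info, so the base
    # algorithm's training step stays unchanged.
    return replace(alg_info, goal_return=batch_info.goal_return)


def her_value_loss(alg_info: AlgInfo, value_pred: torch.Tensor) -> torch.Tensor:
    # Value target lower-bounded by the hindsight goal return.
    target = torch.maximum(alg_info.value_target, alg_info.goal_return)
    return F.mse_loss(value_pred, target)
```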

le-horizon (Author):

Done.
