Implement TdLambdaReturns for alpha_zero_torch #940
base: master
Very cool!!! @christianjans are you able to take a look? Appreciate the continued help, this code is seeing some good use 👍
Thanks for another addition, @mattrek! And yes, it's my pleasure. I will likely have time this weekend or early next week to take a look at it. Happy to see the code is being used!
Ok, sounds good, and thanks for the kind words. @lanctot: I emailed you some questions about OpenSpiel back in May, and you mentioned in reply something like this might help with backgammon... I finally got some time to work on it, and am happy to share.
Yeah very cool! This is a great addition. The implementation is non-trivial :) So might take a bit of time to review, but it's great to support this case... and indeed I hope it helps for Backgammon!
Just an update on my end: Took a quick look-through but will look at it more thoroughly on the weekend and publish the review.
Thanks, it looks really great. I just have a question to help my understanding.
I unfortunately do not have time right now to help verify that the implementation is correct, but I was wondering if you had any unit tests for this feature that you have already developed or can think of that can also be added to this PR?
As a related side note, when I get more time, I would like to add more unit tests for AlphaZero Torch in general.
I've been using this locally while hacking with backgammon. Lemme know if it's worth merging into the repo. Logs showing the state values of a trajectory and the training values returned from TdLambdaReturns for different settings are pasted below.
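For readers unfamiliar with the idea, here is a minimal Python sketch of how TD(λ) training targets can be computed from a trajectory's state values. It is not the code in this PR: the function name `td_lambda_returns`, its parameters, and the simplifying assumptions (no intermediate rewards, no discounting, all values expressed from one player's perspective) are illustrative only.

```python
from typing import List, Sequence


def td_lambda_returns(
    values: Sequence[float], outcome: float, td_lambda: float
) -> List[float]:
    """Hypothetical helper: lambda-returns for one finished trajectory.

    values[t] is the value estimate of the t-th non-terminal state,
    outcome is the final game result, and td_lambda is in [0, 1].
    Assumes zero intermediate rewards and a discount factor of 1.
    """
    if not 0.0 <= td_lambda <= 1.0:
        raise ValueError("td_lambda must be in [0, 1]")
    num_states = len(values)
    returns = [0.0] * num_states
    # Walk backwards so that each target can reuse the one after it.
    for t in reversed(range(num_states)):
        if t == num_states - 1:
            # The last non-terminal state leads directly to the outcome.
            returns[t] = outcome
        else:
            # Blend the one-step bootstrap target with the longer return.
            returns[t] = (1.0 - td_lambda) * values[t + 1] + td_lambda * returns[t + 1]
    return returns
```

Under these assumptions, `td_lambda=1.0` makes every target equal to the final outcome (the Monte Carlo case), while `td_lambda=0.0` makes each target the next state's value estimate, with only the last state trained on the outcome itself.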
Sorry this is such a large change - I can break it up if needed for review:
Examples from tic_tac_toe for various settings: