
added support for Dict action space #301

Merged
merged 4 commits into from
Mar 19, 2020

Conversation

JaCoderX
Contributor

@JaCoderX JaCoderX commented Feb 6, 2020

Epsilon greedy policy error when generating new action from Dict Action space. #276

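The linked issue boils down to the random branch of the epsilon greedy policy not handling a nested (Dict) action spec: sampling logic written for a single flat spec fails when the spec is a dictionary. A minimal plain-Python sketch of the idea behind the fix, with hypothetical names (the real tf-agents code operates on `tensor_spec` objects, not tuples):

```python
import random

def sample_flat(spec, rng):
    """Sample from a single flat spec given as (low, high) integer bounds."""
    low, high = spec
    return rng.randrange(low, high + 1)

def sample_nested(spec, rng):
    """Handle a Dict-style spec by recursing over its entries."""
    if isinstance(spec, dict):
        return {key: sample_nested(sub, rng) for key, sub in spec.items()}
    return sample_flat(spec, rng)

rng = random.Random(0)
# A Dict action space with two independent sub-actions (hypothetical example).
dict_spec = {"trade": (0, 2), "amount": (0, 9)}
action = sample_nested(dict_spec, rng)  # a dict with one sample per entry
```

Calling `sample_flat` directly on `dict_spec` would raise, which is the shape of the error reported in #276; recursing over the structure is what the epsilon greedy change enables.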
@googlebot
Collaborator

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.



ℹ️ Googlers: Go here for more info.

@JaCoderX
Contributor Author

JaCoderX commented Feb 6, 2020

@googlebot I signed it!

@googlebot
Collaborator

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.


@googlebot googlebot added cla: yes and removed cla: no labels Feb 6, 2020
DQN loss calculation error when using Dict Action space #297
@JaCoderX JaCoderX changed the title from "added support for Dict action space for Epsilon greedy policy" to "added support for Dict action space" Feb 6, 2020
@tfboyd tfboyd requested a review from oars February 6, 2020 17:18
@tfboyd
Member

tfboyd commented Feb 6, 2020

@oars I saw you did a +2 on the internal sync, so I added you as the reviewer here, you can assign someone else.

@ebrevdo
Contributor

ebrevdo commented Feb 7, 2020

We're trying to tighten up the contract that Agents present to the world; in particular, we don't want to allow arbitrary nested structures "so long as it's only got one entry", if the agent does not support more than one action in the action space. Better to just add a wrapper that converts the tensor to a dict.

If we start allowing this kind of soft contract, it makes it much harder for future developers to know which agents really support multiple action spaces, and makes it harder for us to add pytype to tf-agents. Sorry; we will have to reject this PR.
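The suggested alternative, a wrapper that converts the tensor to a dict, can be sketched in a few lines. All names here are hypothetical (this is not the tf-agents or Gym API): the agent keeps emitting a single flat action, and a thin wrapper converts it into the one-entry dict that the underlying environment expects.

```python
class TensorToDictActionWrapper:
    """Hypothetical wrapper: the agent emits a single (tensor-like) action;
    the wrapped environment expects a one-entry Dict action."""

    def __init__(self, env, key):
        self._env = env
        self._key = key  # name of the single entry in the env's Dict space

    def step(self, action):
        # Convert the flat action into the Dict action before stepping.
        return self._env.step({self._key: action})

class FakeDictEnv:
    """Stand-in environment that only accepts dict actions."""

    def step(self, action):
        assert isinstance(action, dict), "this env requires a Dict action"
        return action["trade"] * 2  # dummy observation

env = TensorToDictActionWrapper(FakeDictEnv(), key="trade")
obs = env.step(3)  # the agent side passes a plain scalar; obs == 6
```

This keeps the agent's contract flat (one action tensor) while the environment still receives the Dict action it declares, which is the boundary ebrevdo argues the conversion should live at.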

@ebrevdo ebrevdo closed this Feb 7, 2020
@ebrevdo
Contributor

ebrevdo commented Feb 7, 2020

(a good followup PR would be to disallow nested action_specs in this agent in the first place).

@JaCoderX
Contributor Author

JaCoderX commented Feb 7, 2020

We're trying to tighten up the contract that Agents present to the world, in particular we don't want to allow arbitrary nested structures "so long as it's only got one entry", if the agent does not support more than one action in the action space.

@ebrevdo, I agree with you that the code needs to be simple and clean for the agents, especially if a certain agent implementation (in this case DQN) is meant for just a one-dimensional action space.

But this PR contains 2 commits: one for the DQN and the second for the epsilon greedy policy. As far as I understand, the epsilon greedy policy isn't bound only to DQN and can be used by other agents that can have a structured action space. Is that commit also too case-specific?

Personally, I just wanted to test the tf-agents framework with an external and complex gym project and see that they work well together. I used the DQN tutorial just as an easy entry to the tf-agents framework.

Anyway, thanks for helping out :)

@ebrevdo
Contributor

ebrevdo commented Feb 21, 2020

Ah, I didn't see the changes to epsilon_greedy. That change is fine! Can you revert the change to the DQN agent?

@ebrevdo ebrevdo reopened this Feb 21, 2020
@@ -546,7 +547,8 @@ def _compute_next_q_values(self, next_time_steps):
       # action constraints are respected and helps centralize the greedy logic.
       greedy_actions = self._target_greedy_policy.action(
           next_time_steps, dummy_state).action
+      greedy_actions = tf.nest.flatten(greedy_actions)[0]
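For context, `tf.nest.flatten` turns a nested structure into a flat list of leaves (traversing dict keys in sorted order), so indexing `[0]` silently picks the single entry of a one-element Dict action, which is exactly the "soft contract" being rejected above. A plain-Python sketch of that flattening behavior (not the TensorFlow implementation):

```python
def flatten(structure):
    """Mimic tf.nest.flatten for dicts, lists, and tuples:
    return the leaves of a nested structure as a flat list."""
    if isinstance(structure, dict):
        leaves = []
        for key in sorted(structure):  # tf.nest sorts dict keys
            leaves.extend(flatten(structure[key]))
        return leaves
    if isinstance(structure, (list, tuple)):
        leaves = []
        for item in structure:
            leaves.extend(flatten(item))
        return leaves
    return [structure]

greedy_actions = {"trade": 1}        # a one-entry Dict action
single = flatten(greedy_actions)[0]  # -> 1: assumes exactly one entry
```

With more than one entry, `[0]` would silently discard the rest, which is why the reviewer asks for this line to be reverted rather than generalized.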
revert this file please.

@JaCoderX
Contributor Author

DQN changes have been reverted

@ebrevdo
Contributor

ebrevdo commented Mar 17, 2020

Is there a unit test?

@tfboyd
Member

tfboyd commented Mar 17, 2020

@JacobHanouna If there is a unit test, please reference it. If not, can you add one so we can merge this and get it closed? Not to be a drag, but without a test the chances of it getting broken by some other change increase greatly. Thank you for understanding. We are so close on this and appreciate you powering through to the end.

@JaCoderX
Contributor Author

@tfboyd
Member

tfboyd commented Mar 17, 2020

So close to high fives all around.

@JaCoderX JaCoderX closed this Mar 17, 2020
@JaCoderX JaCoderX reopened this Mar 17, 2020
@tfboyd
Member

tfboyd commented Mar 19, 2020

MERGED


6 participants