-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes to enable discrete actions > 2 #121
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it looks good. I would like to test these changes + the changes to the plugin on my machine. Would you be able to share the example you made? Ideally as a PR to the examples repo.
Thanks for the review. I don't have a custom environment for this as I was testing it on a modified example environment from the examples repo. Also since I had to check all 3 cases (continuous only / mixed / discrete only), one environment wouldn't cover checking everything. However, here are the steps that should get it working quickly:
func get_action_space() -> Dictionary:
return {
"discrete_size_10" : {
"size": 10,
"action_type": "discrete"
},
"discrete_size_5" : {
"size": 5,
"action_type": "discrete"
},
} func get_action_space() -> Dictionary:
return {
"discrete_size_2" : {
"size": 2,
"action_type": "discrete"
},
"continuous_size_2" : {
"size": 2,
"action_type": "continuous"
},
} func get_action_space() -> Dictionary:
# Should throw an error in python, discrete size 2 is current
# maximum supported for mixed spaces
return {
"discrete_size_10" : {
"size": 10,
"action_type": "discrete"
},
"continuous_size_1" : {
"size": 1,
"action_type": "continuous"
},
}
func set_action(action) -> void:
print(action) You can use the same setting for testing training and inference on the exported .onnx file. Finally, if you want to train the agent with e.g. discrete actions only to see if it learns a good policy, set_action will need to be modified accordingly based on the example, but the modification should be simple for RingPong or BallChase. Additional notes / considerations:
|
I've made a small test environment which only outputs action values (includes the updated compatible plugin already). I've made a separate branch in the fork for it since it's a test only environment. I'm not sure to what branch and whether we should merge it, but it might be useful to have some test environments for checking regression later on. Usage: In the current PR, the second dictionary |
Please hold off on the review as an important issue was spotted by @yaelatletl during his review that needs to be addressed in my experimental branch of the plugin before potential merging. To keep things synchronized and avoid merge conflicts, once the PR edbeeching/godot_rl_agents_plugin#18 is merged, I will sync my plugin fork and then re-implement the changes necessary to make this PR work in my experimental branch. |
@Ivan-267 What is the current state of the PR? Any chance we can clean up the mess I have made? :D |
I am still using a variant of this branch for experimenting with my discrete only scenes locally (without using onnx), but it will need to be reworked before a potential merge (specifically the plugin side, but potentially something here as well). Since it alters the onnx model to output discrete actions directly when only discrete actions are used, separate handling is necessary for discrete and continuous actions on the plugin side, in both the C# code and the gdscript code. I made a mistake in my implementation of the plugin changes initially, and there are pending changes that are being worked on in the PR I linked above. After that PR is merged I could try to re-apply the changes that handle discrete only actions. There are also different possible alternative methods to the one used here for certain things, e.g.:
Edit: The plugin was updated. As is, this would still not work with discrete only actions in onnx models exported using rllib (if they provide logits as output), but should work with sb3. |
5d31dba
to
3c849e9
Compare
Superseded by #181. The branch can still be kept for reference since this implementation has an advantage, it sends only action values for discrete actions (not logits), so it could be more efficient with large spaces. |
This is my attempt to get discrete only actions to work properly on onnx exported models with SB3 and also discrete actions > 2 to work with SB3. As I'm not very familiar with the code, libraries used and ML/RL, I will describe the approach and changes below, however in case of merging a review is requested. I'm marking this as draft as well, so any potential changes can be made before merging. I've only tested this with Stable Baselines and CleanRL example so far.
Update: After a quick test with rllib (using the yaml file from this repo and small file changes to make training and export work for me) with discrete actions only used in the env, the onnx model outputted floats, which wouldn't work with this update (to make it compatible with the update, the onnx model would need to output integers for discrete actions in discrete-only action environments). It's also not likely to work well with the current version as the Godot plugin does not handle logits for discrete-only action spaces (adding the conversion there instead of integrating it into the models may be a potential alternative approach).
This change is connected to the draft PR for plugin (https://github.com/edbeeching/godot_rl_agents_plugin/pull/16/files#) - changes to both repositories are necessary.
Changes to the main repository:
Main changes to
godot_rl/core/utils.py
:When training using SB3 on environments with only discrete actions, currently SB3 will output integer values for an action (e.g. 0,1,2,3,4), when using CleanRL example, action values will always be continuous/float values, and when using mixed spaces with SB3 the values will be continuous/float also.
np.int64
, it will be added to actions without modification, otherwise the previous logic remains to convert the discrete action from the received float value to 1 or 0 based on value.Main changes to
godot_rl/wrappers/onnx/stable_baselines_export.py
Currently, when exporting models that have only discrete actions using the sb3 example, the .onnx model will produce logits for the discrete actions, while sb3 model produces integers for each discrete action. This produces an error on the
verify_onnx_export
assertion.forward
method, and in case the action space isMultiDiscrete
, it should select the best action from the distribution (deterministic actions). Otherwise (for continuous or mixed actions), the old behavior is used. I've implemented this after checking the sb3 policies file to see how discrete actions are processed, however I cannot guarantee I've implemented it correctly. In a few small test cases I've tried, this makes theverify_onnx_export
tests pass and the model works in Godot.With that change, if the exported model uses discrete actions only (the example from the image has 2 discrete actions), the following operations are added which should select the action with the highest probability from the logits:
![Netron_ZXj1dVNn7V](https://private-user-images.githubusercontent.com/61947090/250606250-ce6086d5-071c-4674-b925-59fafab24acc.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1ODEzNDQsIm5iZiI6MTczOTU4MTA0NCwicGF0aCI6Ii82MTk0NzA5MC8yNTA2MDYyNTAtY2U2MDg2ZDUtMDcxYy00Njc0LWI5MjUtNTlmYWZhYjI0YWNjLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE1VDAwNTcyNFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWExZDFmNjA0ODQyOTVlYWI0YjQ4ZWJlZDBkZTFmZTA0ZmQyNDVlNDUwYzQ4YTI0OGU4M2U2NDkwODE2ODAwNjAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.Hy00C40xC_NNBiie_UW3Fh0hjJuQxWNNYOTXTlf2OTA)
Changes to the plugin repository:
ONNXInference.cs:
Previously, the exported onnx model would output float values for actions in case of either continuous or discrete actions used. After the changes above, onnx will output discrete actions as integers if only discrete actions are used in the environment. To handle that, I've made a few changes:
session
andresults
disposable values after use.sync.gd:
Simple test of output values when only discrete actions are used with the sb3 example for training after the changes (printing the received actions directly in Godot):