40 align gym and pettingzoo environments with corl counterparts #41
Validate multidocking env
…oints observations, and zoo inspected points score observations. Added collision terminal condition and status to zoo environments.
…f inspection validation test passing (w rounding error caveat)
…ing to tests, updated translational inspection test
Additional observations:
…ion to use thrust, fixed reward component dict bug in info dict
…nspected points reward, enabled initial inspected points to count towards reward
…of 6dof inspection
… live_timestep, and docking timeout rewards
…of pettingzoo, updated delta_v (rewards)
Point 1: Sounds good.
I was able to add these with Poetry. They should now be included in the project requirements.
Point 1: This seems likely. Both implementations use
safe_autonomy_sims/pettingzoo/inspection/weighted_multi_inspection_v0.py
Additional discrepancies found:
Regarding Point 2: To match the multiagent 6dof inspection pettingzoo obs space to the multiagent 6dof inspection CoRL obs space, the current observations would need to be stripped down to: position, velocity, num inspected points, uninspected points cluster, sun angle, priority vector, inspected points score, orientation quaternion, and angular velocity. That means removing the position magnorm, velocity magnorm, facing chief dot product, and position/uninspected points dot product components from the pettingzoo obs space. It also means changing the Euler orientation to a quaternion representation, the relative position of the uninspected points cluster to its absolute position, and the relative priority vector to the absolute priority vector. Because the obs spaces differed significantly across the gymnasium single agent 6dof, CoRL single agent 6dof, pettingzoo multiagent 6dof, and CoRL multiagent 6dof environments, and because the single agent CoRL 6dof obs space was the most mature, I decided to model both the gymnasium and pettingzoo 6dof obs on the single agent CoRL 6dof environment. Do you think that was a mistake?
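The stripping and orientation change described above could look roughly like the sketch below. The component key names, the flat dict layout, and the roll/pitch/yaw (ZYX) Euler convention are assumptions made for illustration, not the names or conventions used in safe_autonomy_sims. The relative-to-absolute conversions are left out because they depend on frame definitions not stated here.

```python
import math

# Hypothetical component names; the real obs keys in the repo may differ.
KEEP = (
    "position", "velocity", "inspected_points", "uninspected_points",
    "sun_angle", "priority_vector", "inspected_points_score",
    "orientation", "angular_velocity",
)


def euler_to_quaternion(roll, pitch, yaw):
    """Convert ZYX (yaw, pitch, roll) Euler angles in radians to a unit quaternion (w, x, y, z)."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)


def strip_obs(obs):
    """Drop components outside KEEP and replace the Euler orientation with a quaternion."""
    out = {k: v for k, v in obs.items() if k in KEEP}
    out["orientation"] = euler_to_quaternion(*obs["orientation"])
    return out
```

Components such as a position magnorm or facing chief dot product would simply fall out of the result because their keys are not in `KEEP`.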
Thanks for doing that!
Regarding Point 2: Yes, that was for the CoRL multiagent environments. I believe Kyle tried and was not able to replicate the bug. I was able to work around the issue by creating a unique agent.yml config for each deputy in the episode.
…ith corl, 6dof test passes
…ewards from pettingzoo 6dof inspection
I made changes to the facing chief reward, so all rewards and environments are now aligned and all validation tests pass. The only open question is how we handle the multiagent 6dof inspection environment. Both the rewards and the obs space in CoRL are very different from their single agent counterparts. I have changed the observations of the 6dof inspection pettingzoo environment to match the single agent CoRL 6dof inspection environment, and I can change the rewards to match the single agent version as well. This is the direction I would recommend, as Kochise has refined the single agent CoRL 6dof rewards and obs space to enable convergence when training with PPO. However, I understand I was tasked with aligning obs and rewards across environments. Let me know which version of the obs and rewards we want to use.
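A validation check of this kind can be sketched as a component-wise comparison with a tolerance, which accommodates the rounding error mentioned in the commit messages. The function name, the dict-of-sequences obs layout, and the tolerance value are assumptions for illustration, not the repo's actual test code.

```python
import math


def obs_mismatches(obs_a, obs_b, abs_tol=1e-6):
    """Return the sorted names of components that differ between two obs dicts."""
    mismatched = []
    for key in sorted(set(obs_a) | set(obs_b)):
        a, b = obs_a.get(key), obs_b.get(key)
        # A component missing from either side, or of different length, is a mismatch.
        if a is None or b is None or len(a) != len(b):
            mismatched.append(key)
        # Otherwise compare element by element within the tolerance.
        elif not all(math.isclose(x, y, abs_tol=abs_tol) for x, y in zip(a, b)):
            mismatched.append(key)
    return mismatched
```

Returning the mismatched names, rather than a single boolean, makes it easier to see which obs component drifted when two environments disagree.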
Regarding the discrepancies in the obs space for the multiagent environments, I think we should align with the single agent environments. It makes more sense to have the user handle updating the obs space to converge to a solution. So I recommend keeping Kochise's observations in CoRL and, in a separate issue, looking at moving the CoRL configs and related CoRL pieces to build from the gymnasium/pettingzoo envs. That way we can provide Kochise's observations in the CoRL configs rather than forcing the base environments to conform to them.
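One way a user could adjust observations on top of an unchanged base environment is a thin wrapper, sketched below. It assumes an environment that follows the PettingZoo parallel-style `reset`/`step` return shapes; `ObsTransform` and `_StubEnv` are hypothetical names, and the sketch does not update the declared observation spaces.

```python
class ObsTransform:
    """Apply a user-supplied function to every agent's observation."""

    def __init__(self, env, fn):
        self.env = env
        self.fn = fn

    def reset(self, seed=None, options=None):
        obs, infos = self.env.reset(seed=seed, options=options)
        return {agent: self.fn(o) for agent, o in obs.items()}, infos

    def step(self, actions):
        obs, rewards, terminations, truncations, infos = self.env.step(actions)
        obs = {agent: self.fn(o) for agent, o in obs.items()}
        return obs, rewards, terminations, truncations, infos


class _StubEnv:
    """Stand-in for a real multiagent env, used only to exercise the wrapper."""

    def reset(self, seed=None, options=None):
        return {"deputy_0": [1.0, 2.0, 3.0]}, {"deputy_0": {}}

    def step(self, actions):
        obs = {"deputy_0": [2.0, 4.0, 6.0]}
        return obs, {"deputy_0": 0.0}, {"deputy_0": False}, {"deputy_0": False}, {"deputy_0": {}}


# Example: keep only the first two entries of each observation.
wrapped = ObsTransform(_StubEnv(), lambda o: o[:2])
first_obs, _ = wrapped.reset()
```

The base environment stays untouched, so any alternative obs layout lives entirely in the user's transform function.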
…gent 6dof inspection environments
Address issue #40