
Align Gym and PettingZoo environments with CoRL counterparts #41

Conversation

JohnMcCarroll
Collaborator

Address issue #40

@JohnMcCarroll JohnMcCarroll self-assigned this Nov 23, 2024
@JohnMcCarroll JohnMcCarroll linked an issue Nov 23, 2024 that may be closed by this pull request
John McCarroll added 7 commits November 23, 2024 17:24
@JohnMcCarroll
Collaborator Author

Additional observations:

@jamie-cunningham
Collaborator

> Additional discrepancies found:
>
>   • weighted 6dof inspection: FOV and focal length changed to match CoRL configs
>   • multiagent weighted 6dof inspection: the CoRL observation space differs from the single agent CoRL version. The current CoRL multiagent weighted 6dof inspection observation space is similar to weighted translational inspection, with the addition of orientation and angular velocity. The pettingzoo version of the multiagent 6dof inspection contains yet another interpretation of the observation space, similar to the gymnasium 6dof inspection version of the task.
>   • safe-autonomy-simulation required an update to InspectionPointsSet's get_num_points_inspected method to support counting only the points inspected by a given deputy, plus a bug fix for get_total_weight_inspected: enable entity-specific calculation of inspected points + inspected points score safe-autonomy-simulation#28

Point 1: sounds good
Point 2: we need to align on obs spaces across environments. What work needs to be done to match the pettingzoo obs space with the CoRL obs space?
Point 3: PR under review
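The entity-specific counting change referenced above might look roughly like the sketch below. All class, method, and attribute names here are illustrative guesses modeled on the description, not the actual safe-autonomy-simulation API:

```python
# Hypothetical sketch of entity-specific inspected-point counting, in the
# spirit of the safe-autonomy-simulation change referenced above. Names
# are illustrative, not the actual library API.

class InspectionPoint:
    def __init__(self, weight: float):
        self.weight = weight
        self.inspected = False
        self.inspector = None  # which deputy inspected this point, if any


class InspectionPointsSet:
    def __init__(self, points):
        self.points = points

    def get_num_points_inspected(self, inspector=None):
        """Count inspected points, optionally only those seen by `inspector`."""
        if inspector is None:
            return sum(p.inspected for p in self.points)
        return sum(p.inspected and p.inspector == inspector
                   for p in self.points)

    def get_total_weight_inspected(self, inspector=None):
        """Sum inspected point weights, with the same optional entity filter."""
        if inspector is None:
            return sum(p.weight for p in self.points if p.inspected)
        return sum(p.weight for p in self.points
                   if p.inspected and p.inspector == inspector)
```

The key design point is that the per-entity filter is optional, so existing callers that want the aggregate count keep working unchanged.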

@jamie-cunningham
Collaborator

> TODO: onnx and onnxruntime need to be added as dependencies. I had trouble simply using poetry add, as the packages could not be found.

I was able to add these with poetry. They should be included in the project requirements now.

@jamie-cunningham
Collaborator

> Additional observations:

Point 1: This seems likely. Both implementations use np.random.choice for this method. We should get a better understanding of what's going on here.
Point 2: Is this for the CoRL envs?
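On Point 1: one common source of this kind of divergence is that identical seed values only reproduce draws within the same NumPy RNG API; the legacy global state and the newer Generator API use different bit streams. A minimal illustration (not a diagnosis of the actual environments):

```python
import numpy as np

# Two common ways environments draw with np.random.choice. Even with the
# same seed value, the legacy global RNG and the Generator API generally
# produce different draws, because they use different bit generators.

candidates = np.arange(100)

# Legacy global state:
np.random.seed(42)
legacy_draw = np.random.choice(candidates, size=5, replace=False)

# Generator API:
rng = np.random.default_rng(42)
gen_draw = rng.choice(candidates, size=5, replace=False)

# Identical seeds through the *same* API do reproduce exactly:
rng2 = np.random.default_rng(42)
assert np.array_equal(gen_draw, rng2.choice(candidates, size=5, replace=False))
```

If one implementation seeds the global state and the other holds a Generator, the selected points will differ even with matching seeds.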

@JohnMcCarroll
Collaborator Author

Additional discrepancies found:

  • DeltaV functions in CoRL use thrust, while the gymnasium/pettingzoo environments used velocity.
  • CoRL environments implement reward scaling for the DeltaV reward and WinLoseDoneStateReward, while the gymnasium/pettingzoo environments do not (this has not been addressed yet).
  • Initial inspected points (on reset) were not counted toward the reward in the gymnasium/pettingzoo envs.
  • No MaxDistanceReward was present in the 6DOF inspection gymnasium/pettingzoo envs.
  • 6DOF inspection's FacingChiefReward is not aligned with CoRL's facing chief reward (this has not been addressed yet).
  • CoRL's 6DOF multiagent inspection does not have the same rewards configured as CoRL's single agent 6DOF inspection.
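The first two bullets can be sketched as follows. Function names, the mass/timestep parameters, and the scale factor are illustrative, not the actual CoRL or gymnasium code; the point is that the two delta-v accountings disagree whenever anything other than thrust (e.g. orbital dynamics) changes the velocity:

```python
import numpy as np

# Sketch of the two delta-v accountings described above (names are
# illustrative). CoRL integrates commanded thrust; the gym/pettingzoo
# envs differenced velocity between steps.

def delta_v_from_thrust(thrust: np.ndarray, mass: float, dt: float) -> float:
    """Delta-v over one step derived from the commanded thrust vector."""
    return float(np.linalg.norm(thrust)) * dt / mass

def delta_v_from_velocity(v_prev: np.ndarray, v_next: np.ndarray) -> float:
    """Delta-v approximated as the change in velocity between steps."""
    return float(np.linalg.norm(v_next - v_prev))

def delta_v_reward(dv: float, scale: float = 1.0) -> float:
    """Per-step penalty; `scale` stands in for the CoRL reward scaling
    that the gymnasium/pettingzoo envs were missing."""
    return -scale * dv
```

With thrust-based accounting the agent is only charged for fuel it actually commands, which is the quantity the reward is meant to penalize.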

@JohnMcCarroll
Collaborator Author

> > Additional discrepancies found:
> >
> >   • weighted 6dof inspection: FOV and focal length changed to match CoRL configs
> >   • multiagent weighted 6dof inspection: the CoRL observation space differs from the single agent CoRL version. The current CoRL multiagent weighted 6dof inspection observation space is similar to weighted translational inspection, with the addition of orientation and angular velocity. The pettingzoo version of the multiagent 6dof inspection contains yet another interpretation of the observation space, similar to the gymnasium 6dof inspection version of the task.
> >   • safe-autonomy-simulation required an update to InspectionPointsSet's get_num_points_inspected method to support counting only the points inspected by a given deputy, plus a bug fix for get_total_weight_inspected: enable entity-specific calculation of inspected points + inspected points score safe-autonomy-simulation#28
>
> Point 1: sounds good
> Point 2: we need to align on obs spaces across environments. What work needs to be done to match the pettingzoo obs space with the CoRL obs space?
> Point 3: PR under review

Regarding Point 2:

To match the multiagent 6dof inspection pettingzoo obs space with the multiagent 6dof inspection CoRL obs space, the current observations would need to be stripped down to: position, velocity, num inspected points, uninspected points cluster, sun angle, priority vector, inspected points score, orientation quaternion, and angular velocity. Concretely, this involves:

  • removing the position magnorm, velocity magnorm, facing chief dot product, and position/uninspected points dot product components from the pettingzoo obs space
  • changing the Euler orientation to a quaternion representation
  • changing the relative position to the uninspected points cluster to the absolute position of the uninspected points cluster
  • changing the relative priority vector to the absolute priority vector

Given the significant differences in the obs spaces across the gymnasium single agent 6dof, CoRL single agent 6dof, pettingzoo multiagent 6dof, and CoRL multiagent 6dof environments, and given that the single agent CoRL 6dof obs space was the most mature, I decided to model both the gymnasium and pettingzoo 6dof obs spaces on the single agent CoRL 6dof environment. Do you think that was a mistake?
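A rough sketch of assembling the stripped-down observation described above, assuming Hamilton (w, x, y, z) quaternions and extrinsic x-y-z Euler angles. Component ordering, bounds, and all function names are illustrative, not the actual CoRL spec:

```python
import numpy as np

# Stripped-down obs (illustrative ordering, 22 values total): position (3),
# velocity (3), num inspected points (1), uninspected points cluster (3),
# sun angle (1), priority vector (3), inspected points score (1),
# orientation quaternion (4), angular velocity (3).

def _axis_quat(angle: float, axis: int) -> np.ndarray:
    """Quaternion (w, x, y, z) for a rotation of `angle` about axis 0/1/2."""
    q = np.zeros(4)
    q[0] = np.cos(angle / 2.0)
    q[1 + axis] = np.sin(angle / 2.0)
    return q

def _quat_mul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hamilton product of two (w, x, y, z) quaternions."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def euler_xyz_to_quat(roll: float, pitch: float, yaw: float) -> np.ndarray:
    """Replace the Euler orientation component with a unit quaternion
    (extrinsic x-y-z convention; the envs' actual convention may differ)."""
    return _quat_mul(_quat_mul(_axis_quat(yaw, 2), _axis_quat(pitch, 1)),
                     _axis_quat(roll, 0))

def assemble_obs(position, velocity, num_inspected, cluster_pos, sun_angle,
                 priority_vec, score, quat, angular_velocity) -> np.ndarray:
    """Concatenate the 22-element observation in the order listed above."""
    return np.concatenate([
        position, velocity, [num_inspected], cluster_pos, [sun_angle],
        priority_vec, [score], quat, angular_velocity,
    ])
```

Note the cluster position and priority vector enter as absolute quantities here, matching the change described above from the relative representations in the current pettingzoo obs.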

@JohnMcCarroll
Collaborator Author

> > TODO: onnx and onnxruntime need to be added as dependencies. I had trouble simply using poetry add as the packages were unable to be found.
>
> I was able to add these with poetry. They should be included in the project requirements now.

Thanks for doing that!

@JohnMcCarroll
Collaborator Author

> Point 1: This seems likely. Both implementations use np.random.choice for this method. We should get a better understanding of what's going on here.

Regarding Point 2:

Yes, that was for the CoRL multiagent environments. I believe Kyle tried and was not able to replicate the bug. I was able to get around the issue by creating unique agent.yml configs for each of the deputies in the episode.

@JohnMcCarroll
Collaborator Author

I made changes to the facing chief reward so that now all rewards and environments are aligned and all validation tests pass.

The only open question I have is regarding our handling of the multiagent 6dof inspection environment. Both the rewards and the obs space in CoRL are very different from their single agent counterparts. I have changed the observations of 6dof inspection pettingzoo to match the single agent CoRL 6dof inspection environment. I can also change the rewards to match the single agent version. This is the direction I would recommend for the environments, as Kochise has refined the single agent CoRL 6dof rewards and obs space to enable convergence when training with PPO. However, I understand I was tasked with aligning obs and rewards across environments. Let me know which version of the obs + rewards we want to use.

@jamie-cunningham
Collaborator

Regarding the discrepancies in the obs space for the multiagent environments I think we should go the route of aligning with the single agent environments. I think it makes more sense to have the user handle updating the obs space to converge to a solution. So I'm going to recommend keeping Kochise's observations in CoRL and in a separate issue look at moving the CoRL configs and related CoRL stuff to build from the gymnasium/pettingzoo envs. This way we can provide Kochise's observations in the CoRL configs rather than forcing the base environments to conform to them.

@JohnMcCarroll JohnMcCarroll merged commit 7a7f97f into main Jan 10, 2025
4 checks passed
Successfully merging this pull request may close these issues.

Align Gym and Pettingzoo environments with CoRL counterparts