40 align gym and pettingzoo environments with corl counterparts #41

JohnMcCarroll · 2024-11-23T21:32:20Z

Address issue #40

JohnMcCarroll · 2024-12-03T19:35:09Z

Additional observations:

kmeans_find_nearest_cluster still behaves stochastically. Presumably this stems from the use of np.random.choice: https://github.com/act3-ace/safe-autonomy-sims/blob/main/safe_autonomy_sims/simulators/inspection_simulator.py#L266
Using default configs, some multiagent tasks suffered from a bug where all agents received the same initial conditions. This caused the CollisionDoneFunction to trigger on the first step of every episode.
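
If the stochastic behaviour does come from np.random.choice, one way to confirm it is to route the draw through a seeded generator and check that repeated calls agree. The following is a minimal sketch under that assumption; pick_initial_centroids is a hypothetical stand-in for illustration, not the actual kmeans_find_nearest_cluster code.

```python
import numpy as np


def pick_initial_centroids(points, k, rng=None):
    """Pick k distinct rows of `points` as starting centroids (hypothetical helper)."""
    if rng is None:
        # Unseeded path: draws from NumPy's global random state.
        idx = np.random.choice(len(points), size=k, replace=False)
    else:
        # Seeded path: a dedicated Generator gives repeatable draws.
        idx = rng.choice(len(points), size=k, replace=False)
    return points[idx]


# Ten illustrative 3D points.
points = np.arange(30, dtype=float).reshape(10, 3)

# Two calls with identically seeded generators select the same centroids.
a = pick_initial_centroids(points, 3, rng=np.random.default_rng(0))
b = pick_initial_centroids(points, 3, rng=np.random.default_rng(0))
same = np.array_equal(a, b)
```

Passing the generator in explicitly, rather than seeding the global state, keeps the test's randomness isolated from any other code that also draws random numbers.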

jamie-cunningham · 2024-12-12T19:29:01Z

Additional discrepancies found:

weighted 6dof inspection: fov and focal length changed to match CoRL configs

multiagent weighted 6dof inspection: the CoRL observation space differs from the single agent CoRL version. The current CoRL multiagent weighted 6dof inspection observation space is similar to weighted translational inspection, with the addition of orientation and angular velocity. The pettingzoo version of the multiagent 6dof inspection contains yet another interpretation of the observation space, similar to the gymnasium 6dof inspection version of the task.

safe-autonomy-simulation required an update to InspectionPointsSet's get_num_points_inspected method so that it can count only the points inspected by a given deputy, plus a bug fix for get_total_weight_inspected. See: enable entity-specific calculation of inspected points + inspected points score (safe-autonomy-simulation#28)

Point 1: sounds good
Point 2: we need to align on obs spaces across environments. What work needs to be done to match the pettingzoo obs space with the CoRL obs space?
Point 3: PR under review
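
For context on the third item, the idea of entity-specific counting can be sketched as below. This is an illustration only: the Point class and the count_inspected and weight_inspected helpers are hypothetical and are not the safe-autonomy-simulation API or the code in safe-autonomy-simulation#28.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Point:
    """Hypothetical inspection point: a weight plus who inspected it, if anyone."""
    weight: float
    inspected: bool = False
    inspector: Optional[str] = None


def count_inspected(points: Sequence[Point], inspector: Optional[str] = None) -> int:
    """Count inspected points, optionally only those credited to `inspector`."""
    return sum(
        1 for p in points
        if p.inspected and (inspector is None or p.inspector == inspector)
    )


def weight_inspected(points: Sequence[Point], inspector: Optional[str] = None) -> float:
    """Sum the weights of inspected points, optionally only for `inspector`."""
    return sum(
        p.weight for p in points
        if p.inspected and (inspector is None or p.inspector == inspector)
    )


points = [
    Point(0.5, True, "deputy_0"),
    Point(0.25, True, "deputy_1"),
    Point(0.25),  # not yet inspected
]
total = count_inspected(points)              # all deputies combined
mine = count_inspected(points, "deputy_0")   # a single deputy's contribution
```

With a per-deputy filter like this, each agent's reward and observation can be based on its own inspected points rather than the shared total.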

jamie-cunningham · 2024-12-12T19:29:36Z

TODO: onnx and onnxruntime need to be added as dependencies. I had trouble simply using poetry add as the packages were unable to be found.

I was able to add these with poetry. They should be included in the project requirements now.

jamie-cunningham · 2024-12-12T19:34:51Z

Additional observations:

kmeans_find_nearest_cluster still behaves stochastically. Presumably this stems from the use of "np.random.choice" https://github.com/act3-ace/safe-autonomy-sims/blob/main/safe_autonomy_sims/simulators/inspection_simulator.py#L266

using default configs, some multiagent tasks suffered from a bug where all agents received the same initial conditions. This led to the CollisionDoneFunction triggering on the first step of every episode.

Point 1: This seems likely. Both implementations use np.random.choice for this method. We should get a better understanding of why/what's going on here.
Point 2: Is this for the CoRL envs?

safe_autonomy_sims/gym/inspection/inspection_v0.py

safe_autonomy_sims/gym/inspection/reward.py

safe_autonomy_sims/gym/inspection/sixdof_inspection_v0.py

safe_autonomy_sims/gym/inspection/weighted_inspection_v0.py

safe_autonomy_sims/pettingzoo/inspection/multi_inspection_v0.py

safe_autonomy_sims/pettingzoo/inspection/weighted_multi_inspection_v0.py

JohnMcCarroll · 2024-12-13T18:53:29Z

Additional discrepancies found:

DeltaV functions in CoRL use thrust while gymnasium/pettingzoo environments used velocity.
CoRL environments implement reward scaling for the DeltaV reward and WinLoseDoneStateReward, while gymnasium/pettingzoo environments do not (this has not been addressed yet).
Initial inspected points (on reset) were not counted towards reward in gymnasium/pettingzoo envs.
No MaxDistanceReward was present in 6DOF inspection gymnasium/pettingzoo envs.
6DOF inspection's FacingChiefReward is not aligned with CoRL's facing chief reward (this has not been addressed yet).
CoRL's 6DOF multiagent inspection does not have the same rewards configured as CoRL's single agent 6DOF inspection.
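
To illustrate the first two items, here is a minimal sketch of a delta-v reward computed from commanded thrust and multiplied by a scale factor. The formula (sum of absolute thrust components divided by mass, times the step size), the function names, and the default mass, step size, and scale are all assumptions for illustration, not values confirmed from the CoRL configs.

```python
def delta_v_from_thrust(thrust, mass, step_size):
    """Approximate delta-v for one step from commanded thrust components.

    Assumed formula: (|Fx| + |Fy| + |Fz|) / mass * step_size.
    """
    return sum(abs(f) for f in thrust) / mass * step_size


def delta_v_reward(thrust, mass=12.0, step_size=1.0, scale=-0.01):
    """Scaled delta-v penalty; mass, step_size, and scale are placeholder values."""
    return scale * delta_v_from_thrust(thrust, mass, step_size)


# 2.5 N total thrust on a 10 kg vehicle over a 2 s step gives 0.5 m/s of delta-v.
r = delta_v_reward((1.0, -1.0, 0.5), mass=10.0, step_size=2.0, scale=-0.1)
```

Computing the penalty from thrust rather than from the change in velocity means only fuel actually spent is penalised, not velocity changes caused by the orbital dynamics.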

JohnMcCarroll · 2024-12-17T18:56:56Z

Additional discrepancies found:

weighted 6dof inspection: fov and focal length changed to match CoRL configs

multiagent weighted 6dof inspection: the CoRL observation space differs from the single agent CoRL version. The current CoRL multiagent weighted 6dof inspection observation space is similar to weighted translational inspection, with the addition of orientation and angular velocity. The pettingzoo version of the multiagent 6dof inspection contains yet another interpretation of the observation space, similar to the gymnasium 6dof inspection version of the task.

safe-autonomy-simulation required an update to InspectionPointsSet's get_num_points_inspected method so that it can count only the points inspected by a given deputy, plus a bug fix for get_total_weight_inspected. See: enable entity-specific calculation of inspected points + inspected points score (safe-autonomy-simulation#28)

Point 1: sounds good
Point 2: we need to align on obs spaces across environments. What work needs to be done to match the pettingzoo obs space with the CoRL obs space?
Point 3: PR under review

Regarding Point 2:

In order to match the multiagent 6dof inspection pettingzoo obs space with the multiagent 6dof inspection CoRL obs space, the current observations would need to be stripped down to include: position, velocity, num inspected points, uninspected points cluster, sun angle, priority vector, inspected points score, orientation quaternion, and angular velocity. This involves removing the position magnorm, velocity magnorm, facing chief dot product, and the position/uninspected points dot product components from the pettingzoo obs space. The Euler orientation would need to be changed to a quaternion representation. The relative position to the uninspected points cluster would need to be changed to the absolute position of the uninspected points cluster. The relative priority vector would need to be changed to the absolute priority vector.
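
Two of the conversions described above can be sketched as follows. This is an illustration under stated assumptions: the Euler angles are taken as a ZYX (yaw-pitch-roll) sequence in radians, the quaternion is ordered (w, x, y, z), and the relative vector is assumed to be expressed in the same frame as the deputy position. The actual conventions in the environments may differ, and both function names are hypothetical.

```python
import math


def euler_to_quaternion(roll, pitch, yaw):
    """Convert ZYX (yaw-pitch-roll) Euler angles in radians to a (w, x, y, z) quaternion."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (
        cr * cp * cy + sr * sp * sy,  # w
        sr * cp * cy - cr * sp * sy,  # x
        cr * sp * cy + sr * cp * sy,  # y
        cr * cp * sy - sr * sp * cy,  # z
    )


def relative_to_absolute(deputy_position, relative_vector):
    """Recover an absolute position from a deputy-relative vector (same frame assumed)."""
    return tuple(d + r for d, r in zip(deputy_position, relative_vector))


identity = euler_to_quaternion(0.0, 0.0, 0.0)       # no rotation
half_turn = euler_to_quaternion(0.0, 0.0, math.pi)  # 180 degrees about z
cluster = relative_to_absolute((1.0, 2.0, 3.0), (0.5, -2.0, 1.0))
```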

Given that there were significant differences in the obs spaces for the gymnasium single agent 6dof, CoRL single agent 6dof, pettingzoo multiagent 6dof, and CoRL multiagent 6dof environments (and given that the single agent CoRL 6dof obs space was the most mature) I decided to model both the gymnasium and pettingzoo 6dof obs off of the single agent CoRL 6dof environment. Do you think that was a mistake?

JohnMcCarroll · 2024-12-17T18:57:55Z

TODO: onnx and onnxruntime need to be added as dependencies. I had trouble simply using poetry add as the packages were unable to be found.

I was able to add these with poetry. They should be included in the project requirements now.

Thanks for doing that!

JohnMcCarroll · 2024-12-17T19:07:31Z

Point 1: This seems likely. Both implementations use np.random.choice for this method. We should get a better understanding of why/what's going on here.

Regarding Point 2:

Yes that was for the CoRL multiagent environments. I believe Kyle tried and was not able to replicate the bug. I was able to get around the issue by creating unique agent.yml configs for each of the deputies in the episode.
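
The requirement behind that workaround, that each deputy starts from its own initial conditions and far enough apart to avoid an immediate collision, can be sketched in plain Python. This does not reflect how the CoRL agent configs work; sample_initial_positions, the radius range, and the minimum separation are hypothetical values chosen for illustration.

```python
import math
import random


def sample_initial_positions(agent_ids, seed, radius_range=(50.0, 100.0),
                             min_separation=10.0):
    """Sample one distinct start position per agent, rejecting draws that are too close."""
    rng = random.Random(seed)
    positions = {}
    for agent_id in agent_ids:
        while True:
            # Draw a position in spherical coordinates around the chief at the origin.
            r = rng.uniform(*radius_range)
            az = rng.uniform(0.0, 2.0 * math.pi)
            el = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
            candidate = (
                r * math.cos(el) * math.cos(az),
                r * math.cos(el) * math.sin(az),
                r * math.sin(el),
            )
            # Accept only if far enough from every position already assigned.
            if all(math.dist(candidate, p) >= min_separation
                   for p in positions.values()):
                positions[agent_id] = candidate
                break
    return positions


starts = sample_initial_positions(["deputy_0", "deputy_1", "deputy_2"], seed=7)
```

Drawing every agent's position from one seeded stream keeps episodes reproducible while still giving each deputy a different starting state.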

JohnMcCarroll · 2024-12-20T19:55:19Z

I made changes to the facing chief reward so that now all rewards and environments are aligned and all validation tests pass.

The only open question I have is regarding our handling of the multiagent 6dof inspection environment. Both the rewards and obs space in CoRL differ substantially from their single agent counterparts. I have changed the observations of 6dof inspection pettingzoo to match the single agent CoRL 6dof inspection environment. I can also change the rewards to match the single agent version. This is the direction I would recommend for the environments, as Kochise has refined the single agent CoRL 6dof rewards and obs space to enable convergence when training with PPO. However, I understand I was tasked with aligning obs and rewards across environments. Let me know which version of the obs + reward we want to use.

jamie-cunningham · 2025-01-09T18:50:08Z

Regarding the discrepancies in the obs space for the multiagent environments, I think we should go the route of aligning with the single agent environments. I think it makes more sense to have the user handle updating the obs space to converge to a solution. So I'm going to recommend keeping Kochise's observations in CoRL and, in a separate issue, looking at moving the CoRL configs and related CoRL code to build from the gymnasium/pettingzoo envs. This way we can provide Kochise's observations in the CoRL configs rather than forcing the base environments to conform to them.

John McCarroll and others added 20 commits November 4, 2024 16:59

framework for docking gymnasium validation test

a6714e5

aligned ICs and episode lengths

2dd8b63

added tolerances for rounding errors

757a31c

pushing scripts to generate test artifacts

da813b2

inspection_v0 test running

bce02b4

inspection_v0 validation test passes

f91e76f

weighted inspection test passes

3180afb

test for multidocking_v0

3d4207c

sixdof inspection gym validation test

976960e

Test passes

dbb1c88

added multiagent translational inspection validation test

75439e2

multiagent weighted inspection test

734c340

multiagent 6dof test

19b938a

update 6dof test

e7ced19

set up weighted 6dof inspection validation test

ebf654d

Merge branch 'main' into 23-validate-gympettingzoo-environments-against-corl

4bd370a

back merged branch 23

042af6d

Merge pull request #39 from act3-ace/validate-multidocking-env

5ad14db

Validate multidocking env

clean up

2364aaa

remove prints

64e713d

JohnMcCarroll requested a review from jamie-cunningham November 23, 2024 21:32

JohnMcCarroll self-assigned this Nov 23, 2024

JohnMcCarroll linked an issue Nov 23, 2024 that may be closed by this pull request

Align Gym and Pettingzoo environments with CoRL counterparts #40

Closed

John McCarroll added 7 commits November 23, 2024 17:24

changed gym/zoo frame rates, inspected points radius, zoo inspected points observations, and zoo inspected points score observations. Added collision terminal condition and status to zoo environments.

62a2e78

adjust collision radius for docking

cbc33d8

first pass at replicating corl 6dof obs

4c477b5

fixed collision dones, updated weighted multi translational inspection test

bc70a8f

updated multi translational inspection test

13e387f

updated camera sensor fov + focal length for 6dof inspection, got 6dof inspection validation test passing (w rounding error caveat)

828b9ab

changed fov + focal length for multiagent inspection 6dof, added seeding to tests, updated translational inspection test

42258ad

code clean up

256c079

John McCarroll and others added 10 commits December 4, 2024 18:52

updated docking delta_v reward

9d163cf

added rewards validation for docking gymnasium, changed delta_v function to use thrust, fixed reward component dict bug in info dict

77c454c

fixed reward component bug

85c0026

updated inspection delta_v function to use thrust, changed scale of inspected points reward, enabled initial inspected points to count towards reward

b4c8228

updated weighted inspection rewards

154d23c

count initial inspected points towards rewards, updated some rewards of 6dof inspection

8e90bf4

added rewards to multi docking test, fixed info dicts, fixed delta_v, live_timestep, and docking timeout rewards

3ad063a

fixed 6dof inspection delta v reward

b4cdb43

updated all tests to include rewards, added max distance reward to 6dof pettingzoo, updated delta_v (rewards)

de2bc40

add onnx to project dependencies

4ec5c3c

jamie-cunningham requested changes Dec 12, 2024

John McCarroll added 2 commits December 17, 2024 14:51

changed angle representation in _init_sim

cf0b99c

updated docstrings

b2b48a4

JohnMcCarroll requested a review from jamie-cunningham December 17, 2024 20:27

John McCarroll added 2 commits December 19, 2024 16:59

fixed max time for livetimestep reward, aligned facing chief reward with corl, 6dof test passes

da2591d

added tolerance gradient to gym 6dof inspection test, removed extra rewards from pettingzoo 6dof inspection

9cc94d3

jamie-cunningham approved these changes Jan 9, 2025

aligned rewards and added baseline observations to single and multi agent 6dof inspection environments

a671e3f

JohnMcCarroll merged commit 7a7f97f into main Jan 10, 2025
4 checks passed
