Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ocm-machine-pools] nodepool subnet specdrift #4140

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

geoberle
Copy link
Contributor

@geoberle geoberle commented Mar 1, 2024

specifying the subnet IDs of HCP nodepools in the clusters file upfront the cluster installation does not work well with how the rosa cli creates nodepools during cluster creation. there is no way to influence nodepool subnet placement at this stage. so it makes sense to not specify the subnets of HCP nodepools in app-interface during cluster creation

BUT

we can patch them in the cluster file once they are known!

this PR opens an MR to add the subnet IDs to the cluster file if it is not defined yet.

bonus: this PR will also ignore clusters that are not ready yet for machine pool management (spec.id not set yet). this avoidsocm-machine-pool reconcile errors during cluster provisioning.

reminder:

  • when a nodepool does not exist yet in OCM, app-interface is the source of truth for the subnet
  • once a nodepool exists in OCM, the subnet ID can't be changed anymore and OCM acts as the source of truth (the existing code reflects that by not allowing a subnet ID change on existing nodepools)

demo MR: https://gitlab.cee.redhat.com/goberlec/app-interface/-/merge_requests/12/diffs

part of https://issues.redhat.com/browse/APPSRE-9883

or self.subnet != pool.subnet
# if the nodepool in app-interface does not define a subnet explicitely
# we don't consider it a diff. we don't manage it
or (pool.subnet is not None and self.subnet != pool.subnet)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this means we'll have non matching data between app-interface and the reality ?

Copy link
Contributor Author

@geoberle geoberle Mar 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added the promised phase 2 to this MR in the to open an app-interface MR to patch the missing subnets. this way we don't have non-matching data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean we should never apply subnet from app-interface to ocm? When app-interface pool subnet is not None and we have a different subnet value in current pool, this condition will be true, and show as diff, is this expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can and we should specify subnets for day2 node pool operations. but at installtime we can't influence it and we end up with a set of pools where ROSA decides how to spread them accross the available private subnets.

we can improve on this behaviour once there is the ability to create HCP clusters without node pools at install time. becaues then we wait until the cluster is ready and ocm-machine-pools can create the pools in the explicitely defined subnets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean we should never apply subnet from app-interface to ocm? When app-interface pool subnet is not None and we have a different subnet value in current pool, this condition will be true, and show as diff, is this expected?

once a node pool has been created, its subnet ID can't be changed anymore. at this point OCM becomes the source of truth for it. we implemented the notion of an invalid_diff to highlight the problem, so it is expected that has_diff shows a diff in subnet between a-i and OCM.

So for the use case of nodepools have been recreated on OCM side without app-interface involvement, this diff will show in has_diff but will fail invalid_diff check, the integration will fail until patch mr merged, is this expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it still makes us aware and we need to intervene. we should not do things manually in OCM.

reconcile/ocm_machine_pools.py Outdated Show resolved Hide resolved
or self.subnet != pool.subnet
# if the nodepool in app-interface does not define a subnet explicitely
# we don't consider it a diff. we don't manage it
or (pool.subnet is not None and self.subnet != pool.subnet)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean we should never apply subnet from app-interface to ocm? When app-interface pool subnet is not None and we have a different subnet value in current pool, this condition will be true, and show as diff, is this expected?

reconcile/ocm_machine_pools.py Outdated Show resolved Hide resolved
reconcile/utils/mr/cluster_machine_pool_updates.py Outdated Show resolved Hide resolved
reconcile/utils/mr/cluster_machine_pool_updates.py Outdated Show resolved Hide resolved
reconcile/utils/mr/cluster_machine_pool_updates.py Outdated Show resolved Hide resolved
@geoberle geoberle force-pushed the ignore-undefined-subnet-on-nodepool branch from f5e21ce to 03af593 Compare March 4, 2024 10:17
@geoberle
Copy link
Contributor Author

geoberle commented Mar 4, 2024

Does that mean we should never apply subnet from app-interface to ocm? When app-interface pool subnet is not None and we have a different subnet value in current pool, this condition will be true, and show as diff, is this expected?

once a node pool has been created, its subnet ID can't be changed anymore. at this point OCM becomes the source of truth for it. we implemented the notion of an invalid_diff to highlight the problem, so it is expected that has_diff shows a diff in subnet between a-i and OCM.

@geoberle geoberle force-pushed the ignore-undefined-subnet-on-nodepool branch from 97868ab to 2c1b5c4 Compare March 4, 2024 12:59
Copy link
Contributor

@hemslo hemslo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@geoberle geoberle changed the title [ocm-machine-pools] ignore missing subnet definitions for nodepools [ocm-machine-pools] nodepool subnet specdrift Mar 5, 2024
defining the subnet id of a `machinePool` is optional in app-interface. if
such a machinepool maps to a nodepool of an HCP, we don't highlight it
as a diff and don't manage this property.

Signed-off-by: Gerd Oberlechner <[email protected]>
Signed-off-by: Gerd Oberlechner <[email protected]>
Signed-off-by: Gerd Oberlechner <[email protected]>
Signed-off-by: Gerd Oberlechner <[email protected]>
Signed-off-by: Gerd Oberlechner <[email protected]>
Signed-off-by: Gerd Oberlechner <[email protected]>
Signed-off-by: Gerd Oberlechner <[email protected]>
@geoberle geoberle force-pushed the ignore-undefined-subnet-on-nodepool branch from 2c1b5c4 to 9eb593a Compare March 5, 2024 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants