-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PX IPv6 support #7224
Draft
swagner-de
wants to merge
10
commits into
master
Choose a base branch
from
px-enable-v6
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
PX IPv6 support #7224
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Contributor
swagner-de
commented
Oct 11, 2024
- Use StatefulSet instead of deployments
- Simplify PX charts: Region names not put all over all places
- Use different naming for statefulset
- [PX] Set bird as default container
- Remove metricRelabelings as the same labels are already created
- Support legacy config file path and new config file path
- Use different naming for statefulset
- Use 1 statefulset with replicas:2 instead of 2 statefulsets
- Let router-id be determined from init-network
- Add bird IPv6 statefulsets once an IPv6 network was supplied.
The deployment controller in case of `strategy: Recreate` terminates the old pod and creates the new pod already. Because of our static IP assignment via multus this would lead to duplicate addresses being used while the old pod is not fully terminated and the new pod's networking stack is already up. This did not pose a problem with IPv4 (aside from it being still undesirable) but IPv6's duplicate address detection will fail on the new pod, leading to it binding the address only tentatively. This is of course undesired behavior. Hence we make the switch to StatefulSets. Statefulsets make sure that there is only ever one instance of a StatefulSets ordinal. By definition they are anyhow more suiting to bird, as they maintain a stable identity and are tailored to stateful applications.
Historically(TM) the PX charts would reference the region at all places. ConfigMaps, Services, Ingress, Deployments, all of them were prefixed with the region name. This comes with very little benefit (operator generally should well understand to which region they are scoped) but with the disadvantage that the region must carried into all subcharts. This commit also rids all useless prefixes like `cfg-`. We know it's a ConfigMap, we don't need the prefix.
IPv6 is coming. So we need a new naming. Including the region in the statefulset name has bothered me for quite some time as it provided little value but instead bloats the pod name and requires the region to be passed to everything. The new naming now tells the operator what we are, a routeserver, which address family we serve, which service number we do, and in which domain we run. We end up with `routeserver-{{ $afi }}-service-{{ $service }}-domain-{{ $domain }}-{{ $instance }}`
Tried this out, turns out we have exact same labels already on the pod, so no need to parse the info out of the name here.
With the introduction of IPv6 we will also need a new v6 only config. That has to be encoded in the file name but also in the name of the configMap. Hence, we move the part that checks if the config is present out into a helper. In that helper, we try the new path, if it is present we return that path. If not we fall back to the legacy path. We also give the configMap a naming that is independent of the config used but basically equals the new config file name without the `.conf` suffix.
IPv6 is coming. So we need a new naming. Including the region in the statefulset name has bothered me for quite some time as it provided little value but instead bloats the pod name and requires the region to be passed to everything. The new naming now tells the operator what we are, a routeserver, which address family we serve, which service number we do, and in which domain we run. We end up with `routeserver-{{ $afi }}-service-{{ $service }}-domain-{{ $domain }}-{{ $instance }}`
The PX instances share everything, but the IP which is determined by the network attachment definition (NAD). So the idea is, to remove the IP from the NAD and merge the 2 statefulsets. That's what we did here: 1. Remove the IPAM part from the NAD. This leads to the pod coming up with the interface bound l2 wise, but no l3 config present. 2. We don't assign IP addresses anymore, we assume by convention that the first #replicas IPs in the PX peering network are determined for the route server. With this in mind, and the PX peering network known, we can calculate this pod's IP based on the statefulset ordinal. This IP can then be configured. All of this is done through the `init-network` container. It works for IPv4 and IPv6. 3. The looking glass needs to be able to reach each router-server individually. We did that earlier by having a service per statefulset. If we kept doing this, the looking glass would be load balanced between the statefulsets. Headless services [1] to the rescue! We add one service per helm release, that becomes a suffix in the pod's dns name [2]. We also backreference this service in the statefulset allowing us to address each pod of the statefulset by its stable name. [1] https://kubernetes.io/docs/concepts/services-networking/service/#headless-services [2] https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id
The router id is a 4 byte value that is written in IPv4 notation. Bird used the line `router id from "vlan*";` meaning it takes the IP address from an interface starting with vlan* as router id. With v6 that does not work anymore because the router id is still the same value. There are 2 options: 1. hardcode the router id in the config 2. Dynamically determine router id at runtime (1) has the disadvantage that we need a dedicated configmap per route server instance, meaning we would also back out from the plan that we run both router-server instances from a single statefulset as 2 configmaps need 2 statefulsets. It would also mean, that the configbuilder, which right now has no knowledge of the IP addresses bird is served from must get this information, likely by getting a secrets-repo reference or by it bying replicated. This is all not needed for (2), the only downside would be that some parts of the config cannot be validated upon building/merging/deploying it. We would create an additional config that is referenced from the main config that only contains a single line, the router id. As the router id will be the corrresponding router-servers v4 address we always feed the v4 address into the pod, even if it is a v6 pod. We let the init-network script calculate the link IP and the router id. It will generate a single line into a file that the bird config will include.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.