network-agent netns mode #122

Open
chdxD1 opened this issue May 23, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

Member

chdxD1 commented May 23, 2024

Based on #121 (and the reason why we want to have the split).

This should be an alternative to the current mode, which I would call "vrf-ibgp" today. Either there will be two binaries, network-agent-netns and network-agent-vrf-ibgp, or a single network-agent with configuration flags. This is left to the implementor to decide.
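To illustrate the single-binary option, here is a minimal sketch of selecting the mode with a flag. The flag name `-mode`, its default, and the `parseMode` helper are assumptions made for illustration only; nothing here is decided or taken from the existing code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// parseMode reads a hypothetical -mode flag and validates it.
// The flag name and the default value are assumptions for this sketch.
func parseMode(args []string) (string, error) {
	fs := flag.NewFlagSet("network-agent", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the way; errors are returned
	mode := fs.String("mode", "vrf-ibgp", "agent mode: vrf-ibgp or netns")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch *mode {
	case "vrf-ibgp", "netns":
		return *mode, nil
	default:
		return "", fmt.Errorf("unknown mode %q", *mode)
	}
}

func main() {
	mode, err := parseMode(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("running in mode:", mode)
}
```

With two separate binaries the same switch would simply disappear, at the cost of duplicated startup code.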

The current network-operator architecture is based on some workarounds and is tightly integrated into the host. This makes it hard for other people to use the network-operator. It relies on VRFs that are connected to each other by veth interfaces.

Our goal is to move away from the veth interfaces between VRFs and to a traditional route-leaking setup.

The network-agent (in netns mode) would run in a network namespace / container (HBR container), completely separate from the Kubernetes side.

Currently the setup looks like this:
VRFs:
image
Layer2s:
image

With this in mind it would end up like this:
image

This will require changes to the FRR templates (which I can provide; they do not need to be implemented here) and to the netlink interface.

Looking at the image above, we focus on the two upper links: the veth between node/HBR and the veth Trunk.

We (for now) assume that there is an interface called hbn (attached to VRF hbr, also pre-created) inside the container and an interface called tr (not to be created by network-agent).

For VRFs network-agent must perform the following steps:

  • Create a VRF, vxlan interface and bridge interface (similar to today)
  • Skip creating a veth pair (different to today)
  • Configure FRR with different sets of templates (to be provided when implementation progresses)
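The difference between the two modes for VRFs can be sketched as a small planning function. This is only a model of the steps above using plain data, not the netlink implementation: the `Link` type, the `PlanVRF` function, and the `br.`, `vx.`, and `vd.` name prefixes are placeholders chosen for illustration.

```go
package main

import "fmt"

// Mode selects how a VRF is wired up.
type Mode string

const (
	ModeVRFIBGP Mode = "vrf-ibgp"
	ModeNetns   Mode = "netns"
)

// Link is a hypothetical description of one interface to create.
type Link struct {
	Kind   string // "vrf", "bridge", "vxlan" or "veth"
	Name   string
	Master string // empty when the link has no master
}

// PlanVRF lists the links for one VRF. The VRF, bridge and vxlan
// are created in both modes; the veth pair is only planned in
// vrf-ibgp mode and skipped in netns mode.
func PlanVRF(mode Mode, vrf string) []Link {
	bridge := "br." + vrf // placeholder naming scheme
	links := []Link{
		{Kind: "vrf", Name: vrf},
		{Kind: "bridge", Name: bridge, Master: vrf},
		{Kind: "vxlan", Name: "vx." + vrf, Master: bridge},
	}
	if mode == ModeVRFIBGP {
		links = append(links, Link{Kind: "veth", Name: "vd." + vrf, Master: vrf})
	}
	return links
}

func main() {
	for _, mode := range []Mode{ModeVRFIBGP, ModeNetns} {
		fmt.Printf("mode %s:\n", mode)
		for _, l := range PlanVRF(mode, "red") {
			fmt.Printf("  %-6s %-8s master=%q\n", l.Kind, l.Name, l.Master)
		}
	}
}
```

Keeping the plan as data would also make the shared part (VRF, vxlan, bridge) easy to unit test without touching the kernel. FRR template selection is left out here.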

For Layer2s network-agent must perform the following steps:

  • Create a vxlan interface and bridge interface (similar to today), attach them to VRF hbr (default / main VRF) or a different VRF from above.
  • Create an interface of type vlan which references the interface tr (name something like tr.) and set the master of this interface to the bridge created before
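The Layer2 steps above could be modelled the same way. Again this is a sketch under assumptions: the `Layer2` and `Link` types, the `l2.` and `vx.` prefixes, and especially the use of the VLAN ID as the suffix of the `tr.` sub-interface are illustrative guesses, since the naming is not fixed in this issue. The sketch also assumes the vxlan is enslaved to the bridge and the bridge to the VRF.

```go
package main

import "fmt"

const (
	defaultVRF = "hbr" // pre-created default / main VRF inside the container
	trunkName  = "tr"  // pre-created trunk interface, not created by network-agent
)

// Layer2 is a hypothetical input: a VLAN ID and an optional VRF.
type Layer2 struct {
	VlanID int
	VRF    string // empty means the default VRF hbr
}

// Link is a hypothetical description of one interface to create.
type Link struct {
	Kind   string // "bridge", "vxlan" or "vlan"
	Name   string
	Master string
	Parent string // lower device, only set for vlan links
	VlanID int    // only set for vlan links
}

// PlanLayer2 lists the links for one Layer2 network in netns mode.
func PlanLayer2(l2 Layer2) []Link {
	vrf := l2.VRF
	if vrf == "" {
		vrf = defaultVRF
	}
	bridge := fmt.Sprintf("l2.%d", l2.VlanID) // placeholder naming scheme
	return []Link{
		{Kind: "bridge", Name: bridge, Master: vrf},
		{Kind: "vxlan", Name: fmt.Sprintf("vx.%d", l2.VlanID), Master: bridge},
		// vlan sub-interface on tr, with its master set to the bridge above.
		// Using the VLAN ID as the name suffix is an assumption.
		{Kind: "vlan", Name: fmt.Sprintf("%s.%d", trunkName, l2.VlanID),
			Master: bridge, Parent: trunkName, VlanID: l2.VlanID},
	}
}

func main() {
	for _, l := range PlanLayer2(Layer2{VlanID: 100}) {
		fmt.Printf("%-6s %-8s master=%-6s parent=%q\n", l.Kind, l.Name, l.Master, l.Parent)
	}
}
```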

For Layer2 there is an additional step, and I am unsure where it should live. There might be a need for a very small network-agent-host that creates the tr. interfaces on the host network namespace side, see picture above.

@chdxD1 chdxD1 added the enhancement New feature or request label May 23, 2024
@p-strusiewiczsurmacki-mobica
Contributor

attach them to VRF hbr (default / main VRF) or a different VRF from above

How should it be decided which VRF to select?

Contributor

p-strusiewiczsurmacki-mobica commented Jun 4, 2024

@chdxD1
Hi, I've created a branch with some changes here: https://github.com/p-strusiewiczsurmacki-mobica/das-schiff-network-operator/blob/agent-gradual-netns (it also contains changes relevant to #110, #112, and #121).

I've moved the code that I think will be common to the netns and vrf-ibgp modes into pkg/adapters/netlink.

For netns mode - in pkg/adapters/netns:

For VRFs (Layer 3):

For Layer2:

Currently I'm trying to figure out how to test this.

I also have some questions:

  • How are the tr.<number> interface and the bridge master determined? Should they be specified in Layer2NetworkConfiguration, or should they be created dynamically?

  • Should all the code that is used for reconciling existing Layer 2 configurations (https://github.com/telekom/das-schiff-network-operator/blob/main/pkg/nl/layer2.go#L250) also be used here? I can't see much connection between creating the interfaces described above and reconciling existing configurations, or cleaning up when a configuration is deleted, so I may be missing something here.
