add a hostNetwork setting to the Driver and ControllerPlugin Specs #194

Merged: 2 commits, Feb 11, 2025

obnoxxx
Collaborator

obnoxxx commented Jan 13, 2025

This bool setting can be added to the controllerPlugin section of the Driver Spec.

It will be propagated to all controller plugin pods.

This implements the following design:

https://github.com/ceph/ceph-csi-operator/blob/main/docs/design/hostNetwork.md

See issue #157 for background and context.
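As a rough illustration of the propagation described above, here is a minimal Go sketch. The types `ControllerPluginSpec` and `PodSpec` and the helper `applyHostNetwork` are hypothetical stand-ins, not the operator's actual API types, and the use of a pointer for the bool (to tell "unset" apart from "false") is an assumption rather than something this PR description states.

```go
package main

import "fmt"

// ControllerPluginSpec is a hypothetical, trimmed-down stand-in for the
// controllerPlugin section of the Driver spec; only the new field is shown.
type ControllerPluginSpec struct {
	// HostNetwork is a pointer (an assumption) so that "unset" can be
	// distinguished from an explicit "false".
	HostNetwork *bool
}

// PodSpec is a hypothetical stand-in for the Kubernetes pod spec.
type PodSpec struct {
	HostNetwork bool
}

// applyHostNetwork copies the setting onto the pod spec and leaves the
// pod's existing value untouched when the field is unset.
func applyHostNetwork(spec ControllerPluginSpec, pod *PodSpec) {
	if spec.HostNetwork != nil {
		pod.HostNetwork = *spec.HostNetwork
	}
}

func main() {
	enabled := true
	pod := PodSpec{}
	applyHostNetwork(ControllerPluginSpec{HostNetwork: &enabled}, &pod)
	fmt.Println("hostNetwork:", pod.HostNetwork)
}
```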

Describe what this PR does

This is a work-in-progress draft PR that implements host networking for controller plugin pods, as described in https://github.com/ceph/ceph-csi-operator/blob/main/docs/design/hostNetwork.md

Is there anything that requires special attention

This needs thorough testing.

Is the change backward compatible?

It should be, but this needs testing.

Are there concerns around backward compatibility?

none right now


Related issues

Fixes: #157

Future concerns

none.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow
    guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes
    for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@obnoxxx
Collaborator Author

obnoxxx commented Jan 13, 2025

@nb-ohad, @Madhu-1, @rohan47: this is a replacement for the closed stale PR #176.

@obnoxxx
Collaborator Author

obnoxxx commented Feb 5, 2025

PR updated to address the port collisions that @rohan47 observed in testing.

I have built an image and pushed it to quay:

$ docker pull quay.io/madam/ceph-csi-operator:host-net

or:

$ podman pull quay.io/madam/ceph-csi-operator:host-net

@obnoxxx obnoxxx force-pushed the host-network branch 2 times, most recently from 6be915c to 1eb0b25 Compare February 6, 2025 09:44
@obnoxxx
Collaborator Author

obnoxxx commented Feb 6, 2025

@Madhu-1 : updated with names ControllerPluginCsiAddonsContainerPort and NodePluginCsiAddonsContainerPort. We can still add logic to differentiate between drivers as follow-up.

@obnoxxx
Collaborator Author

obnoxxx commented Feb 6, 2025

CI failures seem to be mostly due to problems with setup-go (network gateway timeouts).

@obnoxxx obnoxxx force-pushed the host-network branch 2 times, most recently from 8fb941f to 12052b9 Compare February 6, 2025 12:21
@rohan47

rohan47 commented Feb 6, 2025

Verified the changes. The controller plugin pods are coming up with hostNetwork and there is no port collision.
Checked by changing the settings in the operatorconfig as well as in the driver.

@obnoxxx obnoxxx requested review from Madhu-1 and rohan47 February 6, 2025 13:58
@Madhu-1
Collaborator

Madhu-1 commented Feb 6, 2025

Verified the changes. The controller plugin pods are coming up with hostNetwork and there is no port collision Checked with changing the settings in operatorconfig as well as driver

@rohan47, can you please share the oc get po -owide output, and also make sure that both the cephfs and rbd deployment pods run on the same node?

@obnoxxx
Collaborator Author

obnoxxx commented Feb 6, 2025

According to @rohan47, even with the latest build, port collision is happening between rbd and cephfs controller plugin pods.

I am going to update with a fix soon.
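For background on why such a collision occurs: with hostNetwork, pods share the node's network namespace, so two processes cannot both listen on the same TCP port of the same address. The following minimal Go sketch is not taken from the PR; it only reproduces that failure with the standard library, using the loopback address and a kernel-chosen port. The helper name `listenTwice` is made up for this illustration.

```go
package main

import (
	"fmt"
	"net"
)

// listenTwice binds a free TCP port on the loopback address, then tries to
// bind the very same address again and returns the error from that second
// attempt (nil if the second bind unexpectedly succeeds).
func listenTwice() error {
	// Port 0 asks the kernel to pick any free port.
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return fmt.Errorf("first listen failed unexpectedly: %w", err)
	}
	defer first.Close()

	// Same address and port while the first listener is still open.
	second, err := net.Listen("tcp", first.Addr().String())
	if err == nil {
		second.Close()
		return nil
	}
	return err
}

func main() {
	if err := listenTwice(); err != nil {
		fmt.Println("second bind failed:", err)
	} else {
		fmt.Println("second bind succeeded (unexpected)")
	}
}
```

Two sidecars configured with the same port behave like the two listeners here once they run on the same node with host networking.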

@obnoxxx
Collaborator Author

obnoxxx commented Feb 6, 2025

Updated to try and set different ports for rbd and cephfs drivers.

But I have a typo somewhere and could not even get it to compile yet ...

@obnoxxx
Collaborator Author

obnoxxx commented Feb 6, 2025

Updated, and now it builds.

Also pushed new images to quay.

@rohan47

rohan47 commented Feb 8, 2025

Tested the latest changes
Verified that the controllerplugin pods are using hostNetwork: true

openshift-storage.cephfs.csi.ceph.com-ctrlplugin-5c6c7698fpbr5d   7/7     Running   0          26m    10.0.43.133   ip-10-0-43-133.ec2.internal   <none>           <none>
openshift-storage.cephfs.csi.ceph.com-ctrlplugin-67cf97774hsfmj   0/7     Pending   0          17m    <none>        <none>                        <none>           <none>
openshift-storage.cephfs.csi.ceph.com-ctrlplugin-67cf97774nbrkn   7/7     Running   0          18m    10.0.9.50     ip-10-0-9-50.ec2.internal     <none>           <none>
openshift-storage.cephfs.csi.ceph.com-nodeplugin-dk4zf            3/3     Running   0          26m    10.0.43.133   ip-10-0-43-133.ec2.internal   <none>           <none>
openshift-storage.cephfs.csi.ceph.com-nodeplugin-h9zsw            3/3     Running   0          26m    10.0.46.43    ip-10-0-46-43.ec2.internal    <none>           <none>
openshift-storage.cephfs.csi.ceph.com-nodeplugin-sl7gx            3/3     Running   0          26m    10.0.45.77    ip-10-0-45-77.ec2.internal    <none>           <none>
openshift-storage.cephfs.csi.ceph.com-nodeplugin-vg7k8            3/3     Running   0          26m    10.0.9.50     ip-10-0-9-50.ec2.internal     <none>           <none>
openshift-storage.cephfs.csi.ceph.com-nodeplugin-vjtvk            3/3     Running   0          26m    10.0.35.91    ip-10-0-35-91.ec2.internal    <none>           <none>
openshift-storage.rbd.csi.ceph.com-ctrlplugin-5f498878f4-r5wsx    8/8     Running   0          26m    10.131.0.83   ip-10-0-9-50.ec2.internal     <none>           <none>
openshift-storage.rbd.csi.ceph.com-ctrlplugin-5f498878f4-ts9tz    8/8     Running   0          26m    10.129.2.45   ip-10-0-43-133.ec2.internal   <none>           <none>
openshift-storage.rbd.csi.ceph.com-ctrlplugin-6585b5b4d5-nwqk5    0/8     Pending   0          21m    <none>        <none>                        <none>           <none>
openshift-storage.rbd.csi.ceph.com-nodeplugin-94gf2               4/4     Running   0          25m    10.0.9.50     ip-10-0-9-50.ec2.internal     <none>           <none>
openshift-storage.rbd.csi.ceph.com-nodeplugin-9lngz               4/4     Running   0          25m    10.0.45.77    ip-10-0-45-77.ec2.internal    <none>           <none>
openshift-storage.rbd.csi.ceph.com-nodeplugin-dzdxv               4/4     Running   0          26m    10.0.43.133   ip-10-0-43-133.ec2.internal   <none>           <none>
openshift-storage.rbd.csi.ceph.com-nodeplugin-fd86t               4/4     Running   0          25m    10.0.35.91    ip-10-0-35-91.ec2.internal    <none>           <none>
openshift-storage.rbd.csi.ceph.com-nodeplugin-qqz6v               4/4     Running   0          25m    10.0.46.43    ip-10-0-46-43.ec2.internal    <none>           <none>

On the node

netstat -tlnp | grep csi
tcp6       0      0 :::9071                 :::*                    LISTEN      2371714/csi-addons-
tcp6       0      0 :::9080                 :::*                    LISTEN      2379152/csi-addons-
tcp6       0      0 :::10309                :::*                    LISTEN      3357/csi-node-drive
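The co-location check requested earlier in the thread (cephfs and rbd controller plugin pods on the same node) can also be scripted. The sketch below is a hypothetical helper, not part of the operator: it assumes header-less `oc get pods -o wide` lines with the usual column order (NAME, READY, STATUS, RESTARTS, AGE, IP, NODE, ...) and a single-token RESTARTS column, and it identifies drivers purely by the pod name patterns visible in the output above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sharedNodes returns, sorted, the nodes that run at least one Running
// cephfs controller plugin pod and one Running rbd controller plugin pod.
// Input is header-less `oc get pods -o wide` output, one pod per line.
func sharedNodes(lines []string) []string {
	cephfs := map[string]bool{}
	rbd := map[string]bool{}
	for _, line := range lines {
		f := strings.Fields(line)
		// Assumed columns: NAME READY STATUS RESTARTS AGE IP NODE ...
		if len(f) < 7 || f[2] != "Running" || !strings.Contains(f[0], "-ctrlplugin-") {
			continue
		}
		switch {
		case strings.Contains(f[0], ".cephfs.csi.ceph.com"):
			cephfs[f[6]] = true
		case strings.Contains(f[0], ".rbd.csi.ceph.com"):
			rbd[f[6]] = true
		}
	}
	var shared []string
	for node := range cephfs {
		if rbd[node] {
			shared = append(shared, node)
		}
	}
	sort.Strings(shared)
	return shared
}

func main() {
	// A few lines copied from the output above.
	lines := []string{
		"openshift-storage.cephfs.csi.ceph.com-ctrlplugin-5c6c7698fpbr5d   7/7     Running   0          26m    10.0.43.133   ip-10-0-43-133.ec2.internal   <none>           <none>",
		"openshift-storage.cephfs.csi.ceph.com-ctrlplugin-67cf97774hsfmj   0/7     Pending   0          17m    <none>        <none>                        <none>           <none>",
		"openshift-storage.cephfs.csi.ceph.com-nodeplugin-dk4zf            3/3     Running   0          26m    10.0.43.133   ip-10-0-43-133.ec2.internal   <none>           <none>",
		"openshift-storage.rbd.csi.ceph.com-ctrlplugin-5f498878f4-ts9tz    8/8     Running   0          26m    10.129.2.45   ip-10-0-43-133.ec2.internal   <none>           <none>",
	}
	fmt.Println(sharedNodes(lines))
}
```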

This bool setting can be added to the controllerPlugin section of the
Driver Spec.

It will be propagated to all controller plugin pods.

This implements the following design:

https://github.com/ceph/ceph-csi-operator/blob/main/docs/design/hostNetwork.md

Signed-off-by: Michael Adam <[email protected]>
@obnoxxx obnoxxx force-pushed the host-network branch 4 times, most recently from 8e96823 to c199901 Compare February 10, 2025 12:22
@obnoxxx
Collaborator Author

obnoxxx commented Feb 10, 2025

@Madhu-1 , @rohan47,

Review requests addressed, additional build fixes applied, new images pushed to quay.io.

Please re-review/re-test 😄

@obnoxxx obnoxxx requested a review from Madhu-1 February 10, 2025 12:31
@obnoxxx
Collaborator Author

obnoxxx commented Feb 10, 2025

@Madhu-1 : latest change requests addressed.
@rohan47 : image on quay updated.

@obnoxxx
Collaborator Author

obnoxxx commented Feb 10, 2025

Updated one more time with a build fix. Image pushed to quay.

@obnoxxx obnoxxx requested a review from Madhu-1 February 10, 2025 14:36
Madhu-1 previously approved these changes Feb 10, 2025
@obnoxxx
Collaborator Author

obnoxxx commented Feb 10, 2025

Updated one more time to fix a build/vet error.

Image updated on quay.

@rohan47

rohan47 commented Feb 10, 2025

Tested the latest changes

oc get pods -o wide | grep csi
ceph-csi-controller-manager-5f4c54494d-dkpnp                      2/2     Running   0          7m59s   10.131.1.38   ip-10-0-9-50.ec2.internal     <none>           <none>
csi-addons-controller-manager-7d9b586844-2fvwb                    2/2     Running   0          29m     10.131.1.27   ip-10-0-9-50.ec2.internal     <none>           <none>
openshift-storage.cephfs.csi.ceph.com-ctrlplugin-67cf977749b8mp   7/7     Running   0          104s    10.0.57.243   ip-10-0-57-243.ec2.internal   <none>           <none>
openshift-storage.cephfs.csi.ceph.com-ctrlplugin-67cf97774nnns8   7/7     Running   0          102s    10.0.9.50     ip-10-0-9-50.ec2.internal     <none>           <none>
openshift-storage.cephfs.csi.ceph.com-nodeplugin-48mgz            3/3     Running   0          6m58s   10.0.9.50     ip-10-0-9-50.ec2.internal     <none>           <none>
openshift-storage.cephfs.csi.ceph.com-nodeplugin-67gd8            3/3     Running   0          6m58s   10.0.35.91    ip-10-0-35-91.ec2.internal    <none>           <none>
openshift-storage.cephfs.csi.ceph.com-nodeplugin-7vq89            3/3     Running   0          6m58s   10.0.57.243   ip-10-0-57-243.ec2.internal   <none>           <none>
openshift-storage.cephfs.csi.ceph.com-nodeplugin-f8nqv            3/3     Running   0          6m58s   10.0.43.133   ip-10-0-43-133.ec2.internal   <none>           <none>
openshift-storage.cephfs.csi.ceph.com-nodeplugin-qqmgc            3/3     Running   0          6m58s   10.0.46.43    ip-10-0-46-43.ec2.internal    <none>           <none>
openshift-storage.cephfs.csi.ceph.com-nodeplugin-zqnvr            3/3     Running   0          6m58s   10.0.45.77    ip-10-0-45-77.ec2.internal    <none>           <none>
openshift-storage.rbd.csi.ceph.com-ctrlplugin-6585b5b4d5-4fdck    8/8     Running   0          101s    10.0.57.243   ip-10-0-57-243.ec2.internal   <none>           <none>
openshift-storage.rbd.csi.ceph.com-ctrlplugin-6585b5b4d5-6jcqn    8/8     Running   0          104s    10.0.9.50     ip-10-0-9-50.ec2.internal     <none>           <none>
openshift-storage.rbd.csi.ceph.com-nodeplugin-2d9mb               4/4     Running   0          6m58s   10.0.46.43    ip-10-0-46-43.ec2.internal    <none>           <none>
openshift-storage.rbd.csi.ceph.com-nodeplugin-2db86               4/4     Running   0          6m58s   10.0.45.77    ip-10-0-45-77.ec2.internal    <none>           <none>
openshift-storage.rbd.csi.ceph.com-nodeplugin-bk944               4/4     Running   0          6m58s   10.0.9.50     ip-10-0-9-50.ec2.internal     <none>           <none>
openshift-storage.rbd.csi.ceph.com-nodeplugin-d98gz               4/4     Running   0          6m58s   10.0.35.91    ip-10-0-35-91.ec2.internal    <none>           <none>
openshift-storage.rbd.csi.ceph.com-nodeplugin-ssbjk               4/4     Running   0          6m58s   10.0.43.133   ip-10-0-43-133.ec2.internal   <none>           <none>
openshift-storage.rbd.csi.ceph.com-nodeplugin-znqhd               4/4     Running   0          6m58s   10.0.57.243   ip-10-0-57-243.ec2.internal   <none>           <none>

On Node

netstat -tlnp | grep csi
tcp6       0      0 :::9070                 :::*                    LISTEN      269036/csi-addons-s
tcp6       0      0 :::9071                 :::*                    LISTEN      264633/csi-addons-s
tcp6       0      0 :::9080                 :::*                    LISTEN      269990/csi-addons-s
tcp6       0      0 :::10309                :::*                    LISTEN      3357/csi-node-drive

Using host network produces port collisions.

So we use different ports for controller plugin deployments and
node plugin daemonsets to avoid collisions.

We also make sure that rbd and cephfs drivers don't collide.

Signed-off-by: Michael Adam <[email protected]>
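The port scheme described in this commit message could look roughly like the Go sketch below. The constant names `ControllerPluginCsiAddonsContainerPort` and `NodePluginCsiAddonsContainerPort` come from the discussion above; everything else is an assumption made for illustration, including the numeric values, the per-driver offset approach, and the helper `csiAddonsPort`. The thread does not state which port belongs to which driver or component.

```go
package main

import "fmt"

// component distinguishes the controller plugin deployment from the
// node plugin daemonset.
type component int

const (
	controllerPlugin component = iota
	nodePlugin
)

// Base ports per component. The names follow the PR discussion; the
// numeric values are illustrative assumptions.
const (
	ControllerPluginCsiAddonsContainerPort = 9070
	NodePluginCsiAddonsContainerPort       = 9080
)

// driverOffsets gives each driver type its own slot so that rbd and cephfs
// pods on the same node do not share a port (hypothetical scheme).
var driverOffsets = map[string]int{
	"rbd":    0,
	"cephfs": 1,
}

// csiAddonsPort returns the port for a given driver type and component,
// or an error for a driver type without an assigned slot.
func csiAddonsPort(driverType string, c component) (int, error) {
	offset, ok := driverOffsets[driverType]
	if !ok {
		return 0, fmt.Errorf("no port slot for driver type %q", driverType)
	}
	base := ControllerPluginCsiAddonsContainerPort
	if c == nodePlugin {
		base = NodePluginCsiAddonsContainerPort
	}
	return base + offset, nil
}

func main() {
	for _, driver := range []string{"rbd", "cephfs"} {
		ctrl, _ := csiAddonsPort(driver, controllerPlugin)
		node, _ := csiAddonsPort(driver, nodePlugin)
		fmt.Printf("%s: controller=%d node=%d\n", driver, ctrl, node)
	}
}
```

A fixed offset per driver keeps every combination of driver and component on a distinct port, which is the property that matters once all of these pods share the host's network namespace.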
@Madhu-1 Madhu-1 merged commit 6ce9837 into ceph:main Feb 11, 2025
13 checks passed
@obnoxxx obnoxxx deleted the host-network branch February 11, 2025 09:13
Development

Successfully merging this pull request may close these issues.

Enforce host networking for Ceph CSI controller plugin pods
3 participants