
Split controller and node binaries #95

Open

wants to merge 1 commit into main

Conversation

jarrpa
Contributor

jarrpa commented Nov 12, 2018

This PR splits the controller and node CSI roles into two separate binaries. Driver tests are not currently implemented as the CSI sanity test suite does not yet support split driver roles (PR submitted for that: kubernetes-csi/csi-test#135).

Signed-off-by: Jose A. Rivera [email protected]
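
For illustration of the split being proposed, here is a minimal sketch of what a controller-role binary could look like, assuming the CSI Go bindings' csi.Unimplemented* embedding helpers and using placeholder server types and an example socket path; it is not this PR's actual code. The node-role binary would register the Node service instead of the Controller service.

```go
// Illustrative sketch only: a controller-role binary that serves just the
// Identity and Controller CSI services. Types, names, and the socket path
// are placeholders, not the PR's real code.
package main

import (
	"log"
	"net"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
)

// Stub servers; the real driver logic would live behind these.
type identityServer struct{ csi.UnimplementedIdentityServer }
type controllerServer struct{ csi.UnimplementedControllerServer }

func main() {
	// Example endpoint; real deployments would pass this in via a flag or env var.
	lis, err := net.Listen("unix", "/csi/csi.sock")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}

	srv := grpc.NewServer()
	csi.RegisterIdentityServer(srv, &identityServer{})
	csi.RegisterControllerServer(srv, &controllerServer{})
	// The node binary would instead call:
	//   csi.RegisterNodeServer(srv, &nodeServer{})

	log.Fatal(srv.Serve(lis))
}
```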

ghost assigned jarrpa Nov 12, 2018
ghost added the in progress label Nov 12, 2018
@centos-ci
Collaborator

Can one of the admins verify this patch?

@Madhu-1
Member

Madhu-1 commented Nov 12, 2018

add to whitelist

@Madhu-1
Member

Madhu-1 commented Nov 12, 2018

retest this please

@Madhu-1
Member

Madhu-1 commented Nov 12, 2018

@jarrpa can you give some more context on why we need two separate binaries?

@JohnStrunk
Member

After taking a quick look, I think this is going to need a design doc and significant discussion. Perhaps I'm misunderstanding things like the client directory, but it looks like there's code in there that enumerates all PVs in a kube cluster. And there's also some sort of cache of servers and usernames?

This seems like more than "split controller and node binaries"

@jarrpa
Contributor Author

jarrpa commented Nov 12, 2018

@Madhu-1 This allows logic isolation of the different roles, and is a precursor to bringing in different types of Controllers, such as for heketi and gluster-block.

@JohnStrunk Yes. The reason for the cache is that, when not using the default REST parameters, there is no way to communicate with the target Gluster cluster from a DeleteVolume request. Its presence in this PR is the result of me accidentally squashing two patches together: I did the REST changes first and the split second. I'll try to untangle them after the fact.
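
To make the DeleteVolume problem concrete: the request carries little more than a volume ID, so a volume created against non-default REST parameters has to be remembered somewhere. A rough sketch of that kind of cache, with illustrative names rather than this PR's actual types:

```go
// Illustrative sketch of a volume -> REST-connection cache. DeleteVolume only
// receives a volume ID, so the controller records at CreateVolume time which
// Gluster management endpoint and user the volume belongs to.
package glusterfs

import "sync"

// restConn holds the non-default REST parameters used to create a volume.
type restConn struct {
	Servers  []string // Gluster management endpoints
	Username string
}

type connCache struct {
	mu    sync.RWMutex
	conns map[string]restConn // keyed by CSI volume ID
}

func newConnCache() *connCache {
	return &connCache{conns: make(map[string]restConn)}
}

// put records the connection used when the volume was provisioned.
func (c *connCache) put(volID string, conn restConn) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.conns[volID] = conn
}

// get looks the connection back up when DeleteVolume arrives.
func (c *connCache) get(volID string) (restConn, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	conn, ok := c.conns[volID]
	return conn, ok
}
```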

@jarrpa
Contributor Author

jarrpa commented Nov 12, 2018

@JohnStrunk Patches untangled, PTAL

@jarrpa
Contributor Author

jarrpa commented Nov 13, 2018

I'll also note that I do have a local version of driver_test.go that works against a modified csi-test repo and passes. If the csi-test PR merges, I'll update this PR.

@JohnStrunk
Member

> @Madhu-1 This allows logic isolation of the different roles, and is a precursor to bringing in different types of Controllers, such as for heketi and gluster-block.

I'm still not understanding the motivation. Now each type of CSI driver has multiple containers that must be built.
Modularizing the code (e.g., abstracting the client stuff to an interface) is good, but I don't see what we're getting from multiple binaries and containers per driver type. I can't figure out how that provides benefits.

@jarrpa
Contributor Author

jarrpa commented Nov 14, 2018

> I'm still not understanding the motivation. Now each type of CSI driver has multiple containers that must be built.
> Modularizing the code (e.g., abstracting the client stuff to an interface) is good, but I don't see what we're getting from multiple binaries and containers per driver type. I can't figure out how that provides benefits.

That's not true. Each CSI driver just needs to build its own Controller container. We can keep one Node container compatible with multiple Controllers. This is what I do in my WIP code for having a heketi Controller driver, a gd2 Controller driver, and a GlusterFS Node driver that works with both. This would then allow you to, for example, update the Controller containers without touching the Node container.

@jarrpa
Contributor Author

jarrpa commented Nov 14, 2018

It should be noted that, in the way I designed my multi-Controller implementation, both Controller pods listened as the glusterfs.gluster.org provisioner; they just had checks to verify whether the configured REST client matched their particular API. This was sufficient to allow the correct Controller to do the provisioning. The Node easily mounted volumes from either provisioner.
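
A rough sketch of that gating, assuming a hypothetical restClient interface with an APIVersion method (not a real API in this repo): each Controller answers for the glusterfs.gluster.org provisioner but only acts when the configured REST client matches its own API.

```go
// Illustrative sketch: both Controllers register as the same provisioner, and
// each one checks whether the configured REST client speaks its API before
// provisioning. Interface and type names here are hypothetical.
package glusterfs

import (
	"context"
	"errors"
)

// restClient stands in for whatever client is built from the driver config.
type restClient interface {
	// APIVersion reports which management API the client speaks,
	// e.g. "heketi" or "gd2". This method is assumed for illustration.
	APIVersion() string
}

var errWrongAPI = errors.New("configured REST client does not match this Controller's API")

type heketiController struct {
	client restClient
}

// createVolume proceeds only when the REST client is a heketi client; the gd2
// Controller would perform the mirror-image check.
func (c *heketiController) createVolume(ctx context.Context, name string, sizeBytes int64) error {
	if c.client.APIVersion() != "heketi" {
		return errWrongAPI
	}
	// ... call the heketi REST API to create and expose the volume ...
	return nil
}
```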

@JohnStrunk
Member

>> I'm still not understanding the motivation. Now each type of CSI driver has multiple containers that must be built.
>> Modularizing the code (e.g., abstracting the client stuff to an interface) is good, but I don't see what we're getting from multiple binaries and containers per driver type. I can't figure out how that provides benefits.
>
> That's not true. Each CSI driver just needs to build its own Controller container. We can keep one Node container compatible with multiple Controllers. This is what I do in my WIP code for having a heketi Controller driver, a gd2 Controller driver, and a GlusterFS Node driver that works with both. This would then allow you to, for example, update the Controller containers without touching the Node container.

Having only a per-driver Controller container may work for Heketi vs. GD2 with gluster-fuse, but I'm skeptical that a common node container works in all cases. Again, what does this get us?
Today, we have a single binary that has the controller, identity, attacher, and node APIs. You are proposing to split this into multiple binaries, and for the Heketi vs. GD2 use case, I get that the node binary may be usable unchanged.

My point is that having a combined binary and single container model doesn't provide any less code sharing while keeping it simple... For Heketi, use image 1 everywhere; for GD2, use image 2 everywhere. Later, when you want to use g-b, that's image 3, and something else like loopback is image 4, etc. Keeping track of which Controller container I can use with which node container on top of that sounds like it's asking for errors in deployment, so if it doesn't get us something really tangible, why do it?
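
For contrast with the split sketch above, a minimal sketch of the combined model described here, again with placeholder types and an example socket path rather than the repo's actual code: one binary registers every CSI service, and the same image is deployed in each role.

```go
// Illustrative sketch of the combined model: one binary implements the full
// CSI surface, and the same image is deployed in the provisioner, attacher,
// and node roles. Types and the socket path are placeholders.
package main

import (
	"log"
	"net"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
)

type identityServer struct{ csi.UnimplementedIdentityServer }
type controllerServer struct{ csi.UnimplementedControllerServer }
type nodeServer struct{ csi.UnimplementedNodeServer }

func main() {
	lis, err := net.Listen("unix", "/csi/csi.sock") // example endpoint
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}

	// Every service is registered; the deployment role (provisioner pod,
	// attacher pod, node DaemonSet) determines which calls it actually receives.
	srv := grpc.NewServer()
	csi.RegisterIdentityServer(srv, &identityServer{})
	csi.RegisterControllerServer(srv, &controllerServer{})
	csi.RegisterNodeServer(srv, &nodeServer{})

	log.Fatal(srv.Serve(lis))
}
```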

@jarrpa
Contributor Author

jarrpa commented Nov 15, 2018

> My point is that having a combined binary and single container model doesn't provide any less code sharing while keeping it simple... For Heketi, use image 1 everywhere; for GD2, use image 2 everywhere. Later, when you want to use g-b, that's image 3, and something else like loopback is image 4, etc. Keeping track of which Controller container I can use with which node container on top of that sounds like it's asking for errors in deployment, so if it doesn't get us something really tangible, why do it?

There was already talk of having a separate driver for gluster-block, at the very least, under the name of something like glusterblock.gluster.org. It would seem like an unnecessary burden to implement all future drivers into the same binary. The argument that it sets up potential misconfiguration scenarios is moot if you consider that most of this will come from an Operator, and in the manual case it seems like a disservice to our users to say that simply having options is a hassle. It's only end users who likely don't want to have to care about their storage configuration. Anyone who is actually interested in deploying storage should, and likely will, have some interest in configuration options, at which point it's only on us to make sure they're properly documented and don't have confusing names.

I personally see no tangible benefit in keeping things in a single binary, so it seems only natural to split the binaries at logical boundaries for crash isolation at the very least.

@JohnStrunk
Member

> There was already talk of having a separate driver for gluster-block, at the very least, under the name of something like glusterblock.gluster.org. It would seem like an unnecessary burden to implement all future drivers into the same binary.

Yes, each driver should have its own binary and container. That binary supports the entire set of supported CSI APIs for that driver. This patch has been about potentially having multiple binaries and containers per driver by splitting the CSI API implementation across binaries and containers.

Today, we have a single binary that is deployed in the provisioner, attacher, and node roles (all as separate pods). Only a portion of the API is used in each of those roles, but the unified binary keeps it all together as a single unit that is built, tested, and deployed as a consistent version. Requiring an admin to specify different images for each is simply unnecessary and it increases the likelihood of error. It's also a burden on QE and the maintainers to ensure all images are built and in-sync.

@jarrpa
Contributor Author

jarrpa commented Nov 16, 2018

> Today, we have a single binary that is deployed in the provisioner, attacher, and node roles (all as separate pods). Only a portion of the API is used in each of those roles, but the unified binary keeps it all together as a single unit that is built, tested, and deployed as a consistent version. Requiring an admin to specify different images for each is simply unnecessary and it increases the likelihood of error. It's also a burden on QE and the maintainers to ensure all images are built and in-sync.

These are non-existent burdens if our automation is at least halfway decent. You don't have to make the admin remember different versions; just tag all the images with the same version if you want. And again, this all becomes moot with an Operator to handle the versioning for you. If all else fails, build all the container images every time one of them changes and give them all the same version, with a stated policy that we only support deployments with matching tags across driver images.

The primary reason for all this is that I don't want to have two identical node drivers running per node in the case where a cluster is running both heketi- and gd2-based controller drivers. The only way to accomplish that with a single combined binary would be to have one driver that handles both heketi and gd2 APIs dynamically, which I believe we also wanted to avoid.

Given all this, I still see no good reason not to do this, especially given the benefit of keeping as much logic separated as possible.
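
As a sketch of why a single Node plugin can serve volumes from either Controller: kubelet dispatches mount requests by CSI driver name, so as long as the Node plugin's Identity service reports the one shared name (glusterfs.gluster.org, as mentioned above), one DaemonSet handles volumes provisioned by either Controller. The code below is illustrative, and the version string is an example value.

```go
// Illustrative sketch: the Node plugin's Identity service reports a single
// driver name shared by both the heketi- and gd2-backed Controllers, so one
// node DaemonSet handles mounts for volumes provisioned by either.
package glusterfs

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

type identityServer struct {
	csi.UnimplementedIdentityServer
}

func (s *identityServer) GetPluginInfo(ctx context.Context, req *csi.GetPluginInfoRequest) (*csi.GetPluginInfoResponse, error) {
	return &csi.GetPluginInfoResponse{
		Name:          "glusterfs.gluster.org", // one name, regardless of Controller flavor
		VendorVersion: "0.0.9",                  // example value
	}, nil
}
```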

@Madhu-1
Member

Madhu-1 commented Nov 23, 2018

@JohnStrunk any update on this PR?

@JohnStrunk
Member

> @JohnStrunk any update on this PR?

I have nothing to add to my previous comments.

@jarrpa
Contributor Author

jarrpa commented Nov 27, 2018

@JohnStrunk If you don't have anything else to add, that would indicate that you do not refute the final points I made. Is that the case, and if so may this PR go through?

@jarrpa
Contributor Author

jarrpa commented Nov 27, 2018

In the interest of moving some things along, I created #110 to merge the work of at least splitting the driver into different libraries.
