-
Notifications
You must be signed in to change notification settings - Fork 53
Critical flaw in scale method allows for entire namespace deletion #1313
Comments
Can someone look at this and tell me if it does what I think it does? If the Deis guys are too busy with other work then I will try and tackle this myself and create a pull request. I don't want to do it if my investigation is wrong. |
@lshemesh I think your analysis is correct, unfortunately. In the outermost |
If you can check the details of KubeException and figure out that the original exception wasn't because of a non existing namespace should the code just error out? I didn't mention it in my previous post but I believe this happened when I ran two scale commands back to back. |
Yes, I think that would be appropriate. Just somehow skip trying to delete the namespace in that case, and allow the |
Fix for deis#1313 An unhandled exception in ns.create due to "Conflict namespaces" will wipe out an entire namespace. I've added a catch here to protect against that occurring.
@mboersma check out my masterpiece |
Fix for #1313 An unhandled exception in ns.create due to "Conflict namespaces" will wipe out an entire namespace. I've added a catch here to protect against that occurring.
I brought this up in the Slack channel and it didn't seem to be taken very seriously so I'm creating this issue in hopes that someone takes a deeper look. Based on my own investigation it seems that there is a certain scenario in which
deis scale
can actually destroy an entire app within it's namespace. This also includes other components deployed to the same namespace.Calling
deis scale
runs the scale method in app.py which then calls self.create which is where the problem occurs. On line 199 there is a call to_scheduler.ns.get(namespace)
which could raise a KubeException error. The code assumes that an error thrown at that moment means the namespace doesn't exist but this isn't always the case. In the controller log I got a "There was a problem retrieving data from the Kubernetes API server". This DOES NOT mean the namespace doesn't exist.Following the exception from line 199, an attempt to create the namespace is made on line 201. This call will fail because of "409 Conflict namespaces [namespace] already exists". And that's when the outer exception is caught on line 215.
From there it's pretty clear what happens. The code assumes that "something really horrible" happened and proceeds to delete the apps namespace on line 218.
Let me know if there's anything I can provide to assist or if I'm just way off in my investigation of the code.
The text was updated successfully, but these errors were encountered: