Skip to content
This repository was archived by the owner on May 6, 2020. It is now read-only.

Critical flaw in scale method allows for entire namespace deletion #1313

Closed
lshemesh opened this issue Jul 21, 2017 · 5 comments
Closed

Critical flaw in scale method allows for entire namespace deletion #1313

lshemesh opened this issue Jul 21, 2017 · 5 comments

Comments

@lshemesh
Copy link
Contributor

lshemesh commented Jul 21, 2017

I brought this up in the Slack channel and it didn't seem to be taken very seriously so I'm creating this issue in hopes that someone takes a deeper look. Based on my own investigation it seems that there is a certain scenario in which deis scale can actually destroy an entire app within it's namespace. This also includes other components deployed to the same namespace.

Calling deis scale runs the scale method in app.py which then calls self.create which is where the problem occurs. On line 199 there is a call to _scheduler.ns.get(namespace) which could raise a KubeException error. The code assumes that an error thrown at that moment means the namespace doesn't exist but this isn't always the case. In the controller log I got a "There was a problem retrieving data from the Kubernetes API server". This DOES NOT mean the namespace doesn't exist.

Following the exception from line 199, an attempt to create the namespace is made on line 201. This call will fail because of "409 Conflict namespaces [namespace] already exists". And that's when the outer exception is caught on line 215.

From there it's pretty clear what happens. The code assumes that "something really horrible" happened and proceeds to delete the apps namespace on line 218.

Let me know if there's anything I can provide to assist or if I'm just way off in my investigation of the code.

@lshemesh
Copy link
Contributor Author

lshemesh commented Aug 1, 2017

Can someone look at this and tell me if it does what I think it does? If the Deis guys are too busy with other work then I will try and tackle this myself and create a pull request. I don't want to do it if my investigation is wrong.

@mboersma
Copy link
Member

mboersma commented Aug 3, 2017

@lshemesh I think your analysis is correct, unfortunately. In the outermost try / except block maybe we need to look at the details of the KubeException, or somehow pass enough context so that we know we're actually in the midst of deis scale on an existing app.

@lshemesh
Copy link
Contributor Author

lshemesh commented Aug 3, 2017

If you can check the details of KubeException and figure out that the original exception wasn't because of a non existing namespace should the code just error out? I didn't mention it in my previous post but I believe this happened when I ran two scale commands back to back.

@mboersma
Copy link
Member

mboersma commented Aug 3, 2017

If you can check the details of KubeException and figure out that the original exception wasn't because of a non existing namespace should the code just error out?

Yes, I think that would be appropriate. Just somehow skip trying to delete the namespace in that case, and allow the raise ServiceUnavailable line to raise the actual error.

lshemesh added a commit to lshemesh/controller that referenced this issue Aug 7, 2017
Fix for deis#1313

An unhandled exception in ns.create due to "Conflict
namespaces" will
wipe out an entire namespace. I've added a catch here to
protect against
that occurring.
@lshemesh
Copy link
Contributor Author

lshemesh commented Aug 7, 2017

@mboersma check out my masterpiece

bacongobbler pushed a commit that referenced this issue Aug 8, 2017
Fix for #1313

An unhandled exception in ns.create due to "Conflict
namespaces" will
wipe out an entire namespace. I've added a catch here to
protect against
that occurring.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants