Operator takes down druid cluster upon re-creation #170
Comments
The operator will not touch the current state unless the desired state, i.e. the CR, changes.
I faced a similar issue a long time ago, after I had uninstalled the operator chart. We then added --keep-crds, which worked fine. Did you mark the CR for deletion, or did Helm do it? I'm curious. Also, is your operator running at cluster scope?
We are using Helm via Terraform
We first took down the current operator in the namespace.
This removed the whole operator Helm chart.
This brought up the new operator successfully, as described above.
I am pretty sure Terraform is re-creating the CRDs. Try using the --keep-crds flag; I'm not sure where to add it in Terraform.
I don't think that's the case. We ran into #169 before, so we had a case where the druid operator was applied successfully and Terraform was done, but the operator was unable to start because of the exec format error.
For such issues, I am confused with terminology being used and mentioned
Please look into
In your case it's tf > helm for applying, and the operator for reconciling. There is an abstraction between these two. Please note that the operator performs lookups for the CRD; it does not perform any lookups for the CR. Applying config for a CR is an entirely event-driven mechanism. The operator won't delete any CR until a deletion timestamp is set, and it will never delete a CRD. The way you are applying configurations to the CRD and the CR is something to look into. Regarding issue #169, once I get time I will push an amd64 image.
Small update on this before our meeting later: We found that both the CRD and the CR do have a
@AdheipSingh we just tested the scenario we discussed in the meeting, and Helm was the one responsible for deleting the CRD.
We currently have the druid-operator and the druid cluster in the same namespace.
We'll soon add a second druid cluster, and to keep things cleaner we would like to move the operator to its own namespace. While testing this we ran into a couple of issues.
First, here is how we imagined this should work:
Steps 1 and 2 worked as expected: we could bring down the operator without affecting our existing druid cluster.
On step 3 we faced a couple of issues. First, the operator did not support Helm's --skip-crds flag, which prevented the new operator from coming up while the CRD already existed; a fix for this has already been merged here. A similar issue then occurs for clusterRoles. We fixed this in our local chart by adding an option to skip the clusterRoles as well, and we can of course open a PR for this.
Now, with both of the above in place, we were able to bring up the druid operator in its own namespace. Once the operator was up, however, the existing CRD was somehow removed, and because of the owner dependency the whole druid cluster was gone with it.
We have yet to figure out why exactly the CRD got removed. The operator also did not create a new one; we were left with a running operator but no druid cluster.
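The cascade described above can be illustrated with a small in-memory Go sketch. It is a toy model, not how Kubernetes is implemented: `Object`, `deleteCascade`, and all object names are hypothetical. As a simplifying assumption, the CRD-to-CR relationship is modelled as an ownership edge, whereas in Kubernetes the CRs disappear because their CRD is deleted, and only the resources created for a CR are tied to it through owner references.

```go
package main

import (
	"fmt"
	"sort"
)

// Object is a hypothetical, simplified API object: a name plus the name of
// the object it depends on (empty means it has no owner).
type Object struct {
	Name  string
	Owner string
}

// deleteCascade removes root from the store and then, transitively, every
// object whose owner has been removed. It returns the removed names, sorted.
func deleteCascade(store map[string]Object, root string) []string {
	var removed []string
	queue := []string{root}
	for len(queue) > 0 {
		name := queue[0]
		queue = queue[1:]
		if _, ok := store[name]; !ok {
			continue // already gone or never existed
		}
		delete(store, name)
		removed = append(removed, name)
		// Queue every remaining object that depended on the one just removed.
		for _, obj := range store {
			if obj.Owner == name {
				queue = append(queue, obj.Name)
			}
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	// Hypothetical names, for illustration only.
	store := map[string]Object{
		"crd/druids":              {Name: "crd/druids"},
		"cr/cluster":              {Name: "cr/cluster", Owner: "crd/druids"},
		"statefulset/historicals": {Name: "statefulset/historicals", Owner: "cr/cluster"},
		"deployment/operator":     {Name: "deployment/operator"},
	}

	removed := deleteCascade(store, "crd/druids")
	fmt.Println("removed:", removed)
	fmt.Println("objects left:", len(store))
}
```

In this model, removing the CRD takes the CR and everything hanging off it along, while the operator deployment, which has no owner, survives. That matches the end state we observed: a running operator but no druid cluster.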
We saw the following events in our kubernetes cluster which seem to be related
That said, we haven't yet found the root cause of the operator removing the CRD