Cruise Control Not Recreated On Deletion #990

Open
2 tasks done
Spazzy757 opened this issue Jun 6, 2023 · 7 comments
Labels
bug (Something isn't working), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@Spazzy757

Spazzy757 commented Jun 6, 2023

Description

I updated the Helm chart to the latest version and Koperator to 0.24.1, and was having some issues with the deployment/upgrade.

During this process Cruise Control was deleted; however, I can't seem to get the operator to recreate it.

I see there is an issue here: #740, but it says that it started working again. I'm on the latest version and still have this issue.

Expected Behavior

Cruise Control should be recreated.

Actual Behavior

Cruise Control does not exist and I keep getting errors:

{
   "level":"error",
   "ts":"2023-06-06T07:15:49.909Z",
   "msg":"failed to get unavailable brokers at rebalance",
   "controller":"CruiseControlTask",
   "controllerGroup":"kafka.banzaicloud.io",
   "controllerKind":"KafkaCluster",
   "KafkaCluster":{
      "name":"kafka",
      "namespace":"default"
   },
   "namespace":"default",
   "name":"kafka",
   "reconcileID":"f3a57a35-bb1b-49d7-893f-2751f6937fa4",
   "error":"failed to get list of volumes per broker from Cruise Control: sending HTTP request failed: Get \"http://kafka-cruisecontrol-svc.default.svc.cluster.local:8090/kafkacruisecontrol/kafka_cluster_state?json=true&verbose=true\": dial tcp: lookup kafka-cruisecontrol-svc.default.svc.cluster.local on 10.91.0.10:53: no such host",
   "errorVerbose":"sending HTTP request failed: Get \"http://kafka-cruisecontrol-svc.default.svc.cluster.local:8090/kafkacruisecontrol/kafka_cluster_state?json=true&verbose=true\": dial tcp: lookup kafka-cruisecontrol-svc.default.svc.cluster.local on 10.91.0.10:53: no such host\nfailed to get list of volumes per broker from Cruise Control\ngithub.com/banzaicloud/koperator/controllers.checkBrokerLogDirsAvailability\n\t/workspace/controllers/cruisecontroltask_controller.go:239\ngithub.com/banzaicloud/koperator/controllers.(*CruiseControlTaskReconciler).Reconcile\n\t/workspace/controllers/cruisecontroltask_controller.go:178\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594",
   "stacktrace":"github.com/banzaicloud/koperator/controllers.(*CruiseControlTaskReconciler).Reconcile\n\t/workspace/controllers/cruisecontroltask_controller.go:180\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234"
}
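
The "no such host" part of the error means the cluster DNS has no record for the Cruise Control Service, which is consistent with the Service object itself being gone. As a minimal sketch, assuming the standard Kubernetes Service DNS form <service>.<namespace>.svc.<cluster-domain>, the hostname from the log can be split into the Service name and namespace to check. The helper function names are made up for illustration, and the script only prints the kubectl command instead of running it:

```shell
#!/bin/sh
# Hostname copied from the error message above.
host="kafka-cruisecontrol-svc.default.svc.cluster.local"

# Assumption: the name follows <service>.<namespace>.svc.<cluster-domain>.
# Both helpers are hypothetical names used only for this sketch.
service_from_host() {
  printf '%s\n' "${1%%.*}"       # text before the first dot
}

namespace_from_host() {
  rest="${1#*.}"                 # drop the leading "<service>."
  printf '%s\n' "${rest%%.*}"    # text before the next dot
}

svc="$(service_from_host "$host")"
ns="$(namespace_from_host "$host")"

# Print the command for review; run it by hand against the right context.
echo "kubectl get service $svc --namespace $ns"
```

If the printed command reports that the Service is not found, the operator has not recreated the Cruise Control resources yet.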

Affected Version

0.24.1

Steps to Reproduce

  1. Fresh install of Koperator
  2. Create a KafkaCluster
  3. Delete Cruise Control

Checklist

@Spazzy757
Author

I also tried joining your Slack group, but it seems you need a Cisco email address 🥲

@Spazzy757
Author

It seems the cause is the Cruise Control (CC) topic creation:

{
   "level":"info",
   "ts":"2023-06-06T11:12:13.412Z",
   "logger":"webhooks.KafkaTopic",
   "msg":"rejected",
   "name":"kafka-cruise-control-topic",
   "namespace":"default",
   "invalid field(s)":"spec.name: Invalid value: \"__CruiseControlMetrics\": topic \"__CruiseControlMetrics\" already exists on kafka cluster and it is not managed by Koperator,\n\t\t\t\t\tif you want it to be managed by Koperator so you can modify its configurations through a KafkaTopic CR,\n\t\t\t\t\tadd this \"managedBy: koperator\" annotation to this KafkaTopic CR"
}

@Spazzy757
Author

Okay, for anybody who runs into this: there seems to be a race condition. When Cruise Control is deleted and the __CruiseControlMetrics topic still exists, the operator will not recreate the Cruise Control deployment. I handled this by deleting the topic:

kubectl run kafka-topic -it  \
  --image=ghcr.io/banzaicloud/kafka:2.13-3.1.0 \
  --rm=true \
  --restart=Never \
  -- /opt/kafka/bin/kafka-topics.sh --bootstrap-server kafka-headless:29092 --topic __CruiseControlMetrics --delete

This deleted the topic, and the reconcile logic then created the Cruise Control deployment. I think the issue arises from a race condition here.
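
For anyone repeating this workaround, it may be safer to confirm that the topic is still present before deleting it, and then watch for the Deployment to come back. The following is a rough sketch that reuses the image, bootstrap address, and topic name from the command above; the helper function and the pod names are hypothetical, the "default" namespace is taken from the logs, and the script only prints the commands so they can be reviewed before being run by hand:

```shell
#!/bin/sh
# Values copied from the workaround command above.
IMAGE="ghcr.io/banzaicloud/kafka:2.13-3.1.0"
BOOTSTRAP="kafka-headless:29092"
TOPIC="__CruiseControlMetrics"

# Hypothetical helper: prints a one-off "kubectl run" command that calls
# kafka-topics.sh. $1 is the pod name; the remaining arguments are passed
# through to kafka-topics.sh.
kafka_topics_cmd() {
  pod="$1"; shift
  printf 'kubectl run %s -it --image=%s --rm=true --restart=Never -- /opt/kafka/bin/kafka-topics.sh --bootstrap-server %s %s\n' \
    "$pod" "$IMAGE" "$BOOTSTRAP" "$*"
}

# 1. List topics to confirm that the metrics topic still exists.
kafka_topics_cmd kafka-topic-list --list

# 2. Delete the topic (same kafka-topics.sh arguments as the original command).
kafka_topics_cmd kafka-topic-delete --topic "$TOPIC" --delete

# 3. Watch the Deployments in the namespace seen in the logs.
echo "kubectl get deployments --namespace default --watch"
```

Deleting the topic also discards whatever metrics samples it held, so this is best treated as a stopgap until the operator handles the existing topic itself.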

panyuenlau added the bug, help wanted, and good first issue labels on Jun 8, 2023
@panyuenlau
Member

panyuenlau commented Jun 8, 2023

@Spazzy757 thank you for reporting this issue and the information! Please don't hesitate to open a PR for the fix if you found the root-cause already.


About the Slack issue, can you please share a screenshot showing why it asks for a Cisco email to join, so we can get it fixed by our admins?

In the meantime, please feel free to join via this temporary link: https://emergingtechcommunity.slack.com/archives/CK75ATB29

@Spazzy757
Author

Slack:
image

But I got this link from one of the issues, I believe, and the one you provided works without an issue. Thank you!

I don't have a ton of time at the moment, but I will swing around next week to see if anybody has picked this up, and I might give it a crack then.

@panyuenlau
Member

@Spazzy757 Did you try with Google or Apple? I just tried with a fake Google account and was able to join our Slack with the link we have in the README: https://eti.cisco.com/slack

@Spazzy757
Author

Yes, it works for me. The other link I had came from a comment on an issue, which is probably why it was not working.
