Namespace offloading results in liqo-controller-manager pod error: invalid memory address or nil pointer dereference #1980
Comments
Additional logs and comment: In the logs, old foreign cluster names and tenant namespaces are still being printed. These resources no longer exist. E0904 16:25:28.300569 1 controller.go:324] "Reconciler error" err="namespaces "liqo-tenant-adm1-npp1-6eb08f" not found" controller="resourceoffer" controllerGroup="sharing.liqo.io" controllerKind="ResourceOffer" ResourceOffer="liqo-tenant-adm1-npp1-6eb08f/adm1-npp1" namespace="liqo-tenant-adm1-npp1-6eb08f" name="adm1-npp1" reconcileID="b9180d88-279b-4136-a0e4-4d5f969a6299" |
Hi @sumitAtDigital thanks for your issue. I've just tried your configuration (k8s version + liqo version) on KinD and it works. Unfortunately, I don't have experience with PKS/TKGI and at the moment we don't have an infrastructure which supports it. @aleoli do you have any experience? |
Hi @cheina97, thanks for the reply. This was working up to version 0.8.3. Inbound cluster peering works fine with v0.9.3. I deleted the previously peered foreign clusters to test the updated Liqo version, and shortened the cluster name in the values file. **Commands:**
But whenever I unoffload the namespace, the controller pod returns to normal with the logs below: E0904 17:33:30.021742 1 deletion-routine.go:105] error removing finalizer: namespaces "liqo-tenant-throbbing-darkness-ec30d7" not found |
In addition, we intermittently get the warning below, and in that case pods are not offloaded to the required cluster: > liqoctl offload namespace adm2 --namespace-mapping-strategy EnforceSameName --pod-offloading-strategy LocalAndRemote --selector "kubernetes.io/hostname=liqo-adm1-npp1" |
Hi @sumitAtDigital, we still cannot find a way to replicate your problem. We linked a PR, but we don't think it resolves your issue. |
Hi @cheina97, thanks for running the tests. Let me try to elaborate further. Below are some observations:
Please check where these conditions are handled in the Go code that is throwing the runtime error. W0906 08:54:18.253257 1 cache.go:246] foreignclusters.discovery.liqo.io "foreign cluster with ID f4cb997d-0bfc-4c0a-a8c2-dbb7b4c40cdf" not found
|
@cheina97: It would be great if you could share more information on the Liqo controller errors/logs: why those specific lines of code throw runtime errors, and why the controller searches for already-deleted foreign clusters when only one actually exists. |
Hi @sumitAtDigital, I'm a bit confused about the problem you are having. I don't understand whether you encounter these problems only after upgrading from v0.8.3 to v0.9.3, or also when you start from a clean cluster and install v0.9.3. I also don't understand whether you are still observing all the errors presented in the issue. Have you tried restarting the controllers by killing all Liqo pods? |
|
I suggest you proceed in this order:
|
@cheina97: Thanks for pointing that out. We have already done these steps. |
@cheina97: This is really helpful, as we were not aware of the residue left after deletion. We ran the command provided, and the results are below:
|
Tried deleting, but it does not work even with force:
Is there any other command or patch that can remove the resourceoffers.sharing.liqo.io resources stuck in deletion? Please suggest. |
Have you removed the finalizers on the resource? |
Tried, but no success: kubectl --kubeconfig ./config-file patch resourceoffers.sharing.liqo.io adm1-npp1 -n liqo-tenant-adm1-npp1-6eb08f -p '{"metadata":{"finalizers":null}}' --type=merge |
What if you use kubectl edit? |
Tried deleting that by removing the finalizers and saving, but no success: apiVersion: sharing.liqo.io/v1alpha1
|
That's really strange. Can you check whether the resourceoffers are being recreated? Use kubectl get resourceoffers -A -w to check whether the resources are being updated. |
It has simply been stuck/hanging for the last 2-3 minutes:
|
Ok, so it is not strange |
Can you get the single resource? |
Sure, it hangs without returning any result: |
@cheina97 : Removed version 0.9.3 and installed 0.9.4, but still the same error: I0920 15:26:26.279404 1 finalizer.go:37] Removing finalizer virtualnode-controller.liqo.io/finalizer from virtual-node liqo-adm1-npp1 |
Did you restart from a clean cluster? |
@cheina97: Yes, it was done from scratch (complete deletion followed by a fresh install), but on the same cluster and with the same name. We will let you know if we resume work with Liqo in the future. |
Closed for inactivity |
What happened: Testing cluster peering and namespace/pod offloading. Namespace offloading results in the liqo-controller-manager pod error below:
I0904 16:15:12.246218 1 namespaceoffloading_controller.go:96] NamespaceOffloading "adm2/offloading" status correctly updated
I0904 16:15:12.246253 1 controller.go:114] "Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" controller="namespaceoffloading" controllerGroup="offloading.liqo.io" controllerKind="NamespaceOffloading" NamespaceOffloading="adm2/offloading" namespace="adm2" name="offloading" reconcileID="1276e086-6b1c-4df6-93a1-524933ad0de5"
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x108 pc=0x19a5e42]
goroutine 558 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:115 +0x1e5
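The trace above is cut off before the frame that actually faulted, so the exact cause is not visible in this thread. As a general illustration of how this class of panic arises in Go, here is a minimal sketch in which a lookup returns a nil pointer for a stale ID and a field is read through it. All names (`ForeignCluster`, `clusterCache`, `unsafeName`, `safeName`) are hypothetical and are not Liqo's types or code.

```go
package main

import (
	"errors"
	"fmt"
)

// ForeignCluster is a hypothetical stand-in for a peered-cluster record;
// it is not the Liqo API type.
type ForeignCluster struct {
	ID   string
	Name string
}

// clusterCache is a hypothetical in-memory cache keyed by cluster ID.
type clusterCache map[string]*ForeignCluster

var errClusterNotFound = errors.New("foreign cluster not found")

// unsafeName dereferences the lookup result without checking it.
// A missing key yields a nil pointer, and reading a field through it panics
// with "invalid memory address or nil pointer dereference".
func unsafeName(c clusterCache, id string) string {
	return c[id].Name
}

// safeName checks the lookup result and returns an error instead of panicking.
func safeName(c clusterCache, id string) (string, error) {
	fc, ok := c[id]
	if !ok || fc == nil {
		return "", fmt.Errorf("%w: %s", errClusterNotFound, id)
	}
	return fc.Name, nil
}

// panics reports whether calling f triggers a panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	cache := clusterCache{"abc": &ForeignCluster{ID: "abc", Name: "adm1-npp1"}}

	// The stale ID is no longer in the cache, as with a deleted peering.
	fmt.Println("unsafe lookup panics:", panics(func() { _ = unsafeName(cache, "stale") }))

	_, err := safeName(cache, "stale")
	fmt.Println("safe lookup error:", err)
}
```

Returning an error from the guarded path lets a reconciler surface a "not found" condition and retry or skip, rather than crashing the whole controller process.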
Environment: Development
kubectl version: v1.23.3