[Bug]: conversion issues in v1beta2 appears inconsistently in dns.gcp.upbound.io #667

Open
1 task done
marccortinas opened this issue Dec 9, 2024 · 3 comments
Labels
bug Something isn't working needs:triage

Comments

@marccortinas

Is there an existing issue for this?

  • I have searched the existing issues

Affected Resource(s)

dns.gcp.upbound.io/v1beta1 - RecordSet

Resource MRs required to reproduce the bug

---
apiVersion: gcp.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: toosl
spec:
  credentials:
    impersonateServiceAccount:
      name: ""
    source: InjectedIdentity
  projectID: toosl
---
apiVersion: dns.gcp.upbound.io/v1beta1
kind: RecordSet
metadata:
  name: podinfo-private
spec:
  deletionPolicy: Delete
  forProvider:
    managedZone: some-zone
    name: podinfo-some-zone.
    project: tools
    rrdatas:
      - gw.some-other-zone.
    ttl: 300
    type: CNAME
  managementPolicies:
    - '*'
  providerConfigRef:
    name: tools

Steps to Reproduce

The manifests above contain the ProviderConfig and the RecordSet resource. The ProviderConfig name has a typo, so the name referenced by the RecordSet does not match it.

  1. Create a ProviderConfig for the Upbound GCP provider family named "toosl".
  2. Create a RecordSet that references the non-existent ProviderConfig "tools".
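
The mismatch produced by these two steps can be caught before the manifests are applied. The following is a minimal Python sketch, not something Crossplane or the provider ships. It assumes the manifests have already been parsed into dictionaries (for example with a YAML library), and the function name `dangling_provider_config_refs` is hypothetical. It reports every resource whose `spec.providerConfigRef.name` does not match a ProviderConfig defined in the same set of documents.

```python
from typing import Iterable

# Manifests from this report, reduced to the fields the check needs.
# Assumption: the YAML was already parsed into dicts (e.g. with a YAML library).
DOCS = [
    {
        "apiVersion": "gcp.upbound.io/v1beta1",
        "kind": "ProviderConfig",
        "metadata": {"name": "toosl"},
    },
    {
        "apiVersion": "dns.gcp.upbound.io/v1beta1",
        "kind": "RecordSet",
        "metadata": {"name": "podinfo-private"},
        "spec": {"providerConfigRef": {"name": "tools"}},
    },
]


def dangling_provider_config_refs(docs: Iterable[dict]) -> list[tuple[str, str, str]]:
    """Return (kind, resource name, referenced name) for unresolved providerConfigRefs.

    Simplification: ProviderConfigs are matched by name only, ignoring the API group.
    """
    docs = list(docs)
    defined = {d["metadata"]["name"] for d in docs if d.get("kind") == "ProviderConfig"}
    dangling = []
    for d in docs:
        ref = d.get("spec", {}).get("providerConfigRef", {}).get("name")
        if ref is not None and ref not in defined:
            dangling.append((d["kind"], d["metadata"]["name"], ref))
    return dangling


if __name__ == "__main__":
    for kind, name, ref in dangling_provider_config_refs(DOCS):
        print(f'{kind}/{name} references missing ProviderConfig "{ref}"')
```

For the two manifests in this report, the check flags the RecordSet podinfo-private because it references "tools" while only "toosl" is defined. It only sees the documents it is given, so a ProviderConfig that already exists in the cluster would have to be added to the input.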

What happened?

During a recent operation, we introduced a typo in the name of a GCP ProviderConfig: instead of naming it "tools", we mistakenly named it "toosl". We then created RecordSet objects that referenced "tools", a ProviderConfig that does not exist.

As a result, ArgoCD and the Kubernetes API returned the following conversion error, which prevented the creation of the resources:

Error from server: conversion webhook for dns.gcp.upbound.io/v1beta1, Kind=RecordSet failed: Post "https://provider-gcp-dns.crossplane-system.svc:9443/convert?timeout=30s": net/http: request canceled while waiting for the connection (Client.Timeout exceeded while awaiting headers)

To temporarily resolve this issue, we had to delete the "dns.gcp.upbound.io" CRD. We would like to know whether deleting only the "providerconfigusage" objects whose "resource-kind" is "RecordSet" would have been sufficient to stop ArgoCD and the Kubernetes API from returning this error.

We believe it would be beneficial to implement a validation that prevents the creation of resources that reference non-existent ProviderConfigs. Additionally, we suggest making the error messages more descriptive so that the problem is easier to identify.

Relevant Error Output Snippet

ArgoCD output:
 to load initial state of resource recordset.dns.gcp.upbound.io: conversion webhook for dns.gcp.upbound.io/v1beta1, kind=recordset failed: post "https://provider-gcp-dns.crossplane-system.svc:9443/convert?timeout=30s

Crossplane Version

v1.18.0

Provider Version

v1.10.0

Kubernetes Version

v1.30.5-gke.1443001

Kubernetes Distribution

GKE


@marccortinas marccortinas added bug Something isn't working needs:triage labels Dec 9, 2024
@turkenf
Collaborator

turkenf commented Dec 10, 2024

Hi @marccortinas,

Thank you for the detailed issue report. I tried to reproduce the issue in a kind cluster but I couldn't. When I try to create a resource with the wrong ProviderConfig I get the expected error below:

  conditions:
  - lastTransitionTime: "2024-12-09T17:28:59Z"
    message: 'connect failed: cannot initialize the Terraform plugin SDK async external
      client: cannot get terraform setup: cannot get referenced ProviderConfig: ProviderConfig.gcp.upbound.io
      "tools" not found'
    reason: ReconcileError
    status: "False"
    type: Synced

The problem here seems to be that the conversion webhook mechanism is not working properly. Does this error occur consistently in your environment?
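
One way to narrow this down is to check whether the webhook endpoint named in the error accepts TCP connections at all. The following is a minimal Python sketch with a hypothetical helper, `can_connect`. It assumes it is run from a pod inside the affected cluster, where the Service DNS name resolves. The API server makes the webhook call itself, so a successful probe from a pod only shows that the provider is listening; it does not prove that the API server can reach it.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

With the values from the error message, the call would be `can_connect("provider-gcp-dns.crossplane-system.svc", 9443)`. The probe only opens a TCP connection; it does not perform the TLS handshake or send a conversion request.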

@marccortinas
Author

We're using Crossplane in multiple GKE clusters, and this issue only appears in one of them. I have uninstalled and reinstalled Crossplane three times in this cluster (via Helm, followed by "kubectl delete" cleanups), and it still does not seem to work correctly. Let me explain.

Everything now appears to work as expected: Crossplane, the ProviderConfigs. What is strange is that the RecordSet is created, but the get command fails with a timeout.

I checked the installation and it looks fine:

➜ kubectl get providers.pkg.crossplane.io
NAME                          INSTALLED   HEALTHY   PACKAGE                                                      AGE
provider-gcp-dns              True        True      xpkg.upbound.io/upbound/provider-gcp-dns:v1.10.0             3h5m
provider-gcp-secretmanager    True        True      xpkg.upbound.io/upbound/provider-gcp-secretmanager:v1.10.0   3h5m
upbound-provider-family-gcp   True        True      xpkg.upbound.io/upbound/provider-family-gcp:v1.11.1          3h5m

 ➜ kubectl get providerrevisions.pkg.crossplane.io upbound-provider-family-gcp-f4e89e32a1cf
NAME                                       HEALTHY   REVISION   IMAGE                                                 STATE    DEP-FOUND   DEP-INSTALLED   AGE
upbound-provider-family-gcp-f4e89e32a1cf   True      1          xpkg.upbound.io/upbound/provider-family-gcp:v1.11.1   Active                               3h1m

 ➜ kubectl get providerrevisions.pkg.crossplane.io provider-gcp-dns-d621c8ebcfeb
NAME                            HEALTHY   REVISION   IMAGE                                              STATE    DEP-FOUND   DEP-INSTALLED   AGE
provider-gcp-dns-d621c8ebcfeb   True      1          xpkg.upbound.io/upbound/provider-gcp-dns:v1.10.0   Active   1           1               3h2m   



➜ kubectl get providerconfigs.gcp.upbound.io -o yaml|kubectl neat
apiVersion: v1
items:
- apiVersion: gcp.upbound.io/v1beta1
  kind: ProviderConfig
  metadata:
    annotations:
      argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
    labels:
      argocd.argoproj.io/instance: gcp-providerconfig-edge-some-gcp-project-name-gke1
    name: gcp-some-gcp-project-name
  spec:
    credentials:
      impersonateServiceAccount:
        name: ""
      source: InjectedIdentity
    projectID: some-gcp-project-name
- apiVersion: gcp.upbound.io/v1beta1
  kind: ProviderConfig
  metadata:
    annotations:
      argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
    labels:
      argocd.argoproj.io/instance: gcp-providerconfig-edge-some-gcp-project-name-gke1
    name: gcp-some-gcp-project-name-resources
  spec:
    credentials:
      impersonateServiceAccount:
        name: ""
      source: InjectedIdentity
    projectID: edo-prod-resources
kind: List
metadata: {}

The issue always appears when I try to get the RecordSet.

➜ kubectl get RecordSet abollado
Error from server: conversion webhook for dns.gcp.upbound.io/v1beta1, Kind=RecordSet failed: Post "https://provider-gcp-dns.crossplane-system.svc:9443/convert?timeout=30s": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

➜ kubectl get RecordSet
Error from server: conversion webhook for dns.gcp.upbound.io/v1beta1, Kind=RecordSet failed: Post "https://provider-gcp-dns.crossplane-system.svc:9443/convert?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

But the get succeeds when we specify the manifest file:

❯ kubectl get -f brais-dns-prod.yaml
NAME       SYNCED   READY   EXTERNAL-NAME   AGE
abollado   True     True    projects/ofuscated/managedZones/ofuscated-edo-world/rrsets/abollado.ofuscated.ofuscated./CNAME   166m

I increased the log level to debug and I can see the following:

➜ kubectl logs provider-gcp-dns-d621c8ebcfeb-cc8997ffd-k5n9l
2024-12-19T14:57:23Z	DEBUG	provider-gcp	Starting	{"sync-interval": "1h0m0s", "poll-interval": "10m0s", "poll-jitter": "30s", "max-reconcile-rate": 100}
2024-12-19T14:57:26Z	INFO	provider-gcp	Beta feature enabled	{"flag": "EnableBetaManagementPolicies"}
2024-12-19T14:57:26Z	INFO	provider-gcp	Alpha feature enabled	{"flag": "EnableAlphaExternalSecretStores"}
2024-12-19T14:57:26Z	INFO	provider-gcp	ESS TLS certificates path is set. Loading mTLS configuration.
2024-12-19T14:57:26Z	DEBUG	provider-gcp	Calling the inner handler for Create event.	{"gvk": "dns.gcp.upbound.io/v1beta1, Kind=RecordSet", "name": "abollado", "queueLength": 0}
2024-12-19T14:57:26Z	DEBUG	provider-gcp	Reconciling	{"controller": "managed/dns.gcp.upbound.io/v1beta1, kind=recordset", "request": {"name":"abollado"}}
2024-12-19T14:57:26Z	DEBUG	provider-gcp	Connecting to the service provider	{"uid": "ef6806d4-1781-4e7b-a7c1-a5b453798096", "name": "abollado", "gvk": "dns.gcp.upbound.io/v1beta1, Kind=RecordSet"}
2024-12-19T14:57:26Z	DEBUG	provider-gcp	Instance state not found in cache, reconstructing...	{"uid": "ef6806d4-1781-4e7b-a7c1-a5b453798096", "name": "abollado", "gvk": "dns.gcp.upbound.io/v1beta1, Kind=RecordSet"}
2024-12-19T14:57:26Z	DEBUG	provider-gcp	Observing the external resource	{"uid": "ef6806d4-1781-4e7b-a7c1-a5b453798096", "name": "abollado", "gvk": "dns.gcp.upbound.io/v1beta1, Kind=RecordSet"}
2024-12-19T14:57:27Z	DEBUG	provider-gcp	External resource is up to date	{"controller": "managed/dns.gcp.upbound.io/v1beta1, kind=recordset", "request": {"name":"abollado"}, "uid": "ef6806d4-1781-4e7b-a7c1-a5b453798096", "version": "1935994642", "external-name": "projects/ofuscated/managedZones/ofuscated-edo-world/rrsets/abollado.ofuscated.edo.world./CNAME", "requeue-after": "2024-12-19T15:06:58Z"}
2024-12-19T15:06:58Z	DEBUG	provider-gcp	Reconciling	{"controller": "managed/dns.gcp.upbound.io/v1beta1, kind=recordset", "request": {"name":"abollado"}}
2024-12-19T15:06:58Z	DEBUG	provider-gcp	Connecting to the service provider	{"uid": "ef6806d4-1781-4e7b-a7c1-a5b453798096", "name": "abollado", "gvk": "dns.gcp.upbound.io/v1beta1, Kind=RecordSet"}
2024-12-19T15:06:58Z	DEBUG	provider-gcp	Observing the external resource	{"uid": "ef6806d4-1781-4e7b-a7c1-a5b453798096", "name": "abollado", "gvk": "dns.gcp.upbound.io/v1beta1, Kind=RecordSet"}
2024-12-19T15:06:58Z	DEBUG	provider-gcp	External resource is up to date	{"controller": "managed/dns.gcp.upbound.io/v1beta1, kind=recordset", "request": {"name":"abollado"}, "uid": "ef6806d4-1781-4e7b-a7c1-a5b453798096", "version": "1935994642", "external-name": "projects/ofuscated/managedZones/ofuscated-edo-world/rrsets/abollado.ofuscated.edo.world./CNAME", "requeue-after": "2024-12-19T15:17:26Z"}
2024-12-19T15:17:26Z	DEBUG	provider-gcp	Reconciling	{"controller": "managed/dns.gcp.upbound.io/v1beta1, kind=recordset", "request": {"name":"abollado"}}
2024-12-19T15:17:26Z	DEBUG	provider-gcp	Connecting to the service provider	{"uid": "ef6806d4-1781-4e7b-a7c1-a5b453798096", "name": "abollado", "gvk": "dns.gcp.upbound.io/v1beta1, Kind=RecordSet"}
2024-12-19T15:17:26Z	DEBUG	provider-gcp	Observing the external resource	{"uid": "ef6806d4-1781-4e7b-a7c1-a5b453798096", "name": "abollado", "gvk": "dns.gcp.upbound.io/v1beta1, Kind=RecordSet"}
2024-12-19T15:17:26Z	DEBUG	provider-gcp	External resource is up to date	{"controller": "managed/dns.gcp.upbound.io/v1beta1, kind=recordset", "request": {"name":"abollado"}, "uid": "ef6806d4-1781-4e7b-a7c1-a5b453798096", "version": "1935994642", "external-name": "projects/ofuscated/managedZones/ofuscated-edo-world/rrsets/abollado.ofuscated.edo.world./CNAME", "requeue-after": "2024-12-19T15:27:33Z"}
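
The log excerpt above can be checked against the settings printed on its first line. The following is a small Python sketch that takes the three "Reconciling" entries, computes the gaps between them, and compares them with the logged "poll-interval" of 10m0s and "poll-jitter" of 30s. The helper names are hypothetical, the JSON payloads are shortened, and the reading of jitter as "interval plus or minus up to 30s" is an assumption.

```python
from datetime import datetime, timedelta

# "Reconciling" entries from the log above (tab-separated fields, payload shortened).
LOG = (
    '2024-12-19T14:57:26Z\tDEBUG\tprovider-gcp\tReconciling\t{"request": {"name":"abollado"}}\n'
    '2024-12-19T15:06:58Z\tDEBUG\tprovider-gcp\tReconciling\t{"request": {"name":"abollado"}}\n'
    '2024-12-19T15:17:26Z\tDEBUG\tprovider-gcp\tReconciling\t{"request": {"name":"abollado"}}\n'
)

POLL_INTERVAL = timedelta(minutes=10)  # "poll-interval": "10m0s"
POLL_JITTER = timedelta(seconds=30)    # "poll-jitter": "30s" (assumed to mean +/- 30s)


def reconcile_times(log: str) -> list[datetime]:
    """Collect the timestamps of all lines whose message field is 'Reconciling'."""
    times = []
    for line in log.splitlines():
        fields = line.split("\t")
        if len(fields) >= 4 and fields[3] == "Reconciling":
            times.append(datetime.strptime(fields[0], "%Y-%m-%dT%H:%M:%SZ"))
    return times


def gaps(times: list[datetime]) -> list[timedelta]:
    """Return the time between each pair of consecutive timestamps."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in gaps(reconcile_times(LOG)):
        ok = abs(gap - POLL_INTERVAL) <= POLL_JITTER
        print(gap, "within poll window" if ok else "outside poll window")
```

The gaps are 9m32s and 10m28s, both inside the 10m ± 30s window. The controller is therefore reconciling on schedule and reports the external resource as up to date, and the excerpt contains no entry about a conversion request.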


@brais-real-edo

brais-real-edo commented Dec 27, 2024

Hi @turkenf,

I'm @marccortinas's teammate and I'm working on the same issue. We've investigated further and can add more information.

We tried upgrading Crossplane and the provider to the latest versions, but we get the same behaviour. Then we tried downgrading the provider and found that the last working version is 1.1.0. Checking the changelog for version 1.2.0 (v1.1.0...v1.2.0), we found that it introduces a new CRD version: v1beta2. Our first thought was that the issue comes from this version, and we were right. Applying this manifest is accepted:

apiVersion: dns.gcp.upbound.io/v1beta1
kind: RecordSet
metadata:
  name: abollado
spec:
  deletionPolicy: Delete
  forProvider:
    managedZone: <managed-zone>
    name: abollado.<domain>.
    project: <project>
    rrdatas:
      - www.google.es.
    ttl: 300
    type: CNAME
  managementPolicies:
    - '*'
  providerConfigRef:
    name: <project>

but applying this manifest is not accepted:

apiVersion: dns.gcp.upbound.io/v1beta2
kind: RecordSet
metadata:
  name: abollado
spec:
  deletionPolicy: Delete
  forProvider:
    managedZone: <managed-zone>
    name: abollado.<domain>.
    project: <project>
    rrdatas:
      - www.google.es.
    ttl: 300
    type: CNAME
  managementPolicies:
    - '*'
  providerConfigRef:
    name: <project>

We get the following error:

Error from server: conversion webhook for dns.gcp.upbound.io/v1beta1, Kind=RecordSet failed: Post "https://provider-gcp-dns.crossplane-system.svc:9443/convert?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
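
The error message itself identifies which endpoint the API server was calling and how the call failed. As a rough illustration, the Python sketch below splits the message into its parts. The helper name `describe_webhook_failure` and the regular expression are assumptions made for this example and are not part of kubectl or Crossplane. The sketch also assumes the host follows the in-cluster `<service>.<namespace>.svc` form.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Error text as returned by kubectl in this thread.
ERROR = (
    'Error from server: conversion webhook for dns.gcp.upbound.io/v1beta1, '
    'Kind=RecordSet failed: Post '
    '"https://provider-gcp-dns.crossplane-system.svc:9443/convert?timeout=30s": '
    'net/http: request canceled while waiting for connection '
    '(Client.Timeout exceeded while awaiting headers)'
)

PATTERN = re.compile(
    r'conversion webhook for (?P<group_version>[^,\s]+), Kind=(?P<kind>\w+) failed: '
    r'Post "(?P<url>[^"]+)"'
)


def describe_webhook_failure(message: str) -> dict:
    """Extract the webhook target and failure mode from a conversion webhook error."""
    m = PATTERN.search(message)
    if m is None:
        raise ValueError("not a conversion webhook error")
    url = urlsplit(m["url"])
    # Assumption: the host has the form <service>.<namespace>.svc
    service, namespace = url.hostname.split(".")[:2]
    return {
        "group_version": m["group_version"],
        "kind": m["kind"],
        "service": service,
        "namespace": namespace,
        "port": url.port,
        "path": url.path,
        "timeout": parse_qs(url.query).get("timeout", [None])[0],
        "timed_out": "Client.Timeout exceeded" in message,
    }
```

For the message above, this yields the Service provider-gcp-dns in the namespace crossplane-system, port 9443, path /convert, and a timeout of 30s. The "Client.Timeout exceeded" part indicates that the caller gave up waiting, which suggests the request never received a response from the webhook, as opposed to the webhook returning a conversion error.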

We have tried deleting providers and crossplane itself from this cluster, but every time we reinstall it, we find the same issue. We haven't been able to reproduce the issue in other clusters, so this doesn't occur consistently.

As we want to keep this cluster and avoid migrating all its applications to a new one, could you help us in some way? Do you have any clue? What do you recommend we check? The provider logs aren't useful, as @marccortinas mentioned before. Our guess is that something remains in the cluster even though we uninstall everything carefully. Do you think we are missing something?

Thanks in advance!

@marccortinas marccortinas changed the title from "[Bug]: Prevent creating managed resources from Crossplane without an existing providerconfig" to "[Bug]: conversion issues in v1beta2 appears inconsistently in dns.gcp.upbound.io" on Jan 2, 2025