
LoadBalancerClass error #5543

Open
aki263 opened this issue Feb 5, 2024 · 7 comments
@aki263 commented Feb 5, 2024

Bug Report

What version of Kubernetes are you using?

1.27
What version of TiDB Operator are you using?

1.5.2
What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?

Not relevant

What's the status of the TiDB cluster pods?

All running

What did you do?

I installed TiDB Operator 1.5.2 via Helm, and my TidbCluster config looks something like this:


apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: tidb-cluster
  namespace: tidb-cluster
spec:
  version: v7.4.0
  timezone: EST
  configUpdateStrategy: RollingUpdate
  pvReclaimPolicy: Delete
  schedulerName: default-scheduler
  topologySpreadConstraints:
  - topologyKey: topology.kubernetes.io/zone
  enableDynamicConfiguration: true
  helper:
    image: alpine:3.16.0
  pd:
    baseImage: pingcap/pd
    maxFailoverCount: 0
    replicas: 1
    storageClassName: gp3
    requests:
      storage: "10Gi"
    config: |
      [dashboard]
        internal-proxy = true
      [replication]
        max-replicas = 3
    nodeSelector:
      dedicated: pd
  tikv:
    baseImage: pingcap/tikv
    maxFailoverCount: 0
    replicas: 1
    storageClassName: gp3
    requests:
      storage: "2048Gi"
    config: {}
    nodeSelector:
      dedicated: tikv

  tidb:
    baseImage: pingcap/tidb
    maxFailoverCount: 0
    replicas: 1
    hostNetwork: true
    config: |
      [log]
        level = "debug"
        format = "json"
        enable-timestamp = true
        [log.file]
          filename = "/var/log/tidb/tidb-general.log"
      [performance]
        tcp-keep-alive = true
    storageClassName: gp3
    service:
      # loadBalancerClass: "service.k8s.aws/nlb"
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
        service.beta.kubernetes.io/aws-load-balancer-target-node-labels: dedicated=tidb
        external-dns.alpha.kubernetes.io/hostname: xyz.com
      exposeStatus: false
      externalTrafficPolicy: Local
      type: NodePort
    annotations:
      tidb.pingcap.com/sysctl-init: "true"
    podSecurityContext:
      sysctls:
      - name: net.ipv4.tcp_keepalive_time
        value: "300"
      - name: net.ipv4.tcp_keepalive_intvl
        value: "75"
      - name: net.core.somaxconn
        value: "32768"
    separateSlowLog: true
    nodeSelector:
      dedicated: tidb

Everything works as expected, but I cannot make any changes to the tidb component, such as upgrading the version or updating the replicas, because of the following error in tidb-operator:

I0205 16:26:13.315960       1 service_control.go:91] update Service: [tidb-cluster/tidb-cluster-pd] successfully, kind: , name: tidb-cluster
I0205 16:26:13.379126       1 equality.go:87] Service spec diff for tidb-cluster/tidb-cluster-tidb:   v1.ServiceSpec{
  	Ports: []v1.ServicePort{
  		{
  			... // 3 identical fields
  			Port:       4000,
  			TargetPort: {IntVal: 4000},
- 			NodePort:   0,
+ 			NodePort:   31593,
  		},

  	},
  	Selector:  {"app.kubernetes.io/component": "tidb", "app.kubernetes.io/instance": "tidb-cluster", "app.kubernetes.io/managed-by": "tidb-operator", "app.kubernetes.io/name": "tidb-cluster"},
  	ClusterIP: "",
  	... // 6 identical fields
  	ExternalName:             "",
  	ExternalTrafficPolicy:    "Local",
- 	HealthCheckNodePort:      0,
+ 	HealthCheckNodePort:      31590,
  	PublishNotReadyAddresses: false,
  	SessionAffinityConfig:    nil,
  	... // 4 identical fields
  }
I0205 16:26:13.379153       1 tidb_member_manager.go:518] Sync TiDB service tidb-cluster/tidb-cluster-tidb, spec equal: false, annotations equal: true, label equal: true
E0205 16:26:13.385425       1 tidb_cluster_controller.go:143] TidbCluster: tidb-cluster/tidb-cluster, sync failed Service "tidb-cluster-tidb" is invalid: spec.loadBalancerClass: Invalid value: "null": may not change once set, requeuing

I am using AWS Load Balancer Controller 2.7.0 and ExternalDNS.

What did you expect to see?

Version upgrades and replica changes applied to the tidb component.

What did you see instead?

The operator kept requeuing with the spec.loadBalancerClass error shown above, and no changes were applied.

@csuzhangxc (Member)

Does the existing Service have loadBalancerClass set?

spec.loadBalancerClass: Invalid value: "null": may not change once set, requeuing

@aki263 (Author) commented Feb 6, 2024

@csuzhangxc I think the ALB controller is setting this automatically; loadBalancerClass is set to service.k8s.aws/nlb in my Service.
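That would explain the error: the AWS Load Balancer Controller defaults spec.loadBalancerClass on the live Service, Kubernetes makes that field immutable once set, and the operator, whose desired Service omits it (the field is commented out in the config above), keeps trying to write it back to null. A hypothetical excerpt of what the live Service ends up carrying:

spec:
  loadBalancerClass: service.k8s.aws/nlb  # defaulted by the AWS LB controller; may not change once set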

@csuzhangxc (Member)

Currently, tc.spec.tidb.service doesn't support all fields of the Kubernetes Service, so the workaround is to create an extra Service outside of the TidbCluster CR, as sketched below.

If we want to fix this in TiDB Operator, we would need to either support all Service fields in TidbCluster or update the operator's check-and-update logic for the Service.
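For anyone hitting this, a minimal sketch of such an extra Service, reusing the annotations from the config above and the selector labels that appear in the operator's diff; the Service name is hypothetical:

apiVersion: v1
kind: Service
metadata:
  name: tidb-cluster-tidb-nlb        # hypothetical name for the operator-independent Service
  namespace: tidb-cluster
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-target-node-labels: dedicated=tidb
spec:
  type: LoadBalancer
  loadBalancerClass: service.k8s.aws/nlb   # stated explicitly so only the LB controller manages it
  externalTrafficPolicy: Local
  selector:                                # labels copied from the Service diff in the logs
    app.kubernetes.io/component: tidb
    app.kubernetes.io/instance: tidb-cluster
    app.kubernetes.io/managed-by: tidb-operator
    app.kubernetes.io/name: tidb-cluster
  ports:
  - name: mysql-client
    port: 4000        # TiDB's MySQL port, as seen in the diff above
    targetPort: 4000

Because TiDB Operator doesn't manage this Service, the controller can set or default any fields it needs without triggering the sync conflict.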

@verbotenj

This controller uses service.k8s.aws/nlb as the default LoadBalancerClass; you can customize it to a different value via the controller flag --load-balancer-class.
https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.4/guide/service/nlb/
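For example, a hypothetical excerpt of the controller's Deployment container args with a custom class (the flag name is from the linked docs; the cluster name and class value here are made up):

args:
- --cluster-name=my-eks-cluster                  # assumption: your EKS cluster name
- --load-balancer-class=example.com/custom-nlb   # hypothetical custom LoadBalancerClass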

@evertoncsilva commented Oct 29, 2024

I am having the same problem on operator version 1.6, EKS version 1.31.
Installing the cluster works at first, but when I attempted to change the replicas or the resource requests/limits, the changes were not applied.
The operator logs continuously repeat this message:

1 tidb_cluster_controller.go:144] TidbCluster: tidb-cluster/tidb-br2, sync failed Service "tidb-br2-tidb" is invalid: spec.loadBalancerClass: Invalid value: "null": may not change once set, requeuing

My workaround for now was to change the service type to ClusterIP (see the sketch below); I will manually create a load balancer Service later.
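For reference, that workaround in the TidbCluster spec from the original report would look roughly like this excerpt:

  tidb:
    service:
      type: ClusterIP   # no cloud LB, so the AWS LB controller leaves this Service alone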

One thing I noticed that might be of help: in other test clusters I created previously, this error didn't happen when the AWS Load Balancer Controller was not installed.

@csuzhangxc (Member)

An external LB controller (like the AWS Load Balancer Controller) often needs to update some fields of the Service, and this can conflict with TiDB Operator's own updates. So the recommended way is to create another Service for the LB, as in the sketch earlier in this thread.

@evertoncsilva

It would be nice to have this issue with LB controllers more clearly covered in the docs.
