[Release-1.29] - Agent loadbalancer may deadlock when servers are removed #10515

Closed
brandond opened this issue Jul 14, 2024 · 1 comment

Comments

@brandond
Member

Backport fix for Agent loadbalancer may deadlock when servers are removed

@aganesh-suse

Validated on release-1.29 branch with version v1.29.7-rc1+k3s1

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"

$ uname -m
x86_64

Cluster Configuration:

HA: 3 servers / 1 agent

Config.yaml:

token: xxxx
cluster-init: true
write-kubeconfig-mode: "0644"
node-external-ip: 1.1.1.1
node-label:
- k3s-upgrade=server

Testing Steps

  1. Copy config.yaml:
$ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  2. Install k3s:
$ curl -sfL https://get.k3s.io | sudo INSTALL_K3S_VERSION='v1.29.7-rc1+k3s1' sh -s - server
  3. Verify cluster status:
$ kubectl get nodes -o wide
$ kubectl get pods -A
  4. Identify the server that the agent is connected to: netstat -na | grep 6443
  5. Disconnect the network on that server: ip link set dev eth0 down (or whichever interface that node uses).
  6. Check the journal logs for a load balancer update.
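
The journal check in the last step can be narrowed with a small filter. This is only a sketch: `lb_events` is a hypothetical helper name, and the `k3s-agent` unit name in the usage comment assumes the agent was installed with the default install script.

```shell
# Hypothetical helper: keep only the load balancer and remotedialer
# messages from a log stream read on stdin.
lb_events() {
  grep -E 'load balancer|Remotedialer proxy error'
}

# Usage on the agent node (assumes the default k3s-agent systemd unit):
#   sudo journalctl -u k3s-agent --no-pager | lb_events
```

Debug-level messages such as "Failed over to new server" only appear if the agent is running with debug logging enabled.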

Replication Results:

  • k3s version used for replication:
$ k3s -v
k3s version v1.29.6+k3s1 (83ae095a)
go version go1.21.11
level=error msg="Remotedialer proxy error; reconnecting..." error="dial tcp <ip1>:6443: connect: connection timed out" url="wss://<ip1>:6443/v1-k3s/connect"
level=info msg="Connecting to proxy" url="wss://<ip1>:6443/v1-k3s/connect"
level=debug msg="Failed over to new server for load balancer k3s-agent-load-balancer: <ip1>:6443 -> <ip2>:6443"

Validation Results:

  • k3s version used for validation:
$ k3s -v
k3s version v1.29.7-rc1+k3s1 (93fc1897)
go version go1.22.5
level=info msg="Removing server from load balancer k3s-agent-load-balancer: <ip1>:6443"
level=info msg="Updated load balancer k3s-agent-load-balancer server addresses -> [<ip2>:6443 <ip3>:6443] [default: <ip1>:6443]"
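
To confirm that the disconnected server was dropped from the active list, the address list can be pulled out of the "Updated load balancer" message. This is a rough sketch that assumes the message format shown above; `active_servers` is a hypothetical helper name.

```shell
# Hypothetical helper: print the active server addresses from the most
# recent "Updated load balancer" message in a log stream read on stdin.
active_servers() {
  grep 'Updated load balancer' \
    | tail -n 1 \
    | sed -E 's/.*server addresses -> \[([^]]*)\].*/\1/'
}

# Usage on the agent node (assumes the default k3s-agent systemd unit):
#   sudo journalctl -u k3s-agent --no-pager | active_servers
```

The disconnected server should be absent from the printed list, although it may still appear as the default address in the original log line.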
