RP tuner fails to complete on RHEL #177

Closed
pmw-rp opened this issue May 25, 2023 · 1 comment

Comments

@pmw-rp
Contributor

pmw-rp commented May 25, 2023

When using a RHEL image on Azure, the tuner fails to make progress and times out after 15 minutes.

The RHEL image was defined in vars.tf as follows:

variable "vm_image" {
  description = "Source image reference for the VMs"
  type = object({
    publisher = string
    offer     = string
    sku       = string
    version   = string
  })
  default = {
    publisher = "RedHat"
    offer     = "RHEL"
    sku       = "8-lvm-gen2"
    version   = "latest"
  }
}

While the tuner is stuck, the following line appears repeatedly in the logs:

May 25 10:01:21 redpanda0 rpk[32651]: WARN  2023-05-25 10:01:21,464 [shard 0] cluster - cluster_discovery.cc:247 - Error requesting cluster bootstrap info from {host: 10.0.1.4, port: 33145}, retrying. std::__1::system_error (error system:113, No route to host)

However, the RPC port is open and Redpanda is listening on it:

[root@redpanda0 ~]# netstat -plutan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1674/sshd           
tcp        0      0 10.0.1.5:33145          0.0.0.0:*               LISTEN      32651/redpanda      
tcp        0      0 0.0.0.0:5355            0.0.0.0:*               LISTEN      1025/systemd-resolv 
...
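Note that netstat on redpanda0 only shows the local broker (10.0.1.5) listening; it does not show whether redpanda0 can reach the peer at 10.0.1.4:33145. A minimal Python sketch of a reachability check, run from one broker against another, that classifies the connect outcome. The `probe` helper is hypothetical (not part of rpk or the playbook); on Linux, errno 113 is EHOSTUNREACH ("No route to host"), ECONNREFUSED means the host answered but nothing is listening, and a timeout usually means packets are silently dropped.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and classify the outcome (hypothetical diagnostic helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply at all: packets are probably being dropped silently.
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # errno 113 on Linux; also what an ICMP "prohibited" reject produces.
            return "no route to host"
        if exc.errno == errno.ECONNREFUSED:
            # Host is reachable, but nothing is listening on that port.
            return "refused"
        return f"error: {exc}"
```

For example, running probe("10.0.1.4", 33145) on redpanda0 with firewalld active would, if the firewall diagnosis is right, be expected to return "no route to host" rather than "open".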

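This appears to be caused by the firewall: "No route to host" (error 113, EHOSTUNREACH) is also what a Linux client reports when firewalld rejects a connection with an ICMP "prohibited" reply, even though the port is listening. After logging in to each broker VM and running sudo systemctl stop firewalld, the playbook ran to completion on retry.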
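Rather than stopping firewalld entirely, a narrower workaround would be to open only the ports the brokers use. A minimal Python sketch, assuming the default firewalld zone and Redpanda's default listener ports: only 33145 (internal RPC) comes from the log above, while 9092 (Kafka API), 9644 (Admin API), 8081 (Schema Registry) and 8082 (HTTP Proxy) are assumed defaults that should be checked against your redpanda.yaml. The helper names are hypothetical; `firewall-cmd --permanent --add-port=PORT/tcp` and `firewall-cmd --reload` are standard firewalld commands.

```python
import subprocess

# Assumed Redpanda default ports; only 33145 (internal RPC) appears in the log above.
REDPANDA_PORTS = {
    "rpc": 33145,
    "kafka": 9092,
    "admin": 9644,
    "schema_registry": 8081,
    "http_proxy": 8082,
}


def firewalld_commands(ports) -> list[list[str]]:
    """Build firewall-cmd invocations that open each TCP port permanently, then reload."""
    cmds = [["firewall-cmd", "--permanent", f"--add-port={p}/tcp"] for p in sorted(ports)]
    # Permanent rules only take effect in the running firewall after a reload.
    cmds.append(["firewall-cmd", "--reload"])
    return cmds


def open_ports(ports, apply: bool = False) -> None:
    """Print the commands (dry run) or execute them when apply=True."""
    for cmd in firewalld_commands(ports):
        if apply:
            # Needs root (e.g. run the script with sudo) on each broker VM.
            subprocess.run(cmd, check=True)
        else:
            print(" ".join(cmd))
```

Calling open_ports(REDPANDA_PORTS.values()) only prints the commands; passing apply=True runs them. Keeping firewalld running with explicit port rules avoids leaving the VMs unfiltered.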

@hcoyote
Contributor

hcoyote commented May 25, 2023

It would be interesting to see what happens with firewalld enabled when running the rpk redpanda tuners individually, enabling them one at a time, to see which one makes startup fail.

I bet it's the fstrim tuner. The docs imply that it makes a socket call out to D-Bus (but that should happen over a Unix socket, not a network socket).

https://docs.redpanda.com/docs/reference/rpk/rpk-redpanda/rpk-redpanda-tune-list/
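One way to follow this suggestion is to run each tuner separately with firewalld left running and record which one fails or hangs. A minimal Python sketch, assuming `rpk redpanda tune <name>` accepts a single tuner name as listed by `rpk redpanda tune list` (see the docs linked above). The tuner names are passed in rather than hard-coded, since fstrim is the only one named in this thread, and the `runner` parameter is a hypothetical hook so the loop can be exercised without touching the host.

```python
import subprocess
from typing import Callable, Iterable


def run_rpk(args: list[str], timeout: float) -> int:
    """Run one rpk command and return its exit code (needs root on a broker VM)."""
    return subprocess.run(args, timeout=timeout).returncode


def tune_one_at_a_time(
    tuners: Iterable[str],
    runner: Callable[[list[str], float], int] = run_rpk,
    timeout: float = 300.0,
) -> dict[str, str]:
    """Run `rpk redpanda tune <name>` for each tuner separately and record the outcome."""
    results: dict[str, str] = {}
    for name in tuners:
        try:
            code = runner(["rpk", "redpanda", "tune", name], timeout)
            results[name] = "ok" if code == 0 else f"exit {code}"
        except subprocess.TimeoutExpired:
            # A tuner that hangs is reported instead of blocking the whole loop.
            results[name] = "timed out"
    return results
```

For example, tune_one_at_a_time(["fstrim"]) would try only the fstrim tuner; extend the list with names taken from rpk redpanda tune list.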

@gene-redpanda closed this as not planned on Jul 2, 2024