Regression in 2.3.1 compared to 2.2.8 with virtual_server #2435
Unfortunately I cannot test this issue on Foobar Linux since I do not have a subscription for it. However, nothing has changed in keepalived in relation to module loading since keepalived v2.2.8, and I have never come across this problem (nor has it been reported) before.

Once the ip_vs kernel module is loaded (and keepalived does this if necessary), any of the other kernel modules (such as ip_vs_rr) should be loaded automatically by the kernel; certainly keepalived has never had, and has never needed, functionality to load any of the other IPVS modules (the ipvsadm utility likewise only loads the ip_vs module).

I have looked at the code that handles loading the ip_vs module, and in certain circumstances errno may not be set appropriately, so the "no such file or directory" error may be incorrect, but that does not look like the cause to me. I will tidy that up soon.

My guess about what is occurring at startup is that the ip_vs module is being loaded by keepalived, but that the subsequent call of ipvs_init() (see ipvs_start() in keepalived/check/ipvswrapper.c) occurs too quickly after loading the module and returns an error; the keepalived_healthchecker process then terminates. When you restart keepalived, since the ip_vs module is already loaded, the problem no longer occurs.

As a workaround you can add a startup script. For example add:
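(The snippet referenced here did not survive the page conversion; a minimal sketch of such a systemd drop-in, assuming the hypothetical path /etc/systemd/system/keepalived.service.d/override.conf, could be:)

```ini
# /etc/systemd/system/keepalived.service.d/override.conf  (hypothetical path;
# run "systemctl daemon-reload" after creating it)
[Service]
# Load ip_vs before keepalived's checker process starts.
ExecStartPre=/etc/keepalived/keepalived_start.sh
```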
and /etc/keepalived/keepalived_start.sh:
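(The script itself was also lost from this page; a minimal sketch of what it might contain — assumed content only — is:)

```shell
#!/bin/sh
# Hypothetical reconstruction of the suggested start script:
# ensure the ip_vs kernel module is loaded before keepalived starts.
modprobe ip_vs
```

The script needs to be executable (chmod +x /etc/keepalived/keepalived_start.sh).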
If necessary you could add a loop after the modprobe to check until the ip_vs module has loaded.
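Such a loop might look like the following sketch. The wait_for_module helper and the SYS_MODULE_DIR override are illustrative assumptions, not from the thread; a loaded module shows up as a directory under /sys/module, which is what the loop polls for.

```shell
#!/bin/sh
# Sketch of a wait loop to run after "modprobe ip_vs" (hypothetical helper).
# A loaded module appears as /sys/module/<name>; SYS_MODULE_DIR is overridable
# only so the function can be exercised outside a real system.
SYS_MODULE_DIR="${SYS_MODULE_DIR:-/sys/module}"

wait_for_module() {
    mod="$1"
    tries="${2:-50}"                        # ~5s total with 0.1s sleeps
    while [ "$tries" -gt 0 ]; do
        [ -d "$SYS_MODULE_DIR/$mod" ] && return 0
        tries=$((tries - 1))
        sleep 0.1
    done
    return 1                                # timed out
}

# Usage after "modprobe ip_vs":
#   wait_for_module ip_vs || echo "ip_vs did not load in time" >&2
```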
I guess it is a timing issue. You can test with CentOS Stream 9 - it has the same kernel and the same tooling.
But again: no changes other than building 2.3.1 instead of 2.2.8 with the same options, and this problem appeared.
I can reproduce this on CentOS Stream 9. If I remove the ip_vs module before running keepalived, then the problem occurs. I will investigate further.
It appears that if we need to install the ip_vs module (via keepalived_modprobe), then the first ipvs_nl_send_message() call fails, but it succeeds thereafter. On most distros the call genl_ctrl_resolve(sock, IPVS_GENL_NAME) loads the ip_vs module, so we don't need to call keepalived_modprobe() and we don't need to retry ipvs_nl_send_message(). On the other hand, if genl_ctrl_resolve() does not load the ip_vs module, then we need to make a second call of ipvs_nl_send_message(). It appears that RHEL-based distros (including CentOS Stream but not Fedora) do not load the ip_vs module when genl_ctrl_resolve() is called, but most other distros do.

It appears that if there is an entry:

It was commit c7bade7 that caused the problem, although on the face of it, it shouldn't have made any difference. Moving the check for !msg to be the first code in ipvs_nl_send_message() meant that the rather bizarre ipvs_nl_send_message(NULL, NULL, NULL) call in ipvs_init() no longer called open_nl_sock(). Moving the check for !msg to after the call of open_nl_sock() resolved this issue, but was clearly wrong, since nlmsg_free(msg) was then called with a NULL pointer (although nlmsg_free() does check for that).

The solution is to call ipvs_nl_send_message() twice in ipvs_getinfo() if we have loaded the ip_vs module. Commit a0b6d3b resolves this issue.
Describe the bug
When keepalived.service starts after a machine boot and virtual_servers have been configured, keepalived fails to start.
After the first start attempt, only the ip_vs module is loaded.
When keepalived.service is restarted, on the second start it manages to load the ip_vs_rr module and starts properly.
To Reproduce
Happens every time the machine is rebooted: on first startup the virtual_server code fails to start.
Expected behavior
I'd expect all modules to be loaded automatically and service to work.
Keepalived version
2.3.1
Output of
keepalived -v
Distro (please complete the following information):
Details of any containerisation or hosted service (e.g. AWS)
QEMU/KVM VM
Configuration file:
Notify and track scripts
In this case notify script was not configured so it was a noop
System Log entries
Did keepalived coredump?
No
Additional context
With version 2.2.8 exactly the same configuration worked just fine, so this is a regression after that release.