oned fails with segfault being unable to read monitord configuration file #6687

Closed
3 tasks
OpenNebulaSupport opened this issue Aug 8, 2024 · 1 comment

@OpenNebulaSupport
Collaborator

Description
oned fails with a segfault if the /etc/one/monitord.conf file cannot be read by the oneadmin user.

The kernel reports the segfault in the system log (/var/log/messages on RHEL-based distributions, /var/log/syslog on Debian-based ones):
Ubuntu 22.04:
Aug 8 09:35:17 <host> kernel: [193417.063587] oned[555688]: segfault at 560428f3b39c ip 0000560147e09bcf sp 00007fdb64ff8a50 error 4 in oned[560147da5000+2b1000]

RHEL 7:
Aug 8 09:24:53 <host> kernel: oned[4357]: segfault at 55b71e3b4760 ip 000055b71e3b4760 sp 00007ffb41ffac98 error 15
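
To find entries like the two above in a busy log, a small filter can help. This is a minimal sketch: the function name `find_oned_segfaults` is hypothetical, and the pattern is inferred only from the two sample lines in this report, so other kernels may format the message differently.

```shell
#!/bin/sh
# Hypothetical helper: print kernel log lines that report an oned segfault.
# The pattern is inferred from the sample lines in this issue, not from any
# OpenNebula tool. With no file arguments, it reads from standard input.
find_oned_segfaults() {
    grep -E 'oned\[[0-9]+\]: segfault at ' "$@"
}
```

It can be called with the log path for the distribution in use, for example `find_oned_segfaults /var/log/syslog`, or fed from a pipe.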

To Reproduce

systemctl stop opennebula.service

ls -al /etc/one/monitord.conf 
-rw-r-----. 1 root oneadmin 8952 Nov 24  2023 /etc/one/monitord.conf

chgrp root /etc/one/monitord.conf

ls -al /etc/one/monitord.conf 
-rw-r-----. 1 root root 8952 Nov 24  2023 /etc/one/monitord.conf

systemctl start opennebula.service
Job for opennebula.service failed because a fatal signal was delivered to the control process. See "systemctl status opennebula.service" and "journalctl -xe" for details.

Check the system logs for the oned segfault.
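
The condition that triggers the crash, a monitord.conf the oneadmin user cannot read, can also be checked before starting the service. The following is a minimal sketch, not part of OpenNebula: `check_readable` is a hypothetical helper that tests readability for whichever user runs it, so it is meant to be run as oneadmin.

```shell
#!/bin/sh
# Hypothetical pre-start check: report whether the invoking user can read a file.
# Run it as oneadmin, the user this issue says must be able to read the file.
check_readable() {
    if [ -r "$1" ]; then
        echo "OK: $1 is readable by $(id -un)"
    else
        echo "ERROR: $1 is not readable by $(id -un)" >&2
        return 1
    fi
}
```

Assuming the function is saved in a script that ends with `check_readable /etc/one/monitord.conf`, running that script as oneadmin should print the ERROR line after the `chgrp root` step above and the OK line once the group is restored to oneadmin.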

Details

  • Affected Component: Core
  • Version: reproduced on one-6.4.5 on CentOS 7 and one-6.8.3 on Ubuntu 22.04.

Progress Status

  • Code committed
  • Testing - QA
  • Documentation (Release notes - resolved issues, compatibility, known issues)
@paczerny
Member

paczerny commented Aug 9, 2024

Note: Double-check that monitoring works correctly after onemonitord is restarted (including in HA deployments).

@paczerny paczerny added this to the Release 6.10.1 milestone Aug 22, 2024
@paczerny paczerny self-assigned this Aug 22, 2024
rsmontero added a commit to OpenNebula/docs that referenced this issue Sep 5, 2024
rsmontero pushed a commit that referenced this issue Sep 5, 2024
* Cleanup oned in case of initialization error

* Fix monitoring after onemonitord restart. The code includes a "hook" point in case a driver is re-started so custom code can be executed. InformationManager sends the list of hosts and raft status in this case.

* B #5801: Update error msg, in case of duplicated drivers