Rolling pool update does not resume after reboot. #7613

tuxpowered · 2024-04-29T22:31:19Z

Are you using XOA or XO from the sources?

XO from the sources

Which release channel?

None

Provide your commit number

0794a

Describe the bug

When performing a "Rolling Update" on an HA cluster, XO migrates all VMs off the primary node to the other nodes (good). The primary node then reboots; however, when it comes back online, the other nodes in the HA cluster do not resume downloading and applying patches.

Error message

From Settings > Logs:

server.enable
{
  "id": "0bce7468-93e5-4376-93c1-c75082f8f436"
}
{
  "name": "ConnectTimeoutError",
  "code": "UND_ERR_CONNECT_TIMEOUT",
  "call": {
    "method": "session.login_with_password",
    "params": "* obfuscated *"
  },
  "message": "Connect Timeout Error",
  "stack": "ConnectTimeoutError: Connect Timeout Error
    at onConnectTimeout (/opt/xen-orchestra/node_modules/undici/lib/core/connect.js:190:24)
    at /opt/xen-orchestra/node_modules/undici/lib/core/connect.js:133:46
    at Immediate._onImmediate (/opt/xen-orchestra/node_modules/undici/lib/core/connect.js:174:9)
    at processImmediate (node:internal/timers:476:21)
    at process.callbackTrampoline (node:internal/async_hooks:128:17)"
}

pool.rollingUpdate
{
  "pool": "62d8471c-e515-0d7a-d77f-5ac38a945507"
}
{
  "message": "Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart",
  "name": "Error",
  "stack": "Error: Host 1f4b8cd7-e9da-414e-8558-8059a3165b98 took too long to restart
    at Xapi.rollingPoolReboot (file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/pool.mjs:127:9)
    at Xapi.rollingPoolUpdate (file:///opt/xen-orchestra/packages/xo-server/src/xapi/mixins/patching.mjs:501:5)
    at XenServers.rollingPoolUpdate (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/xen-servers.mjs:689:5)
    at Xo.rollingUpdate (file:///opt/xen-orchestra/packages/xo-server/src/api/pool.mjs:231:3)
    at Api.#callApiMethod (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/api.mjs:366:20)"
}
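The stack trace shows the error being thrown from `Xapi.rollingPoolReboot`, which is called by `Xapi.rollingPoolUpdate`. A simplified sketch of a sequential per-host reboot loop illustrates why a single host timing out stops the whole run: the rejection propagates out of the loop, so the remaining hosts are never processed. This is not XO's implementation; `rollingReboot` and the injected `evacuate`, `reboot`, `waitUntilBack` and `restoreVms` callbacks are hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch, NOT Xen Orchestra's code: a sequential rolling reboot.
// Every pool operation is injected so the control flow can run without a real pool.
async function rollingReboot(hosts, { evacuate, reboot, waitUntilBack, restoreVms }) {
  const completed = []
  for (const host of hosts) {
    await evacuate(host) // migrate VMs off this host
    await reboot(host)
    // If this rejects (e.g. "Host ... took too long to restart"), the error
    // escapes the loop and the hosts after this one are never touched.
    await waitUntilBack(host)
    await restoreVms(host) // migrate VMs back once the host is up again
    completed.push(host)
  }
  return completed
}
```

With a `waitUntilBack` that rejects for the first host (the master), only that host is evacuated and rebooted; the other hosts are left untouched, which matches the behavior described in this report.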

To reproduce

Go to 'Home > Pools > Select HA Pool'
Click on 'Patches > Rolling pool Update'
See error (not displayed in the UI; review the logs)

Expected behavior

On reboot of the primary node, the migration of VMs back should resume, and the process should go on to the next pool and repeat.

Screenshots

No response

Node

18.20.0

Hypervisor

8.2.1

Additional context

It appears that the HA master correctly has its VMs migrated and patches applied first.
All systems have 10GB dedicated storage and a 1GB interface for VM access and management.


Danp2 · 2024-04-29T23:51:17Z

commit number 0794a

You are about a month behind on updates. Also, have you seen the latest revisions to the documentation where it explains how to increase the timeout period? https://xen-orchestra.com/docs/manage_infrastructure.html#rolling-pool-updates-rpu
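The timeout mentioned here presumably bounds how long the rolling update waits for a rebooted host to come back before failing with "took too long to restart". As a rough illustration of that mechanism (not XO's code and not its configuration syntax; the linked documentation describes the actual setting), a poll-until-deadline helper might look like the following. `waitForHostRestart`, `isHostBack`, `timeoutMs` and `intervalMs` are hypothetical names. Failed connection attempts during the reboot, like the `ConnectTimeoutError` in the log above, are treated as "not back yet" rather than as fatal.

```javascript
// Hypothetical helper, NOT Xen Orchestra's code: poll until a rebooted host
// reports that it is back, or give up once `timeoutMs` has elapsed.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

async function waitForHostRestart(hostId, isHostBack, { timeoutMs, intervalMs = 5000 }) {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    try {
      if (await isHostBack()) return
    } catch {
      // Connection errors while the host is rebooting (compare the
      // ConnectTimeoutError in the log) mean "not back yet", not failure.
    }
    await sleep(intervalMs)
  }
  throw new Error(`Host ${hostId} took too long to restart`)
}
```

A host that is slow to boot can exceed whatever limit is in effect even though it eventually comes back, which is why raising the timeout can help in this situation.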

tuxpowered · 2024-04-30T00:17:53Z

Oh wow, that far behind already? Seems like it was just a few weeks ago I updated.
Did not see the timeout update. I will update and review.
It is odd, because I have 2 clusters: one updates fine with no problem, the other has an issue (I just started testing the other cluster).

Danp2 · 2024-04-30T10:53:39Z

[Rolling Pool Update/Reboot] Use XO tasks for better reportability (PR #7578)

This was merged earlier today, which will make monitoring the RPU much easier.

b-Nollet · 2024-05-21T09:51:59Z

We've recently made some changes to the RPU, including a fix for a bug introduced by the release earlier this month.
Can you update to the latest version, test whether the problem is still present, and provide us with the XO task logs?

tuxpowered added the type: bug 🐛 label Apr 29, 2024

marcungeschikts assigned b-Nollet May 21, 2024
