Disk fixtures rework #330

Open · wants to merge 25 commits into master from disks-rework
Conversation

Contributor
@ydirson ydirson commented Jun 18, 2025

This replaces the existing disk fixtures, which rely on the assumption that we can take a dom0 device name on the pool master, and expect to find it across the whole pool with the same meaning. This was causing issues for GlusterFS and Linstor tests when some hosts in the test pool get their disk device names swapped after a reboot.

The old mechanism used a mandatory --sr_disk flag to control the behavior of the various sr_disk* fixtures, with little code-sharing between them. The new one is based on a central disks fixture that properly introspects the pool's disks; it comes with its own --disks flag to filter which disks are exposed to the tests, but by default exposes all of them. Additional fixtures then filter disks on different criteria (for now: availability and blocksize of the disk), and pool-level fixtures on top of those allow testing distributed SRs without the risk of using the wrong disk.

Builds on:

@ydirson ydirson changed the title Diskfeatures rework Disk fixtures rework Jun 18, 2025
@ydirson ydirson force-pushed the disks-rework branch 6 times, most recently from ac95623 to 1f72326 Compare June 19, 2025 16:41
Contributor Author

ydirson commented Jun 19, 2025

Last push:

  • adds a first --disks implementation
  • fixes unused_disks ignoring any filtering done in disks
  • extends a bit the use of the type aliases
  • notes in WIP that pool_with_unused_disk only applies to pool A; we'll need:
    • to make that more explicit
    • something better for linstor pool-join

@ydirson ydirson force-pushed the disks-rework branch 4 times, most recently from 0c6c08c to 6642deb Compare June 24, 2025 12:58
@ydirson ydirson requested a review from Nambrok June 24, 2025 12:58
@ydirson ydirson force-pushed the disks-rework branch 3 times, most recently from 18ad9aa to c1c6bfb Compare June 26, 2025 08:03
@ydirson ydirson marked this pull request as ready for review June 26, 2025 08:09
conftest.py Outdated
Comment on lines 381 to 415
def unused_disks(disks: dict[Host, list[DiskDevName]]
                 ) -> Generator[dict[Host, list[DiskDevName]]]:
    ret = {host: [disk for disk in host_disks if host.disk_is_available(disk)]
           for host, host_disks in disks.items()}
    logging.debug("available disks collected: %s",
                  {host.hostname_or_ip: value for host, value in ret.items()})
    yield ret

@pytest.fixture(scope='session')
def sr_disk_for_all_hosts(pytestconfig, request, host):
    disks = pytestconfig.getoption("sr_disk")
    if len(disks) != 1:
        pytest.fail("This test requires exactly one --sr-disk parameter")
    disk = disks[0]
    master_disks = host.available_disks()
    assert len(master_disks) > 0, "a free disk device is required on the master host"

    if disk != "auto":
        assert disk in master_disks, \
            f"disk or block device {disk} is either not present or already used on master host"
        master_disks = [disk]

    candidates = list(master_disks)
    for h in host.pool.hosts[1:]:
        other_disks = h.available_disks()
        candidates = [d for d in candidates if d in other_disks]

    if disk == "auto":
        assert len(candidates) > 0, \
            f"a free disk device is required on all pool members. Pool master has: {' '.join(master_disks)}."
        logging.info(f">> Found free disk device(s) on all pool hosts: {' '.join(candidates)}. Using {candidates[0]}.")
    else:
        assert len(candidates) > 0, \
            f"disk or block device {disk} was not found to be present and free on all hosts"
        logging.info(f">> Disk or block device {disk} is present and free on all pool members")
    yield candidates[0]

def unused_4k_disks(unused_disks: dict[Host, list[DiskDevName]]
                    ) -> Generator[dict[Host, list[DiskDevName]]]:
    ret = {host: [disk for disk in host_disks if host.disk_is_4k(disk)]
           for host, host_disks in unused_disks.items()}
    logging.debug("available 4k disks collected: %s",
                  {host.hostname_or_ip: value for host, value in ret.items()})
    yield ret
Contributor Author
One thing to note here: unlike sr_disk, which ended up only ever getting 512B-block disks, unused_disks gets all disks regardless of blocksize, and unused_4k_disks is just an additional filter on top of it.
This means that, in theory, a test needing both a standard disk and a 4k one could see the generic fixture get allocated the host's 4k disk, leaving none left when a 4k one is then requested, if the fixtures are specified in that order. Specifying them in "most specific to least specific" order naturally works around that issue.

Do we want to do better? Like making sure unused_disks picks 512B disks first? I think it might not be worth the added complexity, but am I missing some cases?
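The ordering effect can be shown with a toy simulation (the helper and the disk tuples are made up for illustration, not fixture code from this PR):

```python
# Greedy one-disk-per-fixture allocation, in the order the fixtures are listed.
def allocate(available, predicates):
    pool = list(available)
    picked = []
    for pred in predicates:
        match = next((d for d in pool if pred(d)), None)
        if match is None:
            return None        # a later, more specific fixture got starved
        picked.append(match)
        pool.remove(match)
    return picked

is_4k = lambda d: d[1] == 4096
anything = lambda d: True
disks = [("sdc", 4096), ("sdb", 512)]   # the 4k disk happens to come first
```

With `[anything, is_4k]` the generic predicate grabs the 4k disk and the specific one starves; with `[is_4k, anything]` both succeed.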

@Nambrok Nambrok (Contributor) Jul 4, 2025

The previous sr_disk gave only 512B disks because a 4KiB disk in non-LargeBlock SR would fail.

Contributor Author

Ah, then maybe we should introduce unused_512B_disks to get things right and make this visible.
But at some point those SRs currently limited to 512B will be able to use larger blocksizes, and we'll want to test them on both 512B and 4k. Maybe this should indeed be anticipated, using blocksize as a test parameter already, so it will be trivial to add those new tests when the time comes?

Contributor

The existing SR would work for 4KiB only with QCOW2 images, trying to create a VHD would still fail.

Contributor Author

OK, looks like a case for having linked parameters, as in

(("512B", "vhd"),
 ("512B", "qcow"),
 ("4k", "qcow")
)

Those would parametrize vdi_on_*_sr and the tests explicitly creating a VDI or importing a VM, we should be able to pass the proper parameter values to the SR-creation fixture.
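A hedged sketch of such linked parameters with pytest (the fixture name, the id helper, and the constant name are illustrative, not from this PR):

```python
import pytest

# Each tuple links a blocksize with an image format it supports today;
# invalid combinations like ("4k", "vhd") simply never appear.
BLOCKSIZE_AND_FORMAT = (
    ("512B", "vhd"),
    ("512B", "qcow"),
    ("4k", "qcow"),
)

def linked_id(param):
    # Readable test ids like "512B-vhd".
    blocksize, image_format = param
    return f"{blocksize}-{image_format}"

@pytest.fixture(params=BLOCKSIZE_AND_FORMAT, ids=linked_id, scope='session')
def blocksize_and_format(request):
    return request.param
```

An SR-creation fixture depending on `blocksize_and_format` could then unpack the tuple and pick both the disk filter and the VDI image format consistently.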

Contributor Author

sr_disk is going away here; I'm trying to see how much more useful the replacement can be.
Currently we're doing 512B without even thinking about it; we should IMHO make it explicit.

Will push an update proposal shortly

Contributor Author

OK, looks like a case for having linked parameters, as in

(("512B", "vhd"),
 ("512B", "qcow"),
 ("4k", "qcow")
)

Those would parametrize vdi_on_*_sr and the tests explicitly creating a VDI or importing a VM, we should be able to pass the proper parameter values to the SR-creation fixture.

I do have a prototype working, based on this fixture in toplevel conftest.py:

@pytest.fixture(params=(512, 4096), scope='session')
def disk_blocksize(request) -> int:
    return request.param

overridable in narrower scopes with a subset of the values, e.g. for largeblock/conftest.py:

@pytest.fixture(params=(4096,), scope='session')
def disk_blocksize(request) -> int:
    return request.param

however, making this a test parameter rather than a restriction on disk selection has a nasty side-effect: we wouldn't be able to use both 512B and 4k SRs in a single test, preventing e.g. migration tests between the two kinds of disk. Thus I'm withdrawing this part of the proposal, and will get something closer to what we have today.

Contributor Author

Ah, then maybe we should introduce unused_512B_disks to get things right and make this visible.

Went this way

But at some point those SR currently limited to 512B will be able to use larger blocksize, and we'll want to test them on both 512B and 4k. Maybe this should indeed be anticipated and use blocksize as a test parameter already, so it will be trivial to add those new tests when time comes?

At that point we should be able to introduce a blocksize param on the test, and pull either unused_512B_disks or unused_4k_disks depending on the param value. @rushikeshjadhav I guess
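One way to sketch that future dispatch (the disks fixture names match this PR; the wrapper fixture and helper are hypothetical) is with pytest's request.getfixturevalue:

```python
import pytest

def fixture_name_for(blocksize: int) -> str:
    # Map the blocksize param value to the concrete disks fixture to pull.
    return {512: "unused_512B_disks", 4096: "unused_4k_disks"}[blocksize]

@pytest.fixture(params=(512, 4096), scope='session')
def unused_sized_disks(request):
    # Resolve the matching fixture lazily, based on the blocksize param.
    return request.getfixturevalue(fixture_name_for(request.param))
```

Tests would then depend on `unused_sized_disks` and run once per blocksize, without each test hard-coding which disks fixture it needs.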

Contributor

A 512B disk on LargeBlock would also fail BTW

Contributor Author

A 512B disk on LargeBlock would also fail BTW

Yes, that's why I switched to explicit unused_512B_disks and unused_4k_disks. I believe everything is better now :)

ydirson added 2 commits August 1, 2025 17:07
Will help readability and improve typechecking.

Signed-off-by: Yann Dirson <[email protected]>
Contributor Author

ydirson commented Aug 1, 2025

... and rebased

@ydirson ydirson requested a review from Nambrok August 1, 2025 15:08
@stormi stormi removed their request for review August 8, 2025 13:04
Member

stormi commented Aug 8, 2025

I leave it to the rest of you to review, for lack of time

ydirson added 17 commits August 19, 2025 16:34
This info typically doesn't change over the course of a test session, but
Host.rescan_block_devices_info() is provided for when we'll have more
sophisticated test cases.

We use a custom TypedDict to be able to use this type more widely in
upcoming commits.

Reimplements disks() and available_disks() on top of this.

Signed-off-by: Yann Dirson <[email protected]>
This method had not been used since 7435c99
(largeblock: add tests largeblock driver + fixture); this repurposes it
for a new life.

Signed-off-by: Yann Dirson <[email protected]>
Fills the role of --sr_disk, except that:
- it is per-host
- the default is "all disks"; use an empty device list to forbid all disks
  on a host

Signed-off-by: Yann Dirson <[email protected]>
For some reason pyright would not consider the type known otherwise.

Also use an unquoted type hint, since we have deferred annotations and we
use it that way in all other places.

Signed-off-by: Yann Dirson <[email protected]>
Only used for everything ZFS

Signed-off-by: Yann Dirson <[email protected]>
Signed-off-by: Yann Dirson <[email protected]>