btrfs-progs: check that device byte values in superblock match those in chunk root #991

Open: maharmstone wants to merge 17 commits into base: devel

Conversation

maharmstone
Contributor

The superblock of each device contains a copy of the corresponding struct btrfs_dev_item that lives in the chunk root.

Add a check that the total_bytes and bytes_used values of these two copies match.
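As a rough illustration of the check described above, here is a minimal Python model. It is not the btrfs-progs implementation (which is C): `DevItem` and `compare_dev_items` are hypothetical names, and only the two field names `total_bytes` and `bytes_used` come from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DevItem:
    """Toy stand-in for the two compared fields of struct btrfs_dev_item."""
    total_bytes: int
    bytes_used: int


def compare_dev_items(super_copy: DevItem, chunk_copy: DevItem) -> List[str]:
    """Return one message per field that differs between the two copies."""
    errors = []
    for field in ("total_bytes", "bytes_used"):
        in_super = getattr(super_copy, field)
        in_chunk = getattr(chunk_copy, field)
        if in_super != in_chunk:
            errors.append(
                f"{field} mismatch: superblock has {in_super}, "
                f"chunk root has {in_chunk}"
            )
    return errors
```

An empty list means the two copies agree; any entry would be reported as a check error.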

@adam900710
Collaborator

adam900710 commented May 29, 2025

The CI failure is caused by an old image that was created by an older kernel.

I have updated the image in this PR: #992, which should fix the test failure for fsck/020.

But there are still some more problems that need to be solved:

  • fsck/047
    This requires proper repair support; your enhanced check didn't re-read the updated superblock after repair, thus failing that test case.
    This needs some updates to your patch.

  • fsck/057
    The check does not handle seed devices correctly.
    The sprout fs can modify the old chunks on the seed device, e.g. remove the old empty SYSTEM chunk.
    But since the seed device is completely read-only, its device item cannot be updated, which causes the check error.

    The patch should not check a seed device against the sprouted fs.

@maharmstone
Contributor Author

Cheers Qu, I'll have a look

intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request May 29, 2025
Each superblock contains a copy of the device item for that device. In a
transaction which drops a chunk but doesn't create any new ones, we were
correctly updating the device item in the chunk tree but not copying
over the new bytes_used value to the superblock.

This can be seen by doing the following:

 # cd
 # dd if=/dev/zero of=test bs=4096 count=2621440
 # mkfs.btrfs test
 # mount test /root/temp

 # cd /root/temp
 # for i in {00..10}; do dd if=/dev/zero of=$i bs=4096 count=32768; done
 # sync
 # rm *
 # sync
 # btrfs balance start -dusage=0 .
 # sync

 # cd
 # umount /root/temp
 # btrfs check test

(For btrfs-check to detect this, you will also need my patch at
kdave/btrfs-progs#991.)

Change btrfs_remove_dev_extents() so that it adds the devices to the
post_commit_list if they're not there already. This causes
btrfs_commit_device_sizes() to be called, which updates the bytes_used
value in the superblock.

Signed-off-by: Mark Harmstone <[email protected]>
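For orientation, the sizes used by the reproducer above work out as follows. This is plain arithmetic on the dd arguments; the variable names are only for illustration.

```python
BLOCK_SIZE = 4096                        # bs=4096 in every dd call

image_bytes = BLOCK_SIZE * 2621440       # dd if=/dev/zero of=test ...
file_bytes = BLOCK_SIZE * 32768          # dd if=/dev/zero of=$i ...
file_count = len(range(0, 11))           # {00..10} expands to 11 names

GIB = 1 << 30
MIB = 1 << 20

print(image_bytes // GIB, "GiB image")                          # 10 GiB image
print(file_bytes // MIB, "MiB per file")                        # 128 MiB per file
print(file_count * file_bytes // MIB, "MiB written in total")   # 1408 MiB written in total
```

So the test writes roughly 1.4 GiB into a 10 GiB image, deletes it again, and then uses `btrfs balance start -dusage=0` to drop the data block groups that are now empty, which is the chunk-removal path the commit message is about.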
@kdave kdave force-pushed the devel branch 6 times, most recently from ef43ce6 to 3eff852 Compare May 30, 2025 14:14
kdave pushed a commit to btrfs/linux that referenced this pull request Jun 4, 2025
Each superblock contains a copy of the device item for that device. In a
transaction which drops a chunk but doesn't create any new ones, we were
correctly updating the device item in the chunk tree but not copying
over the new bytes_used value to the superblock.

This can be seen by doing the following:

  # dd if=/dev/zero of=test bs=4096 count=2621440
  # mkfs.btrfs test
  # mount test /root/temp

  # cd /root/temp
  # for i in {00..10}; do dd if=/dev/zero of=$i bs=4096 count=32768; done
  # sync
  # rm *
  # sync
  # btrfs balance start -dusage=0 .
  # sync

  # cd
  # umount /root/temp
  # btrfs check test

For btrfs-check to detect this, you will also need my patch at
kdave/btrfs-progs#991.

Change btrfs_remove_dev_extents() so that it adds the devices to the
fs_info->post_commit_list if they're not there already. This causes
btrfs_commit_device_sizes() to be called, which updates the bytes_used
value in the superblock.

Fixes: bbbf724 ("btrfs: combine device update operations during transaction commit")
CC: [email protected] # 5.10+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Mark Harmstone <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
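The effect of the missing list update can be sketched with a toy model. This is not kernel code: the `Device` and `Transaction` classes are hypothetical, `commit_bytes_used` stands in for the value that ends up in the superblock, and the assumption that allocation already queued the device is inferred from the commit message's wording that only transactions which drop chunks without creating new ones were affected.

```python
class Device:
    def __init__(self, bytes_used):
        self.bytes_used = bytes_used          # value kept for the chunk tree item
        self.commit_bytes_used = bytes_used   # value written to the superblock


class Transaction:
    def __init__(self, fix_applied):
        self.fix_applied = fix_applied
        self.post_commit_list = []            # devices whose sizes get copied

    def _queue(self, dev):
        if dev not in self.post_commit_list:
            self.post_commit_list.append(dev)

    def alloc_dev_extent(self, dev, length):
        dev.bytes_used += length
        self._queue(dev)                      # assumed: allocation queues the device

    def remove_dev_extent(self, dev, length):
        dev.bytes_used -= length
        if self.fix_applied:
            self._queue(dev)                  # the behaviour the patch adds

    def commit(self):
        # Model of the size-commit step: copy sizes only for queued devices.
        for dev in self.post_commit_list:
            dev.commit_bytes_used = dev.bytes_used
        self.post_commit_list.clear()
```

Without the fix, a transaction that only removes extents commits with an empty list, so the superblock copy keeps the old `bytes_used`; with the fix the two values agree after commit.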
adam900710 and others added 16 commits June 6, 2025 08:24
This new option allows end users to specify certain per-inode flags for
a specified file or directory inside rootdir.

And mkfs will follow the kernel behavior by inheriting the inode flag
from the parent.

For example:

 rootdir
 |- file1
 |- file2
 |- dir1/
 |  |- file3
 |- subv/     << will be created as a subvolume using --subvol option
    |- dir2/
    |  |- file4
    |- file5

When running `mkfs.btrfs --rootdir rootdir --subvol subv --inode-flags
nodatacow:dir1 --inode-flags nodatacow:subv`, the following files
and directories will have the *nodatacow* flag set:

- dir1
- file3
- subv
- dir2
- file4
- file5

For now only two flags are supported:

- nodatacow
  Disable data COW, implies *nodatasum* for regular files

- nodatasum
  Disable data checksum only.

This also works with --compress option, and files with nodatasum or
nodatacow flag will skip compression.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
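The inheritance rule from the --inode-flags example above can be modelled in a few lines. This is only a sketch of the described outcome, not mkfs.btrfs code; the `PARENT` table and `paths_with_flag` helper are hypothetical and simply encode the example layout.

```python
from typing import Dict, Set

# Parent of every path in the example layout ("" is rootdir itself).
PARENT: Dict[str, str] = {
    "file1": "", "file2": "", "dir1": "", "subv": "",
    "dir1/file3": "dir1",
    "subv/dir2": "subv", "subv/file5": "subv",
    "subv/dir2/file4": "subv/dir2",
}


def paths_with_flag(explicit: Set[str]) -> Set[str]:
    """Return every path that carries the flag, directly or by inheritance."""
    def has_flag(path: str) -> bool:
        while path:                       # walk up until rootdir is reached
            if path in explicit:
                return True
            path = PARENT[path]
        return False

    return {p for p in PARENT if has_flag(p)}


flagged = paths_with_flag({"dir1", "subv"})
print(sorted(flagged))
# ['dir1', 'dir1/file3', 'subv', 'subv/dir2', 'subv/dir2/file4', 'subv/file5']
```

The result matches the six entries listed in the commit message; file1 and file2 stay unflagged because neither they nor rootdir were named.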
The simple test will create a layout like the following:

rootdir
|- file1
|- file2
|- subv/		<< Regular subvolume
|  |- file3
|- nocow_subv/		<< NODATACOW subvolume
|  |- file4
|- nocow_dir/		<< NODATACOW directory
|  |- dir2
|  |  |- file5
|  |- file6
|- nocow_file1		<< NODATACOW file

Any files under NODATACOW subvolume/directory should also be NODATACOW.
The explicitly specified single file should also be NODATACOW.

Issue: kdave#984
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Create a second data block-group to be used for relocation, in case a
zoned filesystem is created.

This second data block-group will then be picked up by the kernel as the
default data relocation block-group on mount.

This ensures we always have a target to relocate good data to when we
need to do garbage collection.

Signed-off-by: Johannes Thumshirn <[email protected]>
Signed-off-by: David Sterba <[email protected]>
If in btrfs_check_super() we find that the superblock has a csum
mismatch, print the wanted and found values, just as we do for metadata
in __csum_tree_block_size().

When hex-editing a btrfs image, it's useful to use btrfs check to
calculate what the new csum should be. Unfortunately at present this
only works for trees and not for the superblock, meaning you have to use
the much more wordy `btrfs inspect-internal`.

Pull-request: kdave#985
Signed-off-by: Mark Harmstone <[email protected]>
Signed-off-by: David Sterba <[email protected]>
This feature is provided by kernel commit fc5c0c58258748 ("btrfs:
defrag: extend ioctl to accept compression levels"), which is included
in 6.15 but not in 6.14.

[skip ci]

Pull-request: kdave#983
Signed-off-by: David Sterba <[email protected]>
Block group tree requires the no-holes and free-space-tree features, so
add such a check, just like mkfs.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
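A dependency check of this kind might look like the following sketch. It is a hypothetical helper, not the btrfs-progs code; the feature strings are used only as labels for illustration.

```python
from typing import Iterable, List

# Dependency stated in the commit message: block-group-tree needs both of these.
BGT_REQUIRES = ("no-holes", "free-space-tree")


def missing_bgt_requirements(enabled: Iterable[str]) -> List[str]:
    """Return the features block-group-tree still needs, in a fixed order."""
    enabled = set(enabled)
    if "block-group-tree" not in enabled:
        return []                         # nothing to enforce without bgt
    return [f for f in BGT_REQUIRES if f not in enabled]
```

A caller would refuse to continue whenever the returned list is non-empty and name the missing features in the error message.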
The bytenr sequence of all roots is controlled by our code, so if
something went wrong with the sequence, it's a bug.

A UASSERT() is more suitable for this case.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
The function requires parameters @slot and @itemoff to record where the
next item should land.

But this is overkill, as after inserting an item, the temporary extent
buffer will have its header nritems and the item pointer updated.  We
can use that header nritems and item pointer to get where the next item
should land.

This removes the external counter to record @slot and @itemoff.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
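The idea of deriving the next position from the buffer itself can be shown with a very simplified leaf model. This is an illustration only: `TempLeaf` is a hypothetical class, item headers are reduced to `(offset, size)` pairs, and the list length plays the role of the header's nritems.

```python
class TempLeaf:
    """Simplified leaf: item data is packed from the end towards the front."""

    def __init__(self, data_size):
        self.data_size = data_size
        self.items = []                 # (offset, size) per slot; len() acts as nritems

    def insert(self, size):
        slot = len(self.items)          # next slot comes straight from nritems
        # The new item's data ends where the previous item's data starts.
        data_end = self.items[-1][0] if self.items else self.data_size
        offset = data_end - size
        self.items.append((offset, size))
        return slot, offset
```

Because both values are recomputed from the leaf on each insert, callers no longer need to carry a separate slot and item offset between calls.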
…_chunk_item()

These functions require parameters @slot and @itemoff to record where the
next item should land.

But this is overkill, as after inserting an item, the temporary extent
buffer will have its header nritems and the item pointer updated.

We can use that header nritems and item pointer to get where the next
item should land.

This removes the external counter to record @slot and @itemoff.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
This function requires parameters @slot and @itemoff to record where the
next item should land.

But this is overkill, as after inserting an item, the temporary extent
buffer will have its header nritems and the item pointer updated.

We can use that header nritems and item pointer to get where the next
item should land.

This removes the external counter to record @slot and @itemoff.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…emp_block_group()

These functions require parameters @slot and @itemoff to record where the
next item should land.

But this is overkill, as after inserting an item, the temporary extent
buffer will have its header nritems and the item pointer updated.

We can use that header nritems and item pointer to get where the next
item should land.

This removes the external counter to record @slot and @itemoff.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…tree()

Both fs and csum trees are empty at make_convert_btrfs(), no need to use
two different functions to do that.

Merge them into a common setup_temp_empty_tree() instead.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Previously btrfs-convert bgt support did not work at all, for the
following reasons:

- We never update the super block with extra compat ro flags
  Even if we set "-O bgt", the compat ro flags are not set, and
  everything just goes down the non-bgt path.

  Meanwhile the other compat ro flags are for free-space-tree, and
  free-space-tree is rebuilt after the full convert is done.
  Thus this bug won't cause any problem for fst features, and only
  affects bgt so far.

- No extra handling to create block group tree

Fix the above problems by:

- Set the proper compat RO flag for the temporary super block
  We should set all compat RO flags except the two FST related
  bits.  As FST is handled after conversion, we should not set those
  flags at that point.

- Add block group tree root item and its backrefs
  So the initial temporary fs will have a proper block group tree.

  The only tricky part is the extent tree population, where we have
  to put all block group items into the block group tree rather than
  the extent tree.

With these two points addressed, now block group tree can be properly
enabled for btrfs-convert.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
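The "all compat RO flags except the two FST bits" rule amounts to a simple mask. The sketch below is not the btrfs-convert code; `temp_super_compat_ro` is a hypothetical helper, and the bit values are quoted as I understand them from the kernel's uapi header, so treat them as an assumption to verify.

```python
# Bit values assumed from include/uapi/linux/btrfs.h (BTRFS_FEATURE_COMPAT_RO_*).
FREE_SPACE_TREE = 1 << 0
FREE_SPACE_TREE_VALID = 1 << 1
BLOCK_GROUP_TREE = 1 << 3

FST_BITS = FREE_SPACE_TREE | FREE_SPACE_TREE_VALID


def temp_super_compat_ro(requested: int) -> int:
    """Flags for the temporary superblock: everything requested minus the FST bits."""
    return requested & ~FST_BITS
```

With "-O bgt" plus free-space-tree requested, only the block-group-tree bit survives in the temporary superblock; the FST bits are left to the rebuild that happens after conversion.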
Previously "btrfs-convert -O bgt" would not cause any error, but the
resulting fs has no block-group-tree feature at all, making it no
different than "btrfs-convert -O ^bgt".

This is a big bug that was never caught by our existing convert runs.
001-ext2-basic and 003-ext4-basic both test the bgt feature, but neither
checks whether the resulting fs actually has the bgt flag set.

To fix that, add a new test case, which will do the regular bgt convert,
but at the end also do a super block dump and verify the
BLOCK_GROUP_TREE flag is properly set.

Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
…in chunk root

The superblock of each device contains a copy of the corresponding
struct btrfs_dev_item that lives in the chunk root.

Add a check that the total_bytes and bytes_used values of these two
copies match.

Signed-off-by: Mark Harmstone <[email protected]>
@maharmstone maharmstone force-pushed the check-superblock-devitem branch from 0bdf0aa to 1164c01 Compare June 6, 2025 14:19
@maharmstone
Contributor Author

maharmstone commented Jun 6, 2025

> But there are still some more problems that need to be solved:
>
>   • fsck/047
>     This requires proper repair support; your enhanced check didn't re-read the updated superblock after repair, thus failing that test case.
>     This needs some updates to your patch.

This wasn't quite right. The problem was that the dev_rec->byte_used value wasn't getting updated in check_device_used(). I've force-pushed the patch with this change squashed in.

>   • fsck/057
>     The check does not handle seed devices correctly.
>     The sprout fs can modify the old chunks on the seed device, e.g. remove the old empty SYSTEM chunk.
>     But since the seed device is completely read-only, its device item cannot be updated, which causes the check error.
>     The patch should not check a seed device against the sprouted fs.

Unfortunately this wasn't right either. btrfs-check only works on one device at a time, so if the superblock is readonly on one device the metadata on it will be too.
The problem was that this test does a mount, and it was the kernel causing the corruption. Cherry-picking 9516bae0d79045004f0b64b1f852d177cacee094 causes the problem to go away.

kdave pushed a commit to btrfs/linux that referenced this pull request Jun 6, 2025
@adam900710
Collaborator

Thanks a lot for updating the analysis; my quick guess was as bad as usual.

We can merge the series once the upstream kernel has the fix, although the CI may be problematic for a while until the CI kernel gets the fix backported.

@maharmstone
Contributor Author

No worries, thanks Qu

kdave pushed a commit to btrfs/linux that referenced this pull request Jun 9, 2025
kdave pushed a commit to btrfs/linux that referenced this pull request Jun 18, 2025
kdave pushed a commit to kdave/btrfs-devel that referenced this pull request Jun 19, 2025
@kdave kdave force-pushed the devel branch 4 times, most recently from dafafca to 5d47f58 Compare June 20, 2025 19:34
5 participants