
Fix 2 bugs in non-raw send with encryption #17340


Merged: 1 commit, May 19, 2025

Conversation

gamanakis
Contributor

@gamanakis gamanakis commented May 16, 2025

Motivation and Context

Closes #12014

Description

Bisecting identified the redacted send/receive code as the source of the bug
in issue #12014. Specifically, the call to
dsl_dataset_hold_obj(&fromds) was replaced by
dsl_dataset_hold_obj_flags(), which passes a DECRYPT flag and creates
a key mapping. The matching dsl_dataset_rele_flags(&fromds) is missing,
so the key mapping is never cleared. It may then be inadvertently
reused, which results in arc_untransform failing with ECKSUM in:
arc_untransform+0x96/0xb0 [zfs]
dbuf_read_verify_dnode_crypt+0x196/0x350 [zfs]
dbuf_read+0x56/0x770 [zfs]
dmu_buf_hold_by_dnode+0x4a/0x80 [zfs]
zap_lockdir+0x87/0xf0 [zfs]
zap_lookup_norm+0x5c/0xd0 [zfs]
zap_lookup+0x16/0x20 [zfs]
zfs_get_zplprop+0x8d/0x1d0 [zfs]
setup_featureflags+0x267/0x2e0 [zfs]
dmu_send_impl+0xe7/0xcb0 [zfs]
dmu_send_obj+0x265/0x360 [zfs]
zfs_ioc_send+0x10c/0x280 [zfs]

Fix this by restoring the call to dsl_dataset_hold_obj().

The same applies to to_ds: here, replace dsl_dataset_rele(&to_ds) with
dsl_dataset_rele_flags().

Both leaked key mappings will cause a panic when exporting the
sending pool or unloading the zfs module after a non-raw send from
an encrypted filesystem.

How Has This Been Tested?

Manually running the scripts at https://github.com/HankB/provoke_ZFS_corruption

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@github-actions github-actions bot added the Status: Work in Progress Not yet ready for general review label May 16, 2025
@gamanakis
Contributor Author

gamanakis commented May 16, 2025

How can I acknowledge @HankB and @pcd1193182 for the significant effort they put into this?

Member

@amotin amotin left a comment


Makes sense to me. Though after searching around this area for a couple of hours, I still have no idea why it helps with the corruptions. I think it is only a trigger, but for what?

@tonyhutter
Contributor

How can I acknowledge @HankB and @pcd1193182 for the significant effort they put into this?

Contributions-by: would be my recommendation. I see some of those in the git logs.

Contributor

@behlendorf behlendorf left a comment


The fixes here make sense. But I agree, it's not at all clear to me how this would lead to a corruption. Was the final reproducer for this something which could be reasonably adapted and included in the test suite?

@ryao
Contributor

ryao commented May 17, 2025

The fixes here make sense. But I agree, it's not at all clear to me how this would lead to a corruption. Was the final reproducer for this something which could be reasonably adapted and included in the test suite?

I have a WIP CodeQL check for instances of this problem in my git repository:

https://github.com/ryao/zfs/tree/issue-12014

I am waiting to see if it works. If it does, we should think about adding checks for other functions that could be accidentally mismatched.

@ryao
Contributor

ryao commented May 17, 2025

ryao/zfs@7c921e3 works and detected the two issues fixed in this patch. The current iteration is limited to cases where the mismatched hold and release functions are called from the same function. I might try to handle more cases before I open a PR with the check.

@ryao
Contributor

ryao commented May 17, 2025

The current iteration is limited to cases where the mismatched hold and release functions are called from the same function. I might try to handle more cases before I open a PR with the check.

I just did a manual search to see if this is even necessary. It turns out that it is: dmu_objset_hold_flags() calls dsl_dataset_hold_flags(), which calls dsl_dataset_hold_obj_flags(). On error from dmu_objset_from_ds() in dmu_objset_hold_flags(), the returned dsl_dataset_t pointer is passed to dsl_dataset_rele(), which is wrong. If I expand the check to catch mismatched hold and release functions across function calls, it should be able to identify this and possibly other bugs.

@gamanakis
Contributor Author

@ryao thank you for taking this to the next level.
@behlendorf I think expanding the ZTS would be great, but the scripts take about 2-3h to reproduce this.

@gamanakis
Contributor Author

gamanakis commented May 17, 2025

I believe the crucial issue in #12014 is the failure to clear key mappings, which is not currently handled in the code base. My suspicion is that the apparent corruption in ARC stems from a scenario where an object (snapshot) is freed, but its key mapping persists. In the current code base this happens for both fromds and to_ds. This suggests the stale key mappings are inadvertently reused.

In this context, it would likely be safe to use dsl_dataset_hold_obj_flags(&fromds) as long as we call dsl_dataset_rele_flags(&fromds) afterwards.

@HankB

HankB commented May 18, 2025

Was the final reproducer for this something which could be reasonably adapted and included in the test suite?

I'll comment because I wrote the reproducer. (Unfortunately I remain blissfully ignorant of the test suite and how it is run.)

I produced a series of scripts that:

  • Populate a pool with compressible and random files in multiple nested datasets.
  • Modify files, create and delete snapshots and send the encrypted pool to an unencrypted pool. (IOW, thrash on the pool.)

The setup of the tests is manual (copying and pasting commands from previous test notes). This could all be scripted; I just didn't do that.

On my H/W (a ten-year-old server motherboard with two 500 GB SATA SSDs) it typically takes about 2 hours to reproduce corruption. One problem with automating it is that the results are indeterminate: it's "whale on it and see if it breaks." If it typically breaks at 2 hours, how long a run ensures with reasonable confidence that the problem does not exist? Some earlier testing took on the order of a day to provoke corruption, but I was able to bring that down by increasing the activity on the pool. It's conceivable that other changes to the stack could reduce the chance of corruption, delaying onset past the typical two hours.

Perhaps the next thing would be to fully automate the setup and bring the existing scripts up to "production quality." Others have used the scripts and been kind enough not to comment on the code quality (beyond submitting PRs for bugs that I had overlooked). Whether the scripts could be included in a test suite is left to those more familiar with it.

@gamanakis
Contributor Author

62f125b: updated the commit message

@AndrewJDR

I'm just an observer here, but I'm curious: is the context and timing of the uncleared key mapping reuse understood well enough to design a synthetic test for the crash?

@owlshrimp

I'm just an observer here, but I'm curious: is the context and timing of the uncleared key mapping reuse understood well enough to design a synthetic test for the crash?

Or alternatively, can the theory be tested with some printf's or the like?

Commit message (identical to the PR description above), with trailers:

Contributions-by: Hank Barta <[email protected]>
Contributions-by: Paul Dagnelie <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
@gamanakis
Contributor Author

40567d0: updated the commit message, it's not really a panic but arc_untransform failing with ECKSUM.

The theory that the stale key mapping is reused is the only possible explanation I see. I have been trying to confirm it, but that is tricky with cmn_err(). In the current codebase all such key mappings are leaked.

@amotin amotin marked this pull request as ready for review May 19, 2025 13:59
@amotin amotin added the Status: Accepted Ready to integrate (reviewed, tested) label May 19, 2025
@github-actions github-actions bot added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review Status: Accepted Ready to integrate (reviewed, tested) labels May 19, 2025
@amotin amotin added the Status: Accepted Ready to integrate (reviewed, tested) label May 19, 2025
@amotin amotin removed the Status: Code Review Needed Ready for review and testing label May 19, 2025
@behlendorf
Contributor

@HankB thanks for the clarification on the reproducer. I figured it wouldn't be easy to add to the CI or George would have done it in this PR already, but I still wanted to ask!

@ryao the CodeQL check would be a nice addition to help us catch future issues, even if it is limited to the scope of a single calling function (for now). Plus, having a working example should make it pretty easy to extend to other similar interfaces.

@behlendorf behlendorf merged commit ea74cde into openzfs:master May 19, 2025
28 checks passed
ryao added a commit to ryao/zfs that referenced this pull request May 19, 2025
This was caught when doing a manual check to see if openzfs#17352 needed to be
improved to catch mismatches across stack frames of the kind that were
first found in openzfs#17340.

Signed-off-by: Richard Yao <[email protected]>
Successfully merging this pull request may close these issues.

ZFS corruption related to snapshots post-2.0.x upgrade
9 participants