
Pseudo Snapshots #16749

Open
Haravikk opened this issue Nov 13, 2024 · 15 comments
Labels
Type: Feature Feature request or new feature

Comments

@Haravikk

Haravikk commented Nov 13, 2024

Describe the feature you would like to see added to OpenZFS

Currently if a dataset has a snapshot, and has not been changed since that snapshot was created, creating a new snapshot is registered as a change to that dataset, which interferes with incremental send operations.

However these snapshots don't really represent any actual changes, so it may make more sense to have them instead become a "pseudo snapshot" – essentially a bookmark that looks and behaves like a snapshot for ZFS commands.

In essence, any command given a pseudo snapshot will either use it as a bookmark as normal (if it supports bookmarks), or swap it for the "root" snapshot that it is identical to/was created from.

When a snapshot is to be destroyed, ZFS will check to see if it has any corresponding "pseudo snapshots" – if it does, instead of destroying the snapshot it will instead be renamed to replace the first pseudo snapshot, whose internal bookmark is then removed. If there are multiple "pseudo snapshots" they will continue to be proxies/aliases of the "root" snapshot as normal.

Example

  1. Dataset contains a snapshot tank/foo@current, the dataset has not been changed since.
  2. zfs snapshot tank/foo@new is created as a "pseudo snapshot" (a bookmark referencing tank/foo@current).
  3. All operations that require a snapshot will silently swap @new for @current.
  4. zfs destroy tank/foo@current will not destroy the snapshot, instead it will be renamed to @new and the internal bookmark for @new is discarded.

At any time an incremental send can be performed to tank/foo without encountering issues, as it is still effectively unchanged throughout.

Compatibility

Internally the "pseudo snapshot" should just be a bookmark with the same createtxg and guid as its "parent" snapshot, plus a means of identifying it as a pseudo snapshot, such as a naming convention or flag that is safe for older versions of ZFS to ignore.

For compatibility reasons, "pseudo snapshots" should always be checked for a "parent" snapshot – if none is found, the pseudo snapshot is treated in all respects as an ordinary bookmark (no special behaviours).

This is because to earlier versions a "pseudo snapshot" will simply appear as a regular bookmark, and the "parent" snapshot can be destroyed separately, resulting in orphaned "pseudo snapshots".
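That fallback rule can be sketched as follows (the `parent_guid` field is a hypothetical stand-in for whatever marker identifies a pseudo snapshot; this is not the actual bookmark layout):

```python
def classify_bookmark(bookmark, snapshots):
    """Treat a pseudo snapshot as such only while its "parent" snapshot
    still exists (matched here by guid). An orphaned pseudo snapshot
    degrades to an ordinary bookmark with no special behaviours."""
    parent_guid = bookmark.get("parent_guid")  # hypothetical marker field
    if parent_guid is None:
        return "ordinary"
    if any(snap["guid"] == parent_guid for snap in snapshots):
        return "pseudo"
    return "ordinary"
```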

How will this feature improve OpenZFS?

It will make it possible to create recursive snapshots on a pool without having to worry about how its snapshots are normally handled, or if creating a snapshot will break incremental sends to that pool.

As a result, it should make it a lot easier to replicate pools that are currently in use by doing the following:

zfs snapshot -r tank/foo@replicate
zfs send -R tank/foo@replicate | zfs receive -F target

Without having to worry about what the existing snapshot scheme looks like, since the recursive snapshot(s) will either be true snapshots or pseudo snapshots as necessary, avoiding any disruption to consistency for sending to the same pool.

Additional Context

I'm not happy with the term "pseudo snapshot" but it's the best I could come up with. I considered "cloned snapshots" but I think that could get confusing.

@Haravikk Haravikk added the Type: Feature Feature request or new feature label Nov 13, 2024
@amotin
Member

amotin commented Nov 13, 2024

I think now each snapshot is always created pointing at the current transaction group. So unless you create several snapshots in one command, I think they will point to different txgs and won't be identical. I wonder what is the actual difference between them other than txg number and is it avoidable? I wonder what would happen if we always point snapshots not to the current txg, but to a txg of the last change. One effect I can see is that we would not be able to tell that recursive snapshots were actually synchronous, they would look like they are not, but can't guess others. But I haven't been in that area deep lately, so just don't remember details.

@Haravikk
Author

If it's possible to create duplicate snapshots on an unchanged dataset without the new snapshot itself appearing as a new transaction then that would be even better.

The only issue could be if any code assumes snapshots must have different txgs, such as in zfs send -I or similar – but if nothing does, then it should mean no issues with compatibility?

@snajpa
Contributor

snajpa commented Nov 27, 2024

Currently if a dataset has a snapshot, and has not been changed since that snapshot was created, creating a new snapshot is registered as a change to that dataset, which interferes with incremental send operations.

How does creating a snapshot affect the dataset to which the snapshot belongs? I've never seen such a behavior as you're describing, seems like a bug.

If the problem is that when you create a snapshot, you then have to take that snapshot's existence into account when doing an incremental send, that's by design. I for one wouldn't want snapshot functionality that sometimes snapshots, but sometimes ~kinda ~doesn't.

@Haravikk
Author

How does creating a snapshot affect the dataset to which the snapshot belongs? I've never seen such a behavior as you're describing, seems like a bug.

Sorry, I should really have given an example. Let's say we have two datasets, zsource and ztarget. zsource is periodically snapshotted and sent to ztarget as backups; however, the snapshots are only taken on changed datasets (i.e. we're not using zfs snapshot -r zsource@snapshot).

Now consider that ztarget is to be migrated to a new pool – the obvious thing to do would be:

zfs snapshot -r ztarget@copy
zfs send -Rw ztarget@copy | zfs receive -d ztarget2

(or similar)

The problem with this is that creating the recursive snapshot (@copy) will cause the sends from zsource to ztarget to start failing, as the datasets on ztarget are all considered to have been changed, even though the only thing that's new is a snapshot on datasets that otherwise haven't changed.

I guess it could be considered a bug that these are considered changes (a snapshot of a dataset that hasn't changed since its last snapshot), but it depends how exactly the snapshots are accounted for and if that can be changed. Removing the snapshots solves the problem but you can't really do that if you're in the midst of a migration that might need a second (or third etc.) send to complete it before you switchover.

While doing pretty much exactly this recently, I had to write a script to convert to using zfs send -I on a per-dataset basis, and it was a huge pain when the above, at least IMO, should have worked fine.
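The failure mode being described can be sketched roughly like this – a deliberately simplified model of the receive-side check, not the actual OpenZFS logic:

```python
def can_receive_incremental(target_snapshot_guids, from_guid):
    """An incremental zfs receive (without forcing a rollback) requires
    the target's newest snapshot to be the stream's "from" snapshot.
    An extra snapshot taken on the target after it (such as a recursive
    @copy) means the newest snapshot no longer matches, so the receive
    fails even though no file data changed."""
    return bool(target_snapshot_guids) and target_snapshot_guids[-1] == from_guid
```

Under this model, taking `@copy` on the target appends a new entry, and every subsequent incremental send from the original source is rejected until `@copy` is destroyed.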

@snajpa
Contributor

snajpa commented Nov 27, 2024

I guess it could be considered a bug that these are considered changes

This is just a user error, not a bug. If you make a snapshot, then you have to work with it; if you've made a snapshot while nothing changed, it's still a new snapshot with a new timestamp. This behavior is valid.

@Haravikk
Author

Haravikk commented Nov 27, 2024

Then it's not a bug – you're the one who suggested it might be, FFS.

I proposed the issue because what are essentially duplicate snapshots shouldn't affect the state of a dataset – they don't actually represent anything, but are still useful for organisational purposes such as during replication.

@snajpa
Contributor

snajpa commented Nov 27, 2024

@Haravikk because you said I quote

Currently if a dataset has a snapshot, and has not been changed since that snapshot was created, creating a new snapshot is registered as a change to that dataset,

which is just not true, but if it somehow were, that'd definitely be a bug - but it isn't since creating a new snapshot is not registered as any kind of change to the dataset it belongs to, it just isn't.

@snajpa
Contributor

snajpa commented Nov 27, 2024

And btw, while you're swinging at others accusing them of attacking you, keep those "FFS"s to yourself, could you?

@Haravikk
Author

Haravikk commented Nov 27, 2024

which is just not true, but if it somehow were, that'd definitely be a bug - but it isn't since creating a new snapshot is not registered as any kind of change to the dataset it belongs to, it just isn't.

I literally just told you how it's true – if you create a new snapshot on a dataset, you can no longer send to it, even though the new snapshot doesn't contain any new data – zfs receive will not allow it without discarding the additional snapshot first.

If it's a bug then it should be fixed; if it's not, then an alternative is useful for avoiding this problem – feel free to decide for yourself which it is. I've identified a problem and set out my proposed solution.

@snajpa
Contributor

snajpa commented Nov 27, 2024

I literally just told you how it's true – if you create a new snapshot on a dataset, you can no longer send to it, even though the new snapshot doesn't contain any new data – zfs receive will not allow it without discarding the additional snapshot first.

User. Error. If you've made a snapshot, work with it, don't act like it's not there.

@Haravikk
Author

User. Error. If you've made a snapshot, work with it, don't act like it's not there.

I'm not acting like it's not there, I'm pointing out that them registering as changes causes zfs receive to fail, which makes replications much more complicated if you want to replicate from the target to another pool.

Bizarrely, you're accusing me either of being a liar because this isn't true, or of being an idiot who's just causing problems for himself – despite the fact that I'm well aware that creating the snapshot is causing the problem, as that's the basis for my feature request. So it'd be nice if you'd pick a damned lane.

Literally the entire point is to make it possible to non-disruptively create organisational snapshots with zfs snapshot -r so that replication can be easier. If it's possible to make snapshots not register as changing a dataset if they're just duplicating an existing snapshot then that would be even better, but I'm not aware if that's possible or not.

@AllKind
Contributor

AllKind commented Nov 27, 2024

Snapshots don't alter the data of the dataset. But they are like a chain – if you take one piece out, you break the chain.
On a corporate/busy server it's unlikely two snapshots are identical (no underlying data change), because a lot of changes happen all the time. That's different from a home NAS or similar; I guess the design had the former in mind.
I think there just isn't code in place for send/receive to check if the referenced txg may fit to glue a broken chain back together. I guess that's what this feature request is about. (But then again, why not just send the 0-byte snapshot? – But dunno about your motivation/workflow.)
Sounds doable. Just somebody's gotta find the time and motivation – the usual problem. Some coders here can be hired for money ;-)

@Haravikk
Author

Haravikk commented Nov 27, 2024

Snapshots don't alter the data of the dataset.

I didn't say they did – I said they register as a change to the dataset for the purposes of zfs receive, which will refuse to receive a send stream even though it should actually be compatible.

I created this issue on the assumption that the reason this happens is that the snapshot itself is presumably created as its own new transaction (so the dataset has in fact changed, its data just hasn't), but bookmarks don't have that problem, they just reference a txg that exists at the time (the reference snapshot).

On a corporate/busy server it's unlikely two snapshots are identical (no underlying data change), because a lot of changes happen all the time.

The use case where I ran into the problem is on a backup pool (snapshots are sent to it) that I wanted to replicate – the problem is that doing this the easy way (creating a recursive snapshot with zfs snapshot -r) doesn't work as the recursive snapshot renders the pool unusable as a target (for the aforementioned zfs receive issue).

So the very breaking of the chain you mention is why I created the issue – it breaks the chain of incremental sends to the target pool (or its replicated copy). But really, if the snapshots don't represent anything that's actually new, there's no real reason it should be a problem.

Such a use-case should be stable, since the target only has snapshots (no datasets in separate use), and the goal is to wait for a moment when nothing is being received, take the recursive snapshot (since it's atomic) then send, as any activity thereafter can be replicated later. But this doesn't work if it stops zfs receive from working.

I think there just isn't code in place for send/receive to check if the referenced txg may fit to glue a broken chain back together.

The entire idea behind my feature request is that this shouldn't be necessary, because the goal isn't to "fix" duplicate snapshots (which I assumed may not even be possible, though if it is I'd be more than happy with that as a solution).

Bookmarks do not have the same problem, which is why the goal of the issue is to invisibly create bookmarks in place of snapshot duplicates, but allow them to be used anywhere a regular snapshot could be (since the bookmarks can be swapped for their reference snapshot).

I can do this using a script as a pre-processing step on zfs snapshot and commands that take snapshots, but it wouldn't be atomic in the same way that ZFS commands would be.
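Such a pre-processing wrapper might look roughly like this – pure name-rewriting logic, where `pseudo_map` and its keys are hypothetical and would in practice be populated by inspecting bookmarks:

```python
def rewrite_snapshot_args(args, pseudo_map, supports_bookmarks=False):
    """Swap any dataset@pseudo argument for its dataset@root before
    handing the command line to zfs, unless the command accepts
    bookmarks anyway. pseudo_map maps (dataset, pseudo_name) to the
    name of the root snapshot the pseudo snapshot aliases."""
    rewritten = []
    for arg in args:
        if "@" in arg and not supports_bookmarks:
            ds, _, snap = arg.partition("@")
            root = pseudo_map.get((ds, snap))
            rewritten.append(f"{ds}@{root}" if root else arg)
        else:
            rewritten.append(arg)
    return rewritten
```

As the comment above notes, a user-space wrapper like this can't be atomic in the way a native implementation inside ZFS could be.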

@AllKind
Contributor

AllKind commented Nov 27, 2024

I must admit I misunderstood your post. Had to read it again 3 times. The example then made it clear.
But yeah, the chain concept kinda remains the same, just for a different usage.
I now understand and from a usage point I think of it as totally valid.

Maybe a possible future change in code would need to remember the current and the last change txg and then some comparison logic in the receive code to allow for the requested feature.

I wonder if you thought of a snap -r -> clone them -> promote them -> destroy the snaps procedure on your backup pool, which should be easily scriptable and should avoid the need to fiddle on the sender's side, as you said in your initial post.

@whoschek

FWIW, a tool like bzfs would help replicate the pool without requiring you to take any additional snapshots (and thus avoids breaking the chain). https://github.com/whoschek/bzfs

No branches or pull requests

5 participants