
prune command #176

Open
pfernie opened this issue Aug 4, 2020 · 7 comments

Comments

@pfernie
Collaborator

pfernie commented Aug 4, 2020

I have (just) published a crate, rdedup-prune, which implements a prune command following the semantics of attic's prune; I use it to maintain my archives. It relies on a specific naming scheme, <prefix>-<timestamp>, for example "foo-2020-08-04-16-23-00". Names that do not match the format are simply ignored (not removed).
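To make the naming scheme concrete, here is a minimal std-only Rust sketch of parsing such names. It assumes the timestamp is the six hyphen-separated fields `YYYY-MM-DD-HH-MM-SS` (inferred from the example name) and splits from the right, so a prefix may itself contain hyphens. `ParsedName` and `parse_name` are hypothetical names, not rdedup-prune's actual API.

```rust
/// Hypothetical: a backup name split into prefix and timestamp fields,
/// e.g. "foo-2020-08-04-16-23-00" -> ("foo", [2020, 8, 4, 16, 23, 0]).
#[derive(Debug, PartialEq)]
pub struct ParsedName<'a> {
    pub prefix: &'a str,
    /// [year, month, day, hour, minute, second]
    pub stamp: [u32; 6],
}

/// Parses `<prefix>-YYYY-MM-DD-HH-MM-SS` (assumed layout). Returns `None`
/// for names that do not match, so callers can skip (not remove) them.
pub fn parse_name(name: &str) -> Option<ParsedName<'_>> {
    // Split from the right so the prefix itself may contain hyphens.
    let mut parts = name.rsplitn(7, '-');
    let mut stamp = [0u32; 6];
    // rsplitn yields fields right-to-left, so fill seconds first.
    for slot in stamp.iter_mut().rev() {
        let field = parts.next()?;
        // Require plain digits (u32::parse would also accept a leading '+').
        if field.is_empty() || !field.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        *slot = field.parse().ok()?;
    }
    let prefix = parts.next()?;
    if prefix.is_empty() {
        return None;
    }
    // Basic range checks; this does not validate days per month.
    let [_, mo, d, h, mi, s] = stamp;
    let valid = (1..=12).contains(&mo)
        && (1..=31).contains(&d)
        && h < 24
        && mi < 60
        && s < 60;
    if valid {
        Some(ParsedName { prefix, stamp })
    } else {
        None
    }
}
```

Because `stamp` is ordered from year down to second, comparing two `stamp` arrays also compares the names chronologically.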

This is built on rdedup-lib and would be easy to integrate directly into rdedup. It works fine as a standalone tool, and I think it would only benefit from direct integration if timestamp metadata were added to the repository format, which would allow pruning without a specific name format. The external implementation meets my needs, however.

@dpc
Owner

dpc commented Aug 4, 2020

I'm open to having such a thing integrated directly, but why does it have to use a naming scheme? The creation date etc. could be stored in the name file itself, along with any tags like "weekly", "monthly", etc.

I need to double-check what we already store there; I haven't been paying attention to this project for a long while.

@pfernie
Collaborator Author

pfernie commented Aug 4, 2020

The current naming scheme "restriction" exists because I don't believe the metadata currently contains the timestamp (but maybe I overlooked it?). So, agreed: if that metadata were (or is) included, that would be the superior way to do it (although I already name my archives this way, so the restriction isn't a problem for me personally). I intentionally avoided relying on, e.g., the creation timestamps of the name .yml files, as I didn't regard those as reliable.

@dpc
Owner

dpc commented Aug 4, 2020

```
$ cat foo/0000000000000000-c39fdb79bc3faa16/name/foo.yml
---
digest: d202d7951df2c4b711ca44b4bcc9d7b363fa4252127e058c1a910ec05b6cd038
index_level: 0
```

Please add any missing metadata you want (mostly the date?), backfilling the date during deserialization from the filesystem creation/modification date (for backward compatibility), and maybe add tags while at it. https://github.com/dpc/rdedup/blob/master/lib/src/name.rs
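A minimal std-only Rust sketch of the suggested backfill follows. The `created` field (Unix seconds) is a hypothetical addition, not something the current name file contains, and `parse_meta` is a hand-rolled stand-in for a real YAML deserializer that only understands flat `key: value` lines like those shown above. Older files without `created` fall back to the file's creation time, or its modification time where the platform does not report creation time. `NameMeta` and `created_or_backfill` are likewise hypothetical names, not rdedup's `name.rs` types.

```rust
use std::fs;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Hypothetical extended contents of a `name/<name>.yml` file.
/// `created` (Unix seconds) is the proposed new field; older files lack it.
#[derive(Debug, Default, PartialEq)]
pub struct NameMeta {
    pub digest: String,
    pub index_level: u32,
    pub created: Option<u64>,
}

/// Stand-in for a real YAML deserializer: reads flat `key: value` lines.
pub fn parse_meta(text: &str) -> NameMeta {
    let mut meta = NameMeta::default();
    for line in text.lines() {
        // Lines without ':' (such as the "---" document marker) are skipped.
        if let Some((key, value)) = line.split_once(':') {
            let value = value.trim();
            match key.trim() {
                "digest" => meta.digest = value.to_string(),
                "index_level" => meta.index_level = value.parse().unwrap_or(0),
                "created" => meta.created = value.parse().ok(),
                _ => {} // ignore unknown keys, e.g. future tags
            }
        }
    }
    meta
}

/// Returns the stored creation time, or backfills it from filesystem
/// timestamps for name files written before `created` existed.
pub fn created_or_backfill(meta: &NameMeta, path: &Path) -> Option<u64> {
    meta.created.or_else(|| {
        let md = fs::metadata(path).ok()?;
        // Prefer creation time; not every platform/filesystem reports it.
        let time = md.created().or_else(|_| md.modified()).ok()?;
        Some(time.duration_since(UNIX_EPOCH).ok()?.as_secs())
    })
}
```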

@geek-merlin

If I get it right, we need the naming scheme for backups (correct me if any of these assumptions is wrong):

  • rdedup uses a key-value store for names
  • so every backup needs a new name anyway
  • parsing file contents can be very costly (think rclone mount)

So +1 for merging the prune command.

@geek-merlin

I'm open to having such a thing integrated directly, but why does it have to use a naming scheme? The creation date etc. could be stored in the name file itself, along with any tags like "weekly", "monthly", etc.

That would mean that a prune command must read each and every name file, instead of just doing an ls. Is that desirable?

@dpc
Copy link
Owner

dpc commented Jan 13, 2021

That would mean that a prune command must read each and every name file, instead of just doing an ls. Is that desirable?

That doesn't seem terrible, especially since the job of prune is to keep the number of things limited.
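Since prune's job is to bound the number of names, a sketch of the selection step may help. The Rust below keeps the newest entry per day and per week, walking entries newest-first and skipping ones an earlier rule already kept. It is an illustration in the spirit of attic-style daily/weekly retention, not a port of attic's or rdedup-prune's code. Timestamps are assumed to be Unix seconds in UTC and double as entry identifiers for brevity; `select_keep`, `keep_per_period`, `day_key`, and `week_key` are hypothetical names.

```rust
use std::collections::HashSet;

const DAY: u64 = 86_400;

/// Period key for daily retention: whole days since the Unix epoch (UTC).
pub fn day_key(ts: u64) -> u64 {
    ts / DAY
}

/// Period key for weekly retention: Monday-based weeks (UTC).
/// 1970-01-01 was a Thursday, so shifting by 3 days aligns weeks to Monday.
pub fn week_key(ts: u64) -> u64 {
    (ts / DAY + 3) / 7
}

/// Walks timestamps newest-first and keeps the newest entry of each
/// distinct period, up to `n` periods. Entries already kept by an earlier
/// rule consume their period but do not count toward `n`.
pub fn keep_per_period(
    sorted_desc: &[u64],
    period: fn(u64) -> u64,
    n: usize,
    kept: &mut HashSet<u64>,
) {
    let mut last = None;
    let mut count = 0;
    for &ts in sorted_desc {
        if count == n {
            break;
        }
        let p = period(ts);
        if last != Some(p) {
            last = Some(p);
            // `insert` returns false if an earlier rule already kept it.
            if kept.insert(ts) {
                count += 1;
            }
        }
    }
}

/// Returns the timestamps to keep; everything else would be pruned.
pub fn select_keep(mut times: Vec<u64>, keep_daily: usize, keep_weekly: usize) -> HashSet<u64> {
    times.sort_unstable_by(|a, b| b.cmp(a)); // newest first
    let mut kept = HashSet::new();
    keep_per_period(&times, day_key, keep_daily, &mut kept);
    keep_per_period(&times, week_key, keep_weekly, &mut kept);
    kept
}
```

Monthly or yearly rules would need a calendar conversion to derive their period keys, which is why this sketch stops at days and weeks.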

@pfernie
Collaborator Author

pfernie commented Jan 16, 2021

For my usage, reading the metadata for each name is fine (it isn't costly), but it does seem there are some cases where that might be undesirable. The existing behavior of the prune command relies on the naming convention, so I would be happy to support both behaviors. For example, we could add mutually exclusive flags --metadata and --timestamp-format to the prune command. The former would read the actual created field from the metadata; the latter would use the current scheme requiring a timestamp in the name.

Or, by default the command would consult the metadata (as I personally think this is more "reliable"), and a --timestamp-format flag would override this default and use the naming-convention scheme.
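The flag behavior proposed above (mutually exclusive, metadata by default) can be sketched as a small Rust function. Both flag names come from this thread, but treating them as plain switches, with no format string passed to --timestamp-format, is an assumption, as are the names `TimestampSource` and `timestamp_source`.

```rust
/// Hypothetical: where prune reads each name's timestamp from.
#[derive(Debug, PartialEq)]
pub enum TimestampSource {
    /// Read the `created` field stored in the name's metadata (default).
    Metadata,
    /// Parse the timestamp out of a `<prefix>-<timestamp>` name.
    NameFormat,
}

/// Resolves the proposed `--metadata` / `--timestamp-format` switches.
/// They are mutually exclusive; with neither given, metadata is the default.
pub fn timestamp_source(args: &[&str]) -> Result<TimestampSource, String> {
    let metadata = args.contains(&"--metadata");
    let name_format = args.contains(&"--timestamp-format");
    match (metadata, name_format) {
        (true, true) => {
            Err("--metadata and --timestamp-format are mutually exclusive".to_string())
        }
        (false, true) => Ok(TimestampSource::NameFormat),
        // Either --metadata alone or no flag at all.
        _ => Ok(TimestampSource::Metadata),
    }
}
```

A real argument parser such as clap can express the same mutual exclusion declaratively with `conflicts_with`.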
