DART ignores empty folders #537

Open
kieranjol opened this issue Aug 18, 2022 · 3 comments

Comments

@kieranjol

I've noticed that DART does not preserve empty folders, either in a tarred bag or in a regular bag. I've found that bagit-python and tar both retain empty folders, though of course the manifest will not reference them.
I can imagine scenarios where empty folders should be retained, for example when the folder name itself is meaningful. Is this behavior intentional, and if not, can it be fixed?
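
For anyone who wants to check the tar half of this observation, here is a minimal sketch using Python's standard `tarfile` module. It says nothing about how DART or bagit-python build their archives; the folder names (`bag`, `empty_but_meaningful`) and the helper names are made up for illustration.

```python
import os
import tarfile
import tempfile


def tar_member_names(src_dir, tar_path):
    """Tar src_dir recursively and return the member names stored in the archive."""
    with tarfile.open(tar_path, "w") as tar:
        # TarFile.add() recurses by default and records directories as members.
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    with tarfile.open(tar_path) as tar:
        return tar.getnames()


def demo():
    """Build a tiny bag-like tree with one empty folder and list what tar keeps."""
    with tempfile.TemporaryDirectory() as tmp:
        bag = os.path.join(tmp, "bag")  # hypothetical bag directory
        os.makedirs(os.path.join(bag, "data", "empty_but_meaningful"))
        with open(os.path.join(bag, "data", "file.txt"), "w") as f:
            f.write("hello")
        return tar_member_names(bag, os.path.join(tmp, "bag.tar"))


if __name__ == "__main__":
    for name in demo():
        print(name)
```

The empty folder shows up as its own directory entry in the archive, even though a payload manifest, which lists files only, would have nothing to say about it.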

@diamondap
Member

DART does ignore empty folders, to keep in line with the original APTrust bagging guidelines from 2014. APTrust uses a number of S3-compliant storage backends to preserve depositor data. We take the bag apart, store files individually, then reassemble the bag in the latest BagIt format for restoration. (Restoration may occur years after ingest, when the BagIt spec has changed.)

S3 can store empty files, but not empty folders. While we could accept a bag containing empty folders, we would have no way of restoring those empty folders later.
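
To illustrate why empty folders get lost, here is a simplified model, not APTrust's actual ingest code. It assumes each file in a bag is stored as one object whose key is its relative path, so a folder exists only as a prefix shared by file keys. The function name `object_keys` and the sample paths are hypothetical.

```python
import os
import tempfile


def object_keys(root):
    """Return the keys a one-object-per-file upload of root would produce.

    Directories contribute nothing on their own; they appear only as
    prefixes of the file keys beneath them.
    """
    keys = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            keys.append(rel.replace(os.sep, "/"))
    return sorted(keys)


def demo():
    """Build a tree with one file and one empty folder, then list its keys."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "data", "empty"))
        with open(os.path.join(root, "data", "a.txt"), "w") as f:
            f.write("x")
        return object_keys(root)


if __name__ == "__main__":
    print(demo())
```

In this model `data/empty` never produces a key, so nothing in storage records that it existed, and a bag reassembled from the keys cannot recreate it.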

The workaround for this is to put empty .keep files in the empty folders you want to preserve. PHP and some other programming languages use this practice.
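
If there are many empty folders, the workaround can be scripted before bagging. This is a rough sketch and not part of DART; the function name `add_keep_files` is made up, and the `.keep` file name simply follows the convention mentioned above.

```python
import os
import tempfile


def add_keep_files(root, marker=".keep"):
    """Create an empty marker file in every empty directory under root.

    Returns the sorted list of marker paths that were created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A directory is empty when it has neither subdirectories nor files.
        if not dirnames and not filenames:
            path = os.path.join(dirpath, marker)
            open(path, "w").close()  # zero-byte placeholder
            created.append(path)
    return sorted(created)


if __name__ == "__main__":
    # Try it on a throwaway tree rather than on real data.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "data", "empty"))
        print(add_keep_files(root))
```

Run it on the payload directory before creating the bag. The marker files then appear in the manifest like any other file, which is what lets the folders survive a later restoration.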

This used to be documented in our APTrust user guide, but is currently missing from the DART documentation. I'll add it.

@kieranjol
Author

kieranjol commented Aug 31, 2022 via email

@diamondap
Member

I've been thinking about this, and it may be better for DART to provide an option to preserve empty folders, a checkbox or something, so users can be explicit about what they want. Not preserving them was an APTrust-specific decision that doesn't necessarily serve the broader community.

It will likely be a few months before I can return to DART work since APTrust is about to move new systems into production.
