DART ignores empty folders #537
Comments
DART does ignore empty folders, to keep in line with the original APTrust bagging guidelines from 2014. APTrust uses a number of S3-compliant storage backends to preserve depositor data. We take the bag apart, store files individually, then reassemble the bag in the latest BagIt format for restoration. (Restoration may occur years after ingest, when the BagIt spec has changed.)

S3 can store empty files, but not empty folders. While we could accept a bag containing empty folders, we would have no way of restoring those empty folders later.

The workaround for this is to put empty .keep files in the empty folders you want to preserve. PHP and some other programming languages use this practice.

This used to be documented in our APTrust user guide, but is currently missing from the DART documentation. I'll add it.
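For anyone who wants to script the .keep workaround before bagging, here is a minimal sketch in Python. It is not part of DART; the function name `add_keep_files` and the `marker` parameter are made up for illustration, and it assumes you are happy to write marker files directly into the source directory.

```python
import os
from pathlib import Path


def add_keep_files(root, marker=".keep"):
    """Create an empty marker file in every empty directory under root.

    A directory counts as empty when it holds no files and no
    subdirectories. Returns the sorted list of directories that
    received a marker (hypothetical helper, not a DART API).
    """
    marked = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not dirnames and not filenames:
            # touch() creates a zero-byte file, which S3-style stores can hold.
            (Path(dirpath) / marker).touch()
            marked.append(dirpath)
    return sorted(marked)
```

The returned list also gives a count of the folders that would otherwise have been dropped, and running the function a second time marks nothing new.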
Can we add a bit to the end report about the number of deleted folders?
Someone at the ARA2022 conference raised this as a potential solution.
(In reply to A. Diamond's comment of Mon 22 Aug 2022, 13:13.)
I've been thinking about this, and it may be better for DART to provide an option to preserve empty folders, a checkbox or something, so users can be explicit about what they want. Not preserving them was an APTrust-specific decision that doesn't necessarily serve the broader community. It will likely be a few months before I can return to DART work since APTrust is about to move new systems into production.
I've noticed that DART will not preserve empty folders, either in tar or in a regular bag. I've found that bagit-python and tar will retain empty folders, though of course there will be no reference to them in the manifest.
I can imagine scenarios where empty folders are ideally retained if they have some sort of meaningful name. Is this intentional, and if not, can it be fixed?
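To illustrate the tar behaviour described above, here is a small sketch using Python's standard `tarfile` module (not DART and not bagit-python). The helper name `tar_directory_names` is hypothetical; it archives a directory and lists the directory entries recorded in the archive, which include empty folders.

```python
import os
import tarfile


def tar_directory_names(source_dir, tar_path):
    """Tar source_dir and return the directory entries stored in the archive.

    Hypothetical helper for illustration only. tarfile.add() recurses by
    default and writes an entry for each directory, empty or not.
    """
    top = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(tar_path, "w") as tar:
        tar.add(source_dir, arcname=top)
    with tarfile.open(tar_path) as tar:
        return sorted(m.name for m in tar.getmembers() if m.isdir())
```

Calling it on a directory that contains an empty subfolder returns that subfolder's path among the results, even though no payload file (and so no manifest line) exists for it.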