Parent folder of files to package is now included in tar #535

Open
kieranjol opened this issue Aug 15, 2022 · 5 comments

Comments

@kieranjol

I'm using 2.0.21 on Win10.

I have two inputs - a single file and a folder which contains three files.

C:\Users\NLI User\Documents\complex_objects_workshop.docx
C:\Users\NLI User\Downloads\strongbox_fixity_fail

I then make a tarred bag, but I see that in the TAR, the 'Documents' and 'Downloads' folders are also present even though my inputs were children of these folders. This issue does not occur if I just make a regular bag that is not tarred.


tar -tvf "C:\Users\NLI User\Documents\DART\6tyyryt.tar"
drwxr-xr-x  0 0      0           0 Aug 15 16:20 6tyyryt/
drwxr-xr-x  0 0      0           0 Aug 15 16:20 6tyyryt/data/
drwxr-xr-x  0 0      0           0 Aug 15 16:20 6tyyryt/data/Documents/
-rw-r--r--  0 0      0       36872 Feb 03  2021 6tyyryt/data/Documents/complex_objects_workshop.docx
drwxr-xr-x  0 0      0           0 Aug 15 16:20 6tyyryt/data/Downloads/
drw-rw-rw-  0 0      0           0 Aug 08 13:32 6tyyryt/data/Downloads/strongbox_fixity_fail/
-rw-r--r--  0 0      0        2488 Aug 08 12:54 6tyyryt/data/Downloads/strongbox_fixity_fail/18b3d60c-fc8f-41c0-8ad7-fed85b9b121f_manifest.md5
-rw-r--r--  0 0      0    57970553 Aug 08 10:27 6tyyryt/data/Downloads/strongbox_fixity_fail/file_report.csv
-rw-r--r--  0 0      0        2624 Aug 24  2021 6tyyryt/data/Downloads/strongbox_fixity_fail/WCL088763.xml
-rw-r--r--  0 0      0         501 Aug 15 16:20 6tyyryt/bag-info.txt
-rw-r--r--  0 0      0          55 Aug 15 16:20 6tyyryt/bagit.txt
-rw-r--r--  0 0      0         752 Aug 15 16:20 6tyyryt/manifest-sha512.txt
-rw-r--r--  0 0      0         430 Aug 15 16:20 6tyyryt/tagmanifest-sha512.txt
@diamondap
Member

That's a known issue. DART tries to trim off the common prefix of all the folders you bag. For example, DART will trim /users/joe/documents if you're only bagging these directories:

/users/joe/documents/photos
/users/joe/documents/audio
/users/joe/documents/video

But if you add one more directory, like /users/mary/documents/audio, then the only common prefix DART can find is /users/. The bag will contain directories like data/joe/documents/audio and data/mary/documents/audio.
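
The trimming rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not DART's actual code: the function name `trim_common_prefix` is hypothetical, and the single-input handling is an assumption. `posixpath` is used so the example behaves the same on any operating system.

```python
import posixpath

def trim_common_prefix(paths):
    """Map each absolute input path to a payload path under data/.

    Hypothetical helper: trims the longest directory prefix shared by
    all inputs, which is the behavior described in this thread.
    """
    prefix = posixpath.commonpath(paths)
    # Assumption: if the common prefix is itself one of the inputs
    # (e.g. only one input was given), trim its parent instead so the
    # input keeps its own name.
    if prefix in paths:
        prefix = posixpath.dirname(prefix)
    return {p: posixpath.join("data", posixpath.relpath(p, prefix)) for p in paths}

joe_only = [
    "/users/joe/documents/photos",
    "/users/joe/documents/audio",
    "/users/joe/documents/video",
]
print(trim_common_prefix(joe_only)["/users/joe/documents/photos"])
# data/photos

with_mary = joe_only + ["/users/mary/documents/audio"]
print(trim_common_prefix(with_mary)["/users/joe/documents/audio"])
# data/joe/documents/audio
```

Adding the one path under /users/mary shrinks the shared prefix to /users, so every payload path grows by the components that are no longer common.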

I think tar and zip have similar behavior. I'm not sure how thoroughly this behavior is documented for DART, but path trimming was widely requested after DART's initial release.

@kieranjol
Author

Thank you for that great explanation. I just confirmed it by using a single folder as input, and all was well. Is this fixable, or is it too baked into how tarring works?

@diamondap
Member

I don't see any easy fix for this. When DART wasn't trimming paths, everyone complained. So the feature is in there by popular demand.

If I turn it off, then every item in the payload directory has a path like /data/<full absolute path>. Those get very long.

With path trimming on, there's no safe way to guess how to trim paths that have no common prefix.
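
One way to see the difficulty, offered as an illustration rather than as the maintainer's stated reasoning: the most obvious alternative, keeping only each input's own name, can map two different inputs to the same payload path. The helper name `basename_collisions` below is hypothetical.

```python
import posixpath
from collections import defaultdict

def basename_collisions(paths):
    """Return the input names that would clash if every parent were dropped.

    Hypothetical helper: groups inputs by their final path component and
    keeps only the groups with more than one member.
    """
    by_name = defaultdict(list)
    for p in paths:
        by_name[posixpath.basename(p.rstrip("/"))].append(p)
    return {name: srcs for name, srcs in by_name.items() if len(srcs) > 1}

inputs = [
    "/users/joe/documents/photos",
    "/users/joe/documents/audio",
    "/users/mary/documents/audio",
]
print(basename_collisions(inputs))
# {'audio': ['/users/joe/documents/audio', '/users/mary/documents/audio']}
```

Both audio directories would land at data/audio, so a tool that trims this aggressively would have to merge, rename, or reject them.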

@kieranjol
Author

I get you, I looked into it a bit more and can see why this is so awkward. I'm not that familiar with more complex tar use cases like this.
Is some version of the -C option in the regular tar command line some sort of solution here? https://www.gnu.org/software/tar/manual/html_node/directory.html
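
For comparison, GNU tar's -C option changes directory before adding the members that follow it, so each member is stored relative to that directory. Python's standard `tarfile` module gives a similar effect through the `arcname` argument of `TarFile.add`. The sketch below is not DART code (the function name `tar_with_parents_dropped` and the `bag_name` parameter are hypothetical), and it only shows the path handling: it writes no manifests, so the result is not a valid bag.

```python
import os
import tarfile
import tempfile

def tar_with_parents_dropped(tar_path, inputs, bag_name):
    """Write each input under <bag_name>/data/<its own name>.

    Hypothetical helper: the parent folders of every input are dropped,
    roughly what `tar -C <parent> <name>` does for each input in turn.
    Inputs that share a name are not handled here.
    """
    with tarfile.open(tar_path, "w") as tar:
        for src in inputs:
            name = os.path.basename(os.path.normpath(src))
            # add() recurses into directories by default.
            tar.add(src, arcname=f"{bag_name}/data/{name}")

# Usage with throwaway files that mimic the two inputs in this issue.
root = tempfile.mkdtemp()
docs = os.path.join(root, "Documents")
downloads = os.path.join(root, "Downloads", "strongbox_fixity_fail")
os.makedirs(docs)
os.makedirs(downloads)
single_file = os.path.join(docs, "complex_objects_workshop.docx")
with open(single_file, "w") as f:
    f.write("placeholder")
with open(os.path.join(downloads, "file_report.csv"), "w") as f:
    f.write("placeholder")

out = os.path.join(root, "example.tar")
tar_with_parents_dropped(out, [single_file, downloads], "example")
with tarfile.open(out) as tar:
    for member in tar.getnames():
        print(member)
```

The listing contains no Documents or Downloads entries, but the approach depends on every input having a distinct name.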

@diamondap
Member

I would be reluctant to implement something like that, because I don't have much time and because it could make DART more confusing / harder to use. Part of its appeal is its simplicity and easy learning curve.

In case you're interested, when you get to the "Review and Run" screen, DART shows the part of the file paths that will be trimmed off in grey, while the rest of the path is in black. In the screenshot below, /Users/diamond/Downloads/ will be trimmed.

[Screenshot: DART "Review and Run" screen, with the trimmed prefix /Users/diamond/Downloads/ shown in grey]
