-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Droid upgrade #387
Comments
Thanks, @pwinckles . Your approach to '1.' sounds good: Make a new "FITS zip tool". Regarding '2.', would you mind asking in Slack (c4l#fits-community) and on the list (https://groups.google.com/g/fits-users)? I will check locally as well. |
Hi @pwinckles, About the Droid recursion issue, we have no current use for file metadata below the first level. I'm concerned that the increased processing time for recursion would be an issue for some of our partners. I'm also wary of having a forked/hacked version of Droid in there. If there is a way to cleanly limit recursion to 1 level, or to make the depth configurable, I would vote for that. If not, it would be helpful to understand the performance implications of recursion on our typical use cases before deciding next steps. |
Thanks for the input. I still need to investigate options for limiting Droid's recursion. Note that with these changes, FITS would now be recursing into the following archive formats, which are all of the formats that Droid supports. I know that you'd expressed interest in having more formats supported in the past. Let me know if you are good with this list, or if you'd like any of them disabled.
Additionally, which of those formats would you like supported with the new tool that will be computing |
We need to support: zip, gz, and 7z |
Okay, there's some updated WIP code in the branch that limits the recursion to a depth of 1, and also adds back New questions:
|
The one test that isn't passing as is is for https://github.com/harvard-lts/fits/blob/main/testfiles/input/aliceDynamic_images_metadata_tableOfContents.epub Droid doesn't consider that file to be an epub because it's missing a root Which raises the question of whether or not there is interest in integrating epubcheck into FITS? If so, we could create a separate ticket for that. |
I created a draft PR that has what I consider to be final a version of the code, just pending your input on #387 (comment) |
@awoods @cvicary Droid 6.7.0 has been released now, and I updated the PR. The PR is still pending answers to these questions:
I can add made up gz and 7z test files if you'd prefer, but I thought it might be nice if they were representative of what you were interested in processing. |
@awoods @cvicary: Pinging this again in the new year. I understand if this is not a priority, just checking in. I have a draft PR up with the droid upgrade (#388). It would be great to include sample gz and 7z test files that demonstrate it working on your archives. However, I could also create some example archives, if that is not feasible. |
@pwinckles : Thanks for resurfacing this issue and associated PR. I will work with @cvicary to dig up example/test files for gz and 7z. For question 2, let's stick with only emitting Thanks again. |
@awoods Thanks! See the current output for those files here: https://gist.github.com/pwinckles/6b76bee3d37b9142081781ffcfa35c3e Points of interest:
|
@awoods
I started working on upgrading to the latest version of Droid. There have been some changes in Droid that require changing the FITS integration, which is on the hacky side of things because Droid does not expose an easy to use API that's ideally suited for what FITS wants to do.
I have a working, work in progress, branch that you can look at here: https://github.com/pwinckles/fits/tree/droid-6.7
In that branch, I change FITS to use the primary Droid entry point,
SubmissionGateway
, and use most of Droid's standard configuration. DroidWrapperFactory may look like its doing a ton of customization, but I sourced almost all of that from Droid's Spring xml. DroidWrapper is what does the analysis. With this setup, FITS would now output info on all of the archive types that Droid supports, and not just zips.However, that branch does not currently replicate the existing FITS behavior entirely, and I want to know to what extent it needs to.
originalSize
andcompressionMethod
elements. This is because this data isn't really from Droid. Instead, it's being computed by FITS inZipArchiveContentIdentifier
.multiple-file-types-and-folders.zip
seems to suggest that FITS considers the behavior of only going one level deep to be the desirable behavior.For the
originalSize
andcompressionMethod
, I assume that you don't want these values to disappear, which is fair. However, I would rather not hackup Droid again to get them back in, as they really have absolutely nothing to do with Droid. My proposal is to instead make a new "FITS zip tool" that runs on zip files and computes that information.For the recursion, is there a hard requirement on it being restricted to a depth of 1? If so, I can try to figure out a way to make this happen, but it'll be a little tricky so I'm not going to unless it's required.
Thoughts?
The text was updated successfully, but these errors were encountered: