-
Notifications
You must be signed in to change notification settings - Fork 127
Add TIFF Key-Value Store Implementation (take 2) #230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This is great --- I haven't had a chance to review your implementation in detail, but at a high level this is exactly the right implementation strategy. Do you have a reason to use this tiff kvstore driver on its own? Otherwise I'd be inclined to eliminate the JSON registration for it, such that it can only be created internally (by the to-be-written Tiff TensorStore driver). |
Regarding logging: what have you tried? Are you testing with bazel? With bazel you can do |
Thanks for that feedback. No, I don't think there is any reason to use the tiff kvstore on its own. I can eliminate the JSON registration.
Yes, I'm using bazel though I'm not terribly familiar with it. That works! I had previously tried setting the environment variable as that worked for non-subsystem logging in tests, but it wasn't working for conditional logging. Thank you. I'm going to start chipping away at the tiff driver as I await detailed feedback on this kvstore. |
… Working on crashes.
Hello again. I've managed to develop a driver per your suggestions and I believe I have something that's a pretty solid foundation with a reasonable amount of support. There are some outstanding items + polish but I think it makes sense for me to pause here for feedback in case major changes are required and to understand the likelihood of being able to get this merged first. A few notes on the driver implementation:
I think that's it for now. Looking forward to any feedback! |
Optional. Must match the length of `dimensions`. Required if `dimensions` | ||
has more than one entry. If only one dimension is specified, `ifd_count` can | ||
be used instead. If both are specified, their product must match `ifd_count`. | ||
ifd_count: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be specified --- can't we determine the number of IFDs from the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it doesn't. I initially was thinking about wanting to use only a subset of IFDs, but then really ifd_count
would be one of many parameters that a user would need to specify in order to provide them with the kind of flexibility to do that properly. I will remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I double checked and I see now that IFDs may be ignored in OME XML so I will I need to add full specification functionality rather than remove this. https://docs.openmicroscopy.org/ome-model/5.6.3/ome-tiff/specification.html
description: | | ||
Optional. Specifies the total number of IFDs involved in the stack. | ||
Required if `dimension_sizes` is not specified for a single stack dimension. | ||
If specified along with `dimension_sizes`, their product must match `ifd_count`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think dimensions
, dimension_sizes
and ifd_sequence_order
can be removed as options. Instead, there should be a single unnamed dimension as the "stack" dimension. Then a separate reshape
adapter TensorStore driver (not yet implemented, but on the todo list can be responsible for doing the equivalent of those options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Instead of an unlabeled dimension, the stack dimension could also be assigned some default label like stack
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue with that, if I understand enough about how tensorstore works, is that this will become an problem when we are pulling that metadata from the file itself (e.g. OME-TIFF), versus having the user specify that information (which they may not know since it's embedded in the image) using a reshape
adapter.
The reason I added these in here was to build out the necessary functionality and options such that I can later on just read the XML data from the appropriate TIFF tag and assign it to existing parameters. I did not add that in yet because I wanted to get the base, non-OME-specific, TIFF reader cleared first. Then adding in the XML parsing and populating these fields becomes a trivial exercise.
If you feel differently, or would still prefer that I just assign a single stacking dimension, I can do that. But (a) I think that means a longer runway for OME-TIFF support and (b) I don't know then how the XML metadata would work its way up to the reshape adapter.
Please advise.
the last dimension varying fastest. | ||
required: | ||
- dimensions | ||
sample_dimension_label: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is unnecessary because the dimension labels can instead be overridden via the index transform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I added this is because OME XML has dimension labels in its metadata (XYZCT). I referenced the n5 driver a lot in my development and I saw that it has an axes
member for dimension labels in N5Metadata. As with my previous comment, right now, they are manually populated, but when the OME XML is parsed, I will populate this.
I realize I am in some ways optimizing for OME XML while trying to make a generic solution for TIFF images. The way I had envisioned it was that these are generic options for anyone to use to craft some interpretation of their TIFF file, though I can see how it can look a bit odd. Also happy to follow your guidance on this. Just explaining my rationale.
allOf: | ||
- type: object | ||
properties: | ||
dtype: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be excluded since the data type can be specified generically via the dtype
and schema
properties common to all drivers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the metadata property (which is just dtype and shape below), I think I ultimately got confused. I was trying to square three things: (a) the concept of metadata constraints that I saw in n5, (b) come up with a generic TIFF solution, and (c) preempt support for OME-XML. The properties that I put inside the tiff
object - if we set aside OME XML - are technically not constraints. They specify different ways of interpreting the same underlying data (e.g. whether IFDs are stacked as a dimension or treated as separate images is in the mind of the user, not something the TIFF file is stipulating). It's only with OME-XML that it becomes true metadata.
I used the metadata object for properties that are not subject to interpretation, like the dtype, and shape (at least partially). I realize this is quite messy. I could perhaps move the tiff
object into metadata
, and then it functions the same way it does with n5: all members are optional, and they are populated if found (e.g. OME XML). But it will also serve one additional function: which is to specify the interpretation of the underlying data if no OME metadata is present. This is probably cleaner from a schema perspective, but it does mix up two distinct responsibilities.
Let me know what you prefer.
$ref: DataType | ||
title: Data type constraint. | ||
description: Constrains the expected data type of the TIFF dataset. | ||
shape: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can likewise be excluded since it can be specified via the schema
.
// This registry maps string IDs (like "lzw", "deflate") to factories/binders | ||
// capable of creating JsonSpecifiedCompressor instances. | ||
internal::JsonSpecifiedCompressor::Registry& GetTiffCompressorRegistry(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems unlikely that we need to register tiff compressor to start with, also the tiff tag value could map directly to a compressor instance (via an enum, or similar).
}; | ||
|
||
// Common compression types | ||
enum class CompressionType : uint16_t { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably find the hex values easier to track back to the spec.
|
||
namespace tensorstore { | ||
namespace internal_tiff_kvstore { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a reference to the spec sections here as comments.
This PR adds a new (read only) key-value store driver for accessing tiles and strips within TIFF image files. This is my second attempt after finding the time to work on this given previous feedback on #138. I drew heavily upon the zip kvstore implementation and your prior feedback (thanks!), so I hope that things are more faithful to Tensorstore patterns.
The basic idea is this kvstore sits on top of a base kvstore and maps IFDs/tiles/stripes to the following key format:
tile/<id>/<row>/<col>
.My plan is to get feedback on whether this something that is suitable to be merged. If I receive a positive response, I work on any feedback I receive in parallel with building out the tiff driver that will pair with this kvstore.
Looking forward to your feedback on this implementation!
P.S. I couldn't for the life of me get subsystem logging to work (aka
ABSL_LOG_IF
). Would appreciate some advice on how to get that working.