Add TIFF Key-Value Store Implementation (take 2) #230

hsidky · 2025-04-17T02:21:47Z

This PR adds a new (read only) key-value store driver for accessing tiles and strips within TIFF image files. This is my second attempt after finding the time to work on this given previous feedback on #138. I drew heavily upon the zip kvstore implementation and your prior feedback (thanks!), so I hope that things are more faithful to Tensorstore patterns.

The basic idea is this kvstore sits on top of a base kvstore and maps IFDs/tiles/stripes to the following key format: tile/<id>/<row>/<col>.

My plan is to get feedback on whether this something that is suitable to be merged. If I receive a positive response, I work on any feedback I receive in parallel with building out the tiff driver that will pair with this kvstore.

Looking forward to your feedback on this implementation!

P.S. I couldn't for the life of me get subsystem logging to work (aka ABSL_LOG_IF). Would appreciate some advice on how to get that working.

…f IFD.

jbms · 2025-04-18T18:54:33Z

This is great --- I haven't had a chance to review your implementation in detail, but at a high level this is exactly the right implementation strategy.

Do you have a reason to use this tiff kvstore driver on its own? Otherwise I'd be inclined to eliminate the JSON registration for it, such that it can only be created internally (by the to-be-written Tiff TensorStore driver).

jbms · 2025-04-18T18:58:27Z

Regarding logging: what have you tried? Are you testing with bazel? With bazel you can do --test_env=TENSORSTORE_VERBOSE_LOGGING=tiff

hsidky · 2025-04-19T12:51:12Z

This is great --- I haven't had a chance to review your implementation in detail, but at a high level this is exactly the right implementation strategy.

Do you have a reason to use this tiff kvstore driver on its own? Otherwise I'd be inclined to eliminate the JSON registration for it, such that it can only be created internally (by the to-be-written Tiff TensorStore driver).

Thanks for that feedback. No, I don't think there is any reason to use the tiff kvstore on its own. I can eliminate the JSON registration.

Regarding logging: what have you tried? Are you testing with bazel? With bazel you can do --test_env=TENSORSTORE_VERBOSE_LOGGING=tiff

Yes, I'm using bazel though I'm not terribly familiar with it. That works! I had previously tried setting the environment variable as that worked for non-subsystem logging in tests, but it wasn't working for conditional logging. Thank you.

I'm going to start chipping away at the tiff driver as I await detailed feedback on this kvstore.

… Working on crashes.

hsidky · 2025-05-05T23:18:38Z

Hello again. I've managed to develop a driver per your suggestions and I believe I have something that's a pretty solid foundation with a reasonable amount of support. There are some outstanding items + polish but I think it makes sense for me to pause here for feedback in case major changes are required and to understand the likelihood of being able to get this merged first.

A few notes on the driver implementation:

I changed the tiff kvstore to work off of a linear index rather than an xy grid. It made it easier to handle samplesperpixel>1 but did offload more work to getchunkstoragekey, so I did a bit of optimization there.
I haven't worked on transaction support as I haven't fully wrapped my head around the nuts and bolts of how that works yet. Is this a requirement for a merge? I saw, for example, that the single image driver does not support transactions, so I assume not.
Same as above but for storage stats.
I did not commit the test file binaries (generated by the generate.py file). I'll wait until you give me clearance.
I waffled a bit on how much to build out the metadata property. I ended up putting options under a separate tiff property as they mainly correspond to the interpretation of the data. I can revisit if you have strong opinions.

I think that's it for now. Looking forward to any feedback!

jbms · 2025-05-06T16:34:34Z