`read_raw(&self, data: Vec<u8>)` should take a reference #15

flukejones · 2020-09-15T22:00:47Z

read_raw(&self, data: Vec<u8>) should be taking a reference, eg: data: &[u8]. Is there any reason not to do so here?

The text was updated successfully, but these errors were encountered:

kamadak · 2020-09-17T11:32:04Z

Exif fields are lazily parsed. Keeping a copy of unparsed data steam in the struct Exif makes it self-contained, and users can handle the struct without being bothered by the lifetime of the buffer.

flukejones · 2020-09-21T01:27:30Z

I understand that, but it would also be good to have an option to refer to a buffer also. Maybe both can be done?

1e100 · 2021-11-14T23:23:53Z

Ran into the same issue, and it's fatal in my case: I need to parse EXIF in uploads before saving them to disk, and copying the entire buffer (up to 50MiB in size) just to read some EXIF doesn't seem ideal. This would be a lot more palatable if I could create metadata-only copies.

kamadak · 2021-12-01T13:16:55Z

There could be two possible interfaces for shared buffers:

// (1) Parses immediately, borrows nothing.
Reader::read_raw_nonlazy(..., data: &[u8]) -> Exif

// (2) Parses lazily using the borrowed buffer.
Reader::read_raw_ref<'a>(..., data: &'a [u8]) -> ExifRef<'a>

If a raw Exif block is big and you want to avoid copying data, (1) will not meet the spirit of your goal, because when 50 MiB buffer is parsed immediately, struct Exif will grow that large. So, I will go with (2) though users need to care about the lifetime of the buffer. Any opinions?

@1e100 BTW, I am curious about your use cases. Are your raw Exif blocks really that large, or are you parsing TIFF data, where TIFF image data and Exif attributes are intermixed and the total size could be 50 MiB?

1e100 · 2021-12-01T13:44:23Z

I just need a few metadata fields before I save the file that's all. Files can be quite large though. It's not just the exif block - it's the entire uploaded file

kamadak · 2021-12-01T14:01:40Z

Reader::read_from_container will only keep a copy of the Exif block, so no need to worry about the size if an image is large but not the Exif block. (Except TIFF)

1e100 · 2021-12-01T14:05:08Z

Thanks for the tip, but unfortunately it could be TIFF as well. Out of curiosity (and for others to find), why is it different?

kamadak · 2021-12-01T15:07:09Z

The Exif standard re-uses TIFF's tag-value format to store Exif fields.
For images files other than TIFF, a small independent Exif block (which is in TIFF format) is embedded in an image file.
For TIFF files, Exif fields (in TIFF format) and TIFF image data (of course in TIFF format as well) are merged. Unfortunately, some Exif fields are inherited from TIFF, and the locations of those fields are specified by the TIFF standard, thus metadata cannot be gathered in one place but are scattered in a TIFF file.

Anyway, if we have a TIFF image on memory and want to avoid copying the entire buffer, read_raw_ref could help. I will consider including such an API in the next version.

w-flo · 2023-04-28T08:35:20Z

Reader::read_raw_ref<'a>(..., data: &'a [u8]) -> ExifRef<'a>

I like this! I have &[u8] raw exif data and just need to extract and store some basic exif metadata from it, then immediately drop the Exif, the &[u8] slice, the buffer backing the slice, etc. The allocation (to_vec() on the slice) is not a big issue I guess, but it's not really needed in my use case. And I guess this is a pretty common use case.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`read_raw(&self, data: Vec<u8>)` should take a reference #15

`read_raw(&self, data: Vec<u8>)` should take a reference #15

flukejones commented Sep 15, 2020

kamadak commented Sep 17, 2020

flukejones commented Sep 21, 2020

1e100 commented Nov 14, 2021

kamadak commented Dec 1, 2021

1e100 commented Dec 1, 2021

kamadak commented Dec 1, 2021

1e100 commented Dec 1, 2021

kamadak commented Dec 1, 2021

w-flo commented Apr 28, 2023

read_raw(&self, data: Vec<u8>) should take a reference #15

read_raw(&self, data: Vec<u8>) should take a reference #15

Comments

flukejones commented Sep 15, 2020

kamadak commented Sep 17, 2020

flukejones commented Sep 21, 2020

1e100 commented Nov 14, 2021

kamadak commented Dec 1, 2021

1e100 commented Dec 1, 2021

kamadak commented Dec 1, 2021

1e100 commented Dec 1, 2021

kamadak commented Dec 1, 2021

w-flo commented Apr 28, 2023

`read_raw(&self, data: Vec<u8>)` should take a reference #15

`read_raw(&self, data: Vec<u8>)` should take a reference #15