Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_raw(&self, data: Vec<u8>) should take a reference #15

Open
flukejones opened this issue Sep 15, 2020 · 9 comments
Open

read_raw(&self, data: Vec<u8>) should take a reference #15

flukejones opened this issue Sep 15, 2020 · 9 comments

Comments

@flukejones
Copy link

read_raw(&self, data: Vec<u8>) should be taking a reference, eg: data: &[u8]. Is there any reason not to do so here?

@kamadak
Copy link
Owner

kamadak commented Sep 17, 2020

Exif fields are lazily parsed. Keeping a copy of unparsed data steam in the struct Exif makes it self-contained, and users can handle the struct without being bothered by the lifetime of the buffer.

@flukejones
Copy link
Author

I understand that, but it would also be good to have an option to refer to a buffer also. Maybe both can be done?

@1e100
Copy link

1e100 commented Nov 14, 2021

Ran into the same issue, and it's fatal in my case: I need to parse EXIF in uploads before saving them to disk, and copying the entire buffer (up to 50MiB in size) just to read some EXIF doesn't seem ideal. This would be a lot more palatable if I could create metadata-only copies.

@kamadak
Copy link
Owner

kamadak commented Dec 1, 2021

There could be two possible interfaces for shared buffers:

// (1) Parses immediately, borrows nothing.
Reader::read_raw_nonlazy(..., data: &[u8]) -> Exif

// (2) Parses lazily using the borrowed buffer.
Reader::read_raw_ref<'a>(..., data: &'a [u8]) -> ExifRef<'a>

If a raw Exif block is big and you want to avoid copying data, (1) will not meet the spirit of your goal, because when 50 MiB buffer is parsed immediately, struct Exif will grow that large. So, I will go with (2) though users need to care about the lifetime of the buffer. Any opinions?

@1e100 BTW, I am curious about your use cases. Are your raw Exif blocks really that large, or are you parsing TIFF data, where TIFF image data and Exif attributes are intermixed and the total size could be 50 MiB?

@1e100
Copy link

1e100 commented Dec 1, 2021

I just need a few metadata fields before I save the file that's all. Files can be quite large though. It's not just the exif block - it's the entire uploaded file

@kamadak
Copy link
Owner

kamadak commented Dec 1, 2021

Reader::read_from_container will only keep a copy of the Exif block, so no need to worry about the size if an image is large but not the Exif block. (Except TIFF)

@1e100
Copy link

1e100 commented Dec 1, 2021

Thanks for the tip, but unfortunately it could be TIFF as well. Out of curiosity (and for others to find), why is it different?

@kamadak
Copy link
Owner

kamadak commented Dec 1, 2021

The Exif standard re-uses TIFF's tag-value format to store Exif fields.
For images files other than TIFF, a small independent Exif block (which is in TIFF format) is embedded in an image file.
For TIFF files, Exif fields (in TIFF format) and TIFF image data (of course in TIFF format as well) are merged. Unfortunately, some Exif fields are inherited from TIFF, and the locations of those fields are specified by the TIFF standard, thus metadata cannot be gathered in one place but are scattered in a TIFF file.

Anyway, if we have a TIFF image on memory and want to avoid copying the entire buffer, read_raw_ref could help. I will consider including such an API in the next version.

@w-flo
Copy link

w-flo commented Apr 28, 2023

Reader::read_raw_ref<'a>(..., data: &'a [u8]) -> ExifRef<'a>

I like this! I have &[u8] raw exif data and just need to extract and store some basic exif metadata from it, then immediately drop the Exif, the &[u8] slice, the buffer backing the slice, etc. The allocation (to_vec() on the slice) is not a big issue I guess, but it's not really needed in my use case. And I guess this is a pretty common use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants