LZW attempts to decode buffer after output is already filled #41

dalcde · 2024-03-31T13:21:06Z

In DecodeState::advance, after processing a burst, the decoder unconditionally processes the new code. However, we shouldn't do so if the output is already filled, because the remaining bits in the buffer may be nonsense. This caused an InvalidCode error when trying to read one of my images.

I attempted to write a fix at https://github.com/dalcde/lzw/tree/check-out but I didn't make a PR because I'm not confident it is correct.

HeroicKatora · 2024-03-31T16:11:31Z

Can you share the reproduction, ideally the binary LZW stream?

dalcde · 2024-04-01T00:08:25Z

I added a (failing) test in https://github.com/dalcde/lzw/tree/add-test . I
checked that my fix indeed fixes the test, but it breaks other tests, so it is
definitely incorrect. I can try to make the tests pass but I still wouldn't
trust the result to be correct...

HeroicKatora · 2024-04-05T17:55:18Z

I don't see this being a bug. The input stream is invalid: it should end with an end code (129 = 0x81), but at this location it has the code 0xff instead. The buffer size shouldn't influence the validity check of the stream, and it doesn't in this case. If the stream should end early, the data after that point shouldn't be supplied in the input.
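
As an aside on the numbers: in the usual GIF/TIFF-style LZW convention the clear code is `1 << min_code_size` and the end code is the next value, so an end code of 129 (0x81) would correspond to a minimum code size of 7 (TIFF itself uses 8, giving clear = 256 and end = 257). A tiny sketch of that arithmetic; `special_codes` is a hypothetical helper, not part of the crate:

```rust
/// Hypothetical helper: the special codes of an LZW stream with the given
/// minimum code size, following the GIF/TIFF convention.
fn special_codes(min_code_size: u8) -> (u16, u16) {
    let clear = 1u16 << min_code_size; // first code after the literal alphabet
    let end = clear + 1; // end-of-information code
    (clear, end)
}

fn main() {
    let (clear, end) = special_codes(7);
    // Prints "clear = 128, end = 0x81"
    println!("clear = {clear}, end = {end:#x}");
}
```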

dalcde · 2024-04-06T15:49:27Z

Empirically some programs seem to produce TIFF files that are missing these end
codes. Someone else has run into it here:
https://stackoverflow.com/questions/55674925/decoding-tiff-lzw-codes-not-yet-in-the-dictionary

It would be helpful to support this case even if it is technically invalid.

HeroicKatora · 2024-04-06T16:48:33Z

The library reports the *valid* filled output buffer size as part of `BufferResult` (and all the other variants), even in the error case (that was part of the considerations when choosing to return a structure instead of a `Result` at the top level here). If an invalidly terminated stream should be considered valid, does it work to check whether the output buffer was filled far enough and then ignore the reported `InvalidCode` error?

We seem to be missing explicit test cases for that guarantee though. The minimal example you've got in your branch would be perfect for verifying these assertions, with a variant for each of the IO styles if possible.
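
A minimal sketch of that caller-side check. To stay self-contained it defines stand-in types that only approximate the shape of the crate's `BufferResult` (`consumed_out` counts valid output bytes, `status` carries the error); `tolerate_missing_end` and `expected_len` are hypothetical names, and the expected length has to come from outside the LZW stream, for example from the container format.

```rust
// Stand-in types for illustration only; they approximate the shape of the
// crate's `BufferResult`, `LzwStatus` and `LzwError`, not their exact definitions.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum LzwStatus { Ok, NoProgress, Done }

#[derive(Debug, Clone, Copy, PartialEq)]
enum LzwError { InvalidCode }

#[allow(dead_code)]
struct BufferResult {
    consumed_in: usize,
    consumed_out: usize,
    status: Result<LzwStatus, LzwError>,
}

/// Hypothetical helper: accept a stream that lacks a proper end code as long as
/// the decoder already produced `expected_len` bytes before hitting the bad code.
fn tolerate_missing_end(result: &BufferResult, expected_len: usize) -> Result<usize, LzwError> {
    match result.status {
        // Enough valid output was written before the error: ignore the junk.
        Err(LzwError::InvalidCode) if result.consumed_out >= expected_len => Ok(expected_len),
        // Otherwise the error is genuine and is passed through.
        Err(err) => Err(err),
        Ok(_) => Ok(result.consumed_out),
    }
}

fn main() {
    // The output was filled (4 of 4 bytes) before the garbage code was seen.
    let filled = BufferResult { consumed_in: 5, consumed_out: 4, status: Err(LzwError::InvalidCode) };
    assert_eq!(tolerate_missing_end(&filled, 4), Ok(4));
    // Too little output: the error is reported as before.
    let short = BufferResult { consumed_in: 3, consumed_out: 2, status: Err(LzwError::InvalidCode) };
    assert_eq!(tolerate_missing_end(&short, 4), Err(LzwError::InvalidCode));
}
```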

dalcde · 2024-04-07T10:27:16Z

I think that makes it a bit awkward to write a `Read` wrapper around it; you want this to fail only after the user asks for the *next* byte, so you need to cache the error.
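
A rough sketch of the caching dalcde describes, simplified by assuming all bytes decoded before the error are already available up front (a real wrapper would interleave decoding). `DeferredError` is a hypothetical adapter, not part of the crate: it hands out the valid bytes first and only surfaces the stored error when the caller asks for more.

```rust
use std::io::{self, Read};

/// Hypothetical adapter: serves the bytes decoded before an error and reports
/// the cached error only once the caller asks for data beyond them.
struct DeferredError {
    decoded: Vec<u8>,
    pos: usize,
    pending: Option<io::Error>,
}

impl DeferredError {
    fn new(decoded: Vec<u8>, pending: Option<io::Error>) -> Self {
        DeferredError { decoded, pos: 0, pending }
    }
}

impl Read for DeferredError {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        let rest = &self.decoded[self.pos..];
        if !rest.is_empty() {
            // Valid bytes remain: hand them out without looking at the error.
            let n = rest.len().min(buf.len());
            buf[..n].copy_from_slice(&rest[..n]);
            self.pos += n;
            return Ok(n);
        }
        // All valid bytes are gone: report the cached error once; later reads
        // then see end of input.
        match self.pending.take() {
            Some(err) => Err(err),
            None => Ok(0),
        }
    }
}

fn main() {
    let err = io::Error::new(io::ErrorKind::InvalidData, "invalid code in LZW stream");
    let mut r = DeferredError::new(b"ABAB".to_vec(), Some(err));
    let mut first = [0u8; 4];
    r.read_exact(&mut first).unwrap(); // the caller only needs 4 bytes: succeeds
    assert_eq!(&first, b"ABAB");
    let mut more = [0u8; 1];
    assert!(r.read(&mut more).is_err()); // asking for the *next* byte fails
}
```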

fintelia · 2024-04-07T23:14:07Z

If the tiff crate is failing to decode some images produced by a widely used encoder, then I think it might make sense to treat this as a bug at the level of that crate. ImageMagick has its own TIFF implementation ~~so it is plausible it sometimes generates corrupt files, though if so, would be nice to file an issue upstream~~. Edit: That implementation calls into libtiff for the actual compression.

HeroicKatora · 2024-04-08T13:33:53Z

The API required to do this properly might look like a form of `Read::take`, some new control to exit early on reaching some limit of total bytes. I suppose that is feasible, but it's unclear how to achieve it without loss of performance. It might require a separately monomorphized (i.e. by const-generics) control loop.
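
For comparison, this is the standard library behaviour such a control would mirror: `Read::take` caps how many bytes a reader yields and never touches what lies beyond the limit. The proposed decoder control would apply the same idea to total decoded output rather than to a byte source; the snippet below only demonstrates the std API.

```rust
use std::io::Read;

fn main() -> std::io::Result<()> {
    // Pretend these are the wanted bytes followed by trailing junk.
    let data: &[u8] = b"ABAB\xff\xff";
    // Stop after 4 bytes; the junk is never read.
    let mut limited = data.take(4);
    let mut out = Vec::new();
    limited.read_to_end(&mut out)?;
    assert_eq!(out, b"ABAB");
    Ok(())
}
```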

dalcde · 2024-04-08T13:58:49Z

I'm happy to file this with the tiff library instead, but it's not wrong for lzw to ignore the remaining input after the output buffer is filled, and there should be little or no performance impact when doing so (we only need to check this when we hit the invalid code branch). I think it would make life easier for downstream users if lzw handled this.
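
A minimal sketch of that idea on a toy decoder, not the crate's implementation: it assumes codes that are already unpacked (no bit packing, no code-width growth, no 12-bit table limit) and TIFF-style special codes clear = 256 and end = 257. `decode` and `expected_len` are hypothetical names. The expected output length is consulted only on the invalid-code path, so well-formed streams never pay for the check.

```rust
const CLEAR: u16 = 256;
const END: u16 = 257;

#[derive(Debug, PartialEq)]
enum DecodeError {
    InvalidCode(u16),
}

/// Fresh table: one entry per byte value plus placeholders for CLEAR and END.
fn reset_table() -> Vec<Vec<u8>> {
    let mut table: Vec<Vec<u8>> = (0..=255u8).map(|b| vec![b]).collect();
    table.push(Vec::new()); // placeholder for CLEAR
    table.push(Vec::new()); // placeholder for END
    table
}

/// Decodes until END or until the input runs out.
fn decode(codes: &[u16], expected_len: usize) -> Result<Vec<u8>, DecodeError> {
    let mut table = reset_table();
    let mut prev: Option<Vec<u8>> = None;
    let mut out = Vec::new();
    for &code in codes {
        if code == CLEAR {
            table = reset_table();
            prev = None;
            continue;
        }
        if code == END {
            break;
        }
        let idx = code as usize;
        let entry = if idx < table.len() {
            table[idx].clone()
        } else if let (true, Some(p)) = (idx == table.len(), prev.as_ref()) {
            // KwKwK case: the code refers to the entry being defined right now.
            let mut e = p.clone();
            e.push(p[0]);
            e
        } else {
            // Invalid code. Only here do we look at the output length: if the
            // caller already has every byte it expects, treat the rest as junk.
            if out.len() >= expected_len {
                break;
            }
            return Err(DecodeError::InvalidCode(code));
        };
        if let Some(mut new_entry) = prev {
            new_entry.push(entry[0]);
            table.push(new_entry);
        }
        out.extend_from_slice(&entry);
        prev = Some(entry);
    }
    Ok(out)
}

fn main() {
    // "ABAB" with a proper end code.
    assert_eq!(decode(&[CLEAR, 65, 66, 258, END], 4), Ok(b"ABAB".to_vec()));
    // Same data with junk (4000) where the end code should be; 4 bytes expected.
    assert_eq!(decode(&[CLEAR, 65, 66, 258, 4000], 4), Ok(b"ABAB".to_vec()));
    // If more output was expected, the junk code is still an error.
    assert_eq!(decode(&[CLEAR, 65, 66, 258, 4000], 5), Err(DecodeError::InvalidCode(4000)));
}
```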

fintelia · 2024-04-09T21:25:25Z

Please do create the tiff crate issue. There's no need to close this issue, but it would help to discuss it in the context of specific real-world files that are failing. That way we can decide how/if to handle those files.

(And if I've misunderstood and your failing LZW streams aren't from TIFF files please do say so!)

dalcde added a commit to dalcde/lzw that referenced this issue Apr 1, 2024: Add failing test to demonstrate bug in image-rs#41 (bfed3a3)

fintelia mentioned this issue May 3, 2024: Loading fails with "invalid code in LZW stream" (image-rs/image-tiff#231, open)