-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LZW attempts to decode buffer after output is already filled #41
Comments
Can you share the reproduction, the binary lzw stream if possible. |
I added a (failing) test in https://github.com/dalcde/lzw/tree/add-test . I |
I don't see this being a bug. The input stream is invalid, it should end with an end code (129 = 0x81), and it doesn't since it instead has the code |
Empirically some programs seem to produce TIFF files that are missing these end It would be helpful to support this case even if it is technically invalid. |
The library reports the valid filled output buffer size as part of We seem to be missing explicit test cases for that guarantee though. The minimal example you've got in your branch would be perfect for verifying these assertions, with a variant for each of the IO styles if possible. |
I think that makes it a bit awkward to write a Read wrapper around it; you want this to fail after the user asks for the *next* byte, so you need to cache that.
…On 7 April 2024 12:48:54 am HKT, Andreas Molzer ***@***.***> wrote:
The library reports the *valid* filled output buffer size as part of `BufferResult` (and all the other variants), even in the error case (that was part of the considerations when choosing to return a structure instead of a `Result` at the top level here). If an invalidly terminated streams should be considered valid, it should work to check whether the output buffer was filled far enough and then ignore the reported `InvalidCode` error.
We seem to be missing explicit test cases for that guarantee though. The minimal example you've got in your branch would be perfect for verifying these assertions, with a variant for each of the IO styles if possible.
--
Reply to this email directly or view it on GitHub:
#41 (comment)
You are receiving this because you authored the thread.
Message ID: ***@***.***>
|
If the |
The API required to do this properly might look like a form of |
I'm happy to file this with the tiff library instead, but it's not wrong for lzw to ignore the remaining input after the output buffer is filled, and there should be no/minimal performance impact when doing so (we can check this only if we hit the invalid code branch). I think it would make life easier for downstream users if lzw handles this
…On 8 April 2024 9:34:18 pm HKT, Andreas Molzer ***@***.***> wrote:
The API required to do this properly might look like a form of `Read::take`, some new control to exit early on reaching some limit of total bytes. I suppose that is feasible, but it's unclear how to achieve it without loss of performance. It might require a separately monomorphized (i.e. by const-generics) control loop.
--
Reply to this email directly or view it on GitHub:
#41 (comment)
You are receiving this because you authored the thread.
Message ID: ***@***.***>
|
Please do create the the tiff crate issue. There's no need to close this issue, but it would help to discuss in the context of specific real-world files that are failing. That way we can decide how/if to handle those files. (And if I've misunderstood and your failing LZW streams aren't from TIFF files please do say so!) |
In
DecodeState::advance
, after processing a burst, the decoder unconditionally processes the new code. However, we shouldn't do so if the output is already filled, because the remaining bits in the buffer may be nonsense. This caused an InvalidCode error when trying to read one of my images.I attempted to write a fix at https://github.com/dalcde/lzw/tree/check-out but I didn't make a PR because I'm not confident it is correct.
The text was updated successfully, but these errors were encountered: