-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Document Intelligence] Corrupted images #31908
base: main
Are you sure you want to change the base?
Conversation
@@ -349,7 +356,8 @@ function streamToText(stream: NodeJS.ReadableStream): Promise<string> { | |||
} | |||
}); | |||
stream.on("end", () => { | |||
resolve(Buffer.concat(buffer).toString("utf8")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utf8 encoding corrupts the image data
… harshan/d-i-png-issue
API change check API changes are not detected in this pull request. |
outputFile: { | ||
junit: "test-results.browser.xml", | ||
}, | ||
fakeTimers: { | ||
toFake: ["setTimeout", "Date"], | ||
}, | ||
watch: false, | ||
include: ["test/**/*.spec.ts"], | ||
include: ["test/**/figures.spec.ts"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we only want to keep it at one test file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, apologies about this. I was testing my changes in core-
with this example.
I'll revert this back once the changes in core are finalized :)
Background
We encode the response Buffer to "utf8" by default in the
NodeFetchClient
at@azure/core-rest-pipeline
."utf8" encoding of Buffer has been found to corrupt the output in certain cases, such as when dealing with PNG files (
image/png
content-type response).This issue arises because "utf8" encoding is not suitable for binary data like images, which leads to corrupted files and unexpected behavior in applications that rely on this data.
What changed?
To address this issue, I've implemented the following changes
Content-Type Check: Introduced a check for the content type of the response header. If the content type is
"image/png"
, the encoding is now set to"binary"
instead of"utf8"
. This ensures that binary data, like PNG images, is handled correctly without corruption.Encoding Parameter: The
streamToText
function has been updated to accept an additional parameter,encoding
, which can be either "utf8" or "binary". This allows for more flexible handling of different types of response data, ensuring that the appropriate encoding is used based on the content type.TO DO
While the changes in this PR address the immediate issue with PNG files, there are further improvements needed to ensure robust handling of various content types:
bodyAsBuffer
Property: As an alternative approach, we could introduce a newbodyAsBuffer
property. By doing so, we can bypass the need for encoding altogether for certain types of responses, allowing the data to be handled as a raw Buffer. This approach could simplify the handling of binary data and prevent encoding-related issues.