Memory Usage Spike When Analyzing 5MB PDF with Document Intelligence SDK #38972
Labels
Client
This issue points to a problem in the data-plane of the library.
customer-reported
Issues that are reported by GitHub users external to the Azure organization.
Document Intelligence
needs-team-attention
Workflow: This issue needs attention from Azure service team or SDK team
question
The issue doesn't require a change to the product in order to be resolved. Most issues start as that
Package Name:
azure-ai-documentintelligence
Package Version:
1.0.0b4
Operating System:
macOS 14.7.1 (23H222)
Python Version:
Python 3.11.3
Describe the bug
I encountered an issue with the Document Intelligence SDK when analyzing a PDF file of approximately 5MB. After retrieving the result with `result: AnalyzeResult = poller.result()`, memory usage spiked to around 4GB. This seems abnormal: the response itself, when written to a txt file, was only about 360MB.
To Reproduce
Expected behavior
The memory usage should not increase significantly and should be proportional to the size of the analyzed PDF file.
Actual behavior
The memory usage spiked to around 4GB, which seems disproportionate to the size of the PDF file and the resulting output.
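A minimal sketch of one way a spike like this could be quantified, using the standard-library `tracemalloc` module. The `measure_peak` helper and the `build_payload` stand-in workload are hypothetical names for illustration, not part of the SDK. In a real measurement the wrapped callable would be the `poller.result()` call. Note that `tracemalloc` only sees allocations made through Python's allocator, so the figure may differ from the process-level number shown by the OS.

```python
import tracemalloc


def measure_peak(fn, *args, **kwargs):
    """Run fn and return (result, peak_bytes) for Python-level allocations.

    Hypothetical helper: peak_bytes is the high-water mark of traced
    memory while fn was running, as reported by tracemalloc.
    """
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_payload(n):
    """Stand-in workload (assumption): builds n small dicts, loosely
    imitating a deserialized response with many nested objects."""
    return [{"page": i, "words": ["w"] * 10} for i in range(n)]


payload, peak_bytes = measure_peak(build_payload, 1000)
print(f"peak traced memory: {peak_bytes / 1e6:.2f} MB")
```

Comparing the peak reported here with the size of the serialized response would show how much overhead comes from the in-memory object model rather than from the payload itself.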
Screenshots
Additional context
The PDF file contains approximately 1600 pages, with each page containing information equivalent to two standard pages. Any insights or suggestions to mitigate this issue would be greatly appreciated.
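One possible mitigation, sketched under assumptions: analyze the document in page batches so that only one partial result is held in memory at a time. This relies on `begin_analyze_document` accepting a `pages` keyword with a range string such as "1-500", which should be verified against the installed beta. The helpers `page_ranges` and `analyze_in_batches` are hypothetical, and `analyze_pages` is a caller-supplied callable that wraps the SDK call, so the sketch itself does not depend on the SDK.

```python
def page_ranges(total_pages, batch_size):
    """Return 1-based, inclusive page-range strings covering the document.

    Hypothetical helper: the "start-end" format is assumed to match what
    the service's `pages` parameter expects.
    """
    if total_pages < 1 or batch_size < 1:
        raise ValueError("total_pages and batch_size must be positive")
    return [
        f"{start}-{min(start + batch_size - 1, total_pages)}"
        for start in range(1, total_pages + 1, batch_size)
    ]


def analyze_in_batches(analyze_pages, total_pages, batch_size, handle_result):
    """Analyze one batch at a time and hand each result to handle_result.

    analyze_pages(pages) is assumed to wrap something like
    client.begin_analyze_document(..., pages=pages).result().
    The result is dropped after handling so it can be garbage collected
    before the next batch is requested.
    """
    for pages in page_ranges(total_pages, batch_size):
        result = analyze_pages(pages)
        handle_result(pages, result)
        del result


# Ranges for a 1600-page document in batches of 500 pages.
ranges = page_ranges(1600, 500)
print(ranges)
```

The trade-off is more service calls and the loss of any cross-batch structure in the result, so a batch size would need to be chosen with that in mind.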