-
Hi @kdaniel21, great to see you thinking about production usage of docling! I am happy to discuss further how docling can be leveraged in production scenarios. As a starting point, let me share a few pieces that may give you relevant context.
-
@kdaniel21 To follow up on what @cau-git said, we are currently writing a 2nd version of the technical report, which will include a lot more benchmarking on a variety of hardware. Could you tell us where you intend to run it, specifically the specs (pure CPU, GPU-accelerated, etc.)? That would give us a good indication of what to benchmark.
-
Hi, first of all, thank you for the project! I've been testing it over the last few days, and it seems to perform great!
We'd love to try Docling in production, but we were wondering what the best way to deploy it would be. It doesn't need to "hyperscale" to hundreds of documents per minute, but we still want to make sure it can handle a dozen documents per minute and doesn't crash under smaller spikes.
The "intuitive" idea i had was deploying it on a "beefy" VM with dedicated GPUs, and running it from there as usual (probably with a queue to avoid overloading it). However, I was wondering if it'd possible to deploy the models (EasyOCR, layout recognition, table extraction etc.) separately e.g. on SageMaker/AzureML through their HuggingFace integration, and run the rest of the pipeline run on a "regular" CPU-only VM. This would allow scaling the computationally heavy tasks independently from the code that's responsible for executing the pipeline, and thus could be easily integrated into an existing service/backend as one would only have to install the
docling
package, deploy the models, and use the "managed models" for inference.Additionally, this would also make it more accessible to use it locally, as one could just deploy the pre-trained models remotely, and use them even with tighter hardware constraints (despite
docling
having a great performance even on consumer grade hardware!).I also saw that there was a docling-ibm-models repo, which seems to contain the actual code for the models, and also docling-serve, which seems to be a simple API wrapper for
docling
.Thanks for taking the time, and looking forward to using Docling in production!
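For reference, here is a minimal sketch of the queue idea, assuming the `DocumentConverter` API from the `docling` package; the paths and worker count are illustrative, and depending on the version, one converter instance per worker may be the safer assumption:

```python
# Minimal sketch: bounded concurrency in front of Docling so that spikes
# queue up instead of exhausting GPU/CPU memory.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

from docling.document_converter import DocumentConverter

# One converter instance; model weights are loaded once and reused.
# NOTE: if the converter turns out not to be thread-safe, create one
# instance per worker instead.
converter = DocumentConverter()

def convert_one(path: Path) -> str:
    # convert() runs the full pipeline (layout, OCR, tables) on one document.
    result = converter.convert(path)
    return result.document.export_to_markdown()

Path("outbox").mkdir(exist_ok=True)
sources = list(Path("inbox").glob("*.pdf"))

# max_workers caps how many documents are processed at once; everything
# else waits in the executor's queue.
with ThreadPoolExecutor(max_workers=2) as pool:
    for src, md in zip(sources, pool.map(convert_one, sources)):
        Path("outbox", src.stem + ".md").write_text(md)
```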
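And a purely hypothetical sketch of what the calling side of the "managed models" idea could look like. To be clear, docling does not expose a remote-model plug-in point as far as I know, and the endpoint name and payload schema below are invented for illustration; only the boto3 `invoke_endpoint` call itself is a real AWS API:

```python
# Hypothetical sketch: the layout model runs on a SageMaker endpoint while
# the rest of the pipeline stays on a CPU-only VM. The endpoint name
# ("docling-layout") and the response schema are made up for illustration.
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

def remote_layout_inference(page_image_png: bytes) -> dict:
    # Send one rendered page image to the remote layout model and get back
    # predicted regions (text blocks, tables, figures, ...).
    response = runtime.invoke_endpoint(
        EndpointName="docling-layout",   # hypothetical endpoint name
        ContentType="application/x-image",
        Body=page_image_png,
    )
    return json.loads(response["Body"].read())
```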