Updates for docling-serve #12

dolfim-ibm · 2024-12-09T12:11:20Z

Here are updates coming for docling-serve.

Updates to the latest docling v2. New input and output formats, better processing, more options.
Richer endpoints. The new options in docling allow for a richer API, for example:
- One endpoint with multiple output formats. Which one is returned is controlled by the input payload options (see later)
- One endpoint specialized for markdown which returns the markdown as plain text
```
POST /convert
{
    "document": {
        "markdown": "",
        "docling_document": {},
        "html": ""
    },
    "errors": [],
    "status": "enum",
    "timings": {}
}

POST /convert/markdown
# text/markdown response
```

Input options. Both endpoints will accept the following payload:

{
    "file_source": {
        "base64_string": "string",
        "filename": "string"
    },
    "http_source": {
        "url": "string",
        "headers": {}
    },

    "options": {
        "output_markdown": "bool, default false",
        "output_html": "bool, default false",
        "output_docling_document": "bool, default true",
        "do_ocr": "bool, default true",
        "ocr_engine": "enum (easyocr, tesseract, rapidocr)",
        "ocr_lang": "optional[list[str]]",
        "do_table_structure": "bool, default true",
        "include_images": "bool, default ... Embedded page images in docling_document, embedded images in ",
        "images_scale": 2.0
    }
}
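
As a rough illustration of the payload above, a client could assemble the request body along these lines. The helper name `build_convert_payload` and the sample file are hypothetical, and the field names simply follow the draft payload, which may still change.

```python
import base64
import json


def build_convert_payload(filename: str, content: bytes, **options) -> dict:
    """Assemble a request body with a file source.

    Hypothetical helper: field names follow the draft payload in this issue.
    """
    return {
        "file_source": {
            # Binary content travels as base64 text inside the JSON body.
            "base64_string": base64.b64encode(content).decode("ascii"),
            "filename": filename,
        },
        "options": options,
    }


payload = build_convert_payload(
    "report.pdf",
    b"%PDF-1.4 ...",
    output_markdown=True,
    output_docling_document=False,
    do_ocr=True,
    ocr_engine="easyocr",
)
body = json.dumps(payload)  # what a client would send as the POST body
```

Options left out of the call would fall back to the server-side defaults listed in the payload.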

Persistent DocumentConverter. To avoid reloading models on every request, we should keep a global DocumentConverter initialized. However, initializing the class fixes options such as the OCR engine, so we want to add a cache of multiple DocumentConverter instances, one for each common set of input options.
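
One possible shape for such a cache, sketched with `functools.lru_cache`. `StubConverter` is a placeholder standing in for the real converter, and the option names and the cache size of 4 are assumptions for illustration only.

```python
from functools import lru_cache
from typing import Tuple


class StubConverter:
    """Placeholder for a converter whose construction loads models."""

    def __init__(self, do_ocr, ocr_engine, ocr_lang, do_table_structure):
        self.options = (do_ocr, ocr_engine, ocr_lang, do_table_structure)


@lru_cache(maxsize=4)  # keep a handful of common configurations alive
def get_converter(
    do_ocr: bool = True,
    ocr_engine: str = "easyocr",
    ocr_lang: Tuple[str, ...] = (),
    do_table_structure: bool = True,
) -> StubConverter:
    # Arguments must be hashable, hence a tuple rather than a list for ocr_lang.
    # Note: lru_cache keys on how arguments are passed, so callers should
    # pass them consistently (for example, always by keyword).
    return StubConverter(do_ocr, ocr_engine, ocr_lang, do_table_structure)


a = get_converter(ocr_engine="easyocr")
b = get_converter(ocr_engine="easyocr")    # served from the cache
c = get_converter(ocr_engine="tesseract")  # new option set, new instance
```

The least recently used converter is evicted once more than `maxsize` distinct option sets have been requested, which bounds memory use at the cost of an occasional reload.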


vishnoianil · 2024-12-09T16:33:44Z

@dolfim-ibm @guimou @nerdalert For the community UI use case, we will need these APIs to be async in the long term. We have two options:

1. Only expose the async APIs to start with.
2. Expose two versions of the API, one sync and one async (providing a websocket or just simple polling with the job id).

I am more inclined toward 1) so that we can avoid API explosion, but I do see some use cases where having a sync API can be helpful, like writing a simple CLI client or simple demo scripts that use docling-serve for doc conversion. What are your thoughts?

guimou · 2024-12-09T21:07:32Z

Async APIs will require some more thinking, depending on how you want to work. Two different avenues:

1. The call to the API initially sends back a token, and there is a websocket endpoint to connect to (authenticating with the token), from where the server can push a "conversion ready" message or the result directly.
2. The call to the API sends back the token, but then you let the client regularly ping another endpoint until the result (or an error) is sent back.

In both cases, you can implement a queue system, with feedback on where you are in the queue, and eventually the progress of the conversion (if/when docling provides such data).
Option 2 is more crude. Option 1 is more evolved and would allow for real-time feedback (like when you advance in the queue). However, it puts more load on the "client" side, as not everyone knows how to work with websockets.
And of course the two are doable simultaneously, as the tokens and queue themselves are handled separately anyway.
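
To make the shared token-and-queue part concrete, here is a minimal in-memory sketch. The `TaskQueue` class, its method names, and the status fields are all hypothetical; a real service would need persistence, error states, and a worker process.

```python
import uuid
from collections import OrderedDict
from typing import Callable, Optional


class TaskQueue:
    """In-memory registry: a token is issued on submit and looked up later."""

    def __init__(self) -> None:
        self._pending: "OrderedDict[str, str]" = OrderedDict()  # token -> source
        self._results: dict = {}  # token -> conversion result

    def submit(self, source: str) -> str:
        token = uuid.uuid4().hex
        self._pending[token] = source
        return token

    def status(self, token: str) -> dict:
        if token in self._results:
            return {"state": "done", "result": self._results[token]}
        if token in self._pending:
            # Position 0 means the task is next in line.
            return {"state": "queued", "position": list(self._pending).index(token)}
        return {"state": "unknown"}

    def process_next(self, convert: Callable[[str], str]) -> Optional[str]:
        """Run the oldest pending task; a worker would call this in a loop."""
        if not self._pending:
            return None
        token, source = self._pending.popitem(last=False)
        self._results[token] = convert(source)
        return token


queue = TaskQueue()
first = queue.submit("https://example.com/a.pdf")
second = queue.submit("https://example.com/b.pdf")
before = queue.status(second)
queue.process_next(lambda src: f"# converted {src}")
after = queue.status(second)
done = queue.status(first)
```

Both a polling endpoint and a websocket endpoint could read from the same `status` call, which is why the two options can coexist.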

Anyway, directly to your questions @vishnoianil:

> Only expose the async APIs to start with.

That's not what I would start with, as it's more complicated to consume (from a client perspective). Let's start by offering a sync API, then add async endpoints.

> Expose two versions of the API, one sync and one async (providing a websocket or just simple polling with the job id).

That's where I would go, using both websockets and "standard" status, as it's not that much more implementation.

As we discussed, my current implementation offers two endpoints for processing url(s) and file(s). We seemed to agree to continue this way. So here is the API map I am proposing (I am adding a v1 prefix to allow for organized non-breaking evolution in the future):

- /health -> simple health check for probes
- /v1/convert/url -> converts a url or a set of urls (either in a list or a comma-separated string; that's my current behaviour)
- /v1/convert/file -> converts a file or a set of files

In both cases, the client can specify each option available in the docling CLI, plus some output options (direct markdown, files in a zip). Options are currently:

- from_format (Optional[Union[List[str], str]]): Input format(s) to convert from. Allowed values: docx, pptx, html, image, pdf, asciidoc, md. Defaults to all formats.
- to_format (Optional[Union[List[str], str]]): Output format(s) to convert to. Allowed values: md, json, text, doctags. Defaults to Markdown.
- ocr (Optional[bool]): If enabled, the bitmap content will be processed using OCR. Defaults to true.
- force_ocr (Optional[bool]): If enabled, replace any existing text with OCR-generated text over the full content. Defaults to false.
- ocr_engine (Optional[str]): OCR engine to use. Allowed values: easyocr, tesseract_cli, tesseract. Defaults to easyocr.
- pdf_backend (Optional[str]): PDF backend to use. Allowed values: pypdfium2, dlparse_v1, dlparse_v2. Defaults to dlparse_v1.
- table_mode (Optional[str]): Table mode to use. Allowed values: fast, accurate. Defaults to fast.
- abort_on_error (Optional[bool]): If enabled, abort on error. Defaults to false.
- return_as_file (Optional[bool]): If enabled, return the output as a file. Defaults to false.

Above is doable almost immediately (give me 1-2 days after we agree, this project is unfortunately not on the top of my list).
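
Since from_format and to_format accept either a list or a comma-separated string, the server needs a normalization step. A minimal sketch follows; the function name `normalize_formats` and the error message are assumptions, not the actual implementation.

```python
from typing import List, Optional, Set, Union

# Allowed output formats as listed in the options above.
ALLOWED_TO_FORMATS = {"md", "json", "text", "doctags"}


def normalize_formats(
    value: Optional[Union[List[str], str]],
    allowed: Set[str],
    default: List[str],
) -> List[str]:
    """Turn a list or comma-separated string into a validated list."""
    if value is None:
        return list(default)
    items = value.split(",") if isinstance(value, str) else value
    cleaned = [item.strip().lower() for item in items if item.strip()]
    unknown = [item for item in cleaned if item not in allowed]
    if unknown:
        raise ValueError(f"unsupported format(s): {', '.join(unknown)}")
    # An empty string or empty list falls back to the default.
    return cleaned or list(default)


from_string = normalize_formats("md, json", ALLOWED_TO_FORMATS, ["md"])
from_list = normalize_formats(["doctags"], ALLOWED_TO_FORMATS, ["md"])
fallback = normalize_formats(None, ALLOWED_TO_FORMATS, ["md"])
```

Rejecting unknown values early keeps a typo such as `to_format=markdwon` from silently producing the default output.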

In a second phase I would introduce the async endpoints:

- /v1/async_convert/url -> converts a url or a set of urls (either in a list or a comma-separated string; that's my current behaviour)
- /v1/async_convert/file -> converts a file or a set of files
- /v1/status/{task_id} -> standard feedback (retries implemented client side)
- /v1/ws_status/{task_id} -> websocket feedback (real-time feedback)
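
For the "retries implemented client side" variant, the client loop could look roughly like this. The status field names (`state`, `result`) and the helper `poll_until_done` are hypothetical, and the server responses are simulated rather than fetched over HTTP.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status repeatedly until the task finishes or attempts run out."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status.get("state") in ("done", "error"):
            return status
        sleep(interval)
    raise TimeoutError("task did not finish within the allotted attempts")


# Simulated responses stand in for repeated GET /v1/status/{task_id} calls.
responses = iter([
    {"state": "queued", "position": 1},
    {"state": "running"},
    {"state": "done", "result": "# Title"},
])
final = poll_until_done(lambda: next(responses), interval=0.0, sleep=lambda _: None)
```

Passing `sleep` as a parameter keeps the loop easy to test; a real client would leave the default and replace the lambda with an HTTP request.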

Waiting for comments/approval to go on.

vishnoianil · 2024-12-10T17:39:33Z

> Async APIs will require some more thinking, depending on how you want to work. Two different avenues:
>
> 1. The call to the API initially sends back a token, and there is a websocket endpoint to connect to (authenticating with the token), from where the server can push a "conversion ready" message or the result directly.
> 2. The call to the API sends back the token, but then you let the client regularly ping another endpoint until the result (or an error) is sent back.
>
> In both cases, you can implement a queue system, with feedback on where you are in the queue, and eventually the progress of the conversion (if/when docling provides such data). Option 2 is more crude. Option 1 is more evolved and would allow for real-time feedback (like when you advance in the queue). However, it puts more load on the "client" side, as not everyone knows how to work with websockets. And of course the two are doable simultaneously, as the tokens and queue themselves are handled separately anyway.

Makes sense to me. I think sync API -> async API (with client polling) -> async API (with websocket) is a reasonable evolution plan.

> Anyway, directly to your questions @vishnoianil:
>
> Only expose the async APIs to start with.
>
> That's not what I would start with, as it's more complicated to consume (from a client perspective). Let's start by offering a sync API, then add async endpoints.
>
> Expose two versions of the API, one sync and one async (providing a websocket or just simple polling with the job id).
>
> That's where I would go, using both websockets and "standard" status, as it's not that much more implementation.

Sounds good. I think the major issue with any async API is scaling. Running multiple instances of docling plus the API server might need some more work. I believe that's not our day-1 problem at this point, but it's something to keep in the back of our minds.

> As we discussed, my current implementation offers two endpoints for processing url(s) and file(s). We seemed to agree to continue this way. So here is the API map I am proposing (I am adding a v1 prefix to allow for organized non-breaking evolution in the future):
>
> - /health -> simple health check for probes
> - /v1/convert/url -> converts a url or a set of urls (either in a list or a comma-separated string; that's my current behaviour)
> - /v1/convert/file -> converts a file or a set of files

This looks good to me. Minor suggestion: can we use /v1alpha/ instead of /v1, given that these are first-cut APIs and will need some time to stabilize?

> In both cases, the client can specify each option available in the docling CLI, plus some output options (direct markdown, files in a zip). Options are currently:
>
> - from_format (Optional[Union[List[str], str]]): Input format(s) to convert from. Allowed values: docx, pptx, html, image, pdf, asciidoc, md. Defaults to all formats.
> - to_format (Optional[Union[List[str], str]]): Output format(s) to convert to. Allowed values: md, json, text, doctags. Defaults to Markdown.
> - ocr (Optional[bool]): If enabled, the bitmap content will be processed using OCR. Defaults to true.
> - force_ocr (Optional[bool]): If enabled, replace any existing text with OCR-generated text over the full content. Defaults to false.
> - ocr_engine (Optional[str]): OCR engine to use. Allowed values: easyocr, tesseract_cli, tesseract. Defaults to easyocr.
> - pdf_backend (Optional[str]): PDF backend to use. Allowed values: pypdfium2, dlparse_v1, dlparse_v2. Defaults to dlparse_v1.
> - table_mode (Optional[str]): Table mode to use. Allowed values: fast, accurate. Defaults to fast.
> - abort_on_error (Optional[bool]): If enabled, abort on error. Defaults to false.
> - return_as_file (Optional[bool]): If enabled, return the output as a file. Defaults to false.
>
> Above is doable almost immediately (give me 1-2 days after we agree, this project is unfortunately not on the top of my list).

> In a second phase I would introduce the async endpoints:
>
> - /v1/async_convert/url -> converts a url or a set of urls (either in a list or a comma-separated string; that's my current behaviour)
> - /v1/async_convert/file -> converts a file or a set of files
> - /v1/status/{task_id} -> standard feedback (retries implemented client side)
> - /v1/ws_status/{task_id} -> websocket feedback (real-time feedback)

I think the following URLs might be cleaner:

- /v1alpha/convert/url/async
- /v1alpha/convert/file/async
- /v1alpha/status/poll/{task_id}
- /v1alpha/status/ws/{task_id}

But I think we can take the async API discussion to separate issues.

> Waiting for comments/approval to go on.

Details for the sync APIs look good to me, so +1 from me.

guimou · 2024-12-10T18:21:21Z

@vishnoianil Thanks for the feedback. I'm going on with this plan then.

willkara · 2024-12-13T23:53:19Z

Taking a look at some of the response options for /convert, is there any reason I wouldn't be able to add in responses for different types like text, json and others included in the export commands already?

dolfim-ibm mentioned this issue Dec 10, 2024

feat: upgrade endpoint to docling v2 #13

Merged
