v2.14.0 - 2024-12-18
v2.13.0 - 2024-12-17
- Updated Layout processing with forms and key-value areas (#530) (60dc852)
- Create a backend to parse USPTO patents into DoclingDocument (#606) (4e08750)
- Add Easyocr parameter recog_network (#613) (3b53bd3)
- Add Haystack RAG example (#615) (3e599c7)
- Fix the path to the run_with_accelerator.py example (#608) (3bb3bf5)
v2.12.0 - 2024-12-13
v2.11.0 - 2024-12-12
- Do not import python modules from deepsearch-glm (#569) (aee9c0b)
- Handle no result from RapidOcr reader (#558) (f45499c)
- Make enum serializable with human-readable value (#555) (a7df337)
v2.10.0 - 2024-12-09
- Call into docling-core for legacy document transform (#551) (7972d47)
- Introduce Image format options in CLI. Silence the tqdm downloading messages. (#544) (78f61a8)
v2.9.0 - 2024-12-09
- Expose new hybrid chunker, update docs (#384) (c8ecdd9)
- MS Word backend: Make detection of headers and other styles localization agnostic (#534) (3e073df)
- Correcting DefaultText ID for MS Word backend (#537) (eb7ffcd)
- Add py.typed marker file (#531) (9102fe1)
- Enable HTML export in CLI and add options for image mode (#513) (0d11e30)
- Missing text in docx (t tag) when embedded in a table (#528) (b730b2d)
- Restore pydantic version pin after fixes (#512) (c830b92)
- Folder input in cli (#511) (8ada0bc)
v2.8.3 - 2024-12-03
v2.8.2 - 2024-12-03
- ParserError EOF inside string (#470) (#472) (c90c41c)
- PermissionError when using tesseract_ocr_cli_model (#496) (d3f84b2)
- Add styling for faq (#502) (5ba3807)
- Typo in faq (#484) (33cff98)
- Add automatic api reference (#475) (d487210)
- Introduce faq section (#468) (8ccb3c6)
v2.8.1 - 2024-11-29
v2.8.0 - 2024-11-27
- Use correct image index in word backend (#442) (767563b)
- Update tests and examples for docling-core 2.5.1 (#449) (29807a2)
v2.7.1 - 2024-11-26
v2.7.0 - 2024-11-20
v2.6.0 - 2024-11-19
- Added support for exporting DocItem to an image when page image is available (#379) (3f91e7d)
- Expose ocr-lang in CLI (#375) (ed785ea)
- Added excel backend (#334) (926dfd2)
- Extracting picture data for raster images found in PPTX (#349) (7a97d71)
- Fixing images in the input Word files (#330) (8533039)
- Reduce logging by keeping option for more verbose (#323) (8b437ad)
- Fixed typo in v2 example v2 (#378) (911c3bd)
- Add automatic generation of CLI reference (#325) (ca8524e)
- Add architecture outline (#341) (25fd149)
- Fix parameter in usage.md (#332) (835e077)
v2.5.2 - 2024-11-13
v2.5.1 - 2024-11-12
v2.5.0 - 2024-11-12
- OCR: Introduce the OcrOptions.force_full_page_ocr parameter that forces a full page OCR scanning (#290) (c6b3763)
- Configure env prefix for docling settings (#315) (5d4a10b)
- Added handling of grouped elements in pptx backend (#307) (81c8243)
- Allow mps usage for easyocr (#286) (97f214e)
v2.4.2 - 2024-11-08
- EasyOcrModel: Support the use_gpu pipeline parameter in EasyOcrModel. Initialize easyocr (#282) (0eb065e)
v2.4.1 - 2024-11-08
- tesserocr: Raise Exception if tesserocr has not loaded any languages (#279) (704d792)
- Dockerfile example copy command (#234) (90836db)
- Update badges & credits (#248) (a84ec27)
- Add coming-soon section (#235) (5ce02c5)
- Add artifacts-path param to CLI (#233) (d5e65ae)
v2.4.0 - 2024-11-04
- Add explicit artifacts path example (#224) (eeee3b4)
- Update custom convert and dockerfile (#226) (5f5fea9)
- Correct spelling of 'individual' (#219) (41acaa9)
- Update LlamaIndex docs (#196) (244ca69)
v2.3.1 - 2024-10-30
- Simplify torch dependencies and update pinned docling deps (#190) (eb679cc)
- Allow to explicitly initialize the pipeline (#189) (904d24d)
v2.3.0 - 2024-10-30
v2.2.1 - 2024-10-28
- Fix header levels for DOCX & HTML (#184) (b9f5c74)
- Handling of long sequence of unescaped underscore chars in markdown (#173) (94d0729)
- HTML backend, fixes for Lists and nested texts (#180) (7d19418)
- MD Backend, fixes to properly handle trailing inline text and emphasis in headers (#178) (88c1673)
- Update LlamaIndex docs for Docling v2 (#182) (2cece27)
- Fix batch convert (#177) (189d3c2)
- Add export with embedded images (#175) (8d356aa)
v2.2.0 - 2024-10-23
- Update to docling-parse v2 without history (#170) (4116819)
- Support AsciiDoc and Markdown input format (#168) (3023f18)
v2.1.0 - 2024-10-18
- Typo fix (#155) (f799e77)
- Add graphical band in readme (#154) (034a411)
- Add use docling (#150) (61c092f)
v2.0.0 - 2024-10-16
v1.20.0 - 2024-10-11
v1.19.1 - 2024-10-11
- Remove stderr from tesseract cli and introduce fuzziness in the text validation of OCR tests (#138) (dae2a3b)
v1.19.0 - 2024-10-08
v1.18.0 - 2024-10-03
v1.17.0 - 2024-10-03
v1.16.1 - 2024-09-27
v1.16.0 - 2024-09-27
v1.15.0 - 2024-09-24
v1.14.0 - 2024-09-24
v1.13.1 - 2024-09-23
v1.13.0 - 2024-09-18
v1.12.2 - 2024-09-17
v1.12.1 - 2024-09-16
v1.12.0 - 2024-09-13
v1.11.0 - 2024-09-10
v1.10.0 - 2024-09-10
v1.9.0 - 2024-09-03
v1.8.5 - 2024-08-30
v1.8.4 - 2024-08-30
v1.8.3 - 2024-08-28
v1.8.2 - 2024-08-27
v1.8.1 - 2024-08-26
v1.8.0 - 2024-08-23
v1.7.1 - 2024-08-23
- Better raise exception when a page fails to parse (#46) (8808463)
- Upgrade docling-parse to 1.1.1, safety checks for failed parse on pages (#45) (7e84533)