Skip to content

Releases: docling-project/docling

v2.44.0

12 Aug 09:51
Compare
Choose a tag to compare

Feature

  • Add convert_string to document-converter (#2069) (b09033c)

Fix

  • html: Parse rawspan and colspan when they include non numerical values (#2048) (ed56f2d)
  • Support new mlx-vlm module (#2001) (0130e3a)
  • Extend error reporting when verbose logging is enabled (#2017) (2eb760d)
  • HTML: Replace non-standard Unicode characters (#2006) (86f7012)

Documentation

v2.43.0

28 Jul 09:45
Compare
Choose a tag to compare

Feature

Fix

  • markdown: Ensure correct parsing of nested lists (#1995) (aec29a7)
  • HTML: Remove an unnecessary print command (#1988) (945721a)

v2.42.2

24 Jul 10:21
Compare
Choose a tag to compare

Fix

  • HTML: Concatenation of child strings in table cells and list items (#1981) (5132f06)
  • docx: Adding plain latex equations to table cells (#1986) (0b83609)
  • Preserve PARTIAL_SUCCESS status when document timeout hits (#1975) (98e2fcf)
  • Multi-page image support (tiff) (#1928) (8d50a59)

Documentation

v2.42.1

22 Jul 16:45
Compare
Choose a tag to compare

Fix

Documentation

v2.42.0

18 Jul 15:35
Compare
Choose a tag to compare

Feature

  • Add option to control empty clusters in layout postprocessing (#1940) (a436be7)

Fix

  • Safe pipeline init, use device_map in transformers models (#1917) (cca05c4)
  • Fix HTML table parser and JATS backend bugs (#1948) (e1e3053)
  • KeyError: 'fPr' when processing latex fractions in DOCX files (#1926) (95e7096)
  • Change granite vision model URL from preview to stable version (#1925) (c5fb353)

Documentation

v2.41.0

10 Jul 14:25
Compare
Choose a tag to compare

Feature

Fix

  • ocr-utils: Unit test and fix the rotate_bounding_box function (#1897) (931eb55)
  • Docs are missing osd packages for tesseract on RHEL (#1905) (e25873d)
  • Use only backend for picture classifier (#1904) (edd4356)
  • Typo in asr options (#1902) (dd8fde7)

v2.40.0

04 Jul 15:31
Compare
Choose a tag to compare

Feature

  • Introduce LayoutOptions to control layout postprocessing behaviour (#1870) (ec6cf6f)
  • Integrate ListItemMarkerProcessor into document assembly (#1825) (56a0e10)

Fix

  • Secure torch model inits with global locks (#1884) (598c9c5)
  • Ensure that TesseractOcrModel does not crash in case OSD is not installed (#1866) (ae39a94)

Performance

  • msexcel: _find_table_bounds use iter_rows/iter_cols instead of Worksheet.cell (#1875) (13865c0)
  • Move expensive imports closer to usage (#1863) (3089cf2)

v2.39.0

27 Jun 15:37
Compare
Choose a tag to compare

Feature

  • Leverage new list modeling, capture default markers (#1856) (0533da1)

Fix

  • markdown: Make parsing of rich table cells valid (#1821) (e79e4f0)

v2.38.1

25 Jun 16:27
Compare
Choose a tag to compare

Fix

  • Updated granite vision model version for picture description (#1852) (d337825)
  • markdown: Fix single-formatted headings & list items (#1820) (7c5614a)
  • Fix response type of ollama (#1850) (41e8cae)
  • Handle missing runs to avoid out of range exception (#1844) (4002de1)

v2.38.0

23 Jun 18:14
Compare
Choose a tag to compare

Feature

Fix

  • docx: Ensure list items have a list parent (#1827) (d26dac6)
  • msword_backend: Identify text in the same line after an image #1425 (#1610) (1350a8d)
  • Ensure uninitialized pages are removed before assembling document (#1812) (dd7f64f)
  • Formula conversion with page_range param set (#1791) (dbab30e)

Documentation