Bug
This bug did not occur up to and including the following versions:
docling 2.9.0
docling-core 2.10.0
docling-ibm-models 2.0.8
docling-parse 2.1.2
I demonstrate the bug with two different parts of a PDF.
Here is the first example:
Even if the header below does not look right, it is correct, so no need to worry about it.
However, the `glyph[...]` placeholders are a big problem.
| | | Shape | Appearance | Appearance | Classification Accuracy (%) | Classification Accuracy (%) | Classification Accuracy (%) | Classification Accuracy (%) | Classification Accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|
| | | . | layout type. | using ground truth. | family.(S. 4.1) | breed (S. 4.2) | breed (S. 4.2) | both (S. 4.3) | both (S. 4.3) |
| | | | | | . | cat | dog | hierarchical | flat |
| 0 | 1 | glyph[check] | - | - | 94.21 | NA | NA | NA | NA |
| 1 | 2 | - | Image | - | 82.56 | 52.01 | 40.59 | NA | 39.64 |
| 2 | 3 | - | Image + Head | - | 85.06 | 60.37 | 52.10 | NA | 51.23 |
| 3 | 4 | - | Image + Head + Body | - | 87.78 | 64.27 | 54.31 | NA | 54.05 |
| 4 | 5 | - | Image + Head + Body | glyph[check] | 88.68 | 66.12 | 57.29 | NA | 56.60 |
| 5 | 6 | glyph[check] | Image | - | 94.88 | 50.27 | 42.94 | 42.29 | 43.30 |
| 6 | 7 | glyph[check] | Image + Head | - | 95.07 | 59.11 | 54.56 | 52.78 | 54.03 |
| 7 | 8 | glyph[check] | Image + Head + Body | - | 94.89 | 63.48 | 55.68 | 55.26 | 56.68 |
| 8 | 9 | glyph[check] | Image + Head + Body | glyph[check] | 95.37 | 66.07 | 59.18 | 57.77 | 59.21 |
The original looks like this:
Here you can see a table with special characters such as the check mark. They were recognized correctly in the version without GPU.
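To compare versions more easily, the placeholders can be located directly in the exported Markdown table. Below is a minimal Python sketch, assuming the table has already been exported as Markdown text like the rows shown here; `find_glyph_cells` is a hypothetical helper, not part of docling's API.

```python
import re

# Matches unresolved named-glyph placeholders such as "glyph[check]"
GLYPH_NAME = re.compile(r"glyph\[([^\]]+)\]")


def find_glyph_cells(markdown_table: str) -> list[tuple[int, int, str]]:
    """Return (row, column, glyph name) for every cell containing a placeholder.

    Rows and columns are 0-based and count only the pipe-delimited lines.
    """
    hits = []
    rows = [ln.strip() for ln in markdown_table.splitlines() if ln.strip().startswith("|")]
    for r, row in enumerate(rows):
        # Drop the leading pipe (and the trailing one, if present), then split into cells
        inner = row[1:-1] if row.endswith("|") else row[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        for c, cell in enumerate(cells):
            for name in GLYPH_NAME.findall(cell):
                hits.append((r, c, name))
    return hits


table = """| 0 | 1 | glyph[check] | - | - | 94.21 | NA | NA | NA | NA |
| 4 | 5 | - | Image + Head + Body | glyph[check] | 88.68 | 66.12 | 57.29 | NA | 56.60 |"""
print(find_glyph_cells(table))  # [(0, 2, 'check'), (1, 4, 'check')]
```

Run over the full table, it should flag the seven `glyph[check]` cells; on an output where the check marks are extracted correctly, it should return an empty list.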
Here is the second example:
Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, Peter Staar
W * glyph[circledot] W *4 GLYPH<16> V GLYPH<133> 2 GLYPH<240> 4 GLYPH<239> V * ··· 5 glyph[floorleft] ⁄GLYPH<134> GLYPH<239> glyph[circledot] GLYPH<16> glyph[circledot] GLYPH<134> glyph[turnstileright] glyph[circledot] ⁄GLYPH<134> · GLYPH<16> V 4 GLYPH<239> 4 glyph[turnstileright] 4 -d 5GLYPH<134> V glyph[circledot] dd4GLYPH<23> glyph[circledot] GLYPH<134> glyph[turnstileright] glyph[circledot] GLYPH<226> ··· 52 21)
IBM Research Saumerstrasse 4 8803 Ruschlikon, Switzerland
The original looks like this:
The glyphs come from the topmost line.
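The topmost line mixes two kinds of placeholders: named ones such as `glyph[circledot]` and numeric ones such as `GLYPH<16>`. A small sketch that counts both makes it easy to compare the same page across the docling versions in this report. It assumes these two patterns are the only placeholder formats in the output, and `count_placeholders` is a hypothetical helper.

```python
import re
from collections import Counter

# Two placeholder styles appear in the output: named glyphs ("glyph[circledot]")
# and numeric glyph codes ("GLYPH<16>")
PLACEHOLDER = re.compile(r"glyph\[[^\]]+\]|GLYPH<\d+>")


def count_placeholders(text: str) -> Counter:
    """Count each distinct unresolved glyph placeholder in the extracted text."""
    return Counter(PLACEHOLDER.findall(text))


# Prefix of the garbled topmost line from the second example
sample = "W * glyph[circledot] W *4 GLYPH<16> V GLYPH<133> 2 GLYPH<240> 4 GLYPH<239> V *"
counts = count_placeholders(sample)
print(sum(counts.values()))  # 5
```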
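There is also a second bug: the characters ä and ü are not recognized correctly either (the address above reads "Saumerstrasse" and "Ruschlikon" instead of "Säumerstrasse" and "Rüschlikon"). However, this was also the case in older versions.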
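For the ä/ü issue, a quick check shows whether the umlauts were silently dropped rather than turned into placeholders: the extracted address should equal the expected one with its diacritics removed. A standard-library sketch; the expected spellings "Säumerstrasse" and "Rüschlikon" are assumed from the correct address, and the helper names are hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks after NFD decomposition, e.g. 'ü' -> 'u'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lost_diacritics(expected: str, extracted: str) -> bool:
    """True if the extraction differs from the expected text only by missing accents."""
    return expected != extracted and strip_diacritics(expected) == extracted


# Assumed correct address vs. the address as extracted in the second example
expected = "IBM Research Säumerstrasse 4 8803 Rüschlikon, Switzerland"
extracted = "IBM Research Saumerstrasse 4 8803 Ruschlikon, Switzerland"
print(lost_diacritics(expected, extracted))  # True
```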
Steps to reproduce
I have attached the PDF for the second example: article.pdf
Docling version
docling 2.12.0
docling-core 2.10.0
docling-ibm-models 3.1.0
docling-parse 3.0.0
Python version
python 3.10