Image conversion not parsing correctly #478
Unanswered
mailtoshwetha09 asked this question in Q&A
Replies: 1 comment
-
Bitmap figures are currently not processed, i.e. the output will contain their location (page, bbox, etc.) and, possibly, their caption. We are working on new features which would
-
I have an image that is embedded in a PDF. I exported the images and am trying to convert them to JSON or Markdown, but the content of the image is not extracted correctly.
Convert the current PNG file
Sample JSON output: the text values here come out encoded (garbled), whereas I expected the same text as in the PNG. As it stands, I have to call an LLM to translate the content.
{
  "bbox": {
    "l": 590.3333129882812,
    "t": 114.33333587646484,
    "r": 659.6666870117188,
    "b": 96.66666412353516,
    "coord_origin": "BOTTOMLEFT"
  },
  "row_span": 1,
  "col_span": 1,
  "start_row_offset_idx": 5,
  "end_row_offset_idx": 6,
  "start_col_offset_idx": 3,
  "end_col_offset_idx": 4,
  "text": "3,021 @A",
  "column_header": false,
  "row_header": false,
  "row_section": false
},
{
  "bbox": {
    "l": 469.6666564941406,
    "t": 94.66666412353516,
    "r": 536.6666870117188,
    "b": 77,
    "coord_origin": "BOTTOMLEFT"
  },
  "row_span": 1,
  "col_span": 1,
  "start_row_offset_idx": 6,
  "end_row_offset_idx": 7,
  "start_col_offset_idx": 2,
  "end_col_offset_idx": 3,
  "text": "5,564 {8A",
  "column_header": false,
  "row_header": false,
  "row_section": false
},
{
  "bbox": {
    "l": 350.3333435058594,
    "t": 94,
    "r": 416.6666564941406,
    "b": 77,
    "coord_origin": "BOTTOMLEFT"
  },
  "row_span": 1,
  "col_span": 1,
  "start_row_offset_idx": 6,
  "end_row_offset_idx": 7,
  "start_col_offset_idx": 1,
  "end_col_offset_idx": 2,
  "text": "4,638 '8A",
  "column_header": false,
  "row_header": false,
  "row_section": false
},
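As a possible stopgap until the text comes out clean, the garbled cells in the sample output above could be flagged and located so that only those regions are sent to OCR or an LLM. The following is a minimal sketch in plain Python, not part of the library: `is_suspect`, `to_top_left`, `suspect_cells`, and the `page_height` value are hypothetical names and inputs introduced for illustration. It assumes a clean cell in this table holds only a number such as "3,021", and that `page_height` is known in the same units as the bbox.

```python
import re

# Two cells copied from the sample output above (bbox values rounded).
cells = [
    {"bbox": {"l": 590.33, "t": 114.33, "r": 659.67, "b": 96.67,
              "coord_origin": "BOTTOMLEFT"},
     "start_row_offset_idx": 5, "start_col_offset_idx": 3,
     "text": "3,021 @A"},
    {"bbox": {"l": 469.67, "t": 94.67, "r": 536.67, "b": 77.0,
              "coord_origin": "BOTTOMLEFT"},
     "start_row_offset_idx": 6, "start_col_offset_idx": 2,
     "text": "5,564 {8A"},
]

# Assumption: a clean cell contains only a number, optionally with
# thousands separators and a decimal part.
NUMERIC = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def is_suspect(text):
    """Return True when the cell text is not a plain number."""
    return NUMERIC.fullmatch(text.strip()) is None


def to_top_left(bbox, page_height):
    """Convert a BOTTOMLEFT-origin bbox to (left, top, right, bottom)
    measured from the top-left corner, as image cropping tools expect."""
    if bbox["coord_origin"] != "BOTTOMLEFT":
        return (bbox["l"], bbox["t"], bbox["r"], bbox["b"])
    return (bbox["l"], page_height - bbox["t"],
            bbox["r"], page_height - bbox["b"])


def suspect_cells(cells, page_height):
    """Collect row, column, text, and crop box for every suspect cell."""
    return [
        {"row": c["start_row_offset_idx"],
         "col": c["start_col_offset_idx"],
         "text": c["text"],
         "crop_box": to_top_left(c["bbox"], page_height)}
        for c in cells
        if is_suspect(c["text"])
    ]


# page_height=200.0 is a made-up value for illustration only.
flagged = suspect_cells(cells, page_height=200.0)
```

Each entry in `flagged` carries a crop box that could be passed to an image library to cut out just that cell, which keeps the follow-up OCR or LLM call limited to the cells that actually need it.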