Issue search results

12 results (72 ms) in docling-project/docling-eval

docling-project/docling-eval
Request to Release or Integrate 89-PDF Benchmark Dataset Mentioned in the Technical Report

Hi Docling Team, I really enjoyed reading your technical report, especially the section describing the 89-PDF benchmark dataset: To enable a meaningful benchmark, we composed a test set of 89 PDF files ...

CHN-ChenYi

Opened
4 days ago

#117

docling-project/docling-eval
Allow the direct evaluation of externally provided DocTag and DoclingDocument json files without having a HF parquet prediction dataset

The current design of docling-eval assumes the workflow: 1. create-gt: Create a Ground Truth dataset in HF parquet format. 2. create-eval: Create a prediction dataset in HF parquet format that contains ...

nikos-livathinos

Opened
12 days ago

#112

docling-project/docling-eval
Decide on the approach to deal with column/row headers during tables evaluation

Currently, the TEDS metrics calculation only looks at td tags while building the tree for the APTED algorithm. (Reference) IMO, this will unfairly penalize any hyperscaler or even WDU/docling in case they ...

divekarsc

Opened
15 days ago

#110
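
To illustrate the concern raised in #110, here is a minimal sketch of how a td-only traversal can miss header cells. It is not the docling-eval TEDS code: the `CellCollector` class and `extract_cells` helper are hypothetical, and the assumption that headers are marked up with th tags is ours, not stated in the issue.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of table cells whose tag is in `cell_tags` (hypothetical helper)."""

    def __init__(self, cell_tags):
        super().__init__()
        self.cell_tags = set(cell_tags)
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in self.cell_tags:
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in self.cell_tags:
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def extract_cells(html, cell_tags):
    """Return the cell texts found in `html` for the given tag names."""
    parser = CellCollector(cell_tags)
    parser.feed(html)
    parser.close()
    return parser.cells


table = (
    "<table>"
    "<tr><th>Name</th><th>Qty</th></tr>"
    "<tr><td>Apple</td><td>3</td></tr>"
    "</table>"
)

# A td-only view drops the header row entirely.
td_only = extract_cells(table, ["td"])
# Treating th like td keeps the header cells in the comparison.
with_headers = extract_cells(table, ["td", "th"])
```

Under this assumption, two systems that emit the same table but disagree on td versus th for the header row would be compared on different sets of cells, which is the kind of mismatch the issue asks the maintainers to decide on.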

docling-project/docling-eval
[Bee] Cannot write struct type 'pipeline_options' with no child field to Parquet

Instantiating DoclingPredictionProvider with do_visualization=False as follows: docling_provider = DoclingPredictionProvider(do_visualization=False, ignore_missing_predictions=False) Will ...

bug

wai25

Opened
16 days ago

#107

docling-project/docling-eval
Refine Polyline-to-Bounding Box Matching Strategy

In the current matching strategy, a point on a polyline is associated with the smallest bounding box that contains it. https://github.com/docling-project/docling-eval/blob/b507977171780650860e74ae48f3edadd4a60b78/docling_eval/dataset_builders/cvat_dataset_builder.py#L225-L230 ...

Saidgurbuz

Opened
16 days ago

#106
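
The matching rule described in #106 (a polyline point is associated with the smallest bounding box that contains it) can be sketched as follows. This is an illustrative version only, not the code in cvat_dataset_builder.py: the `Box` type and `smallest_containing_box` function are hypothetical names, and axis-aligned boxes with inclusive edges are assumed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical type for this sketch)."""

    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Edges are treated as inside the box (an assumption).
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    @property
    def area(self) -> float:
        return (self.right - self.left) * (self.bottom - self.top)


def smallest_containing_box(
    point: Tuple[float, float], boxes: Sequence[Box]
) -> Optional[int]:
    """Return the index of the smallest box containing `point`, or None."""
    x, y = point
    candidates = [i for i, box in enumerate(boxes) if box.contains(x, y)]
    if not candidates:
        return None
    return min(candidates, key=lambda i: boxes[i].area)


boxes = [Box(0, 0, 100, 100), Box(10, 10, 30, 30)]
polyline = [(20, 20), (50, 50), (200, 200)]

# One match per polyline point: nested box, outer box, no box.
matches = [smallest_containing_box(p, boxes) for p in polyline]
```

With nested boxes, the first point falls inside both and is assigned to the inner one, which is the behavior the issue proposes to refine.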

docling-project/docling-eval
[Bee] RuntimeError: Cannot visualize document without images

Docling, WDU Tables/OCR tests fail with the error: RuntimeError: Cannot visualize document without images To reproduce, update test_tables_aws.py to use Docling and run. poetry run pytest -v tests/test_tables_docling.py ...

bug

wai25

Opened
17 days ago

#105

docling-project/docling-eval
[Bee] An error occurred while generating the dataset

The test_ocr_xfund_google.py test is failing and likely other tests too. To reproduce the error: poetry run pytest -v tests/test_ocr_xfund_google.py poetry run pytest -v tests/test_ocr_xfund_google.py ...

bug

samiuc

Opened
17 days ago

#104

docling-project/docling-eval
Export datasets in other formats (e.g. COCO format)

Given that docling-eval is able to create ground truth and prediction datasets built around the DoclingDocument format we may also want to export the entire GT/prediction dataset in another format. This ...

nikos-livathinos

Opened
on Apr 23

docling-project/docling-eval
Introduce a DoclingDocument Ground Truth DatasetBuilder

- Introduce DoclingDocumentDatasetBuilder to build Ground Truth datasets from lossless serializations of DoclingDocument files (e.g. jsons). - It is useful when DoclingDocument objects have been ...

nikos-livathinos

Opened
on Apr 23

docling-project/docling-eval
Extend docling-eval to support externally computed predictions

In its current implementation docling-eval is focused on the standardization of the evaluation where the DoclingDocument format is used as the interface to store both the ground-truth and the predictions. ...

nikos-livathinos

Opened
on Mar 10

issues Search Results · repo:docling-project/docling-eval language:Python
