Open
153 commits
95635d5
code to produce models
Dec 5, 2019
4601237
add files needed for training
Dec 5, 2019
2263305
add files needed for training
Dec 5, 2019
1882dd8
Update config_params.json
Dec 5, 2019
e8afb37
Update README
Dec 5, 2019
99a02a1
Update README
Dec 5, 2019
7eb3dd2
Update README
Dec 5, 2019
cf18aa7
Update README
Dec 5, 2019
ac54266
Delete README
Dec 5, 2019
350378a
Add new file
Dec 5, 2019
979b824
📝 howto: Be more verbose with the subtree pull
mikegerber Dec 9, 2019
8084e13
Update README
vahidrezanezhad Dec 10, 2019
4229ad9
Update README.md
vahidrezanezhad Dec 10, 2019
943628e
Merge commit '4229ad92d7460ed9fdc63a2837527586fde18de3'
Dec 10, 2019
b5f9b9c
Update main.py
vahidrezanezhad Dec 10, 2019
df536d6
Add LICENSE
cneud Dec 10, 2019
c07d16d
Merge pull request #2 from cneud/add-license-1
cneud Jan 15, 2020
ad1360b
Update README.md
cneud Jan 15, 2020
66d7138
Update README.md
cneud Jan 15, 2020
325864e
Merge pull request #7 from qurator-spk/update-readme
vahidrezanezhad Jan 16, 2020
4e21647
Update README.md
vahidrezanezhad Jan 16, 2020
b54285b
Update README.md
vahidrezanezhad Jan 16, 2020
070c2e0
first updates, padding, rotations
Jun 22, 2021
8884b90
continue training, losses and etc
Jun 22, 2021
2e2b6ee
Merge pull request #15 from vahidrezanezhad/master
vahidrezanezhad Jun 22, 2021
2d9ba85
Update README.md
vahidrezanezhad Jun 23, 2021
1540739
Update README.md
vahidrezanezhad Jun 23, 2021
491cdbf
Update README.md
vahidrezanezhad Jun 23, 2021
76c75d1
Update README.md
vahidrezanezhad Jun 23, 2021
310a709
Update README.md
vahidrezanezhad Jun 23, 2021
b1c8bdf
Update README.md
vahidrezanezhad Jun 29, 2021
49853bb
Update README.md
vahidrezanezhad Jun 29, 2021
09c0d5e
Update README.md
vahidrezanezhad Jun 29, 2021
bcc900b
Update README.md
vahidrezanezhad Jun 29, 2021
083f5ae
Update README.md
vahidrezanezhad Jul 14, 2021
5282caa
supposed to solve https://github.com/qurator-spk/sbb_binarization/iss…
Aug 22, 2022
57dae56
adjusting to tf2
vahidrezanezhad Apr 4, 2024
ced1f85
adding requirements
vahidrezanezhad Apr 4, 2024
4565229
use headless cv2
cneud Apr 10, 2024
d0b0395
add info on helpful tools (fix #14)
cneud Apr 10, 2024
39aa886
update parameter config docs (fix #11)
cneud Apr 10, 2024
666a626
code formatting with black; typos
cneud Apr 10, 2024
6e06742
first working update of branch
vahidrezanezhad Apr 15, 2024
ca63c09
integrating first working classification training model
vahidrezanezhad Apr 29, 2024
c989f7a
adding enhancement training
vahidrezanezhad May 6, 2024
e1f62c2
inference script is added
vahidrezanezhad May 7, 2024
bc2ca71
modifications
vahidrezanezhad May 7, 2024
241cb90
Update train.py
vahidrezanezhad May 8, 2024
d277ec4
Update utils.py
vahidrezanezhad May 12, 2024
d6a057b
adding page xml to label generator
vahidrezanezhad May 16, 2024
faeac99
page to label enable textline new concept
vahidrezanezhad May 17, 2024
b2085a1
update requirements
vahidrezanezhad May 17, 2024
f1c2913
page2label with a dynamic layout
vahidrezanezhad May 22, 2024
47c6bf6
dynamic layout decorated with artificial class on text elements boundry
vahidrezanezhad May 23, 2024
348d323
missing text types are added
vahidrezanezhad May 23, 2024
a83d53c
use cases like textline, word and glyph are added
vahidrezanezhad May 23, 2024
61487bf
use case printspace is added
vahidrezanezhad May 23, 2024
d346b31
machine based reading order training dataset generator is added
vahidrezanezhad May 24, 2024
9638098
machine based reading order training is integrated
vahidrezanezhad May 24, 2024
ccf520d
adding rest_as_paragraph and rest_as_graphic to elements
vahidrezanezhad May 27, 2024
467bbb2
pass degrading scales for image enhancement as a json file
vahidrezanezhad May 28, 2024
cc7577d
min area size of text region passes as an argument for machine based …
vahidrezanezhad May 28, 2024
4fb45a6
inference for reading order
vahidrezanezhad May 28, 2024
06ed006
reading order detection on xml with layout + result will be written i…
vahidrezanezhad May 29, 2024
0978961
min_area size of regions considered for reading order detection passe…
vahidrezanezhad May 29, 2024
47a1646
modifying xml parsing
vahidrezanezhad May 30, 2024
3ef0dbd
scaling and cropping of labels and org images
vahidrezanezhad May 30, 2024
13ebe71
replacement in a list done correctly
vahidrezanezhad Jun 6, 2024
742e3c2
Update README.md
vahidrezanezhad Jun 6, 2024
5a5914e
just defined textregion types can be extracted as label
vahidrezanezhad Jun 6, 2024
0e4dd0b
just defined textregion types can be extracted as label
vahidrezanezhad Jun 6, 2024
4c37628
just defined graphic region types can be extracted as label
vahidrezanezhad Jun 6, 2024
cc91e4b
updating train.py
vahidrezanezhad Jun 7, 2024
1921e67
updating train.py nontransformer backend
vahidrezanezhad Jun 10, 2024
29da23d
binarization as a separate task of segmentation
vahidrezanezhad Jun 11, 2024
95faf1a
transformer patch size is dynamic now.
vahidrezanezhad Jun 12, 2024
22d7359
Transformer+CNN structure is added to vision transformer type
vahidrezanezhad Jun 12, 2024
66022cf
update config
vahidrezanezhad Jun 12, 2024
b3cd01d
update reading order machine based
vahidrezanezhad Jun 21, 2024
fe69b9c
update inference
vahidrezanezhad Jun 21, 2024
9260d29
resolving typo
vahidrezanezhad Jul 9, 2024
3bceec9
printspace_as_class_in_layout is integrated. Printspace can be define…
vahidrezanezhad Jul 16, 2024
453d0fb
adding degrading and brightness augmentation to no patches case training
vahidrezanezhad Jul 17, 2024
861f0b1
brightness augmentation modified
Jul 17, 2024
840d7c2
increasing margin in the case of pixelwise inference
Jul 23, 2024
2c822da
erosion and dilation parameters are changed & separators are written …
vahidrezanezhad Jul 24, 2024
3819760
inference updated
vahidrezanezhad Jul 24, 2024
6fb28d6
erosion rate changed
vahidrezanezhad Aug 1, 2024
2d83b8f
add documentation from wiki as markdown file to the codebase
cneud Aug 8, 2024
3b90347
save only layout output. different from overlayed layout on image
vahidrezanezhad Aug 9, 2024
bf5837b
update
vahidrezanezhad Aug 9, 2024
5e1821a
augmentation function for red textlines, rgb background and scaling f…
vahidrezanezhad Aug 20, 2024
445c45c
updating augmentations
vahidrezanezhad Aug 21, 2024
aeb2ee4
scaling, channels shuffling, rgb background and red content added to …
vahidrezanezhad Aug 21, 2024
61cdd2a
using prepared binarized images in the case of augmentation
vahidrezanezhad Aug 22, 2024
5bbd098
early dilation for textline artificial class
vahidrezanezhad Aug 27, 2024
a57a316
adding foreground rgb to augmentation
vahidrezanezhad Aug 28, 2024
e3da494
fixing artificial class bug
vahidrezanezhad Aug 28, 2024
3f354e1
new augmentations for patchwise training
vahidrezanezhad Aug 30, 2024
a524f8b
Update inference.py to check if save_layout was passed as argument ot…
johnlockejrr Oct 19, 2024
f09eed1
Changed deprecated `lr` to `learning_rate` and `model.fit_generator` …
johnlockejrr Oct 19, 2024
fd14e65
early_erosion is added
vahidrezanezhad Oct 25, 2024
7b4d14b
addinh shifting augmentation
vahidrezanezhad Oct 29, 2024
238ea3b
update resizing in inference
vahidrezanezhad Nov 14, 2024
e9b860b
artificial_class_label for table region
vahidrezanezhad Nov 18, 2024
90a1b18
this enables to visualize reading order of textregions provided in pa…
vahidrezanezhad Mar 14, 2025
363c343
visualising reaidng order- Overlaying on image is provided
vahidrezanezhad Mar 17, 2025
825b263
rotation augmentation is provided for machine based reading order
vahidrezanezhad Apr 16, 2025
dd21a3b
updating:rotation augmentation is provided for machine based reading …
vahidrezanezhad Apr 16, 2025
4635dd2
updating:rotation augmentation is provided for machine based reading …
vahidrezanezhad Apr 16, 2025
44d0268
Merge pull request #18 from johnlockejrr/unifying-training-models
cneud Apr 17, 2025
3b123b0
adding min_early parameter for generating training dataset for machin…
vahidrezanezhad May 3, 2025
5694d97
saving model by steps is added to reading order and pixel wise segmen…
vahidrezanezhad May 5, 2025
92954b1
resolving issued with saving model by steps
vahidrezanezhad May 5, 2025
6fa766d
Update utils.py
johnlockejrr May 11, 2025
3a9fc0e
Update utils.py
johnlockejrr May 11, 2025
4ddc84d
visulizing textline detection from eynollah page-xml output
vahidrezanezhad May 12, 2025
4a7728b
visuliazation layout from eynollah page-xml output
vahidrezanezhad May 12, 2025
25abc0f
Update gt_gen_utils.py
johnlockejrr May 14, 2025
f9390c7
updating inference for mb reading order
vahidrezanezhad May 17, 2025
25e3a2a
visualizing ro for single xml file
vahidrezanezhad May 23, 2025
eb91000
layout visualization updated
vahidrezanezhad Jun 2, 2025
0e7de52
Merge pull request #24 from johnlockejrr/unifying-training-models
cneud Jun 3, 2025
f5a1d1a
docker file to train model with desired cuda and cudnn
vahidrezanezhad Jun 25, 2025
1b22259
Update README.md: how to train model using docker image
vahidrezanezhad Jun 25, 2025
6462ea5
adding visualization of ocr text of xml file
vahidrezanezhad Aug 6, 2025
263da75
loading xmls with UTF-8 encoding
vahidrezanezhad Aug 7, 2025
cf4983d
visualize vertical ocr text vertically
vahidrezanezhad Aug 8, 2025
68a71be
Running inference on files in a directory
vahidrezanezhad Sep 13, 2025
530897c
renaming argument names
vahidrezanezhad Sep 19, 2025
a65405b
tables are visulaized within layout
vahidrezanezhad Sep 22, 2025
3b9548d
Merge sbb_pixelwise_segmentation training code into eynollah
kba Sep 29, 2025
56c4b7a
:memo: align pre-merge docs/train.md with former upstream train.md sy…
kba Sep 29, 2025
ea05461
add documentation on eynollah layout from eynollah wiki
kba Sep 29, 2025
52a7c93
add documentation on training eynollah from sbb_pixelwise_segmentatio…
kba Sep 29, 2025
6d37978
:memo: align former upstream train.md with wiki train.md syntactically
kba Sep 29, 2025
ce02a35
:fire: remove obsolete versions of the training document
kba Sep 29, 2025
2bcd20e
reference the now-merged training tools in README.md
kba Sep 29, 2025
9d8b858
remove docs/eynollah-layout, superseded by docs/model.md and docs/usa…
kba Sep 29, 2025
53c1ca1
Update README.md
cneud Sep 29, 2025
070dafc
remove duplicate LICENSE
cneud Sep 29, 2025
558867e
fix typo
cneud Sep 30, 2025
9ce127e
remove unnecessary backslash
cneud Sep 30, 2025
1d0616e
comparisons to None should not use the equality operators
cneud Sep 30, 2025
70af001
mutable defaults are the source of all evil
cneud Sep 30, 2025
f2f93e0
list literal is faster than using list constructor to create a new list
cneud Sep 30, 2025
91d2a74
remove redundant parentheses
cneud Sep 30, 2025
e027bc0
Update README.md
cneud Sep 30, 2025
4514d41
force GH markdown code block in list
cneud Sep 30, 2025
733af1e
:memo: update train/README.md, align with docs/train.md
kba Oct 1, 2025
48266b1
make training dependencies optional-dependencies of eynollah
kba Oct 1, 2025
95bb590
Merge branch 'integrate-training-from-sbb_pixelwise_segmentation' of …
kba Oct 1, 2025
8a9b4f8
remove commented-out requirement for tf == 2.12.1, rely on same versi…
kba Oct 2, 2025
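Commits 1d0616e, 70af001 and f2f93e0 are titled after three common Python idioms. The snippet below is a generic illustration of those idioms, not the code those commits actually changed. `AlwaysEqual`, `looks_like_none`, `is_none`, `append_bad` and `append_good` are hypothetical names chosen for the example.

```python
from typing import List, Optional


class AlwaysEqual:
    """Pathological but legal: __eq__ claims equality with everything."""

    def __eq__(self, other: object) -> bool:
        return True


# `== None` dispatches to __eq__ and can be fooled; `is None` checks identity.
def looks_like_none(obj: object) -> bool:
    return obj == None  # noqa: E711 (deliberately the anti-pattern)


def is_none(obj: object) -> bool:
    return obj is None


# Anti-pattern: the default list is created once, when the function is defined,
# and the same object is reused by every call that omits `items`.
def append_bad(value: int, items: List[int] = []) -> List[int]:
    items.append(value)
    return items


# Fix: use None as a sentinel and create a fresh list literal on each call.
# `[]` also skips the global name lookup and call that `list()` performs.
def append_good(value: int, items: Optional[List[int]] = None) -> List[int]:
    if items is None:
        items = []
    items.append(value)
    return items


if __name__ == "__main__":
    print(looks_like_none(AlwaysEqual()), is_none(AlwaysEqual()))  # True False
    print(append_bad(1), append_bad(2))  # [1, 2] [1, 2]
    print(append_good(1), append_good(2))  # [1] [2]
```

The shared-default bug only shows up across calls, which is why it tends to survive casual testing.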
1 change: 1 addition & 0 deletions .gitignore
@@ -6,3 +6,4 @@ output.html
/build
/dist
*.tif
+ *.sw?
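The pattern added to `.gitignore`, `*.sw?`, matches names ending in `.sw` followed by exactly one more character, which typically covers Vim swap files such as `.swp` and `.swo`. A rough check with Python's `fnmatch.fnmatchcase`, assuming matching on base names only. Git's own rules differ in details (for example, `*` does not match `/`), and `is_ignored_swap_file` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Pattern added to .gitignore: `*` matches any run of characters,
# `?` matches exactly one character.
SWAP_PATTERN = "*.sw?"


def is_ignored_swap_file(basename: str) -> bool:
    """Case-sensitive match of a base name (no directories) against the pattern."""
    return fnmatchcase(basename, SWAP_PATTERN)


if __name__ == "__main__":
    for name in [".main.py.swp", ".README.md.swo", "notes.sw", "backup.swap"]:
        print(name, is_ignored_swap_file(name))  # True, True, False, False
```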
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -22,7 +22,7 @@ Added:
Fixed:

* allow empty imports for optional dependencies
- * avoid Numpy warnings (empty slices etc)
+ * avoid Numpy warnings (empty slices etc.)
* remove deprecated Numpy types
* binarization CLI: make `dir_in` usable again

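The CHANGELOG entry on imports for optional dependencies, together with commit 48266b1 ("make training dependencies optional-dependencies of eynollah"), relates to a common Python pattern: tolerate a missing extra at import time and fail with a clear message only when a feature actually needs it. A minimal sketch of that pattern, not Eynollah's actual code. `optional_import` and `require` are hypothetical helpers, and the extra name passed to `require` is a placeholder.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module if it is installed, otherwise None (no error at import time)."""
    try:
        return importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


def require(module: Optional[ModuleType], extra: str) -> ModuleType:
    """Fail only when a feature actually needs the missing optional dependency."""
    if module is None:
        raise RuntimeError(
            f"this feature needs optional dependencies; install the '{extra}' extra"
        )
    return module


if __name__ == "__main__":
    json_mod = optional_import("json")  # stdlib, always available
    missing = optional_import("hypothetical_missing_module")  # not installed
    print(json_mod is not None, missing is None)  # True True
```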
55 changes: 29 additions & 26 deletions README.md
@@ -11,23 +11,24 @@
![](https://user-images.githubusercontent.com/952378/102350683-8a74db80-3fa5-11eb-8c7e-f743f7d6eae2.jpg)

## Features
- * Support for up to 10 segmentation classes:
+ * Support for 10 distinct segmentation classes:
* background, [page border](https://ocr-d.de/en/gt-guidelines/trans/lyRand.html), [text region](https://ocr-d.de/en/gt-guidelines/trans/lytextregion.html#textregionen__textregion_), [text line](https://ocr-d.de/en/gt-guidelines/pagexml/pagecontent_xsd_Complex_Type_pc_TextLineType.html), [header](https://ocr-d.de/en/gt-guidelines/trans/lyUeberschrift.html), [image](https://ocr-d.de/en/gt-guidelines/trans/lyBildbereiche.html), [separator](https://ocr-d.de/en/gt-guidelines/trans/lySeparatoren.html), [marginalia](https://ocr-d.de/en/gt-guidelines/trans/lyMarginalie.html), [initial](https://ocr-d.de/en/gt-guidelines/trans/lyInitiale.html), [table](https://ocr-d.de/en/gt-guidelines/trans/lyTabellen.html)
* Support for various image optimization operations:
* cropping (border detection), binarization, deskewing, dewarping, scaling, enhancing, resizing
- * Text line segmentation to bounding boxes or polygons (contours) including for curved lines and vertical text
- * Detection of reading order (left-to-right or right-to-left)
+ * Textline segmentation to bounding boxes or polygons (contours) including for curved lines and vertical text
+ * Text recognition (OCR) using either CNN-RNN or Transformer models
+ * Detection of reading order (left-to-right or right-to-left) using either heuristics or trainable models
* Output in [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML)
* [OCR-D](https://github.com/qurator-spk/eynollah#use-as-ocr-d-processor) interface

- :warning: Development is currently focused on achieving the best possible quality of results for a wide variety of
- historical documents and therefore processing can be very slow. We aim to improve this, but contributions are welcome.
+ :warning: Development is focused on achieving the best quality of results for a wide variety of historical
+ documents and therefore processing can be very slow. We aim to improve this, but contributions are welcome.

## Installation

Python `3.8-3.11` with Tensorflow `<2.13` on Linux are currently supported.

- For (limited) GPU support the CUDA toolkit needs to be installed.
+ For (limited) GPU support the CUDA toolkit needs to be installed. A known working config is CUDA `11` with cuDNN `8.6`.

You can either install from PyPI

@@ -53,26 +54,30 @@ make install EXTRAS=OCR
```

## Models

Pretrained models can be downloaded from [zenodo](https://zenodo.org/records/17194824) or [huggingface](https://huggingface.co/SBB?search_models=eynollah).

- For documentation on methods and models, have a look at [`models.md`](https://github.com/qurator-spk/eynollah/tree/main/docs/models.md).
+ For documentation on models, have a look at [`models.md`](https://github.com/qurator-spk/eynollah/tree/main/docs/models.md).
Model cards are also provided for our trained models.

- ## Train
+ ## Training

- In case you want to train your own model with Eynollah, have a look at [`train.md`](https://github.com/qurator-spk/eynollah/tree/main/docs/train.md).
+ In case you want to train your own model with Eynollah, see the
+ documentation in [`train.md`](https://github.com/qurator-spk/eynollah/tree/main/docs/train.md) and use the
+ tools in the [`train` folder](https://github.com/qurator-spk/eynollah/tree/main/train).

## Usage

Eynollah supports five use cases: layout analysis (segmentation), binarization,
- image enhancement, text recognition (OCR), and (trainable) reading order detection.
+ image enhancement, text recognition (OCR), and reading order detection.

### Layout Analysis

- The layout analysis module is responsible for detecting layouts, identifying text lines, and determining reading order
- using both heuristic methods or a machine-based reading order detection model.
+ The layout analysis module is responsible for detecting layout elements, identifying text lines, and determining reading
+ order using either heuristic methods or a [pretrained reading order detection model](https://github.com/qurator-spk/eynollah#machine-based-reading-order).

- Note that there are currently two supported ways for reading order detection: either as part of layout analysis based
- on image input, or, currently under development, for given layout analysis results based on PAGE-XML data as input.
+ Reading order detection can be performed either as part of layout analysis based on image input, or, currently under
+ development, based on pre-existing layout analysis results in PAGE-XML format as input.

The command-line interface for layout analysis can be called like this:

@@ -105,15 +110,15 @@ The following options can be used to further configure the processing:
| `-sp <directory>` | save cropped page image to this directory |
| `-sa <directory>` | save all (plot, enhanced/binary image, layout) to this directory |

- If no option is set, the tool performs layout detection of main regions (background, text, images, separators
+ If no further option is set, the tool performs layout detection of main regions (background, text, images, separators
and marginals).
- The best output quality is produced when RGB images are used as input rather than greyscale or binarized images.
+ The best output quality is achieved when RGB images are used as input rather than greyscale or binarized images.

### Binarization

The binarization module performs document image binarization using pretrained pixelwise segmentation models.

- The command-line interface for binarization of single image can be called like this:
+ The command-line interface for binarization can be called like this:

```sh
eynollah binarization \
@@ -124,16 +129,16 @@ eynollah binarization \

### OCR

- The OCR module performs text recognition from images using two main families of pretrained models: CNN-RNN–based OCR and Transformer-based OCR.
+ The OCR module performs text recognition using either a CNN-RNN model or a Transformer model.

- The command-line interface for ocr can be called like this:
+ The command-line interface for OCR can be called like this:

```sh
eynollah ocr \
-i <single image file> | -di <directory containing image files> \
-dx <directory of xmls> \
-o <output directory> \
- -m <path to directory containing model files> | --model_name <path to specific model> \
+ -m <directory containing model files> | --model_name <path to specific model> \
```

### Machine-based-reading-order
@@ -169,22 +174,20 @@ If the input file group is PAGE-XML (from a previous OCR-D workflow step), Eynol
(because some other preprocessing step was in effect like `denoised`), then
the output PAGE-XML will be based on that as new top-level (`@imageFilename`)

- ocrd-eynollah-segment -I OCR-D-XYZ -O OCR-D-SEG -P models eynollah_layout_v0_5_0
+ ocrd-eynollah-segment -I OCR-D-XYZ -O OCR-D-SEG -P models eynollah_layout_v0_5_0

- Still, in general, it makes more sense to add other workflow steps **after** Eynollah.
+ In general, it makes more sense to add other workflow steps **after** Eynollah.

- There is also an OCR-D processor for the binarization:
+ There is also an OCR-D processor for binarization:

ocrd-sbb-binarize -I OCR-D-IMG -O OCR-D-BIN -P models default-2021-03-09

#### Additional documentation

- Please check the [wiki](https://github.com/qurator-spk/eynollah/wiki).
+ Additional documentation is available in the [docs](https://github.com/qurator-spk/eynollah/tree/main/docs) directory.

## How to cite

If you find this tool useful in your work, please consider citing our paper:

```bibtex
@inproceedings{hip23rezanezhad,
title = {Document Layout Analysis with Deep Learning and Heuristics},