Merged
91 commits
09ece86
dilate_textregions_contours: simplify (via shapely's Polygon.buffer()…
Aug 19, 2025
b48c41e
return_boxes_of_images_by_order_of_reading_new: simplify, avoid chang…
Aug 19, 2025
66b2bce
return_boxes_of_images_by_order_of_reading_new: log any exceptions
Sep 19, 2025
afba70c
separate_lines/do_work_of_slopes: skip if crop is empty
Aug 19, 2025
41cc38c
get_textregion_contours_in_org_image_light: no back rotation, drop sl…
Aug 20, 2025
7b51fd6
avoid creating invalid polygons via rounding
Aug 20, 2025
e730725
check_any_text_region_in_model_one_is_main_or_header_light: return or…
Aug 20, 2025
17bcf1a
rename *lines_xml → *seplines for clarity
Aug 20, 2025
a433c73
filter_contours_area_of_image*: also ensure validity here
Aug 20, 2025
0650274
move dilate_*_contours to .utils.contour, rename dilate_textregions_c…
Aug 20, 2025
f3faa29
refactor shapely conversions into contour2polygon / polygon2contour, …
Aug 21, 2025
7a9e825
increase dilatation: textregions/lines (5→6), seplines (0→1)
Aug 21, 2025
11e143a
polygon2contour: avoid overflow
Aug 29, 2025
235539a
filter_contours_without_textline_inside: avoid removing from identica…
Aug 29, 2025
bca2ae3
get_marginals: exit early if no peaks found to avoid spurious overlap…
Aug 29, 2025
9b5182c
utils: introduce box2rect and box2slice
Aug 26, 2025
5bff2d1
use box2rect instead of crop_image_inside_box when no image needed
Aug 26, 2025
5b16c2f
avoid pulling unused 'image_page_rotated' through functions
Aug 26, 2025
4337d62
contours: rename 'pixel' → 'label' for clarity
Aug 26, 2025
f458e3e
writer: SeparatorRegion needs SeparatorRegionType (not ImageRegionType)
Aug 26, 2025
dc0caad
writer: use @type='heading' instead of 'header'
Aug 26, 2025
abf5c0f
get_smallest_skew: when shifting search range of rotation angle, comp…
Sep 2, 2025
8be2c79
Revert "deskewing with faster multiprocessing"
Sep 3, 2025
31f240c
do_image_rotation, do_work_of_slopes_new_curved: pass arrays via shar…
Sep 2, 2025
0662ece
do_work_of_slopes*: use shm also in non-light mode(s)
Sep 4, 2025
04c3d7d
get_smallest_skew: avoid shm if no ProcessPoolExecutor is passed
Sep 18, 2025
b94c96f
find_num_col: exit early if empty (avoiding exceptions)
Sep 19, 2025
0366707
get_smallest_skew: do not pass logger
Sep 19, 2025
7586024
replace loky with concurrent.futures.ProcessPoolExecutor (faster)
Sep 21, 2025
13f85b0
Merge branch 'main' into loky-with-shm-for-175-rebuilt
Sep 30, 2025
c0137c2
try to fix the failed outsourcing of utils_ocr
Sep 30, 2025
f857ee7
simplify
Sep 19, 2025
08c8c26
indent extremely long lines
Sep 30, 2025
b21051d
ProcessPoolExecutor: shutdown during del() instead of atexit()
Sep 30, 2025
375e026
CNN-RNN OCR model: switch to 20250930 version (compatible with TF 2.1…
Sep 30, 2025
61b20cc
tests: switch from subtests to parametrize, use --isolate everywhere …
Sep 30, 2025
a3d8197
makefile: update model URL
Sep 30, 2025
c86e59f
CI: update model key, split up cache restore/save
Sep 30, 2025
ad129ed
CI: remove OS from model cache keys
Sep 30, 2025
7daec39
Dockerfile: fix up CUDA installation for mixed TF/Torch
Sep 30, 2025
f0de1ad
rm loky dependency
Sep 30, 2025
3aa7ad0
:memo: update changelog
Sep 30, 2025
0b9d490
contour features: avoid unused calculations, simplify, add shortcuts
Oct 2, 2025
81827c2
filter_contours_inside_a_bigger_one: simplify
Oct 2, 2025
8c3d5eb
separate_marginals_to_left_and_right_and_order_from_top_to_down: simp…
Oct 2, 2025
3f3353e
do_order_of_regions: simplify
Oct 2, 2025
415b2cb
eynollah, drop_capitals: simplify
Oct 2, 2025
a1c8fd4
do_order_of_regions / order_of_regions: simplify
Oct 2, 2025
4950e6b
order_of_regions: simplify
Oct 2, 2025
7387f5a
do_order_of_regions: improve box matching, simplify
Oct 2, 2025
e9bb62b
do_order_of_regions: simplify
Oct 2, 2025
e674ea0
do_order_of_regions: drop redundant no/full_layout
Oct 2, 2025
29b4527
do_order_of_regions: simplify
Oct 3, 2025
d774a23
matching deskewed text region contours with predicted: simplify
Oct 5, 2025
73e5a1d
matching deskewed text region contours with predicted: simplify
Oct 5, 2025
0f33c21
matching deskewed text region contours with predicted: improve
Oct 5, 2025
0e00d78
matching deskewed text region contours with predicted: improve
Oct 6, 2025
155b8f6
matching deskewed text region contours with predicted: improve
Oct 6, 2025
fe60318
avoid unnecessary 3-channel conversions
Oct 6, 2025
6e57ab3
textline_contours_postprocessing: do not catch arbitrary exceptions
Oct 6, 2025
595ed02
run_single: simplify; allow running TrOCR in non-fl mode, too
Oct 6, 2025
a1904fa
tests: cover layout with OCR in various modes
Oct 6, 2025
2353599
tests: symlink OCR models into layout model directory
Oct 6, 2025
18bbdb7
CI: run deps-test with OCR extra so symlink rule fires
Oct 6, 2025
d53f829
filter_contours_inside_a_bigger_one: fix edge case in 81827c29
Oct 7, 2025
2e90787
get_text_region_boxes_by_given_contours: simplify
Oct 7, 2025
dfdc705
do_work_of_slopes: rm unused old variant
Oct 7, 2025
0a80cd5
avoid unnecessary 3-channel conversions: for tables, too
Oct 7, 2025
fd43e78
filter_contours_without_textline_inside: simplify
Oct 7, 2025
02a347a
no more need to rm from `contours_only_text_parent_d_ordered` now
Oct 7, 2025
d88ca18
get/do_work_of_slopes etc.: reduce call/return signatures
Oct 7, 2025
e324797
writer: simplify
Oct 7, 2025
cbbb324
writer: simplify
Oct 7, 2025
75823f9
run_single: call `writer.build_pagexml_no_full_layout` w/ kwargs
Oct 7, 2025
5e11a68
writer/run_single: consistent kwarg naming `conf_contours_textregion(s)`
Oct 7, 2025
ca72a09
tests: cover table detection in various modes
Oct 7, 2025
e5b5264
CI: add diagnostic message for model symlink
Oct 8, 2025
839b7c4
make models: avoid re-download
Oct 8, 2025
1d4815b
utils_ocr: forgot to pass coordinate offsets
Oct 8, 2025
027b87d
fixup c0137c2 (missing arguments for utils_ocr)
Oct 8, 2025
096def1
mbreorder/enhancement: fix missing imports
Oct 8, 2025
8a2d682
fix identifier scope in layout OCR options (w/o full_layout)
Oct 8, 2025
b3d29be
return_contours_of_interested_region*: rm unused variants
Oct 8, 2025
a144026
add rough ruff config
Oct 8, 2025
e1b56d9
CI: lint with ruff
Oct 8, 2025
cab3926
:memo: update changelog
Oct 9, 2025
d96af42
Merge pull request #4 from bertsky/loky-with-shm-for-175-rebuilt-refa…
bertsky Oct 9, 2025
ecb5305
Merge branch 'main' of https://github.com/qurator-spk/eynollah into l…
Oct 9, 2025
c4cb16c
simplify
Oct 9, 2025
374818d
:memo: update changelog for 5725e4f
Oct 9, 2025
4e9a161
layout: refactor model setup, allow loading custom versions
Oct 10, 2025
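Commits 9b5182c and 5bff2d1 introduce `box2rect` and `box2slice` so that callers which only need coordinates no longer go through `crop_image_inside_box`. The bodies of these helpers are not part of this page, so the following is a minimal sketch under a stated assumption: a box is `(x, y, width, height)` as returned by `cv2.boundingRect()`, and the `(y_min, y_max, x_min, x_max)` return order of `box2rect` is hypothetical (the real order in `eynollah.utils` may differ).

```python
from typing import Tuple

# Assumed box layout (x, y, width, height), as produced by cv2.boundingRect().
Box = Tuple[int, int, int, int]


def box2rect(box: Box) -> Tuple[int, int, int, int]:
    """Hypothetical sketch: return (y_min, y_max, x_min, x_max) without touching any image."""
    x, y, w, h = box
    return y, y + h, x, x + w


def box2slice(box: Box) -> Tuple[slice, slice]:
    """Hypothetical sketch: return a (rows, cols) slice pair for 2-D indexing."""
    x, y, w, h = box
    return slice(y, y + h), slice(x, x + w)
```

With a NumPy array, `image[box2slice(box)]` yields a view of the region rather than a copy, which is the kind of saving these commits are after when only the region bounds matter.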
36 changes: 28 additions & 8 deletions .github/workflows/test-eynollah.yml
@@ -24,24 +24,39 @@ jobs:
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
df -h
- uses: actions/checkout@v4
- uses: actions/cache@v4
- uses: actions/cache/restore@v4
id: seg_model_cache
with:
path: models_layout_v0_5_0
key: ${{ runner.os }}-models
- uses: actions/cache@v4
key: seg-models
- uses: actions/cache/restore@v4
id: ocr_model_cache
with:
path: models_ocr_v0_5_0
key: ${{ runner.os }}-models
- uses: actions/cache@v4
path: models_ocr_v0_5_1
key: ocr-models
- uses: actions/cache/restore@v4
id: bin_model_cache
with:
path: default-2021-03-09
key: ${{ runner.os }}-modelbin
key: bin-models
- name: Download models
if: steps.seg_model_cache.outputs.cache-hit != 'true' || steps.bin_model_cache.outputs.cache-hit != 'true' || steps.ocr_model_cache.outputs.cache-hit != true
run: make models
- uses: actions/cache/save@v4
if: steps.seg_model_cache.outputs.cache-hit != 'true'
with:
path: models_layout_v0_5_0
key: seg-models
- uses: actions/cache/save@v4
if: steps.ocr_model_cache.outputs.cache-hit != 'true'
with:
path: models_ocr_v0_5_1
key: ocr-models
- uses: actions/cache/save@v4
if: steps.bin_model_cache.outputs.cache-hit != 'true'
with:
path: default-2021-03-09
key: bin-models
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
@@ -50,7 +65,12 @@ jobs:
run: |
python -m pip install --upgrade pip
make install-dev EXTRAS=OCR,plotting
make deps-test
make deps-test EXTRAS=OCR,plotting
ls -l models_*
- name: Lint with ruff
uses: astral-sh/ruff-action@v3
with:
src: "./src"
- name: Test with pytest
run: make coverage PYTEST_ARGS="-vv --junitxml=pytest.xml"
- name: Get coverage results
4 changes: 4 additions & 0 deletions .gitignore
@@ -2,7 +2,11 @@
__pycache__
sbb_newspapers_org_image/pylint.log
models_eynollah*
models_ocr*
models_layout*
default-2021-03-09
Contributor
TODO for me: Rename the binarization model to include the version as well.

output.html
/build
/dist
*.tif
TAGS
49 changes: 49 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,55 @@ Versioned according to [Semantic Versioning](http://semver.org/).

## Unreleased

Fixed:

* continue processing when no columns detected but text regions exist
* convert marginalia to main text if no main text is present
* reset deskewing angle to 0° when text covers <30% image area and detected angle >45°
* :fire: polygons: avoid invalid paths (use `Polygon.buffer()` instead of dilation etc.)
* `return_boxes_of_images_by_order_of_reading_new`: avoid Numpy.dtype mismatch, simplify
* `return_boxes_of_images_by_order_of_reading_new`: log any exceptions instead of ignoring
* `filter_contours_without_textline_inside`: avoid removing from duplicate lists twice
* `get_marginals`: exit early if no peaks found to avoid spurious overlap mask
* `get_smallest_skew`: after shifting search range of rotation angle, use overall best result
* Dockerfile: fix CUDA installation (cuDNN contested between Torch and TF due to extra OCR)
* OCR: re-instate missing methods and fix `utils_ocr` function calls
* mbreorder/enhancement CLIs: missing imports
* :fire: writer: `SeparatorRegion` needs `SeparatorRegionType` (not `ImageRegionType`)
f458e3e
* tests: switch from `pytest-subtests` to `parametrize` so we can use `pytest-isolate`
(so CUDA memory gets freed between tests if running on GPU)

Added:
* :fire: `layout` CLI: new option `--model_version` to override default choices
* test coverage for OCR options in `layout`
* test coverage for table detection in `layout`
* CI linting with ruff

Changed:

* polygons: slightly widen for regions and lines, increase for separators
* various refactorings, some code style and identifier improvements
* deskewing/multiprocessing: switch back to ProcessPoolExecutor (faster),
but use shared memory if necessary, and switch back from `loky` to stdlib,
and shutdown in `del()` instead of `atexit`
* :fire: OCR: switch CNN-RNN model to `20250930` version compatible with TF 2.12 on CPU, too
* OCR: allow running `-tr` without `-fl`, too
* :fire: writer: use `@type='heading'` instead of `'header'` for headings
* :fire: performance gains via refactoring (simplification, less copy-code, vectorization,
avoiding unused calculations, avoiding unnecessary 3-channel image operations)
* :fire: heuristic reading order detection: many improvements
- contour vs splitter box matching:
* contour must be contained in box exactly instead of heuristics
* make fallback center matching, center must be contained in box
- original vs deskewed contour matching:
* same min-area filter on both sides
* similar area score in addition to center proximity
* avoid duplicate and missing mappings by allowing N:M
matches and splitting+joining where necessary
* CI: update+improve model caching
Contributor
Thanks, @vahidrezanezhad something along those lines would be very helpful for #186



## [0.5.0] - 2025-09-26

Fixed:
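The changelog's deskewing fix ("reset deskewing angle to 0° when text covers <30% image area and detected angle >45°") is a simple guard rule. The sketch below illustrates it in isolation; the function name and signature are hypothetical, and comparing the angle by magnitude (so that -60° also triggers the reset) is an assumption, since the changelog entry only says "angle >45°".

```python
def maybe_reset_deskew_angle(angle_deg: float, text_area: float, image_area: float) -> float:
    """Hypothetical guard: discard implausible skew estimates on sparse pages.

    Returns 0.0 when text covers less than 30% of the image and the detected
    angle exceeds 45 degrees (compared by magnitude, an assumption);
    otherwise the detected angle is returned unchanged.
    """
    text_ratio = text_area / image_area if image_area > 0 else 0.0
    if text_ratio < 0.30 and abs(angle_deg) > 45:
        return 0.0
    return angle_deg
```

The rationale is that with little text on the page, the skew estimate rests on few lines, so a steep angle is more likely an artifact than a real rotation.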
2 changes: 2 additions & 0 deletions Dockerfile
@@ -40,6 +40,8 @@ RUN ocrd ocrd-tool ocrd-tool.json dump-tools > $(dirname $(ocrd bashlib filename
RUN ocrd ocrd-tool ocrd-tool.json dump-module-dirs > $(dirname $(ocrd bashlib filename))/ocrd-all-module-dir.json
# install everything and reduce image size
RUN make install EXTRAS=OCR && rm -rf /build/eynollah
# fixup for broken cuDNN installation (Torch pulls in 8.5.0, which is incompatible with Tensorflow)
RUN pip install nvidia-cudnn-cu11==8.6.0.163
# smoke test
RUN eynollah --help

72 changes: 42 additions & 30 deletions Makefile
@@ -13,12 +13,18 @@ DOCKER ?= docker
#SEG_MODEL := https://github.com/qurator-spk/eynollah/releases/download/v0.3.0/models_eynollah.tar.gz
#SEG_MODEL := https://github.com/qurator-spk/eynollah/releases/download/v0.3.1/models_eynollah.tar.gz
SEG_MODEL := https://zenodo.org/records/17194824/files/models_layout_v0_5_0.tar.gz?download=1
SEG_MODELFILE = $(notdir $(patsubst %?download=1,%,$(SEG_MODEL)))
SEG_MODELNAME = $(SEG_MODELFILE:%.tar.gz=%)

BIN_MODEL := https://github.com/qurator-spk/sbb_binarization/releases/download/v0.0.11/saved_model_2021_03_09.zip
Contributor
TODO for me: Replace with the zenodo location.

BIN_MODELFILE = $(notdir $(BIN_MODEL))
BIN_MODELNAME := default-2021-03-09

OCR_MODEL := https://zenodo.org/records/17194824/files/models_ocr_v0_5_0.tar.gz?download=1
OCR_MODEL := https://zenodo.org/records/17236998/files/models_ocr_v0_5_1.tar.gz?download=1
OCR_MODELFILE = $(notdir $(patsubst %?download=1,%,$(OCR_MODEL)))
OCR_MODELNAME = $(OCR_MODELFILE:%.tar.gz=%)

PYTEST_ARGS ?= -vv
PYTEST_ARGS ?= -vv --isolate

# BEGIN-EVAL makefile-parser --make-help Makefile

@@ -31,7 +37,8 @@ help:
@echo " install Install package with pip"
@echo " install-dev Install editable with pip"
@echo " deps-test Install test dependencies with pip"
@echo " models Download and extract models to $(CURDIR)/models_layout_v0_5_0"
@echo " models Download and extract models to $(CURDIR):"
@echo " $(BIN_MODELNAME) $(SEG_MODELNAME) $(OCR_MODELNAME)"
@echo " smoke-test Run simple CLI check"
@echo " ocrd-test Run OCR-D CLI check"
@echo " test Run unit tests"
@@ -42,33 +49,32 @@ help:
@echo " PYTEST_ARGS pytest args for 'test' (Set to '-s' to see log output during test execution, '-vv' to see individual tests. [$(PYTEST_ARGS)]"
@echo " SEG_MODEL URL of 'models' archive to download for segmentation 'test' [$(SEG_MODEL)]"
@echo " BIN_MODEL URL of 'models' archive to download for binarization 'test' [$(BIN_MODEL)]"
@echo " OCR_MODEL URL of 'models' archive to download for binarization 'test' [$(OCR_MODEL)]"
@echo ""

# END-EVAL


# Download and extract models to $(PWD)/models_layout_v0_5_0
models: models_layout_v0_5_0 models_ocr_v0_5_0 default-2021-03-09
models: $(BIN_MODELNAME) $(SEG_MODELNAME) $(OCR_MODELNAME)

models_layout_v0_5_0: models_layout_v0_5_0.tar.gz
tar zxf models_layout_v0_5_0.tar.gz
# do not download these files if we already have the directories
.INTERMEDIATE: $(BIN_MODELFILE) $(SEG_MODELFILE) $(OCR_MODELFILE)

models_layout_v0_5_0.tar.gz:
$(BIN_MODELFILE):
wget -O $@ $(BIN_MODEL)
$(SEG_MODELFILE):
wget -O $@ $(SEG_MODEL)

models_ocr_v0_5_0: models_ocr_v0_5_0.tar.gz
tar zxf models_ocr_v0_5_0.tar.gz

models_ocr_v0_5_0.tar.gz:
$(OCR_MODELFILE):
wget -O $@ $(OCR_MODEL)

default-2021-03-09: $(notdir $(BIN_MODEL))
unzip $(notdir $(BIN_MODEL))
$(BIN_MODELNAME): $(BIN_MODELFILE)
mkdir $@
mv $(basename $(notdir $(BIN_MODEL))) $@

$(notdir $(BIN_MODEL)):
wget $(BIN_MODEL)
unzip -d $@ $<
$(SEG_MODELNAME): $(SEG_MODELFILE)
tar zxf $<
$(OCR_MODELNAME): $(OCR_MODELFILE)
tar zxf $<

build:
$(PIP) install build
@@ -82,28 +88,34 @@ install:
install-dev:
$(PIP) install -e .$(and $(EXTRAS),[$(EXTRAS)])

deps-test: models_layout_v0_5_0
ifeq (OCR,$(findstring OCR, $(EXTRAS)))
deps-test: $(OCR_MODELNAME)
endif
deps-test: $(BIN_MODELNAME) $(SEG_MODELNAME)
$(PIP) install -r requirements-test.txt
ifeq (OCR,$(findstring OCR, $(EXTRAS)))
ln -rs $(OCR_MODELNAME)/* $(SEG_MODELNAME)/
endif

smoke-test: TMPDIR != mktemp -d
smoke-test: tests/resources/kant_aufklaerung_1784_0020.tif
# layout analysis:
eynollah layout -i $< -o $(TMPDIR) -m $(CURDIR)/models_layout_v0_5_0
eynollah layout -i $< -o $(TMPDIR) -m $(CURDIR)/$(SEG_MODELNAME)
fgrep -q http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 $(TMPDIR)/$(basename $(<F)).xml
fgrep -c -e TextRegion -e ImageRegion -e SeparatorRegion $(TMPDIR)/$(basename $(<F)).xml
# layout, directory mode (skip one, add one):
eynollah layout -di $(<D) -o $(TMPDIR) -m $(CURDIR)/models_layout_v0_5_0
eynollah layout -di $(<D) -o $(TMPDIR) -m $(CURDIR)/$(SEG_MODELNAME)
test -s $(TMPDIR)/euler_rechenkunst01_1738_0025.xml
# mbreorder, directory mode (overwrite):
eynollah machine-based-reading-order -di $(<D) -o $(TMPDIR) -m $(CURDIR)/models_layout_v0_5_0
eynollah machine-based-reading-order -di $(<D) -o $(TMPDIR) -m $(CURDIR)/$(SEG_MODELNAME)
fgrep -q http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 $(TMPDIR)/$(basename $(<F)).xml
fgrep -c -e RegionRefIndexed $(TMPDIR)/$(basename $(<F)).xml
# binarize:
eynollah binarization -m $(CURDIR)/default-2021-03-09 -i $< -o $(TMPDIR)/$(<F)
eynollah binarization -m $(CURDIR)/$(BIN_MODELNAME) -i $< -o $(TMPDIR)/$(<F)
test -s $(TMPDIR)/$(<F)
@set -x; test "$$(identify -format '%w %h' $<)" = "$$(identify -format '%w %h' $(TMPDIR)/$(<F))"
# enhance:
eynollah enhancement -m $(CURDIR)/models_layout_v0_5_0 -sos -i $< -o $(TMPDIR) -O
eynollah enhancement -m $(CURDIR)/$(SEG_MODELNAME) -sos -i $< -o $(TMPDIR) -O
test -s $(TMPDIR)/$(<F)
@set -x; test "$$(identify -format '%w %h' $<)" = "$$(identify -format '%w %h' $(TMPDIR)/$(<F))"
$(RM) -r $(TMPDIR)
@@ -114,18 +126,18 @@ ocrd-test: tests/resources/kant_aufklaerung_1784_0020.tif
cp $< $(TMPDIR)
ocrd workspace -d $(TMPDIR) init
ocrd workspace -d $(TMPDIR) add -G OCR-D-IMG -g PHYS_0020 -i OCR-D-IMG_0020 $(<F)
ocrd-eynollah-segment -w $(TMPDIR) -I OCR-D-IMG -O OCR-D-SEG -P models $(CURDIR)/models_layout_v0_5_0
ocrd-eynollah-segment -w $(TMPDIR) -I OCR-D-IMG -O OCR-D-SEG -P models $(CURDIR)/$(SEG_MODELNAME)
result=$$(ocrd workspace -d $(TMPDIR) find -G OCR-D-SEG); \
fgrep -q http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 $(TMPDIR)/$$result && \
fgrep -c -e TextRegion -e ImageRegion -e SeparatorRegion $(TMPDIR)/$$result
ocrd-sbb-binarize -w $(TMPDIR) -I OCR-D-IMG -O OCR-D-BIN -P model $(CURDIR)/default-2021-03-09
ocrd-sbb-binarize -w $(TMPDIR) -I OCR-D-SEG -O OCR-D-SEG-BIN -P model $(CURDIR)/default-2021-03-09 -P operation_level region
ocrd-sbb-binarize -w $(TMPDIR) -I OCR-D-IMG -O OCR-D-BIN -P model $(CURDIR)/$(BIN_MODELNAME)
ocrd-sbb-binarize -w $(TMPDIR) -I OCR-D-SEG -O OCR-D-SEG-BIN -P model $(CURDIR)/$(BIN_MODELNAME) -P operation_level region
$(RM) -r $(TMPDIR)

# Run unit tests
test: export MODELS_LAYOUT=$(CURDIR)/models_layout_v0_5_0
test: export MODELS_OCR=$(CURDIR)/models_ocr_v0_5_0
test: export MODELS_BIN=$(CURDIR)/default-2021-03-09
test: export MODELS_LAYOUT=$(CURDIR)/$(SEG_MODELNAME)
test: export MODELS_OCR=$(CURDIR)/$(OCR_MODELNAME)
test: export MODELS_BIN=$(CURDIR)/$(BIN_MODELNAME)
test:
$(PYTHON) -m pytest tests --durations=0 --continue-on-collection-errors $(PYTEST_ARGS)

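The refactored Makefile derives each archive filename and extracted directory name from the download URL instead of hard-coding them: `$(patsubst %?download=1,%,…)` strips the Zenodo query suffix, `$(notdir …)` keeps the last path component, and the substitution reference `$(…:%.tar.gz=%)` drops the archive extension. The following Python sketch (the function name is hypothetical and not part of the build) mirrors those three steps so the resulting names can be checked outside of `make`.

```python
import posixpath
from typing import Tuple


def model_file_and_name(url: str, suffix: str = "?download=1", ext: str = ".tar.gz") -> Tuple[str, str]:
    """Mirror the Makefile's name derivation for a single URL (illustrative only)."""
    # $(patsubst %?download=1,%,URL): strip the query suffix if the word ends with it
    if url.endswith(suffix):
        url = url[: -len(suffix)]
    # $(notdir ...): keep everything after the last slash
    filename = posixpath.basename(url)
    # $(FILE:%.tar.gz=%): drop the extension if present, otherwise leave unchanged
    name = filename[: -len(ext)] if filename.endswith(ext) else filename
    return filename, name
```

For the segmentation URL this yields `models_layout_v0_5_0.tar.gz` and `models_layout_v0_5_0`. The binarization model only goes through `notdir` in the Makefile and gets the fixed name `default-2021-03-09`, which is why `BIN_MODELNAME` is assigned directly there.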
18 changes: 18 additions & 0 deletions pyproject.toml
@@ -51,3 +51,21 @@ where = ["src"]
[tool.coverage.run]
branch = true
source = ["eynollah"]

[tool.ruff]
line-length = 120

[tool.ruff.lint]
ignore = [
# disable unused imports
"F401",
# disable import order
"E402",
# disable unused variables
"F841",
# disable bare except
"E722",
]

[tool.ruff.format]
quote-style = "preserve"
2 changes: 1 addition & 1 deletion requirements-test.txt
@@ -1,4 +1,4 @@
pytest
pytest-subtests
pytest-isolate
coverage[toml]
black
1 change: 0 additions & 1 deletion requirements.txt
@@ -5,5 +5,4 @@ scikit-learn >= 0.23.2
tensorflow < 2.13
numba <= 0.58.1
scikit-image
loky
biopython
10 changes: 9 additions & 1 deletion src/eynollah/cli.py
@@ -202,6 +202,13 @@ def enhancement(image, out, overwrite, dir_in, model, num_col_upper, num_col_low
type=click.Path(exists=True, file_okay=False),
required=True,
)
@click.option(
"--model_version",
"-mv",
help="override default versions of model categories",
type=(str, str),
multiple=True,
)
@click.option(
"--save_images",
"-si",
@@ -373,7 +380,7 @@ def enhancement(image, out, overwrite, dir_in, model, num_col_upper, num_col_low
help="Setup a basic console logger",
)

def layout(image, out, overwrite, dir_in, model, save_images, save_layout, save_deskewed, save_all, extract_only_images, save_page, enable_plotting, allow_enhancement, curved_line, textline_light, full_layout, tables, right2left, input_binary, allow_scaling, headers_off, light_version, reading_order_machine_based, do_ocr, transformer_ocr, batch_size_ocr, num_col_upper, num_col_lower, threshold_art_class_textline, threshold_art_class_layout, skip_layout_and_reading_order, ignore_page_extraction, log_level, setup_logging):
def layout(image, out, overwrite, dir_in, model, model_version, save_images, save_layout, save_deskewed, save_all, extract_only_images, save_page, enable_plotting, allow_enhancement, curved_line, textline_light, full_layout, tables, right2left, input_binary, allow_scaling, headers_off, light_version, reading_order_machine_based, do_ocr, transformer_ocr, batch_size_ocr, num_col_upper, num_col_lower, threshold_art_class_textline, threshold_art_class_layout, skip_layout_and_reading_order, ignore_page_extraction, log_level, setup_logging):
if setup_logging:
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setLevel(logging.INFO)
@@ -404,6 +411,7 @@ def layout(image, out, overwrite, dir_in, model, save_images, save_layout, save_
assert bool(image) != bool(dir_in), "Either -i (single input) or -di (directory) must be provided, but not both."
eynollah = Eynollah(
model,
model_versions=model_version,
extract_only_images=extract_only_images,
enable_plotting=enable_plotting,
allow_enhancement=allow_enhancement,
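The new `--model_version`/`-mv` option is declared with `type=(str, str)` and `multiple=True`, so click collects each occurrence as a `(category, version)` pair and passes a tuple of pairs, for example `(("ocr", "20250930"),)`, which `layout` forwards unchanged as `model_versions=` to `Eynollah`. How `Eynollah` consumes these pairs is not shown in this diff; the following is a minimal sketch of one plausible way to apply them over defaults. The category names, default version strings and the function name are illustrative placeholders, and rejecting unknown categories is a design assumption.

```python
from typing import Dict, Iterable, Tuple

# Illustrative placeholder defaults; the real model categories and version
# strings used by Eynollah are not shown in this diff.
DEFAULT_MODEL_VERSIONS: Dict[str, str] = {
    "layout": "default",
    "ocr": "default",
}


def resolve_model_versions(
    overrides: Iterable[Tuple[str, str]],
    defaults: Dict[str, str] = DEFAULT_MODEL_VERSIONS,
) -> Dict[str, str]:
    """Apply (category, version) pairs, as collected by a click option with
    type=(str, str) and multiple=True, on top of a copy of the defaults."""
    versions = dict(defaults)  # copy, so the shared defaults are never mutated
    for category, version in overrides:
        if category not in versions:
            # assumption: fail early on typos instead of silently ignoring them
            raise ValueError(f"unknown model category: {category!r}")
        versions[category] = version  # a later -mv for the same category wins
    return versions
```

Passing no `-mv` at all gives an empty tuple, in which case the sketch simply returns a copy of the defaults.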