Blood-Tests-Extractor

This tool extracts blood test results from PDF or image files.

It provides an HTTP service that accepts a PDF via a POST request and returns a JSON response containing detailed information for each analysis result.

In addition to its HTTP service, it can be used locally for experimentation. In this mode, the tool processes a PDF file and generates an HTML file displaying the extracted tables and debugging information.

The extraction of tables from PDFs or images is powered by the img2table library.

This service is used by Blood-Tests-App, a web application for digitizing, managing, and comparing collections of blood test results.

Example

Given a PDF containing a table of blood test results, the HTTP request

    curl -F file=@examples/input/checkup-2025-01-15.pdf http://localhost:8000/blood-test-pdf | jq

returns the JSON response:

[
  {
    "name": "GLOBULI BIANCHI",
    "value": 6.73,
    "unit": "x10^3/μl",
    "reference_lower": 4,
    "reference_upper": 9.5
  },
  {
    "name": "GLOBULI ROSSI",
    "value": 7.22,
    "unit": "x10^6/μl",
    "reference_lower": 4.7,
    "reference_upper": 5.82
  },
  {
    "name": "COLESTEROLO HDL",
    "value": 63,
    "unit": "mg/dl",
    "reference_lower": 40,
    "reference_upper": null
  }
]
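
A client can compare each value with its reference range. The sketch below is not part of this project: the helper name `is_within_reference` is hypothetical, and it assumes that a `null` bound means that side of the range is open, as for COLESTEROLO HDL in the response above.

```python
import json


def is_within_reference(result: dict) -> bool:
    """Return True when the value lies inside the reference range.

    Assumption: a bound of None (JSON null) means that side is unbounded.
    """
    value = result["value"]
    lower = result.get("reference_lower")
    upper = result.get("reference_upper")
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


# The response shown above, as it would arrive from the service.
response_text = """
[
  {"name": "GLOBULI BIANCHI", "value": 6.73, "unit": "x10^3/μl",
   "reference_lower": 4, "reference_upper": 9.5},
  {"name": "GLOBULI ROSSI", "value": 7.22, "unit": "x10^6/μl",
   "reference_lower": 4.7, "reference_upper": 5.82},
  {"name": "COLESTEROLO HDL", "value": 63, "unit": "mg/dl",
   "reference_lower": 40, "reference_upper": null}
]
"""

results = json.loads(response_text)
out_of_range = [r["name"] for r in results if not is_within_reference(r)]
print(out_of_range)  # ['GLOBULI ROSSI']
```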

From the command line, running

    python -m src.command_line examples/input/checkup-2025-01-15.pdf

generates an HTML file displaying the extracted tables and debugging information.

Development

Install the dependencies with:

    poetry install --with development

To extract the analysis tables from a PDF:

1. Copy a PDF of a blood test to examples/input.
2. Run source .venv/bin/activate to use the Poetry virtual environment.
3. Run python -m src.command_line examples/input/checkup-123.pdf, replacing checkup-123.pdf with the name of your file. An example PDF is available at tests/data/fake-blood-check.pdf.
4. Look at the generated HTML file(s) in examples/output.

Run the tests and coverage with:

    coverage run -m unittest

Generate the coverage HTML report with:

    coverage html

Run the HTTP server with:

    uvicorn src.http_api.main:app --reload
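
Once the server is running, it can also be called from Python instead of curl. The following is a minimal sketch using only the standard library, not code from this repository. The function names are hypothetical; the form field name `file`, the port 8000, and the `/blood-test-pdf` path are taken from the curl example in the Example section.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, content: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def extract(pdf_path: str, url: str = "http://localhost:8000/blood-test-pdf") -> list:
    """POST the PDF to a running server and return the decoded JSON."""
    path = Path(pdf_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server started as above, `extract("tests/data/fake-blood-check.pdf")` should return the list of results as Python dictionaries.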

The code is formatted using black. Either configure the IDE to use it or run black src/ tests/.

TODO list

- Detect the language of the document first, so that analysis names, decimal numbers, and units of measure can be recognized more accurately.
- Recognize the type of each column through machine learning instead of using the fixed AnalysisTable.CONFIDENCE_THRESHOLD.
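
To illustrate why language detection matters for decimal numbers: the same digits can denote different values depending on the decimal separator of the report's language. The sketch below is hypothetical and not taken from this repository; the separator table and the function name `parse_decimal` are assumptions made for illustration.

```python
# Hypothetical mapping from language code to decimal separator;
# a real table would cover more languages.
DECIMAL_SEPARATOR = {"it": ",", "en": "."}


def parse_decimal(text: str, language: str) -> float:
    """Parse a number written with the conventions of the given language."""
    decimal_sep = DECIMAL_SEPARATOR[language]
    # Assumption: the thousands separator is the other of "." and ",".
    thousands_sep = "." if decimal_sep == "," else ","
    cleaned = text.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


print(parse_decimal("6,73", "it"))   # 6.73
print(parse_decimal("1.250", "it"))  # 1250.0
print(parse_decimal("1.250", "en"))  # 1.25
```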
