LLMOCR uses a local LLM to read text from images.
You can also change the instruction so the LLM uses the image in whatever way you prompt.
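Under the hood, the GUI talks to a KoboldCpp instance running on localhost. As a rough sketch of what a custom instruction looks like at the API level, the snippet below builds a request body with a base64-encoded image and a free-form prompt. The `/api/v1/generate` endpoint and the `images` field are assumptions based on KoboldCpp's multimodal generate API; the parameter values are illustrative, not the project's actual settings.

```python
import base64
import json
import urllib.request


def build_ocr_payload(image_path, instruction="Extract all text from this image."):
    """Build a KoboldCpp-style generate request with a base64 image.

    The 'images' field follows KoboldCpp's multimodal generate API
    (assumed here); the instruction string is free-form, which is how
    you change what the LLM does with the image.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "prompt": instruction,
        "images": [image_b64],
        "max_length": 512,
        "temperature": 0.1,  # low temperature keeps OCR output close to deterministic
    }


def run_ocr(image_path, url="http://localhost:5001/api/v1/generate"):
    """POST the payload to a local KoboldCpp instance and return the generated text."""
    data = json.dumps(build_ocr_payload(image_path)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["results"][0]["text"]
```

For example, `run_ocr("receipt.png")` would return the model's transcription once KoboldCpp is running locally with a vision-capable model loaded.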
- Local Processing: All processing is done locally on your machine.
- User-Friendly GUI: Includes a simple GUI. All AI functionality is provided by KoboldCpp, a single executable.
- GPU Acceleration: Uses Apple Metal, NVIDIA CUDA, or AMD (Vulkan) hardware, if available, to greatly speed up inference.
- Cross-Platform: Supports Windows, macOS ARM, and Linux.
- Python 3.8 or higher
- KoboldCpp
- Clone the repository or download the ZIP file and extract it.
- Install Python for Windows.
- Download `KoboldCPP.exe` and place it in the LLMOCR folder. If the file has a different name, rename it to `KoboldCPP.exe`.
- If you want the script to download a model and run it with KoboldCpp for you, open `llm_ocr.bat`.
- If you want to load your own model with KoboldCpp, open `llm_ocr_no_kobold.bat`.
-
- Clone the repository or download and extract the ZIP file.
- Install Python 3.8 or higher if it is not already installed.
- Create a new Python environment and install the packages from `requirements.txt`.
- Run KoboldCpp with the flag `--config llm-ocr.kcppt`.
- Wait until the model weights finish downloading and the terminal window says `Please connect to custom endpoint at http://localhost:5001`.
- Run `llm-ocr-gui.py` using Python.
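If you script these steps, the wait for the "Please connect to custom endpoint" message can be replaced by polling the endpoint itself. The helper below is a minimal sketch, assuming KoboldCpp answers plain HTTP requests on port 5001 once the model is loaded; the function names and retry parameters are illustrative, not part of the project.

```python
import time
import urllib.error
import urllib.request


def endpoint_ready(url="http://localhost:5001", timeout=2.0):
    """Return True if the KoboldCpp endpoint answers an HTTP request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the server is not up yet.
        return False


def wait_for_endpoint(url="http://localhost:5001", retries=60, delay=5.0):
    """Poll until KoboldCpp has finished loading the model weights."""
    for _ in range(retries):
        if endpoint_ready(url):
            return True
        time.sleep(delay)
    return False
```

A launcher script could call `wait_for_endpoint()` after starting KoboldCpp and only then start `llm-ocr-gui.py`.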
This project is licensed under the MIT License - see the LICENSE file for details.