This repository contains the source code and resources used in a tutorial video hosted on YouTube (in Spanish), demonstrating how to extract information from PDF invoices using a Large Language Model (LLM) that runs locally. The tutorial provides hands-on experience with setting up a local LLM server, processing PDFs, and extracting data efficiently using R.
In this tutorial, you'll learn the following:
- Introduction to LM Studio: Get familiar with LM Studio, a tool that allows you to run language models locally.
- Installing a Local Model: Steps to download and install a local language model suitable for information extraction tasks.
- Running a Local Server: Set up and launch a local server that can handle API requests for text processing.
- Processing PDF Invoices: Convert PDF invoices into text and use the local language model to extract relevant information.
- Storing Extracted Data: Save the extracted data in either CSV or Excel formats for further analysis.
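The whole pipeline can be sketched in a few lines of R. This is a minimal illustration, not the tutorial's exact code: it assumes LM Studio's local server is running on its default port (1234) with its OpenAI-compatible chat endpoint, that an `invoice.pdf` file exists in the working directory, and that the `pdftools`, `httr`, and `jsonlite` packages are installed. The model name is a placeholder.

```r
library(pdftools)
library(httr)
library(jsonlite)

# 1. Convert the PDF invoice to plain text (one string per page).
invoice_text <- paste(pdf_text("invoice.pdf"), collapse = "\n")

# 2. Ask the local model to extract the fields of interest as JSON.
prompt <- paste(
  "Extract the invoice number, date, and total amount from the",
  "following invoice. Reply with a single JSON object only.",
  invoice_text
)

resp <- POST(
  "http://localhost:1234/v1/chat/completions",
  body = toJSON(list(
    model = "local-model",  # placeholder; LM Studio uses the loaded model
    messages = list(list(role = "user", content = prompt)),
    temperature = 0
  ), auto_unbox = TRUE),
  content_type_json()
)

# 3. Parse the model's reply (may need cleanup if the model adds
#    markdown fences around the JSON) and save it as CSV.
answer <- content(resp)$choices[[1]]$message$content
fields <- fromJSON(answer)
write.csv(as.data.frame(fields), "invoices.csv", row.names = FALSE)
```

In practice you would loop this over a folder of PDFs and bind the rows into one data frame before writing the CSV or Excel file.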
Why run the model locally?
- Cost-Effective: Running models locally eliminates per-request cloud API costs.
- Privacy: Your data stays on your machine, ensuring confidentiality.
- Offline Capability: Once set up, the system can operate without an internet connection.
To follow along, you will need:
- Hardware: A machine with enough computing power (RAM, and ideally a GPU) to run an LLM locally.
- Software:
  - R programming language
  - PDF processing libraries (e.g., the `pdftools` package)
  - LM Studio
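The R-side dependencies can be installed in one step. The package names below are an assumption about what a pipeline like this typically needs; substitute whichever packages the video actually uses.

```r
# PDF text extraction, HTTP requests, JSON parsing, and Excel export.
install.packages(c("pdftools", "httr", "jsonlite", "writexl"))
```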
If you run into problems, check the following:
- Model Loading Issues: Ensure that the model is correctly installed and loaded in LM Studio.
- Server Connectivity: Verify that the local server is running and accessible.
- Dependencies: Make sure all required R packages are installed.
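A quick way to verify server connectivity from R, assuming LM Studio's default address (adjust the port if you changed it in LM Studio's server settings):

```r
library(httr)

# The OpenAI-compatible /v1/models endpoint lists the loaded models;
# a connection error here means the local server is not running.
resp <- tryCatch(GET("http://localhost:1234/v1/models"), error = identity)
if (inherits(resp, "error")) {
  message("Server not reachable - is LM Studio's local server started?")
} else {
  message("Server responded with HTTP status ", status_code(resp))
}
```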
Feel free to fork this repository and submit pull requests. Contributions are welcome, especially in improving the extraction accuracy and expanding functionality.
This project is licensed under the MIT License - see the LICENSE file for details.
For any questions or feedback, please contact:
Dr. José Manuel Galán Ordax