This project is a Flask-based web application that extracts structured data from the service information pages of the Flemish government. It uses LangChain together with the `OllamaLLM` class from `langchain-ollama`, which sends the input text to a language model served by Ollama, to extract relevant information such as the cost of a service and the organization responsible for it.
To run this application, you'll need to have the following dependencies installed:
- Python 3.7 or later
- Flask
- Pydantic
- LangChain
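- langchain-ollama (provides the `OllamaLLM` class)
- A running Ollama server with the model used by the application already pulled
You can install the Python dependencies using pip:
```bash
pip install flask pydantic langchain langchain-ollama
```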
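To confirm the installation, the following stdlib-only sketch checks that each required package can be found on the current Python path. `missing_modules` and `REQUIRED_MODULES` are hypothetical names introduced for illustration; the script checks only the Python packages, not whether an Ollama server is running.

```python
"""Check that the Python packages this project needs are importable (stdlib only)."""
import importlib.util

# Import names of the packages installed by the pip command above.
# Note: the pip package "langchain-ollama" is imported as "langchain_ollama".
REQUIRED_MODULES = ("flask", "pydantic", "langchain", "langchain_ollama")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found on the current Python path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All dependencies found." if not missing else "Missing: " + ", ".join(missing))
```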
The application provides two endpoints:
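`/extract_cost/`: This endpoint accepts a POST request with a JSON body containing an `input_text` field and returns a JSON response with the extracted cost of the service, both as a number (`cost`) and as written in the text (`cost_string`).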
Example usage:
```bash
curl -X POST -H "Content-Type: application/json" -d '{"input_text": "Twintig euro te betalen bij de aanvraag of bij het afhalen wanneer je het voorlopig rijbewijs online hebt aangevraagd."}' http://localhost:8080/extract_cost/
```
Response:
```json
{
  "cost": 20.0,
  "cost_string": "twintig euro"
}
```
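The same request can be sent from Python. This is a minimal sketch using only the standard library; `build_request`, `extract`, and the 120-second timeout are illustrative assumptions (LLM inference on a local machine can be slow), while the URL and JSON body shape come from the curl example above.

```python
"""Minimal Python client for the extraction endpoints, equivalent to the curl call."""
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # host and port taken from the curl example


def build_request(endpoint, input_text, base_url=BASE_URL):
    """Build a POST request whose JSON body has the form {"input_text": ...}."""
    body = json.dumps({"input_text": input_text}).encode("utf-8")
    return urllib.request.Request(
        base_url + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract(endpoint, input_text, base_url=BASE_URL, timeout=120.0):
    """Send the request and decode the JSON response (requires the app to be running)."""
    request = build_request(endpoint, input_text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the application running, `extract("/extract_cost/", text)` should return a dictionary shaped like the response above; the exact values depend on the model.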
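`/extract_organisation/`: This endpoint accepts a POST request with a JSON body containing an `input_text` field and returns a JSON response with the list of organizations that appear in the text (`organisations_list`) and the passage in which they were found (`organisations_list_string`).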
Example usage:
```bash
curl -X POST -H "Content-Type: application/json" -d '{"input_text": "Je hebt, als je voeding wilt verkopen, een registratie, erkenning of toelating van het Federaal Agentschap voor de Veiligheid van de Voedselketen (FAVV)."}' http://localhost:8080/extract_organisation/
```
Response:
```json
{
  "organisations_list": [
    "Federaal Agentschap voor de Veiligheid van de Voedselketen",
    "FAVV"
  ],
  "organisations_list_string": ["Je hebt, als je voeding wilt verkopen, een registratie, erkenning of toelating van het Federaal Agentschap voor de Veiligheid van de Voedselketen (FAVV)."]
}
```
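Because the organization list is produced by a language model, a client may want to keep only names that actually occur in the input text. The sketch below uses a hypothetical helper, `grounded_organisations`, that performs a case-insensitive substring check and drops duplicates. This is an optional client-side step, not something the application is documented to do, and plain substring matching can also accept a short name that only appears inside a longer word.

```python
"""Keep only extracted organization names that literally occur in the input text."""
from typing import List


def grounded_organisations(organisations: List[str], input_text: str) -> List[str]:
    """Return names found in `input_text` (case-insensitive), in order, without duplicates."""
    haystack = input_text.casefold()
    seen = set()
    kept = []
    for name in organisations:
        key = name.strip().casefold()
        # Skip empty names, names absent from the text, and repeats of a kept name.
        if key and key in haystack and key not in seen:
            seen.add(key)
            kept.append(name.strip())
    return kept
```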
The code is organized as follows:
- `web.py`: contains the Flask application and the two endpoints for extracting cost and organization information.
- `InputText`: a Pydantic model that defines the input data structure for the endpoints.
- `CostExtractor` and `OrganisationExtractor`: Pydantic models that define the output data structure for the respective endpoints.
- `shared_model`: an instance of `OllamaLLM`, the LangChain wrapper used to send the input text to the model served by Ollama.
- `system_prompt`: a `PromptTemplate` object that defines the prompt used to extract the desired information from the input text.
- `cost_parser` and `organisation_parser`: `PydanticOutputParser` objects that parse the model's output and convert it into the corresponding Pydantic model.
- `chain`: a LangChain "chain" that combines the prompt, model, and output parser to perform the extraction task.
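- Performance Optimization: The application reuses a single `OllamaLLM` instance (`shared_model`). `OllamaLLM` is a client for the Ollama server rather than a model itself, so extraction speed depends mainly on the model Ollama serves and the hardware it runs on.
- Security Considerations: Pydantic models validate the request body and the parsed model output, so malformed requests and model outputs that do not match the expected schema are rejected. This validation improves robustness but is not a complete security measure on its own.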
If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request on the project's GitHub repository.
This project is licensed under the MIT License.