A minuscule terminology service, powered by LLMs, designed to extract, validate, and enrich definitions for a user-defined topic, sourced from Wikipedia. Neatly packaged for developers and AI enthusiasts.
AI-powered web service (built with FastAPI) that creates and manages a dictionary of financial terms. It automatically fetches definitions (primarily from Wikipedia, using user-defined, topic-focused heuristics), uses Large Language Models (LLMs, via litellm and instructor) to generate explanatory follow-up questions and validate definition accuracy, and can extract financial terms from text. New entries undergo a candidate review process before being added to the official terminus, ensuring quality control. The system uses SQLAlchemy for database persistence (defaulting to SQLite) and provides Docker support for easy deployment.
Quick Start Commands
- Run Locally (using Uvicorn): (Ensure you have created a .env file with necessary configurations, like LLM API keys)
```bash
# Set up environment variables
export $(cat .env | xargs)

# Install dependencies (using uv is recommended)
uv sync

# Run the FastAPI application
uvicorn terminus.app:app --host 0.0.0.0 --port 8000 --reload
```
NB: Alternatively, use a .env loader (like python-dotenv) or export variables manually, depending on your shell.
- Run with Docker Compose: (Ensure you have created a .env file in the project root)
```bash
# Build and start the service defined in docker-compose.yml
docker-compose up --build -d
```
Access the API at http://localhost:8000/docs
Table of Contents:
- AI-Powered Terminology Service
- TL;DR
The terminus project is an asynchronous web service designed to build, manage, and serve a curated dictionary of (financial and economic) terms. It leverages Large Language Models and Wikipedia to automatically generate, refine, and validate definitions and related concepts, ensuring a degree of quality control through a candidate review process.
Core Objectives:
- Provide clear, concise, and factually validated definitions for financial terms.
- Generate contextually relevant follow-up questions to deepen user understanding.
- Identify and extract financial terms from unstructured text.
- Implement a workflow for reviewing and approving automatically generated or externally sourced term definitions before they become part of the official terminus.
- Offer a robust API for programmatic access to the terminus.
Key Features:
- Automated Definition Generation: Utilizes Wikipedia to source initial definitions and LLMs to validate them.
- LLM-Powered Follow-up Generation: Creates insightful follow-up questions based on the definition's content using `instructor` and `litellm`.
- LLM-Based Validation: Uses LLMs to critique the financial relevance of terms and validate the factual accuracy of definitions within a financial context.
- Candidate Workflow: Implements a two-stage system (`candidate_terminus` and `terminus` tables) where new entries are held for review before being promoted to the official, validated terminus.
- Financial Term Extraction: Identifies potential financial terms within a given text block using LLM-based Named Entity Recognition (NER) followed by a critique step.
- Asynchronous API: Built with FastAPI for high-performance, non-blocking I/O operations.
- Database Persistence: Uses SQLAlchemy for ORM and database interaction (defaulting to SQLite).
- Containerized Deployment: Provides a `Dockerfile` and `docker-compose.yml` for easy setup and deployment.
The service operates primarily through API endpoints, orchestrating interactions between the database, LLM services, and the Wikipedia service.
This is the primary user-facing endpoint for retrieving a term's definition. The logic follows a specific hierarchy to ensure quality and efficiency:
- Check Official terminus: The system first queries the `terminus` table (via `TerminusService`) for an existing, validated entry matching the requested `term` (case-insensitive). If found, this validated entry (`terminusAnswer`) is returned directly.
- Check Candidate terminus: If the term is not in the official terminus, the system checks the `candidate_terminus` table (via `CandidateTerminusService`).
  - If a candidate entry exists, its details (`CandidateterminusAnswer`, including status such as "under_review" or "rejected") may be returned (the exact logic for returning candidates versus generating new ones may need refinement based on the desired UX).
- Generate New Candidate (if necessary): If the term is found in neither table, or if regeneration is triggered:
  - Fetch Definition: The `WikipediaService` is queried asynchronously to find the most relevant, topic-focused (e.g., finance, physics) summary for the term. This service employs specific strategies:
    - Searching for `"{term} (user-defined topic)"`.
    - Standard Wikipedia search, prioritizing results containing financial keywords.
    - Handling disambiguation pages by preferring topic-related options.
    - Falling back to a search including a context hint (`finance economics ...`).
  - Generate Follow-ups: The fetched definition (or potentially a user-provided one via `terminusEntryCreate`) is passed to the `FUService` (LLM). This service uses a specific prompt (`FOLLOWUP_SYSTEM_MESSAGE`, `FOLLOWUP_USER_MESSAGE_TEMPLATE`) and the `terminusAnswer` Pydantic model (via `instructor`) to generate a list of `FollowUp` questions based on sub-terms found within the definition.
  - Definition Validation: Includes a `DefinitionValidationService`. This service is intended to be called here, or before saving to candidates, using its specific LLM prompt (`VALIDATION_SYSTEM_MESSAGE`, `VALIDATION_USER_MESSAGE_TEMPLATE`) and the `DefinitionValidationResult` Pydantic model to assess the fetched/generated definition's factual accuracy and assign a confidence score.
  - Save as Candidate: The term, fetched/generated definition, generated follow-ups, and initial status ("under_review") are saved to the `candidate_terminus` table using `CandidateTerminusService`.
  - Return Candidate: The newly created candidate entry details (`CandidateterminusAnswer`) are returned to the user.
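The hierarchy above can be summarized in a short, hedged sketch. The service names come from this README; the method names (`get`, `create`, `fetch_summary`, `generate`, `validate`) are hypothetical stand-ins for the real API:

```python
# Hypothetical orchestration sketch of the lookup hierarchy; method names
# are assumptions, not the project's actual signatures.
async def get_definition(term: str, session):
    # 1. Official terminus first: validated entries are returned directly.
    official = await TerminusService(session).get(term)  # case-insensitive lookup
    if official is not None:
        return official

    # 2. Candidate terminus next: the term may already be under review.
    candidates = CandidateTerminusService(session)
    candidate = await candidates.get(term)
    if candidate is not None:
        return candidate

    # 3. Found in neither table: fetch, enrich, validate, save as candidate.
    definition = await WikipediaService().fetch_summary(term)
    follow_ups = await FUService().generate(term, definition)
    await DefinitionValidationService().validate(term, definition)
    return await candidates.create(
        term=term,
        definition=definition,
        follow_ups=follow_ups,
        status="under_review",  # promoted only via /candidate/validate
    )
```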
This endpoint identifies financial terms within a given block of text:
- Initial Extraction: The input text is passed to the `FinancialTermExtractionService`. An LLM call is made using a prompt focused on extracting potential financial/economic terms, structured according to the `ExtractedTerms` Pydantic model.
- Critique/Validation: Each extracted term is then individually subjected to a second LLM call within the same service (the `_critique_term` method). This step uses a different prompt (`critique_system_message`, `critique_user_message_template`) and the `TermCritique` Pydantic model. The LLM acts as a domain expert to determine whether the term is genuinely relevant to the user-defined topic.
- Return Validated Terms: Only the terms that pass the critique step (i.e., `is_relevant` is true in the `TermCritique` response) are returned to the user as a list of strings.
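For illustration, the two-step extract-then-critique pattern might look like the following sketch using `instructor` over `litellm`. The two Pydantic models mirror the `ExtractedTerms` and `TermCritique` shapes named above; the field names and prompts are assumptions, not the project's actual ones:

```python
# Sketch of LLM-based extraction followed by per-term critique.
# Field names and prompts are illustrative assumptions.
import instructor
from litellm import acompletion
from pydantic import BaseModel

client = instructor.from_litellm(acompletion)
MODEL = "gemini/gemini-2.0-flash"

class ExtractedTerms(BaseModel):
    terms: list[str]

class TermCritique(BaseModel):
    is_relevant: bool
    reasoning: str

async def extract_terms(text: str, topic: str = "finance") -> list[str]:
    # Step 1: broad extraction of candidate terms from the input text.
    extracted = await client.chat.completions.create(
        model=MODEL,
        response_model=ExtractedTerms,
        messages=[
            {"role": "system", "content": f"Extract {topic} terms from the user's text."},
            {"role": "user", "content": text},
        ],
    )
    # Step 2: critique each term individually; keep only the relevant ones.
    validated: list[str] = []
    for term in extracted.terms:
        critique = await client.chat.completions.create(
            model=MODEL,
            response_model=TermCritique,
            messages=[
                {"role": "system", "content": f"You are a {topic} domain expert."},
                {"role": "user", "content": f"Is '{term}' genuinely a {topic} term?"},
            ],
        )
        if critique.is_relevant:
            validated.append(term)
    return validated
```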
These endpoints facilitate the review workflow:
- Get Candidate (`/candidate/{term}`): Retrieves the details of a specific candidate entry (`CandidateterminusAnswer`) from the `candidate_terminus` table.
- Validate Candidate (`/candidate/validate`): This is the crucial human-in-the-loop or automated approval step.
  - Input: Takes a `CandidateValidation` payload (term, approve flag, reason).
  - Logic:
    - If `approve` is `True`:
      - Retrieve the candidate entry data (`get_dict` from `CandidateTerminusService` is used here, likely to detach the object from the session before passing it across services).
      - Save the data (term, definition, follow-ups) to the official `terminus` table using `TerminusService`.
      - Delete the entry from the `candidate_terminus` table using `CandidateTerminusService`.
    - If `approve` is `False`:
      - Update the status of the entry in the `candidate_terminus` table to "rejected", along with the provided `reason`, using `CandidateTerminusService.reject`.
  - Return: Confirmation message.
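The approve/reject branch reduces to a small amount of orchestration. `get_dict` and `reject` are named in this README; `save` and `delete` are hypothetical method names used here for illustration:

```python
# Sketch of the candidate approval/rejection logic; save/delete are
# hypothetical names, get_dict/reject are described in this README.
from pydantic import BaseModel

class CandidateValidation(BaseModel):
    term: str
    approve: bool
    reason: str | None = None

async def validate_candidate(payload: CandidateValidation, session) -> dict:
    candidates = CandidateTerminusService(session)
    if payload.approve:
        # Detach the candidate's data from the session, promote it to the
        # official terminus, then remove it from the candidate table.
        data = await candidates.get_dict(payload.term)
        await TerminusService(session).save(
            term=data["term"],
            definition=data["definition"],
            follow_ups=data["follow_ups"],
        )
        await candidates.delete(payload.term)
        return {"detail": f"'{payload.term}' promoted to the official terminus."}
    # Rejection keeps the row, recording the status and the reviewer's reason.
    await candidates.reject(payload.term, reason=payload.reason)
    return {"detail": f"'{payload.term}' rejected."}
```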
- Web Framework: FastAPI (for asynchronous API development)
- Data Validation/Serialization: Pydantic (used extensively for API models, LLM response structures, and settings)
- Database ORM: SQLAlchemy (for defining models and interacting with the database)
- Database Driver (Default): `aiosqlite` (for async SQLite access)
- LLM Interaction:
  - `instructor`: For reliable structured output (Pydantic models) from LLMs.
  - `litellm`: To interact with various LLM providers (e.g., Gemini via `gemini/gemini-2.0-flash`) through a unified interface.
- Wikipedia Access: `wikipedia` library (wrapped for asynchronous execution).
- Configuration: `pydantic-settings` (for managing settings via environment variables and `.env` files).
- Dependency Management: `uv` (or `pip`) with `pyproject.toml` and `uv.lock`.
- Logging: `loguru` (configured in `app.py`).
- Containerization: Docker, Docker Compose.
- ORM: SQLAlchemy Core and ORM features are used.
- Engine/Session: `database.py` configures the asynchronous SQLAlchemy engine (`create_async_engine`) and session factory (`async_sessionmaker`). The `get_session` dependency provider ensures each API request gets a dedicated session that is closed afterward.
- Models: Defined in `models.py`:
  - `terminusEntry`: Represents validated entries in the `terminus` table (term [PK], definition, follow_ups [JSON Text]).
  - `CandidateterminusEntry`: Represents entries awaiting review in the `candidate_terminus` table (term [PK], definition, follow_ups [JSON], status [String]).
- Storage: Defaults to a persistent SQLite database (`./volumes/sqlite_data/terminus.db`) managed via Docker volumes. `DATABASE_URL` in `.env` can configure other SQLAlchemy-compatible databases.
- Schema Management: `Base.metadata.create_all(bind=engine)` in `database.py` provides a basic mechanism for table creation during development. Note: for production, a dedicated migration tool like Alembic is strongly recommended but not currently implemented.
- Serialization: Follow-up questions (`FollowUp` Pydantic models) are serialized to a JSON string for storage in the database (`_serialize_follow_ups`) and deserialized back into Pydantic objects upon retrieval (`_deserialize_follow_ups`) within `TerminusService` and `CandidateTerminusService`.
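A condensed sketch of these persistence pieces (async engine/session, one model, and the follow-up JSON round-trip) might look as follows; the column types and helper bodies are assumptions based on the descriptions above:

```python
# Sketch of the persistence layer: async engine, a model, and the
# follow-up (de)serialization helpers. Details may differ from models.py.
import json
from pydantic import BaseModel
from sqlalchemy import String, Text
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

engine = create_async_engine("sqlite+aiosqlite:///./volumes/sqlite_data/terminus.db")
async_session = async_sessionmaker(engine, expire_on_commit=False)

class Base(DeclarativeBase):
    pass

class FollowUp(BaseModel):
    question: str  # assumed shape of a follow-up entry

class terminusEntry(Base):
    __tablename__ = "terminus"
    term: Mapped[str] = mapped_column(String, primary_key=True)
    definition: Mapped[str] = mapped_column(Text)
    follow_ups: Mapped[str] = mapped_column(Text)  # JSON-encoded list of FollowUp

def _serialize_follow_ups(follow_ups: list[FollowUp]) -> str:
    # Pydantic models -> JSON string for the Text column.
    return json.dumps([fu.model_dump() for fu in follow_ups])

def _deserialize_follow_ups(raw: str) -> list[FollowUp]:
    # JSON string -> Pydantic models on the way out.
    return [FollowUp(**item) for item in json.loads(raw)]

async def get_session():
    # FastAPI dependency: one session per request, closed afterward.
    async with async_session() as session:
        yield session
```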
- Framework: FastAPI.
- Structure: Endpoints are organized into routers (`routers/candidate.py`, `routers/definition.py`, `routers/terms.py`), which are included in the main `app.py`.
- Asynchronicity: Uses `async def` extensively for non-blocking request handling, essential for waiting on database, Wikipedia, and LLM I/O.
- Validation: Pydantic models defined in `schemas.py` are used for automatic request body validation and response serialization. Type hints are used throughout for clarity and static analysis.
- Dependency Injection: FastAPI's dependency injection system is used, notably for providing database sessions (`Depends(get_session)`). Services (`TerminusService`, `WikipediaService`, LLM services) are instantiated within endpoint functions, often receiving the injected session.
- Documentation: Automatic interactive API documentation is available at `/docs` (Swagger UI) and `/redoc` (ReDoc), provided by FastAPI.
- Abstraction: `litellm` provides a common interface (`acompletion`) to different LLM APIs (defaulting to `gemini/gemini-2.0-flash`).
- Structured Output: `instructor.from_litellm(acompletion)` patches the LiteLLM client to enforce responses conforming to specified Pydantic models (the `response_model` parameter in services). This significantly improves reliability.
- Service Layer: Logic for interacting with LLMs is encapsulated in dedicated service classes (`services/llm_service.py`):
  - `BaseLLMService`: Abstract base class handling client initialization, message formatting (`build_messages`), and basic error handling during the LLM call (`generate_response`).
  - `FUService`: Generates `terminusAnswer` (specifically the `follow_ups` part) based on a term and definition.
  - `DefinitionValidationService`: Generates `DefinitionValidationResult` to assess definition quality.
  - `FinancialTermExtractionService`: Performs two-step extraction and critique using the `ExtractedTerms` and `TermCritique` models.
- Prompt Engineering: System and user message templates are stored centrally (`prompts.py`) and formatted within the respective services, clearly defining the LLM's task and context.
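A minimal sketch of this base-class pattern, assuming `instructor` patched over LiteLLM's async `acompletion`; the method bodies are illustrative rather than the project's exact code:

```python
# Sketch of a BaseLLMService-style class: client setup, message building,
# and a structured-output call. Bodies are illustrative.
import instructor
from litellm import acompletion
from pydantic import BaseModel

class BaseLLMService:
    def __init__(self, model: str = "gemini/gemini-2.0-flash"):
        self.model = model
        # Patch the async LiteLLM entry point so responses are parsed
        # into whatever Pydantic model is passed as response_model.
        self.client = instructor.from_litellm(acompletion)

    def build_messages(self, system: str, user: str) -> list[dict]:
        return [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ]

    async def generate_response(self, messages: list[dict], response_model: type[BaseModel]):
        try:
            return await self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                response_model=response_model,
            )
        except Exception as exc:
            # Real error handling would distinguish rate limits, auth, etc.
            raise RuntimeError(f"LLM call failed: {exc}") from exc
```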
- Service: `WikipediaService` encapsulates all logic for fetching summaries.
- Asynchronicity: The blocking `wikipedia` library calls (`wikipedia.summary`, `wikipedia.search`, `wikipedia.page`) are wrapped with `asyncio.to_thread` to avoid blocking the FastAPI event loop.
- Topic Focus: Implements heuristics to prioritize articles related to the user-defined topic:
  - Checks for an explicit `(user-defined topic)` suffix.
  - Scans search results and disambiguation options for topic keywords using a regex (`topic_pattern`).
  - Uses a context hint in fallback searches.
- Error Handling: Explicitly handles `wikipedia.exceptions.DisambiguationError` and `wikipedia.exceptions.PageError`.
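A simplified sketch of the thread-wrapping and error-handling pattern; the fallback heuristic shown here is a stand-in for the real, more elaborate one:

```python
# Sketch: run blocking wikipedia calls in a worker thread and apply a
# simple topic-preference fallback. The real heuristics are richer.
import asyncio
import wikipedia

async def fetch_summary(term: str, topic: str = "finance") -> str | None:
    try:
        # Try the explicit "(topic)" page first; summary() blocks, so
        # run it off the event loop with asyncio.to_thread.
        return await asyncio.to_thread(wikipedia.summary, f"{term} ({topic})", sentences=2)
    except wikipedia.exceptions.PageError:
        pass  # no "(topic)" page; fall through to a plain search
    except wikipedia.exceptions.DisambiguationError as exc:
        # Prefer a disambiguation option that mentions the topic, if any.
        for option in exc.options:
            if topic in option.lower():
                return await asyncio.to_thread(wikipedia.summary, option, sentences=2)
    # Fallback: search with a context hint and take the top result.
    results = await asyncio.to_thread(wikipedia.search, f"{term} {topic}")
    if results:
        return await asyncio.to_thread(wikipedia.summary, results[0], sentences=2)
    return None
```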
- Mechanism: Uses `pydantic-settings`. The `Settings` class in `config.py` defines the expected configuration variables.
- Source: Settings are loaded from environment variables or a `.env` file.
- Variables:
  - `DATABASE_URL`: SQLAlchemy database connection string (default: `sqlite+aiosqlite:///./volumes/sqlite_data/terminus.db`).
  - `LOG_LEVEL`: Logging level for the application (default: `INFO`).
  - `litellm` might require provider-specific API keys (e.g., `GEMINI_API_KEY`) set as environment variables, depending on the chosen model.
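A minimal `config.py` sketch using `pydantic-settings`; the field names follow the variables listed above, and the defaults are the documented ones:

```python
# Sketch of config.py: settings resolved from the environment / .env file.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    database_url: str = "sqlite+aiosqlite:///./volumes/sqlite_data/terminus.db"
    log_level: str = "INFO"
    # Topic configuration, matching the TOPIC_* variables in the .env example.
    topic_domain: str = "finance"
    topic_keywords: list[str] = ["finance", "financial", "banking", "investment"]

settings = Settings()  # environment / .env values override the defaults
```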
- Dockerfile: Defines the image for the Python application, including installing dependencies using `uv` and setting the entry point to run `uvicorn`.
- docker-compose.yml: Orchestrates the application service (`terminus_app`) and potentially related services (though only the app is defined here). It maps ports (`8000:8000`), mounts the source code (`./:/app`), and defines a named volume (`sqlite_data`) to persist the SQLite database file outside the container filesystem. It also specifies the `.env` file for configuration.
The application follows a standard layered architecture pattern:
- Presentation Layer (API): Handles HTTP requests, routes them to appropriate handlers, performs data validation (via Pydantic), and serializes responses. Implemented using FastAPI routers (`routers/`).
- Service Layer (Business Logic): Contains the core application logic, orchestrating tasks like database interaction, calls to external services (LLM, Wikipedia), and workflows (e.g., candidate validation). Implemented in the `services/` directory (`TerminusService`, `CandidateTerminusService`, `WikipediaService`, LLM services).
- Data Access Layer: Responsible for interacting with the database: the SQLAlchemy models (`models.py`), database session management (`database.py`), and the ORM queries performed within the Service Layer.
- External Services: Integrations with third-party APIs (LLM providers via `litellm`, the Wikipedia API via the `wikipedia` library).
This separation promotes modularity, testability, and maintainability.
```mermaid
graph TD
    subgraph Frontend
        User[User]
    end
    subgraph API Layer
        Router[Definition Router FastAPI]
    end
    subgraph Services
        LS[TerminusService]
        CLS[CandidateTerminusService]
        WS[WikipediaService]
        FS[FollowUpService LLM]
        DVS[DefValidationService LLM]
    end
    subgraph External APIs
        WIKI[Wikipedia API]
        LLM[LLM API]
    end
    subgraph Databases
        ODB[(Official terminus DB)]
        CDB[(Candidate terminus DB)]
    end
    User --> Router
    Router --> LS
    LS --> ODB
    Router --> CLS
    CLS --> CDB
    Router --> WS
    WS --> WIKI
    WS --> Router
    Router --> FS
    FS --> LLM
    Router --> DVS
    DVS --> LLM
    Router --> User
```
Conceptual diagram only; a full sequence diagram wasn't readable enough.
The entire application is built around Python's asyncio framework, facilitated by FastAPI:
- API endpoints are defined with `async def`.
- Database interactions use an asynchronous SQLAlchemy driver (`aiosqlite`) and `await`.
- LLM calls via `litellm` (`acompletion`) are asynchronous.
- Blocking Wikipedia calls are executed in separate threads using `asyncio.to_thread` to prevent blocking the main event loop.
This ensures the service can handle concurrent requests efficiently, especially when waiting for external I/O operations.
Several mechanisms are implemented to ensure the quality, relevance, and accuracy of the terminus data:
- Candidate Review Workflow: The most significant guard rail. New or automatically generated entries must pass through the `candidate_terminus` table and require explicit approval (`/candidate/validate`) before being promoted to the official `terminus`. This allows for human oversight or more sophisticated automated checks.
- LLM-Powered Term Relevance Critique: The `FinancialTermExtractionService` doesn't just extract terms; it uses a secondary LLM call (`_critique_term`) specifically to validate whether an extracted term is genuinely related to the user-defined topic, reducing noise.
- LLM-Powered Definition Validation: The `DefinitionValidationService` uses an LLM prompt focused on factual accuracy within the financial domain, providing a structured assessment (`DefinitionValidationResult`, including `is_valid`, `confidence`, and `reasoning`) of generated or fetched definitions (see the model sketch after this list).
- Structured LLM Output: Using `instructor` forces LLM responses into predefined Pydantic models. This prevents malformed or unexpected free-form text, ensuring downstream code receives data in the expected format. If the LLM fails to conform, `instructor` typically raises an error or allows for retries (depending on configuration, though basic retry isn't explicitly shown here).
- Wikipedia Topic Prioritization: The `WikipediaService` actively tries to find topic-specific articles, reducing the chance of retrieving definitions for unrelated concepts with the same name (e.g., "bond" the chemical concept vs. "bond" the financial instrument).
- API Input/Output Validation: Pydantic models used in FastAPI endpoints automatically validate incoming request data and ensure outgoing responses adhere to the defined schema.
- Type Hinting: Extensive use of Python type hints improves code clarity and allows static analysis tools (like MyPy) to catch potential type errors early.
- Logging: Detailed logging (`loguru`) provides visibility into the system's operations, helping diagnose errors and understand decision-making processes (e.g., why a specific Wikipedia page was chosen).
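As referenced in the list above, the validation result model likely resembles the following; the field names come from this README, while the constraints are assumptions:

```python
# Illustrative shape of the structured validation output.
from pydantic import BaseModel, Field

class DefinitionValidationResult(BaseModel):
    is_valid: bool                             # factually sound within the topic domain?
    confidence: float = Field(ge=0.0, le=1.0)  # the LLM's own certainty estimate
    reasoning: str                             # short justification, useful for audits
```

Because `instructor` enforces this schema, a non-conforming LLM response raises a validation error instead of letting malformed text flow downstream.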
While functional, the current implementation has areas for improvement and inherent limitations:
- LLM Reliability:
  - Hallucination/Accuracy: LLMs can still generate plausible but incorrect information (hallucinations). The `DefinitionValidationService` mitigates but doesn't eliminate this risk, and confidence scores reflect the LLM's own subjective assessment.
  - Prompt Sensitivity: The quality of LLM outputs (extraction, follow-ups, validation) is highly dependent on the specific prompts used and the chosen LLM model. Changing models might require prompt adjustments.
  - Bias: LLMs can inherit biases from their training data, potentially affecting definitions or follow-up questions.
- Wikipedia Service Limitations:
  - Summarization Quality: Wikipedia summaries (`sentences=2`) can sometimes be too brief, too complex, or miss crucial nuances.
  - Disambiguation Imperfection: The topic keyword heuristic might fail for terms where the financial meaning isn't obvious from the title, or for genuinely ambiguous cases.
  - Vandalism/Accuracy: Wikipedia content itself can occasionally be inaccurate or subject to vandalism, although popular articles are generally well-maintained.
- Scalability:
- Database: SQLite is simple for development but has limitations under high concurrent write loads. Migrating to PostgreSQL or another production-grade database would be necessary for scaling.
- External API Dependencies: Heavy reliance on external LLM and Wikipedia APIs introduces potential bottlenecks related to rate limits, latency, cost, and availability. Caching strategies could help.
- Validation Robustness:
- The LLM-based validation is a good step, but could be enhanced (e.g., cross-referencing with multiple sources, more sophisticated fact-checking techniques, multi-agent debate).
- The current candidate approval is binary. A more granular review process might be needed.
- Cold Start Problem: An empty terminus requires significant initial effort (manual or automated runs) to populate candidate terms and get them reviewed.
- Lack of UI: The review process currently relies on direct API calls. A simple web interface for reviewers would significantly improve usability.
- Testing Coverage: While the structure supports testing, comprehensive unit, integration, and end-to-end tests are crucial but not explicitly provided. Testing LLM interactions effectively requires specific strategies (mocking, snapshot testing, evaluation sets).
- Migration Management: No database migration tool (like Alembic) is included, making schema changes in production environments risky; `create_all_tables` is unsuitable for production.
- Error Handling Granularity: Some error handling could be more specific, providing clearer feedback to the user or client system about why an operation failed (e.g., a missing LLM API key vs. a content moderation block).
While not fully implemented, the system could incorporate automated evaluation mechanisms:
- Candidate Approval Rate: Track the percentage of candidate terms that are approved versus rejected. A high rejection rate might indicate issues with the generation (Wikipedia fetch) or validation (LLM) steps.
- LLM Validation Confidence Monitoring: Analyze the average confidence scores provided by the `DefinitionValidationService`. Consistently low scores might signal problems with the definitions being generated or with the validator LLM itself.
- Semantic Similarity to Golden Set: Maintain a "golden set" of high-quality, human-verified terms and definitions. Periodically compare newly approved terminus entries against this set using semantic similarity metrics (e.g., sentence-transformer embeddings and cosine similarity) to detect semantic drift or quality degradation (see the sketch after this list).
- Consistency Checks:
  - Periodically re-run the `DefinitionValidationService` on existing official terminus entries to catch potential regressions or identify definitions that have become outdated.
  - Check for contradictions between a term's definition and the definitions of its follow-up terms.
- A/B Testing Prompts/Models: Implement infrastructure to test different LLM prompts or models for generation, extraction, or validation tasks, comparing their performance based on metrics like approval rates, confidence scores, or semantic similarity scores.
- User Feedback Loop: If user interaction is added, incorporate feedback mechanisms (e.g., rating definitions, reporting errors) as a direct measure of quality.
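As referenced above, the golden-set check could start as something this simple, using sentence-transformers; the model name and threshold are illustrative choices:

```python
# Sketch of a golden-set drift check with sentence-transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def drift_score(golden_definition: str, new_definition: str) -> float:
    # Cosine similarity between the two definition embeddings; lower
    # values suggest the approved definition drifted from the verified one.
    embeddings = model.encode([golden_definition, new_definition])
    return float(util.cos_sim(embeddings[0], embeddings[1]))

# e.g., flag any entry where drift_score(golden, approved) < 0.7 for review
```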
- Python 3.13+
- `uv` (recommended; a high-performance Python package installer and resolver) or `pip`
- Docker and Docker Compose (for containerized execution)
- Access to an LLM API compatible with `litellm` (e.g., a Google AI Studio Gemini API key; the free tier is fine for testing).
- Clone the Repository:
```bash
git clone <your-repository-url>
cd terminus
```
- Create `.env` File: Create a file named `.env` in the project root directory and add the following, adjusting as needed:

```bash
# .env
DATABASE_URL=sqlite+aiosqlite:///./volumes/sqlite_data/terminus.db
LOG_LEVEL=INFO

# Add LLM API key if required by litellm for your chosen provider
# Example for Gemini:
GEMINI_API_KEY=your_gemini_api_key_here

# User-defined topic and anchor list of keywords
TOPIC_DOMAIN=finance
TOPIC_KEYWORDS=["finance", "financial", "banking", "investment", "economic", "stock", "market", "derivative"]
```

Ensure `litellm` knows how to pick up the key, or configure it according to the `litellm` documentation if necessary.
Using uv (recommended):
```bash
uv sync
```
Note: For production, implement and use a database migration tool like Alembic.
```bash
uvicorn terminus.app:app --host 0.0.0.0 --port 8000 --reload
```
(The `--reload` flag is useful for development.)
The API documentation will be available at http://localhost:8000/docs.
This is the recommended way to run the application, especially for consistency across environments.
- Build and Start Containers:
```bash
docker-compose up --build
```
(Use `-d` to run in detached mode.)
- Accessing the Service: The API will be available at `http://localhost:8000`, with documentation at `http://localhost:8000/docs`.
- Stopping Containers:
```bash
docker-compose down
```
(Add `-v` to remove the named volume `sqlite_data` if you want to clear the database.)