Important
See the API package documentation for general prerequisites, dependent components, and package deployment instructions.
This document is only applicable for spinning up the API in a local Python development environment.
Important
Execute the following commands from this sub-directory.
Important
The following steps assume that you already have a deployed and accessible UDS Kubernetes cluster and LeapfrogAI. Please follow the steps within the DEVELOPMENT.md for details.
- Install dependencies:
    make install
- Create a config.yaml using the config.example.yaml as a template.
- Run the FastAPI application:
    make dev API_PORT=8080
- Create an API key with test user "[email protected]" and test password "password", lasting 30 days from creation time:
    # If the in-cluster API is up, and not testing the API workflow
    make api-key API_BASE_URL=http://localhost:8080
  To create a new 30-day API key, use the following:
    # If the in-cluster API is up, and not testing the API workflow
    make new-api-key API_BASE_URL=http://localhost:8080
  The newest API key will be printed to a .env file located within this directory.
- Make calls to the API Swagger endpoint at http://localhost:8080/docs using your API token as the HTTPBearer token.
  - Hit Authorize on the Swagger page to enter your API key.
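For scripted calls outside the Swagger page, the API key written to the .env file can be loaded and turned into a Bearer header. The following is a minimal sketch, not the project's own tooling: it assumes the .env file uses plain KEY=VALUE lines, and the variable name passed to `read_env_value` (for example `API_KEY`) is a hypothetical placeholder, so check the generated file for the actual name.

```python
from pathlib import Path


def read_env_value(env_path: Path, name: str) -> str:
    """Return the value assigned to `name` in a simple KEY=VALUE .env file."""
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop optional surrounding quotes around the value.
            return value.strip().strip("\"'")
    raise KeyError(f"{name} not found in {env_path}")


def bearer_headers(api_key: str) -> dict[str, str]:
    """Build headers for an HTTP Bearer-authenticated JSON request."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

A call such as `bearer_headers(read_env_value(Path(".env"), "API_KEY"))` would then produce headers that can be attached to any HTTP client request against http://localhost:8080.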
See the "Access" section of the DEVELOPMENT.md for different ways to connect the API to a model backend or Supabase.
See the tests directory documentation for more details.
The LeapfrogAI API includes a Retrieval Augmented Generation (RAG) pipeline for enhanced question answering. This section details how to configure its reranking options. All RAG configurations are managed through the /leapfrogai/v1/rag/configure API endpoint.
Reranking improves the accuracy and relevance of RAG responses. You can enable or disable it using the enable_reranking parameter:
- Enable Reranking: Send a PATCH request to /leapfrogai/v1/rag/configure with the following JSON payload:
  {
    "enable_reranking": true
  }
- Disable Reranking: Send a PATCH request with:
  {
    "enable_reranking": false
  }
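The enable and disable requests above can be sketched in Python using only the standard library. This is an illustrative sketch rather than an official client: the helper names `build_reranking_request` and `send` are made up for this example, and it assumes the API is reachable at a base URL such as http://localhost:8080 and accepts the API key as a Bearer token.

```python
import json
import urllib.request

RAG_CONFIG_PATH = "/leapfrogai/v1/rag/configure"


def build_reranking_request(
    base_url: str, api_key: str, enabled: bool
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that toggles reranking."""
    body = json.dumps({"enable_reranking": enabled}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + RAG_CONFIG_PATH,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the response body as text.

    Requires a running API; urlopen raises HTTPError on 4xx/5xx responses.
    """
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Keeping request construction separate from sending makes the payload easy to inspect before anything touches the cluster, for example `send(build_reranking_request("http://localhost:8080", api_key, enabled=False))`.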
Multiple reranking models are supported, each offering different performance characteristics. Choose your preferred model using the ranking_model parameter. Ensure you've installed any necessary Python dependencies for your chosen model (see the rerankers library documentation on dependencies).
- Supported Models: The system supports several models, including (but not limited to) flashrank, rankllm, cross-encoder, and colbert. Refer to the rerankers library documentation for a complete list and details on their capabilities.
- Model Selection: Use a PATCH request to /leapfrogai/v1/rag/configure with the desired model. Reranking must be enabled, and rankllm can be replaced with any other supported model:
  {
    "enable_reranking": true,
    "ranking_model": "rankllm"
  }
The rag_top_k_when_reranking parameter sets the number of top results retrieved from the vector database before the reranking process begins. A higher value increases the diversity of candidates considered for reranking but also increases processing time. A lower value is faster but can miss relevant results. This setting is only relevant when reranking is enabled.
- Configuration: Use a PATCH request to /leapfrogai/v1/rag/configure to set this value, adjusting it as needed:
  {
    "enable_reranking": true,
    "ranking_model": "flashrank",
    "rag_top_k_when_reranking": 150
  }
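When several settings change together, it can help to assemble the PATCH payload in one place. The sketch below is a hypothetical helper, not part of the LeapfrogAI codebase: the class name `RagConfigUpdate` is invented here, it assumes the endpoint accepts partial updates (as the smaller payloads above suggest), and the positive-integer check on rag_top_k_when_reranking is an assumption of this example rather than a documented server rule.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RagConfigUpdate:
    """Fields to change; anything left as None is omitted from the payload."""

    enable_reranking: Optional[bool] = None
    ranking_model: Optional[str] = None
    rag_top_k_when_reranking: Optional[int] = None

    def to_payload(self) -> dict:
        top_k = self.rag_top_k_when_reranking
        # Assumed sanity check: retrieving fewer than one candidate is meaningless.
        if top_k is not None and top_k < 1:
            raise ValueError("rag_top_k_when_reranking must be a positive integer")
        # Keep False values (disabling reranking) but drop unset fields.
        return {key: value for key, value in asdict(self).items() if value is not None}
```

For example, `RagConfigUpdate(enable_reranking=True, ranking_model="flashrank", rag_top_k_when_reranking=150).to_payload()` yields the same three-field object shown above, ready to be serialized with `json.dumps`.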
To check the current RAG configuration (including reranking status, model, and rag_top_k_when_reranking), send a GET request to /leapfrogai/v1/rag/configure. The response will be a JSON object containing all the current settings.
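Reading the configuration back can be sketched as follows. The exact response schema is not specified here, so this example assumes the JSON object uses the same field names as the PATCH parameters (enable_reranking, ranking_model, rag_top_k_when_reranking); the function names are hypothetical, and missing fields are reported as "unknown" rather than guessed.

```python
import json
import urllib.request

RAG_CONFIG_PATH = "/leapfrogai/v1/rag/configure"


def build_config_query(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the current RAG settings."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + RAG_CONFIG_PATH,
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def describe_config(response_body: str) -> str:
    """Summarize a configuration response in one line."""
    config = json.loads(response_body)
    if not config.get("enable_reranking", False):
        return "reranking disabled"
    model = config.get("ranking_model", "unknown")
    top_k = config.get("rag_top_k_when_reranking", "unknown")
    return f"reranking enabled: model={model}, top_k={top_k}"
```

The request can be sent with `urllib.request.urlopen` against a running API, and the decoded response body passed to `describe_config`.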
- Initial Setup: Start with reranking enabled using the default flashrank model and a rag_top_k_when_reranking value of 100.
- Experiment with Models: Test different reranking models (rankllm, colbert, etc.) by changing the ranking_model parameter and observing the impact on response quality. Adjust rag_top_k_when_reranking as needed to find the optimal balance between diversity and performance.
- Fine-tuning: Once you identify a suitable model, fine-tune the rag_top_k_when_reranking parameter for optimal performance. Monitor response times and quality to determine the best setting.
- Disabling Reranking: If needed, disable reranking by setting "enable_reranking": false.
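The experimentation workflow above can be organized as a small sweep over models and rag_top_k_when_reranking values. This is a sketch under stated assumptions: `apply_config` and `run_query` are hypothetical callables supplied by the caller (for instance, one that PATCHes /leapfrogai/v1/rag/configure and one that issues a test RAG question), and response quality still has to be judged separately from the recorded timings.

```python
import time
from itertools import product
from typing import Callable, Iterable


def sweep_reranking_settings(
    models: Iterable[str],
    top_k_values: Iterable[int],
    apply_config: Callable[[dict], None],
    run_query: Callable[[], str],
) -> list[dict]:
    """Apply each model/top-k combination and time one query under it."""
    results = []
    for model, top_k in product(models, top_k_values):
        payload = {
            "enable_reranking": True,
            "ranking_model": model,
            "rag_top_k_when_reranking": top_k,
        }
        apply_config(payload)
        # Time only the query itself, not the configuration call.
        started = time.perf_counter()
        answer = run_query()
        elapsed = time.perf_counter() - started
        results.append({**payload, "seconds": elapsed, "answer": answer})
    return results
```

Sorting the returned list by "seconds" and reviewing the corresponding "answer" values side by side gives a rough picture of the trade-off between candidate diversity and latency.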
Remember to always consult the rerankers library documentation for information on supported models and their specific requirements. The API documentation provides further details on request formats and potential error responses.