The LLM API lets you send messages to AI models and get responses. It works like the OpenAI API.
Examples: API examples index covers cURL/SDK snippets for every endpoint.
- Direct access: http://localhost:8080
- Through gateway (recommended): http://localhost:8000
- Inside Docker: http://llm-api:8080
All endpoints require authentication through the Kong gateway.
For complete authentication documentation, see Authentication Guide
Quick guest token:
# Get guest token
curl -X POST http://localhost:8000/llm/auth/guest-login
# Use token in requests
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/v1/chat/completionsKey points:
- Use Kong gateway (port 8000) for all client requests
- Bearer tokens from
/llm/auth/guest-loginwork for development - API keys (
X-API-Key: sk_*) available via Kong for production - Direct service access (port 8080) still requires valid JWT
- Chat with AI - Send messages and get responses (like ChatGPT)
- Stream responses - Get word-by-word output in real-time
- Save conversations - Keep chat history for later
- Add images - Reference images using jan_* IDs
- Multiple models - Works with vLLM, OpenAI, Anthropic, and others
- Track everything - Built-in logging and monitoring
- (Future) Prompt Orchestration - Dynamic prompt composition with memory, templates, and conditional modules (see
docs/todo/prompt-orchestration-todo.md)
| Component | Port | Environment Variable |
|---|---|---|
| HTTP Server | 8080 | HTTP_PORT |
| Database | 5432 | DB_POSTGRESQL_WRITE_DSN |
| Keycloak | 8085 | KEYCLOAK_BASE_URL |
HTTP_PORT=8080 # HTTP listen port
DB_POSTGRESQL_WRITE_DSN=postgres://jan_user:password@api-db:5432/jan_llm_api?sslmode=disable
LOG_LEVEL=info # debug, info, warn, error
LOG_FORMAT=json # json or text
KEYCLOAK_BASE_URL=http://keycloak:8085 # Keycloak URL
JWKS_URL=http://keycloak:8085/realms/jan/protocol/openid-connect/certs
ISSUER=http://localhost:8090/realms/jan # Token issuer
ACCOUNT=account # JWT audience/account claimOTEL_ENABLED=false # Enable OpenTelemetry
OTEL_SERVICE_NAME=llm-api # Service name for tracing
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 # Jaeger endpoint
MEDIA_RESOLVE_URL=http://kong:8000/media/v1/media/resolve # Default Media API resolver via Kong
MEDIA_RESOLVE_TIMEOUT=5s # Media resolution timeoutOverride
MEDIA_RESOLVE_URLonly if you need to call the Media API directly (e.g.,http://media-api:8285/v1/media/resolveinside Docker).
POST /v1/chat/completions
OpenAI-compatible chat completion endpoint.
# Simple completion
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"model": "jan-v1-4b",
"messages": [
{"role": "user", "content": "Hello!"}
],
"temperature": 0.7,
"max_tokens": 100
}'
# Streaming completion
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"model": "jan-v1-4b",
"messages": [
{"role": "user", "content": "Hello!"}
],
"stream": true
}'Request Parameters:
model(required) - Model identifier (e.g., "jan-v1-4b")messages(required) - Array of message objectsrole- "system", "user", or "assistant"content- Text content (string) or content array (for media)stream(optional) - Enable streaming responses (default: false)temperature(optional) - 0.0-2.0, controls randomness (default: 0.7)top_p(optional) - 0.0-1.0, nucleus sampling (default: 1.0)max_tokens(optional) - Maximum response lengthstop(optional) - Stop sequences
Response:
{
"id": "chatcmpl-...",
"object": "chat.completion",
"created": 1699999999,
"model": "jan-v1-4b",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 12,
"total_tokens": 22
}
}GET /v1/conversations
List all conversations for the authenticated user.
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/v1/conversationsQuery Parameters:
limit(optional) - Number of conversations to return (default: 20)after(optional) - Cursor for paginationorder(optional) - Sort order: "asc" or "desc" (default: "desc")
POST /v1/conversations
Create a new conversation.
curl -X POST -H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"title": "My Conversation",
"project_id": "proj_123"
}' \
http://localhost:8000/v1/conversationsGET /v1/conversations/{conv_public_id}
Get a specific conversation with its items.
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/v1/conversations/conv_123POST /v1/conversations/{conv_public_id}
Update a conversation (title, archived status).
curl -X POST -H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"title": "Updated Title"}' \
http://localhost:8000/v1/conversations/conv_123DELETE /v1/conversations/{conv_public_id}
Delete a conversation.
curl -X DELETE -H "Authorization: Bearer <token>" \
http://localhost:8000/v1/conversations/conv_123GET /v1/conversations/{conv_public_id}/items
List all items (messages) in a conversation.
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/v1/conversations/conv_123/itemsPOST /v1/conversations/{conv_public_id}/items
Add items (messages) to a conversation.
curl -X POST -H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"items": [
{
"type": "message",
"role": "user",
"content": [
{"type": "input_text", "text": "Hello!"}
]
}
]
}' \
http://localhost:8000/v1/conversations/conv_123/itemsGET /v1/conversations/{conv_public_id}/items/{item_id}
Get a specific item from a conversation.
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/v1/conversations/conv_123/items/item_456DELETE /v1/conversations/{conv_public_id}/items/{item_id}
Delete an item from a conversation.
curl -X DELETE -H "Authorization: Bearer <token>" \
http://localhost:8000/v1/conversations/conv_123/items/item_456Projects help organize conversations into logical groups.
POST /v1/projects
Create a new project.
curl -X POST -H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"name": "Marketing Campaign",
"instruction": "You are a marketing expert."
}' \
http://localhost:8000/v1/projectsGET /v1/projects
List all projects for the authenticated user.
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/v1/projectsQuery Parameters:
limit(optional) - Number of projects to returnafter(optional) - Cursor for paginationorder(optional) - Sort order: "asc" or "desc"
GET /v1/projects/{project_id}
Get a specific project by ID.
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/v1/projects/proj_123PATCH /v1/projects/{project_id}
Update a project's name, instruction, or archived status.
curl -X PATCH -H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"name": "Updated Project Name",
"instruction": "New instruction text",
"archived": false
}' \
http://localhost:8000/v1/projects/proj_123DELETE /v1/projects/{project_id}
Soft-delete a project.
curl -X DELETE -H "Authorization: Bearer <token>" \
http://localhost:8000/v1/projects/proj_123GET /v1/models
List all available models.
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/v1/modelsGET /v1/models/catalogs/{model_public_id}
Get details for a specific model from the catalog.
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/v1/models/catalogs/jan-v1-4bGET /v1/models/providers
List all available model providers.
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/v1/models/providersGET /v1/healthz
Basic health check endpoint.
curl http://localhost:8080/v1/healthzGET /v1/readyz
Readiness check endpoint (service ready to accept traffic).
curl http://localhost:8080/v1/readyzGET /v1/version
Get API version and build information.
curl http://localhost:8080/v1/versionGET /v1/users/me/settings
Retrieves the current user's settings. If no settings exist, returns defaults.
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/v1/users/me/settingsResponse:
{
"id": 1,
"user_id": 123,
"memory_config": {
"enabled": true,
"observe_enabled": true,
"inject_user_core": true,
"inject_semantic": true,
"inject_episodic": false,
"max_user_items": 3,
"max_project_items": 5,
"max_episodic_items": 3,
"min_similarity": 0.75
},
"profile_settings": {
"base_style": "Friendly",
"custom_instructions": "",
"nick_name": "",
"occupation": "",
"more_about_you": ""
},
"advanced_settings": {
"web_search": false,
"code_enabled": false
},
"enable_trace": false,
"enable_tools": true,
"preferences": {}
}PATCH /v1/users/me/settings
Updates user settings. Only provided fields are updated (partial update).
curl -X PATCH http://localhost:8000/v1/users/me/settings \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"memory_config": {
"enabled": true,
"max_user_items": 5
},
"profile_settings": {
"base_style": "Professional",
"nick_name": "Dev"
}
}'Settings Groups:
| Group | Fields | Purpose |
|---|---|---|
memory_config |
enabled, observe_enabled, inject_*, max_*, min_similarity |
Memory and retrieval controls |
profile_settings |
base_style, custom_instructions, nick_name, occupation, more_about_you |
User profile and preferences |
advanced_settings |
web_search, code_enabled |
Advanced feature toggles |
| Top-level | enable_trace, enable_tools, preferences |
System features |
Memory Configuration:
enabled- Master toggle for all memory featuresobserve_enabled- Automatically learn from conversationsinject_user_core- Include user core facts in promptsinject_semantic- Include semantic project factsinject_episodic- Include episodic conversation historymax_user_items(0-20) - Maximum user memory items to retrievemax_project_items(0-50) - Maximum project facts to retrievemax_episodic_items(0-20) - Maximum episodic events to retrievemin_similarity(0.0-1.0) - Minimum relevance score for memory retrieval
Profile Settings:
base_style- Conversation style:"Concise","Friendly", or"Professional"custom_instructions- Additional behavior and tone preferencesnick_name- What should Jan call you? (alias:nicknameaccepted on input)occupation- Your occupation or rolemore_about_you- Additional information about yourself
Advanced Settings:
web_search- Enable automatic web search (privacy consideration)code_enabled- Enable code execution features (security consideration)
Reference media using jan_* IDs from the Media API:
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"model": "jan-v2-30b",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What is this?"},
{
"type": "image_url",
"image_url": {
"url": "jan_01hqr8v9k2x3f4g5h6j7k8m9n0"
}
}
]
}
]
}'Use any vision-capable model you have configured (for local-only setups, point jan-cli at a remote provider such as OpenAI, Anthropic, or Qwen VL).
- Response API (Port 8082) - Multi-step orchestration using this service. See Decision Guide for when to use each.
- Media API (Port 8285) - Media resolution for
jan_*IDs. See Media Upload Guide and Jan ID System. - MCP Tools (Port 8091) - Tool integration for LLM responses. See MCP Tools Documentation.
- Kong Gateway (Port 8000) - API routing and load balancing. See Authentication Guide.
Common HTTP status codes:
| Code | Meaning |
|---|---|
| 200 | Success |
| 400 | Invalid request parameters |
| 401 | Unauthorized (invalid/expired token) |
| 403 | Forbidden (insufficient permissions) |
| 404 | Resource not found |
| 429 | Rate limited |
| 500 | Server error |
Example error response:
{
"error": {
"message": "Invalid model specified",
"type": "invalid_request_error",
"code": "invalid_model"
}
}Requests routed through Kong inherit its rate-limiting plugin:
- Default (development): 100 requests per minute per client IP (
kong/kong-dev-full.yml) - Headers:
X-RateLimit-Limit-minute,X-RateLimit-Remaining-minute - Exceeding the limit returns HTTP 429
Calling the service directly on port 8080 bypasses the gateway rate limiter (useful for internal health checks).