The Response API executes tools and generates AI responses for complex tasks.
Examples: the API examples index includes cURL/SDK snippets for orchestration flows.
- Direct access: http://localhost:8082
- Through gateway: http://localhost:8000/responses (Kong prefixes /responses)
- Inside Docker: http://response-api:8082
All endpoints require authentication through the Kong gateway.
For complete authentication documentation, see the Authentication Guide.
Quick example:
```shell
# Get guest token
TOKEN=$(curl -s -X POST http://localhost:8000/llm/auth/guest-login | jq -r '.access_token')

# Use in requests
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/responses/v1/responses
```

The Response API can:
- Run tools automatically - AI decides which tools to use
- Chain tools together - Use output from one tool as input to another (up to 8 steps)
- Get final answers - LLM generates natural language response from tool results
- Track execution - See which tools ran and how long they took
Not sure if you need the Response API? See the Decision Guide: LLM API vs Response API to choose the right approach.
| Component | Port | Key Environment Variables |
|---|---|---|
| HTTP Server | 8082 | RESPONSE_API_PORT |
| Database (PostgreSQL) | 5432 | DB_POSTGRESQL_WRITE_DSN, DB_POSTGRESQL_READ1_DSN |
| LLM API upstream | 8080 | RESPONSE_LLM_API_URL |
| MCP Tools upstream | 8091 | RESPONSE_MCP_TOOLS_URL |
```shell
RESPONSE_API_PORT=8082
DB_POSTGRESQL_WRITE_DSN=postgres://response_api:password@api-db:5432/response_api?sslmode=disable
# Optional read replica
DB_POSTGRESQL_READ1_DSN=postgres://response_ro:password@api-db-ro:5432/response_api?sslmode=disable

# Upstream services
RESPONSE_LLM_API_URL=http://llm-api:8080
RESPONSE_MCP_TOOLS_URL=http://mcp-tools:8091

# Tool execution limits
RESPONSE_MAX_TOOL_DEPTH=8
TOOL_EXECUTION_TIMEOUT=45s

# Logging and tracing
RESPONSE_LOG_LEVEL=info
ENABLE_TRACING=false
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317

# Auth (when fronted by Kong or called directly with JWT)
AUTH_ENABLED=true
AUTH_ISSUER=http://localhost:8085/realms/jan
ACCOUNT=account
AUTH_JWKS_URL=http://keycloak:8085/realms/jan/protocol/openid-connect/certs
```

POST /v1/responses
Create a new response with automatic tool orchestration.
```shell
curl -X POST http://localhost:8000/responses/v1/responses \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jan-v2-30b",
    "input": "Search for the latest AI news and summarize the top 3 results",
    "temperature": 0.3,
    "tool_choice": {"type": "auto"},
    "stream": false
  }'
```

Request Body (subset of `CreateResponseRequest`):
- `model` (required) - Model identifier understood by the LLM API/catalog
- `input` (required) - User prompt (string or structured object)
- `system_prompt` (optional) - Instruction prepended before each run
- `temperature`, `max_tokens` (optional) - Generation controls
- `tools` (optional) - Override available tools (OpenAI-compatible format)
- `tool_choice` (optional) - `{"type": "auto" | "none" | "required", "function": {"name": "tool"}}`
- `stream` (optional) - `true` to receive SSE events
- `conversation` (optional) - Attach to an existing conversation ID
- `previous_response_id` (optional) - Continue from a prior response
- `metadata`, `user` (optional) - Free-form payload that is persisted with the response
Response:
```json
{
  "id": "resp_01hqr8v9k2x3f4g5h6j7k8m9n0",
  "model": "jan-v2-30b",
  "input": "Search for the latest AI news and summarize the top 3 results",
  "output": "Here are the latest AI news items...",
  "tool_executions": [
    {
      "id": "toolexec_123",
      "tool": "google_search",
      "input": { "q": "latest AI news", "num": 3 },
      "output": "...",
      "duration_ms": 250
    }
  ],
  "execution_metadata": {
    "max_depth": 8,
    "actual_depth": 1,
    "total_duration_ms": 2500,
    "status": "completed"
  },
  "created_at": "2025-11-10T10:30:00Z",
  "updated_at": "2025-11-10T10:30:02.500Z"
}
```

Enable `stream: true` to receive incremental events (`text/event-stream`), matching the SSE observer in `services/response-api/internal/interfaces/httpserver/handlers/response_handler.go`.
```shell
curl -N http://localhost:8000/responses/v1/responses \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "jan-v2-30b",
    "input": "Search for the latest AI news and summarize the top 3 results",
    "stream": true
  }'
```

The stream emits events such as `response.created`, `response.tool_call`, `response.output_text.delta`, and `response.completed`.
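A client has to split that stream back into individual events. The sketch below parses the standard `event:`/`data:` framing of `text/event-stream`; the JSON payload shapes inside `data:` are illustrative assumptions, only the event names come from the API.

```python
# Minimal SSE parser sketch: collect "event:"/"data:" lines and emit an
# (event, data) pair at each blank-line boundary.
import json

def parse_sse(raw: str):
    """Yield (event, data) tuples from a text/event-stream payload."""
    event, data_lines = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event is not None:
            yield event, "\n".join(data_lines)
            event, data_lines = None, []

# Sample stream with assumed payload fields:
stream = (
    "event: response.created\n"
    'data: {"id": "resp_123"}\n'
    "\n"
    "event: response.output_text.delta\n"
    'data: {"delta": "Here are"}\n'
    "\n"
)
events = list(parse_sse(stream))
```

A real client would feed chunks from the HTTP response body into the same state machine instead of a complete string.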
GET /v1/responses/{response_id}
Retrieve a specific response and its execution metadata.
```shell
curl -H "Authorization: Bearer <token>" \
  http://localhost:8000/responses/v1/responses/resp_01hqr8v9k2x3f4g5h6j7k8m9n0
```

DELETE /v1/responses/{response_id}
```shell
curl -X DELETE -H "Authorization: Bearer <token>" \
  http://localhost:8000/responses/v1/responses/resp_01hqr8v9k2x3f4g5h6j7k8m9n0
```

POST /v1/responses/{response_id}/cancel
```shell
curl -X POST -H "Authorization: Bearer <token>" \
  http://localhost:8000/responses/v1/responses/resp_01hqr8v9k2x3f4g5h6j7k8m9n0/cancel
```

GET /v1/responses/{response_id}/input_items
Returns the normalized conversation items that were sent to the LLM (useful for replaying the request or for debugging tool runs).
```shell
curl -H "Authorization: Bearer <token>" \
  http://localhost:8000/responses/v1/responses/resp_01hqr8v9k2x3f4g5h6j7k8m9n0/input_items
```

The Response API does not currently expose a list endpoint for all responses. Persisted executions can be queried directly from the service database.
GET /healthz
```shell
# Gateway
curl http://localhost:8000/responses/healthz

# Direct
curl http://localhost:8082/healthz
```

When a response is created, the service proceeds through these steps:
- Validate input parameters
- Check tool availability via MCP Tools
- Query MCP Tools for available tools
- Build tool call graph
- Execute tools in sequence/parallel as needed
- Apply depth limit (max 8)
- Apply timeout per tool (45s)
- Pass tool results to LLM API
- Generate final response using context
- Store execution trace in PostgreSQL
- Record tool outputs and timing
- Return complete execution metadata
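The steps above reduce to a single loop: ask the LLM, run any requested tool, feed the result back, and stop at the depth limit. Everything in this sketch (`call_llm`, `run_tool`, the reply shapes) is an illustrative stand-in, not the service's internal API.

```python
# Orchestration loop sketch with a depth limit. call_llm returns either
# {"tool": ..., "args": ...} (run a tool) or {"text": ...} (final answer).
def orchestrate(call_llm, run_tool, user_input, max_depth=8):
    context = [{"role": "user", "content": user_input}]
    executions = []
    for _depth in range(max_depth):
        reply = call_llm(context)
        if "tool" not in reply:               # plain text -> final answer
            return reply["text"], executions
        result = run_tool(reply["tool"], reply["args"])
        executions.append({"tool": reply["tool"], "output": result})
        context.append({"role": "tool", "content": result})
    raise RuntimeError("max_depth_exceeded")  # mirrors the API error code

# Stubbed behavior: one search call, then a final answer.
replies = iter([
    {"tool": "google_search", "args": {"q": "latest AI news"}},
    {"text": "Here are the top results..."},
])
output, trace = orchestrate(lambda ctx: next(replies),
                            lambda name, args: "3 articles found",
                            "Search for the latest AI news")
```

The real service also applies the per-tool timeout and persists `executions` to PostgreSQL at the end of the run.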
`RESPONSE_MAX_TOOL_DEPTH` limits how deep tool calls can chain:
- Value: 1-15 (default: 8)
- Meaning: Maximum recursive depth of tool calls
- Example: search -> extract -> summarize = depth 2
`TOOL_EXECUTION_TIMEOUT` sets the per-tool-call timeout:
- Value: Duration string (default: 45s)
- Example: "30s", "1m", "500ms"
- Behavior: Cancels tool if it exceeds timeout
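The duration strings follow Go's format. A rough parser sketch for the units shown above (`ms`, `s`, `m`, plus `h` as an assumption), accumulating in milliseconds so the results stay exact:

```python
# Parse Go-style duration strings like "45s", "1m30s", "500ms" into seconds.
import re

_UNITS_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}

def parse_duration(s: str) -> float:
    """Return the total number of seconds expressed by the duration string."""
    ms = 0.0
    # "ms" must come before "s"/"m" in the alternation to match greedily.
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)(ms|s|m|h)", s):
        ms += float(value) * _UNITS_MS[unit]
    return ms / 1000.0
```

In Go itself this is just `time.ParseDuration`; the sketch exists so a non-Go client can validate config values the same way.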
| Status | Error | Cause |
|---|---|---|
| 400 | Invalid request | Missing/malformed parameters |
| 404 | Response not found | Invalid response ID |
| 408 | Tool execution timeout | Tool exceeded timeout |
| 500 | Execution error | Tool or LLM error |
Example error:
```json
{
  "error": {
    "message": "Tool execution exceeded maximum depth",
    "type": "execution_error",
    "code": "max_depth_exceeded"
  }
}
```

The Response API integrates with:
- LLM API (Port 8080) - Generates final response
- MCP Tools (Port 8091) - Tool execution and discovery
- Kong Gateway (Port 8000) - API routing
- PostgreSQL - Execution storage
```shell
# Conservative
MAX_TOOL_EXECUTION_DEPTH=1   # Single tool call only
TOOL_EXECUTION_TIMEOUT=15s   # Short timeout

# Permissive
MAX_TOOL_EXECUTION_DEPTH=8   # Allow up to 8 levels
TOOL_EXECUTION_TIMEOUT=120s  # Long timeout for complex work
```

Requests routed through Kong (`http://localhost:8000/responses/...`) must include either:
- `Authorization: Bearer <token>` (Keycloak JWT - guest tokens work for local testing)
- `X-API-Key: sk_*` (custom plugin managed by Kong)
When `AUTH_ENABLED=true`, the service also validates JWTs on port 8082. Use the gateway path whenever possible for rate limiting and centralized logging.
The Response API supports OpenAI-compatible background mode for asynchronous response generation. This allows clients to submit long-running requests without holding open HTTP connections.
Components:
- PostgreSQL-backed Queue: Uses the `responses` table with `SELECT FOR UPDATE SKIP LOCKED` for reliable task distribution
- Worker Pool: Fixed-size pool of background workers (default: 4) that poll for queued tasks
- Webhook Notifications: HTTP POST callbacks when tasks complete or fail
- Graceful Cancellation: Queued or in-progress tasks can be cancelled
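The queue claim step amounts to "atomically flip the oldest queued row to in_progress". Postgres gets this with `SELECT ... FOR UPDATE SKIP LOCKED`, which lets concurrent workers each grab a different row; the sketch below only mimics the claim shape against SQLite so it stays self-contained and runnable.

```python
# Queue-claim sketch: FIFO by queued_at, one row per claim.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (id TEXT PRIMARY KEY, status TEXT, queued_at INTEGER)")
conn.executemany("INSERT INTO responses VALUES (?, 'queued', ?)",
                 [("resp_a", 1), ("resp_b", 2)])

def claim_next(conn):
    """Claim the oldest queued task and mark it in_progress; None if empty."""
    row = conn.execute(
        "SELECT id FROM responses WHERE status = 'queued' "
        "ORDER BY queued_at LIMIT 1").fetchone()
    if row is None:
        return None
    # Re-checking status in the WHERE clause prevents double-claiming;
    # Postgres workers rely on SKIP LOCKED for the same guarantee.
    cur = conn.execute(
        "UPDATE responses SET status = 'in_progress' "
        "WHERE id = ? AND status = 'queued'", (row[0],))
    return row[0] if cur.rowcount == 1 else None
```

The table name and columns here echo the `responses` table described above, but the schema is a simplified assumption.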
Task Lifecycle:
Client Request (background=true, store=true)
↓
Create Response (status=queued, queued_at=now)
↓
Return Response Immediately (201 Created)
↓
Worker Dequeues Task
↓
Mark Processing (status=in_progress, started_at=now)
↓
Execute LLM Orchestration with Tool Calls
↓
Update Status (completed/failed, completed_at=now)
↓
Send Webhook Notification (async, non-blocking)
Add these environment variables to enable background mode:
```shell
# Worker Pool
BACKGROUND_WORKER_COUNT=4     # Number of concurrent workers
BACKGROUND_POLL_INTERVAL=2s   # How often workers check for queued tasks
BACKGROUND_TASK_TIMEOUT=600s  # Max execution time per task (10 minutes)

# Webhook Delivery
WEBHOOK_MAX_RETRIES=3         # Retry attempts for failed webhooks
WEBHOOK_RETRY_DELAY=2s        # Delay between retry attempts
WEBHOOK_TIMEOUT=10s           # HTTP timeout per webhook attempt
WEBHOOK_USER_AGENT=jan-response-api/1.0
```

Recommended Settings:
| Environment | Workers | Poll Interval | Task Timeout | Use Case |
|---|---|---|---|---|
| Development | 2-4 | 2s | 600s (10m) | Local testing, fast iteration |
| Production | 8-16 | 5s | 1200s (20m) | High throughput, complex tasks |
| High-load | 16-32 | 3s | 900s (15m) | Many concurrent tasks |
Add `"background": true` and `"store": true` to any response request:
Request:
```shell
curl -X POST http://localhost:8000/responses/v1/responses \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jan-v2-30b",
    "input": "Write a comprehensive analysis of quantum computing trends",
    "background": true,
    "store": true,
    "metadata": {
      "webhook_url": "https://example.com/webhooks/responses",
      "user_id": "user_123"
    }
  }'
```

Response (201 Created):
```json
{
  "id": "resp_abc123",
  "object": "response",
  "status": "queued",
  "background": true,
  "store": true,
  "queued_at": 1705315800,
  "created_at": 1705315800,
  "model": "jan-v2-30b",
  "input": "Write a comprehensive analysis...",
  "metadata": {
    "webhook_url": "https://example.com/webhooks/responses",
    "user_id": "user_123"
  }
}
```

Use the standard GET endpoint to check task status:
Request:
```shell
curl -H "Authorization: Bearer <token>" \
  http://localhost:8000/responses/v1/responses/resp_abc123
```

Response (Queued):
```json
{
  "id": "resp_abc123",
  "status": "queued",
  "queued_at": 1705315800,
  ...
}
```

Response (In Progress):
```json
{
  "id": "resp_abc123",
  "status": "in_progress",
  "queued_at": 1705315800,
  "started_at": 1705315805,
  ...
}
```

Response (Completed):
```json
{
  "id": "resp_abc123",
  "status": "completed",
  "output": "The comprehensive analysis of quantum computing trends...",
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 500,
    "total_tokens": 650
  },
  "queued_at": 1705315800,
  "started_at": 1705315805,
  "completed_at": 1705316122,
  "tool_executions": [...],
  ...
}
```

Use the cancel endpoint:
Request:
```shell
curl -X POST -H "Authorization: Bearer <token>" \
  http://localhost:8000/responses/v1/responses/resp_abc123/cancel
```

Response:
```json
{
  "id": "resp_abc123",
  "status": "cancelled",
  "cancelled_at": 1705315860,
  ...
}
```

Cancellation Behavior:
- If status is `queued`: Immediately marks cancelled, prevents worker pickup
- If status is `in_progress`: Marks cancelled, but the task may complete normally (cooperative cancellation)
- If status is `completed` or `failed`: No-op, returns current state
When a background task completes or fails, the Response API sends an HTTP POST to the webhook URL specified in `metadata.webhook_url`.
Webhook Payload (Completed):
```json
{
  "id": "resp_abc123",
  "event": "response.completed",
  "status": "completed",
  "output": "The response content...",
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 500,
    "total_tokens": 650
  },
  "tool_executions": [...],
  "metadata": {
    "webhook_url": "https://example.com/webhooks/responses",
    "user_id": "user_123"
  },
  "queued_at": 1705315800,
  "started_at": 1705315805,
  "completed_at": 1705316122
}
```

Webhook Payload (Failed):
```json
{
  "id": "resp_abc123",
  "event": "response.failed",
  "status": "failed",
  "error": {
    "code": "execution_failed",
    "message": "LLM provider timeout after 600s"
  },
  "metadata": {
    "webhook_url": "https://example.com/webhooks/responses",
    "user_id": "user_123"
  },
  "queued_at": 1705315800,
  "started_at": 1705315805,
  "completed_at": 1705316405
}
```

Webhook HTTP Headers:
- `Content-Type: application/json`
- `User-Agent: jan-response-api/1.0`
- `X-Jan-Event: response.completed` (or `response.failed`)
- `X-Jan-Response-ID: resp_abc123`
Webhook Delivery:
- Method: HTTP POST
- Retries: Up to 3 attempts with 2-second delays
- Timeout: 10 seconds per attempt
- Non-blocking: Webhook failures are logged but don't affect task completion
- Status Codes: 2xx considered success, all others trigger retry
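The delivery policy above can be sketched with the HTTP send injected as a callable, so the retry control flow is visible without a network. `deliver_webhook` and its signature are illustrative, not the service's internals.

```python
# Webhook delivery sketch: retry until a 2xx status or attempts run out.
import time

def deliver_webhook(payload, url, send, max_attempts=3, retry_delay=0.0):
    """Return True once send() reports a 2xx status; retry on any failure."""
    for attempt in range(max_attempts):
        try:
            status = send(url, payload)
        except OSError:                  # connection errors count as failures
            status = None
        if status is not None and 200 <= status < 300:
            return True                  # 2xx = success, stop retrying
        if attempt < max_attempts - 1:
            time.sleep(retry_delay)      # WEBHOOK_RETRY_DELAY equivalent
    return False                         # logged, never fails the task itself

# Stub that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky_send(url, payload):
    calls["n"] += 1
    return 500 if calls["n"] < 3 else 200

ok = deliver_webhook({"event": "response.completed"},
                     "https://example.com/hook", flaky_send)
```

The real sender would also enforce `WEBHOOK_TIMEOUT` per attempt and set the `User-Agent` and `X-Jan-*` headers listed above.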
- Requires `store=true`: Background tasks must be persisted to the database
- API Key Storage: The user's API key (Bearer token or `X-API-Key` header) is stored securely and used for LLM API calls during background execution
- Task Timeout: Tasks exceeding `BACKGROUND_TASK_TIMEOUT` will be marked as failed
- Queue Ordering: Tasks are processed in FIFO order based on the `queued_at` timestamp
- No Streaming: Background mode is incompatible with `stream: true`
- Worker Restart: In-progress tasks may fail if workers restart (status will show `failed`)
```
queued → in_progress → completed
queued → in_progress → failed
queued → cancelled
in_progress → cancelled (cooperative)
```
Valid Status Values:
- `queued` - Task waiting for a worker
- `in_progress` - Worker currently executing
- `completed` - Successfully finished
- `failed` - Error during execution
- `cancelled` - Cancelled by user
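The transitions can be written down as a table, which makes the cancellation rules (cooperative for in-progress tasks, no-op for terminal states) easy to check in client code:

```python
# Status state machine sketch matching the transition diagram above.
VALID = {
    "queued":      {"in_progress", "cancelled"},
    "in_progress": {"completed", "failed", "cancelled"},
    "completed":   set(),   # terminal
    "failed":      set(),   # terminal
    "cancelled":   set(),   # terminal
}

def transition(current: str, new: str) -> str:
    """Apply a status change, rejecting anything outside the diagram."""
    if new not in VALID[current]:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new

def cancel(current: str) -> str:
    """Cancel is a no-op on terminal states, per Cancellation Behavior."""
    return transition(current, "cancelled") if VALID[current] else current
```

A polling client can use `transition` to sanity-check the statuses it observes.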
Quick Test:
```shell
# 1. Create background task
RESP_ID=$(curl -s -X POST http://localhost:8000/responses/v1/responses \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jan-v2-30b",
    "input": "Write a haiku about coding",
    "background": true,
    "store": true,
    "metadata": {"webhook_url": "https://webhook.site/your-id"}
  }' | jq -r '.id')
echo "Created task: $RESP_ID"

# 2. Poll until complete
while true; do
  STATUS=$(curl -s -H "Authorization: Bearer <token>" \
    "http://localhost:8000/responses/v1/responses/$RESP_ID" \
    | jq -r '.status')
  echo "Status: $STATUS"
  [[ "$STATUS" == "completed" ]] || [[ "$STATUS" == "failed" ]] && break
  sleep 2
done

# 3. Get final result
curl -s -H "Authorization: Bearer <token>" \
  "http://localhost:8000/responses/v1/responses/$RESP_ID" | jq
```

Webhook Testing with webhook.site:
- Go to https://webhook.site/ to get a unique URL
- Use that URL as `metadata.webhook_url` in your request
- View received webhooks in the browser
Local Webhook Server:
Create a simple HTTP server that accepts POST requests to /webhook and logs the received event data.
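One way to write that server — a hypothetical `webhook_server.py` sketch; the port and log format are arbitrary choices:

```python
# Minimal webhook receiver: accepts POSTs, logs the X-Jan-Event header and
# the response id from the JSON body, replies 200.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        print(f"[{self.headers.get('X-Jan-Event', 'unknown')}] id={body.get('id')}")
        self.send_response(200)  # any 2xx stops the Response API's retries
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 9000), WebhookHandler).serve_forever()
```

When the Response API runs in Docker, `host.docker.internal` lets it reach this server on the host machine.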
```shell
# Run webhook server
python webhook_server.py

# Use http://host.docker.internal:9000/webhook in requests
```

Comprehensive test suite at `tests/automation/responses-background-webhook.json`:
Test Suites:
- Setup & Authentication
- Basic Background Mode
- Background with Webhooks
- Background with Tool Calling
- Cancellation
- Conversation Continuity
- Error Handling
- Complex Scenarios
- Monitoring & Observability
- Long-Running Research Task
Running Tests:
```shell
# Run all tests
jan-cli api-test run tests/automation/responses-background-webhook.json \
  --timeout-request 60000

# Export results
jan-cli api-test run tests/automation/responses-background-webhook.json \
  --timeout-request 60000 \
  --reporters cli,json
```

Symptoms: Tasks remain in `queued` status indefinitely
Solutions:
- Check worker logs: `docker logs <response-api-container> --tail 100`
- Verify workers started: look for "worker X started" messages
- Check that `BACKGROUND_WORKER_COUNT` > 0
- Verify database connectivity
- Check for database locks: `SELECT * FROM pg_locks WHERE granted = false;`
Symptoms: Workers running but queue depth not decreasing
Solutions:
- Verify the `BACKGROUND_POLL_INTERVAL` setting
- Check worker logs for errors
- Ensure tasks have `background=true` and `store=true`
- Check LLM API availability: `curl http://llm-api:8080/healthz`
Symptoms: Tasks complete but webhooks not received
Solutions:
- Test the webhook URL: `curl -X POST <webhook_url> -d '{"test":"data"}'`
- Use `http://host.docker.internal:<port>` for local development
- Check response-api logs for webhook errors
- Verify the webhook endpoint returns a 2xx status
- Check firewall/network policies
Symptoms: Tasks marked as failed with timeout errors
Solutions:
- Increase `BACKGROUND_TASK_TIMEOUT` (default: 600s)
- Optimize prompts to reduce processing time
- Check LLM API response times
- Monitor tool execution duration in logs
- Consider breaking into smaller tasks
Symptoms: Many queued tasks, slow processing
Solutions:
- Increase `BACKGROUND_WORKER_COUNT`
- Scale horizontally: run multiple response-api instances
- Monitor database performance
- Check LLM API rate limits
- Optimize tool execution times