The Response API executes tools and generates AI responses for complex tasks.
Examples: the API examples index includes cURL/SDK snippets for orchestration flows.
- Direct access: http://localhost:8082
- Through gateway: http://localhost:8000/responses (Kong prefixes /responses)
- Inside Docker: http://response-api:8082
All endpoints require authentication through the Kong gateway.
For complete authentication documentation, see the Authentication Guide.
Quick example:
```shell
# Get guest token
TOKEN=$(curl -s -X POST http://localhost:8000/llm/auth/guest-login | jq -r '.access_token')

# Use in requests
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:8000/responses/v1/responses
```

The Response API can:
- Run tools automatically - AI decides which tools to use
- Chain tools together - Use output from one tool as input to another (up to 8 steps)
- Get final answers - LLM generates natural language response from tool results
- Track execution - See which tools ran and how long they took
Not sure if you need the Response API? See the Decision Guide: LLM API vs Response API to choose the right approach.
| Component | Port | Key Environment Variables |
|---|---|---|
| HTTP Server | 8082 | RESPONSE_API_PORT |
| Database (PostgreSQL) | 5432 | DB_POSTGRESQL_WRITE_DSN, DB_POSTGRESQL_READ1_DSN |
| LLM API upstream | 8080 | RESPONSE_LLM_API_URL |
| MCP Tools upstream | 8091 | RESPONSE_MCP_TOOLS_URL |
```shell
RESPONSE_API_PORT=8082
DB_POSTGRESQL_WRITE_DSN=postgres://response_api:password@api-db:5432/response_api?sslmode=disable
# Optional read replica
DB_POSTGRESQL_READ1_DSN=postgres://response_ro:password@api-db-ro:5432/response_api?sslmode=disable

# Upstream services
RESPONSE_LLM_API_URL=http://llm-api:8080
RESPONSE_MCP_TOOLS_URL=http://mcp-tools:8091

# Tool execution limits
RESPONSE_MAX_TOOL_DEPTH=8
TOOL_EXECUTION_TIMEOUT=45s

# Logging and tracing
RESPONSE_LOG_LEVEL=info
ENABLE_TRACING=false
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317

# Auth (when fronted by Kong or called directly with JWT)
AUTH_ENABLED=true
AUTH_ISSUER=http://localhost:8085/realms/jan
ACCOUNT=account
AUTH_JWKS_URL=http://keycloak:8085/realms/jan/protocol/openid-connect/certs
```

POST /v1/responses
Create a new response with automatic tool orchestration.
```shell
curl -X POST http://localhost:8000/responses/v1/responses \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jan-v2-30b",
    "input": "Search for the latest AI news and summarize the top 3 results",
    "temperature": 0.3,
    "tool_choice": {"type": "auto"},
    "stream": false
  }'
```

Request Body (subset of `CreateResponseRequest`):
- `model` (required) - Model identifier understood by the LLM API/catalog
- `input` (required) - User prompt (string or structured object)
- `system_prompt` (optional) - Instruction prepended before each run
- `temperature`, `max_tokens` (optional) - Generation controls
- `tools` (optional) - Override available tools (OpenAI-compatible format)
- `tool_choice` (optional) - `{"type": "auto" | "none" | "required", "function": {"name": "tool"}}`
- `stream` (optional) - `true` to receive SSE events
- `conversation` (optional) - Attach to an existing conversation ID
- `previous_response_id` (optional) - Continue from a prior response
- `metadata`, `user` (optional) - Free-form payload that is persisted with the response
Response:
```json
{
  "id": "resp_01hqr8v9k2x3f4g5h6j7k8m9n0",
  "model": "jan-v2-30b",
  "input": "Search for the latest AI news and summarize the top 3 results",
  "output": "Here are the latest AI news items...",
  "tool_executions": [
    {
      "id": "toolexec_123",
      "tool": "google_search",
      "input": { "q": "latest AI news", "num": 3 },
      "output": "...",
      "duration_ms": 250
    }
  ],
  "execution_metadata": {
    "max_depth": 8,
    "actual_depth": 1,
    "total_duration_ms": 2500,
    "status": "completed"
  },
  "created_at": "2025-11-10T10:30:00Z",
  "updated_at": "2025-11-10T10:30:02.500Z"
}
```

Enable `stream: true` to receive incremental events (`text/event-stream`), matching the SSE observer in `services/response-api/internal/interfaces/httpserver/handlers/response_handler.go`.
```shell
curl -N http://localhost:8000/responses/v1/responses \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{
    "model": "jan-v2-30b",
    "input": "Search for the latest AI news and summarize the top 3 results",
    "stream": true
  }'
```

The stream emits events such as `response.created`, `response.tool_call`, `response.output_text.delta`, and `response.completed`.
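A client has to split that stream back into individual events. The sketch below parses the standard `event:`/`data:` framing of `text/event-stream`; the JSON payload shapes inside `data:` are illustrative assumptions, only the event names come from the API.

```python
# Minimal SSE parser sketch: collect "event:"/"data:" lines and emit an
# (event, data) pair at each blank-line boundary.
import json

def parse_sse(raw: str):
    """Yield (event, data) tuples from a text/event-stream payload."""
    event, data_lines = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event is not None:
            yield event, "\n".join(data_lines)
            event, data_lines = None, []

# Sample stream with assumed payload fields:
stream = (
    "event: response.created\n"
    'data: {"id": "resp_123"}\n'
    "\n"
    "event: response.output_text.delta\n"
    'data: {"delta": "Here are"}\n'
    "\n"
)
events = list(parse_sse(stream))
```

A real client would feed chunks from the HTTP response body into the same state machine instead of a complete string.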
GET /v1/responses/{response_id}
Retrieve a specific response and its execution metadata.
```shell
curl -H "Authorization: Bearer <token>" \
  http://localhost:8000/responses/v1/responses/resp_01hqr8v9k2x3f4g5h6j7k8m9n0
```

DELETE /v1/responses/{response_id}
```shell
curl -X DELETE -H "Authorization: Bearer <token>" \
  http://localhost:8000/responses/v1/responses/resp_01hqr8v9k2x3f4g5h6j7k8m9n0
```

POST /v1/responses/{response_id}/cancel
```shell
curl -X POST -H "Authorization: Bearer <token>" \
  http://localhost:8000/responses/v1/responses/resp_01hqr8v9k2x3f4g5h6j7k8m9n0/cancel
```

GET /v1/responses/{response_id}/input_items
Returns the normalized conversation items that were sent to the LLM (useful for replaying the request or for debugging tool runs).
```shell
curl -H "Authorization: Bearer <token>" \
  http://localhost:8000/responses/v1/responses/resp_01hqr8v9k2x3f4g5h6j7k8m9n0/input_items
```

The Response API does not currently expose a list endpoint for all responses. Persisted executions can be queried directly from the service database.
GET /healthz
```shell
# Gateway
curl http://localhost:8000/responses/healthz

# Direct
curl http://localhost:8082/healthz
```

When a response is created, the service proceeds through these steps:
- Validate input parameters
- Check tool availability via MCP Tools
- Query MCP Tools for available tools
- Build tool call graph
- Execute tools in sequence/parallel as needed
- Apply depth limit (max 8)
- Apply timeout per tool (45s)
- Pass tool results to LLM API
- Generate final response using context
- Store execution trace in PostgreSQL
- Record tool outputs and timing
- Return complete execution metadata
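The steps above reduce to a single loop: ask the LLM, run any requested tool, feed the result back, and stop at the depth limit. Everything in this sketch (`call_llm`, `run_tool`, the reply shapes) is an illustrative stand-in, not the service's internal API.

```python
# Orchestration loop sketch with a depth limit. call_llm returns either
# {"tool": ..., "args": ...} (run a tool) or {"text": ...} (final answer).
def orchestrate(call_llm, run_tool, user_input, max_depth=8):
    context = [{"role": "user", "content": user_input}]
    executions = []
    for _depth in range(max_depth):
        reply = call_llm(context)
        if "tool" not in reply:               # plain text -> final answer
            return reply["text"], executions
        result = run_tool(reply["tool"], reply["args"])
        executions.append({"tool": reply["tool"], "output": result})
        context.append({"role": "tool", "content": result})
    raise RuntimeError("max_depth_exceeded")  # mirrors the API error code

# Stubbed behavior: one search call, then a final answer.
replies = iter([
    {"tool": "google_search", "args": {"q": "latest AI news"}},
    {"text": "Here are the top results..."},
])
output, trace = orchestrate(lambda ctx: next(replies),
                            lambda name, args: "3 articles found",
                            "Search for the latest AI news")
```

The real service also applies the per-tool timeout and persists `executions` to PostgreSQL at the end of the run.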
`RESPONSE_MAX_TOOL_DEPTH` limits how deep tool calls can chain:
- Value: 1-15 (default: 8)
- Meaning: Maximum recursive depth of tool calls
- Example: search -> extract -> summarize = depth 2
`TOOL_EXECUTION_TIMEOUT` sets the per-tool-call timeout:
- Value: Duration string (default: 45s)
- Example: "30s", "1m", "500ms"
- Behavior: Cancels tool if it exceeds timeout
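The duration strings follow Go's format. A rough parser sketch for the units shown above (`ms`, `s`, `m`, plus `h` as an assumption), accumulating in milliseconds so the results stay exact:

```python
# Parse Go-style duration strings like "45s", "1m30s", "500ms" into seconds.
import re

_UNITS_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}

def parse_duration(s: str) -> float:
    """Return the total number of seconds expressed by the duration string."""
    ms = 0.0
    # "ms" must come before "s"/"m" in the alternation to match greedily.
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)(ms|s|m|h)", s):
        ms += float(value) * _UNITS_MS[unit]
    return ms / 1000.0
```

In Go itself this is just `time.ParseDuration`; the sketch exists so a non-Go client can validate config values the same way.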
| Status | Error | Cause |
|---|---|---|
| 400 | Invalid request | Missing/malformed parameters |
| 404 | Response not found | Invalid response ID |
| 408 | Tool execution timeout | Tool exceeded timeout |
| 500 | Execution error | Tool or LLM error |
Example error:
```json
{
  "error": {
    "message": "Tool execution exceeded maximum depth",
    "type": "execution_error",
    "code": "max_depth_exceeded"
  }
}
```

The Response API integrates with:
- LLM API (Port 8080) - Generates final response
- MCP Tools (Port 8091) - Tool execution and discovery
- Kong Gateway (Port 8000) - API routing
- PostgreSQL - Execution storage
```shell
# Conservative
MAX_TOOL_EXECUTION_DEPTH=1   # Single tool call only
TOOL_EXECUTION_TIMEOUT=15s   # Short timeout

# Permissive
MAX_TOOL_EXECUTION_DEPTH=8   # Allow up to 8 levels
TOOL_EXECUTION_TIMEOUT=120s  # Long timeout for complex work
```

Requests routed through Kong (`http://localhost:8000/responses/...`) must include either:
- `Authorization: Bearer <token>` (Keycloak JWT - guest tokens work for local testing)
- `X-API-Key: sk_*` (custom plugin managed by Kong)
When `AUTH_ENABLED=true`, the service also validates JWTs on port 8082. Use the gateway path whenever possible for rate limiting and centralized logging.
The Response API supports OpenAI-compatible background mode for asynchronous response generation. This allows clients to submit long-running requests without holding open HTTP connections.
Components:
- PostgreSQL-backed Queue: Uses the `responses` table with `SELECT FOR UPDATE SKIP LOCKED` for reliable task distribution
- Worker Pool: Fixed-size pool of background workers (default: 4) that poll for queued tasks
- Webhook Notifications: HTTP POST callbacks when tasks complete or fail
- Graceful Cancellation: Queued or in-progress tasks can be cancelled
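The queue claim step amounts to "atomically flip the oldest queued row to in_progress". Postgres gets this with `SELECT ... FOR UPDATE SKIP LOCKED`, which lets concurrent workers each grab a different row; the sketch below only mimics the claim shape against SQLite so it stays self-contained and runnable.

```python
# Queue-claim sketch: FIFO by queued_at, one row per claim.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (id TEXT PRIMARY KEY, status TEXT, queued_at INTEGER)")
conn.executemany("INSERT INTO responses VALUES (?, 'queued', ?)",
                 [("resp_a", 1), ("resp_b", 2)])

def claim_next(conn):
    """Claim the oldest queued task and mark it in_progress; None if empty."""
    row = conn.execute(
        "SELECT id FROM responses WHERE status = 'queued' "
        "ORDER BY queued_at LIMIT 1").fetchone()
    if row is None:
        return None
    # Re-checking status in the WHERE clause prevents double-claiming;
    # Postgres workers rely on SKIP LOCKED for the same guarantee.
    cur = conn.execute(
        "UPDATE responses SET status = 'in_progress' "
        "WHERE id = ? AND status = 'queued'", (row[0],))
    return row[0] if cur.rowcount == 1 else None
```

The table name and columns here echo the `responses` table described above, but the schema is a simplified assumption.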
Task Lifecycle:
Client Request (background=true, store=true)
↓
Create Response (status=queued, queued_at=now)
↓
Return Response Immediately (201 Created)
↓
Worker Dequeues Task
↓
Mark Processing (status=in_progress, started_at=now)
↓
Execute LLM Orchestration with Tool Calls
↓
Update Status (completed/failed, completed_at=now)
↓
Send Webhook Notification (async, non-blocking)
Add these environment variables to enable background mode:
```shell
# Worker Pool
BACKGROUND_WORKER_COUNT=4     # Number of concurrent workers
BACKGROUND_POLL_INTERVAL=2s   # How often workers check for queued tasks
BACKGROUND_TASK_TIMEOUT=600s  # Max execution time per task (10 minutes)

# Webhook Delivery
WEBHOOK_MAX_RETRIES=3         # Retry attempts for failed webhooks
WEBHOOK_RETRY_DELAY=2s        # Delay between retry attempts
WEBHOOK_TIMEOUT=10s           # HTTP timeout per webhook attempt
WEBHOOK_USER_AGENT=jan-response-api/1.0
```

Recommended Settings:
| Environment | Workers | Poll Interval | Task Timeout | Use Case |
|---|---|---|---|---|
| Development | 2-4 | 2s | 600s (10m) | Local testing, fast iteration |
| Production | 8-16 | 5s | 1200s (20m) | High throughput, complex tasks |
| High-load | 16-32 | 3s | 900s (15m) | Many concurrent tasks |
Add `"background": true` and `"store": true` to any response request:
Request:
```shell
curl -X POST http://localhost:8000/responses/v1/responses \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jan-v2-30b",
    "input": "Write a comprehensive analysis of quantum computing trends",
    "background": true,
    "store": true,
    "metadata": {
      "webhook_url": "https://example.com/webhooks/responses",
      "user_id": "user_123"
    }
  }'
```

Response (201 Created):
```json
{
  "id": "resp_abc123",
  "object": "response",
  "status": "queued",
  "background": true,
  "store": true,
  "queued_at": 1705315800,
  "created_at": 1705315800,
  "model": "jan-v2-30b",
  "input": "Write a comprehensive analysis...",
  "metadata": {
    "webhook_url": "https://example.com/webhooks/responses",
    "user_id": "user_123"
  }
}
```

Use the standard GET endpoint to check task status:
Request:
```shell
curl -H "Authorization: Bearer <token>" \
  http://localhost:8000/responses/v1/responses/resp_abc123
```

Response (Queued):
```json
{
  "id": "resp_abc123",
  "status": "queued",
  "queued_at": 1705315800,
  ...
}
```

Response (In Progress):
```json
{
  "id": "resp_abc123",
  "status": "in_progress",
  "queued_at": 1705315800,
  "started_at": 1705315805,
  ...
}
```

Response (Completed):
```json
{
  "id": "resp_abc123",
  "status": "completed",
  "output": "The comprehensive analysis of quantum computing trends...",
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 500,
    "total_tokens": 650
  },
  "queued_at": 1705315800,
  "started_at": 1705315805,
  "completed_at": 1705316122,
  "tool_executions": [...],
  ...
}
```

Use the cancel endpoint:
Request:
```shell
curl -X POST -H "Authorization: Bearer <token>" \
  http://localhost:8000/responses/v1/responses/resp_abc123/cancel
```

Response:
```json
{
  "id": "resp_abc123",
  "status": "cancelled",
  "cancelled_at": 1705315860,
  ...
}
```

Cancellation Behavior:
- If status is `queued`: Immediately marks cancelled, prevents worker pickup
- If status is `in_progress`: Marks cancelled, but the task may complete normally (cooperative cancellation)
- If status is `completed` or `failed`: No-op, returns current state
When a background task completes or fails, the Response API sends an HTTP POST to the webhook URL specified in `metadata.webhook_url`.
Webhook Payload (Completed):
```json
{
  "id": "resp_abc123",
  "event": "response.completed",
  "status": "completed",
  "output": "The response content...",
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 500,
    "total_tokens": 650
  },
  "tool_executions": [...],
  "metadata": {
    "webhook_url": "https://example.com/webhooks/responses",
    "user_id": "user_123"
  },
  "queued_at": 1705315800,
  "started_at": 1705315805,
  "completed_at": 1705316122
}
```

Webhook Payload (Failed):
```json
{
  "id": "resp_abc123",
  "event": "response.failed",
  "status": "failed",
  "error": {
    "code": "execution_failed",
    "message": "LLM provider timeout after 600s"
  },
  "metadata": {
    "webhook_url": "https://example.com/webhooks/responses",
    "user_id": "user_123"
  },
  "queued_at": 1705315800,
  "started_at": 1705315805,
  "completed_at": 1705316405
}
```

Webhook HTTP Headers:
- `Content-Type: application/json`
- `User-Agent: jan-response-api/1.0`
- `X-Jan-Event: response.completed` (or `response.failed`)
- `X-Jan-Response-ID: resp_abc123`
Webhook Delivery:
- Method: HTTP POST
- Retries: Up to 3 attempts with 2-second delays
- Timeout: 10 seconds per attempt
- Non-blocking: Webhook failures are logged but don't affect task completion
- Status Codes: 2xx considered success, all others trigger retry
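The delivery policy above can be sketched with the HTTP send injected as a callable, so the retry control flow is visible without a network. `deliver_webhook` and its signature are illustrative, not the service's internals.

```python
# Webhook delivery sketch: retry until a 2xx status or attempts run out.
import time

def deliver_webhook(payload, url, send, max_attempts=3, retry_delay=0.0):
    """Return True once send() reports a 2xx status; retry on any failure."""
    for attempt in range(max_attempts):
        try:
            status = send(url, payload)
        except OSError:                  # connection errors count as failures
            status = None
        if status is not None and 200 <= status < 300:
            return True                  # 2xx = success, stop retrying
        if attempt < max_attempts - 1:
            time.sleep(retry_delay)      # WEBHOOK_RETRY_DELAY equivalent
    return False                         # logged, never fails the task itself

# Stub that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky_send(url, payload):
    calls["n"] += 1
    return 500 if calls["n"] < 3 else 200

ok = deliver_webhook({"event": "response.completed"},
                     "https://example.com/hook", flaky_send)
```

The real sender would also enforce `WEBHOOK_TIMEOUT` per attempt and set the `User-Agent` and `X-Jan-*` headers listed above.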
- Requires `store=true`: Background tasks must be persisted to the database
- API Key Storage: The user's API key (Bearer token or `X-API-Key` header) is stored securely and used for LLM API calls during background execution
- Task Timeout: Tasks exceeding `BACKGROUND_TASK_TIMEOUT` will be marked as failed
- Queue Ordering: Tasks are processed in FIFO order based on the `queued_at` timestamp
- No Streaming: Background mode is incompatible with `stream: true`
- Worker Restart: In-progress tasks may fail if workers restart (status will show `failed`)
```
queued → in_progress → completed
queued → in_progress → failed
queued → cancelled
in_progress → cancelled (cooperative)
```
Valid Status Values:
- `queued` - Task waiting for a worker
- `in_progress` - Worker currently executing
- `completed` - Successfully finished
- `failed` - Error during execution
- `cancelled` - Cancelled by user
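The transitions can be written down as a table, which makes the cancellation rules (cooperative for in-progress tasks, no-op for terminal states) easy to check in client code:

```python
# Status state machine sketch matching the transition diagram above.
VALID = {
    "queued":      {"in_progress", "cancelled"},
    "in_progress": {"completed", "failed", "cancelled"},
    "completed":   set(),   # terminal
    "failed":      set(),   # terminal
    "cancelled":   set(),   # terminal
}

def transition(current: str, new: str) -> str:
    """Apply a status change, rejecting anything outside the diagram."""
    if new not in VALID[current]:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new

def cancel(current: str) -> str:
    """Cancel is a no-op on terminal states, per Cancellation Behavior."""
    return transition(current, "cancelled") if VALID[current] else current
```

A polling client can use `transition` to sanity-check the statuses it observes.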
Quick Test:
```shell
# 1. Create background task
RESP_ID=$(curl -s -X POST http://localhost:8000/responses/v1/responses \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jan-v2-30b",
    "input": "Write a haiku about coding",
    "background": true,
    "store": true,
    "metadata": {"webhook_url": "https://webhook.site/your-id"}
  }' | jq -r '.id')
echo "Created task: $RESP_ID"

# 2. Poll until complete
while true; do
  STATUS=$(curl -s -H "Authorization: Bearer <token>" \
    "http://localhost:8000/responses/v1/responses/$RESP_ID" \
    | jq -r '.status')
  echo "Status: $STATUS"
  [[ "$STATUS" == "completed" ]] || [[ "$STATUS" == "failed" ]] && break
  sleep 2
done

# 3. Get final result
curl -s -H "Authorization: Bearer <token>" \
  "http://localhost:8000/responses/v1/responses/$RESP_ID" | jq
```

Webhook Testing with webhook.site:
- Go to https://webhook.site/ to get a unique URL
- Use that URL as `metadata.webhook_url` in your request
- View received webhooks in the browser
Local Webhook Server:
Create a simple HTTP server that accepts POST requests to /webhook and logs the received event data.
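One way to write that server — a hypothetical `webhook_server.py` sketch; the port and log format are arbitrary choices:

```python
# Minimal webhook receiver: accepts POSTs, logs the X-Jan-Event header and
# the response id from the JSON body, replies 200.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        print(f"[{self.headers.get('X-Jan-Event', 'unknown')}] id={body.get('id')}")
        self.send_response(200)  # any 2xx stops the Response API's retries
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 9000), WebhookHandler).serve_forever()
```

When the Response API runs in Docker, `host.docker.internal` lets it reach this server on the host machine.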
```shell
# Run webhook server
python webhook_server.py

# Use http://host.docker.internal:9000/webhook in requests
```

Comprehensive test suite at `tests/automation/responses-background-webhook.json`:
Test Suites:
- Setup & Authentication
- Basic Background Mode
- Background with Webhooks
- Background with Tool Calling
- Cancellation
- Conversation Continuity
- Error Handling
- Complex Scenarios
- Monitoring & Observability
- Long-Running Research Task
Running Tests:
```shell
# Run all tests
jan-cli api-test run tests/automation/responses-background-webhook.json \
  --timeout-request 60000

# Export results
jan-cli api-test run tests/automation/responses-background-webhook.json \
  --timeout-request 60000 \
  --reporters cli,json
```

Symptoms: Tasks remain in `queued` status indefinitely
Solutions:
- Check worker logs: `docker logs <response-api-container> --tail 100`
- Verify workers started: look for "worker X started" messages
- Check that `BACKGROUND_WORKER_COUNT` > 0
- Verify database connectivity
- Check for database locks: `SELECT * FROM pg_locks WHERE granted = false;`
Symptoms: Workers running but queue depth not decreasing
Solutions:
- Verify the `BACKGROUND_POLL_INTERVAL` setting
- Check worker logs for errors
- Ensure tasks have `background=true` and `store=true`
- Check LLM API availability: `curl http://llm-api:8080/healthz`
Symptoms: Tasks complete but webhooks not received
Solutions:
- Test the webhook URL: `curl -X POST <webhook_url> -d '{"test":"data"}'`
- Use `http://host.docker.internal:<port>` for local development
- Check response-api logs for webhook errors
- Verify the webhook endpoint returns a 2xx status
- Check firewall/network policies
Symptoms: Tasks marked as failed with timeout errors
Solutions:
- Increase `BACKGROUND_TASK_TIMEOUT` (default: 600s)
- Optimize prompts to reduce processing time
- Check LLM API response times
- Monitor tool execution duration in logs
- Consider breaking into smaller tasks
Symptoms: Many queued tasks, slow processing
Solutions:
- Increase `BACKGROUND_WORKER_COUNT`
- Scale horizontally: run multiple response-api instances
- Monitor database performance
- Check LLM API rate limits
- Optimize tool execution times