
@AviadHayumi

Summary

Add a new DEFAULT_MAX_TOKENS environment variable that sets a global default for max_tokens across all models, eliminating the need to configure max_tokens individually for each model in the MODELS env var.

Why

When using OpenAI-compatible backends like NVIDIA NIM, not setting max_tokens can cause issues:

  • NIM may use very high default values (e.g., 131072) that exceed the model's context window
  • This leads to errors like "Input length + max new tokens > max sequence length"

Before this change, users had to configure both OPENAI_BASE_URL and MODELS just to set max_tokens:

OPENAI_BASE_URL=http://nim-service.namespace.svc.cluster.local/v1
MODELS='[{"id":"meta/llama-3.1-8b-instruct","name":"Llama 3.1 8B (NIM)","parameters":{"max_tokens":4096}}]'

After this change, users can simply set:

OPENAI_BASE_URL=http://nim-service.namespace.svc.cluster.local/v1
DEFAULT_MAX_TOKENS=4096

This is much simpler: there is no need to duplicate the model ID/name or reach for the MODELS config just to set token limits.
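For illustration, here is a minimal sketch of the fallback behavior this enables, assuming the variable is read from process.env (the actual change resolves it through src/lib/server/config.ts, and resolveMaxTokens and ModelParameters below are illustrative names, not the PR's real diff):

// Illustrative sketch only: the real change reads DEFAULT_MAX_TOKENS via the
// server config layer rather than from process.env directly.
interface ModelParameters {
  max_tokens?: number;
}

function resolveMaxTokens(parameters?: ModelParameters): number | undefined {
  const raw = process.env.DEFAULT_MAX_TOKENS;
  const globalDefault = raw ? Number.parseInt(raw, 10) : undefined;
  // The per-model parameters.max_tokens takes priority; otherwise fall back
  // to the global default; if neither is set, stay undefined (existing behavior).
  return parameters?.max_tokens ?? globalDefault;
}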

Changes

  • src/lib/server/config.ts: add DEFAULT_MAX_TOKENS to the ExtraConfigKeys type (sketched below)
  • src/lib/server/endpoints/openai/endpointOai.ts: use DEFAULT_MAX_TOKENS as the fallback when the model's max_tokens parameter is not set
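On the config.ts side the change amounts to registering the new key. As a rough sketch (the actual ExtraConfigKeys definition in the repo may be shaped differently):

// Hypothetical shape; this only illustrates that DEFAULT_MAX_TOKENS becomes
// a recognized config key alongside the existing ones.
type ExtraConfigKeys = "OPENAI_BASE_URL" | "DEFAULT_MAX_TOKENS"; // plus other existing keys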

Test plan

  • npm run check passes
  • Tested with NVIDIA NIM - requests now include reasonable max_tokens values
  • Verified that per-model parameters.max_tokens still takes priority over DEFAULT_MAX_TOKENS
  • Verified that omitting both leaves max_tokens undefined (existing behavior); see the sketch after this list
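The two precedence bullets can be exercised against the hypothetical resolveMaxTokens sketch from the Why section (values here are illustrative):

// With DEFAULT_MAX_TOKENS=4096 in the environment:
resolveMaxTokens({ max_tokens: 2048 }); // => 2048 (per-model value wins)
resolveMaxTokens({});                   // => 4096 (global default applies)
// With DEFAULT_MAX_TOKENS unset and no per-model value:
resolveMaxTokens({});                   // => undefined (existing behavior)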


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 24017e6f5a


AviadHayumi force-pushed the feat/default-max-tokens branch from 24017e6 to 0a4586e on January 22, 2026
gary149 (Collaborator) commented on Jan 26, 2026

ok

gary149 closed this on Jan 26, 2026
gary149 reopened this on Jan 26, 2026