Skip to content

[codex] Add vLLM backend for Qwen LMM adapter#2389

Draft
hansent wants to merge 2 commits into
mainfrom
hansent/playon-vllm-codex-block
Draft

[codex] Add vLLM backend for Qwen LMM adapter#2389
hansent wants to merge 2 commits into
mainfrom
hansent/playon-vllm-codex-block

Conversation

@hansent
Copy link
Copy Markdown
Collaborator

@hansent hansent commented May 29, 2026

Summary

Adds an env-gated vLLM backend path for the Qwen 3.5 VLM adapter so native Qwen workflow blocks can keep using the existing Inference/LMM request surface while delegating generation to an OpenAI-compatible vLLM server.

Changes

  • Adds VLLM_LMM_* environment variables for enabling delegation, configuring the backend URL, served model name, API key, timeout, and temperature.
  • Updates InferenceModelsQwen35VLAdapter to skip local AutoModel.from_pretrained(...) when VLLM_LMM_ENABLED=true and call chat.completions.create(...) on the configured vLLM backend instead.
  • Converts the native Qwen prompt convention (prompt<system_prompt>system) into OpenAI-compatible chat messages.
  • Preserves already-base64 LMM request images by wrapping the existing payload as a data URL, avoiding an extra JPEG recompress before sending to vLLM. Non-base64 image inputs still use the existing image loading fallback.
  • Adds unit tests covering URL normalization, prompt splitting, skipped local model loading, vLLM payload construction, and base64 image preservation.

Deploy Notes

For the PlayOn custom image/deployment, the intended env shape is:

VLLM_LMM_ENABLED=true
VLLM_LMM_BASE_URL=http://playon-vllm.playon-vllm.svc.cluster.local:8000
VLLM_LMM_MODEL_NAME=vlm-ocr-14
VLLM_LMM_TEMPERATURE=0
VLLM_LMM_TIMEOUT_SECONDS=120

VLLM_LMM_BASE_URL may omit /v1; the adapter appends it before constructing the OpenAI client.

Validation

uv run --no-project --python /Users/hansent/code/inference/.venv/bin/python pytest \
  tests/inference/unit_tests/models/test_qwen3_5vl_vllm_adapter.py \
  tests/workflows/unit_tests/core_steps/models/foundation/test_qwen_vlm.py \
  tests/workflows/unit_tests/core_steps/models/foundation/test_vlm_remote_execution.py

Result: 36 passed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant