[codex] Add vLLM backend for Qwen LMM adapter by hansent · Pull Request #2389 · roboflow/inference

hansent · 2026-05-29T19:26:17Z

Summary

Adds an env-gated vLLM backend path for the Qwen 3.5 VLM adapter so native Qwen workflow blocks can keep using the existing Inference/LMM request surface while delegating generation to an OpenAI-compatible vLLM server.

Changes

Adds VLLM_LMM_* environment variables for enabling delegation, configuring the backend URL, served model name, API key, timeout, and temperature.
Updates InferenceModelsQwen35VLAdapter to skip local AutoModel.from_pretrained(...) when VLLM_LMM_ENABLED=true and call chat.completions.create(...) on the configured vLLM backend instead.
Converts the native Qwen prompt convention (prompt<system_prompt>system) into OpenAI-compatible chat messages.
Preserves already-base64 LMM request images by wrapping the existing payload as a data URL, avoiding an extra JPEG recompress before sending to vLLM. Non-base64 image inputs still use the existing image loading fallback.
Adds unit tests covering URL normalization, prompt splitting, skipped local model loading, vLLM payload construction, and base64 image preservation.

Deploy Notes

For the PlayOn custom image/deployment, the intended env shape is:

VLLM_LMM_ENABLED=true
VLLM_LMM_BASE_URL=http://playon-vllm.playon-vllm.svc.cluster.local:8000
VLLM_LMM_MODEL_NAME=vlm-ocr-14
VLLM_LMM_TEMPERATURE=0
VLLM_LMM_TIMEOUT_SECONDS=120

VLLM_LMM_BASE_URL may omit /v1; the adapter appends it before constructing the OpenAI client.

Validation

uv run --no-project --python /Users/hansent/code/inference/.venv/bin/python pytest \
  tests/inference/unit_tests/models/test_qwen3_5vl_vllm_adapter.py \
  tests/workflows/unit_tests/core_steps/models/foundation/test_qwen_vlm.py \
  tests/workflows/unit_tests/core_steps/models/foundation/test_vlm_remote_execution.py

Result: 36 passed.

hansent added 2 commits May 29, 2026 14:13

Add vLLM backend for Qwen LMM adapter

e485ed3

Preserve base64 images for Qwen vLLM backend

dd49ae2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[codex] Add vLLM backend for Qwen LMM adapter#2389

[codex] Add vLLM backend for Qwen LMM adapter#2389
hansent wants to merge 2 commits into
mainfrom
hansent/playon-vllm-codex-block

hansent commented May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hansent commented May 29, 2026

Summary

Changes

Deploy Notes

Validation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant