Description

Adds streaming support for batched prompts, resolving issue #406 ("ValueError: Streaming is not supported for batched prompts. Let us know if you need this feature.").

Changes

  • Removed the restriction in _sampler.py that blocked streaming for batched prompts
  • Fixed _stream_sample_loop to wait for ALL batch elements to complete, not just the first (see the sketch after this list)
  • Updated _stream_decode_state to handle batch dimensions correctly
  • Net reduction of 3 lines of code
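
To illustrate the gist of the _stream_sample_loop change, here is a minimal sketch. DecodeState, step_fn, and the field layout are assumptions for illustration, not the actual code in _sampler.py; only the exit condition mirrors the fix described above.

```python
# Sketch of the exit-condition fix (illustrative names; not the real code).
from dataclasses import dataclass

import jax.numpy as jnp


@dataclass
class DecodeState:
    tokens: jnp.ndarray  # (batch,) tokens emitted at the current step
    done: jnp.ndarray    # (batch,) bool, True once a prompt has hit EOS


def stream_sample_loop(state, step_fn, max_steps):
    """Yield per-step tokens until ALL batch elements have finished."""
    for _ in range(max_steps):
        state = step_fn(state)
        yield state.tokens
        # The old behavior stopped as soon as the FIRST element finished
        # (roughly `if state.done[0]: break`), which truncated the
        # remaining completions in the batch.
        if bool(jnp.all(state.done)):
            break
```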

Testing

  • Tested with Gemma3_4B on an NVIDIA L40S GPU
  • Single-prompt streaming: works (unchanged)
  • Batched non-streaming: works (unchanged)
  • Batched streaming: now works (new feature; see the usage sketch after this list)
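
For reference, a hedged usage sketch of what now works. The sampler construction, the stream keyword, and the shape of the streamed chunks are assumptions; check the library's sampler docs for the real signature.

```python
# Hypothetical usage; `sampler` stands in for an already-constructed
# Gemma3_4B sampler, and `stream=True` is an assumed keyword argument.
prompts = [
    "Explain attention in one sentence.",
    "Write a haiku about GPUs.",
]

# Before this PR the call below raised:
#   ValueError: Streaming is not supported for batched prompts. ...
# It now yields incremental output covering every prompt in the batch.
for chunk in sampler(prompts, stream=True):
    print(chunk)
```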

Backward Compatibility

Fully backward compatible; no breaking changes to the existing API.

Fixes #406
