Support both callback and generator-based streaming in all Chat Generators #8742

Open

vblagoje opened this issue Jan 17, 2025 · 4 comments

@vblagoje (Member)

Motivation

Currently, all chat generator components (OpenAI, Anthropic, etc.) support streaming only through callbacks:

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage, StreamingChunk

def callback(chunk: StreamingChunk):
    print(chunk.content)

messages = [ChatMessage.from_user("Hello")]
generator = OpenAIChatGenerator()
result = generator.run(messages, streaming_callback=callback)

This works well for simple use cases (notebooks, etc.), but becomes problematic when:

  1. Integrating with async frameworks like FastAPI that expect a generator for streaming responses (a sketch of the current workaround follows this list):

    @app.get("/stream")
    async def stream_endpoint():
        return StreamingResponse(generator_function())  # Needs a generator
  2. Building pipelines where downstream components expect to consume a stream of tokens:

    pipeline.connect("chat", "stream_processor.stream")  # Currently not possible
  3. Implementing custom streaming logic that doesn't fit the callback pattern:

    # Currently not possible:
    for chunk in result.stream:
        # Custom processing
        await process_chunk(chunk)
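
For reference, bridging today's callback-only API to a generator requires a queue-and-thread workaround along these lines (a sketch; stream_chat is a hypothetical helper, not part of Haystack):

import queue
import threading

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage, StreamingChunk

def stream_chat(messages):
    """Hypothetical bridge: expose the callback-only API as a plain generator."""
    q = queue.Queue()
    done = object()  # sentinel marking the end of the stream

    def callback(chunk: StreamingChunk):
        q.put(chunk)

    def worker():
        try:
            OpenAIChatGenerator().run(messages, streaming_callback=callback)
        finally:
            q.put(done)

    threading.Thread(target=worker, daemon=True).start()
    while (item := q.get()) is not done:
        yield item.content

# Usage, e.g. inside a FastAPI endpoint:
#   return StreamingResponse(stream_chat([ChatMessage.from_user("Hi")]))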

Proposed Solution

Add a second output socket, stream, to all chat generator components:

@component.output_types(
    replies=List[ChatMessage],
    stream=Generator[ChatCompletionChunk, None, None]
)

This allows components to support both streaming patterns (a rough run() sketch follows the examples below):

  1. Callback-based (existing behavior):

    result = generator.run(messages, streaming_callback=callback)
    # result = {"replies": [...], "stream": None}
  2. Generator-based (new behavior):

    result = generator.run(messages)
    # result = {"replies": [], "stream": <generator>}
    for chunk in result["stream"]:
        print(chunk.content)
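
A rough sketch of how a chat generator's run() could branch between the two modes (illustrative only; _run_with_callback and _provider_stream are hypothetical helpers):

def run(self, messages, streaming_callback=None):
    if streaming_callback is not None:
        # Existing behavior: consume the provider stream internally, invoking the
        # callback per chunk, and return the assembled replies.
        replies = self._run_with_callback(messages, streaming_callback)
        return {"replies": replies, "stream": None}

    # New behavior: hand a lazy generator to the caller; replies stays empty because
    # the full message only exists once the stream has been consumed.
    return {"replies": [], "stream": self._provider_stream(messages)}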

Implementation Details

Two possible approaches:

Option 1: Use Component Socket Detection

Add a helper method to detect whether the stream socket is connected:

def has_output_receivers(self, instance, socket_name: str) -> bool:
    if not hasattr(instance, "__haystack_output__"):
        return False
    socket = instance.__haystack_output__.get(socket_name)
    return bool(socket.receivers) if socket else False

Components would enable streaming based on either callback or socket connection:

stream_has_receivers = component.has_output_receivers(self, "stream")
is_streaming = streaming_callback is not None or stream_has_receivers
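
Building on that check, the decision inside run() might look like this (a sketch; the _run_* and _provider_stream helpers are hypothetical):

if streaming_callback is not None:
    # Callback takes precedence: stream via the callback, as today.
    return {"replies": self._run_with_callback(messages, streaming_callback), "stream": None}
if stream_has_receivers:
    # The "stream" socket is wired up in a pipeline: emit a generator on it.
    return {"replies": [], "stream": self._provider_stream(messages)}
# Nothing requested streaming: regular non-streaming completion.
return {"replies": self._run_non_streaming(messages), "stream": None}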

Pros:

  • Clean integration with pipeline system
  • No API changes needed
  • Automatic streaming when socket connected

Cons:

  • Less explicit control over streaming mode
  • Streaming is toggled implicitly by pipeline wiring, which can feel mysterious

Option 2: Use Sentinel Value

Add a sentinel value to signal generator-based streaming:

class GENERATE_STREAM:
    """Sentinel object used to signal generator-based streaming"""
    pass

GENERATE = GENERATE_STREAM()

# Usage:
result = generator.run(messages, streaming_callback=GENERATE)
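
Inside run(), the sentinel would then be detected with an identity check (a sketch, reusing the hypothetical helpers from above):

if streaming_callback is GENERATE:
    # Explicit request for generator-based streaming.
    return {"replies": [], "stream": self._provider_stream(messages)}
if streaming_callback is not None:
    # Regular callback-based streaming.
    return {"replies": self._run_with_callback(messages, streaming_callback), "stream": None}
return {"replies": self._run_non_streaming(messages), "stream": None}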

Pros:

  • Explicit control over streaming mode
  • Clear API intention

Cons:

  • More complex API

Questions to Resolve

  1. Should we support both streaming modes simultaneously?
  2. How should we handle errors in streaming mode?
  3. Should we standardize the chunk format across different LLM providers? That would be nice, no?

@vblagoje (Member Author)

This article by @aryaminus covers the workarounds and mental gymnastics users have to go through to enable this functionality.

@mpangrazzi (Contributor) commented Jan 20, 2025

@vblagoje yeah, I saw it. In general I agree with the proposed approach; my points:

Should we support both streaming modes simultaneously?

I think we first need to investigate what the primary use case for streaming_callback is (apart from the article above). We may decide to support both modes initially, then gently deprecate one.

How should we handle errors in streaming mode?

For the SSE use case (i.e. open-webui), network timeout errors (the most common ones) should be handled on the client side. If one instead simply consumes a generator, errors should be handled while consuming it, with a classic try/except block.
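
(For illustration, a minimal sketch of that try/except pattern when consuming the proposed stream output:)

import logging

try:
    for chunk in result["stream"]:
        print(chunk.content, end="")
except Exception as exc:
    # Network/provider errors surface here, at consumption time.
    logging.getLogger(__name__).warning("Streaming failed: %s", exc)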

Should we standardize the chunk format across different LLM providers? That would be nice, no?

I agree on this!

@aryaminus commented Jan 20, 2025

@vblagoje I felt like dropping my thoughts:

  1. Should we support both streaming modes simultaneously?
    Without overcomplicating things, generator-based has the edge, since it allows adding flavors on top; if we want something simple and pragmatic, callback-based works.

  2. How should we handle errors in streaming mode?
    For generator-based, propagate errors through the generator; for callback-based, pass errors to the callback or log them explicitly.

  3. Should we standardize chunk format across LLM providers?
    not by default.


My leaning is towards Option 1 (Component Socket Detection) for seamless integration, but I suggest adding an optional parameter (e.g., streaming_mode="callback" | "generator") for explicit control.

@vblagoje (Member Author) commented Jan 27, 2025

Thanks @aryaminus - very good suggestion on streaming_mode="callback" | "generator" for explicit control. We are discussing internally how to proceed on this. More feedback from the community is always appreciated!
