Add support for automatic "hybrid" streaming (#182)
* Allow async_streaming_generator to also support sync-only streaming components by assigning the proper streaming_callback type
* Refactoring; remove unneeded comments
* Use only 'auto' and False values; refactoring; add docs
* Add extra deps for tests
* Improved check for streaming capable components (check the run/run_async method signature rather than components attributes)
* Lint
* Update tests and docs without HuggingFace components
* Update docs
* Update docs/concepts/pipeline-wrapper.md
Co-authored-by: Sebastian Husch Lee <[email protected]>
* Update docs/concepts/pipeline-wrapper.md
Co-authored-by: Sebastian Husch Lee <[email protected]>
* Update docs/concepts/pipeline-wrapper.md
Co-authored-by: Sebastian Husch Lee <[email protected]>
* Simplify find_all_streaming_components check
* Remove duplicate test
* Using allow_sync_streaming_callbacks=True/False; Refactoring + add example
---------
Co-authored-by: Sebastian Husch Lee <[email protected]>
## Hybrid Streaming: Mixing Async and Sync Components

!!! tip "Compatibility for Legacy Components"
    When working with legacy pipelines or components that only support sync streaming callbacks (like `OpenAIGenerator`), use `allow_sync_streaming_callbacks=True` to enable hybrid mode. For new code, prefer async-compatible components and use the default strict mode.

Some Haystack components only support synchronous streaming callbacks and don't have async equivalents. Examples include:

- `OpenAIGenerator` - Legacy OpenAI text generation (⚠️ Note: `OpenAIChatGenerator` IS async-compatible)
- Other components without `run_async()` support

### The Problem

By default, `async_streaming_generator` requires all streaming components to support async callbacks:
```python
async_streaming_generator(
    # ... pipeline arguments elided ...
    allow_sync_streaming_callbacks=True  # ✅ Auto-detect and enable hybrid mode
)
```
### What `allow_sync_streaming_callbacks=True` Does

When you set `allow_sync_streaming_callbacks=True`, the system enables **intelligent auto-detection**:

1. **Scans Components**: Automatically inspects all streaming components in your pipeline
2. **Detects Capabilities**: Checks if each component has `run_async()` support
3. **Enables Hybrid Mode Only If Needed**:
   - ✅ If **all components support async** → Uses pure async mode (no overhead)
   - ✅ If **any component is sync-only** → Automatically enables hybrid mode
4. **Bridges Sync to Async**: For sync-only components, wraps their callbacks to work seamlessly with the async event loop
5. **Zero Configuration**: You don't need to know which components are sync or async - it figures it out automatically

!!! success "Smart Behavior"
    Setting `allow_sync_streaming_callbacks=True` does NOT force hybrid mode. It only enables it when actually needed. If your pipeline is fully async-capable, you get pure async performance with no overhead!
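The detection and bridging steps above can be sketched in plain `asyncio`. This is an illustrative toy, not hayhooks' actual implementation: `SyncOnlyGenerator`, `supports_async`, and `hybrid_stream` are made-up names, but the mechanism (probe for `run_async()`, then hop each sync callback chunk onto the event loop via `call_soon_threadsafe`) is the general technique:

```python
import asyncio


class SyncOnlyGenerator:
    """Toy stand-in for a legacy component that exposes only a sync run()."""

    def run(self, prompt, streaming_callback=None):
        for chunk in ("Hello", ", ", "world"):
            if streaming_callback:
                streaming_callback(chunk)
        return {"replies": ["Hello, world"]}


def supports_async(component) -> bool:
    # Step 2 above: a component is async-capable if it exposes run_async()
    return callable(getattr(component, "run_async", None))


async def hybrid_stream(component, prompt):
    """Step 4 above: yield chunks from a sync-only component in an async context."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()
    done = object()  # sentinel marking end of stream

    def sync_callback(chunk):
        # Runs in the worker thread; hand the chunk to the loop thread-safely
        loop.call_soon_threadsafe(queue.put_nowait, chunk)

    async def run_component():
        # Run the blocking component in a thread so the loop stays responsive
        await loop.run_in_executor(None, component.run, prompt, sync_callback)
        queue.put_nowait(done)

    task = asyncio.create_task(run_component())
    while (chunk := await queue.get()) is not done:
        yield chunk
    await task


async def main():
    gen = SyncOnlyGenerator()
    assert not supports_async(gen)  # sync-only → hybrid mode would kick in
    chunks = [c async for c in hybrid_stream(gen, "hi")]
    print("".join(chunks))  # Hello, world


asyncio.run(main())
```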
### Configuration Options

```python
# Option 1: Strict mode (default - recommended)
allow_sync_streaming_callbacks=False
# → Raises an error if sync-only components are found
# → Best for: new code, ensuring proper async components, best performance

# Option 2: Auto-detection (compatibility mode)
allow_sync_streaming_callbacks=True
# → Automatically detects and enables hybrid mode only when needed
# → Best for: legacy pipelines, components without async support, gradual migration
```
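The two modes boil down to a simple decision rule. Here is a minimal sketch of that rule; the function name `resolve_streaming_mode` and the toy component classes are assumptions for illustration, not hayhooks' real internals:

```python
class SyncOnlyLLM:
    """Toy component with only a sync run()."""

    def run(self, prompt, streaming_callback=None): ...


class AsyncLLM(SyncOnlyLLM):
    """Toy component that also supports run_async()."""

    async def run_async(self, prompt, streaming_callback=None): ...


def resolve_streaming_mode(components: dict, allow_sync_streaming_callbacks: bool = False) -> str:
    """Toy version of the strict-vs-auto decision described above."""
    sync_only = [
        name for name, comp in components.items()
        if not callable(getattr(comp, "run_async", None))
    ]
    if not sync_only:
        return "async"  # all components async-capable: pure async, no overhead
    if allow_sync_streaming_callbacks:
        return "hybrid"  # bridge the sync-only components
    raise ValueError(
        f"Components {sync_only} seem to not support async streaming callbacks"
    )


print(resolve_streaming_mode({"llm": AsyncLLM()}))           # async
print(resolve_streaming_mode({"llm": SyncOnlyLLM()}, True))  # hybrid
# resolve_streaming_mode({"llm": SyncOnlyLLM()})  # would raise ValueError
```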
### Example: Legacy OpenAI Generator with Async Pipeline

```python
from typing import AsyncGenerator

from haystack import AsyncPipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret

from hayhooks import BasePipelineWrapper, get_last_user_message, async_streaming_generator


class LegacyOpenAIWrapper(BasePipelineWrapper):
    def setup(self) -> None:
        # OpenAIGenerator only supports sync streaming (legacy component)
        ...
```
Use `allow_sync_streaming_callbacks=True` when:

- Working with legacy pipelines that use `OpenAIGenerator` or other sync-only components
- Deploying YAML pipelines with unknown or legacy component types
- Migrating old code that doesn't have async equivalents yet
- Using third-party components without async support
### Performance Considerations

- **Pure async pipeline**: No overhead
- **Hybrid mode (auto-detected)**: Minimal overhead (~1-2 microseconds per streaming chunk for sync components)
- **Network-bound operations**: The overhead is negligible compared to LLM generation time

!!! success "Best Practice"
    **For new code**: Use the default strict mode (`allow_sync_streaming_callbacks=False`) to ensure you're using proper async components.

    **For legacy/compatibility**: Use `allow_sync_streaming_callbacks=True` when working with older pipelines or components that don't support async streaming yet.
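The per-chunk cost of hybrid mode comes mostly from the thread-safe hop onto the event loop. A rough, self-contained way to get a feel for that order of magnitude (timings vary by machine; this is not a benchmark of hayhooks itself):

```python
import asyncio
import time


async def main():
    loop = asyncio.get_running_loop()
    n = 100_000
    received = 0

    def callback(chunk):
        nonlocal received
        received += 1

    start = time.perf_counter()
    for i in range(n):
        # Per-chunk work of the bridge: schedule the callback on the loop
        loop.call_soon_threadsafe(callback, i)
    await asyncio.sleep(0)  # yield so the scheduled callbacks run
    elapsed = time.perf_counter() - start

    assert received == n
    print(f"~{elapsed / n * 1e6:.2f} µs per chunk")


asyncio.run(main())
```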
| [multi_llm_streaming](./pipeline_wrappers/multi_llm_streaming/) | Multiple LLM components with automatic streaming | • Two sequential LLMs<br/>• Automatic multi-component streaming<br/>• No special configuration needed<br/>• Shows default streaming behavior | Demonstrating how hayhooks automatically streams from all components in a pipeline |
| [async_question_answer](./pipeline_wrappers/async_question_answer/) | Async question-answering pipeline with streaming support | • Async pipeline execution<br/>• Streaming responses<br/>• OpenAI Chat Generator<br/>• Both API and chat completion interfaces | Building conversational AI systems that need async processing and real-time streaming responses |
| [async_hybrid_streaming](./pipeline_wrappers/async_hybrid_streaming/) | AsyncPipeline with legacy sync-only components using hybrid mode | • AsyncPipeline with OpenAIGenerator<br/>• `allow_sync_streaming_callbacks=True`<br/>• Automatic sync-to-async bridging<br/>• Migration example | Using legacy components (OpenAIGenerator) in async pipelines, migrating from sync to async gradually, handling third-party sync-only components |
| [chat_with_website](./pipeline_wrappers/chat_with_website/) | Answer questions about website content | • Web content fetching<br/>• HTML to document conversion<br/>• Content-based Q&A<br/>• Configurable URLs | Creating AI assistants that can answer questions about specific websites or web-based documentation |
| [chat_with_website_mcp](./pipeline_wrappers/chat_with_website_mcp/) | MCP-compatible website chat pipeline | • MCP (Model Context Protocol) support<br/>• Website content analysis<br/>• API-only interface<br/>• Simplified deployment | Integrating website analysis capabilities into MCP-compatible AI systems and tools |
| [chat_with_website_streaming](./pipeline_wrappers/chat_with_website_streaming/) | Streaming website chat responses | • Real-time streaming<br/>• Website content processing<br/>• Progressive response generation<br/>• Enhanced user experience | Building responsive web applications that provide real-time AI responses about website content |
This example demonstrates using `allow_sync_streaming_callbacks=True` to enable hybrid streaming mode with an AsyncPipeline and legacy sync-only components.

## Overview

This example shows how to use an **AsyncPipeline** with an **OpenAIGenerator** (a legacy component that only supports synchronous streaming callbacks) by enabling hybrid mode with `allow_sync_streaming_callbacks=True`.

## The Problem

Some Haystack components like `OpenAIGenerator` only support **synchronous** streaming callbacks and don't have `run_async()` support. When you try to use them with `async_streaming_generator` in an AsyncPipeline, you'll get an error:

```text
ValueError: Component 'llm' of type 'OpenAIGenerator' seems to not support async streaming callbacks
```

## The Solution

Set `allow_sync_streaming_callbacks=True` to enable **hybrid mode**. When enabled, the system automatically detects components with sync-only streaming callbacks (e.g., `OpenAIGenerator`) and bridges them to work in an async context. If all components support async, no bridging is applied (pure async mode).
```bash
curl -X POST http://localhost:1416/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "async_hybrid_streaming",
    "messages": [{"role": "user", "content": "What is machine learning?"}],
    "stream": true
  }'
```
67
+
68
+
## Performance
69
+
70
+
Hybrid mode might have a minimal overhead (~1-2 microseconds per streaming chunk for sync components). This is negligible compared to network latency and LLM generation time.