
Commit 19830f4

Docs update (#183)
* Add docs for include_outputs_from
* Fix wrong params
* Align correctly with default (localhost)
* remove blank spaces
* Lint
* Remove mentions to dc-query-api
1 parent 42dc75e commit 19830f4

6 files changed: +113 -10 lines changed


docs/concepts/pipeline-wrapper.md

Lines changed: 86 additions & 0 deletions
@@ -502,6 +502,92 @@ YAML configuration follows the same priority rules: YAML setting > environment v

See the [Multi-LLM Streaming Example](https://github.com/deepset-ai/hayhooks/tree/main/examples/pipeline_wrappers/multi_llm_streaming) for a complete working implementation.

## Accessing Intermediate Outputs with `include_outputs_from`

!!! info "Understanding Pipeline Outputs"

    By default, Haystack pipelines only return outputs from **leaf components** (final components with no downstream connections). Use `include_outputs_from` to also get outputs from intermediate components like retrievers, preprocessors, or parallel branches.
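To see the difference concretely, here is a minimal sketch, assuming `pipeline` is a connected two-component `retriever -> llm` Haystack `Pipeline`:

```python
# Default: only the leaf component ("llm") appears in the result
result = pipeline.run(data={"retriever": {"query": "What is Hayhooks?"}})
# result.keys() == {"llm"}

# With include_outputs_from, the intermediate "retriever" is returned as well
result = pipeline.run(
    data={"retriever": {"query": "What is Hayhooks?"}},
    include_outputs_from={"retriever"},
)
# result.keys() == {"llm", "retriever"}
```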
### Streaming with `on_pipeline_end` Callback

For streaming responses, pass `include_outputs_from` to `streaming_generator()` or `async_streaming_generator()`, and use the `on_pipeline_end` callback to access intermediate outputs. For example:

```python
def run_chat_completion(self, model: str, messages: List[dict], body: dict) -> Generator:
    question = get_last_user_message(messages)

    # Store retrieved documents for citations
    self.retrieved_docs = []

    def on_pipeline_end(result: dict[str, Any]) -> None:
        # Access intermediate outputs here
        if "retriever" in result:
            self.retrieved_docs = result["retriever"]["documents"]
            # Use for citations, logging, analytics, etc.

    return streaming_generator(
        pipeline=self.pipeline,
        pipeline_run_args={
            "retriever": {"query": question},
            "prompt_builder": {"query": question}
        },
        include_outputs_from={"retriever"},  # Make retriever outputs available
        on_pipeline_end=on_pipeline_end
    )
```

**What happens:** The `on_pipeline_end` callback receives both `llm` and `retriever` outputs in the `result` dict, allowing you to access retrieved documents alongside the generated response.
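For instance, the captured documents can be turned into a simple citation list. A minimal sketch (the `url` meta key is an assumption; use whatever metadata your documents actually carry):

```python
from typing import Any


def on_pipeline_end(result: dict[str, Any]) -> None:
    # "retriever" appears in `result` only because it was listed in include_outputs_from
    docs = result.get("retriever", {}).get("documents", [])
    # Collect one citation per retrieved document (the "url" meta key is hypothetical)
    citations = [doc.meta.get("url", "unknown source") for doc in docs]
    print(citations)
```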
The same pattern works with async streaming:

```python
async def run_chat_completion_async(self, model: str, messages: List[dict], body: dict) -> AsyncGenerator:
    question = get_last_user_message(messages)

    def on_pipeline_end(result: dict[str, Any]) -> None:
        if "retriever" in result:
            self.retrieved_docs = result["retriever"]["documents"]

    return async_streaming_generator(
        pipeline=self.async_pipeline,
        pipeline_run_args={
            "retriever": {"query": question},
            "prompt_builder": {"query": question}
        },
        include_outputs_from={"retriever"},
        on_pipeline_end=on_pipeline_end
    )
```

### Non-Streaming API

For non-streaming `run_api` or `run_api_async` endpoints, pass `include_outputs_from` directly to `pipeline.run()` or `pipeline.run_async()`. For example:

```python
def run_api(self, query: str) -> dict:
    result = self.pipeline.run(
        data={"retriever": {"query": query}},
        include_outputs_from={"retriever"}
    )
    # Build custom response with both answer and sources
    return {"answer": result["llm"]["replies"][0], "sources": result["retriever"]["documents"]}
```
Same pattern for async:

```python
async def run_api_async(self, query: str) -> dict:
    result = await self.async_pipeline.run_async(
        data={"retriever": {"query": query}},
        include_outputs_from={"retriever"}
    )
    return {"answer": result["llm"]["replies"][0], "sources": result["retriever"]["documents"]}
```
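Deployed this way, a single request returns the answer together with its sources. A hypothetical call (the pipeline name and host are assumptions, and this assumes the default response envelope, where Hayhooks wraps the return value of `run_api` under a `result` key):

```python
import requests

# Assumes a PipelineWrapper deployed under the name "my_rag_pipeline"
resp = requests.post(
    "http://localhost:1416/my_rag_pipeline/run",
    json={"query": "What is Hayhooks?"},
)
payload = resp.json()
print(payload["result"]["answer"])
print(len(payload["result"]["sources"]), "source documents")
```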
!!! tip "When to Use `include_outputs_from`"

    - **Streaming**: Pass `include_outputs_from` to `streaming_generator()` or `async_streaming_generator()` and use the `on_pipeline_end` callback to access the outputs
    - **Non-streaming**: Pass `include_outputs_from` directly to `pipeline.run()` or `pipeline.run_async()`
    - **YAML Pipelines**: Automatically handled - see [YAML Pipeline Deployment](yaml-pipeline-deployment.md#output-mapping)

## File Upload Support

Hayhooks can handle file uploads by adding a `files` parameter:

docs/concepts/yaml-pipeline-deployment.md

Lines changed: 19 additions & 2 deletions
@@ -79,7 +79,7 @@ outputs:
  -d '{
    "name": "my_chat_pipeline",
    "description": "Chat pipeline for Q&A",
-   "yaml_content": "...",
+   "source_code": "...",
    "overwrite": false
  }'
```
@@ -94,7 +94,7 @@ outputs:
    json={
        "name": "my_chat_pipeline",
        "description": "Chat pipeline for Q&A",
-       "yaml_content": "...",  # Your YAML content as string
+       "source_code": "...",  # Your YAML content as string
        "overwrite": False
    }
)
@@ -151,6 +151,23 @@ outputs:
- Response fields are serialized to JSON
- Complex objects are automatically serialized

!!! success "Automatic `include_outputs_from` Derivation"

    Hayhooks **automatically** derives the `include_outputs_from` parameter from your `outputs` section. This ensures that all components referenced in the outputs are included in the pipeline results, even if they're not leaf components.

    **Example:** If your outputs reference `retriever.documents` and `llm.replies`, Hayhooks automatically sets `include_outputs_from={"retriever", "llm"}` when running the pipeline, as sketched below.

    **What this means:** You don't need to configure anything extra - just declare your outputs in the YAML, and Hayhooks ensures those component outputs are available in the results!
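The derivation is straightforward: the component name is whatever precedes the first `.` in each output path. A minimal sketch of the idea, using a hypothetical `outputs` mapping (the actual implementation lives in `deploy_utils.py`):

```python
# Hypothetical outputs mapping, as declared in a YAML `outputs` section
outputs = {
    "answer": "llm.replies",
    "sources": "retriever.documents",
}

# Keep the component name before the first "." of each output path
include_outputs_from = {path.split(".", 1)[0] for path in outputs.values()}

assert include_outputs_from == {"llm", "retriever"}
```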
!!! note "Comparison with PipelineWrapper"

    **YAML Pipelines** (this page): `include_outputs_from` is **automatic** - derived from your `outputs` section

    **PipelineWrapper**: `include_outputs_from` must be **manually passed**:

    - For streaming: Pass to `streaming_generator()` / `async_streaming_generator()`
    - For non-streaming: Pass to `pipeline.run()` / `pipeline.run_async()`

    See [PipelineWrapper: include_outputs_from](pipeline-wrapper.md#accessing-intermediate-outputs-with-include_outputs_from) for examples.
## API Usage

### After Deployment

docs/features/cli-commands.md

Lines changed: 2 additions & 2 deletions
@@ -73,7 +73,7 @@ hayhooks run --reload
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
-| `--host` | | Host to bind to | `127.0.0.1` |
+| `--host` | | Host to bind to | `localhost` |
| `--port` | | Port to listen on | `1416` |
| `--workers` | | Number of worker processes | `1` |
| `--pipelines-dir` | | Directory for pipeline definitions | `./pipelines` |
@@ -97,7 +97,7 @@ hayhooks mcp run --host 0.0.0.0 --port 1417
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
-| `--host` | | MCP server host | `127.0.0.1` |
+| `--host` | | MCP server host | `localhost` |
| `--port` | | MCP server port | `1417` |
| `--pipelines-dir` | | Directory for pipeline definitions | `./pipelines` |
| `--additional-python-path` | | Additional Python path | `None` |

docs/features/mcp-support.md

Lines changed: 3 additions & 3 deletions
@@ -30,15 +30,15 @@ pip install hayhooks[mcp]
hayhooks mcp run
```

-This starts the MCP server on `HAYHOOKS_MCP_HOST:HAYHOOKS_MCP_PORT` (default: `127.0.0.1:1417`).
+This starts the MCP server on `HAYHOOKS_MCP_HOST:HAYHOOKS_MCP_PORT` (default: `localhost:1417`).

### Configuration

Environment variables for MCP server:

```bash
-HAYHOOKS_MCP_HOST=127.0.0.1 # MCP server host
-HAYHOOKS_MCP_PORT=1417 # MCP server port
+HAYHOOKS_MCP_HOST=localhost # MCP server host
+HAYHOOKS_MCP_PORT=1417 # MCP server port
```

## Transports

docs/getting-started/configuration.md

Lines changed: 2 additions & 2 deletions
@@ -38,7 +38,7 @@ hayhooks run --host 0.0.0.0 --port 1416 --pipelines-dir ./pipelines
The most frequently used options:

-- `HAYHOOKS_HOST` - Host to bind to (default: `127.0.0.1`)
+- `HAYHOOKS_HOST` - Host to bind to (default: `localhost`)
- `HAYHOOKS_PORT` - Port to listen on (default: `1416`)
- `HAYHOOKS_PIPELINES_DIR` - Pipeline directory for auto-deployment (default: `./pipelines`)
- `LOG` - Log level: `DEBUG`, `INFO`, `WARNING`, `ERROR` (default: `INFO`)
@@ -51,7 +51,7 @@ For the complete list of all environment variables and detailed descriptions, se
```bash
# .env.development
-HAYHOOKS_HOST=127.0.0.1
+HAYHOOKS_HOST=localhost
HAYHOOKS_PORT=1416
LOG=DEBUG
HAYHOOKS_SHOW_TRACEBACKS=true

src/hayhooks/server/utils/deploy_utils.py

Lines changed: 1 addition & 1 deletion
@@ -548,7 +548,7 @@ def add_yaml_pipeline_to_registry(
    if streaming_components:
        clog.debug(f"Found streaming_components in YAML: {streaming_components}")

-    # Automatically derive include_outputs_from from the outputs mapping (matches dc-query-api behavior)
+    # Automatically derive include_outputs_from from the outputs mapping.
    # This ensures we get outputs from all components referenced in the outputs declaration,
    # not just leaf components. Useful for debugging and getting intermediate results.
    # Extract component names from paths like "llm.replies" -> "llm"
