Search before asking
- I searched in the issues and found nothing similar.
Description
The current design feeds all memory items into the LLM in a single call, which risks exceeding the context window. A practical solution is to introduce batched summarization with the following workflow:
Chunking Input Data
- Split the memory items into fixed-size batches (configurable, e.g., 50–100 items per batch).
- Each batch is summarized independently, producing a concise representation.
Hierarchical Summarization
- After the batch-level summaries are generated, feed them into a second summarization step.
- This produces a global summary that captures the overall context without overwhelming the LLM.
Configurable Parameters
Allow users to configure (a sketch of such a config object follows this list):
- Batch size (number of items per summarization call).
- Summarization depth (single-pass vs. hierarchical).
- Retention policy (e.g., keep both the batch summaries and the global summary for traceability).
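A minimal sketch of what such a configuration object could look like. The names (SummarizationConfig, batchSize, depth, retainBatchSummaries) are illustrative assumptions, not existing API:

// Hypothetical configuration holder; names are illustrative only.
public class SummarizationConfig {
    private final int batchSize;                // items per summarization call
    private final int depth;                    // 1 = single-pass, >= 2 = hierarchical
    private final boolean retainBatchSummaries; // keep intermediate summaries for traceability

    public SummarizationConfig(int batchSize, int depth, boolean retainBatchSummaries) {
        if (batchSize <= 0 || depth <= 0) {
            throw new IllegalArgumentException("batchSize and depth must be positive");
        }
        this.batchSize = batchSize;
        this.depth = depth;
        this.retainBatchSummaries = retainBatchSummaries;
    }

    public int getBatchSize() { return batchSize; }
    public int getDepth() { return depth; }
    public boolean isRetainBatchSummaries() { return retainBatchSummaries; }
}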
Implementation Sketch (Pseudo-Java)
// MemoryItem, llm, and storeSummary are placeholders for the surrounding runtime.
List<MemoryItem> items = getMemoryItems();
int batchSize = 50;
List<String> batchSummaries = new ArrayList<>();

// First pass: summarize each fixed-size batch independently.
for (int i = 0; i < items.size(); i += batchSize) {
    List<MemoryItem> batch = items.subList(i, Math.min(i + batchSize, items.size()));
    batchSummaries.add(llm.summarize(batch));
}

// Second pass: collapse the batch summaries into one global summary.
String globalSummary = llm.summarize(batchSummaries);
storeSummary(globalSummary);
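If the batch summaries themselves grow too numerous to fit in one call, the second pass can be applied recursively until a single summary remains, which is what a summarization depth greater than two would mean in practice. A minimal sketch reusing the placeholders above (maxSummariesPerCall is an assumed limit, not an existing setting):

// Recursively collapse summaries, maxSummariesPerCall at a time, until one remains.
List<String> level = batchSummaries;
int maxSummariesPerCall = 20;
while (level.size() > 1) {
    List<String> next = new ArrayList<>();
    for (int i = 0; i < level.size(); i += maxSummariesPerCall) {
        next.add(llm.summarize(level.subList(i, Math.min(i + maxSummariesPerCall, level.size()))));
    }
    level = next;
}
storeSummary(level.get(0));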
Advantages
- Prevents context overflow by respecting LLM limits.
- Scales to very large memory sets.
- Maintains fidelity by layering summaries instead of discarding details.
Next Steps
- Add a runtime configuration option for batch size and summarization depth.
- Implement a summarization operator in the Flink runtime that can be reused across agents (a rough sketch follows this list).
- Provide benchmarks comparing single-pass vs. batched summarization to validate efficiency.
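A rough sketch of what such an operator might look like as a Flink RichFlatMapFunction. MemoryItem and LlmClient are placeholders, and the batch buffer is kept in operator memory for brevity; a real implementation would hold it in checkpointed Flink state for fault tolerance:

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.util.Collector;

// Illustrative only: emits one summary per full batch of incoming memory items.
public class BatchSummarizeFunction extends RichFlatMapFunction<MemoryItem, String> {
    private final int batchSize;
    private final LlmClient llm; // assumed to expose String summarize(List<MemoryItem>)
    private final List<MemoryItem> buffer = new ArrayList<>();

    public BatchSummarizeFunction(int batchSize, LlmClient llm) {
        this.batchSize = batchSize;
        this.llm = llm;
    }

    @Override
    public void flatMap(MemoryItem item, Collector<String> out) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            out.collect(llm.summarize(new ArrayList<>(buffer)));
            buffer.clear();
        }
    }
}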
Are you willing to submit a PR?
- I'm willing to submit a PR!