feat(gepa): add tool description optimization for multi-agent systems #8928

Ju-usc · 2025-10-10T06:12:23Z

Description

Adds tool description optimization capability to GEPA optimizer for multi-agent systems.

When optimize_tool_descriptions=True, GEPA now:

- Extracts tool descriptions from all nested modules (via named_sub_modules())
- Includes them in the optimization process alongside signature instructions
- Returns the optimized system with improved tool descriptions

This enables holistic optimization where both agent reasoning (signatures) and tool usage (descriptions) are improved based on end-to-end execution traces.
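A minimal sketch of the discovery step described above, written against plain Python stand-ins rather than the real DSPy classes. The `Tool` and `Module` dataclasses, the `collect_tool_descriptions` helper, and the `tool:` key prefix are illustrative assumptions and not the code in this PR; the actual traversal goes through named_sub_modules().

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    """Hypothetical stand-in for dspy.Tool: only a name and a description."""
    name: str
    desc: str


@dataclass
class Module:
    """Hypothetical stand-in for a module that may own tools and sub-modules."""
    tools: list = field(default_factory=list)
    children: dict = field(default_factory=dict)


def collect_tool_descriptions(root: Module) -> dict:
    """Walk the module tree and map 'tool:<name>' to the tool's description."""
    found = {}
    stack = [root]
    seen = set()
    while stack:
        module = stack.pop()
        if id(module) in seen:  # guard against shared or cyclic sub-modules
            continue
        seen.add(id(module))
        for tool in module.tools:
            # Keep the first description seen for a given tool name.
            found.setdefault(f"tool:{tool.name}", tool.desc)
        stack.extend(module.children.values())
    return found


# Mirrors the nesting of the usage example: a researcher agent and an assistant agent.
researcher = Module(tools=[Tool("search", "Searches web")])
assistant = Module(tools=[Tool("research", "Research things"), Tool("calculator", "Does math")])
system = Module(children={"researcher": researcher, "assistant": assistant})
descriptions = collect_tool_descriptions(system)
```

The resulting dictionary holds one entry per tool regardless of nesting depth, which is the property the multi-agent tests are meant to check.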

Issue

Closes #8706

Changes

- Add optimize_tool_descriptions parameter to GEPA (default False)
- Extract tool descriptions using named_sub_modules() traversal in compile()
- Apply optimized descriptions in DspyAdapter.build_program()
- Add 4 comprehensive tests covering single-agent and multi-agent scenarios

Usage Example

import dspy

# Create multi-agent system.
# search_fn, calc_fn, my_metric, lm, train, and val are assumed to be defined elsewhere.
class ResearchAssistant(dspy.Module):
    def __init__(self):
        super().__init__()
        search_tool = dspy.Tool(search_fn, name="search", desc="Searches web")
        self.researcher = dspy.ReAct("query -> findings", tools=[search_tool])
        
        def delegate_research(query):
            return self.researcher(query=query).findings
        
        research_tool = dspy.Tool(delegate_research, name="research", desc="Research things")
        calc_tool = dspy.Tool(calc_fn, name="calculator", desc="Does math")
        self.assistant = dspy.ReAct("question -> answer", tools=[research_tool, calc_tool])

# Enable tool optimization
optimizer = dspy.GEPA(
    metric=my_metric,
    reflection_lm=lm,
    auto="light",
    optimize_tool_descriptions=True,  # ← Enable tool optimization
)

# Optimizes ALL tools (calculator, research, search) holistically
optimized = optimizer.compile(ResearchAssistant(), trainset=train, valset=val)

Backward Compatibility

✅ Fully backward compatible - default optimize_tool_descriptions=False

Tests

All 16 tests pass (4 new + 12 existing GEPA tests)
Tests cover: adapter functionality, single-agent, multi-agent nested discovery, end-to-end optimization

- Add optimize_tool_descriptions parameter (default False) to GEPA
- Extract tool descriptions from all nested modules via named_sub_modules()
- Apply optimized descriptions in DspyAdapter.build_program()
- Enables holistic optimization of tools across main and subagent modules
- Tests: 4 new tests, all 16 pass (4 new + 12 existing)

Ju-usc · 2025-10-10T06:16:07Z

Apologies for accidentally closing #8927

Thank you for the thorough review, @LakshyAAAgrawal! I'll address your feedback:

1. Since tools are categorically different from prompts, they should use a different reflection meta prompt. The default reflection meta prompt is shown here https://dspy.ai/api/optimizers/GEPA/GEPA_Advanced/#default-implementation, whereas I assume that the tool must use a somewhat different meta prompt. Can you implement a propose_new_texts method that mimics the default_proposer shown in the link above for all prompts, but calls a tool-description-specific prompt/signature for tool evolution?
2. Can you also add some description to the documentation, explaining that this feature is beneficial for ReAct agents?
3. (This is not a requirement to merge the PR) Would it be possible to add a simple and short tutorial demonstrating the use and performance improvement via tool evolution?

I'll start working on items 1 and 2 and update the PR soon. Please let me know if you have any specific preferences for the tutorial format!

LakshyAAAgrawal · 2025-10-10T06:20:53Z

Thanks a lot! For the tutorial, I think you can follow the current GEPA tutorial format (load a dataset, show an example from the dataset, build a dspy program, evaluate the baseline program on testset, run GEPA with new optimization settings, show the optimized programs' prompts and tool descriptions, and finally evaluate the optimized program).

Hopefully we should be able to see a nice and large gain on agentic tasks with this amazing contribution by you!

- Add ToolProposer with GenerateImprovedToolDescription signature
- Implement routing logic to separate tools from signatures
- Tools use ToolProposer, signatures use custom or parent default
- Backward compatible: preserves existing custom_instruction_proposer behavior
- Add test verifying routing splits components correctly

- Define tool functions outside class for clarity
- Match structure of simple ReAct example
- Add clear comments explaining architecture
- Make code more readable and maintainable

Ju-usc · 2025-10-10T09:40:59Z

Hi @LakshyAAAgrawal,

I've implemented the tool-specific proposer as requested! Here's what's included:

1. Tool-Specific Proposer Implementation ✅

- Added GenerateImprovedToolDescriptionFromFeedback signature with a specialized reflection prompt
- Implemented ToolProposer and SingleComponentToolProposer following the MultiModalInstructionProposer pattern
- Routing logic in DspyAdapter that directs tools to ToolProposer and signatures to custom/default proposers
- Fully backward compatible with existing custom instruction proposers
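The routing described in this list can be sketched roughly as follows. This is an illustrative outline only: the `propose_new_texts` signature, the `tool:` key prefix, and the two placeholder proposers are assumptions made for the example and do not reproduce the DspyAdapter code in this PR.

```python
TOOL_PREFIX = "tool:"  # assumed naming convention for tool components


def propose_new_texts(candidate, components_to_update, tool_proposer, signature_proposer):
    """Split components by kind and send each group to its own proposer."""
    tool_keys = [k for k in components_to_update if k.startswith(TOOL_PREFIX)]
    sig_keys = [k for k in components_to_update if not k.startswith(TOOL_PREFIX)]

    new_texts = {}
    if sig_keys:
        # Signature instructions go to the custom or default instruction proposer.
        new_texts.update(signature_proposer(candidate, sig_keys))
    if tool_keys:
        # Tool descriptions go to the tool-specific proposer.
        new_texts.update(tool_proposer(candidate, tool_keys))
    return new_texts


def fake_tool_proposer(candidate, keys):
    """Placeholder: a real proposer would call the reflection LM."""
    return {k: candidate[k] + " (tool-revised)" for k in keys}


def fake_signature_proposer(candidate, keys):
    """Placeholder: a real proposer would call the reflection LM."""
    return {k: candidate[k] + " (instruction-revised)" for k in keys}


candidate = {"assistant": "Answer the question.", "tool:search": "Searches web"}
updated = propose_new_texts(
    candidate, list(candidate), fake_tool_proposer, fake_signature_proposer
)
```

Because the two groups are proposed independently, swapping in a custom instruction proposer does not change how tool descriptions are handled in this sketch.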

2. Documentation ✅

- Added comprehensive section to GEPA_Advanced.md
- Explains when to use tool optimization (ReAct agents, multi-agent systems)
- Includes usage examples for both simple and nested agent architectures
- Documents how to inspect optimized tool descriptions

Reflection Prompt Design:
The tool-specific prompt is intentionally open-ended to avoid prescriptive patterns that might lead to local minima. It asks the LM to identify patterns in successful/unsuccessful tool usage and extract domain-specific information, without suggesting specific heuristics.

Before I create a short tutorial (item #3), would you have any feedback on:

- The reflection prompt design - is it general enough? Any improvements you'd suggest?
- The implementation approach - does the routing logic make sense?
- The documentation - anything unclear or missing?

Any feedback would be helpful before I invest time in the tutorial. Thank you!

Ju-usc · 2025-10-11T03:01:40Z

Wait, there is a bug in the implementation; I'm working on a fix. The tests also need to be fixed.

…euse

Tools now copy ReAct's reflective data with tool-specific annotation instead of complex trajectory extraction. This 15-line approach reuses ReAct's existing context (thoughts, tool calls, observations) and adds focused annotation for each tool.

Implementation:
- Tools receive full ReAct reflective examples (same trajectory context)
- Feedback prefixed: [Optimizing tool: 'X'] for focused optimization
- Reflection LM sees complete multi-step execution traces per tool

Benefits:
- Simpler: 15 lines vs 70+ line extraction approach
- Reuses code: No duplicate trajectory formatting logic
- Same context: Tools see full ReAct execution traces
- Clean: Removed all debug output

Tests:
- 4 focused tests following GEPA patterns (removed 1 redundant)
- 226KB fixture with 34 LM + 6 reflection calls
- All tests passing with gpt-5-nano traces

Documentation:
- Updated GEPA_Advanced.md with implementation details
- Explains reflective dataset construction approach
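As a rough illustration of the copy-and-annotate approach in this commit message, the sketch below deep-copies a list of reflective examples once per tool and prefixes the feedback. The `build_tool_reflective_data` helper, the `tool:` key prefix, and the example dictionary keys are assumptions chosen for the sketch, not the code from the commit.

```python
from copy import deepcopy


def build_tool_reflective_data(react_examples, tool_names):
    """Give every tool its own annotated copy of the full ReAct examples."""
    per_tool = {}
    for name in tool_names:
        annotated = deepcopy(react_examples)  # leave the ReAct data untouched
        for example in annotated:
            example["Feedback"] = f"[Optimizing tool: '{name}'] {example['Feedback']}"
        per_tool[f"tool:{name}"] = annotated
    return per_tool


# One hypothetical reflective example with a full trajectory as its output.
react_examples = [
    {
        "Inputs": {"question": "What is 12 * 7?"},
        "Generated Outputs": "thought -> search(...) -> observation -> answer",
        "Feedback": "Used search where calculator was appropriate.",
    }
]
tool_data = build_tool_reflective_data(react_examples, ["search", "calculator"])
```

Every tool sees the same trajectory, so the cost is one copy of the examples per tool; the prefix is the only thing that differs between copies.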

LakshyAAAgrawal · 2025-10-11T05:07:54Z

docs/docs/api/optimizers/GEPA/GEPA_Advanced.md

+
+The `optimize_tool_descriptions` parameter enables GEPA to optimize tool descriptions in addition to signature instructions. This is particularly valuable for ReAct agents and other tool-using systems, where the quality of tool descriptions directly impacts the agent's ability to select appropriate tools for each task.
+
+Unlike signature instructions that guide reasoning strategies, tool descriptions serve a fundamentally different purpose: they help agents decide **which tool to use** in a given situation. GEPA recognizes this categorical difference and applies a specialized reflection prompt tailored for tool selection decisions.


Which tool to use, when to use it, and how to use it. All three are captured by the description.

Let's avoid the word "fundamentally". One can imagine that all of the tool descriptions can be (and many times are) simply included in the system prompt itself.

Please also add a corresponding entry in GEPA Overview, that links to this file/section.

LakshyAAAgrawal · 2025-10-11T05:10:25Z

docs/docs/api/optimizers/GEPA/GEPA_Advanced.md

+
+Consider enabling `optimize_tool_descriptions=True` when:
+
+- **Building ReAct agents**: ReAct agents rely on tool descriptions to make action selection decisions


One should consider using this whenever dspy.Tool is used anywhere in the DSPy program. Here are a few scenarios for using dspy.Tool:

LakshyAAAgrawal · 2025-10-11T05:11:18Z

docs/docs/api/optimizers/GEPA/GEPA_Advanced.md

+
+1. **Discovers all tools**: Traverses your program including nested sub-modules to find all `dspy.Tool` instances
+2. **Categorizes components**: Separates tools (identified by `tool:` prefix) from signature instructions
+3. **Routes components appropriately**:


During reflection, routes the components to be modified by the right metaprompt...

LakshyAAAgrawal · 2025-10-11T05:14:40Z

docs/docs/api/optimizers/GEPA/GEPA_Advanced.md

+)
+```
+
+**Note:** Tool optimization is fully backward compatible. Existing programs without tools, or with `optimize_tool_descriptions=False`, continue to work exactly as before.


I don't think we need to inform users about backward compatibility here. It should be implicit that there should be no behaviour changes for any program not containing dspy.Tool.

LakshyAAAgrawal · 2025-10-11T05:54:35Z

dspy/teleprompt/gepa/gepa.py

            raised if a mismatch in module-level and predictor-level score is detected.
+        optimize_tool_descriptions: Whether to optimize tool descriptions for modules with tools 
+            (e.g., ReAct agents). When enabled, tool descriptions are included in the optimization 
+            process alongside signature instructions. Default is False.


Add a link to GEPA Advanced/Tool section

LakshyAAAgrawal · 2025-10-11T06:01:25Z

dspy/teleprompt/gepa/gepa_utils.py

                    )

            self.propose_new_texts = custom_propose_new_texts
+        elif self.optimize_tool_descriptions:


Edge case: What should happen when the user both provides a custom proposer and enables optimize_tool_descriptions?

LakshyAAAgrawal · 2025-10-11T06:13:31Z

dspy/teleprompt/gepa/gepa_utils.py

+                # Handle signature components - replicate proposer's default behavior
+                sig_texts = {}
+                if sig_components:
+                    from gepa.strategies.instruction_proposal import InstructionProposalSignature


This is a slight deviation from this PR, but would be a large enhancement (feel free to ignore):

Create 2 fields, self.instruction_proposal_signature and self.tool_proposer, which are initialized to the default InstructionProposalSignature and ToolProposerSignature.

Take an argument from dspy.GEPA that can override the default signature values.
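One way to picture this suggestion is the sketch below, where both signatures default to built-in values and can be overridden by the caller. `ProposerSettings`, the two default classes, and `MyToolSignature` are hypothetical placeholders for illustration; they are not names from DSPy or from this PR.

```python
class DefaultInstructionSignature:
    """Placeholder for the default instruction-proposal signature."""


class DefaultToolSignature:
    """Placeholder for the default tool-description signature."""


class ProposerSettings:
    """Holds the two signatures, falling back to defaults when none is given."""

    def __init__(self, instruction_proposal_signature=None, tool_proposer=None):
        self.instruction_proposal_signature = (
            DefaultInstructionSignature
            if instruction_proposal_signature is None
            else instruction_proposal_signature
        )
        self.tool_proposer = DefaultToolSignature if tool_proposer is None else tool_proposer


class MyToolSignature:
    """A user-supplied override for tool-description proposals."""


defaults = ProposerSettings()
custom = ProposerSettings(tool_proposer=MyToolSignature)
```

Overriding one field leaves the other at its default, so users who only care about tool prompts would not need to restate the instruction signature.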

LakshyAAAgrawal · 2025-10-11T06:17:15Z

dspy/teleprompt/gepa/gepa_utils.py

+        # Second pass: Process tools by copying ReAct data with annotation
+        react_module_name = None
+        for name in ret_d.keys():
+            if "react" in name.lower():


Is this robust? Might it be better to use isinstance or some other way?
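To make the concern concrete, here is a small sketch with placeholder classes (`Predict` and `ReAct` below are empty stand-ins defined for the example, not the DSPy classes). Matching on the attribute name picks up an unrelated module and misses the actual agent, while a type check does not depend on naming.

```python
class Predict:
    """Stand-in for a plain predictor module."""


class ReAct:
    """Stand-in for a tool-using agent module."""


# Attribute names chosen by the user; nothing forces them to contain "react".
modules = {
    "assistant": ReAct(),
    "reactive_summarizer": Predict(),
}

# Substring match on the name: picks the wrong module and misses the agent.
by_name = [name for name in modules if "react" in name.lower()]

# Type check: finds the agent regardless of what it is called.
by_type = [name for name, module in modules.items() if isinstance(module, ReAct)]
```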

LakshyAAAgrawal · 2025-10-11T06:25:21Z

dspy/teleprompt/gepa/instruction_proposal.py

+
+    Your task is to write a better description for this tool.
+
+    Read the examples carefully and identify patterns in when the tool was used successfully versus when it was misused or overlooked. Identify any domain-specific information about the tool's capabilities or appropriate usage that may not be available to the assistant in the future. The assistant may have developed effective patterns for tool selection - if so, ensure the tool description supports those patterns.


Tool use. Also suggest identifying any failure modes of the tool?


LakshyAAAgrawal · 2025-10-11T06:45:56Z

Dear @Ju-usc,

This is a great PR. Thanks a lot! I have tried to be overly critical and made too many nits. Feel free to ignore if you disagree with something. Let me know if you'd like me to address anything!

Regarding the meta prompt, overall I think it looks great. However, as you build the tutorial, you may find that the reflection prompt needs tweaking, or that the content exposed in reflective_dataset for the tool is lacking or needs improvement. This is going to be an empirical exercise, which will guide what works in the reflection meta prompts. Looking forward to the tutorial on this too!

You may already have thoughts about what you'd like to show in the tutorial, but if not, you may consider building off (https://kargarisaac.medium.com/building-and-optimizing-multi-agent-rag-systems-with-dspy-and-gepa-2b88b5838ce2) by @kargarisaac.

- Add GenerateImprovedToolDescriptionFromFeedback signature documentation
- Include tool-aware metric example showing trajectory access
- Document tool prefix annotation in feedback
- Note component_selector applies to both signatures and tools
- Fix 'fundamentally' language per reviewer feedback

- Separate Pass 1 (predictor examples) and Pass 2 (tool aggregation)
- Clarify Generated Outputs includes full trajectory for ReAct
- Fix feedback annotation format to [Tool 'name' from 'predictor_key']
- Add Component Identification & Proposer Routing section
- Explain dual-proposer independence (custom proposer doesn't affect tool proposer)
- Use consistent terminology: 'predictor' and 'signature instructions'

Per reviewer feedback, backward compatibility should be implicit

- Add component_selector='all' to optimize all components together
- Show how to view optimized tool descriptions
- Add example output demonstrating improvement from vague to specific descriptions
- Remove unnecessary comments for cleaner examples

- Document why full ReAct trajectory is shared with all tools
- Explain rationale: tool interdependencies, selection patterns, workflow context
- Add concrete example of optimization benefit
- Describe alternative considered (tool-specific filtering) and rejection reasoning
- Add future work section on joint tool optimization
- Present two architectural approaches: separate proposer vs extending ReAct proposer
- Include implementation details, benefits, challenges, and decision rationale

- Add Tool Description Optimization section to GEPA overview.md with link to advanced guide
- Add documentation link to optimize_tool_descriptions parameter in gepa.py
- Addresses reviewer feedback to make tool optimization more discoverable

…ion details

- Restructure 'When to Use' as numbered list (1-5) per reviewer feedback
- Move section after implementation details for better flow
- Remove tool: prefix implementation detail from component identification
- Explain tool discovery via ReAct modules in user-friendly terms
- Add custom proposer compatibility clarification
- Address optional PR feedback items (11 & 13)

Ju-usc · 2025-10-13T01:26:41Z

Hi @LakshyAAAgrawal,

Thanks for the detailed review! I tried my best to address all the critical points, though I may have missed something or introduced new issues. Feel free to add any thoughts! I also have a few questions for you.

Implementation:
Built tool optimization by discovering tools through ReAct modules, creating reflective datasets that share full trajectories with each tool, and routing to a specialized tool proposer. Refactored the string-matching approach you flagged (replaced "react" in name.lower() with proper module traversal). Also streamlined tests and fixed the custom proposer routing to ensure independence.

Documentation:
Added comprehensive docs covering the tool-specific reflection prompt, implementation flow (Pass 1/Pass 2), usage examples with visualization, and design rationale in code comments. Restructured "When to Use" as a numbered list per your feedback and removed implementation details from user-facing sections.

Key design decision: For reflective dataset, we share/deepcopy the complete ReAct trajectory with each tool (not just tool-specific segments) because tools need full context to understand selection patterns and workflow dependencies. This trades token efficiency for richer optimization.

Future work - would love your thoughts:

For reflective dataset for tools, I documented two approaches to reduce token duplication (see code comments for details): joint optimization with ReAct instruction by a separate proposer, or extending the instruction proposer to include tools. The second seems more elegant. What's your take - should we pursue one of these, or is the current approach good enough?

Testing:
The existing tests cover the core functionality, but I'm not sure if they're robust/clean enough. Could you provide guidance on:

- What test scenarios should we add to make this production-ready?
- Any edge cases or failure modes I should be testing for?
- Test structure/organization improvements?

Ready for re-review!

- Add note that proposed architecture details may change
- Expand challenges with counterpoints and questions
- Mark implementation notes as optional to avoid overengineering

Ju-usc added 3 commits October 9, 2025 20:07

style: fix ruff formatting (trailing whitespace)

cf0be4f

style: apply ruff formatting fixes

aa53fe2

Ju-usc added 2 commits October 10, 2025 02:06

docs(gepa): clean up multi-agent example code

c4f2041


Ju-usc force-pushed the feature/tool-description-optimization branch from 197f077 to c4f2041 Compare October 10, 2025 09:38

LakshyAAAgrawal reviewed Oct 11, 2025


Ju-usc added 7 commits October 11, 2025 17:38

fix(gepa): unify custom proposer routing for tools

04f7e3d

docs(gepa): clarify tool reflection prompt

f92e184

test: streamline GEPA tool optimization tests

7178869

fix(gepa): streamline tool proposer formatting

e34703b

test(gepa): drop legacy dummy tool fixture

3f05311

Ju-usc added 5 commits October 12, 2025 17:22

docs(gepa): remove backward compatibility note

ea1204a


docs(gepa): clarify future work section in code comments

19d7717


Ju-usc requested a review from LakshyAAAgrawal October 13, 2025 22:53


