oh #47

gitanon112 · 2025-04-08T19:50:29Z


gitanon112 · 2025-04-08T19:57:25Z

Author

I'm sure evaluator-optimizer can be applied here somehow, perhaps after a tool_selection decision or agent run. However, doesn't the lack of long-term memory still seem like an issue?


evalstate · 2025-04-08T21:07:53Z

Maintainer

Cool, lots to dig into here, but some super-quick thoughts on a couple of those points:

History Management. In fast-agent, an agent's history can be restored via MCP Prompts (which are user/assistant pairs). You can capture that history with the `***SAVE_HISTORY` feature, or programmatically via `agent._llm.message_history`. There's a bit more about that here: https://fast-agent.ai/agents/prompting/#mcp-prompts and here: https://fast-agent.ai/models/. There's more work to do to make it seamless, and some features to add, but that's the intent.
The Basic Memory MCP Server is amazing. It works well as shared storage between agents, and it's editable markdown. I see this as orthogonal to agent state storage: it's a storage and retrieval mechanism for facts.

I've pinged this thread over to someone who is doing some cutting edge work on the other problems you raise (@SecretiveShell) - hopefully they can add a comment too.


SecretiveShell · 2025-04-08T21:45:41Z


> We implement a form of dynamic server/tool selection such that, given a task, an agent can "choose" from a list of servers/tools the ones it will need to complete the task (i.e. the agent receives the task, finds servers_needed, and creates an agent with those servers for the task).

There are a few ways to do this. My personal favourite is to use a vector embedding model (like sentence-transformers/all-MiniLM-L6-v2) to create a vector representation of what each tool does. You can then get the agent to list the tools it needs for the job and match those against the precomputed tool vectors. This lets you have thousands of tools and select only the ones you want, whilst also handling deduplication (which is important for your two web search tools).
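A minimal Python sketch of that matching step. It assumes a toy bag-of-words `embed()` in place of a real model such as sentence-transformers/all-MiniLM-L6-v2 so it runs without downloads; the tool names, descriptions, and the `min_score` threshold are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(token, 0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical tool catalogue; vectors are computed once, up front.
TOOLS = {
    "brave_search": "search the web for pages matching a query",
    "tavily_search": "search the web and return relevant pages for a query",
    "read_file": "read the contents of a local file",
    "send_email": "send an email message to a recipient",
}
TOOL_VECTORS = {name: embed(desc) for name, desc in TOOLS.items()}


def select_tools(needs: list[str], min_score: float = 0.2) -> list[str]:
    """Match each capability the agent says it needs to its closest tool.

    Keeping only the top match per need, and collecting matches in a set,
    means two near-duplicate tools (e.g. two web search tools) are not both
    selected for the same need.
    """
    chosen = set()
    for need in needs:
        need_vec = embed(need)
        scores = {name: cosine(need_vec, vec) for name, vec in TOOL_VECTORS.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= min_score:
            chosen.add(best)
    return sorted(chosen)
```

For example, `select_tools(["search the web for a query", "read a local file"])` returns `["brave_search", "read_file"]`: both search tools score highly for the first need, but only the closer one is kept.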

> We implement solo "server-specialist" agents for each server that act as "pros" for that server. However, this runs into a similar issue: how can the orchestrator/router "decide" which agent is best for a task?

This is another really cool idea that can also be solved with vector search. You would have an LLM summarise what a server does by aggregating across all the prompts, and use that summary as the initial embedding. The workflow is then similar, except the LLM describes the integration it needs rather than the actual tools themselves.
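A sketch of the server-level version, assuming each server's summary can be approximated by joining its tool descriptions (a crude placeholder for the LLM-written summary) and using a toy bag-of-words embedding in place of a real model. The server names, descriptions, and `route()` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(weight * b.get(token, 0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical servers, each with the descriptions of the tools it exposes.
SERVERS = {
    "filesystem": ["read a file from disk", "write a file to disk", "list files in a directory"],
    "github": ["open a pull request on a repository", "list issues in a repository", "comment on an issue"],
    "fetch": ["fetch a web page and convert it to markdown"],
}


def summarise(descriptions: list[str]) -> str:
    """Placeholder for the LLM-written server summary: just joins the descriptions."""
    return " ".join(descriptions)


SERVER_VECTORS = {name: embed(summarise(tools)) for name, tools in SERVERS.items()}


def route(integration_need: str) -> str:
    """Pick the server (and so its specialist agent) whose summary best matches the need."""
    need_vec = embed(integration_need)
    return max(SERVER_VECTORS, key=lambda name: cosine(need_vec, SERVER_VECTORS[name]))
```

Here the LLM would pass a description of the integration it needs, e.g. `route("read and write files on disk")`, which selects `"filesystem"`.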

> Another point/issue I think is relevant is storing tool run statistics, in particular the metrics SPEED, COST, and OUTPUT QUALITY. For example, say tool A and tool B both do web search with the same cost and output quality, but tool A takes 5 seconds while tool B takes 10 seconds. In our ideal future, we want our agent to realise this and choose tool A over tool B.

You could vector-embed the tools and then use a density-based clustering algorithm (see DBSCAN) to group similar tools automatically. From there you can collect 4 or 5 sample calls for each tool and have an agent review them and rank them on quality. Speed is easily timed, and cost is generally a fixed rate for the API endpoint used. You can calculate a centroid for each cluster and use it in your tool vector search, then pick the best tool for the job from those evaluated metrics, based on the specific circumstances of the tool call.
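A dependency-free sketch of the clustering and centroid steps, assuming a toy bag-of-words embedding as a stand-in for a real model. DBSCAN is written out by hand only so the example runs on its own; in practice scikit-learn's `sklearn.cluster.DBSCAN` (which accepts `metric="cosine"`) would be the usual choice. The tools, `eps=0.3`, `min_samples=2`, and `nearest_cluster()` are illustrative assumptions.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: dict, b: dict) -> float:
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dbscan(vectors: list, eps: float, min_samples: int) -> list:
    """Plain DBSCAN with cosine distance; returns a cluster id per vector, -1 for noise."""
    def neighbours(i: int) -> list:
        return [j for j, v in enumerate(vectors) if 1.0 - cosine(vectors[i], v) <= eps]

    labels: list = [None] * len(vectors)  # None = not yet visited
    cluster = 0
    for i in range(len(vectors)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_samples:  # core point: keep expanding
                queue.extend(j_seeds)
        cluster += 1
    return labels


def centroid(vectors: list) -> dict:
    """Mean of sparse vectors, used as the cluster's entry in the vector search."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {token: weight / len(vectors) for token, weight in total.items()}


# Hypothetical tools: three overlapping web search tools, two file readers, one outlier.
TOOLS = {
    "brave_search": "search the web for a query",
    "tavily_search": "search the web for a query and return pages",
    "bing_search": "search the web for a query quickly",
    "read_file": "read a local file",
    "read_text": "read a local text file",
    "send_email": "send an email",
}
names = list(TOOLS)
labels = dbscan([embed(TOOLS[n]) for n in names], eps=0.3, min_samples=2)

clusters: dict = {}
for name, label in zip(names, labels):
    if label != -1:
        clusters.setdefault(label, []).append(name)
centroids = {label: centroid([embed(TOOLS[n]) for n in members]) for label, members in clusters.items()}


def nearest_cluster(need: str) -> list:
    """Return the tools in the cluster whose centroid best matches the need."""
    need_vec = embed(need)
    best = max(centroids, key=lambda label: cosine(need_vec, centroids[label]))
    return clusters[best]
```

With these settings `send_email` has no close neighbours, so it is labelled noise (-1) and stays out of the centroid search. The per-tool metrics (timing, cost, reviewed quality) would then be compared only among the members `nearest_cluster()` returns.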

> You can see how this can give tool_selection_agent a lot of power. For example: tool A is faster and cheaper but its output quality is 5/10, while tool B is slower and costlier but its output quality is 10/10. Based on the task, tool_selection_agent can choose tools better. Say it determines that the given task needs high output quality; it can then choose tool B over tool A.

You could break this out into different agent policies, such as:

- always pick the cheapest (reducing overall cost)
- always pick the most accurate (boosting overall performance)
- always pick the fastest (good for draft documents, tests, etc.)
- or a smarter algorithm that can take in situational context, like you suggest.
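One way to express those policies, as a sketch: `ToolStats`, `pick()`, the policy names, and the linear weighting in the situational case are all hypothetical. The latencies and quality scores mirror the tool A / tool B example above; the per-call costs are made up. Metrics are min-max normalised across the candidates so that seconds, price, and a 0-10 score can be combined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolStats:
    """Aggregated run statistics for one tool (values here are illustrative)."""
    name: str
    seconds: float  # mean latency, timed from past calls
    cost: float     # price per call for the API endpoint
    quality: float  # 0-10 score from an agent reviewing sample outputs


def _normalise(values: list) -> list:
    """Scale values to [0, 1] across the candidates; all zeros if they are equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def pick(tools: list, policy: str, weights: Optional[dict] = None) -> ToolStats:
    if policy == "cheapest":
        return min(tools, key=lambda t: t.cost)
    if policy == "most_accurate":
        return max(tools, key=lambda t: t.quality)
    if policy == "fastest":
        return min(tools, key=lambda t: t.seconds)
    if policy == "situational":
        # Weighted trade-off: reward quality, penalise cost and latency.
        w = {"quality": 1.0, "cost": 1.0, "seconds": 1.0, **(weights or {})}
        q = _normalise([t.quality for t in tools])
        c = _normalise([t.cost for t in tools])
        s = _normalise([t.seconds for t in tools])
        scores = [w["quality"] * qi - w["cost"] * ci - w["seconds"] * si
                  for qi, ci, si in zip(q, c, s)]
        return tools[scores.index(max(scores))]
    raise ValueError(f"unknown policy: {policy}")


tool_a = ToolStats("tool_a", seconds=5.0, cost=0.001, quality=5.0)
tool_b = ToolStats("tool_b", seconds=10.0, cost=0.005, quality=10.0)
tools = [tool_a, tool_b]
```

For a task that needs high output quality, `pick(tools, "situational", {"quality": 1.0, "cost": 0.2, "seconds": 0.2})` returns tool B; with the default equal weights, tool A's speed and price win.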

> I looked over how memory is implemented in the repo and in prompted agents, and currently it seems every agent's memory is short-term, scoped to a single session. I think a long-term memory system accessible by all agents would be a significant next step. For example, say in session 1 an agent uses a tool wrongly, realises this, and retries/corrects until it gets it right. Without long-term memory, it will make the same mistakes again in session 2. If it had a way to store these insights long-term, this could be avoided. I'm not sure of the best way to approach this; there are a bunch of MCP memory server implementations (https://www.pulsemcp.com/servers?q=memory). basic-memory mcp stands out, and there's another cool one here. But maybe a custom implementation for our specific use would be better.

This is not something I have thought about as much, but you could probably add a notes-style annotation system, so that the LLM can annotate tool descriptions with extra data when it encounters a failure. You could then recompute the embeddings, so the tool-calling system improves over time and teaches itself to use the tools better through real-world experience. That would be an interesting experiment to run.
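A sketch of that experiment, with hypothetical names throughout (`AnnotatedToolIndex`, `annotate()`, the tools, and the note text) and a toy bag-of-words embedding standing in for a real model. When an agent recovers from a failure, it appends a note to the tool that eventually worked; the note is shown alongside the description and that tool's vector is recomputed, so later searches reflect it. The note goes on the tool that succeeded because a bag-of-words vector cannot represent negation: a note like "fails on csv" would pull csv requests toward the failing tool.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(weight * b.get(token, 0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class AnnotatedToolIndex:
    """Tool descriptions plus learned notes; a tool is re-embedded whenever it gets a note."""

    def __init__(self, tools: dict):
        self.descriptions = dict(tools)
        self.notes = {name: [] for name in tools}
        self.vectors = {name: embed(self.text(name)) for name in tools}

    def text(self, name: str) -> str:
        """Description shown to the LLM, with any accumulated notes appended."""
        notes = self.notes[name]
        return self.descriptions[name] + (" Notes: " + " ".join(notes) if notes else "")

    def annotate(self, name: str, note: str) -> None:
        """Record an insight (e.g. after recovering from a failure) and re-embed the tool."""
        self.notes[name].append(note)
        self.vectors[name] = embed(self.text(name))

    def best(self, need: str) -> str:
        need_vec = embed(need)
        return max(self.vectors, key=lambda name: cosine(need_vec, self.vectors[name]))


index = AnnotatedToolIndex({
    "run_sql": "run a sql query against the database",
    "run_python": "run python code in a sandbox",
})
```

Before any note, `index.best("query csv dates")` picks `run_sql` (it shares the word "query"); after `index.annotate("run_python", "use for csv data and dates")`, the same request routes to `run_python`. Persisting the notes between sessions is the long-term memory part, which this sketch leaves out.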

1 reply

gitanon112 Apr 8, 2025
Author

Wow, many great insights here. Thank you, and @evalstate, a lot. I'll keep you both in the loop in this thread on any progress I make in these areas.

Replies: 3 comments 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

oh #47

Uh oh!

Uh oh!

gitanon112 Apr 8, 2025

Replies: 3 comments · 1 reply

Uh oh!

gitanon112 Apr 8, 2025 Author

Uh oh!

Uh oh!

evalstate Apr 8, 2025 Maintainer

Uh oh!

SecretiveShell Apr 8, 2025

Uh oh!

gitanon112 Apr 8, 2025 Author

gitanon112
Apr 8, 2025

Replies: 3 comments 1 reply

gitanon112
Apr 8, 2025
Author

evalstate
Apr 8, 2025
Maintainer

SecretiveShell
Apr 8, 2025

gitanon112 Apr 8, 2025
Author