oh #47
Replies: 3 comments 1 reply
-
I'm sure evaluator-optimizer can be applied here somehow, perhaps after a tool_selection decision or agent run. However, the lack of long-term memory still seems like an issue?
-
Cool - lots to dig into here, but super-quick thoughts on a couple of those points:
I've pinged this thread over to someone who is doing some cutting-edge work on the other problems you raise (@SecretiveShell); hopefully they can add a comment too.
-
There are a few ways to do this. My personal favourite is to use a vector embeddings model (like
This is another really cool idea that can also be solved by a vector search. You would have an LLM summarise what a server does by aggregating across all the prompts, and use that summary as the initial embedding. You would then follow a similar workflow, but the LLM would just describe the integration it needs, not the actual tools themselves.
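To make the server-level search concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and the server names and summaries are made up (in the workflow above an LLM would write them).

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical per-server summaries, embedded once up front.
SERVER_SUMMARIES = {
    "filesystem": "read write and list files and directories on the local disk",
    "github": "create issues review pull requests and search code repositories",
    "weather": "look up current weather conditions and forecasts for a city",
}


def rank_servers(need: str, summaries: dict) -> list:
    """Rank servers by similarity between the described need and each summary."""
    query = embed(need)
    scored = [(name, cosine(query, embed(text))) for name, text in summaries.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# The LLM describes the integration it needs, not a specific tool.
ranking = rank_servers("an integration that can read and write files on disk", SERVER_SUMMARIES)
best_server = ranking[0][0]
```

With a real embedding model you would swap out `embed` and keep the ranking step the same.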
You could vector embed the tools, and then use a density-based clustering algorithm to group similar tools automatically (see DBSCAN). From here you can collect 4-5 sample tool calls from each tool, then have an agent review them and rank the tools on quality. Speed is easily timed, and cost is generally a fixed rate for the API endpoint used. You can calculate a centroid for each cluster and use that in your tool vector search, then pick the best tool for the job from those evaluated metrics, weighted by the specific circumstances of the tool call.
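A rough sketch of that pipeline, under heavy assumptions: the tool names, 2-D "embeddings", and metric values are all invented, and the DBSCAN here is a small pure-Python version so the example is self-contained (in practice you would more likely reach for a library implementation and real embedding vectors).

```python
import math


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one cluster label per point, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_samples:  # core point: keep expanding
                queue.extend(nearby)
    return labels


def centroids(points, labels):
    """Mean vector of each cluster, ignoring noise points."""
    groups = {}
    for point, label in zip(points, labels):
        if label != -1:
            groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in groups.items()
    }


# Hypothetical tools with made-up 2-D embeddings and agent-reviewed metrics.
TOOLS = ["web_search_a", "web_search_b", "web_search_c",
         "send_email", "send_mail", "smtp_send", "roll_dice"]
VECTORS = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (9.0, 0.0)]
METRICS = {
    "web_search_a": {"quality": 0.8, "latency_s": 1.0, "cost": 0.02},
    "web_search_b": {"quality": 0.6, "latency_s": 0.5, "cost": 0.01},
    "web_search_c": {"quality": 0.5, "latency_s": 0.4, "cost": 0.00},
    "send_email": {"quality": 0.9, "latency_s": 2.0, "cost": 0.01},
    "send_mail": {"quality": 0.7, "latency_s": 0.3, "cost": 0.01},
    "smtp_send": {"quality": 0.6, "latency_s": 0.4, "cost": 0.00},
    "roll_dice": {"quality": 0.9, "latency_s": 0.1, "cost": 0.00},
}


def pick_tool(query, weights, eps=0.5, min_samples=2):
    """Find the nearest cluster centroid, then score that cluster's tools."""
    labels = dbscan(VECTORS, eps, min_samples)
    centres = centroids(VECTORS, labels)
    nearest = min(centres, key=lambda c: math.dist(query, centres[c]))
    candidates = [t for t, label in zip(TOOLS, labels) if label == nearest]

    def score(tool):
        m = METRICS[tool]
        return (weights["quality"] * m["quality"]
                - weights["latency"] * m["latency_s"]
                - weights["cost"] * m["cost"])

    return max(candidates, key=score)


labels = dbscan(VECTORS, eps=0.5, min_samples=2)
careful = pick_tool((4.8, 5.2), {"quality": 1.0, "latency": 0.0, "cost": 0.0})
hurried = pick_tool((4.8, 5.2), {"quality": 1.0, "latency": 1.0, "cost": 0.0})
```

The `weights` argument is where the "specific circumstances" come in: the same query lands in the same cluster, but a quality-first call and a latency-sensitive call can end up with different tools.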
You could break this out into different agent policies, such as:
This is not something I have thought about as much, but you could probably add a notes-style annotation system, such that the LLM can annotate tool descriptions with extra data when it encounters a failure. When this happens you could then recompute the embeddings, so the tool calling system improves over time and teaches itself to better utilise the tools through real-world experience. That would be an interesting experiment to run.
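A minimal sketch of what that annotation loop might look like. Untested as an actual experiment, and all of it is assumed for illustration: `ToolIndex`, the tool names, and the note text are hypothetical, and `embed` is a toy bag-of-words stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToolIndex:
    """Tool descriptions plus LLM-written notes, re-embedded on every change."""

    def __init__(self, descriptions: dict):
        self.descriptions = dict(descriptions)
        self.notes = {name: [] for name in descriptions}
        self.vectors = {name: embed(text) for name, text in descriptions.items()}

    def annotate(self, name: str, note: str) -> None:
        """Attach a note to a tool and recompute that tool's embedding."""
        self.notes[name].append(note)
        full_text = " ".join([self.descriptions[name], *self.notes[name]])
        self.vectors[name] = embed(full_text)

    def search(self, query: str) -> str:
        """Return the tool whose (annotated) description best matches the query."""
        q = embed(query)
        return max(self.vectors, key=lambda name: cosine(q, self.vectors[name]))


index = ToolIndex({
    "run_query": "execute a statement against the warehouse",
    "read_file": "read a text file from disk",
})

before = index.search("sql select from a table")

# After a failed call, the LLM records what it learned about the tool.
index.annotate(
    "run_query",
    "failed on natural language input; expects a sql select statement naming a table",
)

after = index.search("sql select from a table")
```

Here the vague original description loses the search to the wrong tool, and the note written after a failure is enough to flip the result. Whether that holds up with real embeddings and real failure notes is exactly the experiment that would need running.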