`docs/source/en/conceptual_guides/intro_agents.md` (22 additions, 29 deletions)

-->

# Introduction to Agents

### 🤔 What are agents?

Any efficient system using AI will need to provide LLMs some kind of access to the real world: for instance, the ability to call a search tool to get external information, or to act on certain programs in order to solve a task. In other words, LLMs should have ***agency***. Agentic programs are the gateway to the outside world for LLMs.

> [!TIP]
> AI Agents are **programs where LLM outputs control the workflow**.

Any system leveraging LLMs will integrate the LLM outputs into code. The influence of the LLM's outputs on the code workflow is the level of agency of LLMs in the system.

Note that with this definition, "agent" is not a discrete, 0-or-1 property: instead, "agency" evolves on a continuous spectrum, as you give the LLM more or less power over your workflow.

See in the table below how agency can vary across systems:
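
| Agency Level | Description | What it's called | Example Pattern |
| ------------ | ----------- | ---------------- | --------------- |
| ☆☆☆ | LLM output has no impact on program flow | Simple Processor | `process_llm_output(llm_response)` |
| ★☆☆ | LLM output determines an if/else switch | Router | `if llm_decision(): path_a() else: path_b()` |
| ★★☆ | LLM output determines function execution | Tool Caller | `run_function(llm_chosen_tool, llm_chosen_args)` |
| ★★★ | LLM output controls iteration and program continuation | Multi-step Agent | `while llm_should_continue(): execute_next_step()` |
| ★★★ | One agentic workflow can start another agentic workflow | Multi-Agent | `if llm_trigger(): execute_agent()` |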
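
The lower levels of agency (no agency at all, a router, and tool calling) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not library code: `fake_llm` is a hypothetical stand-in that returns canned strings instead of calling a model, and every function and tool name below is made up for the example.

```python
from typing import Callable

# Hypothetical stand-in for an LLM: returns canned answers for known prompts.
# A real system would send the prompt to a model here.
def fake_llm(prompt: str) -> str:
    canned = {
        "Summarize: hello world": "A greeting.",
        "Is this about weather? What's the forecast?": "yes",
        "Pick a tool for: 2 + 3": "add 2 3",
    }
    return canned.get(prompt, "")

# No agency: the LLM output is post-processed and returned; control flow is fixed.
def simple_processor(text: str) -> str:
    return fake_llm(f"Summarize: {text}").upper()

# Router: the LLM output decides which branch of an if/else runs.
def router(question: str) -> str:
    if fake_llm(f"Is this about weather? {question}") == "yes":
        return "weather_branch"
    return "default_branch"

# Tool calling: the LLM output decides which function runs, and with which arguments.
TOOLS: dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def tool_caller(task: str) -> int:
    # Assumed reply format for this sketch: "<tool> <arg1> <arg2>"
    name, *args = fake_llm(f"Pick a tool for: {task}").split()
    return TOOLS[name](*(int(a) for a in args))

print(simple_processor("hello world"))  # A GREETING.
print(router("What's the forecast?"))   # weather_branch
print(tool_caller("2 + 3"))             # 5
```

The only thing that changes from one function to the next is how much of the control flow the LLM's output decides: nothing, which branch runs, or which function runs with which arguments.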

The multi-step agent has this code structure:

```python
memory = [user_defined_task]
while llm_should_continue(memory): # this loop is the multi-step part
    action = llm_get_next_action(memory) # this is the tool-calling part
    observations = execute_action(action)
    memory += [action, observations]
```

This agentic system runs in a loop, executing a new action at each step (the action can involve calling some pre-determined *tools* that are just functions), until its observations show that a satisfactory state has been reached to solve the given task. Here’s an example of how a multi-step agent can solve a simple math question:
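
A minimal runnable sketch of such a loop, assuming a scripted stand-in for the LLM: `scripted_llm`, the `add`/`multiply` tools and the `final_answer` action name are all hypothetical. Unlike a separate `llm_should_continue(memory)` check, this sketch folds the stopping decision into the action itself: the loop ends when the LLM picks `final_answer`. A `max_steps` cap guards against a model that never decides to stop.

```python
from typing import Any, Callable

# Hypothetical tools: plain Python functions the agent may call.
TOOLS: dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

def scripted_llm(memory: list[Any]) -> tuple[str, tuple[Any, ...]]:
    """Stand-in for an LLM: picks the next action from the memory so far.

    memory is [task, action_1, observation_1, action_2, observation_2, ...],
    so memory[2::2] lists the observations gathered so far.
    """
    observations = memory[2::2]
    if not observations:
        return "add", (2, 3)                      # step 1: compute 2 + 3
    if len(observations) == 1:
        return "multiply", (observations[-1], 4)  # step 2: multiply that by 4
    return "final_answer", (observations[-1],)    # done: report the last result

def run_agent(task: str, max_steps: int = 10) -> Any:
    memory: list[Any] = [task]
    for _ in range(max_steps):  # the multi-step loop, capped for safety
        tool_name, args = scripted_llm(memory)
        if tool_name == "final_answer":  # the LLM decides to stop iterating
            return args[0]
        observation = TOOLS[tool_name](*args)  # the tool-calling part
        memory += [(tool_name, args), observation]
    raise RuntimeError("no final answer within max_steps")

print(run_agent("What is (2 + 3) * 4?"))  # 20
```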

Why is code better? Well, because we crafted our code languages specifically to be great at expressing actions performed by a computer. If JSON snippets were a better way, JSON would be the top programming language and programming would be hell on earth.
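
To make the difference concrete, here is a minimal sketch contrasting the two action formats. Everything in it is hypothetical: the `search` and `count` tools use a fake in-memory index, and the `{"name": ..., "arguments": ...}` JSON shape is an assumed schema, not any particular library's format. `exec` is used only for illustration; it is not a sandbox, and model-written code should not be run this way without isolation.

```python
import json
from typing import Any

# Hypothetical tools shared by both action formats.
def search(query: str) -> list[str]:
    # Fake search index: canned results instead of a real web search.
    index = {"python": ["python.org", "docs.python.org"], "rust": ["rust-lang.org"]}
    return index.get(query, [])

def count(items: list) -> int:
    return len(items)

TOOLS = {"search": search, "count": count}

# JSON-style action: exactly one tool call per snippet; its result can only
# reach the next action by going back through the agent loop.
def run_json_action(snippet: str) -> Any:
    call = json.loads(snippet)
    return TOOLS[call["name"]](**call["arguments"])

# Code action: a Python snippet run with the tools in scope, so it can nest
# calls, loop, and keep intermediate objects in variables.
def run_code_action(code: str) -> Any:
    namespace: dict[str, Any] = dict(TOOLS)
    exec(code, namespace)  # illustration only: no sandboxing here
    return namespace["result"]

print(run_json_action('{"name": "search", "arguments": {"query": "python"}}'))
# ['python.org', 'docs.python.org']
print(run_code_action("result = sum(count(search(q)) for q in ['python', 'rust'])"))
# 3
```

With JSON-style actions, counting results across two queries takes several separate tool calls whose outputs must travel back through the agent loop; the code action does it in one snippet by nesting calls inside a loop.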

Writing actions in code rather than JSON-like snippets provides better:

- **Composability:** could you nest JSON actions within each other, or define a set of JSON actions to re-use later, the same way you could just define a Python function?
- **Object management:** how do you store the output of an action like `generate_image` in JSON?
- **Generality:** code is built to simply express anything you can have a computer do.
- **Representation in LLM training data:** plenty of quality code actions are already included in LLMs’ training data, which means they’re already trained for this!

This is illustrated in the figure below, taken from [Executable Code Actions Elicit Better LLM Agents](https://huggingface.co/papers/2402.01030).