Overview/ait 189 intro token #3035
base: AIT-129-AIT-Docs-release-branch
Conversation
GregHolmes left a comment:
I think this looks good! (I can't approve or anything as I raised it)
@rainbowFi I've left some comments on my thoughts.
I also think we need to be careful and remember that if some of this (such as the full list of agents/frameworks) isn't available on release, we need to remove the TODO comments.
> * [Complex message patterns](#message)
> * [Enterprise controls](#enterprise)
Should these not be "Advanced messaging" and "User input"? (Maybe user input isn't a helpful title.)
But they're the sections defined in the AIT Docs IA Miro.
> ### Complex message patterns <a id="message"/>
>
> Truly interactive AI experiences require more than a simple HTTP request-response exchange between a single client and agent. AI Transport allows the use of [complex messaging patterns](//TODO: Link here), for example:
I'm guessing if this is meant to be where Advanced messaging is, the link would be /docs/ai-transport/features/advanced-messaging. Yet to be created, though.
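For illustration, a minimal sketch of the kind of fan-out that goes beyond request-response, using the Ably JS SDK; the channel name, event name, and `API_KEY` placeholder are hypothetical:

```typescript
import * as Ably from 'ably';

// A client (e.g. one of several devices belonging to the same user)
// subscribes to a shared session channel.
const realtime = new Ably.Realtime({ key: 'API_KEY', clientId: 'user-1' });
const channel = realtime.channels.get('ai:session-123'); // hypothetical channel name

await channel.subscribe('token', (msg) => {
  console.log('received:', msg.data);
});

// The agent (typically a separate backend process) publishes once;
// every subscriber attached to the channel receives the message.
await channel.publish('token', { text: 'Hello' });
```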
> ### Enterprise controls <a id="enterprise"/>
>
> Ably's platform provides [integrations](/docs/platform/integrations) and capabilities to ensure that your application will meet the requirements of enterprise environments, for example [message auditing](/docs/platform/integrations/streaming), [client identification](/docs/auth/identified-clients) and [RBAC](/docs/auth/capabilities).
We call it capabilities elsewhere in the docs; should we stick with capabilities instead of RBAC?
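For reference, a minimal sketch of how identified clients and capabilities fit together on the auth side, whichever term the page settles on; the clientId and channel name are illustrative:

```typescript
import * as Ably from 'ably';

// Server side: issue a token request for an identified client whose
// capabilities restrict it to reading one session channel.
const rest = new Ably.Rest({ key: 'API_KEY' });

const tokenRequest = await rest.auth.createTokenRequest({
  clientId: 'user-123', // identified client: messages are attributed to this ID
  capability: JSON.stringify({
    'ai:session-123': ['subscribe', 'history'], // capability-based (RBAC-style) restriction
  }),
});
// The token request is then handed to the end-user client to authenticate with.
```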
> meta_description: "Learn about token streaming with Ably AI Transport, including common patterns and the features provided by the Ably solution."
> ---
>
> Token streaming is a technique used with Large Language Models (LLMs) where the model's response is transmitted progressively as each token is generated, rather than waiting for the complete response before transmission begins. This allows users to see the response appear incrementally, similar to watching someone type in real-time, giving an improved user experience. This is normally accomplished by streaming the tokens as the response to an HTTP request from the client.
I'm not sure this paragraph is necessarily correct. We're focusing specifically on streaming per token in this paragraph, but the more commonly preferred way, streaming per response, is also valid. Should we talk about that too?
Also, probably an AI addition, but it's realtime :D
I am talking about the general definition of token streaming here, rather than anything to do with our recommendations of how to token stream over Ably (which comes later).
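As a concrete illustration of that general definition, a minimal sketch of conventional HTTP token streaming on the client side, assuming a hypothetical /chat endpoint that returns a chunked response body:

```typescript
// Tokens arrive as chunks of the response body to a single HTTP request.
const response = await fetch('/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'Explain token streaming' }),
});

const reader = response.body!.getReader();
const decoder = new TextDecoder();

// Render each chunk as soon as it arrives, producing the typing effect.
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value, { stream: true }));
}
// If this connection drops mid-stream, chunks sent during the interruption
// are lost -- the weakness the rest of the page addresses.
```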
> 
>
> If an HTTP stream is interrupted, for example because the client loses network connection, then any tokens that were transmitted during the interruption will be lost. Ably AI Transport solves this problem by streaming tokens to a [Pub/Sub channel](/docs/channels), which is not tied to the connection state of either the client or the agent. A client that [reconnects](/docs/connect/states#connection-state-recovery) can receive any tokens transmitted while it was disconnected. If a new client connects, for example because the user has moved to a different device, then it is possible to hydrate the new client with all the tokens transmitted for the current request as well as the output from any previous requests. The exact mechanism for doing this will depend on which [token streaming pattern](#patterns) you choose to use.
Suggested change to the final sentence:
> The detailed mechanism for doing this will depend on which [token streaming pattern](#patterns) you choose to use.
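To make the resume and hydrate behaviour concrete, a sketch using the Ably JS SDK; the channel name, rewind interval, and the generateTokens/render helpers are hypothetical stand-ins:

```typescript
import * as Ably from 'ably';

declare function generateTokens(): AsyncIterable<string>; // stand-in for your model's output stream
declare function render(text: string): void;              // stand-in for your UI update

// Agent side: publish tokens to a channel that is independent of any
// single client's connection state.
const agent = new Ably.Realtime({ key: 'API_KEY' });
const agentChannel = agent.channels.get('ai:session-123');

for await (const token of generateTokens()) {
  await agentChannel.publish('token', { text: token });
}

// Client side: attaching with rewind hydrates a newly connected device
// with tokens published before it attached.
const client = new Ably.Realtime({ key: 'API_KEY' });
const clientChannel = client.channels.get('ai:session-123', {
  params: { rewind: '2m' }, // replay up to the last two minutes of messages
});
await clientChannel.subscribe('token', (msg) => render(msg.data.text));
```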
> ### Message-per-token <a id="pattern-per-token"/>
>
> Token streaming with [message-per-token](/docs/ai-transport/features/token-streaming/message-per-token) is a pattern where every token generated by your model is published as its own Ably message. Each token then appears as one message in the channel history.
>
> This pattern is useful when clients only care about the most recent part of a response and you are happy to treat the channel history as a short sliding window rather than a full conversation log. For example:
The other possible reason for using message-per-token is where you want the Ably transport to preserve the specific breakdown of the response into separate fragments. This might be because some higher-level framework is dependent on knowing that breakdown, or is handling token concatenation in some way that is incompatible with Ably performing concatenation of fragments.
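A sketch of the message-per-token pattern in practice, including the fragment-preserving property described above; the llmStream iterable and channel name are hypothetical:

```typescript
import * as Ably from 'ably';

declare const llmStream: AsyncIterable<string>; // stand-in for your model's token stream

const realtime = new Ably.Realtime({ key: 'API_KEY' });
const channel = realtime.channels.get('ai:session-123');

// Every token becomes its own message, so the channel history preserves
// the exact fragment boundaries produced by the model.
for await (const token of llmStream) {
  await channel.publish('token', { text: token });
}

// A late-joining client can page back through the recent sliding window.
const page = await channel.history({ limit: 100 });
for (const msg of page.items.reverse()) {
  console.log(msg.data.text);
}
```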
Force-pushed 400eb09 to f8056cb
Force-pushed a5c00a1 to f904645
mschristensen left a comment:
Thanks for this - I left a few comments. Taking a step back, given this is such a key piece of the offering, I feel that we can do more to describe the value proposition for token streaming over Ably. Are there ways we can explicitly enumerate the key parts of the user experience that constitute a great token streaming experience? We could then contrast those with the complexities of achieving this in a connection-oriented HTTP streaming model, and how Ably solves this out of the box.
I think there is some overlap conceptually with the Sessions & identity overview, but I think it would be okay to repeat some of that here, with a token-streaming rather than session emphasis.
Let's discuss in our catch up tomorrow :)
> meta_description: "Learn about token streaming with Ably AI Transport, including common patterns and the features provided by the Ably solution."
> ---
>
> Token streaming is a technique used with Large Language Models (LLMs) where the model's response is emitted progressively as each token is generated, rather than waiting for the complete response before transmission begins. This allows users to see the response appear incrementally, similar to watching someone type in real time, giving an improved user experience. This is normally accomplished by streaming the tokens as the response to an HTTP request from the client.
In general, we prefer to use single word "realtime" at Ably.
(This is not what most of the internet seems to do, but alas this is our convention)
"This is normally accomplished by streaming the tokens as the response to an HTTP request from the client."
I think this can be moved out into a new paragraph. I think the intro paragraph should focus on the description of what token streaming is before getting into how it is implemented.
Then, I would suggest colocating this statement with the content that follows after the image, since that paragraph starts by describing the weakness of this approach.
> If an HTTP stream is interrupted, for example because the client loses network connection, then any tokens that were transmitted during the interruption will be lost. Ably AI Transport solves this problem by streaming tokens to a [Pub/Sub channel](/docs/channels), which is not tied to the connection state of either the client or the agent. A client that [reconnects](/docs/connect/states#connection-state-recovery) can receive any tokens transmitted while it was disconnected. If a new client connects, for example because the user has moved to a different device, then it is possible to hydrate the new client with all the tokens transmitted for the current request as well as the output from any previous requests. The detailed mechanism for doing this will depend on which [token streaming pattern](#patterns) you choose to use.
This is a bit of a wall of text, but there are some nice bits of value prop in there. Can we pull those out, perhaps into bullets?
Force-pushed aebe2c1 to ea0ac8d
Commits:
- General overview intro page for AIT, giving a summary of major feature groups
- Overview page for token streaming - set direction, link to later pages
- Co-authored-by: Paddy Byers <[email protected]>
- This reverts commit 78b0411.
Force-pushed d2cb41d to b560500
Description

Adds overview pages to the documentation covering:
- the overall AIT product, listing the major features and linking them to other documentation
- token streaming, including an overview of the proposed architecture and patterns