Prefill assistant response #118
Could you achieve this with structured output using a regular expression as the prefix of the sentence you want the assistant to produce? For example:
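(A minimal sketch, assuming `responseConstraint` accepts a RegExp as described in the explainer; the prompt and pattern here are illustrative:)

```js
// Sketch only: assumes responseConstraint accepts a RegExp, per the
// explainer's structured output section.
const session = await LanguageModel.create();

const result = await session.prompt(
  "What is the capital of France?",
  {
    // The pattern pins the start of the response to the desired prefix,
    // so the model can only vary the tail of the sentence.
    responseConstraint: /^The capital of France is .+\.$/,
  }
);
```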
That's a really great idea, Brad! I am a bit worried about the performance implications, though. When using a prefill, we would want to skip generating the prefilled tokens. With the regex approach, I assume the API would still regenerate each token of the prefilled text. So the regex approach would provide all the guidance benefits of prefilling, but would still incur the same amount of work/latency as if there were no prefill.

A note on our specific use case: we are building a general spell-checking tool. If the user checks a long document and then changes a word in the middle of it, we would like to provide the already-corrected document up to that change as part of the response, so the LLM doesn't recompute the corrections for it. (I am aware of the planned Proofreader API, which we might use when it launches.)
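To make the use case concrete, here is a rough sketch of how a prefill could express it. The `prefix: true` marker is hypothetical, borrowed from the DeepSeek-style syntax mentioned below; it is not an option the Prompt API exposes in Chrome 137.

```js
// Hypothetical sketch: `prefix: true` is the DeepSeek-style prefill marker
// discussed later in this thread, not a shipped Prompt API option.
const session = await LanguageModel.create();
const fullDocumentWithEdit = "…the user's document, with one word changed…";
const alreadyCorrected = "…the already-corrected text up to that change…";

const result = await session.prompt([
  { role: "user", content: `Spell-check this document:\n${fullDocumentWithEdit}` },
  // Ideally the model continues from here instead of re-checking
  // (and regenerating) the part that is already corrected.
  { role: "assistant", content: alreadyCorrected, prefix: true },
]);
```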
I agree that having something like this would be useful for your use case and other autofill-style interactions. Having an explicit prefill API will let us do this more efficiently, and is more straightforward than using a responseConstraint to force this behavior. @domenic fyi
I just came across a discussion of this today at https://standardcompletions.org/. I like the DeepSeek approach. I think this would be pretty easy to add, so I'm happy to put up a PR doing so.
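For reference, my understanding of the DeepSeek shape (its beta "chat prefix completion") is that the final assistant message carries `prefix: true` and the model completes it in place; a sketch, with field names as I recall them:

```js
// Sketch of the DeepSeek-style request body ("chat prefix completion",
// a beta feature): the model continues the final assistant message
// rather than starting a fresh one.
const body = {
  model: "deepseek-chat",
  messages: [
    { role: "user", content: "Write a haiku about the sea." },
    // The prefilled start of the response.
    { role: "assistant", content: "Waves fold into foam,", prefix: true },
  ],
};
```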
It should be possible to provide a partial assistant message and have the assistant complete the rest of it.
This is useful for guiding the response, helping enforce certain output structures, and having the assistant expand on an already completed message.
Examples of this in practice are Anthropic's assistant-response prefill and DeepSeek's chat prefix completion.
From the explainer it seems like the multi-user example (https://github.com/webmachinelearning/prompt-api?tab=readme-ov-file#customizing-the-role-per-prompt) might support this.
However, it appears that, at least in Chrome 137, the LLM always starts a new sentence and doesn't continue a partially completed sentence in the final assistant message.
For example, the following will consistently start a new sentence, or add something like "... [response]", instead of completing the started sentence:
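(A minimal sketch of that kind of call, using the multi-role prompt shape from the explainer; the exact wording of the partial assistant message is illustrative:)

```js
// Sketch: a trailing assistant message with a partially completed
// sentence, which Chrome 137 does not continue in place.
const session = await LanguageModel.create();

const result = await session.prompt([
  { role: "user", content: "What is the capital of France?" },
  // Intended as a prefill; instead the model starts a new sentence.
  { role: "assistant", content: "The capital of France is" },
]);
console.log(result); // e.g. "... Paris." or a fresh sentence, not a continuation
```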