Prefill assistant response #118

Open · scblaze opened this issue May 25, 2025 · 6 comments
Labels: ecosystem parity (a feature that other popular language model APIs offer), enhancement (new feature or request)

Comments

scblaze commented May 25, 2025

It should be possible to provide a partial assistant message and have the assistant complete the rest of it.

This is useful for guiding the response, helping enforce certain output structures, and having the assistant expand on an already-completed message.

Examples of this exist in practice in other language model APIs (for instance, DeepSeek's prefix: true completion mode, discussed below).

From the explainer, it seems like the multi-user example (https://github.com/webmachinelearning/prompt-api?tab=readme-ov-file#customizing-the-role-per-prompt) might support this.

However, it appears that, at least in Chrome 137, the LLM always starts a new sentence and doesn't continue a partially completed sentence in the final assistant message.

For example, the following will consistently start a new sentence or prepend something like "... [response]" instead of completing the started sentence:

const multiUserSession = await LanguageModel.create({
  initialPrompts: [{
    role: "system",
    content: "You are a mediator in a discussion between two departments."
  }]
});

const result = await multiUserSession.prompt([
  { role: "user", content: "Marketing: We need more budget for advertising campaigns." },
  { role: "assistant", content: "That makes me wonder what" }
]);

bradtriebwasser commented

Could you achieve this with structured output, using a regular expression that encodes the prefix of the sentence you want the assistant to produce?

For example:

const regExp = /Brazil is located in.*/;
const session = await LanguageModel.create();
const result = await session.prompt(
  `Finish the sentence.`,
  { responseConstraint: regExp }
);

console.log(result); // "Brazil is located in South America, but it is **not** part of the Caribbean region."

scblaze commented May 29, 2025

That's a really great idea, Brad!

I am a bit worried about the performance implications, though. With a prefill, we would want to skip generating the prefilled tokens. With the regex approach, I assume the API would still regenerate each token of the prefilled text. So the regex approach would provide all the guidance benefits of prefilling, but would still incur the same amount of work/latency as if there were no prefill.

Just a note on our specific use case: we are building a general spell-checking tool. If the user checks a long document and then changes a word in the middle of it, we would like to provide the already-corrected document up to that change as a prefilled response, so the LLM doesn't recompute those corrections. (I am aware of the planned Proofreader API, which we might use when it launches.)
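
For concreteness, here's a minimal sketch of that usage, assuming the trailing assistant message were treated as a prefill to continue rather than a completed turn (correctedSoFar and editedDocument are hypothetical placeholders):

const session = await LanguageModel.create({
  initialPrompts: [{
    role: "system",
    content: "You are a spell checker. Output the corrected document."
  }]
});

// Hypothetical: correctedSoFar holds the already-corrected text up to the
// user's latest edit. Ideally the model would continue from it verbatim,
// skipping regeneration of those tokens entirely.
const result = await session.prompt([
  { role: "user", content: editedDocument },
  { role: "assistant", content: correctedSoFar }
]);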

clarkduvall commented

I agree that having something like this would be useful for your use case and other autofill-style interactions. Having an explicit prefill API would let us do this more efficiently, and is more straightforward than using a responseConstraint to force this behavior. @domenic FYI

domenic commented May 30, 2025

I just came across a discussion of this today at https://standardcompletions.org/. I like the DeepSeek prefix: true approach (as does the creator of that website).
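
For context, the DeepSeek shape looks roughly like this (sketched from their docs; the beta endpoint and the prefix field are DeepSeek's, not part of this API):

const response = await fetch("https://api.deepseek.com/beta/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${apiKey}`
  },
  body: JSON.stringify({
    model: "deepseek-chat",
    messages: [
      { role: "user", content: "Where is Brazil located?" },
      // prefix: true marks this as a partial assistant response for the
      // model to continue, rather than a completed turn.
      { role: "assistant", content: "Brazil is located in", prefix: true }
    ]
  })
});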

I think this would be pretty easy to add, so I'm happy to put up a PR doing so.
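
Applied to this API, the example from the original issue might end up looking something like the following (the exact shape is speculative pending the PR):

const result = await multiUserSession.prompt([
  { role: "user", content: "Marketing: We need more budget for advertising campaigns." },
  // Speculative: prefix: true would mark this as text for the model to
  // continue rather than a completed assistant turn.
  { role: "assistant", content: "That makes me wonder what", prefix: true }
]);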

scblaze commented May 30, 2025

The DeepSeek prefix: true approach looks great.

domenic commented Jun 5, 2025

I put up a draft explainer/spec PR at #124. Writing the example made me feel that we should prioritize #44 in addition, as a similar small response-constraining feature.

domenic added the enhancement and ecosystem parity labels on Jun 5, 2025