Prefill assistant response #118

Open · scblaze opened this issue May 25, 2025 · 6 comments
Labels: ecosystem parity (a feature that other popular language model APIs offer), enhancement (new feature or request)

Comments

scblaze commented May 25, 2025

It should be possible to provide a partial assistant message and have the assistant complete the rest of it.

This is useful for guiding the response, helping enforce certain output structures, and having the assistant expand on an already-completed message.

Examples of this exist in practice in other language model APIs (for instance, DeepSeek's prefix: true completion mode, discussed below).

From the explainer, it seems like the multi-user example (https://github.com/webmachinelearning/prompt-api?tab=readme-ov-file#customizing-the-role-per-prompt) might support this.

However, it appears that, at least in Chrome 137, the LLM always starts a new sentence and doesn't continue a partially completed sentence in the final assistant message.

For example, the following will consistently start a new sentence or prepend something like "... [response]" instead of completing the started sentence:

const multiUserSession = await LanguageModel.create({
  initialPrompts: [{
    role: "system",
    content: "You are a mediator in a discussion between two departments."
  }]
});

const result = await multiUserSession.prompt([
  { role: "user", content: "Marketing: We need more budget for advertising campaigns." },
  { role: "assistant", content: "That makes me wonder what" }
]);

bradtriebwasser commented

Could you achieve this with structured output, using a regular expression that encodes the prefix of the sentence you want the assistant to produce?

For example:

const regExp = /Brazil is located in.*/;
const session = await LanguageModel.create();
const result = await session.prompt(
  `Finish the sentence.`,
  { responseConstraint: regExp }
);

console.log(result); // "Brazil is located in South America, but it is **not** part of the Caribbean region."

scblaze commented May 29, 2025

That's a really great idea, Brad!

I am a bit worried about the performance implications, though. With a prefill, we would want to skip generating the prefilled tokens. With the regex approach, I assume the API would still regenerate each token of the prefilled text. So the regex approach would provide all the guidance benefits of prefilling, but would still incur the same amount of work/latency as if there were no prefill.

Just a note on our specific use case: we are building a general spell-checking tool. If the user checks a long document and then changes a word in the middle of it, we would like to provide the already-corrected document up to that change as a prefilled response, so the LLM doesn't recompute those corrections. (I am aware of the planned Proofreader API, which we might use when it launches.)
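
For concreteness, here's a minimal sketch of that usage, assuming the trailing assistant message were treated as a prefill to continue rather than a completed turn (correctedSoFar and editedDocument are hypothetical placeholders):

const session = await LanguageModel.create({
  initialPrompts: [{
    role: "system",
    content: "You are a spell checker. Output the corrected document."
  }]
});

// Hypothetical: correctedSoFar holds the already-corrected text up to the
// user's latest edit. Ideally the model would continue from it verbatim,
// skipping regeneration of those tokens entirely.
const result = await session.prompt([
  { role: "user", content: editedDocument },
  { role: "assistant", content: correctedSoFar }
]);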

clarkduvall commented

I agree that having something like this would be useful for your use case and other autofill-style interactions. Having an explicit prefill API would let us do this more efficiently, and is more straightforward than using a responseConstraint to force this behavior. @domenic FYI

domenic commented May 30, 2025

I just came across a discussion of this today at https://standardcompletions.org/. I like the DeepSeek prefix: true approach (as does the creator of that website).
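
For context, the DeepSeek shape looks roughly like this (sketched from their docs; the beta endpoint and the prefix field are DeepSeek's, not part of this API):

const response = await fetch("https://api.deepseek.com/beta/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${apiKey}`
  },
  body: JSON.stringify({
    model: "deepseek-chat",
    messages: [
      { role: "user", content: "Where is Brazil located?" },
      // prefix: true marks this as a partial assistant response for the
      // model to continue, rather than a completed turn.
      { role: "assistant", content: "Brazil is located in", prefix: true }
    ]
  })
});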

I think this would be pretty easy to add, so I'm happy to put up a PR doing so.
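
Applied to this API, the example from the original issue might end up looking something like the following (the exact shape is speculative pending the PR):

const result = await multiUserSession.prompt([
  { role: "user", content: "Marketing: We need more budget for advertising campaigns." },
  // Speculative: prefix: true would mark this as text for the model to
  // continue rather than a completed assistant turn.
  { role: "assistant", content: "That makes me wonder what", prefix: true }
]);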

scblaze commented May 30, 2025

The DeepSeek prefix: true approach looks great.

domenic commented Jun 5, 2025

I put up a draft explainer/spec PR at #124. Writing the example made me feel that we should prioritize #44 in addition, as a similar small response-constraining feature.

domenic added the enhancement and ecosystem parity labels on Jun 5, 2025