Open
Description
In models that sample tokens from the prior, it is unnecessary to actually run the LLM on the newly sampled token unless the particle survives the next resampling step. Maybe there is a good way to buffer or lazily execute the LLM calls so that this optimization is automated.
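One way the lazy execution could look is sketched below. This is an illustration under stated assumptions, not the project's actual API: `CountingLM`, `LazyParticle`, and `run` are hypothetical names, the "LLM" is a uniform toy distribution that only counts how often it is called, and the weight function is an arbitrary stand-in for a potential that does not need the LLM. Each particle defers the LLM call on its context until the next-token distribution is actually requested, so particles dropped at resampling never trigger a call, and duplicated particles share one cached result.

```python
import random


class CountingLM:
    """Hypothetical stand-in for an LLM: uniform over a tiny vocab, counts calls."""

    def __init__(self, vocab):
        self.vocab = list(vocab)
        self.calls = 0

    def next_token_probs(self, context):
        self.calls += 1
        return {tok: 1.0 / len(self.vocab) for tok in self.vocab}


class LazyParticle:
    """Immutable particle whose LLM call is deferred until probs() is requested."""

    def __init__(self, lm, context=()):
        self.lm = lm
        self.context = tuple(context)
        self._probs = None  # filled in on first use

    def probs(self):
        if self._probs is None:
            self._probs = self.lm.next_token_probs(self.context)
        return self._probs

    def extend(self, rng):
        # Sampling from the prior needs the distribution for the current
        # context only; the new token is not run through the LLM here.
        probs = self.probs()
        tokens = list(probs)
        tok = rng.choices(tokens, weights=[probs[t] for t in tokens])[0]
        return LazyParticle(self.lm, self.context + (tok,))


def run(num_particles, steps, seed=0):
    rng = random.Random(seed)
    lm = CountingLM(["a", "b", "c"])
    particles = [LazyParticle(lm)] * num_particles  # shared root particle
    for _ in range(steps):
        particles = [p.extend(rng) for p in particles]
        # Toy weights (an assumption): favor particles whose last token is "a".
        weights = [2.0 if p.context[-1] == "a" else 1.0 for p in particles]
        # Resampling copies references, so duplicates share one cached call.
        particles = rng.choices(particles, weights=weights, k=num_particles)
    return particles, lm.calls


if __name__ == "__main__":
    particles, calls = run(num_particles=8, steps=5)
    print("LLM calls:", calls, "vs eager:", 1 + 8 * 5)
```

In this sketch an eager implementation would make `1 + num_particles * steps` calls, while the lazy one makes at most `1 + num_particles * (steps - 1)`, and fewer whenever resampling drops or duplicates particles. A real implementation would likely also need to batch the deferred calls, which this sketch does not attempt.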