I have a unique setup with a large document that many users might have questions about. Currently, I'm concatenating `document || question` as the prompt for each request. This does benefit from the prefix cache, but still incurs some overhead per request. I wonder if it's possible to support `cache_reference_to_document || question`, so the reuse is explicit and the overhead is reduced.
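For concreteness, here is a minimal sketch of the concatenation pattern described above, using vLLM's offline `LLM` API with automatic prefix caching enabled. The model name, document, and questions are placeholders, not part of the original post:

```python
from vllm import LLM, SamplingParams

# Placeholder model; any vLLM-supported model works here.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_prefix_caching=True)
params = SamplingParams(temperature=0.0, max_tokens=256)

document = "..."  # the large shared document
questions = ["What is the main finding?", "Who are the authors?"]

# Each prompt repeats the full document. The shared prefix lets vLLM reuse
# cached KV blocks across requests, but the document tokens are still
# re-sent and re-tokenized per request -- the overhead the question refers to.
prompts = [f"{document}\n\nQuestion: {q}" for q in questions]

for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```

A `cache_reference_to_document` handle, as proposed, would let clients send only the question plus a reference to the already-cached prefix instead of the full document each time.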
Replies: 1 comment

No, this is not supported unless you modify vLLM code.