What happened?
Within the worker, we map to the Predibase base URL. It is currently https://api.app.predibase.com, but it should be https://serving.app.predibase.com.
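For illustration, a minimal sketch of the corrected mapping, assuming the worker keeps a provider-to-base-URL map (the map name and shape here are assumptions, not the worker's actual code):

```ts
// Hypothetical provider-to-base-URL map inside the worker.
const PROVIDER_BASE_URLS: Record<string, string> = {
  // ...other providers...
  predibase: "https://serving.app.predibase.com", // previously https://api.app.predibase.com
};
```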
Also, Predibase returns the model and usage as response headers. When we extract the model, we currently go response body -> request body -> path. We should check the header first and, if it isn't present, fall back to the existing flow.
For the usage, we currently read it from the body; we need to read it from these headers instead (see the sketch after the header list below). Here is what they will look like:
Response Headers
Info: These headers should be considered a beta feature and are subject to change in the future.
x-total-tokens: The number of tokens in both the input prompt and the output.
x-prompt-tokens: The number of tokens in the prompt.
x-generated-tokens: The number of generated tokens.
x-total-time: The total time the request took in the inference server, in milliseconds.
x-time-per-token: The average time it took to generate each output token, in milliseconds.
x-queue-time: The time the request was in the internal inference server queue, in milliseconds.
x-model-id: predibase/Meta-Llama-3.1-8B-Instruct-dequantized (example model name)
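As a rough sketch of the intended extraction order, assuming standard Fetch API Response objects (the function names, the usage shape, and the fallback hook are illustrative assumptions, not the worker's actual code):

```ts
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Prefer the x-model-id header; fall back to the existing
// response body -> request body -> path resolution.
function getModel(response: Response, existingFlow: () => string): string {
  return response.headers.get("x-model-id") ?? existingFlow();
}

// Read usage from the beta headers; return null if any are missing
// so callers can keep their current behavior.
function getUsage(response: Response): Usage | null {
  const prompt = response.headers.get("x-prompt-tokens");
  const generated = response.headers.get("x-generated-tokens");
  const total = response.headers.get("x-total-tokens");
  if (prompt === null || generated === null || total === null) return null;
  return {
    prompt_tokens: Number(prompt),
    completion_tokens: Number(generated),
    total_tokens: Number(total),
  };
}
```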
Lastly, add docs for Predibase support. Reference our other integrations, such as DeepInfra and Fireworks AI.
Relevant log output
No response
Twitter / LinkedIn details
No response
Please approve and merge the PR once you've verified that the changes work. If there are nits, leave a "Request Changes" review for me. Otherwise, check out the branch to make changes.