
server : check that the prompt fits in the slot's context #10030

Merged
2 commits merged into master from gg/server-check-ctx on Oct 25, 2024

Conversation

ggerganov (Owner)

fix #9978

In embedding and reranking mode, a prompt can fit in the batch yet still exceed the slot's context size. This PR adds a check that handles such cases gracefully instead of crashing.
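Below is a minimal sketch of where the check sits. Only the condition and the error call are taken from this PR's diff (quoted again in the review thread below); the enclosing loop and the trailing continue are illustrative assumptions:

// sketch of the new check inside the server's slot-processing loop
// (condition and error call are from the PR diff; the `continue` and
// the surrounding structure are assumptions for illustration)
if (slot.n_prompt_tokens > slot.n_ctx) {
    // the prompt fit the batch (-b / -ub) but not this slot's context,
    // so free the slot and report an error instead of aborting the server
    slot.release();
    send_error(slot, "input is larger than the max context size. skipping", ERROR_TYPE_SERVER);
    continue;
}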

Testing

./llama-server \
    -m ./models/bge-large-zh-v1.5/ggml-model-f16.gguf \
    --port 8012 -a [email protected] -ngl 100 \
    --embeddings -ub 8192 -b 8192 --pooling cls

curl \
    http://localhost:8012/v1/embeddings -H "Content-Type: application/json" \
    -H "Authorization: Bearer no-key" \
    -d '{"input": ["'"$(printf 'hello %.0s' $(seq 1 550))"'"], "encoding_format": "float"}'
{
  "error": {
    "code": 500,
    "message": "input is larger than the max context size. skipping",
    "type": "server_error"
  }
}
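In this test the physical batch (-b/-ub 8192) easily holds the roughly 550-token prompt, but the model's context does not, which is exactly the case the new check catches: the client now gets an HTTP 500 with the JSON error above instead of a dropped connection from a crashed server.

The sketch below is not part of the PR; assuming libcurl and the same endpoint and payload shape as the test, it shows how a caller might detect the condition:

// hypothetical client sketch (not from the PR): POSTs to the embeddings
// endpoint used in the test above and checks for the new error response
// build assumption: g++ -std=c++11 embed_client.cpp -lcurl
#include <curl/curl.h>
#include <iostream>
#include <string>

static size_t write_cb(char * ptr, size_t size, size_t nmemb, void * userdata) {
    static_cast<std::string *>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL * curl = curl_easy_init();
    if (!curl) {
        return 1;
    }

    // same endpoint and payload shape as the curl test above
    const std::string url  = "http://localhost:8012/v1/embeddings";
    const std::string body = R"({"input": ["hello"], "encoding_format": "float"})";

    std::string response;
    struct curl_slist * headers = nullptr;
    headers = curl_slist_append(headers, "Content-Type: application/json");
    headers = curl_slist_append(headers, "Authorization: Bearer no-key");

    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);

    const CURLcode res = curl_easy_perform(curl);
    long http_code = 0;
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_code);

    if (res != CURLE_OK) {
        std::cerr << "request failed: " << curl_easy_strerror(res) << "\n";
    } else if (http_code == 500) {
        // the server now reports the oversized prompt instead of crashing;
        // the caller should truncate or chunk the input and retry
        std::cerr << "server error: " << response << "\n";
    } else {
        std::cout << response << "\n";
    }

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}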


if (slot.n_prompt_tokens > slot.n_ctx) {
    slot.release();
    send_error(slot, "input is larger than the max context size. skipping", ERROR_TYPE_SERVER);
Collaborator

I'd suggest changing the wording to match the message used when the number of tokens is larger than n_ubatch.

Suggested change
-    send_error(slot, "input is larger than the max context size. skipping", ERROR_TYPE_SERVER);
+    send_error(slot, "input is too large to process. increase the context size", ERROR_TYPE_SERVER);

ggerganov (Owner, Author)

Suggesting to increase the context size can lead to the same issue the user experienced in #9978, which was caused by setting a context size larger than what the model supports. So I'll keep the version without that suggestion.

ggerganov merged commit bc5ba00 into master on Oct 25, 2024
56 checks passed
ggerganov deleted the gg/server-check-ctx branch on October 25, 2024 at 07:13
Labels
examples, python (python script changes), server

Projects
None yet

Development
Successfully merging this pull request may close these issues:
Bug: llama-server crash with --embeddings (#9978)

2 participants