The accuracy of using the batch processing function of the client is low #1354

Closed Answered by merrymercy
guleng asked this question in Q&A
  1. Remove stop="\n" in text_qa to fix the early-stop issue.
  2. For this model, it is better to use the chat template (e.g., sgl.user, sgl.assistant):
@sgl.function
def text_qa(s, question):
    s += sgl.user(question)  # wrap the question in the user role of the chat template
    # Generate the answer in the assistant role; note there is no stop="\n" here
    s += sgl.assistant(sgl.gen("answer", max_tokens=600, temperature=0))
    return s
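For batch processing, the decorated function above can be invoked with run_batch. A minimal sketch follows, assuming the sglang package is installed and a server is running; the endpoint URL, the question strings, and the answer_in_batch helper are placeholders for illustration, not part of the answer above.

```python
# Sketch: running the corrected text_qa over a batch of questions.
# The endpoint URL and questions below are hypothetical examples.

questions = [
    "What is the capital of China?",
    "Briefly explain batch processing.",
]

def answer_in_batch(questions, endpoint="http://localhost:30000"):
    # Imported inside the helper so the sketch can be read without sglang installed.
    import sglang as sgl

    @sgl.function
    def text_qa(s, question):
        s += sgl.user(question)
        s += sgl.assistant(sgl.gen("answer", max_tokens=600, temperature=0))

    # Point the frontend at a running sglang server (address is a placeholder).
    sgl.set_default_backend(sgl.RuntimeEndpoint(endpoint))

    # run_batch takes one kwargs dict per call and returns one state per call.
    states = text_qa.run_batch(
        [{"question": q} for q in questions],
        progress_bar=True,
    )
    return [state["answer"] for state in states]
```

Each returned state exposes the named generation via state["answer"], so the batch results come back in the same order as the input questions.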
