Remove threadsafe #2907

grimoire · 2024-12-17T08:58:31Z

Thread-safe mode has been removed.
Replaced asyncio.Queue with asyncio.Event.
Better host-side (CPU) performance.
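The PR description only says that asyncio.Queue was swapped for asyncio.Event. One common way that swap works is sketched below; it is an assumption, not the PR's actual code. The producer overwrites the latest cumulative output in a per-request slot and sets an Event. The consumer waits on the Event and then clears it. Nothing is enqueued per step, so bursts of updates collapse into a single wake-up. `ResponseSlot`, `fake_engine` and `consume` are hypothetical names.

```python
import asyncio
from typing import List, Tuple


class ResponseSlot:
    """Hypothetical per-request output slot signalled by an asyncio.Event."""

    def __init__(self) -> None:
        self.event = asyncio.Event()
        self.token_ids: List[int] = []
        self.finished = False

    def update(self, token_ids: List[int], finished: bool) -> None:
        # Producer: overwrite the latest cumulative output, then wake the reader.
        # Repeated updates before the reader runs simply coalesce.
        self.token_ids = token_ids
        self.finished = finished
        self.event.set()

    async def wait(self) -> Tuple[List[int], bool]:
        # Consumer: block until at least one update has arrived, then re-arm.
        await self.event.wait()
        self.event.clear()
        return list(self.token_ids), self.finished


async def fake_engine(slot: ResponseSlot, steps: int) -> None:
    # Hypothetical stand-in for the engine loop: emits one token per step.
    tokens: List[int] = []
    for i in range(steps):
        tokens.append(i)
        slot.update(list(tokens), finished=(i == steps - 1))
        await asyncio.sleep(0)  # yield to the consumer


async def consume(steps: int) -> List[int]:
    slot = ResponseSlot()
    producer = asyncio.create_task(fake_engine(slot, steps))
    while True:
        tokens, finished = await slot.wait()
        if finished:
            break
    await producer
    return tokens


result = asyncio.run(consume(5))
```

This design only works because each update carries the full cumulative output. A reader that misses an intermediate update still sees everything on the next wake-up.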

Note that with this PR, EOS will be included in the output.

lvhan028 · 2024-12-18T03:42:36Z

We have users who run the PyTorch engine in a multi-threaded environment.
Please provide a guide to help them migrate to the now non-thread-safe PyTorch engine.
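A migration guide would likely rely on a standard asyncio pattern for multi-threaded callers. The idea is to run one event loop in a dedicated thread and hand coroutines to it with `asyncio.run_coroutine_threadsafe`. The sketch below shows only that pattern. `fake_generate` is a hypothetical stand-in for the engine's async generation call and is not lmdeploy's API. `LoopThread` and `worker` are also illustrative names.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


class LoopThread:
    """Owns a single event loop running in a background thread."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    def submit(self, coro):
        # Thread-safe hand-off: the coroutine runs on the loop thread and a
        # concurrent.futures.Future is returned to the calling thread.
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def close(self) -> None:
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


async def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for an async, non-thread-safe engine call.
    await asyncio.sleep(0.01)
    return prompt.upper()


runner = LoopThread()


def worker(prompt: str) -> str:
    # May be called from any user thread; blocks only the calling thread.
    return runner.submit(fake_generate(prompt)).result(timeout=5)


with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = list(pool.map(worker, ["a", "b", "c", "d"]))
runner.close()
```

All engine work stays on one loop thread, so the engine itself never needs locks. User threads only wait on futures.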

lvhan028 · 2024-12-18T03:50:16Z

Please add a WARNING that thread-safe mode has been removed.

lvhan028 · 2024-12-18T03:56:28Z

"Better host performance", so what's the performance now?

grimoire · 2024-12-18T08:32:34Z

"Better host performance", so what's the performance now?

llama3-8b, tp=1, 3000 prompts, concurrency 256

concurrency: 256
elapsed_time: 133.107s

first token latency(s)(min, max, ave): 0.119, 4.574, 0.621
per-token latency(s) percentile(50, 75, 95, 99): [0.028, 0.03, 0.284, 0.47]

number of prompt tokens: 676779
number of completion tokens: 612685
token throughput (completion token): 4602.956 token/s
token throughput (prompt + completion token): 9687.436 token/s
RPS (request per second): 22.538 req/s
RPM (request per minute): 1352.297 req/min

llama3-8b, tp=1, 10000 prompts, concurrency 512

concurrency: 512
elapsed_time: 386.856s

first token latency(s)(min, max, ave): 0.259, 7.529, 0.823
per-token latency(s) percentile(50, 75, 95, 99): [0, 0.055, 0.894, 1.138]

number of prompt tokens: 2238358
number of completion tokens: 1995438
token throughput (completion token): 5158.094 token/s
token throughput (prompt + completion token): 10944.123 token/s
RPS (request per second): 25.849 req/s
RPM (request per minute): 1550.966 req/min
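The derived rates in both runs can be cross-checked against the raw counts. The check below assumes that the number of requests equals the prompt count (3000 and 10000) and that `elapsed_time` is the denominator for every rate. Small differences come from `elapsed_time` being rounded to milliseconds. The latency percentiles cannot be recomputed from these summaries.

```python
import math


def summarize(elapsed_s: float, num_requests: int,
              prompt_tokens: int, completion_tokens: int) -> dict:
    """Recompute the benchmark's derived rates from its raw totals."""
    return {
        "completion_tps": completion_tokens / elapsed_s,
        "total_tps": (prompt_tokens + completion_tokens) / elapsed_s,
        "rps": num_requests / elapsed_s,
        "rpm": num_requests / elapsed_s * 60,
    }


run_3000 = summarize(133.107, 3000, 676779, 612685)
run_10000 = summarize(386.856, 10000, 2238358, 1995438)

for name, run in (("3000 prompts", run_3000), ("10000 prompts", run_10000)):
    print(name, {k: round(v, 3) for k, v in run.items()})
```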

lvhan028 · 2024-12-18T08:42:14Z

"Note that EOS would be output in this PR."
@lzhangzz Will the tm refactoring you are working on also output EOS and stop_token_ids to async_engine?


RunningLeon

LGTM

lzhangzz · 2024-12-25T06:21:53Z

"Note that EOS would be output in this PR." @lzhangzz will the tm refactoring you are working on output the EOS and stop_token_id to async_engine?

We need to discuss how EOS/stop_token_ids should be skipped in the async engine.

For some models, EOS is part of the chat template. We may exclude the token from the response, but the step should not be rewound (i.e., the token stays in the KV cache).

However, for models like Vicuna, EOS must be excluded from both the response and the KV cache (i.e., the step is rewound to the token before EOS).
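The two policies described above can be sketched as a per-model flag. The sketch assumes that `step` counts the tokens currently held in the KV cache and that the stop token is the last one generated. Under those assumptions, "rewinding to the token before EOS" means decrementing `step` by one. `handle_stop`, `StopResult` and `keep_in_cache` are hypothetical names, not async_engine's interface.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class StopResult:
    response_ids: List[int]  # tokens returned to the user
    step: int                # tokens kept in the (assumed) KV cache


def handle_stop(token_ids: List[int], step: int,
                stop_ids: Set[int], keep_in_cache: bool) -> StopResult:
    """Strip a trailing EOS/stop token and optionally rewind the step."""
    if not token_ids or token_ids[-1] not in stop_ids:
        return StopResult(token_ids, step)  # not stopped: nothing to do
    response = token_ids[:-1]               # never show EOS to the user
    if keep_in_cache:
        # Chat-template EOS: keep it in the KV cache for the next turn.
        return StopResult(response, step)
    # Vicuna-style: drop EOS from the cache too by rewinding one step.
    return StopResult(response, step - 1)
```

Whichever side owns the flag, the engine or the chat template, must know it per model. That is the open question in this thread.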

grimoire added 5 commits December 17, 2024 10:55

19738b2 remove threadsafe
acab4e7 optimize performance
47a03e7 22.4
7f8dec6 22.5
a1caab8 delete jsonl

lvhan028 requested review from AllentDan, irexyc and RunningLeon December 17, 2024 13:03

lvhan028 added the improvement label Dec 17, 2024

2de5db2 Merge branch 'main' into remove-threadsafe
1007f86 add docs

RunningLeon reviewed Dec 18, 2024

lmdeploy/pytorch/engine/engine_checker.py (outdated, resolved)

053a34e fix link

RunningLeon reviewed Dec 18, 2024

docs/zh_cn/advance/pytorch_multithread.md (resolved)

grimoire added 2 commits December 18, 2024 19:28

7254c0d rst
b9998d9 Merge branch 'main' into remove-threadsafe

RunningLeon approved these changes Dec 23, 2024

grimoire added 2 commits December 23, 2024 11:53

d62076b remove sleep req step
6ded3f6 remove scheduler sleep
