Commit f9614f8

add dummy
Signed-off-by: youkaichao <[email protected]>
youkaichao committed Dec 11, 2024
1 parent dd24928 commit f9614f8
Showing 1 changed file with 2 additions and 2 deletions.
docs/source/usage/torch_compile.rst: 2 additions & 2 deletions
@@ -143,11 +143,11 @@ For a dynamic workload, we can use the ``VLLM_LOG_BATCHSIZE_INTERVAL`` environme
 Throughput: 44.39 requests/s, 22728.17 total tokens/s, 11364.08 output tokens/s
 $ # 2. Run the same setting with profiling
-$ VLLM_LOG_BATCHSIZE_INTERVAL=1.0 python3 benchmarks/benchmark_throughput.py --input-len 256 --output-len 256 --model meta-llama/Meta-Llama-3-8B --num-scheduler-steps 64
+$ VLLM_LOG_BATCHSIZE_INTERVAL=1.0 python3 benchmarks/benchmark_throughput.py --input-len 256 --output-len 256 --model meta-llama/Meta-Llama-3-8B --load-format dummy --num-scheduler-steps 64
 INFO 12-10 15:42:47 forward_context.py:58] Batchsize distribution (batchsize, count): [(256, 769), (232, 215), ...]
 $ # 3. The most common batch sizes are 256 and 232, so we can compile the model for these two batch sizes
-$ python3 benchmarks/benchmark_throughput.py --input-len 256 --output-len 256 --model meta-llama/Meta-Llama-3-8B --num-scheduler-steps 64 -O "{'level': 3, 'candidate_compile_sizes': [232, 256]}"
+$ python3 benchmarks/benchmark_throughput.py --input-len 256 --output-len 256 --model meta-llama/Meta-Llama-3-8B --load-format dummy --num-scheduler-steps 64 -O "{'level': 3, 'candidate_compile_sizes': [232, 256]}"
 init engine (profile, create kv cache, warmup model) took 87.18 seconds
 Throughput: 46.11 requests/s, 23606.51 total tokens/s, 11803.26 output tokens/s
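
The added --load-format dummy flag initializes the model with random weights instead of downloading and loading the real checkpoint, so the profiling and compilation runs in the documented example start faster while producing the same batch-size statistics. Below is a minimal offline sketch of the same idea, assuming your vLLM version forwards load_format through the LLM constructor; with dummy weights the generated text is meaningless and only useful for performance measurement.

from vllm import LLM, SamplingParams

# Load Meta-Llama-3-8B with dummy (random) weights; this skips the
# checkpoint download and is only suitable for benchmarking.
llm = LLM(model="meta-llama/Meta-Llama-3-8B", load_format="dummy")

# Generate a short completion just to exercise the forward pass.
out = llm.generate(["Hello"], SamplingParams(max_tokens=8))
print(out[0].outputs[0].text)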
