In our current GPU-benchmarking scripts, we always use the prompt "Hi Hi Hi ..." to test model performance. The deepseek-coder-7b model always returns a response long enough to reach the maximum length, so we can simply generate an input of the desired length by changing the number of "Hi" tokens in the request and use the max-length parameter in the query to get the desired output length.
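For reference, the current strategy boils down to something like the following sketch. The endpoint URL, model name, and payload field names (with `max_tokens` standing in for the max-length parameter) are placeholders for illustration, not the actual benchmarking script:

```python
import requests

# Sketch of the current strategy: repeat "Hi" to control input length and cap
# the output with a max-length-style parameter. URL, model name, and field
# names below are placeholders and may differ from the real script.
def hi_prompt_request(num_hi: int, max_output_tokens: int,
                      url: str = "http://localhost:8000/v1/completions"):
    prompt = " ".join(["Hi"] * num_hi)       # desired input length
    payload = {
        "model": "deepseek-coder-7b",
        "prompt": prompt,
        "max_tokens": max_output_tokens,     # desired output length
    }
    resp = requests.post(url, json=payload, timeout=600)
    resp.raise_for_status()
    return resp.json()
```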
However, we found that this benchmarking method does not work with the 33b model, which returns a very short response to such a prompt, so our current benchmarking strategy no longer works.
We need to improve our existing benchmarking script to make it general enough to work with any model. The current idea is:
1. Create a dataset, send all of its prompts to the model, and record the length of each response.
2. Write a program that filters prompts with different input-output patterns from the dataset and use the filtered prompts for benchmarking tests (see the sketch after this list).
3. Automate the above process and run it before our current benchmarking script.
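A minimal sketch of steps 1-2, assuming an OpenAI-style completions endpoint; the URL, model name, response fields, whitespace-based length counting, and bucket boundaries are all illustrative assumptions rather than the real implementation:

```python
import json
import requests

# Hypothetical pre-benchmarking step: send every prompt in a dataset to the
# model, record input/output lengths, then keep prompts that fall into
# distinct input-output buckets. Endpoint, model name, and response fields
# are assumptions, not the actual script.
URL = "http://localhost:8000/v1/completions"   # placeholder endpoint
MODEL = "deepseek-coder-33b"

def collect_lengths(prompts):
    records = []
    for prompt in prompts:
        resp = requests.post(URL, json={
            "model": MODEL,
            "prompt": prompt,
            "max_tokens": 2048,                 # generous cap; assumption
        }, timeout=600)
        resp.raise_for_status()
        text = resp.json()["choices"][0]["text"]
        records.append({
            "prompt": prompt,
            "input_len": len(prompt.split()),   # rough token-count proxy
            "output_len": len(text.split()),
        })
    return records

def filter_patterns(records, buckets=((0, 128), (128, 512), (512, 2048))):
    # Keep one representative prompt per (input bucket, output bucket) pair
    # so the benchmark set covers different input-output patterns.
    chosen = {}
    for r in records:
        in_b = next((b for b in buckets if b[0] <= r["input_len"] < b[1]), None)
        out_b = next((b for b in buckets if b[0] <= r["output_len"] < b[1]), None)
        if in_b and out_b and (in_b, out_b) not in chosen:
            chosen[(in_b, out_b)] = r
    return list(chosen.values())

if __name__ == "__main__":
    prompts = [line.strip() for line in open("prompts.txt") if line.strip()]
    selected = filter_patterns(collect_lengths(prompts))
    with open("benchmark_prompts.json", "w") as f:
        json.dump(selected, f, indent=2)
```

Keeping one representative prompt per bucket pair keeps the benchmark set small while still covering distinct input-output patterns; step 3 would simply wrap a script like this so it runs before the existing benchmark.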
Steps to Reproduce
Expected behavior
We expect to use real prompts with different input-output patterns for benchmarking tests.
Environment
- LLM used: deepseek-coder-33b