Improving benchmarking scripts with real prompts in heterogeneous GPU story #722

Open
nwangfw opened this issue Feb 20, 2025 · 0 comments
Labels
area/heterogeneous kind/bug Something isn't working

nwangfw commented Feb 20, 2025

🐛 Describe the bug

In our current gpu-benchmarking scripts, we always use the prompt "Hi Hi Hi ..." to test model performance. For this prompt, the deepseek-coder-7b model always keeps generating until it reaches the maximum length. Therefore, we can generate an input of the desired length simply by changing the number of "Hi" words in the request, and use the max-length parameter in the query to get the desired output length.
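
As a rough illustration of this approach (not the actual script), the sketch below builds such a synthetic request. The function name, the payload shape, and the `max_tokens` field are assumptions standing in for the max-length parameter mentioned above; the real scripts may use different names.

```python
def build_synthetic_request(num_his: int, max_output_tokens: int, model: str) -> dict:
    """Build a request whose input length scales with the number of "Hi" words.

    Hypothetical helper: the payload keys are assumed, not taken from the
    existing benchmarking scripts.
    """
    if num_his < 1:
        raise ValueError("num_his must be at least 1")
    # Input length is controlled only by how many times "Hi" is repeated.
    prompt = " ".join(["Hi"] * num_his)
    # Output length is controlled only by the cap, which works only if the
    # model keeps generating until it hits that cap.
    return {"model": model, "prompt": prompt, "max_tokens": max_output_tokens}
```

The approach relies entirely on the model generating until the cap is reached; nothing in the request itself guarantees that.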

However, we found that this method doesn't work with the 33b model, which returns a very short response to such a prompt. As a result, we can no longer control the output length this way, and our current benchmarking strategy no longer works.

We need to improve our existing benchmarking script to make it general enough to work with any model. The current idea is:

  1. Create a dataset of prompts, send every prompt in it to the model, and record the corresponding response length.
  2. Write a program that filters prompts by input-output length pattern, and use the filtered prompts for benchmarking tests.
  3. Automate the above process and run it before we run our current benchmarking script.
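
The first two steps could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a proposed implementation: `PromptProfile`, `profile_prompts`, `filter_by_pattern`, and the `tolerance` parameter are hypothetical names, and the `generate` and `count_tokens` callables stand in for the real model client and tokenizer.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class PromptProfile:
    prompt: str
    input_len: int   # length of the prompt, in tokens
    output_len: int  # length of the model's response, in tokens


def profile_prompts(
    prompts: Iterable[str],
    generate: Callable[[str], str],
    count_tokens: Callable[[str], int],
) -> List[PromptProfile]:
    """Step 1: send each prompt to the model and record the response length."""
    profiles = []
    for prompt in prompts:
        response = generate(prompt)  # assumed to call the model under test
        profiles.append(
            PromptProfile(prompt, count_tokens(prompt), count_tokens(response))
        )
    return profiles


def filter_by_pattern(
    profiles: Iterable[PromptProfile],
    target_input: int,
    target_output: int,
    tolerance: float = 0.1,
) -> List[PromptProfile]:
    """Step 2: keep prompts whose input and output lengths are near the targets."""

    def close(actual: int, target: int) -> bool:
        # Relative tolerance, e.g. 0.1 accepts lengths within 10% of the target.
        return abs(actual - target) <= tolerance * target

    return [
        p
        for p in profiles
        if close(p.input_len, target_input) and close(p.output_len, target_output)
    ]
```

Because response lengths differ between models, the profiling pass would have to be repeated for each model before its benchmark run, which is what step 3 automates.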

Steps to Reproduce

Image

Expected behavior

We expect to use real prompts with different input-output patterns for benchmarking tests.

Environment

- LLM used: deepseek-coder-33b

@nwangfw nwangfw added area/heterogeneous kind/bug Something isn't working labels Feb 20, 2025
@nwangfw nwangfw self-assigned this Feb 20, 2025