Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rules clarification to prevent sorting beyond datset boundaries in LLMs #309

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

nvashutoshd
Copy link

This PR addresses and clarifies the rules for dataset sorting in LLM benchmarks, specifically focusing on the offline scenario. The changes aim to ensure fair and consistent performance evaluation across submissions.

Proposed changes for LLM benchmarks-

  1. Dataset Sorting Limitation: Sorting is restricted to within the boundaries of a single dataset copy. This prevents the creation of batches with identical queries from multiple dataset copies, which could lead to artificial performance advantages.
  2. Clarification on Runtime: Submitters are still allowed to run beyond the minimum duration time, and sorting of queries within a dataset copy is permitted.
  3. Addressing Performance Bias: The new rule eliminates the potential unfair advantage gained from batching identical queries with identical sequence lengths across multiple dataset copies. The refined rules better reflect real-world scenarios, where identical queries are not typically grouped together

@nvashutoshd nvashutoshd requested a review from a team as a code owner February 4, 2025 10:21
Copy link

github-actions bot commented Feb 4, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@nvashutoshd
Copy link
Author

cc @nv-ananjappa @nvzhihanj

@mrmhodak
Copy link
Contributor

mrmhodak commented Feb 4, 2025

WG Notes:

  1. Sorting outside of the dataset boundary prohibited (regardless of runtime).
  2. LLMs instead of list of benchmarks

@nvashutoshd
Copy link
Author

recheck

@mrmhodak
Copy link
Contributor

mrmhodak commented Feb 4, 2025

@nvashutoshd @ashwin:
This seems to be impractical to implement if somebody wants to sort. Loadgen sends one query with all the data, there are no markers where dataset ends and begins. To sort, one would need to hardcode how long the dataset is (or similar), which is explicitly prohibited.

@nvzhihanj
Copy link
Contributor

nvzhihanj commented Feb 4, 2025

This seems to be impractical to implement if somebody wants to sort.

@mrmhodak the scenario you mentioned only happens when: 1) offline scenarios, where all queries are dumped to the SUT in one shot; 2) requested query count is larger than the dataset size (e.g. if you specify a larger min_duration or min_query_count).

I do agree it's impractical to implement sorting in such scenario, but sorting across boundary will enable exploitative actions like batching same queries which should not be allowed. In the future, LoadGen should be the one implementing and accepting sorting lambdas. @pgmpablo157321 FYI.

@nvashutoshd
Copy link
Author

@mrmhodak @nv-ananjappa @nvzhihanj - I've updated the proposed rule change based on our discussion in the WG meeting this morning.

The rules now cover any/all LLMs, and limits sorting outside dataset boundary irrespective of runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants