What's Changed
- Full support for "experiments" (design of experiments)
- Each "well-lit" path now has both an "experiment" file (accessible via execution of
e2e.sh) and a scenario (accessible via execution of bothe2e.shandstandup.sh/teardown.sh). - All scenarios tested, and an initial experimental dataset collected and made available. The exception at this point is the "wide-ep-lws", slated for the next release
- Each "well-lit" path now has both an "experiment" file (accessible via execution of
- Code conversion (`bash` to `python`)
  - Individual standup [steps](https://github.com/llm-d/llm-d-benchmark/tree/main/setup/steps) 0, 1, 2, 3, 4, 6, 7, 8 converted from `bash` to `python`.
- Better support for the execution of the benchmark load generating phase - `run.sh` - against pre-deployed stacks (see the sketch below)
  - Automatically detect the current `namespace`, the `llm-d` stack URL, and the served model name.
  - Do not require a Hugging Face token when generating load.
  - Generate the standardized benchmark report taking into account that the stack was pre-deployed and that not all deployment parameters are available.
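  A hypothetical sketch of that flow (`run.sh` is named in this release; the namespace value and the `kubectl` step are illustrative assumptions):

  ```bash
  # Point the current kubectl context at the namespace holding the
  # pre-deployed llm-d stack (assumption: run.sh reads the current context).
  kubectl config set-context --current --namespace=my-llm-d-namespace

  # No standup step and no Hugging Face token required; run.sh detects the
  # namespace, the llm-d stack URL, and the served model name on its own.
  ./run.sh
  ```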
- Benchmark report generation and data analysis
  - The standardized benchmark report had its format refined and updated to accommodate all the different harnesses.
  - For each "well-lit" path, a Jupyter analysis notebook (e.g., `analysis_pd`) was created.
- Documentation overhaul
  - Main documentation significantly expanded.
  - Individual components (e.g., Benchmarking Report and Configuration Explorer) have their own docs, indexed from the main documentation.
- Publicly available experimental data
  - Experimental runs were performed for each "well-lit" path, and the data is publicly available on the project's Google Drive.
- Configuration Explorer
  - The number of parameters required to successfully deploy a model served by an `llm-d` stack - while making efficient use of scarce resources such as GPUs - pointed to the need for a mechanism to help users avoid obvious "dead ends" (i.e., standup scenarios bound to fail due to lack of resources).
  - The Configuration Explorer is a standalone tool which provides two main functionalities:
    - "capacity planner": given certain input parameters, is the `llm-d` stack even capable of serving a model? (illustrated below)
    - "configuration sweeper": given certain input and workload parameters, what is the maximum/average recorded performance?
  - The "capacity planner" is presently available as a standalone UI and also as a library fully integrated into the benchmark lifecycle (e.g., `standup.sh`).
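  A back-of-the-envelope illustration of the "capacity planner" question; this is not the Configuration Explorer's actual interface - the variable names and the simplistic memory model are assumptions:

  ```bash
  # Will the stack even be capable of serving the model? Rough feasibility
  # check: weights plus a fixed overhead must fit in aggregate GPU memory.
  PARAMS_B=8            # model size, in billions of parameters
  BYTES_PER_PARAM=2     # bf16/fp16 weights
  NUM_GPUS=1
  GPU_MEM_GIB=80
  OVERHEAD_PCT=35       # rough allowance for KV cache, activations, CUDA context

  # Treating 1e9 bytes as ~1 GiB is close enough for a dead-end check.
  needed_gib=$(( PARAMS_B * BYTES_PER_PARAM * (100 + OVERHEAD_PCT) / 100 ))
  avail_gib=$(( NUM_GPUS * GPU_MEM_GIB ))
  if (( needed_gib <= avail_gib )); then
    echo "feasible: ~${needed_gib} GiB needed, ${avail_gib} GiB available"
  else
    echo "dead end: ~${needed_gib} GiB needed, only ${avail_gib} GiB available"
  fi
  ```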
- Initial support for multiple models with `modelservice`
  - A single stack can serve multiple models, and each model can be individually accessed via a different URL (sketched below).
  - This capability relies on the `llm-d-modelservice` standup method.
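  A hypothetical sketch of the resulting access pattern (the base URL, routes, and model names are illustrative assumptions; the stack serves an OpenAI-compatible API):

  ```bash
  # Two models served by a single stack, each addressable via its own URL.
  curl -s http://llm-d.example.com/llama/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "llama", "prompt": "Hello", "max_tokens": 8}'

  curl -s http://llm-d.example.com/granite/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "granite", "prompt": "Hello", "max_tokens": 8}'
  ```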
- More extensive CI/CD
  - Run full tests, exercising all standup methods, whenever a PR is opened.
  - Test every standup method and harness nightly.
Regular Contributors to this release
- @namasl
- @kalantar
- @Vezio
- @jgchn
- @deanlorenz
- @manoelmarques
- @achandrasekar
- @yossiovadia
- @pancak3
- @maugustosilva
New Contributors
- @petecheslock made their first contribution in #314
- @mengmeiye made their first contribution in #388
- @Edwinhr716 made their first contribution in #42
Full Changelog: v0.2.9...v0.3.0