What's Changed
- Full support for "experiments" (design of experiments)
- Each "well-lit" path now has both an "experiment" file (accessible via execution of
e2e.sh) and a scenario (accessible via execution of bothe2e.shandstandup.sh/teardown.sh). - All scenarios tested, and an initial experimental dataset collected and made available. The exception at this point is the "wide-ep-lws", slated for the next release
- Each "well-lit" path now has both an "experiment" file (accessible via execution of
- Code conversion (`bash` to `python`)
  - Individual standup [steps](https://github.com/llm-d/llm-d-benchmark/tree/main/setup/steps) 0, 1, 2, 3, 4, 6, 7, 8 converted from `bash` to `python`.
- Better support for the execution of the benchmark load generating phase - `run.sh` - against pre-deployed stacks (see the sketch below)
  - Automatically detect the current `namespace`, the `llm-d` stack URL, and the served model name.
  - Do not require a Hugging Face token when generating load.
  - Generate the standardized benchmark report taking into account that the stack was pre-deployed and that not all deployment parameters are available.
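  A hypothetical sketch of that flow (`run.sh` is named in this release; the namespace value and the `kubectl` step are illustrative assumptions):

  ```bash
  # Point the current kubectl context at the namespace holding the
  # pre-deployed llm-d stack (assumption: run.sh reads the current context).
  kubectl config set-context --current --namespace=my-llm-d-namespace

  # No standup step and no Hugging Face token required; run.sh detects the
  # namespace, the llm-d stack URL, and the served model name on its own.
  ./run.sh
  ```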
- Benchmark report generation and data analysis
  - The standardized benchmark report had its format refined and updated to accommodate all the different harnesses.
  - For each "well-lit" path, a Jupyter analysis notebook (e.g., `analysis_pd`) was created.
- Documentation overhaul
  - Main documentation significantly expanded.
  - Individual components (e.g., Benchmarking Report and Configuration Explorer) have their own docs, indexed from the main documentation.
- Publicly available experimental data
  - Experimental runs were performed for each "well-lit" path, and the data is publicly available on the project's Google Drive.
- Configuration Explorer
  - The number of parameters required to successfully deploy a model served by an `llm-d` stack - while making efficient use of scarce resources such as GPUs - pointed to the need for a mechanism to help users avoid obvious "dead ends" (i.e., standup scenarios bound to fail due to lack of resources).
  - The Configuration Explorer is a standalone tool which provides two main functionalities:
    - "capacity planner": given certain input parameters, is the `llm-d` stack even capable of serving a model? (illustrated below)
    - "configuration sweeper": given certain input and workload parameters, what is the maximum/average recorded performance?
  - The "capacity planner" is presently available as a standalone UI and also as a library fully integrated into the benchmark lifecycle (e.g., `standup.sh`).
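  A back-of-the-envelope illustration of the "capacity planner" question; this is not the Configuration Explorer's actual interface - the variable names and the simplistic memory model are assumptions:

  ```bash
  # Will the stack even be capable of serving the model? Rough feasibility
  # check: weights plus a fixed overhead must fit in aggregate GPU memory.
  PARAMS_B=8            # model size, in billions of parameters
  BYTES_PER_PARAM=2     # bf16/fp16 weights
  NUM_GPUS=1
  GPU_MEM_GIB=80
  OVERHEAD_PCT=35       # rough allowance for KV cache, activations, CUDA context

  # Treating 1e9 bytes as ~1 GiB is close enough for a dead-end check.
  needed_gib=$(( PARAMS_B * BYTES_PER_PARAM * (100 + OVERHEAD_PCT) / 100 ))
  avail_gib=$(( NUM_GPUS * GPU_MEM_GIB ))
  if (( needed_gib <= avail_gib )); then
    echo "feasible: ~${needed_gib} GiB needed, ${avail_gib} GiB available"
  else
    echo "dead end: ~${needed_gib} GiB needed, only ${avail_gib} GiB available"
  fi
  ```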
- Initial support for multiple models with `modelservice`
  - A single stack can serve multiple models, and each model can be individually accessed via a different URL (sketched below).
  - This capability relies on the `llm-d-modelservice` standup method.
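  A hypothetical sketch of the resulting access pattern (the base URL, routes, and model names are illustrative assumptions; the stack serves an OpenAI-compatible API):

  ```bash
  # Two models served by a single stack, each addressable via its own URL.
  curl -s http://llm-d.example.com/llama/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "llama", "prompt": "Hello", "max_tokens": 8}'

  curl -s http://llm-d.example.com/granite/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "granite", "prompt": "Hello", "max_tokens": 8}'
  ```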
- More extensive CI/CD
  - Run full tests, exercising all standup methods, whenever a PR is opened.
  - Test every standup method and harness nightly.
Regular Contributors to this release
- @namasl
- @kalantar
- @Vezio
- @jgchn
- @deanlorenz
- @manoelmarques
- @achandrasekar
- @yossiovadia
- @pancak3
- @maugustosilva
New Contributors
- @petecheslock made their first contribution in #314
- @mengmeiye made their first contribution in #388
- @Edwinhr716 made their first contribution in #42
Full Changelog: v0.2.9...v0.3.0