
Commit a84fa23 (parent be2f5ee)

Add 'nop' scenario, update docs, change categories report (#203)

3 files changed: +84 / -15 lines

README.md

Lines changed: 19 additions & 2 deletions
@@ -95,18 +95,35 @@ For a discussion of relevant workloads, please consult this [document](https://d
 
 Pieces of information identifying a particular cluster. This information includes, but is not limited to, GPU model, llm model and llm-d parameters (an environment file, and optionally a `values.yaml` file for modelservice helm charts)
 
-#### Harness
+#### Harnesses
 
 Load Generator (Python code) which drives the benchmark load. Today, llm-d-benchmark supports [fmperf](https://github.com/fmperf-project/fmperf), [inference-perf](https://github.com/kubernetes-sigs/inference-perf), [guidellm](https://github.com/vllm-project/guidellm.git) and the benchmarks found in the `benchmarks` folder of [vllm](https://github.com/vllm-project/vllm.git). There are ongoing efforts to consolidate and provide an easier way to support different load generators.
 
+The `nop` harness, combined with the environment variables below and used in `standalone` mode, parses the vLLM log and creates reports with
+loading time statistics.
+
+The additional environment variables to set are:
+
+| Environment Variable | Example Values |
+| -------------------------------------------- | -------------- |
+| LLMDBENCH_VLLM_STANDALONE_VLLM_LOAD_FORMAT | `safetensors, tensorizer, runai_streamer, fastsafetensors` |
+| LLMDBENCH_VLLM_STANDALONE_VLLM_LOGGING_LEVEL | `DEBUG, INFO, WARNING`, etc. |
+| LLMDBENCH_VLLM_STANDALONE_PREPROCESS | `source /setup/preprocess/standalone-preprocess.sh ; /setup/preprocess/standalone-preprocess.py` |
+
+The variable `LLMDBENCH_VLLM_STANDALONE_VLLM_LOGGING_LEVEL` must be set to `DEBUG` so that the `nop` categories report finds all categories.
+
+The variable `LLMDBENCH_VLLM_STANDALONE_PREPROCESS` must be set to the value above for the `nop` harness in order to install load format
+dependencies, export additional environment variables and pre-serialize models when using the `tensorizer` load format.
+The preprocess scripts run in the vLLM standalone pod before the vLLM server starts.
+
 #### Workload
 
 Workload is the actual benchmark load specification, which includes the LLM use case to benchmark, traffic pattern, input / output distribution and dataset. Supported workload profiles can be found under `workload/profiles`.
 
 > [!IMPORTANT]
 > The triple `<scenario>`,`<harness>`,`<workload>`, combined with the standup/teardown capabilities provided by [llm-d-infra](https://github.com/llm-d-incubation/llm-d-infra.git) and [llm-d-modelservice](https://github.com/llm-d/llm-d-model-service.git), should provide enough information to allow an experiment to be reproduced.
 
-### Dependecies
+### Dependencies
 
 - [llm-d-infra](https://github.com/llm-d-incubation/llm-d-infra.git)
 - [llm-d-modelservice](https://github.com/llm-d/llm-d-model-service.git)
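
To make the configuration documented in the README hunk above concrete, here is a minimal sketch (not part of this commit) of how the `nop`/standalone environment variables could be checked before a run. The variable names, accepted load formats and the `DEBUG` requirement come from the text above; the `check_nop_env` helper and its messages are hypothetical.

```python
import os

# Environment variables referenced in the README section above.
REQUIRED = [
    "LLMDBENCH_VLLM_STANDALONE_VLLM_LOAD_FORMAT",
    "LLMDBENCH_VLLM_STANDALONE_VLLM_LOGGING_LEVEL",
    "LLMDBENCH_VLLM_STANDALONE_PREPROCESS",
]

VALID_LOAD_FORMATS = {"safetensors", "tensorizer", "runai_streamer", "fastsafetensors"}


def check_nop_env() -> list[str]:
    """Hypothetical helper: collect problems with the nop/standalone settings."""
    problems = []
    for name in REQUIRED:
        if not os.environ.get(name):
            problems.append(f"{name} is not set")

    load_format = os.environ.get("LLMDBENCH_VLLM_STANDALONE_VLLM_LOAD_FORMAT", "")
    if load_format and load_format not in VALID_LOAD_FORMATS:
        problems.append(f"unexpected load format: {load_format}")

    # The README notes DEBUG is required for the categories report to find all categories.
    if os.environ.get("LLMDBENCH_VLLM_STANDALONE_VLLM_LOGGING_LEVEL") != "DEBUG":
        problems.append("LLMDBENCH_VLLM_STANDALONE_VLLM_LOGGING_LEVEL should be DEBUG")
    return problems


if __name__ == "__main__":
    for problem in check_nop_env():
        print(problem)
```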
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+# Empty env. variables need to be filled by user
+
+export LLMDBENCH_HF_TOKEN=
+export LLMDBENCH_IMAGE_REGISTRY=
+export LLMDBENCH_IMAGE_REPO=
+export LLMDBENCH_IMAGE_NAME=
+export LLMDBENCH_IMAGE_TAG=
+export LLMDBENCH_HARNESS_SERVICE_ACCOUNT=llm-d-benchmark-runner
+
+export LLMDBENCH_HARNESS_NAME=nop
+export LLMDBENCH_HARNESS_EXPERIMENT_PROFILE=nop.yaml
+export LLMDBENCH_CONTROL_WORK_DIR=~/llm-d-benchmark
+export LLMDBENCH_DEPLOY_METHODS=standalone
+export LLMDBENCH_DEPLOY_MODEL_LIST=llama-3b
+
+
+export LLMDBENCH_VLLM_COMMON_NAMESPACE=
+export LLMDBENCH_VLLM_COMMON_AFFINITY=nvidia.com/gpu.product:NVIDIA-A100-SXM4-80GB
+export LLMDBENCH_VLLM_COMMON_PVC_STORAGE_CLASS=
+export LLMDBENCH_VLLM_COMMON_REPLICAS=1
+
+export LLMDBENCH_VLLM_STANDALONE_VLLM_LOAD_FORMAT=safetensors
+#export LLMDBENCH_VLLM_STANDALONE_VLLM_LOAD_FORMAT=tensorizer
+#export LLMDBENCH_VLLM_STANDALONE_VLLM_LOAD_FORMAT=runai_streamer
+#export LLMDBENCH_VLLM_STANDALONE_VLLM_LOAD_FORMAT=fastsafetensors
+
+# set to debug so that all vllm log lines can be categorized
+export LLMDBENCH_VLLM_STANDALONE_VLLM_LOGGING_LEVEL=DEBUG
+
+# source preprocessor script that will install libraries for some load formats and set env. variables
+# run preprocessor python that will change the debug log date format and pre-serialize a model when using
+# tensorizer load format
+export LLMDBENCH_VLLM_STANDALONE_PREPROCESS="source /setup/preprocess/standalone-preprocess.sh ; /setup/preprocess/standalone-preprocess.py"
+
+export LLMDBENCH_VLLM_STANDALONE_IMAGE=vllm/vllm-openai:v0.10.0
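
The scenario file above stores a compound command in `LLMDBENCH_VLLM_STANDALONE_PREPROCESS` (`source <shell script> ; <python script>`). As a rough illustration only, and assuming the standalone pod image provides `bash`, such a value could be executed before the vLLM server starts along these lines; the actual mechanism used by llm-d-benchmark may differ.

```python
import os
import subprocess

# Hypothetical illustration: the compound value uses `source` and a `;` separator,
# so it needs a shell to interpret it. Run it once, before launching the vLLM server.
preprocess = os.environ.get("LLMDBENCH_VLLM_STANDALONE_PREPROCESS", "")
if preprocess:
    subprocess.run(["bash", "-c", preprocess], check=True)
```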

workload/harnesses/nop-llm-d-benchmark.py

Lines changed: 30 additions & 13 deletions
@@ -16,17 +16,17 @@
 import logging
 from typing import Any
 from urllib.parse import urljoin, urlparse
+from pathlib import Path
 import pandas
 import requests
 
-from pathlib import Path
 from kubernetes import client, config
 
 # Configure logging
 logger = logging.getLogger(__name__)
 logger.setLevel(logging.DEBUG)
 
-formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
+formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
 
 REQUEST_TIMEOUT = 60.0  # time (seconds) to wait for request
 MAX_VLLM_WAIT = 15.0 * 60.0  # time (seconds) to wait for vllm to respond
@@ -37,6 +37,11 @@
         "start": "No plugins for group",
         "end": "detected platform",
     },
+    {
+        "title": "LLM Imports",
+        "start": "detected platform",
+        "end": "All plugins in this group will be loaded",
+    },
     {
         "title": "Add CLI Args",
         "start": "All plugins in this group will be loaded",
@@ -103,7 +108,9 @@ class LogCategory:
     start: str = ""
     end: str = ""
     start_line: str = ""
+    start_line_number: int = 0
     end_line: str = ""
+    end_line_number: int = 0
     next: LogCategory | None = None
     parent: LogCategory | None = None
     root_child: LogCategory | None = None
@@ -460,7 +467,7 @@ def populate_log_categories(vllm_model: str, logs: str, root_log_category: LogCa
             index = idx + 1
             logger.info(
                 "Skip tensorizer serialization. Start from log line %d: %s",
-                index,
+                index + 1,
                 log_list[index],
             )
             break
@@ -493,7 +500,11 @@ def add_uncategorized_categories(key: list[int], log_category: LogCategory):
     key[0] = log_category.key + 1
     log_category.title = "Uncategorized"
     log_category.start_time = category.end_time
+    log_category.start_line = category.end_line
+    log_category.start_line_number = category.end_line_number
     log_category.end_time = next_category.start_time
+    log_category.end_line = next_category.start_line
+    log_category.end_line_number = next_category.start_line_number
     log_category.parent = category.parent
     log_category.next = next_category
     category.next = log_category
@@ -522,6 +533,7 @@ def populate_log_category(
 
         if category.start_time is not None:
             category.start_line = log_list[index]
+            category.start_line_number = index + 1
 
         if category.end_line == "" and category.end in log_list[index]:
             category.end_time = extract_datetime(log_list[index])
@@ -535,6 +547,7 @@ def populate_log_category(
 
         if category.end_time is not None:
             category.end_line = log_list[index]
+            category.end_line_number = index + 1
 
         if category.root_child is not None:
             index = populate_log_category(index, log_list, category.root_child)
@@ -700,29 +713,33 @@ def write_log_categories_to_log(log_category: LogCategory, file: io.BufferedWrit
         elapsed = ""
         if category.start_time is not None and category.end_time is not None:
             time_difference = category.end_time - category.start_time
-            elapsed = f"{time_difference.total_seconds():.2f}"
+            elapsed = f"{time_difference.total_seconds():.3f}"
 
-        file.write(f"Log category : {category.key} '{category.title}'\n")
+        file.write(f"Log category : {category.key} '{category.title}'\n")
         parent_key = f"{category.parent.key}" if category.parent is not None else ""
-        file.write(f" parent : {parent_key}\n")
+        file.write(f" parent : {parent_key}\n")
         time_format = "%m-%d %H:%M:%S.%f"
         date_str = (
             category.start_time.strftime(time_format)[:-3]
             if category.start_time is not None
             else ""
         )
-        file.write(f" start date: {date_str}\n")
+        file.write(f" start date : '{date_str}'\n")
         date_str = (
             category.end_time.strftime(time_format)[:-3]
             if category.end_time is not None
             else ""
         )
-        file.write(f" end date : {date_str}\n")
-        file.write(f" elapsed : {elapsed}\n")
-        file.write(f" start : {category.start}\n")
-        file.write(f" end : {category.end}\n")
-        file.write(f" start line: {category.start_line}\n")
-        file.write(f" end line. : {category.end_line}\n")
+        file.write(f" end date : '{date_str}'\n")
+        file.write(f" elapsed : {elapsed}\n")
+        file.write(f" start pattern: '{category.start}'\n")
+        file.write(f" end pattern : '{category.end}'\n")
+        file.write(
+            f" start line : {category.start_line_number} '{category.start_line}'\n"
+        )
+        file.write(
+            f" end line : {category.end_line_number} '{category.end_line}'\n"
+        )
         if category.root_child is not None:
             write_log_categories_to_log(category.root_child, file)
         category = category.next
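
The diff above adds start/end line numbers to each `LogCategory` and reports elapsed time with millisecond precision. The standalone sketch below illustrates the idea behind the categories report: scan timestamped vLLM log lines for per-category start/end patterns, then report elapsed seconds and 1-based line numbers. The two category entries are taken from the diff; the log line shape and the `extract_datetime` parsing here are simplified assumptions, not the harness's actual implementation.

```python
from datetime import datetime

# Simplified category table in the same spirit as the harness: a category starts at the
# first line containing `start` and ends at the first later line containing `end`.
CATEGORIES = [
    {"title": "Platform Plugins", "start": "No plugins for group", "end": "detected platform"},
    {"title": "LLM Imports", "start": "detected platform", "end": "All plugins in this group will be loaded"},
]


def extract_datetime(line: str) -> datetime:
    # Assumed log line shape: "MM-DD HH:MM:SS.mmm <message>" (the real vLLM format may differ).
    return datetime.strptime(line[:18], "%m-%d %H:%M:%S.%f")


def report(log_lines: list[str]) -> None:
    for category in CATEGORIES:
        start_time = end_time = None
        start_no = end_no = 0
        for index, line in enumerate(log_lines):
            if start_time is None and category["start"] in line:
                start_time, start_no = extract_datetime(line), index + 1  # 1-based line numbers
            elif start_time is not None and category["end"] in line:
                end_time, end_no = extract_datetime(line), index + 1
                break
        if start_time and end_time:
            elapsed = f"{(end_time - start_time).total_seconds():.3f}"
            print(f"{category['title']}: lines {start_no}-{end_no}, {elapsed}s")


if __name__ == "__main__":
    report([
        "08-01 10:00:00.000 No plugins for group vllm.general_plugins",
        "08-01 10:00:01.250 detected platform cuda",
        "08-01 10:00:03.500 All plugins in this group will be loaded",
    ])
```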
