Add BERT Inference Test #459

Merged: 17 commits into aws:main on Jul 18, 2024

Conversation

@mattcjo (Contributor) commented on Jul 17, 2024

Issue #, if available:

Description of changes:
This PR adds an E2E BERT inference test. Validation was done on a cluster of g5.2xlarge instances. The cluster has two nodes in total, in compliance with the requirement for the unit tests, but the inference test will only be run on one of the nodes.

Two validation runs were done, one for each inference mode: throughput (the default) and latency. The results for each mode are shown below; the logs were obtained from the pod that ran the E2E BERT inference job.

Throughput

go test -v . -args -bertInferenceImage 905417999469.dkr.ecr.us-west-2.amazonaws.com/aws-bert-inference:latest
=== RUN   TestBertInference
=== RUN   TestBertInference/bert-inference
    bert_inference_test.go:43: Labeled node ip-192-168-31-117.us-west-2.compute.internal with nvidia.com/gpu.present=true
=== RUN   TestBertInference/bert-inference/BERT_inference_Job_succeeds
W0717 18:48:35.391004   14598 warnings.go:70] child pods are preserved by default when jobs are deleted; set propagationPolicy=Background to remove them or set propagationPolicy=Orphan to suppress this warning
--- PASS: TestBertInference (15.71s)
    --- PASS: TestBertInference/bert-inference (15.71s)
        --- PASS: TestBertInference/bert-inference/BERT_inference_Job_succeeds (15.01s)
PASS
ok  	github.com/aws/aws-k8s-tester/e2e2/test/cases/bert-inference	21.374s

The logs from the worker pod:

/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
GPU is available
Running inference in throughput mode with batch size 8
Inference Mode: throughput
Average time per batch: 0.0139 seconds
Throughput: 575.82 samples/second
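
(Sanity check: the reported throughput is just batch size over average batch time, 8 / 0.0139 s ≈ 576 samples/second, in line with the 575.82 above.)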

Latency

go test -v . -args -bertInferenceImage 905417999469.dkr.ecr.us-west-2.amazonaws.com/aws-bert-inference:latest -inferenceMode latency
=== RUN   TestBertInference
=== RUN   TestBertInference/bert-inference
    bert_inference_test.go:43: Labeled node ip-192-168-31-117.us-west-2.compute.internal with nvidia.com/gpu.present=true
=== RUN   TestBertInference/bert-inference/BERT_inference_Job_succeeds
W0717 18:27:29.449144   31874 warnings.go:70] child pods are preserved by default when jobs are deleted; set propagationPolicy=Background to remove them or set propagationPolicy=Orphan to suppress this warning
--- PASS: TestBertInference (15.73s)
    --- PASS: TestBertInference/bert-inference (15.73s)
        --- PASS: TestBertInference/bert-inference/BERT_inference_Job_succeeds (15.05s)
PASS
ok  	github.com/aws/aws-k8s-tester/e2e2/test/cases/bert-inference	21.383s

The logs from the worker pod:

/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
GPU is available
Running inference in latency mode with batch size 1
Inference Mode: latency
Average time per batch: 0.0087 seconds
Throughput: 114.30 samples/second
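
(Likewise for latency mode, with batch size 1: 1 / 0.0087 s ≈ 115 samples/second, roughly the 114.30 reported.)

The -bertInferenceImage and -inferenceMode values after -args are ordinary Go test flags parsed by the test binary. Below is a minimal sketch of how flags like these can be registered in the suite's TestMain; the variable names and defaults are illustrative assumptions, not necessarily what this PR uses:

package inference

import (
	"flag"
	"os"
	"testing"
)

var (
	// bertInferenceImage is the container image that runs the BERT inference job (assumed name).
	bertInferenceImage string
	// inferenceMode selects "throughput" (the default) or "latency" (assumed name).
	inferenceMode string
)

func TestMain(m *testing.M) {
	// Everything after `-args` on the `go test` command line is passed through
	// to the test binary, so these register as ordinary flags.
	flag.StringVar(&bertInferenceImage, "bertInferenceImage", "", "image for the BERT inference job")
	flag.StringVar(&inferenceMode, "inferenceMode", "throughput", "inference mode: throughput or latency")
	flag.Parse()
	os.Exit(m.Run())
}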

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


// Label the first node with the GPU label
nodeName := nodes.Items[0].Name
cmd := exec.Command("kubectl", "label", "node", nodeName, "nvidia.com/gpu.present=true")
Contributor:
Two things: (1) we need to investigate whether the label here is necessary, and (2) if so, can we use the k8s clients to do this instead of exec-ing out to kubectl?

mattcjo (Contributor Author):
Not necessary. Just updated
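
For reference, had the label been kept, it could be applied with the Kubernetes Go client rather than exec-ing out to kubectl, per (2) above. A rough client-go sketch (the helper function is hypothetical):

package inference

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// labelNodeGPU adds nvidia.com/gpu.present=true to the named node via a
// strategic-merge patch, which leaves the node's other labels untouched.
func labelNodeGPU(ctx context.Context, clientset kubernetes.Interface, nodeName string) error {
	patch := []byte(`{"metadata":{"labels":{"nvidia.com/gpu.present":"true"}}}`)
	if _, err := clientset.CoreV1().Nodes().Patch(ctx, nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
		return fmt.Errorf("labeling node %s: %w", nodeName, err)
	}
	return nil
}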

@ndbaker1 (Contributor) left a comment:
addition LGTM

# See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
priorityClassName: "system-node-critical"
containers:
- image: nvcr.io/nvidia/k8s-device-plugin:v0.14.2
Contributor:
how important is it to keep on the latest?

mattcjo (Contributor Author):
To be honest, not totally sure. I chose to use the same exact one as our other tests to keep it consistent.

@@ -0,0 +1,75 @@
package bert_inference
Member:
can we just call this package inference? we realistically may add other inference cases that aren't BERT

mattcjo (Contributor Author):
Yeah sure thing. Done.

Member:
This is fine for this PR, but we should find a way to share a single manifest for things like device plugins that are used by multiple suites

mattcjo (Contributor Author):
Totally agree. Had briefly discussed this on the side with @weicongw, but decided to push any decisions around it to later.
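
One way that sharing could eventually look, sketched with the sigs.k8s.io/e2e-framework decoder (assuming the suites build on that framework; the embedded manifest path and helper name are hypothetical):

package inference

import (
	"bytes"
	"context"
	_ "embed"

	"sigs.k8s.io/e2e-framework/klient/decoder"
	"sigs.k8s.io/e2e-framework/klient/k8s/resources"
)

// A single shared copy of the device plugin manifest, embedded once and
// reusable by any suite that needs GPU nodes.
//
//go:embed manifests/nvidia-device-plugin.yaml
var devicePluginManifest []byte

// applyDevicePlugin decodes the shared manifest and creates its objects.
func applyDevicePlugin(ctx context.Context, r *resources.Resources) error {
	return decoder.DecodeEach(ctx, bytes.NewReader(devicePluginManifest), decoder.CreateHandler(r))
}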

Member:
Did you intend to commit this?

mattcjo (Contributor Author):
No I did not... Removed.

@cartermckinnon merged commit b9bcce0 into aws:main on Jul 18, 2024. 5 checks passed.