Add BERT Inference Test #459

Merged: 17 commits into aws:main on Jul 18, 2024

Conversation

@mattcjo (Contributor) commented on Jul 17, 2024

Issue #, if available:

Description of changes:
This PR adds an E2E BERT inference test. Validation was done on a cluster of g5.2xlarge instances. The cluster has two nodes in total, in compliance with the requirement for the unit tests, but the inference test will only be run on one of the nodes.

Two validation runs were done, one for each inference mode: throughput (the default) and latency. The results for each mode are shown below; the logs were obtained from the pod that ran the E2E BERT inference job.

Throughput

go test -v . -args -bertInferenceImage 905417999469.dkr.ecr.us-west-2.amazonaws.com/aws-bert-inference:latest
=== RUN   TestBertInference
=== RUN   TestBertInference/bert-inference
    bert_inference_test.go:43: Labeled node ip-192-168-31-117.us-west-2.compute.internal with nvidia.com/gpu.present=true
=== RUN   TestBertInference/bert-inference/BERT_inference_Job_succeeds
W0717 18:48:35.391004   14598 warnings.go:70] child pods are preserved by default when jobs are deleted; set propagationPolicy=Background to remove them or set propagationPolicy=Orphan to suppress this warning
--- PASS: TestBertInference (15.71s)
    --- PASS: TestBertInference/bert-inference (15.71s)
        --- PASS: TestBertInference/bert-inference/BERT_inference_Job_succeeds (15.01s)
PASS
ok  	github.com/aws/aws-k8s-tester/e2e2/test/cases/bert-inference	21.374s

The logs from the worker pod:

/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
GPU is available
Running inference in throughput mode with batch size 8
Inference Mode: throughput
Average time per batch: 0.0139 seconds
Throughput: 575.82 samples/second
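
(Sanity check: the reported throughput is just batch size over average batch time, 8 / 0.0139 s ≈ 576 samples/second, in line with the 575.82 above.)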

Latency

go test -v . -args -bertInferenceImage 905417999469.dkr.ecr.us-west-2.amazonaws.com/aws-bert-inference:latest -inferenceMode latency
=== RUN   TestBertInference
=== RUN   TestBertInference/bert-inference
    bert_inference_test.go:43: Labeled node ip-192-168-31-117.us-west-2.compute.internal with nvidia.com/gpu.present=true
=== RUN   TestBertInference/bert-inference/BERT_inference_Job_succeeds
W0717 18:27:29.449144   31874 warnings.go:70] child pods are preserved by default when jobs are deleted; set propagationPolicy=Background to remove them or set propagationPolicy=Orphan to suppress this warning
--- PASS: TestBertInference (15.73s)
    --- PASS: TestBertInference/bert-inference (15.73s)
        --- PASS: TestBertInference/bert-inference/BERT_inference_Job_succeeds (15.05s)
PASS
ok  	github.com/aws/aws-k8s-tester/e2e2/test/cases/bert-inference	21.383s

The logs from the worker pod:

/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
GPU is available
Running inference in latency mode with batch size 1
Inference Mode: latency
Average time per batch: 0.0087 seconds
Throughput: 114.30 samples/second
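
(Likewise for latency mode, with batch size 1: 1 / 0.0087 s ≈ 115 samples/second, roughly the 114.30 reported.)

The -bertInferenceImage and -inferenceMode values after -args are ordinary Go test flags parsed by the test binary. Below is a minimal sketch of how flags like these can be registered in the suite's TestMain; the variable names and defaults are illustrative assumptions, not necessarily what this PR uses:

package inference

import (
	"flag"
	"os"
	"testing"
)

var (
	// bertInferenceImage is the container image that runs the BERT inference job (assumed name).
	bertInferenceImage string
	// inferenceMode selects "throughput" (the default) or "latency" (assumed name).
	inferenceMode string
)

func TestMain(m *testing.M) {
	// Everything after `-args` on the `go test` command line is passed through
	// to the test binary, so these register as ordinary flags.
	flag.StringVar(&bertInferenceImage, "bertInferenceImage", "", "image for the BERT inference job")
	flag.StringVar(&inferenceMode, "inferenceMode", "throughput", "inference mode: throughput or latency")
	flag.Parse()
	os.Exit(m.Run())
}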

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


// Label the first node with the GPU label
nodeName := nodes.Items[0].Name
cmd := exec.Command("kubectl", "label", "node", nodeName, "nvidia.com/gpu.present=true")
Contributor:
Two things: (1) we need to investigate whether the label here is necessary, and (2) if so, can we use the k8s clients to do this instead of exec-ing out to kubectl?

mattcjo (Contributor Author):
Not necessary. Just updated
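
For reference, had the label been kept, it could be applied with the Kubernetes Go client rather than exec-ing out to kubectl, per (2) above. A rough client-go sketch (the helper function is hypothetical):

package inference

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// labelNodeGPU adds nvidia.com/gpu.present=true to the named node via a
// strategic-merge patch, which leaves the node's other labels untouched.
func labelNodeGPU(ctx context.Context, clientset kubernetes.Interface, nodeName string) error {
	patch := []byte(`{"metadata":{"labels":{"nvidia.com/gpu.present":"true"}}}`)
	if _, err := clientset.CoreV1().Nodes().Patch(ctx, nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
		return fmt.Errorf("labeling node %s: %w", nodeName, err)
	}
	return nil
}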

@ndbaker1 (Contributor) left a comment:
addition LGTM

# See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
priorityClassName: "system-node-critical"
containers:
- image: nvcr.io/nvidia/k8s-device-plugin:v0.14.2
Contributor:
how important is it to keep on the latest?

mattcjo (Contributor Author):
To be honest, not totally sure. I chose to use the same exact one as our other tests to keep it consistent.

@@ -0,0 +1,75 @@
package bert_inference
Member:
can we just call this package inference? we realistically may add other inference cases that aren't BERT

mattcjo (Contributor Author):
Yeah sure thing. Done.

Member:
This is fine for this PR, but we should find a way to share a single manifest for things like device plugins that are used by multiple suites

mattcjo (Contributor Author):
Totally agree. Had briefly discussed this on the side with @weicongw, but decided to push any decisions around it to later.
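
One way that sharing could eventually look, sketched with the sigs.k8s.io/e2e-framework decoder (assuming the suites build on that framework; the embedded manifest path and helper name are hypothetical):

package inference

import (
	"bytes"
	"context"
	_ "embed"

	"sigs.k8s.io/e2e-framework/klient/decoder"
	"sigs.k8s.io/e2e-framework/klient/k8s/resources"
)

// A single shared copy of the device plugin manifest, embedded once and
// reusable by any suite that needs GPU nodes.
//
//go:embed manifests/nvidia-device-plugin.yaml
var devicePluginManifest []byte

// applyDevicePlugin decodes the shared manifest and creates its objects.
func applyDevicePlugin(ctx context.Context, r *resources.Resources) error {
	return decoder.DecodeEach(ctx, bytes.NewReader(devicePluginManifest), decoder.CreateHandler(r))
}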

Member:
Did you intend to commit this?

mattcjo (Contributor Author):
No I did not... Removed.

@cartermckinnon merged commit b9bcce0 into aws:main on Jul 18, 2024. 5 checks passed.