Add LLaMA2 model to pt nightly tests #965
Conversation
Thanks @ManfeiBai
We need a metric to detect regressions. For inference, we can define a latency/token
threshold that we know is reachable. For training, we can define an MFU threshold.
Let's use our benchmark results to define the right threshold values. Wdyt?
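For concreteness, here is a minimal sketch of what such a regression gate could look like, assuming the nightly run can report a per-token inference latency and a training MFU number. The function names and threshold values below are illustrative placeholders, not part of this PR or the existing harness:

```python
# Illustrative regression gate (placeholder names and values, not the actual harness API).
# Threshold values would be derived from the benchmark results mentioned above.
MAX_LATENCY_MS_PER_TOKEN = 50.0  # inference threshold (placeholder)
MIN_MFU = 0.40                   # training threshold (placeholder)


def check_inference(latency_ms_per_token: float) -> None:
    """Fail the run if per-token latency exceeds the known-reachable bound."""
    if latency_ms_per_token > MAX_LATENCY_MS_PER_TOKEN:
        raise AssertionError(
            f"Inference regression: {latency_ms_per_token:.2f} ms/token > "
            f"{MAX_LATENCY_MS_PER_TOKEN:.2f} ms/token"
        )


def check_training(mfu: float) -> None:
    """Fail the run if model FLOPs utilization drops below the bound."""
    if mfu < MIN_MFU:
        raise AssertionError(f"Training regression: MFU {mfu:.2%} < {MIN_MFU:.2%}")
```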
@ManfeiBai wdyt, should we submit a follow-up PR for GPU testing as well (after this work successfully lands)?
For the metric to detect regressions: agreed on adding a threshold for inference. Do we want to add one for training as well? We didn't find training in the README yet, so we don't have training for LLaMA2 in this PR. Or do we mean training for other models?
For GPU testing: sure, we have GPU tests on the XL-ML dashboard already; let's add LLaMA2 to the GPU tests in the follow-up PR.
Bunch of comments, for the record:
- local test failed with ...
- tested locally and output as expected
- will try to ...
- will create new PRs for GPU tests
- will add threshold in the follow-up PRs
- has added, so the current PR contains ...

Hi @miladm, would you mind reviewing this PR again and marking the ...
I don't see the inference latency threshold and training thresholds. Will that be a follow-up PR?
We need to push this change ASAP, though it seems we will need a follow-up PR targeting more accurate thresholds and different model sizes on GPU and TPU. Let's push this in for now. Thank you. @wonjoolee95 to take a second look as well.
Yes, we have a threshold for inference and will have a follow-up PR for the training threshold. Will merge this PR now.
Description
Add the LLaMA2 model to xl-ml-dashboard to track it in the PyTorch nightly tests.
Tests
./scripts/run-oneshot.sh -t pt-nightly-l2-n-i-func-v4-8-1vm
./scripts/run-oneshot.sh -t pt-nightly-l2-t-h-f-func-v4-8-1vm
(l2 means llama2, -e means -eager, -c means -chat, -t means -token, -q means -quantization, -w means -without-download, -n-i means -next-inference, -s means -stable, -xla means -xla)
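As a purely illustrative aid (not part of this PR or the dashboard code), a small Python helper that expands those abbreviations in a test name:

```python
# Hypothetical helper that expands the abbreviations from the legend above.
ABBREVIATIONS = {
    "l2": "llama2",
    "e": "eager",
    "c": "chat",
    "t": "token",
    "q": "quantization",
    "w": "without-download",
    "s": "stable",
    "xla": "xla",
}


def expand(test_name: str) -> str:
    """Expand known abbreviations in a dash-separated nightly test name."""
    # Handle the two-part "n-i" (next-inference) abbreviation first.
    name = test_name.replace("-n-i-", "-next-inference-")
    return "-".join(ABBREVIATIONS.get(part, part) for part in name.split("-"))


print(expand("pt-nightly-l2-n-i-func-v4-8-1vm"))
# pt-nightly-llama2-next-inference-func-v4-8-1vm
```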
Checklist
Before submitting this PR, please make sure (put X in square brackets):