
Add LLaMA2 model to pt nightly tests #965

Merged
merged 26 commits into from
Aug 25, 2023

Conversation

ManfeiBai
Collaborator

@ManfeiBai ManfeiBai commented Aug 2, 2023

Description

Add the LLaMA2 model to the xl-ml-dashboard for tracking.

Tests

./scripts/run-oneshot.sh -t pt-nightly-l2-n-i-func-v4-8-1vm
./scripts/run-oneshot.sh -t pt-nightly-l2-t-h-f-func-v4-8-1vm

(In the test names above: l2 = llama2, -e = -eager, -c = -chat, -t = -token, -q = -quantization, -w = -without-download, -n-i = -next-inference, -s = -stable, -xla = -xla)

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run one-shot tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

Collaborator

@miladm miladm left a comment


Thanks @ManfeiBai

We need a metric to detect regressions. For inference, we can define a latency/token threshold that we know is reachable. For training, we can define an MFU threshold.

Let's use our benchmark results to define the right threshold values. Wdyt?
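The latency/token threshold proposed above could be checked with a sketch along these lines (all function names and numbers are hypothetical, not from the actual test harness):

```python
def latency_per_token(total_seconds: float, num_tokens: int) -> float:
    """Average decode latency per generated token, in seconds."""
    return total_seconds / num_tokens

def passes_threshold(measured: float, threshold: float) -> bool:
    """A run passes if its latency/token does not exceed the threshold.

    The threshold itself would come from benchmark results, as
    suggested above."""
    return measured <= threshold

# Illustrative numbers: 12.8 s to generate 256 tokens -> 0.05 s/token
measured = latency_per_token(12.8, 256)
print(passes_threshold(measured, threshold=0.06))  # True
```

The test would then report success or failure based on this boolean, which matches the idea of marking a test failed when latency regresses past a known-reachable value.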

@miladm
Collaborator

miladm commented Aug 7, 2023

@ManfeiBai wdyt we submit a follow up PR for GPU testing as well? (after this work successfully lands)

@ManfeiBai
Collaborator Author

ManfeiBai commented Aug 7, 2023

We need a metric to detect regressions. For inference, we can define a latency/token threshold that we know is reachable. For training, we can define a MFU threshold.

Let's use our benchmark results to define the right threshold values. Wdyt?

For a metric to detect regressions, I agree we should add a threshold for inference. Do we want to use the latency/token threshold to mark a test as passed or failed?

For training, we didn't find a training setup in the README yet, so this PR doesn't include training for LLaMA2. Or did you mean training for other models?

For using benchmark results to define the right threshold values: sure, will do.

@ManfeiBai
Collaborator Author

ManfeiBai commented Aug 7, 2023

@ManfeiBai wdyt we submit a follow up PR for GPU testing as well? (after this work successfully lands)

Sure, we already have GPU tests on the XL-ML dashboard; let's add LLaMA2 to the GPU tests in the follow-up PR.

@miladm
Collaborator

miladm commented Aug 18, 2023

Bunch of comments - for the record

@ManfeiBai
Collaborator Author

Local test failed with ModuleNotFoundError: No module named 'torch.distributed._functional_collectives'

@ManfeiBai
Collaborator Author

Tested locally and the output is as expected.

@ManfeiBai ManfeiBai requested a review from miladm August 22, 2023 17:42
@ManfeiBai
Collaborator Author

Thanks @ManfeiBai

We need a metric to detect regressions. For inference, we can define a latency/token threshold that we know is reachable. For training, we can define a MFU threshold.

Let's use our benchmark results to define the right threshold values. Wdyt?

Will try to measure latency/token for inference according to pytorch-tpu/llama#35, or calculate it directly.

@ManfeiBai
Collaborator Author

@ManfeiBai wdyt we submit a follow up PR for GPU testing as well? (after this work successfully lands)

Will create new PRs for GPU tests.

@ManfeiBai
Collaborator Author

Will add thresholds in the follow-up PRs.

@ManfeiBai
Collaborator Author

ManfeiBai commented Aug 24, 2023

Latency/token has been added as a threshold for inference.

So the current PR contains inference + llama2 + v4-8 + threshold.

Hi @miladm, would you mind reviewing this PR again and resolving the change request when it's ready to merge, or requesting changes if a new feature is needed?

Collaborator

@miladm miladm left a comment


I don't see the inference latency threshold and train thresholds. Will that be a follow up PR?

we need to push this change asap. Though, it seems we will need a follow up PR that would target more accurate thresholds, different model sizes, on GPU and TPU. Let's push this in for now. Thank you. @wonjoolee95 to take a second look as well.

@ManfeiBai
Collaborator Author

I don't see the inference latency threshold and train thresholds. Will that be a follow up PR?

we need to push this change asap. Though, it seems we will need a follow up PR that would target more accurate thresholds, different model sizes, on GPU and TPU. Let's push this in for now. Thank you. @wonjoolee95 to take a second look as well.

Yes, we have a threshold for inference and will have a follow-up PR for the training threshold.

Will merge this PR now.
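The training MFU threshold deferred to the follow-up PR could be sketched as follows. This is a minimal illustration, not the actual test code: the function name is hypothetical, and the numbers (7B parameters, 1000 tokens/s, a v4-8 peak of 1.1e15 bf16 FLOP/s) are assumptions for the example only.

```python
def model_flops_utilization(params: float, tokens_per_sec: float,
                            peak_flops: float) -> float:
    """MFU = achieved FLOP/s divided by hardware peak FLOP/s.

    For transformer training, achieved FLOP/s is commonly
    approximated as 6 * params * tokens/sec (forward + backward)."""
    achieved = 6.0 * params * tokens_per_sec
    return achieved / peak_flops

# Hypothetical run: 7B-parameter model at 1000 tokens/s on hardware
# with an assumed 1.1e15 FLOP/s peak
mfu = model_flops_utilization(7e9, 1000.0, 1.1e15)
print(f"{mfu:.3f}")  # 0.038
```

A training regression test could then fail whenever the measured MFU drops below a benchmark-derived floor, mirroring the latency/token check used for inference.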

@JackCaoG JackCaoG merged commit 56d5ea7 into GoogleCloudPlatform:master Aug 25, 2023
5 checks passed