
Add LLaMA2 model to pt nightly tests #965

Merged
merged 26 commits into from
Aug 25, 2023

Conversation

ManfeiBai
Collaborator

@ManfeiBai ManfeiBai commented Aug 2, 2023

Description

Add the LLaMA2 model to the xl-ml-dashboard for tracking.

Tests

./scripts/run-oneshot.sh -t pt-nightly-l2-n-i-func-v4-8-1vm
./scripts/run-oneshot.sh -t pt-nightly-l2-t-h-f-func-v4-8-1vm

(In the test names above: l2 = llama2, -e = -eager, -c = -chat, -t = -token, -q = -quantization, -w = -without-download, -n-i = -next-inference, -s = -stable, -xla = -xla)

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run one-shot tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

Collaborator

@miladm miladm left a comment


Thanks @ManfeiBai

We need a metric to detect regressions. For inference, we can define a latency/token threshold that we know is reachable. For training, we can define an MFU threshold.

Let's use our benchmark results to define the right threshold values. Wdyt?
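The latency/token threshold proposed above could be checked with a sketch along these lines (all function names and numbers are hypothetical, not from the actual test harness):

```python
def latency_per_token(total_seconds: float, num_tokens: int) -> float:
    """Average decode latency per generated token, in seconds."""
    return total_seconds / num_tokens

def passes_threshold(measured: float, threshold: float) -> bool:
    """A run passes if its latency/token does not exceed the threshold.

    The threshold itself would come from benchmark results, as
    suggested above."""
    return measured <= threshold

# Illustrative numbers: 12.8 s to generate 256 tokens -> 0.05 s/token
measured = latency_per_token(12.8, 256)
print(passes_threshold(measured, threshold=0.06))  # True
```

The test would then report success or failure based on this boolean, which matches the idea of marking a test failed when latency regresses past a known-reachable value.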

@miladm
Collaborator

miladm commented Aug 7, 2023

@ManfeiBai wdyt we submit a follow up PR for GPU testing as well? (after this work successfully lands)

@ManfeiBai
Collaborator Author

ManfeiBai commented Aug 7, 2023

We need a metric to detect regressions. For inference, we can define a latency/token threshold that we know is reachable. For training, we can define a MFU threshold.

Let's use our benchmark results to define the right threshold values. Wdyt?

For a metric to detect regressions, I agree we should add a threshold for inference. Do we want to use the latency/token threshold to mark a test as passed or failed?

For training, we didn't find a training setup in the README yet, so this PR doesn't include training for LLaMA2. Or did you mean training for other models?

For using benchmark results to define the right threshold values: sure, will do.

@ManfeiBai
Collaborator Author

ManfeiBai commented Aug 7, 2023

@ManfeiBai wdyt we submit a follow up PR for GPU testing as well? (after this work successfully lands)

Sure, we already have GPU tests on the XL-ML dashboard; let's add LLaMA2 to the GPU tests in the follow-up PR.

@miladm
Collaborator

miladm commented Aug 18, 2023

Bunch of comments - for the record

@ManfeiBai
Collaborator Author

Local test failed with ModuleNotFoundError: No module named 'torch.distributed._functional_collectives'

@ManfeiBai
Collaborator Author

Tested locally and the output is as expected.

@ManfeiBai ManfeiBai requested a review from miladm August 22, 2023 17:42
@ManfeiBai
Collaborator Author

Thanks @ManfeiBai

We need a metric to detect regressions. For inference, we can define a latency/token threshold that we know is reachable. For training, we can define a MFU threshold.

Let's use our benchmark results to define the right threshold values. Wdyt?

Will try to measure latency/token for inference according to pytorch-tpu/llama#35, or calculate it directly.

@ManfeiBai
Collaborator Author

@ManfeiBai wdyt we submit a follow up PR for GPU testing as well? (after this work successfully lands)

Will create new PRs for GPU tests.

@ManfeiBai
Collaborator Author

Will add thresholds in the follow-up PRs.

@ManfeiBai
Collaborator Author

ManfeiBai commented Aug 24, 2023

Latency/token has been added as a threshold for inference.

So the current PR contains inference + llama2 + v4-8 + threshold.

Hi @miladm, would you mind reviewing this PR again and resolving the change request when it's ready to merge, or requesting changes if a new feature is needed?

Collaborator

@miladm miladm left a comment


I don't see the inference latency threshold and train thresholds. Will that be a follow up PR?

we need to push this change asap. Though, it seems we will need a follow up PR that would target more accurate thresholds, different model sizes, on GPU and TPU. Let's push this in for now. Thank you. @wonjoolee95 to take a second look as well.

@ManfeiBai
Collaborator Author

I don't see the inference latency threshold and train thresholds. Will that be a follow up PR?

we need to push this change asap. Though, it seems we will need a follow up PR that would target more accurate thresholds, different model sizes, on GPU and TPU. Let's push this in for now. Thank you. @wonjoolee95 to take a second look as well.

Yes, we have a threshold for inference and will have a follow-up PR for the training threshold.

Will merge this PR now.
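The training MFU threshold deferred to the follow-up PR could be sketched as follows. This is a minimal illustration, not the actual test code: the function name is hypothetical, and the numbers (7B parameters, 1000 tokens/s, a v4-8 peak of 1.1e15 bf16 FLOP/s) are assumptions for the example only.

```python
def model_flops_utilization(params: float, tokens_per_sec: float,
                            peak_flops: float) -> float:
    """MFU = achieved FLOP/s divided by hardware peak FLOP/s.

    For transformer training, achieved FLOP/s is commonly
    approximated as 6 * params * tokens/sec (forward + backward)."""
    achieved = 6.0 * params * tokens_per_sec
    return achieved / peak_flops

# Hypothetical run: 7B-parameter model at 1000 tokens/s on hardware
# with an assumed 1.1e15 FLOP/s peak
mfu = model_flops_utilization(7e9, 1000.0, 1.1e15)
print(f"{mfu:.3f}")  # 0.038
```

A training regression test could then fail whenever the measured MFU drops below a benchmark-derived floor, mirroring the latency/token check used for inference.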

@JackCaoG JackCaoG merged commit 56d5ea7 into GoogleCloudPlatform:master Aug 25, 2023
5 checks passed