Transformer models generation supports user-provided input embeddings #1276

zongwave · 2024-08-20T10:30:31Z

What does this PR do?

Some multimodal generation model needs user specified embedded tokens input, for example NExT-GPT, this commit enable user-provided input embeddings for model generation.

Input Embeddings Support:

Modified transformers/generation/utils.py to include logic that supports user-provided input embeddings. This allows for more flexibility in how input data can be fed into the models.

Added the --input_embeds option in example/text-generation/run_generation.py to facilitate testing with different models using embedded tokens.

Testing and Performance Evaluation:

Conducted tests using the modified script on the Mistral 7B model with a batch size of 6. The tests included various input scenarios to assess the robustness and performance of the new features.

Enhanced example/text-generation/run_generation.py to support multiple decoding strategies including Greedy, Beam Search, and Contrastive Search. This allows users to specify the decoding strategy during runtime and evaluate the effectiveness of each method.

Fixes # (issue)
Optimum habana transformer models currently do not support user provided input embeddings for token generation.

Usage
To test the new features, use the following command:

python run_generation.py --model_name_or_path mistralai/Mistral-7B-Instruct-v0.2  --input_embeds --batch_size 6

Performance Data:

The table below summarizes the performance data for six popular models with both embeds and tokens input, highlighting the throughput, number of HPU graphs, graph compilation duration, and memory allocation.

Model Name or Path	Input Type	Throughput (tokens/second)	Number of HPU Graphs	Graph Compilation Duration (seconds)	Max Memory Allocated (GB)
tiiuae/falcon-7b	Embeds	32.37	8	13.58	26.97
tiiuae/falcon-7b	Tokens	29.77	7	13.74	26.96
mosaicml/mpt-7b	Embeds	50.4	8	8.77	25.68
mosaicml/mpt-7b	Tokens	50.73	7	9.39	25.68
gpt2	Embeds	93.18	8	4.68	0.65
gpt2	Tokens	76.91	7	5.1	0.65
microsoft/phi-2	Embeds	31.19	8	14.62	10.48
microsoft/phi-2	Tokens	29.55	7	14.63	10.48
mistralai/Mistral-7B-Instruct-v0.2	Embeds	29.44	13	11.04	28.07
mistralai/Mistral-7B-Instruct-v0.2	Tokens	25.6	11	12.09	28.07
Customized LLaMA-based Vicuna	Embeds	28.28	12	10.6	25.31
Customized LLaMA-based Vicuna	Tokens	28.61	11	11.1	25.31

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
[Y] Did you write any new necessary tests?

libinta · 2024-08-31T00:12:52Z

@zongwave as this PR changes the common utils.py, can you run through gaudi2 ci test for text-generation?
https://github.com/huggingface/optimum-habana/blob/main/tests/test_text_generation_example.py
The way to trigger ci test
GAUDI2_CI=true RUN_SLOW=1 python -m pytest test/test_text-generation_example.py -m -k -v xxx
where xxx is specific model or test case,
please validate llama bf16/fp8, 1x/8x

zongwave · 2024-09-05T10:14:47Z

@zongwave as this PR changes the common utils.py, can you run through gaudi2 ci test for text-generation? https://github.com/huggingface/optimum-habana/blob/main/tests/test_text_generation_example.py The way to trigger ci test GAUDI2_CI=true RUN_SLOW=1 python -m pytest test/test_text-generation_example.py -m -k -v xxx where xxx is specific model or test case, please validate llama bf16/fp8, 1x/8x

@libinta
I ran GAUDI_CI test with command:

GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/test_text_generation_example.py -v -k meta-llama/Llama-2-7b-hf
======================================================================= test session starts =======================================================================
platform linux -- Python 3.10.12, pytest-8.3.2, pluggy-1.5.0 -- /usr/bin/python
cachedir: .pytest_cache
rootdir: /workspace/src/optimum-habana
configfile: setup.cfg
plugins: anyio-4.4.0
collected 52 items / 42 deselected / 10 selected

tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-1-True-141.25776956002076] PASSED [ 10%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-512-True-12808] PASSED [ 20%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-512-False-8711] PASSED [ 30%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-1230-False-128-128-13152.7] PASSED [ 40%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-163-False-128-2048-4774.7] PASSED [ 50%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-94-False-2048-128-1293.3] PASSED [ 60%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-81-False-2048-2048-1942.9] PASSED [ 70%]
tests/test_text_generation_example.py::test_text_generation_torch_compile[token0-meta-llama/Llama-2-7b-hf-102.27823420713148] PASSED [ 80%]
tests/test_text_generation_example.py::test_text_generation_torch_compile_distributed[token0-meta-llama/Llama-2-7b-hf-39.72973199515235] PASSED [ 90%]
tests/test_text_generation_example.py::test_text_generation_distributed_tp[token0-meta-llama/Llama-2-7b-hf-1345.2369318328463] PASSED [100%]

========================================================= 10 passed, 42 deselected in 2522.21s (0:42:02) ==========================================================

zongwave · 2024-09-06T05:06:37Z

@zongwave as this PR changes the common utils.py, can you run through gaudi2 ci test for text-generation? https://github.com/huggingface/optimum-habana/blob/main/tests/test_text_generation_example.py The way to trigger ci test GAUDI2_CI=true RUN_SLOW=1 python -m pytest test/test_text-generation_example.py -m -k -v xxx where xxx is specific model or test case, please validate llama bf16/fp8, 1x/8x

@libinta I ran GAUDI_CI test with command:

GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/test_text_generation_example.py -v -k meta-llama/Llama

======================================================================= test session starts =======================================================================
platform linux -- Python 3.10.12, pytest-8.3.2, pluggy-1.5.0 -- /usr/bin/python
cachedir: .pytest_cache
rootdir: /workspace/src/optimum-habana
configfile: setup.cfg
plugins: anyio-4.4.0
collected 52 items / 37 deselected / 15 selected

tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-1-True-141.25776956002076] PASSED [ 6%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-512-True-12808] PASSED [ 13%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-512-False-8711] PASSED [ 20%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-1230-False-128-128-13152.7] PASSED [ 26%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-163-False-128-2048-4774.7] PASSED [ 33%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-94-False-2048-128-1293.3] PASSED [ 40%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-81-False-2048-2048-1942.9] PASSED [ 46%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-70b-hf-4-3042-False-128-128-5374.6] PASSED [ 53%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-70b-hf-4-750-False-128-2048-7422.4] PASSED [ 60%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-70b-hf-4-207-False-2048-128-568.5] PASSED [ 66%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-70b-hf-8-172-False-2048-2048-4656.2] PASSED [ 73%]
tests/test_text_generation_example.py::test_text_generation_deepspeed[token0-meta-llama/Llama-2-70b-hf-8-1-64.10514998902435] PASSED [ 80%]
tests/test_text_generation_example.py::test_text_generation_torch_compile[token0-meta-llama/Llama-2-7b-hf-102.27823420713148] PASSED [ 86%]
tests/test_text_generation_example.py::test_text_generation_torch_compile_distributed[token0-meta-llama/Llama-2-7b-hf-39.72973199515235] PASSED [ 93%]
tests/test_text_generation_example.py::test_text_generation_distributed_tp[token0-meta-llama/Llama-2-7b-hf-1345.2369318328463] PASSED [100%]

========================================================= 15 passed, 37 deselected in 7759.19s (2:09:19) ==========================================================

I'm trying to figure out how to trigger llama 8x test case.

regisss · 2024-09-06T08:22:54Z

I'm trying to figure out how to trigger llama 8x test case.

Llama 8x tests are this one and this one.

zongwave · 2024-09-10T10:48:56Z

I'm trying to figure out how to trigger llama 8x test case.

Llama 8x tests are this one and this one.

@regisss @libinta
I ran Gaudi2_CI test with command:
GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/test_text_generation_example.py -v -k llama

18 llama cases include bf16/fp8, 1x/8x configuration were selected and passed both input embeds and input tokens cases.

I triggered the CI test with "--input_embeds" option manually in test_text_generation_example.py
command += [
f"{path_to_example_dir / 'text-generation' / 'run_generation.py'}",
f"--model_name_or_path {model_name}",
f"--batch_size {batch_size}",
"--use_kv_cache",
"--input_embeds",
f"--max_new_tokens {max_output_tokens}",
]

1. Test results on input embeds generation enabled by adding option "--input_embeds" manually:

============================================================================================ test session starts =============================================================================================
platform linux -- Python 3.10.12, pytest-8.3.2, pluggy-1.5.0 -- /usr/bin/python
cachedir: .pytest_cache
rootdir: /workspace/src/optimum-habana
configfile: setup.cfg
plugins: anyio-4.4.0
collected 52 items / 34 deselected / 18 selected

tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-1-True-141.25776956002076] PASSED [ 5%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Meta-Llama-3-8B-1-True-129] PASSED [ 11%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-512-True-12808] PASSED [ 16%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-512-False-8711] PASSED [ 22%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-codellama/CodeLlama-34b-hf-1-True-32.644] PASSED [ 27%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-1230-False-128-128-13152.7] PASSED [ 33%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-163-False-128-2048-4774.7] PASSED [ 38%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-94-False-2048-128-1293.3] PASSED [ 44%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-81-False-2048-2048-1942.9] PASSED [ 50%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-70b-hf-4-3042-False-128-128-5374.6] PASSED [ 55%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-70b-hf-4-750-False-128-2048-7422.4] PASSED [ 61%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-70b-hf-4-207-False-2048-128-568.5] PASSED [ 66%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-70b-hf-8-172-False-2048-2048-4656.2] PASSED [ 72%]
tests/test_text_generation_example.py::test_text_generation_deepspeed[token0-meta-llama/Llama-2-70b-hf-8-1-64.10514998902435] PASSED [ 77%]
tests/test_text_generation_example.py::test_text_generation_deepspeed[token0-meta-llama/Meta-Llama-3-70B-Instruct-8-1-64] PASSED [ 83%]
tests/test_text_generation_example.py::test_text_generation_torch_compile[token0-meta-llama/Llama-2-7b-hf-102.27823420713148] PASSED [ 88%]
tests/test_text_generation_example.py::test_text_generation_torch_compile_distributed[token0-meta-llama/Llama-2-7b-hf-39.72973199515235] PASSED [ 94%]
tests/test_text_generation_example.py::test_text_generation_distributed_tp[token0-meta-llama/Llama-2-7b-hf-1345.2369318328463] PASSED [100%]
========================================================================== 18 passed, 34 deselected in 7952.60s (2:12:32) ==========================================================================a

2. Tests results on original input tokens generation:

GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/test_text_generation_example.py -v -k llama
====================================================================== test session starts =======================================================================
platform linux -- Python 3.10.12, pytest-8.3.2, pluggy-1.5.0 -- /usr/bin/python
cachedir: .pytest_cache
rootdir: /workspace/src/optimum-habana
configfile: setup.cfg
plugins: anyio-4.4.0
collected 52 items / 34 deselected / 18 selected

tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-1-True-141.25776956002076] PASSED [ 5%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Meta-Llama-3-8B-1-True-129] PASSED [ 11%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-512-True-12808] PASSED [ 16%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-512-False-8711] PASSED [ 22%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-codellama/CodeLlama-34b-hf-1-True-32.644] PASSED [ 27%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-1230-False-128-128-13152.7] PASSED [ 33%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-163-False-128-2048-4774.7] PASSED [ 38%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-94-False-2048-128-1293.3] PASSED [ 44%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-81-False-2048-2048-1942.9] PASSED [ 50%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-70b-hf-4-3042-False-128-128-5374.6] PASSED [ 55%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-70b-hf-4-750-False-128-2048-7422.4] PASSED [ 61%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-70b-hf-4-207-False-2048-128-568.5] PASSED [ 66%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-70b-hf-8-172-False-2048-2048-4656.2] PASSED [ 72%]
tests/test_text_generation_example.py::test_text_generation_deepspeed[token0-meta-llama/Llama-2-70b-hf-8-1-64.10514998902435] PASSED [ 77%]
tests/test_text_generation_example.py::test_text_generation_deepspeed[token0-meta-llama/Meta-Llama-3-70B-Instruct-8-1-64] PASSED [ 83%]
tests/test_text_generation_example.py::test_text_generation_torch_compile[token0-meta-llama/Llama-2-7b-hf-102.27823420713148] PASSED [ 88%]
tests/test_text_generation_example.py::test_text_generation_torch_compile_distributed[token0-meta-llama/Llama-2-7b-hf-39.72973199515235] PASSED [ 94%]
tests/test_text_generation_example.py::test_text_generation_distributed_tp[token0-meta-llama/Llama-2-7b-hf-1345.2369318328463] PASSED [100%]

========================================================= 18 passed, 34 deselected in 8202.18s (2:16:42) =========================================================

vidyasiv · 2024-09-11T19:53:17Z

@regisss please take a look

HuggingFaceDocBuilderDev · 2024-09-23T18:53:54Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

regisss

LGTM! There is a merge conflict to solve however before we can merge.

zongwave · 2024-09-24T10:17:26Z

LGTM! There is a merge conflict to solve however before we can merge.

@regisss @vidyasiv
I merged the code changes with #948 and verified the CI test for #948 using command:
GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/transformers/tests/models/ -k test_generate_from_inputs_embeds_decoder_only

Unfortunately, 5 test cases failed. It seems this #948 requires generating new tokens from "input_ids" and "inputs_embeds" simultaneously. FAILED tests/transformers/tests/models/llama/test_modeling_llama.py::LlamaModelTest::test_generate_from_inputs_embeds_decoder_only - AssertionError: Lists differ: [[70,[66 chars]7, 78], [90, 71, 10, 82, 86, 98, 64, 64, 64, 6[37 chars] 64]] != [[70,[66 chars]7, 78, 39, 95, 41], [90, 71, 10, 82, 86, 98, 6[61 chars] 90]]

My PR only convers "inputs_embeds" or "input_ids" but not both at the same time. I’m curious about the real use case for this.

I need more time to make both two PRs work together.

zongwave · 2024-09-25T10:23:44Z

LGTM! There is a merge conflict to solve however before we can merge.

@regisss @vidyasiv I merged the code changes with #948 and verified the CI test for #948 using command: GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/transformers/tests/models/ -k test_generate_from_inputs_embeds_decoder_only

Unfortunately, 5 test cases failed. It seems this #948 requires generating new tokens from "input_ids" and "inputs_embeds" simultaneously. FAILED tests/transformers/tests/models/llama/test_modeling_llama.py::LlamaModelTest::test_generate_from_inputs_embeds_decoder_only - AssertionError: Lists differ: [[70,[66 chars]7, 78], [90, 71, 10, 82, 86, 98, 64, 64, 64, 6[37 chars] 64]] != [[70,[66 chars]7, 78, 39, 95, 41], [90, 71, 10, 82, 86, 98, 6[61 chars] 90]]

My PR only convers "inputs_embeds" or "input_ids" but not both at the same time. I’m curious about the real use case for this.

I need more time to make both two PRs work together.

@regisss @vidyasiv

I rebased the PR to the latest, in this update, transformer models support process both "input_ids" and "inputs_embeds" at the same time or separately.

Passed #948 test cases to verify the generation consistent by input tokens or embeds
root@800671-4:/workspace/src/optimum-habana# GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/transformers/tests/models/ -k test_generate_from_inputs_embeds_decoder_only
==================================================================== test session starts =====================================================================
platform linux -- Python 3.10.12, pytest-8.3.3, pluggy-1.5.0
rootdir: /workspace/src/optimum-habana
configfile: setup.cfg
plugins: anyio-4.4.0
collected 1226 items / 1218 deselected / 8 selected

tests/transformers/tests/models/bert/test_modeling_bert.py . [ 12%]
tests/transformers/tests/models/falcon/test_modeling_falcon.py . [ 25%]
tests/transformers/tests/models/gpt2/test_modeling_gpt2.py . [ 37%]
tests/transformers/tests/models/gpt_neox/test_modeling_gpt_neox.py . [ 50%]
tests/transformers/tests/models/gptj/test_modeling_gptj.py . [ 62%]
tests/transformers/tests/models/llama/test_modeling_llama.py . [ 75%]
tests/transformers/tests/models/roberta/test_modeling_roberta.py . [ 87%]
tests/transformers/tests/models/t5/test_modeling_t5.py . [ 100%]

Passed test-text-generation CI test cases for single HPU device to verify the performance reach benchmark (can not reserve 8-card Gaudi2 this time, I've ran and passed the total cases before)
root@800671-4:/workspace/src/optimum-habana# GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/test_text_generation_example.py -v -k llama
==================================================================== test session starts =====================================================================
platform linux -- Python 3.10.12, pytest-8.3.3, pluggy-1.5.0 -- /usr/bin/python
cachedir: .pytest_cache
rootdir: /workspace/src/optimum-habana
configfile: setup.cfg
plugins: anyio-4.4.0
collected 52 items / 34 deselected / 18 selected

tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-1-True-141.25776956002076] PASSED [ 5%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Meta-Llama-3-8B-1-True-129] PASSED [ 11%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-512-True-12808] PASSED [ 16%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-meta-llama/Llama-2-7b-hf-512-False-8711] PASSED [ 22%]
tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-codellama/CodeLlama-34b-hf-1-True-32.644] PASSED [ 27%]

tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-1230-False-128-128-13152.7] PASSED [ 33%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-163-False-128-2048-4774.7] PASSED [ 38%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-94-False-2048-128-1293.3] PASSED [ 44%]
tests/test_text_generation_example.py::test_text_generation_fp8[token0-meta-llama/Llama-2-7b-hf-1-81-False-2048-2048-1942.9] PASSED [ 50%]

optimum/habana/transformers/generation/utils.py

zongwave requested review from ssarkar2, bhargaveede, vivekgoe and regisss as code owners August 20, 2024 10:30

zongwave force-pushed the main branch 4 times, most recently from 5933b4c to 418d2fc Compare August 21, 2024 02:37

zongwave force-pushed the main branch 4 times, most recently from 68ad075 to d020294 Compare September 10, 2024 10:33

vidyasiv approved these changes Sep 11, 2024

View reviewed changes

libinta added the run-test Run CI for PRs from external contributors label Sep 18, 2024

regisss reviewed Sep 23, 2024

View reviewed changes

zongwave force-pushed the main branch from d020294 to bc782be Compare September 25, 2024 10:09

regisss reviewed Sep 25, 2024

View reviewed changes

optimum/habana/transformers/generation/utils.py Outdated Show resolved Hide resolved

zongwave force-pushed the main branch from bc782be to 40ddbc7 Compare September 25, 2024 10:34

Transformer models generation supports user-provided input embeddings

40ddbc7

regisss approved these changes Sep 25, 2024

View reviewed changes

regisss merged commit d28bae9 into huggingface:main Sep 25, 2024
3 of 4 checks passed

vidyasiv mentioned this pull request Sep 27, 2024

Only pass the use_kv_cache True to generator #1366

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transformer models generation supports user-provided input embeddings #1276

Transformer models generation supports user-provided input embeddings #1276

zongwave commented Aug 20, 2024 •

edited

Loading

libinta commented Aug 31, 2024

zongwave commented Sep 5, 2024 •

edited

Loading

zongwave commented Sep 6, 2024 •

edited

Loading

regisss commented Sep 6, 2024

zongwave commented Sep 10, 2024 •

edited

Loading

vidyasiv commented Sep 11, 2024

HuggingFaceDocBuilderDev commented Sep 23, 2024

regisss left a comment

zongwave commented Sep 24, 2024

zongwave commented Sep 25, 2024 •

edited

Loading

Transformer models generation supports user-provided input embeddings #1276

Transformer models generation supports user-provided input embeddings #1276

Conversation

zongwave commented Aug 20, 2024 • edited Loading

What does this PR do?

Before submitting

libinta commented Aug 31, 2024

zongwave commented Sep 5, 2024 • edited Loading

zongwave commented Sep 6, 2024 • edited Loading

GAUDI2_CI=1 RUN_SLOW=1 python -m pytest tests/test_text_generation_example.py -v -k meta-llama/Llama

regisss commented Sep 6, 2024

zongwave commented Sep 10, 2024 • edited Loading

18 llama cases include bf16/fp8, 1x/8x configuration were selected and passed both input embeds and input tokens cases.

1. Test results on input embeds generation enabled by adding option "--input_embeds" manually:

2. Tests results on original input tokens generation:

vidyasiv commented Sep 11, 2024

HuggingFaceDocBuilderDev commented Sep 23, 2024

regisss left a comment

Choose a reason for hiding this comment

zongwave commented Sep 24, 2024

zongwave commented Sep 25, 2024 • edited Loading

zongwave commented Aug 20, 2024 •

edited

Loading

zongwave commented Sep 5, 2024 •

edited

Loading

zongwave commented Sep 6, 2024 •

edited

Loading

zongwave commented Sep 10, 2024 •

edited

Loading

zongwave commented Sep 25, 2024 •

edited

Loading