Llama model, torch.compile output for custom device does not match with eager/cpu when generation_config.use_cache set to True
System Info
transformers version: 4.43.2

Who can help?
@ArthurZucker @gante
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction
For a custom device, I am working on adding torch.compile() support with the C++ inductor backend. I am trying to run "TinyLlama/TinyLlama-1.1B-Chat-v1.0", and its output differs between compiled and eager mode when the KV cache is used during generation. If I use the following config, the output of compiled mode matches eager mode:

compiled_model.generation_config.use_cache = False

For large context-length generation I see output similar to #30347.
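For reference, here is a minimal sketch of the comparison I am doing. It runs on plain CPU here since the custom device is out of tree, and the prompt and max_new_tokens are placeholders, not my exact settings:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

inputs = tokenizer("Hello, how are you?", return_tensors="pt")

# Eager reference; greedy decoding so the two runs are comparable.
with torch.no_grad():
    eager_out = model.generate(**inputs, max_new_tokens=32, do_sample=False)

compiled_model = torch.compile(model, backend="inductor")

# Workaround: uncommenting the next line makes the outputs match on my backend.
# compiled_model.generation_config.use_cache = False

with torch.no_grad():
    compiled_out = compiled_model.generate(**inputs, max_new_tokens=32, do_sample=False)

# True on CPU; False on my custom device while use_cache is True.
print(torch.equal(eager_out, compiled_out))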
Expected behavior
Please help me debug this further so that my backend generates correct output in compile mode even with the KV cache enabled.
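In case it helps the triage, below is a sketch of how I plan to localize the divergence: a single forward pass through eager vs. compiled with use_cache=True, comparing logits per position. The prompt is again a placeholder:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()
compiled_model = torch.compile(model, backend="inductor")

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
with torch.no_grad():
    eager_logits = model(**inputs, use_cache=True).logits
    compiled_logits = compiled_model(**inputs, use_cache=True).logits

# Largest per-position logit gap; a spike shows where the outputs diverge.
print((eager_logits - compiled_logits).abs().amax(dim=-1))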