Is Llama model initialized on GPU? #776

Open
mahmoodn opened this issue Nov 3, 2024 · 0 comments
mahmoodn commented Nov 3, 2024

When I run the Llama training command, I see the following message:

You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.

I haven't modified the scripts, so I wonder what that message means and what effect it has. Should I set something extra in the configuration options?
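
For reference, here is a minimal sketch of how this warning is typically triggered, assuming the script loads the model through transformers' `from_pretrained` (the model ID and arguments below are placeholders, not what the training script actually uses). Since the log mentions `partition_parameters.py`, the script may be using DeepSpeed ZeRO-3, which places parameters on the GPU itself, so the message could be informational rather than an error.

```python
# Hypothetical sketch of how the warning usually appears; the real training
# script may construct the model differently (e.g. inside deepspeed.zero.Init).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",              # placeholder model ID
    torch_dtype=torch.bfloat16,               # FlashAttention 2 needs fp16/bf16
    attn_implementation="flash_attention_2",  # requests Flash Attention 2.0
)
# At this point the weights are still on CPU, so transformers prints the
# "model not initialized on GPU" warning. Moving the model afterwards,
# as the message suggests, silences it:
model = model.to("cuda")
```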

Messages before and after that are shown below:

[2024-11-02 19:47:06,515] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-02 19:50:02,996] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-02 19:50:02,996] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-02 19:50:08,549] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-11-02 19:50:08,549] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-11-02 19:50:08,550] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-11-02 19:50:16,865] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 2
[2024-11-02 19:50:16,865] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 2
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[2024-11-02 19:50:29,074] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 563, num_elems = 68.98B

Loading checkpoint shards:   0%|          | 0/29 [00:00<?, ?it/s]