When I run the Llama training command, I see the following message:
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
I haven't modified the scripts, so I'd like to know what this message means and whether it affects training. Do I need to set anything extra in the configuration options?
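For context, this warning comes from Transformers when Flash Attention 2 is requested while the weights are still on CPU (or on the meta device, as happens during DeepSpeed ZeRO-3 initialization). A minimal sketch of the usual pattern the warning refers to is below; the model name and dtype here are placeholders I chose for illustration, not values taken from the training scripts:

```python
def load_llama_with_fa2(model_name="meta-llama/Llama-2-70b-hf"):
    """Load a causal LM with Flash Attention 2 enabled, then move it to GPU.

    Sketch only: assumes `torch`, `transformers`, and `flash-attn` are
    installed and a CUDA device is available.
    """
    import torch
    from transformers import AutoModelForCausalLM

    # Flash Attention 2 is requested at load time via `attn_implementation`.
    # The model is first materialized on CPU, which is what triggers the
    # warning; moving it to the GPU afterwards is the recommended fix.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
    )
    return model.to("cuda")
```

Note that when DeepSpeed ZeRO-3 manages device placement (as the `partition_parameters.py` log line below suggests), the framework moves parameters to GPU itself, so this warning is often benign in that setup.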
Messages before and after that are shown below:
```
[2024-11-02 19:47:06,515] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-02 19:50:02,996] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-02 19:50:02,996] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-02 19:50:08,549] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-11-02 19:50:08,549] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-11-02 19:50:08,550] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-11-02 19:50:16,865] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 2
[2024-11-02 19:50:16,865] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 2
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[2024-11-02 19:50:29,074] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 563, num_elems = 68.98B
Loading checkpoint shards:   0%|          | 0/29 [00:00<?, ?it/s]
```