Is Llama model initialized on GPU? #776

Open
mahmoodn opened this issue Nov 3, 2024 · 0 comments
mahmoodn commented Nov 3, 2024

When I run the Llama training command, I see the following message:

You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.

I haven't modified the scripts, so I wonder what that message means and what effect it has. Should I set something extra in the configuration options?
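
For reference, here is a minimal sketch of how this warning is typically triggered, assuming the script loads the model through transformers' `from_pretrained` (the model ID and arguments below are placeholders, not what the training script actually uses). Since the log mentions `partition_parameters.py`, the script may be using DeepSpeed ZeRO-3, which places parameters on the GPU itself, so the message could be informational rather than an error.

```python
# Hypothetical sketch of how the warning usually appears; the real training
# script may construct the model differently (e.g. inside deepspeed.zero.Init).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",              # placeholder model ID
    torch_dtype=torch.bfloat16,               # FlashAttention 2 needs fp16/bf16
    attn_implementation="flash_attention_2",  # requests Flash Attention 2.0
)
# At this point the weights are still on CPU, so transformers prints the
# "model not initialized on GPU" warning. Moving the model afterwards,
# as the message suggests, silences it:
model = model.to("cuda")
```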

Messages before and after that are shown below:

[2024-11-02 19:47:06,515] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-02 19:50:02,996] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-02 19:50:02,996] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-02 19:50:08,549] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-11-02 19:50:08,549] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-11-02 19:50:08,550] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-11-02 19:50:16,865] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 2
[2024-11-02 19:50:16,865] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 2
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
[2024-11-02 19:50:29,074] [INFO] [partition_parameters.py:348:__exit__] finished initializing model - num_params = 563, num_elems = 68.98B

Loading checkpoint shards:   0%|          | 0/29 [00:00<?, ?it/s]