[Question] Proper way to use multiple GPUs #2562
Comments
Hi @0xLienid, thanks for the question. There are a few ways to get things right.
If you follow the two ways above, you don't need to specify it. It might be more helpful for us to triage the issue you encountered if you don't mind sharing the log printed when running it.
Will rerun for the logs in a few, but these are the config generation and compilation commands that led to this GPU usage, for both.
Just want to share some more pointers that may be helpful: in the log, the expectation is to see 4 here.
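One quick way to confirm what actually ended up in the generated config is to read the `tensor_parallel_shards` field out of `mlc-chat-config.json` directly, rather than scanning the log. This is just a sketch; the helper and path below are illustrative, though `tensor_parallel_shards` is the field MLC writes into `mlc-chat-config.json`:

```python
import json


def read_tensor_parallel_shards(config_path):
    """Return the tensor_parallel_shards value from an mlc-chat-config.json file."""
    with open(config_path) as f:
        config = json.load(f)
    # Fall back to 1 when the field is absent (a single-GPU config).
    return config.get("tensor_parallel_shards", 1)
```

For a 4-GPU setup, `read_tensor_parallel_shards("dist/my-model-MLC/mlc-chat-config.json")` should return 4; if it returns 1, the shard count never made it into the config.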
Another thing is, if your local MLC was installed before Jun 7, then you may need to upgrade to the latest nightly, as we fixed some related logic in #2533.
I will double check, but this should have been built from source as of yesterday.
If it says this and prints the following log, then there is nothing wrong with it.
Got it, then it should be fine, as #2533 is already included.
Yes, it says this. The model is ~70 GB and I have 4 A100s, so it should fit comfortably when sharded across them. For now I've worked around this by increasing the GPU memory share so that it all fits within one GPU, but obviously that's less than ideal. Once my evals are done running, I'll rerun the config generation and compilation to get logs.
@MasterJH5574, this is the log.
Thanks for sharing! It looks pretty normal actually. How does the log look when loading parameters? How far does the progress bar get?
❓ General Questions
What is the proper way to actually utilize multiple GPUs? When I generate the config, compile, and load the MLCEngine with multiple tensor shards, it will still error out if the model size is larger than a single GPU's memory. Also, if I check
nvidia-smi
it is only really utilizing one GPU, e.g. this was run with 4 tensor shards.
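For reference, a multi-GPU run is normally set up by passing the shard count at config-generation time and letting the compile and load steps pick it up from the config. The commands below are a hedged sketch, not verbatim from this thread: the model path, quantization, and output names are placeholders, and `--tensor-parallel-shards` is the `gen_config` option MLC LLM documents for tensor parallelism.

```shell
# Generate the chat config with 4 tensor-parallel shards
# (model path, quantization, and output directory are placeholders).
mlc_llm gen_config ./dist/models/my-model \
    --quantization q4f16_1 \
    --tensor-parallel-shards 4 \
    -o ./dist/my-model-q4f16_1-MLC

# Compile the model library against the generated config.
mlc_llm compile ./dist/my-model-q4f16_1-MLC/mlc-chat-config.json \
    --device cuda \
    -o ./dist/libs/my-model-q4f16_1-cuda.so

# While loading, poll per-GPU memory to confirm all 4 GPUs fill up,
# rather than eyeballing a single nvidia-smi snapshot.
nvidia-smi --query-gpu=index,memory.used --format=csv -l 1
```

If only GPU 0 shows memory growth during parameter loading, the shard setting was likely not in effect for the compiled artifact being loaded.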