Tensor parallel in distributed inference #10118
MohmedMonsef asked in Q&A:
The documentation recommends using tensor parallelism for the single-node, multi-GPU case, "if your model is too large to fit in a single GPU." My question is: why is tensor parallelism preferred over pipeline parallelism in this setup, even though tensor parallelism involves more communication? What are the specific advantages of using tensor parallelism here?
Answer: Within a node, networking is generally fast. The added communication overhead of TP is therefore much less of a concern than the gains: with TP all GPUs work on every layer at once, and TP has batching efficiencies that PP lacks.
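One way to make the batching-efficiency point concrete is the standard pipeline-bubble estimate (a back-of-envelope sketch, not vLLM-specific; $p$ and $m$ are symbols introduced here): with $p$ pipeline stages and $m$ microbatches in flight, the fraction of GPU time spent idle is roughly

$$\text{idle fraction} \approx \frac{p - 1}{m + p - 1}$$

At the low batch sizes typical of online inference ($m = 1$), this is $(p-1)/p$: on a 4-GPU pipeline, about 75% of GPU time is spent waiting. TP has no such bubble, since all GPUs cooperate on every layer of every token.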
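For reference, here is a minimal sketch of intra-node TP with vLLM, assuming a single node with 4 GPUs (the model name is a placeholder; pick any model that needs more than one GPU):

```python
from vllm import LLM, SamplingParams

# Shard every layer's weights across the 4 GPUs in this node (tensor parallelism).
# For pipeline parallelism you would set pipeline_parallel_size instead.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model name
    tensor_parallel_size=4,
)

outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

With this setup, each layer's matrix multiplies are split across the 4 GPUs and the partial results are combined with an all-reduce over the intra-node interconnect, which is exactly the communication the answer above says is cheap within a node.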