Add documentation describing what modifications you need (head/loss/inputs) to support different parallelisms of transformer block #170

jstjohn commented Sep 17, 2024

Describe everything a "model fine-tuner" would, at a minimum, need to know to:

  1. Implement custom fine-tune architectures that put different heads on a trunk (a rough sketch of points 1 and 2 follows this list)
  2. Implement custom losses
  3. Implement custom datasets (e.g., whether the batch needs special handling for context parallelism (CP) and/or sequence parallelism (SP))

Future:
  4. Everything a user would need to know to implement a custom layer that supports parallelism. This is more advanced, but we can have it on the roadmap.

From @pstjohn:
A part of this I'm still shaky on is what kinds of modifications to the actual models we need in order to support these different model-parallel strategies. Some mention of whether the underlying models need to be written specifically for Megatron would be helpful.

Are there common abstractions that allow a model to use all types of Megatron parallelization? Or do some models only support a subset of tensor / pipeline / sequence / context parallelism?

Are there any docs on how to tune the combination of these parallel choices for maximum throughput? Or is that done under the hood by Megatron?

Originally posted by @pstjohn in #153 (comment)
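For context on the quoted question: the core reason models have to be written with Megatron in mind is that tensor parallelism shards individual weight matrices across ranks, so an ordinary `nn.Linear` must be replaced by a parallel-aware equivalent. Below is a purely conceptual, single-process toy (not the Megatron-Core implementation, and the class name is made up) showing the column-sharding idea that such docs could explain:

```python
import torch
import torch.nn as nn


class ConceptualColumnParallelLinear(nn.Module):
    """Toy illustration of tensor parallelism: each "rank" holds only a slice
    of the output columns of the weight matrix and computes a partial output;
    the shards are then concatenated. A real implementation keeps only the
    local shard per process and communicates via torch.distributed instead of
    looping over shards in one process."""

    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert out_features % world_size == 0, "output dim must divide evenly"
        shard_size = out_features // world_size
        # One weight shard per simulated rank.
        self.shards = nn.ModuleList(
            nn.Linear(in_features, shard_size, bias=False)
            for _ in range(world_size)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each shard produces its slice of the output columns; concatenation
        # stands in for the all-gather a distributed implementation would do.
        return torch.cat([shard(x) for shard in self.shards], dim=-1)
```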
