Describe everything a "model fine-tuner" would, at minimum, need to know to:
1. Implement custom fine-tuning architectures that put different heads on a trunk
2. Implement custom losses
3. Implement custom datasets (e.g., batch support for context parallelism (CP) and/or sequence parallelism (SP)?)
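To make the CP part of the custom-datasets question concrete, here is a minimal sketch, assuming Megatron-style load-balanced context parallelism: each sequence is split into `2 * cp_size` equal chunks and CP rank `r` keeps chunks `r` and `2 * cp_size - 1 - r`, so every rank gets a mix of early and late positions under causal attention. `split_for_cp_rank` is a hypothetical helper written for illustration, not a Megatron API.

```python
from typing import List, Sequence


def split_for_cp_rank(tokens: Sequence[int], cp_size: int, cp_rank: int) -> List[int]:
    """Return the part of one sequence owned by a context-parallel rank.

    Hypothetical helper (not a Megatron API). Assumes the load-balanced scheme:
    split into 2 * cp_size equal chunks; rank r keeps chunk r and chunk
    2 * cp_size - 1 - r, pairing cheap early positions with expensive late ones.
    """
    n_chunks = 2 * cp_size
    if len(tokens) % n_chunks != 0:
        raise ValueError(f"sequence length {len(tokens)} must be divisible by {n_chunks}")
    if not 0 <= cp_rank < cp_size:
        raise ValueError("cp_rank must be in [0, cp_size)")
    chunk = len(tokens) // n_chunks
    first, second = cp_rank, n_chunks - 1 - cp_rank
    return list(tokens[first * chunk:(first + 1) * chunk]) + list(
        tokens[second * chunk:(second + 1) * chunk]
    )


if __name__ == "__main__":
    seq = list(range(8))
    for rank in range(2):
        print(rank, split_for_cp_rank(seq, cp_size=2, cp_rank=rank))
    # 0 [0, 1, 6, 7]
    # 1 [2, 3, 4, 5]
```

If this scheme holds, a custom dataset or collate function would need to pad sequence lengths to a multiple of `2 * cp_size`, and labels and loss masks would have to be split exactly the same way as the tokens.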
Future:
4. Everything a user would need to know to implement a custom layer that supports parallelism. This is more advanced, but we can put it on the roadmap.
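For the custom-layer item, one commonly cited building block (from the tensor-parallel design in the Megatron-LM paper) is a column-parallel linear layer: the weight matrix is split by output columns across ranks, each rank computes its slice of the output, and an all-gather concatenates the slices. The pure-Python sketch below simulates this on a single process with no real communication; `shard_columns` and `column_parallel_forward` are hypothetical names, not Megatron APIs.

```python
from typing import List

Matrix = List[List[int]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix multiply: x is [batch, in_features], w is [in_features, out_features]."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def shard_columns(w: Matrix, world_size: int) -> List[Matrix]:
    """Split w along its output (column) dimension into world_size equal shards."""
    out_features = len(w[0])
    if out_features % world_size != 0:
        raise ValueError("out_features must be divisible by world_size")
    step = out_features // world_size
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(world_size)]


def column_parallel_forward(x: Matrix, w: Matrix, world_size: int) -> Matrix:
    """Simulate a column-parallel linear layer on one process (hypothetical helper)."""
    # Each simulated rank holds one weight shard and computes its slice of the output.
    partial_outputs = [matmul(x, shard) for shard in shard_columns(w, world_size)]
    # Simulated all-gather: concatenate each rank's slice along the feature dimension.
    return [sum((part[i] for part in partial_outputs), []) for i in range(len(x))]


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, 2, 1], [0, 1, 1, 3]]
    print(column_parallel_forward(x, w, world_size=2))  # [[1, 2, 4, 7], [3, 4, 10, 15]]
    print(matmul(x, w))                                 # same result without sharding
```

A real implementation would also need the matching communication in the backward pass, and it usually pairs a column-parallel layer with a row-parallel one so the intermediate activations never have to be gathered.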
From @pstjohn:
One part I'm still shaky on: what kinds of modifications to the actual models are needed to support these different model-parallel strategies? Some mention of the fact that the underlying models need to be written specifically for Megatron might be helpful.
Are there common abstractions that allow a model to use all types of Megatron parallelism? Or do some models support only a subset of tensor / pipeline / sequence / context parallelism?
Are there any docs on how to tune the combination of these parallelism choices for maximum throughput, or is that done under the hood by Megatron?
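As a rough illustration of the tuning search space, assuming Megatron-core's convention that the data-parallel size is derived as `world_size / (TP * PP * CP)` and that sequence parallelism reuses the tensor-parallel group (so it adds no extra factor), the sketch below enumerates layouts under simplified, hypothetical constraints: TP divides the number of attention heads, PP divides the number of layers, and TP is capped at a hypothetical `max_tp` (e.g., GPUs per node). It only lists candidates; picking the fastest one would still require benchmarking.

```python
from itertools import product
from typing import List, Tuple


def divisors(n: int) -> List[int]:
    """All positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]


def parallel_layouts(
    world_size: int, num_layers: int, num_heads: int, max_tp: int = 8
) -> List[Tuple[int, int, int, int]]:
    """Enumerate (tp, pp, cp, dp) layouts whose product equals world_size.

    Simplified, hypothetical constraints for illustration only.
    """
    layouts = []
    for tp, pp, cp in product(divisors(world_size), repeat=3):
        model_parallel = tp * pp * cp
        if world_size % model_parallel != 0:
            continue  # remaining GPUs could not form whole data-parallel replicas
        if tp > max_tp or num_heads % tp != 0:
            continue  # attention heads must split evenly across TP ranks
        if num_layers % pp != 0:
            continue  # assume an even split of layers across pipeline stages
        dp = world_size // model_parallel  # data parallelism fills the rest
        layouts.append((tp, pp, cp, dp))
    return layouts


if __name__ == "__main__":
    for tp, pp, cp, dp in parallel_layouts(world_size=8, num_layers=12, num_heads=16):
        print(f"TP={tp} PP={pp} CP={cp} DP={dp}")
```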
Originally posted by @pstjohn in #153