How to manage submodule device manually? #15740
Replies: 4 comments
-
Bumping this once because I'm still hoping there might be a good, framework-friendly way to do this before I have to implement some sort of Frankenstein solution.
-
I'm also interested in this. I'm able to set the devices during the call to the LightningModule's `training_step`, so this bypasses the transfers Lightning does at the beginning of `fit`. However, I'm running into issues when I try to resume from a checkpoint, because my optimizer apparently keeps its internal state on the CPU while my (first) training step moves the model to my two GPUs. I'm unsure what to do at the moment. Can anything elegant/"proper" be done here?
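A minimal sketch of one common workaround pattern for the optimizer-state problem described above, not something taken from this thread: it assumes a 1.x-era Trainer where `self.trainer.optimizers` holds the raw torch optimizers, and the hook choice would have to match wherever the submodule actually gets moved.

```python
import torch

def on_train_start(self):
    # Hypothetical fix-up after a checkpoint restore: push every optimizer
    # state tensor (e.g. Adam's exp_avg buffers) onto the same device as the
    # parameter it belongs to.
    for optimizer in self.trainer.optimizers:
        for param, state in optimizer.state.items():
            for key, value in state.items():
                if torch.is_tensor(value):
                    state[key] = value.to(param.device)
```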
-
I believe model parallelism is all you need. To use Sharded Training, you first need to install FairScale using the command below.
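A sketch of what that setup typically looked like in 1.x-era PyTorch Lightning; the install command and the Trainer flags here are assumptions, not quoted from this reply:

```python
# Assumed install step:
#   pip install fairscale
import pytorch_lightning as pl

# Sharded training shards optimizer state and gradients across GPUs; it is
# enabled through the Trainer strategy rather than by moving submodules by hand.
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp_sharded")
```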
-
Final bump before I go and implement this myself - it would be really cool if Lightning supported this.

Edit: For anyone else who wants to do something like this, here's how I did it for myself, by overriding `on_fit_start`:

```python
def on_fit_start(self):
    device_count = torch.cuda.device_count()
    if self.device.type == 'cuda' and device_count >= 2:
        print("Found multiple GPUs, using separate device for definition encoder")
        def_encoder_device = (self.device.index + 1) % device_count
        self.bi_encoder.definition_encoder.cuda(def_encoder_device)
```

The only other thing to be careful of is that you have to set the trainer to use only a single CUDA device (i.e. set `devices=[0]`) in order to prevent it from automatically trying to shard the model.
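For completeness, a hedged sketch of the Trainer setup this implies; the `devices=[0]` detail comes from the comment above, while the module name and the rest of the configuration are assumptions:

```python
import pytorch_lightning as pl

model = BiEncoderLightningModule()  # hypothetical module containing both encoders

# Pin the Trainer to a single CUDA device so Lightning only moves the module to
# cuda:0; on_fit_start() above then relocates the definition encoder to another
# GPU by hand.
trainer = pl.Trainer(accelerator="gpu", devices=[0])
trainer.fit(model)
```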
-
I have a PyTorch Lightning module that has two big transformer encoders, which make completely separate forward passes; the outputs are then combined to produce a final result - a standard bi-encoder. I'd like to move just one of those encoders onto a separate GPU during training so that I can train with bigger encoders. Given the pseudocode below, I'm wondering what the best way is to move just `encoder2` onto a separate device.

I looked for related discussions, but the closest I could find was this question, which talks mostly about using a DeepSpeed integration to solve the problem. I'm really hoping for a simpler approach that just lets me place one encoder on a separate GPU, and I'm fine with having to write code in my `forward()` to make sure the input/output tensors get moved to the proper devices.

Obviously calling `encoder2.to(device)` will move the encoder, but I know that Lightning moves the whole module automatically at some point during training, and I'd rather not end up with an implementation that has me working against the Lightning trainer to force `encoder2` onto a separate device or to move it back and forth.

In short, is there a PyTorch Lightning-approved way to have submodules on a separate device? I looked around for a specific callback that deals with device management, but couldn't find one.
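A rough sketch of the kind of module described above, not the asker's original pseudocode: the class name, the combination step, and the batch layout are all assumed, with `encoder2` kept as named in the post, and `forward()` showing the kind of manual tensor movement mentioned there (assuming `encoder2` has already been placed on a second GPU).

```python
import torch
import pytorch_lightning as pl

class BiEncoderModule(pl.LightningModule):
    def __init__(self, encoder1, encoder2):
        super().__init__()
        self.encoder1 = encoder1  # stays on the Trainer-managed device
        self.encoder2 = encoder2  # meant to live on a second GPU

    def forward(self, batch1, batch2):
        # encoder1 runs on whatever device Lightning moved the module to.
        out1 = self.encoder1(batch1)

        # Run encoder2 on its own device, then bring the result back so the
        # two encodings can be combined.
        enc2_device = next(self.encoder2.parameters()).device
        out2 = self.encoder2(batch2.to(enc2_device)).to(out1.device)

        # Combine the two encodings into the final score.
        return torch.sum(out1 * out2, dim=-1)
```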