Question about the SubTransformers sampling process. #17

Kevinpsk · 2023-11-16T14:56:02Z

Hi,

Thanks a lot for releasing this great project.
I have a question about the SubTransformer sampling process in a distributed training environment. I see that you sample a random SubTransformer before each train step with the call below. In the multi-GPU scenario, does every GPU get the same random SubTransformer, or does each one get a different random subnetwork? Would reset_rand_seed force all GPUs to sample the same random SubTransformer from the SuperNet? And does trainer.get_num_updates() return the same value on all GPUs at each train step?

configs = [utils.sample_configs(utils.get_all_choices(args), reset_rand_seed=True, rand_seed=trainer.get_num_updates(), super_decoder_num_layer=args.decoder_layers)]
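To make the question concrete, here is a minimal sketch of the behaviour I am asking about. It is not the repository's code: `CHOICES`, `sample_config`, and `simulate_workers` are hypothetical stand-ins, and I am assuming the sampler draws from Python's global `random` module and that every worker sees the same update count. Under those assumptions, reseeding with the shared step number right before sampling would give every worker the same SubTransformer, whatever RNG state it had before.

```python
import random

# Hypothetical search space; the real one would come from utils.get_all_choices(args).
CHOICES = {
    "encoder_embed_dim": [512, 640],
    "decoder_embed_dim": [512, 640],
    "decoder_layer_num": [1, 2, 3, 4, 5, 6],
}


def sample_config(choices, reset_rand_seed=False, rand_seed=0):
    """Stand-in for utils.sample_configs: pick one option per dimension."""
    if reset_rand_seed:
        # Reseeding discards whatever RNG state the worker had before.
        random.seed(rand_seed)
    return {name: random.choice(options) for name, options in choices.items()}


def simulate_workers(num_workers, num_updates):
    """Simulate several workers sampling at the same training step."""
    configs = []
    for rank in range(num_workers):
        # Give each simulated worker a different prior RNG state.
        random.seed(1000 + rank)
        configs.append(
            sample_config(CHOICES, reset_rand_seed=True, rand_seed=num_updates)
        )
    return configs


if __name__ == "__main__":
    for rank, config in enumerate(simulate_workers(num_workers=4, num_updates=123)):
        print(rank, config)
```

If the update count differed between workers, the seeds would differ too, and the sampled configurations could diverge, which is why I am asking about trainer.get_num_updates().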

Thanks a lot for your help.
