Hi @ClementRomac,
This is not a real issue but a question; if you enable the Discussions section of the repo, it would fit better there.

I'm wondering what the optimal configurations are for running the experiments. I tried running `train_language_agent.py` with almost the same configuration as in `experiments/configs/multi-node_slurm_cluster_config.yaml` and `experiments/campaign/Mixed_training/GFlan-T5_large.slurm` on 8x A100 80GB GPUs, but it was slow (about 2 frames per second). When I try to adjust the configuration to improve speed, for example by increasing the mini-batch size, I run into CUDA out-of-memory errors or errors about all-NaN tensors (vanishing gradients, I guess?). So I would appreciate any hints on which configuration, on which hardware, yields results similar to the paper, and at what speed (frames per second).