Bringing together results on ddp on a single machine #5886
Unanswered
Arij-Aladel asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
I have the same problem as mentioned in 1 and 2, where the solution was to use dist.all_gather() inside validation_epoch_end. The difference is that each validation_step in my case returns a dictionary:
{"val_loss": float_loss, "batch_length": int_len, "preds": text, "answer": text, "doc": text}
Is there a way to collect the text results from all processes?
I can collect the loss using dist.all_gather(), but what about the text results? Any suggestions?
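One possible approach, sketched here rather than confirmed for this setup: dist.all_gather() only handles tensors, but torch.distributed.all_gather_object() (available in PyTorch 1.8 and later) gathers arbitrary picklable Python objects, which covers strings and dictionaries like the one above. The helper name gather_step_outputs is hypothetical, and the sketch assumes val_loss is already a plain Python float, as in the dictionary shown, so that no GPU tensors get pickled.

```python
from typing import Any, Dict, List

try:
    import torch.distributed as dist
except ImportError:  # lets the sketch be tried on a machine without PyTorch
    dist = None


def gather_step_outputs(outputs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the validation_step outputs of every rank as one flat list.

    Hypothetical helper: `outputs` is the list of per-batch dictionaries
    that this rank received in validation_epoch_end.
    """
    if dist is None or not dist.is_available() or not dist.is_initialized():
        # Single-process run: there is nothing to gather.
        return list(outputs)

    world_size = dist.get_world_size()
    gathered: List[Any] = [None] * world_size
    # Each rank pickles its own `outputs`; afterwards every rank holds
    # the outputs of all ranks, stored in rank order.
    dist.all_gather_object(gathered, outputs)
    # Flatten [rank0_outputs, rank1_outputs, ...] into a single list.
    return [item for rank_outputs in gathered for item in rank_outputs]
```

Inside validation_epoch_end(self, outputs) this would be called as all_outputs = gather_step_outputs(outputs), after which every process sees the "preds", "answer" and "doc" strings from all GPUs. Two caveats worth checking: with the NCCL backend each process needs its CUDA device set before the call, and a distributed sampler may pad the dataset with repeated samples, so the gathered list can contain duplicates.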