I am running the GLUE fine-tuning code for MosaicBERT on Lambda Cloud 8xA100-40 GPUs, and I get the following error in the results for loop in run_jobs_parallel:
RuntimeError: The server socket has failed to listen on any local network address. useIpv6: 0, code: -98, name: EADDRINUSE, message: address already in use
I am not familiar with multiprocessing, but I figure that with a batch size of 32, running serially is much slower than running in parallel would be. Please let me know how I should go about debugging or solving this issue.
Hey @naston have you resolved this issue? Depending on your use case, you might find this new repo quite useful (which modernizes the MosaicBERT examples/stack here with lots of nice new features): https://github.com/AnswerDotAI/ModernBERT
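On the EADDRINUSE error itself: it means some process tried to listen on a port that was already bound. One possible cause (an assumption, not something confirmed from the traceback above) is that several parallel jobs try to open their distributed rendezvous socket on the same port. Below is a minimal debugging sketch using only the Python standard library. The helper names `find_free_port` and `port_in_use` are hypothetical and not part of the MosaicBERT code, and whether the GLUE script reads `MASTER_PORT` from the environment is also an assumption.

```python
import os
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 lets the OS choose any free port.
        s.bind((host, 0))
        return s.getsockname()[1]


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding to (host, port) fails, i.e. the port is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
        return False


if __name__ == "__main__":
    # Assumption: the job picks up its rendezvous port from MASTER_PORT,
    # as torch.distributed does with env:// initialization.
    current = os.environ.get("MASTER_PORT")
    if current is not None:
        print(f"MASTER_PORT={current}, in use: {port_in_use(int(current))}")
    print(f"A free port right now: {find_free_port()}")
```

If the check shows the configured port is already taken, giving each parallel job its own port (or making sure no stale process from an earlier run is still holding it) would be the next thing to try. Note that a port reported as free can still be grabbed by another process before your job binds it, so this is a diagnostic aid rather than a guarantee.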