Parallel Finetuning Socket Error #458

Open
naston opened this issue Jan 7, 2025 · 2 comments

Comments


naston commented Jan 7, 2025

I am running the GLUE finetuning code for MosaicBERT on a Lambda Cloud 8xA100-40 instance, and I get the following error in the for loop over results in run_jobs_parallel:

RuntimeError: The server socket has failed to listen on any local network address. useIpv6: 0, code: -98, name: EADDRINUSE, message: address already in use

I am not familiar with multiprocessing, but I figure that with a batch size of 32, running the jobs serially is much slower than running them in parallel would be. Please let me know how I should go about debugging or solving this issue.
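
For context, EADDRINUSE means that a process tried to bind a TCP port that another socket already holds. One plausible cause here, which is an assumption and not something confirmed from run_jobs_parallel itself, is that several parallel jobs try to listen on the same rendezvous port. The sketch below uses only the Python standard library to check whether a port is taken and to ask the OS for a free one; the helper names `find_free_port` and `port_in_use` are hypothetical and are not part of the MosaicBERT code.

```python
import errno
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 lets the OS pick any free port
        return s.getsockname()[1]


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # any other error is unexpected, so surface it
        return False


if __name__ == "__main__":
    port = find_free_port()
    print(f"free port: {port}, in use: {port_in_use(port)}")
```

If the jobs do rendezvous over a fixed port, giving each job its own port from something like `find_free_port()` would be one way to test the hypothesis. Note that the port is released before the caller uses it, so another process could still grab it in between; this is a debugging aid, not a guarantee.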

@jacobfulano
Contributor

Hey @naston, have you resolved this issue? Depending on your use case, you might find this new repo quite useful. It modernizes the MosaicBERT examples/stack here with lots of nice new features: https://github.com/AnswerDotAI/ModernBERT

Author

naston commented Jan 24, 2025

I have been tracking that project. Looking at the finetuning code, I don't really see anything that jumps out as different, but I can look again.
