Implement distributed training using Horovod #1865

Open
NanoNabla wants to merge 2 commits into main
Conversation

NanoNabla
Contributor

I implemented distributed training using Horovod, similar to the implementation that already made it into DeepSpeech.
I opened discussion #1849 a while ago to ask whether you want this feature, but there has been no answer yet.

I tried to keep the changes as minimal as possible, so it is still possible to run the undistributed version of the code. I also noticed a slight performance improvement when using Horovod on one of our IBM machines with 6 V100 cards.

I didn't add any CI because I don't have any knowledge of it.

If you need any help with Horovod, don't hesitate to ask.
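For reference, here is a minimal sketch of the usual Horovod integration pattern for a TensorFlow 1.x-style training script (the API family DeepSpeech used); the learning rate, checkpoint path, and launch command below are illustrative assumptions, not code from this PR:

```python
import tensorflow.compat.v1 as tf
import horovod.tensorflow as hvd

# Initialize Horovod; one process is launched per GPU.
hvd.init()

# Pin each process to a single GPU based on its local rank.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

# Scale the learning rate by the worker count (a common Horovod convention).
opt = tf.train.AdamOptimizer(learning_rate=0.001 * hvd.size())

# Wrap the optimizer so gradients are averaged across workers via allreduce.
opt = hvd.DistributedOptimizer(opt)

hooks = [
    # Broadcast initial variable states from rank 0 so all workers
    # start from the same weights.
    hvd.BroadcastGlobalVariablesHook(0),
]

# Let only rank 0 write checkpoints, to avoid concurrent writes.
checkpoint_dir = './checkpoints' if hvd.rank() == 0 else None
```

Such a script would typically be launched with something like `horovodrun -np 6 python train.py` to use all six GPUs on one machine.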

@CLAassistant

CLAassistant commented Aug 3, 2021

CLA assistant check
All committers have signed the CLA.

@NanoNabla
Contributor Author

My PR seems to have gone unnoticed since I opened it in May.
Are you interested in parallel training as in DeepSpeech?

If you are, I would try to get my PR into a mergeable state again. Otherwise, feel free to close it.
