Docker shared memory issue and solution #369
Comments
Huh, nope, never seen this -- and I've done a little bit of work with PyTorch in Docker, too... Would adding that patch command to the docker entrypoint (or build script) help?
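If we went that route, a minimal sketch of what the entrypoint change might look like (the script name, the 8G size, and the privilege requirements are assumptions, not something confirmed in this thread):
#!/bin/bash
# entrypoint.sh (hypothetical): enlarge /dev/shm before handing off to the real command.
# Remounting needs sufficient privileges (e.g. a --privileged container or CAP_SYS_ADMIN).
set -e
mount -o remount,size=8G /dev/shm   # 8G is an assumed size
exec "$@"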
Yes, that would work, but first I would like to ascertain whether anybody else has this issue. I've done a lot of work with PyTorch in Docker before but haven't had this, so I would like to understand what's different. It's easy to test your own Docker setup; just run:
df -h | grep shm
Yeah, 64m here.
@d354535de71e:~/spartan$ df -h | grep shm
shm 64M 0 64M 0% /dev/shm
Interesting!
Greg what if you do it in a Docker container you've used with PyTorch?
That will be a little harder to revive on a phone; I'll get back to you!
Why not use: docker run --shm-size 8G
Yeah, I tried that and for some reason it didn't work for me. I think maybe just the docker run string wasn't formatted correctly. I'll report back if I fix it.
Yeah, I have it inside my spartan container as well:
df -h | grep shm
shm 64M 0 64M 0% /dev/shm
but inside the pdc container I have 31G:
df -h | grep shm
tmpfs 32G 882M 31G 3% /dev/shm
So we must have something different between the pdc and spartan docker containers that is causing this.
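One way to pin down what differs (a sketch; the container names below are placeholders, not the actual spartan/pdc container names) is to ask Docker what shm size and IPC mode each container was started with:
# Compare the shared-memory settings Docker recorded for each container
docker inspect --format '{{.HostConfig.ShmSize}} {{.HostConfig.IpcMode}}' spartan_container
docker inspect --format '{{.HostConfig.ShmSize}} {{.HostConfig.IpcMode}}' pdc_container
# ShmSize 0 means the 64MB default; a container started with --shm-size reports
# the byte count, and one started with --ipc=host reports IpcMode "host".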
Thanks for checking. Yeah, I think it won't be hard to switch to sharing all/more memory, like the command @patmarion mentioned. I am curious to learn whether this has been affecting any robot software in general.
Resolved by either passing --ipc=host or --shm-size to docker run. I did have the arg in the wrong spot in the docker run string.
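For reference, a sketch of where the argument has to sit (the image and command names are placeholders): docker run options must come before the image name, otherwise they are passed as arguments to the container's command instead of to docker.
# Works: the flag precedes the image name
docker run --shm-size=8G my_spartan_image /bin/bash
docker run --ipc=host my_spartan_image /bin/bash
# Does not work: anything after the image name is handed to /bin/bash, not docker
# docker run my_spartan_image /bin/bash --shm-size=8G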
Looked at it with @manuelli this morning. We might just want to add one of those flags to our docker run command by default.
@peteflorence If both solutions work for you, is this resolved?
Both solutions worked for me (though in a separate container that runs PyTorch). Is the root cause still unknown? Otherwise, perhaps this issue is resolved.
Is there a way to override the path used by PyTorch multiprocessing (/dev/shm)? Unfortunately, increasing shared memory is not possible for me.
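This thread doesn't answer that directly, but a commonly suggested workaround when /dev/shm cannot be enlarged is to switch PyTorch's multiprocessing sharing strategy so tensors are shared through ordinary files instead of shared memory (whether the file_system strategy suits a given workload is an assumption; it can leave stale files behind if workers crash):
import torch.multiprocessing as mp

# The default "file_descriptor" strategy backs shared tensors with /dev/shm.
# "file_system" shares them via temp files on disk instead, sidestepping the
# small shm mount at the cost of filesystem I/O.
print(mp.get_all_sharing_strategies())
mp.set_sharing_strategy('file_system')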
I am not sure if this is happening in our various other configurations, but it was happening in my spartan Docker container, inside which I put PyTorch and was trying to do some training.
Symptom
I was getting an error along the lines of "Bus error (core dumped) model share memory". It's related to this issue: pytorch/pytorch#2244
Cause
The comments by apaszke (a PyTorch author) are helpful here (pytorch/pytorch#1355 (comment)). Running inside the Docker container, it appears the only available shared memory is 64 MB.
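Running df inside the container (output as reported earlier in this thread) shows the limit:
df -h | grep shm
shm 64M 0 64M 0% /dev/shm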
Temp Solution
As mentioned by apaszke, the temporary fix is to remount /dev/shm with a larger size (choose more than 8G if you'd like). This fixes it, as the same df check then reports the larger size.
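A minimal sketch of that kind of remount (the exact size and invocation here are assumptions, not quoted from apaszke's comment):
# Inside the running container; needs sufficient privileges to remount
mount -o remount,size=8G /dev/shm
df -h | grep shm   # should now report the larger size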
Other notes
Some places on the internet you will find that --ipc=host is supposed to avoid this issue, as can other flags to the docker run process, but those didn't work for me, and they involve re-opening the container. I suspect something about my configuration is wrong. The fix above works even while inside the container.
Long term solution
It would first be useful to identify whether anybody else's Docker containers have this issue, which can be evaluated simply by running
df -h | grep shm
inside the container. Then we could diagnose who it is happening to and why. It might just be me.