Numerical instability in Google Colab - Part 4 of Makemore #13
Comments
oh oh
I'm guessing it has something to do with the Python versions?
Yes. I see the issue in Colab, but not in a local VS Code Jupyter notebook. If the difference is small enough, maybe it is fine to accept it within some tolerance? Or maybe the issue is the PyTorch version?
I used t.grad.sum() and dt.sum() to compare the sums between Colab and the local notebook. I posted about it on the PyTorch forum but got no answer: https://discuss.pytorch.org/t/numerical-instability-in-google-colab/163610
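For reference, this is roughly the kind of check being described: compare a manually derived gradient `dt` against autograd's `t.grad` by their sums, an exact equality test, and the largest elementwise deviation. The loss and tensor shapes below are stand-ins, not the notebook's actual values:

```python
import torch

# Minimal sketch of the comparison described above. In the real notebook,
# `dt` comes from the manual backward pass; here it is derived by hand for
# a toy loss so the snippet runs on its own.
torch.manual_seed(42)
t = torch.randn(32, 64, requires_grad=True)
loss = (t.tanh() ** 2).sum()
loss.backward()
dt = 2.0 * t.tanh() * (1.0 - t.tanh() ** 2)  # manual gradient of the same loss

print("sum of autograd grad:", t.grad.sum().item())
print("sum of manual grad:  ", dt.sum().item())
print("exact match:", torch.equal(dt, t.grad))
print("max abs diff:", (dt - t.grad).abs().max().item())
```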
I am getting exactly the same issue.
I've got a strange observation (using the Colab version).

In this example, if the shapes of the gradients are not equal and the comparison is made after broadcasting (I guess), there is a residual difference; otherwise the values match exactly. This likely has to do with the accuracy limitations of floating point operations: the values are float32, and 10^-8 is close to the precision limit of float32 arithmetic. I've made a PR for the cmp function to output a comparison of shapes, which could be useful: #36

Another thing that may matter is the order of the arithmetic operations. Addition and multiplication of floats are not associative: https://pytorch.org/docs/stable/notes/numerical_accuracy.html The docs also say that results may be inconsistent across devices and across software versions.
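A small self-contained demonstration of the non-associativity point (not from the notebook): summing the same float32 numbers in a different order changes the result by an amount comparable to the ~1e-8 residuals reported above.

```python
import torch

# Float32 addition is not associative: the same numbers summed in a
# different order can give a slightly different result.
torch.manual_seed(0)
x = torch.randn(10_000, dtype=torch.float32)

forward_sum = x.sum()                          # PyTorch's reduction order
reverse_sum = torch.zeros((), dtype=torch.float32)
for v in x.flip(0):                            # accumulate in reverse order
    reverse_sum += v

print("mathematically equal, different in float32:")
print((forward_sum - reverse_sum).abs().item())
```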
I had the same gradient-difference problem when running locally, because I was using the GPU to store tensors and perform the computations. After switching to the CPU, I still saw differences in the later computations because of the ordering of operations. I managed to get exact gradients by running on the CPU and reordering my computations to match the lecture.
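To illustrate why reordering matters, here is a hypothetical example (the tensor shapes and the second formula are made up for illustration): two algebraically identical ways of writing the same gradient expression can disagree in the last bits of float32, so a bitwise-exact comparison only passes if the ordering matches the lecture's.

```python
import torch

# Two mathematically equivalent orderings of the same expression.
torch.manual_seed(1)
h = torch.randn(32, 200)
dh = torch.randn(32, 200)

dhpreact_a = (1.0 - h**2) * dh     # one ordering of the tanh backward
dhpreact_b = dh - dh * h**2        # algebraically identical, different ordering

print("exact match:", torch.equal(dhpreact_a, dhpreact_b))
print("max abs diff:", (dhpreact_a - dhpreact_b).abs().max().item())
```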
I encountered the same issue on a Linux machine with CPU. Setting an environment variable resolved the problem.
To fix the issue in the notebook, set the variable at the very beginning of the notebook, before importing PyTorch (see the sketch below).
However, this solution does not address issues with Nvidia GPUs, which remain affected.
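The comment's original code cell is not reproduced in this thread, so the exact variable is unknown. The sketch below assumes it was ATEN_CPU_CAPABILITY, which pins PyTorch's CPU kernels to the baseline (non-AVX) code path and is one common way to remove instruction-set-dependent differences:

```python
# Hypothetical reconstruction: the exact variable from the comment above is
# not shown in the thread. ATEN_CPU_CAPABILITY=default forces PyTorch's CPU
# dispatch onto the generic code path, removing one source of
# machine-dependent floating point differences. It must be set before torch
# is imported for the first time.
import os
os.environ["ATEN_CPU_CAPABILITY"] = "default"

import torch  # import only after the environment variable is set
```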
Fixes karpathy#13 and karpathy#45, where `dhpreact` was not exactly matching `hpreact.grad`.
I ran into an interesting issue in makemore 4 (backprop ninja) where `dhpreact` was not exactly matching `hpreact.grad`.
However, this happened only in the Colab notebook; when I put the same code into a local Jupyter notebook, it worked fine.
Not sure why this would be the case, but it's an odd curiosity.
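For context, a sketch in the spirit of the notebook's cmp helper, extended with the shape comparison proposed in #36 (the exact code of that PR is not reproduced here). `dt` is the manually derived gradient and `t` is the tensor tracked by autograd:

```python
import torch

def cmp(s, dt, t):
    ex = torch.all(dt == t.grad).item()          # bitwise-exact match
    app = torch.allclose(dt, t.grad)             # match within float tolerance
    maxdiff = (dt - t.grad).abs().max().item()   # largest elementwise deviation
    shapes_ok = dt.shape == t.grad.shape         # catch silent broadcasting
    print(f'{s:15s} | exact: {str(ex):5s} | approximate: {str(app):5s} '
          f'| maxdiff: {maxdiff} | shapes match: {shapes_ok}')
```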