Numerical instability in Google Colab - Part 4 of Makemore #13

Open
sachag678 opened this issue Oct 13, 2022 · 8 comments · May be fixed by #67

Comments

@sachag678

I ran into an interesting issue in makemore part 4 (backprop ninja), where dhpreact was not exactly matching hpreact.grad.

However, this only happens in the Colab notebook; when I put the same code into a local Jupyter notebook it matches exactly.

Not sure why this would be the case, but it's an odd curiosity.
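For context, part 4 of the lecture checks each manually derived gradient against PyTorch's autograd with a small comparison helper; a minimal sketch of that kind of check (paraphrased, not copied verbatim from the notebook):

import torch

def cmp(name, dt, t):
    # exact: every element of the manual gradient matches autograd bit-for-bit
    exact = torch.all(dt == t.grad).item()
    # approximate: matches up to floating-point tolerance
    approx = torch.allclose(dt, t.grad)
    # largest absolute deviation between manual and autograd gradients
    maxdiff = (dt - t.grad).abs().max().item()
    print(f'{name:15s} | exact: {exact} | approximate: {approx} | maxdiff: {maxdiff}')

# e.g. cmp('hpreact', dhpreact, hpreact) -- the call that shows the mismatch here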

@karpathy
Owner

oh oh

@sachag678
Author

I'm guessing it has something to do with the Python versions?

@JonathanSum

JonathanSum commented Oct 14, 2022

Yes, I have the issue with Colab, but I don't have it with the local VS Code Jupyter notebook.
The local Jupyter notebook Python version is 3.7.13.
The tested Colab notebook version is 3.7.14 (default, Sep 8 2022, 00:06:44) [GCC 7.5.0].

If the diff is that small, maybe it is fine to accept it with some tolerance?
Colab tested notebook: https://colab.research.google.com/drive/1HmZ8bgtAfvyMaZyu3Sr1Bgxsj35jitTs?usp=sharing

Maybe the issue is the PyTorch version?
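One way to "accept" a tiny difference like this is to compare with a tolerance instead of exact equality; a minimal sketch (the rtol/atol values below are torch.allclose's defaults, shown only for illustration):

import torch

# treat the manual gradient as correct if it matches autograd within float32 noise
ok = torch.allclose(dhpreact, hpreact.grad, rtol=1e-5, atol=1e-8)
print(ok, (dhpreact - hpreact.grad).abs().max().item())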

@JonathanSum

JonathanSum commented Oct 14, 2022

I used t.grad.sum() and dt.sum() to compare the sums between Colab and the local notebook.
colab.txt
local.txt

I posted it on the PyTorch forum and got no answer: https://discuss.pytorch.org/t/numerical-instability-in-google-colab/163610
I am planning to post it on the Colab GitHub issues.
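A minimal sketch of how such a dump could be produced for diffing between environments (params and grads are hypothetical names for a dict of tensors with populated .grad and a dict of the manually derived gradients, keyed the same way):

import torch

def dump_sums(params, grads, path):
    # one line per tensor so the Colab and local files can be diffed directly
    with open(path, 'w') as f:
        for name, t in params.items():
            f.write(f'{name}: t.grad.sum()={t.grad.sum().item():.10f} '
                    f'dt.sum()={grads[name].sum().item():.10f}\n')

# dump_sums(params, grads, 'colab.txt')  # run once in Colab, once locally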

@mriganktiwari

(Quoting @JonathanSum's comment above.)

I am getting exactly the same maxdiff for hpreact, and my notebook is running on a local machine with Python 3.9.13 and torch.__version__ == '1.12.1'.

@evgenyfadeev

evgenyfadeev commented Apr 29, 2023

I've got a strange observation (using the Colab version):

dlogit_maxes = -dnorm_logits.sum(dim=1, keepdim=True) gives me exact equality.
dlogit_maxes = -dnorm_logits.sum(dim=1) gives approximate equality with a maxdiff of ~1e-8.

In this example, when the shapes of the gradients are not equal but the comparison is made after broadcasting (I guess), there is a residual difference; otherwise the values match exactly. This might have to do with the accuracy limitations of floating-point operations: the values here are float32, and 1e-8 is close to the precision limit for float32 operations.
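A minimal sketch of the shape difference behind the two variants (the (32, 27) shape is only illustrative, standing in for makemore's (batch, vocab) logits):

import torch

dnorm_logits = torch.randn(32, 27)

# keepdim=True keeps shape (32, 1), the same shape as logit_maxes.grad
dlogit_maxes_keep = -dnorm_logits.sum(dim=1, keepdim=True)

# without keepdim the result has shape (32,), so any elementwise comparison
# against the (32, 1) autograd gradient involves broadcasting
dlogit_maxes_flat = -dnorm_logits.sum(dim=1)

print(dlogit_maxes_keep.shape, dlogit_maxes_flat.shape)  # torch.Size([32, 1]) torch.Size([32])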

I've made a PR for the cmp function to output a comparison of shapes, which could be useful: #36

Another thing is that maybe the order of the arithmetic operations matters. Apparently, addition and multiplication of floats are not associative: https://pytorch.org/docs/stable/notes/numerical_accuracy.html

The docs also say that results may be inconsistent across devices and software commits.
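A tiny illustration of that non-associativity in float32 (the magnitudes are chosen only to make the effect obvious):

import torch

a = torch.tensor(1e8, dtype=torch.float32)
b = torch.tensor(-1e8, dtype=torch.float32)
c = torch.tensor(1.0, dtype=torch.float32)

print((a + b) + c)  # tensor(1.)
print(a + (b + c))  # tensor(0.) -- the 1.0 is absorbed when added to -1e8 first

So two mathematically identical gradient formulas can differ in the last bits simply because the kernels sum terms in a different order.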

@vdyma

vdyma commented Apr 25, 2024

I had the same mismatch between gradients when running locally, because I was storing tensors and doing the computations on the GPU. Once I switched to the CPU, I still had differences in the later computations because of the ordering of operations. I managed to get exact gradient matches by running on the CPU and reordering my computations to be the same as in the lecture.

@conscell

I encountered the same issue on a Linux machine with CPU. Setting the following environment variable resolved the problem:

ATEN_CPU_CAPABILITY=default

To fix the issue in the notebook, add these lines at the very beginning of the notebook, before importing PyTorch:

import os
os.environ['ATEN_CPU_CAPABILITY'] = 'default'

However, this solution does not address issues with Nvidia GPUs, which remain affected.
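A quick way to sanity-check that the setting was picked up before PyTorch was imported (the CPU capability line in torch.__config__.show() appears on recent builds, though its exact wording can vary by version):

import os
os.environ['ATEN_CPU_CAPABILITY'] = 'default'  # must be set before torch is imported

import torch
print(os.environ.get('ATEN_CPU_CAPABILITY'))  # 'default'
print(torch.__config__.show())  # build/dispatch info; look for the CPU capability line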

conscell added a commit to conscell/nn-zero-to-hero that referenced this issue Dec 27, 2024
Fixes karpathy#13, karpathy#45, where the `dhpreact` was not exactly matching `hpreact.grad`.
@conscell conscell linked a pull request Dec 27, 2024 that will close this issue

7 participants