LayerNorm unit test fails in NGC Docker 24.10 environment #2
Ah yes, I changed the LayerNorm kernel a few weeks ago. I might have to check the test again. Thank you so much for reporting this!
Hey, I was able to fix it. You can now run the test and it should pass:

```
============================= test session starts ==============================
platform linux -- Python 3.10.14, pytest-8.3.3, pluggy-1.5.0
plugins: time-machine-2.14.1, typeguard-4.3.0, anyio-4.4.0
collected 16 items

test_layernorm.py ................                                       [100%]

============================== 16 passed in 8.63s ==============================
```
Thank you for your prompt response. I have tested the fix, but unfortunately I encountered a bug in the test code after the recent commit. The issue occurs in the following lines, where the assertion always passes even though it shouldn't. After I fixed this, the test still fails. Additionally, I believe it would be beneficial to add a test case that explicitly compares the gradients of weight and bias between the Triton-based TritonLayerNorm and PyTorch's torch.nn.LayerNorm. This would ensure that the gradient calculations are consistent across both implementations. Thanks again for your support.
I see, ok. I will try to add a test case that explicitly compares the gradients of weight and bias between the Triton-based TritonLayerNorm and PyTorch's torch.nn.LayerNorm, and I'll check my implementation again.
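For reference, the requested check can be sketched in plain NumPy. TritonLayerNorm and a GPU are not available outside the repo, so a NumPy LayerNorm stands in here; the structure is the same as comparing TritonLayerNorm against torch.nn.LayerNorm, except the reference gradients come from central finite differences instead of PyTorch autograd. All function names below are illustrative, not from the repo.

```python
import numpy as np

def layernorm(x, w, b, eps=1e-5):
    """Row-wise layer norm over the last dimension: y = w * x_hat + b."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return w * x_hat + b, x_hat

def analytic_param_grads(grad_out, x_hat):
    """Closed-form gradients of loss = sum(grad_out * y) w.r.t. weight and bias."""
    dw = (grad_out * x_hat).sum(axis=0)  # x_hat does not depend on w or b
    db = grad_out.sum(axis=0)
    return dw, db

def numerical_param_grads(x, w, b, grad_out, h=1e-5):
    """Central finite differences on the same scalar loss, as a reference."""
    def loss(w_, b_):
        y, _ = layernorm(x, w_, b_)
        return (grad_out * y).sum()
    dw = np.zeros_like(w)
    db = np.zeros_like(b)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        dw[i] = (loss(w + e, b) - loss(w - e, b)) / (2 * h)
        db[i] = (loss(w, b + e) - loss(w, b - e)) / (2 * h)
    return dw, db

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
w = rng.standard_normal(16)
b = rng.standard_normal(16)
g = rng.standard_normal((8, 16))  # fixed upstream gradient

_, x_hat = layernorm(x, w, b)
dw_a, db_a = analytic_param_grads(g, x_hat)
dw_n, db_n = numerical_param_grads(x, w, b, g)

assert np.allclose(dw_a, dw_n, atol=1e-4), "LayerNorm weight gradients don't match!"
assert np.allclose(db_a, db_n, atol=1e-4), "LayerNorm bias gradients don't match!"
```

In the actual test, `dw_n`/`db_n` would instead be `weight.grad`/`bias.grad` from a torch.nn.LayerNorm receiving the same input and upstream gradient; note the assertion must compare tensors from the two different modules, otherwise it trivially passes.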
Looking forward to your updates, and thank you very much for open-sourcing this project.
I am encountering an issue where the LayerNorm unit tests fail during execution in the NGC Docker 24.10 environment. Specifically, the gradient matching between the Triton-based TritonLayerNorm and the standard PyTorch torch.nn.LayerNorm is not passing. It seems that the gradients for the weight and bias parameters in the custom Triton-based LayerNorm implementation are not being calculated properly. The assertion error message is:
```
tests/test_layernorm.py:69: AssertionError
=========================== short test summary info ============================
FAILED tests/test_layernorm.py::TestLayerNorm::test_backward_match[1-128-256] - AssertionError: LayerNorm weight gradients don't match!
FAILED tests/test_layernorm.py::TestLayerNorm::test_backward_match[8-512-1024] - AssertionError: LayerNorm weight gradients don't match!
FAILED tests/test_layernorm.py::TestLayerNorm::test_backward_match[16-256-512] - AssertionError: LayerNorm weight gradients don't match!
FAILED tests/test_layernorm.py::TestLayerNorm::test_backward_match[4-1024-768] - AssertionError: LayerNorm weight gradients don't match!
FAILED tests/test_layernorm.py::TestLayerNorm::test_backward_match[8-1024-1024] - AssertionError: LayerNorm weight gradients don't match!
FAILED tests/test_layernorm.py::TestLayerNorm::test_backward_match[16-1024-1024] - AssertionError: LayerNorm weight gradients don't match!
FAILED tests/test_layernorm.py::TestLayerNorm::test_backward_match[32-512-1024] - AssertionError: LayerNorm weight gradients don't match!
==================== 7 failed, 36 passed in 93.55s (0:01:33) ===================
```