Hi, I just installed TE using pip install transformer_engine[pytorch] in a clean environment, and several tests in test_numerical are failing. Could anyone help me with this?
Error messages from tests/pytorch/test_numerical.py:
FAILED test_numerical.py::test_gpt_full_activation_recompute[True-True-False-126m-1-dtype2] - AssertionError: Mismatch in tensor 0
FAILED test_numerical.py::test_linear_accuracy[small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [78] with -10.82479190826416 vs -10.818109512329102 (diff 0.006682395935058594).
FAILED test_numerical.py::test_linear_accuracy[small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=0. Maximum difference at location [66] with 16.163537979125977 vs 16.155488967895508 (diff 0.00804901123046875).
FAILED test_numerical.py::test_layernorm_linear_accuracy[True-LayerNorm-small-1-dtype0] - TypeError: unsupported operand type(s) for *: 'NoneType' and 'Tensor'
FAILED test_numerical.py::test_layernorm_linear_accuracy[True-LayerNorm-small-2-dtype0] - TypeError: unsupported operand type(s) for *: 'NoneType' and 'Tensor'
FAILED test_numerical.py::test_layernorm_linear_accuracy[True-RMSNorm-small-1-dtype0] - TypeError: unsupported operand type(s) for *: 'NoneType' and 'Tensor'
FAILED test_numerical.py::test_layernorm_linear_accuracy[True-RMSNorm-small-2-dtype0] - TypeError: unsupported operand type(s) for *: 'NoneType' and 'Tensor'
FAILED test_numerical.py::test_layernorm_linear_accuracy[False-LayerNorm-small-1-dtype0] - TypeError: unsupported operand type(s) for *: 'NoneType' and 'Tensor'
FAILED test_numerical.py::test_layernorm_linear_accuracy[False-LayerNorm-small-2-dtype0] - TypeError: unsupported operand type(s) for *: 'NoneType' and 'Tensor'
FAILED test_numerical.py::test_layernorm_linear_accuracy[False-RMSNorm-small-1-dtype0] - TypeError: unsupported operand type(s) for *: 'NoneType' and 'Tensor'
FAILED test_numerical.py::test_layernorm_linear_accuracy[False-RMSNorm-small-2-dtype0] - TypeError: unsupported operand type(s) for *: 'NoneType' and 'Tensor'
FAILED test_numerical.py::test_layernorm_mlp_accuracy[LayerNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=4. Maximum difference at location [0, 105] with 0.13338899612426758 vs 0.15530425310134888 (diff 0.0219152569770813).
FAILED test_numerical.py::test_layernorm_mlp_accuracy[LayerNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=2. Maximum difference at location [0, 105] with 0.13338899612426758 vs 0.15530425310134888 (diff 0.0219152569770813).
FAILED test_numerical.py::test_layernorm_mlp_accuracy[LayerNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=103. Maximum difference at location [40] with 0.15618646144866943 vs 0.09805639088153839 (diff 0.05813007056713104).
FAILED test_numerical.py::test_layernorm_mlp_accuracy[LayerNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=53. Maximum difference at location [16] with -1.7313060760498047 vs -1.428515911102295 (diff 0.30279016494750977).
FAILED test_numerical.py::test_layernorm_mlp_accuracy[RMSNorm-relu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=79. Maximum difference at location [0, 61] with -0.027338851243257523 vs -0.006757093593478203 (diff 0.02058175764977932).
FAILED test_numerical.py::test_layernorm_mlp_accuracy[RMSNorm-relu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=39. Maximum difference at location [1, 61] with -0.027338841930031776 vs -0.006757088005542755 (diff 0.02058175392448902).
FAILED test_numerical.py::test_layernorm_mlp_accuracy[RMSNorm-reglu-small-1-dtype0] - AssertionError: Outputs not close enough in tensor at idx=82. Maximum difference at location [127] with 0.8240067362785339 vs 0.7542246580123901 (diff 0.0697820782661438).
FAILED test_numerical.py::test_layernorm_mlp_accuracy[RMSNorm-reglu-small-2-dtype0] - AssertionError: Outputs not close enough in tensor at idx=30. Maximum difference at location [0] with -0.6747775077819824 vs -0.6976503729820251 (diff 0.022872865200042725).
FAILED test_numerical.py::test_transformer_layer_hidden_states_format[126m-1-dtype1] - ValueError: No dot product attention support for the provided inputs!
FAILED test_numerical.py::test_transformer_layer_hidden_states_format[126m-1-dtype2] - ValueError: No dot product attention support for the provided inputs!
FAILED test_numerical.py::test_transformer_layer_hidden_states_format[126m-2-dtype1] - ValueError: No dot product attention support for the provided inputs!
FAILED test_numerical.py::test_transformer_layer_hidden_states_format[126m-2-dtype2] - ValueError: No dot product attention support for the provided inputs!
Based on two related issues, #494 and #1165, it turns out that TE uses TF32 by default while PyTorch uses FP32 by default, which can explain the small numerical mismatches above.
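To get a feel for why a TF32 versus FP32 mismatch alone can produce differences of the size reported by test_linear_accuracy (roughly 5e-4 to 6e-4 relative in the log above), here is a rough sketch that simulates TF32 input rounding in plain Python. It is not how TE or PyTorch implement anything: the helper names to_tf32 and dot are made up for illustration, the round-to-nearest mode is an assumption, and only the rounding of the inputs is modeled (accumulation happens in ordinary Python floats).

```python
import random
import struct


def to_tf32(x: float) -> float:
    """Round x to TF32-like precision: float32 range, 10 explicit mantissa bits."""
    # Reinterpret the value as its 32-bit float32 bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # float32 stores 23 mantissa bits; TF32 keeps only the top 10.
    # Add half of the dropped range (round to nearest), then clear the low 13 bits.
    # Assumption: the exact rounding mode on real hardware may differ.
    bits = (bits + 0x1000) & 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(a, b):
    """Plain dot product, accumulated in Python floats."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical inputs: two random vectors of length 128.
rng = random.Random(0)
a = [rng.gauss(0.0, 1.0) for _ in range(128)]
b = [rng.gauss(0.0, 1.0) for _ in range(128)]

reference = dot(a, b)
tf32_like = dot([to_tf32(x) for x in a], [to_tf32(y) for y in b])

print("reference :", reference)
print("TF32-like :", tf32_like)
print("abs diff  :", abs(reference - tf32_like))
```

Each rounded input is off by at most about 2**-11 (around 4.9e-4) relative to its original value, so element-wise tolerances chosen for full FP32 arithmetic can easily be exceeded when one side of the comparison runs its matmuls in TF32.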
Thus, we can either let PyTorch use TF32 by
My environment