test_mlp benchmark got accuracy assert error #134

Open
ZJLi2013 opened this issue Jul 4, 2024 · 0 comments
Labels
bug Something isn't working

ZJLi2013 commented Jul 4, 2024

Hi team,

I tried to run the MLP benchmark (test_mlp.py) as follows:

Environment setup

GPU: MI308
ROCm: 6.1.2.60102-119~20.04
PyTorch: 2.4.0.dev20240501+rocm6.1
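
For anyone comparing against a different setup, the versions above can be collected with a small script. This is only a minimal sketch: the helper name `collect_env` is hypothetical, and it assumes a ROCm build of PyTorch, where `torch.version.hip` is set (it is `None` on non-ROCm builds).

```python
import platform
import sys


def collect_env():
    """Gather the version details listed in this report (hypothetical helper)."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing more to report.
        info["torch"] = "not installed"
        return info

    info["torch"] = torch.__version__
    # torch.version.hip holds the HIP/ROCm version on ROCm builds, else None.
    info["hip"] = getattr(torch.version, "hip", None)
    # ROCm devices are exposed through the torch.cuda namespace.
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    else:
        info["gpu"] = "no GPU visible"
    return info


for key, value in collect_env().items():
    print(f"{key}: {value}")
```

Pasting this output alongside a failing run should make it easier to tell whether a version difference is involved.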

Steps to reproduce

cd apex/
pip install -r requirements.txt
python3 setup.py install
cd tests/L0/run_mlp
python3 test_mlp.py

Accuracy assertion failures

FAIL: test_no_bias (__main__.TestMLP)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/apex-1.3.0-py3.9-linux-x86_64.egg/apex/testing/common_utils.py", line 32, in wrapper
    fn(*args, **kwargs)
  File "/workspace/apex/tests/L0/run_mlp/test_mlp.py", line 77, in test_no_bias
    np.testing.assert_allclose(
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 1530, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-05, atol=1e-07

Mismatched elements: 2 / 1024 (0.195%)
Max absolute difference: 1.8835999e-07
Max relative difference: 0.00286722
 x: array([[ 0.027259],
       [-0.054091],
       [-0.003985],...
 y: array([[ 0.027259],
       [-0.054091],
       [-0.003985],...

FAIL: test_no_grad (__main__.TestMLP)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/apex-1.3.0-py3.9-linux-x86_64.egg/apex/testing/common_utils.py", line 32, in wrapper
    fn(*args, **kwargs)
  File "/workspace/apex/tests/L0/run_mlp/test_mlp.py", line 163, in test_no_grad
    np.testing.assert_allclose(
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 1530, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-05, atol=1e-07

Mismatched elements: 140028 / 491520 (28.5%)
Max absolute difference: 7.2151306e-07
Max relative difference: 951.6179
 x: array([[-2.597046e-05,  4.191594e-06, -6.009603e-05, ...,  2.606537e-04,
         6.171300e-05, -6.382344e-05],
       [ 1.673573e-05, -5.885254e-05, -8.349993e-05, ..., -7.531334e-05,...
 y: array([[-2.654276e-05,  3.729273e-06, -6.010690e-05, ...,  2.608544e-04,
         6.124897e-05, -6.427429e-05],
       [ 1.673585e-05, -5.885251e-05, -8.349984e-05, ..., -7.531326e-05,...

FAIL: test_with_bias (__main__.TestMLP)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/apex-1.3.0-py3.9-linux-x86_64.egg/apex/testing/common_utils.py", line 32, in wrapper
    fn(*args, **kwargs)
  File "/workspace/apex/tests/L0/run_mlp/test_mlp.py", line 116, in test_with_bias
    np.testing.assert_allclose(
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 1530, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-05, atol=1e-07

Mismatched elements: 3 / 1024 (0.293%)
Max absolute difference: 1.899898e-07
Max relative difference: 0.00063155
 x: array([[-0.128916],
       [-0.052111],
       [ 0.001069],...
 y: array([[-0.128916],
       [-0.052111],
       [ 0.001069],...

----------------------------------------------------------------------
Ran 6 tests in 16.497s

FAILED (failures=3)
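
To make the failures easier to interpret, here is a minimal sketch of the element-wise check that `np.testing.assert_allclose` documents, `|actual - desired| <= atol + rtol * |desired|`, written in plain Python. The helper `within_tolerance` is a hypothetical name, and the sample values are copied from the test_no_grad output above. The looser `atol=1e-6` is only an illustration of how sensitive the check is for near-zero values, not a proposed fix.

```python
def within_tolerance(actual, desired, rtol=1e-5, atol=1e-7):
    """Element-wise criterion documented for np.testing.assert_allclose."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


# First element of x and y in the test_no_grad failure above.
actual, desired = -2.597046e-05, -2.654276e-05

abs_diff = abs(actual - desired)
allowed = 1e-7 + 1e-5 * abs(desired)
print(f"abs diff = {abs_diff:.3e}, allowed = {allowed:.3e}")

# With the test's tolerances the allowed error is dominated by atol,
# because rtol * |desired| is tiny for values this close to zero.
print("passes with atol=1e-7:", within_tolerance(actual, desired))

# Illustrative only: a looser absolute tolerance accepts the same pair.
print("passes with atol=1e-6:", within_tolerance(actual, desired, atol=1e-6))
```

This also suggests why the reported max relative difference in test_no_grad is so large (951.6179) while the max absolute difference stays below 1e-6: when the expected value is close to zero, even a tiny absolute error becomes a large relative one.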

Is a specific PyTorch/ROCm version required for this benchmark?

Many thanks,
David
