[Tracker] All the issues related to the e2e shark test suite #812

Open
pdhirajkumarprasad opened this issue Aug 27, 2024 · 4 comments
pdhirajkumarprasad commented Aug 27, 2024

Full ONNX FE tracker is at: #564

ONNX Model Zoo tracker: #886

Running a model

In the alt_e2e test suite, set `CACHE_DIR` to the directory where models should be downloaded:

export CACHE_DIR="/path/where/models/will/be/downloaded"

If building torch-mlir and iree from source:

source /path/to/iree-build/.env && export PYTHONPATH
export PYTHONPATH=/path/to/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir:/path/to/torch-mlir/test/python/fx_importer:$PYTHONPATH
export PATH=/path/to/iree-build/tools/:/path/to/torch-mlir/build/bin/:$PATH

python ./run.py --mode=cl-onnx-iree -v --torchtolinalg -t ModelName
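The setup and invocation steps above can be bundled into one helper. This is a sketch only: the `IREE_BUILD` and `TORCH_MLIR` variable names and the default cache path are placeholders of mine, not part of the suite, and the final command is echoed rather than executed so it can be inspected first.

```shell
# Sketch of the steps above as a reusable bash function.
# IREE_BUILD / TORCH_MLIR are assumed placeholder variables pointing at
# local source builds; adjust them to your checkout locations.
run_model() {
  local model="${1:?usage: run_model <ModelName>}"
  # Directory where models will be downloaded (default path is an assumption).
  export CACHE_DIR="${CACHE_DIR:-$HOME/.cache/e2eshark_models}"
  # Make the source-built torch-mlir Python packages and importer visible.
  export PYTHONPATH="$TORCH_MLIR/build/tools/torch-mlir/python_packages/torch_mlir:$TORCH_MLIR/test/python/fx_importer:${PYTHONPATH:-}"
  # Put the iree-compile and torch-mlir tools on PATH.
  export PATH="$IREE_BUILD/tools:$TORCH_MLIR/build/bin:$PATH"
  # Echo the final command for inspection; remove the echo to run it.
  echo python ./run.py --mode=cl-onnx-iree -v --torchtolinalg -t "$model"
}
```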

For onnx/models/

critical issues

regression: #887

CPU

| # | device | issue type | issue no | # models impacted | model list | assignee | status |
|---|--------|------------|----------|-------------------|------------|----------|--------|
| 1 | CPU | One or more operations with large vector sizes (8192 bytes) were found | 19058 | 4 | | @pashu123 | |
| 2 | CPU | "onnx.NonMaxSuppression": failed to legalize operation 'torch.operator' that was explicitly marked illegal | 881 | 2 | | @jinchen62 | |
| 3 | CPU | 'func.func' op exceeded stack allocation limit of 32768 bytes for function. Got 1048576 bytes | 19027 | 2 | modelList | @pashu123 | |
| 4 | CPU | onnx.LSTM | | 1 | modelList | | |
| 5 | CPU | torch.aten.convolution 1-d grouped | | 1 | modelList | @AmosLewis | |
| 6 | CPU | 'tensor.dim' op unexpected during shape cleanup; dynamic dimensions must have been resolved prior to leaving the flow dialect | 876 | 1 | modelList | | |
| 7 | CPU | failed to legalize operation onnx.NonZero | 820 | 1 | modelList | @renxida @AmosLewis | will message xida |
| 8 | CPU | boolean indexing ops: AtenNonzeroOp, AtenIndexTensorOp, AtenMaskedSelectOp | 3293 | | | @renxida | |
| 9 | CPU | Add TorchToLinalg lowering for MaxUnpool operation | 718 | | | @jinchen62 | |
| 10 | CPU | Fix Onnx.DFT Torch->Linalg lowering | 800 | | | @PhaneeshB | |

import and setup failures

| # | device | issue type | issue no | # models impacted | model list | assignee | status |
|---|--------|------------|----------|-------------------|------------|----------|--------|

iree-compile

IREE project tracker: https://github.com/orgs/iree-org/projects/8/views/3

| # | device | issue type | issue no | # models impacted | model list | assignee | status |
|---|--------|------------|----------|-------------------|------------|----------|--------|
| 1 | GPU | 'func.func' op uses 401920 bytes of shared memory; exceeded the limit of 65536 bytes | 18603 | 100+ | | | |
| 2 | GPU | 'arith.extui' op operand type 'i64' and result type 'i32' are cast incompatible | 19179 | 10 | | @pashu123 | |
| 3 | GPU | stack frame size (294916) exceeds limit (131056) in function 'torch_jit$async_dispatch_1_softmax_64x4x144x144xf32_dispatch_tensor_store' | 19180 | 10 | | @pashu123 | |
| 4 | GPU | error: :0:0: in function main_graph$async_dispatch_2_softmax_Dx9xf32_dispatch_tensor_store void (ptr addrspace(1), ptr addrspace(1), ptr addrspace(1), i32, i32): unsupported dynamic alloca | 19181 | 4 | | @pashu123 | |

iree runtime

| # | device | issue type | issue no | # models impacted | model list | assignee | status |
|---|--------|------------|----------|-------------------|------------|----------|--------|

numerics

| # | device | issue type | issue no | # models impacted | model list |
|---|--------|------------|----------|-------------------|------------|
| 1 | CPU | numeric, need_to_analyze | | 101 | modelList |
| 2 | | [numerics]: element at index 0 (0.332534) does not match the expected (0.308342); LSTM ops | 18441 | 2 | |

IREE EP only issues

iree-compile fails with `ElementsAttr does not provide iteration facilities for type 'mlir::Attribute'` on int8 models at the QuantizeLinear op

low priority

- #828: Turbine Camp
- #797: Ops not in model

@zjgarvey (Collaborator)

Can you update the model list links?

@jinchen62 (Contributor)

Could you also attach the issue links you referred to, so we know whether all model paths are covered? Also, it seems #801 is not included, right?

@pdhirajkumarprasad (Author)

@zjgarvey the model list contains only the updated links.

@jinchen62 Yes, so far the report is based on the ONNX models in the e2e shark test suite.

jinchen62 commented Aug 29, 2024

@pdhirajkumarprasad I think it would be helpful to attach more details of the error message.

I suspect the onnx.Transpose failure in the ONNX-to-Torch path is the shape inference issue I was dealing with. I fixed it in the shark test suite by setting the opset version to 21 with a locally built torch-mlir (llvm/torch-mlir#3593). @zjgarvey I realized this does not seem to work for the CI job, right? Any ideas?
