[Tracker] All the issues related to the e2e shark test suite #812

Open
pdhirajkumarprasad opened this issue Aug 27, 2024 · 4 comments
pdhirajkumarprasad commented Aug 27, 2024

Full ONNX FE tracker is at: #564

ONNX Model Zoo tracker: #886

Running a model

In the alt_e2e test suite, set `CACHE_DIR` to the directory where models should be downloaded:

export CACHE_DIR="/path/where/models/will/be/downloaded"

If building torch-mlir and iree from source:

source /path/to/iree-build/.env && export PYTHONPATH
export PYTHONPATH=/path/to/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir:/path/to/torch-mlir/test/python/fx_importer:$PYTHONPATH
export PATH=/path/to/iree-build/tools/:/path/to/torch-mlir/build/bin/:$PATH

python ./run.py --mode=cl-onnx-iree -v --torchtolinalg -t ModelName
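The setup and invocation steps above can be bundled into one helper. This is a sketch only: the `IREE_BUILD` and `TORCH_MLIR` variable names and the default cache path are placeholders of mine, not part of the suite, and the final command is echoed rather than executed so it can be inspected first.

```shell
# Sketch of the steps above as a reusable bash function.
# IREE_BUILD / TORCH_MLIR are assumed placeholder variables pointing at
# local source builds; adjust them to your checkout locations.
run_model() {
  local model="${1:?usage: run_model <ModelName>}"
  # Directory where models will be downloaded (default path is an assumption).
  export CACHE_DIR="${CACHE_DIR:-$HOME/.cache/e2eshark_models}"
  # Make the source-built torch-mlir Python packages and importer visible.
  export PYTHONPATH="$TORCH_MLIR/build/tools/torch-mlir/python_packages/torch_mlir:$TORCH_MLIR/test/python/fx_importer:${PYTHONPATH:-}"
  # Put the iree-compile and torch-mlir tools on PATH.
  export PATH="$IREE_BUILD/tools:$TORCH_MLIR/build/bin:$PATH"
  # Echo the final command for inspection; remove the echo to run it.
  echo python ./run.py --mode=cl-onnx-iree -v --torchtolinalg -t "$model"
}
```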

For onnx/models/

critical issues

regression: #887

CPU

| # | device | issue type | issue no | # models impacted | model list | assignee | status |
|---|--------|------------|----------|-------------------|------------|----------|--------|
| 1 | CPU | One or more operations with large vector sizes (8192 bytes) were found | 19058 | 4 | | @pashu123 | |
| 2 | CPU | "onnx.NonMaxSuppression": failed to legalize operation 'torch.operator' that was explicitly marked illegal | 881 | 2 | | @jinchen62 | |
| 3 | CPU | 'func.func' op exceeded stack allocation limit of 32768 bytes for function. Got 1048576 bytes | 19027 | 2 | modelList | @pashu123 | |
| 4 | CPU | onnx.LSTM | | 1 | modelList | | |
| 5 | CPU | torch.aten.convolution 1-d grouped | | 1 | modelList | @AmosLewis | |
| 6 | CPU | 'tensor.dim' op unexpected during shape cleanup; dynamic dimensions must have been resolved prior to leaving the flow dialect | 876 | 1 | modelList | | |
| 7 | CPU | failed to legalize operation onnx.NonZero | 820 | 1 | modelList | @renxida @AmosLewis | will message xida |
| 8 | CPU | boolean indexing ops: AtenNonzeroOp, AtenIndexTensorOp, AtenMaskedSelectOp | 3293 | | | @renxida | |
| 9 | CPU | Add TorchToLinalg lowering for MaxUnpool operation | 718 | | | @jinchen62 | |
| 10 | CPU | Fix Onnx.DFT Torch->Linalg lowering | 800 | | | @PhaneeshB | |

import and setup failures

| # | device | issue type | issue no | # models impacted | model list | assignee | status |
|---|--------|------------|----------|-------------------|------------|----------|--------|

iree-compile

IREE project tracker: https://github.com/orgs/iree-org/projects/8/views/3

| # | device | issue type | issue no | # models impacted | model list | assignee | status |
|---|--------|------------|----------|-------------------|------------|----------|--------|
| 1 | GPU | 'func.func' op uses 401920 bytes of shared memory; exceeded the limit of 65536 bytes | 18603 | 100+ | | | |
| 2 | GPU | 'arith.extui' op operand type 'i64' and result type 'i32' are cast incompatible | 19179 | 10 | | @pashu123 | |
| 3 | GPU | stack frame size (294916) exceeds limit (131056) in function 'torch_jit$async_dispatch_1_softmax_64x4x144x144xf32_dispatch_tensor_store' | 19180 | 10 | | @pashu123 | |
| 4 | GPU | error: :0:0: in function main_graph$async_dispatch_2_softmax_Dx9xf32_dispatch_tensor_store void (ptr addrspace(1), ptr addrspace(1), ptr addrspace(1), i32, i32): unsupported dynamic alloca | 19181 | 4 | | @pashu123 | |

iree runtime

| # | device | issue type | issue no | # models impacted | model list | assignee | status |
|---|--------|------------|----------|-------------------|------------|----------|--------|

numerics

| # | device | issue type | issue no | # models impacted | model list |
|---|--------|------------|----------|-------------------|------------|
| 1 | CPU | numeric, need_to_analyze | | 101 | modelList |
| 2 | | [numerics]: element at index 0 (0.332534) does not match the expected (0.308342); LSTM ops | 18441 | 2 | |

IREE EP only issues

iree-compile fails with `ElementsAttr does not provide iteration facilities for type 'mlir::Attribute'` on int8 models at the QuantizeLinear op

low priority

- #828: Turbine Camp
- #797: Ops not in model

@zjgarvey (Collaborator)

Can you update the model list links?

@jinchen62 (Contributor)

Could you also attach the issue links you referred to, so we know whether all model paths are covered? Also, it seems #801 is not included, right?

@pdhirajkumarprasad (Author)

@zjgarvey the model list contains only the updated links.

@jinchen62 Yes, so far the report is based on the ONNX models in the e2e shark test suite.

jinchen62 commented Aug 29, 2024

@pdhirajkumarprasad I think it would be helpful to attach more details of the error message.

I suspect the onnx.Transpose failure in the ONNX-to-Torch path is the shape inference issue I was dealing with. I fixed it in the shark test suite by setting the opset version to 21 with a locally built torch-mlir (llvm/torch-mlir#3593). @zjgarvey I realized this does not seem to work for the CI job, right? Any ideas?
