
CreateBnnsGraphProgramFromMIL fails on model init on iOS HW #2450

Open · ari-ruokamo opened this issue Feb 13, 2025 · 4 comments
Labels: question (Response providing clarification needed. Will not be assigned to a release.)

@ari-ruokamo commented Feb 13, 2025

After a long battle I have finally ported a Music Source Separation model from the Python world to iOS. The PyTorch model was exported using coremltools, and it produces correct results compared with the desktop GPU Python runtime.

I have been trying to get the model to use the Neural Engine (NE) on iOS hardware (an iPhone 15 Pro Max), but so far without success: the model runs only on CPU or CPU+GPU. Xcode's model performance report suggests the model can run on the NE. Could the errors below, from the MIL-to-runtime compilation, be what is preventing NE execution?

Any hints or tips? Are there any options to check in the model export phase to overcome this issue? Thanks!

MLModelConfiguration (.cpuAndNeuralEngine):

Error(s) occurred compiling MIL to BNNS graph:
[CreateBnnsGraphProgramFromMIL]: Failed to determine convolution kernel at location at <path>.app/themodel.mlmodelc/model.mil:2465:12
 @ CreateBnnsGraphProgramFromMIL
E5RT: Error(s) occurred compiling MIL to BNNS graph:
[CreateBnnsGraphProgramFromMIL]: Failed to determine convolution kernel at location at <path>.app/themodel.mlmodelc/model.mil:2465:12
 (9)
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis

model.mil, line 2465:
tensor<fp32, [1, 64, 80, 226]> out_5_has_output_shape = conv_transpose(bias = decoder_0_1_convtrs_2_bias, dilations = out_5_dilations_0, groups = out_5_groups_0, output_shape = out_5_has_output_shape_output_shape_0, pad = out_5_pad_0, pad_type = out_5_pad_type_0, strides = out_5_strides_0, weight = decoder_0_1_convtrs_2_weight, x = var_1932)[name = tensor<string, []>("out_5_has_output_shape")];

@TobyRoseman (Collaborator)

I doubt these errors are what's preventing your model from running on the NE. Without more details about your network and how you are exporting it, it's difficult to help. Have you taken a look at the Apple Machine Learning Research post Deploying Transformers on the Apple Neural Engine?

@ari-ruokamo (Author) commented Feb 14, 2025

The problem seems to be in the PyTorch model's upsampling path. Why would that not be possible on the ANE?

torch.nn.ConvTranspose2d(32, 8, kernel_size=(16, 1), stride=(16, 1)): input=torch.Size([1, 32, 56, 226]), output=torch.Size([1, 8, 896, 226])
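
If the BNNS message is taken at face value ("Deconv stride should be 4 on both width and height axis"), one untested workaround would be to restructure the upsampling so that no single transposed convolution uses a stride larger than 4. Because kernel_size equals stride here, a stride-16 deconv is shape-compatible with two stacked stride-4 deconvs. This is only a sketch of the idea: the intermediate channel count (16) is an arbitrary choice, and the weights are not equivalent, so the model would need retraining or fine-tuning.

import torch
import torch.nn as nn

# Hypothetical restructuring: two stride-(4, 1) transposed convs in place of
# one stride-(16, 1) transposed conv. Shape-compatible, NOT weight-equivalent.
upsample = nn.Sequential(
    nn.ConvTranspose2d(32, 16, kernel_size=(4, 1), stride=(4, 1)),  # H: 56 -> 224
    nn.ConvTranspose2d(16, 8, kernel_size=(4, 1), stride=(4, 1)),   # H: 224 -> 896
)

x = torch.randn(1, 32, 56, 226)
print(upsample(x).shape)  # torch.Size([1, 8, 896, 226]), same as the original op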

@TobyRoseman (Collaborator)

I'm not sure either. What dtype is being used here? Are you using any flexible shapes?

You could try converting a simple network with just that PyTorch op and taking a look at the MIL ops that get generated.

@ari-ruokamo (Author) commented Feb 15, 2025

The input shape in the real model is fixed.

I made a quick test with a dummy model. It converts to Core ML; however, when instantiating the runtime on iOS I get the same error messages.

Hmm, is that stride of 16 the ANE's kryptonite?

import os
import torch
import torch.nn as nn
import coremltools as ct

class TransposeModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Same kernel/stride pattern as the real model's upsampling path
        self.conv_transpose = nn.ConvTranspose2d(
            in_channels=128,
            out_channels=64,
            kernel_size=(16, 1),
            stride=(16, 1),
            padding=(0, 0),
            dilation=(1, 1),
        )

    def forward(self, x):
        return self.conv_transpose(x)

model = TransposeModel()
input = torch.randn(1, 128, 5, 226)
traced_model = torch.jit.trace(model, input)

coreMLModel = ct.convert(
    traced_model,
    compute_precision=ct.precision.FLOAT32,
    compute_units=ct.ComputeUnit.CPU_AND_NE,
    inputs=[ct.TensorType(shape=input.shape, name="input", dtype=ct.converters.mil.mil.types.fp32)],
    outputs=[ct.TensorType(name="output", dtype=ct.converters.mil.mil.types.fp32)],
    minimum_deployment_target=ct.target.iOS17,
)

save_path = os.path.join(os.getcwd(), "dummy-export.mlpackage")
coreMLModel.save(save_path)
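
As a quick sanity check of the conversion itself (this runs the Core ML model on the host Mac, not on the ANE, so it validates the numerics but not device placement), the package can be compared against the traced PyTorch model in the same script; model, input, and coreMLModel refer to the variables defined above:

import numpy as np

# Host-side check: validates conversion numerics, not ANE placement.
with torch.no_grad():
    torch_out = model(input).numpy()
coreml_out = coreMLModel.predict({"input": input.numpy()})["output"]
print("max abs diff:", np.abs(coreml_out - torch_out).max())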

MIL:

    func main<ios17>(tensor<fp32, [1, 128, 5, 226]> input) {
            tensor<fp32, [64]> conv_transpose_bias = const()[name = tensor<string, []>("conv_transpose_bias"), val = tensor<fp32, [64]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(64)))];
            tensor<fp32, [128, 64, 16, 1]> conv_transpose_weight = const()[name = tensor<string, []>("conv_transpose_weight"), val = tensor<fp32, [128, 64, 16, 1]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(384)))];
            tensor<string, []> var_14_pad_type_0 = const()[name = tensor<string, []>("op_14_pad_type_0"), val = tensor<string, []>("valid")];
            tensor<int32, [2]> var_14_strides_0 = const()[name = tensor<string, []>("op_14_strides_0"), val = tensor<int32, [2]>([16, 1])];
            tensor<int32, [4]> var_14_pad_0 = const()[name = tensor<string, []>("op_14_pad_0"), val = tensor<int32, [4]>([0, 0, 0, 0])];
            tensor<int32, [2]> var_14_dilations_0 = const()[name = tensor<string, []>("op_14_dilations_0"), val = tensor<int32, [2]>([1, 1])];
            tensor<int32, []> var_14_groups_0 = const()[name = tensor<string, []>("op_14_groups_0"), val = tensor<int32, []>(1)];
            tensor<int32, [4]> var_14_has_output_shape_output_shape_0 = const()[name = tensor<string, []>("op_14_has_output_shape_output_shape_0"), val = tensor<int32, [4]>([1, 64, 80, 226])];
            tensor<fp32, [1, 64, 80, 226]> output = conv_transpose(bias = conv_transpose_bias, dilations = var_14_dilations_0, groups = var_14_groups_0, output_shape = var_14_has_output_shape_output_shape_0, pad = var_14_pad_0, pad_type = var_14_pad_type_0, strides = var_14_strides_0, weight = conv_transpose_weight, x = input)[name = tensor<string, []>("op_14_has_output_shape")];
        } -> (output);

iOS runtime:

    func dummyTest() {
        let configuration = MLModelConfiguration()
        configuration.computeUnits = .cpuAndNeuralEngine
        dummyModel = try? dummy_model(configuration: configuration)
    }

Error(s) occurred compiling MIL to BNNS graph:
[CreateBnnsGraphProgramFromMIL]: Failed to determine convolution kernel at location at <path>.app/dummy-model.mlmodelc/model.mil:13:12
 @ CreateBnnsGraphProgramFromMIL
E5RT: Error(s) occurred compiling MIL to BNNS graph:
[CreateBnnsGraphProgramFromMIL]: Failed to determine convolution kernel at location at <path>.app/dummy-model.mlmodelc/model.mil:13:12
 (9)
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
