
CreateBnnsGraphProgramFromMIL fails on model init on iOS HW #2450

Open · ari-ruokamo opened this issue Feb 13, 2025 · 4 comments
Labels: question (Response providing clarification needed. Will not be assigned to a release.)

@ari-ruokamo commented Feb 13, 2025

After a long battle I have finally ported a Music Source Separation model from the Python world to iOS. The PyTorch model was exported using coremltools, and it produces correct results compared with the desktop GPU Python runtime.

I have been trying to get the model to use the Neural Engine (NE) on iOS hardware (an iPhone 15 Pro Max), but so far without success: the model runs only on CPU or CPU+GPU. Xcode's model performance report suggests the model can run on the NE. Could the errors below, from the MIL-to-runtime compilation, be what is preventing NE execution?

Any hints or tips? Are there any options to check in the model export phase to overcome this issue? Thanks!

MLModelConfiguration (.cpuAndNeuralEngine):

Error(s) occurred compiling MIL to BNNS graph:
[CreateBnnsGraphProgramFromMIL]: Failed to determine convolution kernel at location at <path>.app/themodel.mlmodelc/model.mil:2465:12
 @ CreateBnnsGraphProgramFromMIL
E5RT: Error(s) occurred compiling MIL to BNNS graph:
[CreateBnnsGraphProgramFromMIL]: Failed to determine convolution kernel at location at <path>.app/themodel.mlmodelc/model.mil:2465:12
 (9)
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis

model.mil, line 2465:
tensor<fp32, [1, 64, 80, 226]> out_5_has_output_shape = conv_transpose(bias = decoder_0_1_convtrs_2_bias, dilations = out_5_dilations_0, groups = out_5_groups_0, output_shape = out_5_has_output_shape_output_shape_0, pad = out_5_pad_0, pad_type = out_5_pad_type_0, strides = out_5_strides_0, weight = decoder_0_1_convtrs_2_weight, x = var_1932)[name = tensor<string, []>("out_5_has_output_shape")];

@TobyRoseman (Collaborator)

I doubt these errors are what's preventing your model from running on the NE. Without more details about your network and how you are exporting it, it's difficult to help. Have you taken a look at the Apple Machine Learning Research post Deploying Transformers on the Apple Neural Engine?

@ari-ruokamo (Author) commented Feb 14, 2025

The problem seems to be in the PyTorch model's upsampling path. Why would that not be possible on the ANE?

torch.nn.ConvTranspose2d(32, 8, kernel_size=(16, 1), stride=(16, 1)): input=torch.Size([1, 32, 56, 226]), output=torch.Size([1, 8, 896, 226])
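
If the BNNS message is taken at face value ("Deconv stride should be 4 on both width and height axis"), one untested workaround would be to restructure the upsampling so that no single transposed convolution uses a stride larger than 4. Because kernel_size equals stride here, a stride-16 deconv is shape-compatible with two stacked stride-4 deconvs. This is only a sketch of the idea: the intermediate channel count (16) is an arbitrary choice, and the weights are not equivalent, so the model would need retraining or fine-tuning.

import torch
import torch.nn as nn

# Hypothetical restructuring: two stride-(4, 1) transposed convs in place of
# one stride-(16, 1) transposed conv. Shape-compatible, NOT weight-equivalent.
upsample = nn.Sequential(
    nn.ConvTranspose2d(32, 16, kernel_size=(4, 1), stride=(4, 1)),  # H: 56 -> 224
    nn.ConvTranspose2d(16, 8, kernel_size=(4, 1), stride=(4, 1)),   # H: 224 -> 896
)

x = torch.randn(1, 32, 56, 226)
print(upsample(x).shape)  # torch.Size([1, 8, 896, 226]), same as the original op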

@TobyRoseman (Collaborator)

I'm not sure either. What dtype is being used here? Are you using any flexible shapes?

You could try converting a simple network with just that PyTorch op and taking a look at the MIL ops that get generated.

@ari-ruokamo (Author) commented Feb 15, 2025

The input shape in the real model is fixed.

I made a quick test with a dummy model. It converts to Core ML; however, when instantiating the runtime on iOS I get the same error messages.

Hmm, is that stride of 16 the ANE's kryptonite?

import os
import torch
import torch.nn as nn
import coremltools as ct

class TransposeModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Same kernel/stride pattern as the real model's upsampling path
        self.conv_transpose = nn.ConvTranspose2d(
            in_channels=128,
            out_channels=64,
            kernel_size=(16, 1),
            stride=(16, 1),
            padding=(0, 0),
            dilation=(1, 1),
        )

    def forward(self, x):
        return self.conv_transpose(x)

model = TransposeModel()
input = torch.randn(1, 128, 5, 226)
traced_model = torch.jit.trace(model, input)

coreMLModel = ct.convert(
    traced_model,
    compute_precision=ct.precision.FLOAT32,
    compute_units=ct.ComputeUnit.CPU_AND_NE,
    inputs=[ct.TensorType(shape=input.shape, name="input", dtype=ct.converters.mil.mil.types.fp32)],
    outputs=[ct.TensorType(name="output", dtype=ct.converters.mil.mil.types.fp32)],
    minimum_deployment_target=ct.target.iOS17,
)

save_path = os.path.join(os.getcwd(), "dummy-export.mlpackage")
coreMLModel.save(save_path)
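
As a quick sanity check of the conversion itself (this runs the Core ML model on the host Mac, not on the ANE, so it validates the numerics but not device placement), the package can be compared against the traced PyTorch model in the same script; model, input, and coreMLModel refer to the variables defined above:

import numpy as np

# Host-side check: validates conversion numerics, not ANE placement.
with torch.no_grad():
    torch_out = model(input).numpy()
coreml_out = coreMLModel.predict({"input": input.numpy()})["output"]
print("max abs diff:", np.abs(coreml_out - torch_out).max())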

MIL:

    func main<ios17>(tensor<fp32, [1, 128, 5, 226]> input) {
            tensor<fp32, [64]> conv_transpose_bias = const()[name = tensor<string, []>("conv_transpose_bias"), val = tensor<fp32, [64]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(64)))];
            tensor<fp32, [128, 64, 16, 1]> conv_transpose_weight = const()[name = tensor<string, []>("conv_transpose_weight"), val = tensor<fp32, [128, 64, 16, 1]>(BLOBFILE(path = tensor<string, []>("@model_path/weights/weight.bin"), offset = tensor<uint64, []>(384)))];
            tensor<string, []> var_14_pad_type_0 = const()[name = tensor<string, []>("op_14_pad_type_0"), val = tensor<string, []>("valid")];
            tensor<int32, [2]> var_14_strides_0 = const()[name = tensor<string, []>("op_14_strides_0"), val = tensor<int32, [2]>([16, 1])];
            tensor<int32, [4]> var_14_pad_0 = const()[name = tensor<string, []>("op_14_pad_0"), val = tensor<int32, [4]>([0, 0, 0, 0])];
            tensor<int32, [2]> var_14_dilations_0 = const()[name = tensor<string, []>("op_14_dilations_0"), val = tensor<int32, [2]>([1, 1])];
            tensor<int32, []> var_14_groups_0 = const()[name = tensor<string, []>("op_14_groups_0"), val = tensor<int32, []>(1)];
            tensor<int32, [4]> var_14_has_output_shape_output_shape_0 = const()[name = tensor<string, []>("op_14_has_output_shape_output_shape_0"), val = tensor<int32, [4]>([1, 64, 80, 226])];
            tensor<fp32, [1, 64, 80, 226]> output = conv_transpose(bias = conv_transpose_bias, dilations = var_14_dilations_0, groups = var_14_groups_0, output_shape = var_14_has_output_shape_output_shape_0, pad = var_14_pad_0, pad_type = var_14_pad_type_0, strides = var_14_strides_0, weight = conv_transpose_weight, x = input)[name = tensor<string, []>("op_14_has_output_shape")];
        } -> (output);

iOS runtime:

    func dummyTest() {
        let configuration = MLModelConfiguration()
        configuration.computeUnits = .cpuAndNeuralEngine
        dummyModel = try? dummy_model(configuration: configuration)
    }

Error(s) occurred compiling MIL to BNNS graph:
[CreateBnnsGraphProgramFromMIL]: Failed to determine convolution kernel at location at <path>.app/dummy-model.mlmodelc/model.mil:13:12
 @ CreateBnnsGraphProgramFromMIL
E5RT: Error(s) occurred compiling MIL to BNNS graph:
[CreateBnnsGraphProgramFromMIL]: Failed to determine convolution kernel at location at <path>.app/dummy-model.mlmodelc/model.mil:13:12
 (9)
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
Deconv stride should be 4 on both width and height axis
