[RFC] Adding an ONNX to Torch conversion #1639

Open
qedawkins opened this issue Aug 25, 2022 · 44 comments

@qedawkins

qedawkins commented Aug 25, 2022

Hello everyone! In a recent RFC in torch-mlir (llvm/torch-mlir#1255), a conversion from the ONNX dialect to the Torch dialect was proposed. Following feedback there, we are now looking at adding that conversion pass to onnx-mlir.

Proposal

  • Add torch-mlir as a submodule: https://github.com/llvm/torch-mlir
  • Add a new conversion pipeline from ONNX to Torch (e.g. "--convert-onnx-to-torch-pipeline") to manage both op and type conversion.
  • Conversion tests in "test/mlir/conversion/onnx_to_torch/"
  • Ideally, target the torch-mlir backend contract, which comes after shape inference and value-semantics maximization

Proof of Concept

A proof of concept for the conversion can be found at: https://github.com/nod-ai/onnx-mlir/tree/convert-to-torch
Conversions for ONNXAddOp and ONNXConstantOp are included, along with unit tests for each. The conversion pipeline comprises five passes (a before/after IR sketch follows the list):

  1. "convert-onnx-to-torch" handles the op by op lowering to torch dialect
  2. "convert-function-types-to-torch-types" converts function arguments to torch types (e.g. torch.tensor) that wasn't converted in the previous pass
  3. "finalize-torch-type-conversion" finalizes the conversion to torch types and appropriately removes any UnrealizedCastConversionOp's
  4. "erase-onnx-entry-point" right now this entry point op gets removed for compatibility with torch-mlir. If this poses a problem on the onnx-mlir side feedback would be much appreciated.
  5. "refine-torch-func-value-semantics" changes the types of tensors for public functions without uses to have value semantics. This is a compatibility requirement for torch-mlir but also a reflection of this PoC conversion targeting non-value semantic tensors. If tensors in onnx-mlir implicitly have value semantics then this pass can be removed and the type conversions handling in passes 1, 2, and 3 can be changed to match the type conversion in torch-mlir more closely.

Looking forward to any and all feedback!

cc @powderluv @silvasean @sstamenova @ashay

@AlexandreEichenberger
Collaborator

We are generally receptive, and we already have efforts from ONNX to MHLO and ONNX to TOSA. Have you considered piggybacking on these, namely going from MHLO or TOSA to torch-mlir?
Adding new conversions is fine, pending two conditions. First, that there is support behind it to carry it to a fully (or close to fully) supported pass; our goal here is to avoid semi-complete solutions that add code burden to the project without providing a useful service to our customers. Second, that we resolve our dependencies on LLVM in a somewhat concerted fashion. On that latter point, right now we attempt to synchronize our LLVM upgrades with MHLO on a weekly basis; you can check our pinned issue #1624.

@silvasean

We are generally receptive, and we already have efforts from ONNX to MHLO and ONNX to TOSA. Have you considered piggybacking on these, namely going from MHLO or TOSA to torch-mlir?

We have Torch -> {Linalg,MHLO,TOSA} conversions on the Torch-MLIR side, so ideally we can centralize all efforts on the ONNX -> Torch conversion and remove the need for ONNX -> {MHLO,TOSA} direct lowerings. Conversely, if significant effort is going to be invested in ONNX -> {MHLO,TOSA} then ONNX -> Torch makes less sense.

On that latter point, right now we attempt to synchronize our LLVM upgrades with MHLO on a weekly basis; you can check our pinned issue #1624.

We are all basing off of the green commit in llvm/torch-mlir#1178 so this should not be an issue.

5. "refine-torch-func-value-semantics" changes the types of tensors for public functions without uses to have value semantics. This is a compatibility requirement for torch-mlir but also a reflection of this PoC conversion targeting non-value semantic tensors. If tensors in onnx-mlir implicitly have value semantics then this pass can be removed and the type conversions handling in passes 1, 2, and 3 can be changed to match the type conversion in torch-mlir more closely.

This shouldn't be needed. Builtin tensors, which onnx-mlir appears to use, already have value semantics, so you should just directly convert builtin tensors to !torch.vtensor<...> appropriately.
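For example, the direct mapping would look like this (a sketch, including a dynamic dimension):

```mlir
// Before: builtin tensors, which already have value semantics.
func.func private @forward(tensor<?x3xf32>) -> tensor<?x3xf32>

// After: the same shapes and dtypes, as value-semantic torch tensors.
func.func private @forward(!torch.vtensor<[?,3],f32>) -> !torch.vtensor<[?,3],f32>
```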

@powderluv

Adding new conversions is fine, pending two conditions. First, that there is support behind it to carry it to a fully (or close to fully) supported pass; our goal here is to avoid semi-complete solutions that add code burden to the project without providing a useful service to our customers.

Sounds good. We have a few customers who would like to see this unified to avoid having two separate paths. The nod.ai team can contribute to support the required passes (as we do in torch-mlir).

If this sounds reasonable we can get @qedawkins to refine the RFC with specific commits and start putting them up for review.

@AlexandreEichenberger
Collaborator

Right now, the MHLO effort is making rapid progress, which would integrate ONNX with the TF ecosystem. If there is sufficient support on your end, I can see the benefit of having an ONNX -> Torch ecosystem as well. Can you comment on the technical merit of having a direct ONNX->Torch-MLIR vs ONNX->MHLO for your particular use?

@silvasean

Can you comment on the technical merit of having a direct ONNX->Torch-MLIR vs ONNX->MHLO for your particular use?

So for Torch-MLIR it would be a no-op technically. The idea is that since Torch-MLIR already has these lowerings written, then onnx-mlir could just use them. A few technical advantages of ONNX -> Torch -> MHLO for the ecosystem would be:

  1. as MHLO evolves to StableHLO for frontends to target, onnx-mlir would be shielded from all those changes (additionally, as an ecosystem, we would only need to update one place)
  2. onnx-mlir would benefit from all the e2e correctness testing that Torch-MLIR does for the lowerings
  3. I see that some of the contributors to ONNX -> MHLO lowerings are from the same teams as Torch -> MHLO lowerings @Connor-XY @yaochengji (cc @fortianyou and @ZihengJiang for visibility from the TorchToMhlo side). This may reduce the work for those teams.
  4. onnx-mlir would get TOSA and direct Linalg lowerings "for free" (see the sketch below).

A few potential downsides:

  1. onnx-mlir development would be more coupled to Torch-MLIR
  2. onnx-mlir developers would have to be more aware of PyTorch op semantics to write ONNX -> Torch lowerings
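To illustrate point 4 above: once ONNX -> Torch produces IR in the Torch "backend contract" form, the existing Torch-MLIR backend pipelines consume it as-is. A sketch (pipeline names are the ones registered in Torch-MLIR today and may evolve):

```mlir
// Torch "backend contract" IR, produced once by ONNX -> Torch:
func.func @forward(%arg0: !torch.vtensor<[2,3],f32>) -> !torch.vtensor<[2,3],f32> {
  %0 = torch.aten.tanh %arg0 : !torch.vtensor<[2,3],f32> -> !torch.vtensor<[2,3],f32>
  return %0 : !torch.vtensor<[2,3],f32>
}
// From here, the existing Torch-MLIR pipelines take over, e.g.:
//   torch-backend-to-linalg-on-tensors-backend-pipeline
//   torch-backend-to-tosa-backend-pipeline
//   torch-backend-to-mhlo-backend-pipeline
```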

@AlexandreEichenberger
Collaborator

Thanks for the feedback. We are definitely encouraging the MHLO onnx-mlir developers to migrate to StableHLO, and I believe they agreed to do so once it becomes stable... I think it is almost there.

I also agree that with a stronger connection to MLIR via MHLO and possibly Torch, onnx to tosa might become redundant; since I have not seen much progress on it lately (apologies if mischaracterized), that might be the better way to go.

Given the small teams working on these projects, if there is a way to have common infrastructure (reusing when possible) between the MHLO and Torch efforts, that would be great. Exploring structures to enable this kind of reuse would strengthen both efforts and reduce redundancy, as AI models (ONNX, Torch, TF) undergo a lot of incremental changes that all need to be maintained and upgraded over time. So more reuse means less redundant updating...

Along similar lines, we have a unified way to do shape inference to detect compile-time shapes, and we reuse the same code (in a different context) to generate the runtime code for runtime shapes. MHLO has started introducing custom code for dynamic shapes, and we are encouraging them to reuse the common infrastructure between compile-time and runtime shape inference.

@silvasean

Along similar lines, we have a unified way to do shape inference to detect compile-time shapes, and we reuse the same code (in a different context) to generate the runtime code for runtime shapes. MHLO has started introducing custom code for dynamic shapes, and we are encouraging them to reuse the common infrastructure between compile-time and runtime shape inference.

We have the exact same thing: https://github.com/llvm/torch-mlir/blob/main/docs/shape_lib.md 🤣

@AlexandreEichenberger
Collaborator

Good to know... it looks like we went through the same pain points: having two similar pieces of code for shape inference and later code gen, then unifying both for lower maintenance.

We also have not generated guards yet, though it is also in the back of our minds.

@tungld
Collaborator

tungld commented Sep 1, 2022

As I understand it, the proposal here is to integrate ONNX->Torch into onnx-mlir only, without utilizing onnx-mlir further (e.g. down to the LLVM level). Also, the erase-onnx-entry-point pass indicates that the onnx-mlir driver is not needed. In that case, onnx-mlir could be a third_party dependency somewhere for torch-mlir to import ONNX models into the torch-mlir dialect. In that sense, shape inference and other passes in onnx-mlir are not needed, and there is no need to unify them between torch-mlir and onnx-mlir.

@silvasean

The idea is that ONNX->Torch could save effort for the onnx-mlir community by providing ONNX->Torch->{Linalg,MHLO,TOSA}, which gives Linalg, MHLO, and TOSA "for free" (and eventually TCP, StableHLO, etc.). As Torch-MLIR, we don't really have any interest per se in importing ONNX, so we would not take a dependency on onnx-mlir. We are simply offering to collaborate and provide a service if you folks in the onnx-mlir community don't want to maintain the Linalg, MHLO, and TOSA (+ TCP, StableHLO, ...) lowerings yourselves. I'll let the onnx-mlir community decide if that is appealing; we are willing to collaborate if you want that.

@tungld
Collaborator

tungld commented Sep 1, 2022

The idea is that ONNX->Torch could save effort for the onnx-mlir community by providing ONNX->Torch->{Linalg,MHLO,TOSA}, which gives Linalg, MHLO, and TOSA "for free" (and eventually TCP, StableHLO, etc.)

Just want to see the big picture. Could you elaborate a bit on how end users would use this pipeline? For example, how would they prepare inputs for running the compiled model and get outputs? Is your vision to use the onnx-mlir driver for that purpose, to expect a new driver, or does the driver already exist?

We are simply offering to collaborate and provide a service if you folks in the onnx-mlir community don't want to maintain the Linalg, MHLO, and TOSA (+ TCP, StableHLO, ...) lowerings yourselves

Yes, sure, it makes sense.

@silvasean

Just want to see the big picture. Could you elaborate a bit on how end users would use this pipeline? For example, how would they prepare inputs for running the compiled model and get outputs? Is your vision to use the onnx-mlir driver for that purpose, to expect a new driver, or does the driver already exist?

I'm imagining that onnx-mlir would just link to Torch-MLIR and run a few of Torch-MLIR's passes as an internal implementation detail to get from ONNX -> {MHLO,TOSA,Linalg}, depending on what the user wants. It shouldn't touch things like preparing inputs/drivers/etc.
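For example, a lowering to TOSA might compose the pipelines roughly like this (a hypothetical invocation; whether this runs under onnx-mlir-opt or behind a dedicated driver flag is exactly the kind of detail the PRs would settle):

```
onnx-mlir-opt model.onnx.mlir \
  --convert-onnx-to-torch-pipeline \
  --torch-backend-to-tosa-backend-pipeline
```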

@powderluv

We (nod.ai) have a couple of customers that require this, and we plan to support them somehow. That is why we added the initial RFC with an implementation in torch-mlir (which included onnx / onnx-mlir as deps), and then this RFC + implementation with torch-mlir as a dep in onnx-mlir.

Basically, the customers want to avoid two paths to MHLO (which they care about) and two kinds/levels of op and shape support. The nod.ai team contributes a lot to torch-mlir to maintain the torch-dialect -> {MHLO, TOSA, Linalg} lowerings, and if we can leverage an ONNX entry point into the torch dialect and then reuse all the other lowerings, we get one set of backend lowerings with two "frontends": Torch and ONNX.

The end product would look like a compiler/binary that can consume ONNX modules as an ONNX Runtime EP, or consume Torch in some other serving solution like Triton Inference Server. The lowering to the torch dialect may be different, but after that it is all the same, allowing backend accelerators to easily consume from both frameworks.

@tungld
Collaborator

tungld commented Sep 1, 2022

@Connor-XY @yaochengji Could you please chime in? We would like to hear your comments on ONNX->Torch->MHLO vs ONNX->MHLO. Thanks!

@Connor-XY
Contributor

I think it would be great to have the conversion from ONNX to Torch, but I also believe that it is better to have the conversion from ONNX to MHLO directly. First, we can do the conversion directly, so it isn't necessary to use Torch as a bridge. Second, if we adopt ONNX->Torch->MHLO and there is a version upgrade, we need to modify both ONNX->Torch and Torch->MHLO. The conversion would also rely on Torch (onnx-mlir development would depend on Torch more).

@yaochengji
Member

To add to what @Connor-XY said: thanks to the help of @tungld, @AlexandreEichenberger, and all the other community members, we are currently applying onnx-mlir to our business models at ByteDance.

We'll definitely try the ONNX->Torch route once it is mature. Before that, we'll keep maintaining and developing the ONNX->MHLO route.

@Svoch

Svoch commented Sep 6, 2022

The idea of a robust conversion between the ONNX and Torch front ends seems interesting. Excited to see how this RFC evolves. It seems reasonable to avoid duplicating the effort to support intermediate dialects and features like shape inference.

@AlexandreEichenberger
Collaborator

AlexandreEichenberger commented Sep 7, 2022

Do you think you could present the benefits of this direct approach at our Tuesday evening meeting, either next week or in two weeks?
And if you could give us a good link to read about the MLIR/Torch effort, that would be great.

@qedawkins
Author

Do you think you could present the benefits of this direct approach at our Tuesday evening meeting, either next week or in two weeks? And if you could give us a good link to read about the MLIR/Torch effort, that would be great.

We can aim for a presentation + discussion at next week's meeting if that works for you. Torch-MLIR has documentation you can reference for the time being here: https://github.com/llvm/torch-mlir/tree/main/docs

The architecture document (https://github.com/llvm/torch-mlir/blob/main/docs/architecture.md) might be a good place to start.

@qedawkins
Author

@AlexandreEichenberger What is the status on this? Is tomorrow's meeting still a good time to discuss this?

@AlexandreEichenberger
Collaborator

Absolutely, you can have a 15-minute slot at our 8-9pm EST meeting.

https://github.com/onnx/onnx-mlir/wiki/Informal-meeting-agenda-and-notes

@AlexandreEichenberger
Collaborator

It would be great if you could explain the value-add of the ONNX->Torch lowering within the picture of the torch-mlir effort:
https://github.com/llvm/torch-mlir/blob/main/Torch-MLIR.png

@silvasean

There is no value-add for Torch-MLIR (it is actually a minor cost for us to support this). It is purely ONNX-MLIR reusing work already done for Torch-MLIR. It is a win for the ecosystem to maintain fewer lowerings, in theory.

@AlexandreEichenberger
Collaborator

Got it. I found your https://github.com/llvm/torch-mlir/blob/main/docs/architecture.md instructive. If you could give us an idea of where the ONNX dialect would intersect with the dialects in that doc, that would let us focus our attention a bit more on the relevant dialect, tx

@silvasean

silvasean commented Sep 13, 2022

ONNX-MLIR would produce the torch dialect in the "backend contract" form.

@qedawkins
Author

qedawkins commented Sep 13, 2022

This diagram shows a preview of how the conversion pass slots in on the Torch-MLIR side for reference before the meeting.
[diagram: how the ONNX -> Torch conversion pass slots into the Torch-MLIR pipeline]

The ONNX-MLIR side in this diagram leaves out some components that can be filled in later.

@AlexandreEichenberger
Collaborator

Is there a similar effort to have a "stable torch dialect", like StableHLO?

@silvasean

No, our dialect is only as stable as the torch op set (which is fairly stable, but not guaranteed)

@AlexandreEichenberger
Collaborator

@qedawkins would you be able to add a PDF of your presentation yesterday, for those who were not able to be present? Tx

@qedawkins
Author

Sure!
Here it is: ONNX To Torch Conversion.pdf

@AlexandreEichenberger
Collaborator

I have a summary of our Tuesday evening discussion in the wiki pages from this project here.

As a follow-on to this discussion, we would like to have feedback from the teams developing the onnx->tosa and onnx->mhlo converters, as this RFC may also be able to generate TOSA or MHLO output via the torch-mlir backends currently available.

I am listing below developers that I see associated with these two projects; if you know of any other, please add references here and/or contact them directly.

Thanks

@Connor-XY @yaochengji @chenchongsong @BraunPhilipp @sjarus Please provide feedback as you see fit within a week; it would be great to have it before the next meeting, where we will take a position on this RFC. You are obviously welcome to participate in the meeting too, which is now open to the whole onnx-mlir community. See the wiki page for info on how to join.

@sjarus
Contributor

sjarus commented Sep 14, 2022

Adding @eric-k256 @stephenneuendorffer and @ljfitz for input on ONNX-TOSA in the context of the likely presence of ONNX-Torch and the existing Torch-TOSA.

@yaochengji
Member

Please provide feedback as you see fit within a week

The ONNX->Torch route should be useful, especially as we are considering switching to StableHLO. The biggest concern we have is that the ONNX -> Torch -> MHLO route is more complex than ONNX -> MHLO, which introduces more potential issues.

And the ONNX->MHLO route is already applied to the business models at ByteDance. The model coverage is quite high (about 50%), considering we've only worked on this for two months. Therefore, we will continue to work on the ONNX->MHLO route to increase the model coverage at ByteDance. As for the ONNX -> Torch -> MHLO route, if we later find that it is mature enough, we will switch to it.

@AlexandreEichenberger
Collaborator

Thanks for your feedback @yaochengji, this is very useful. Our primary concern is to have mature implementations of dialects, so it is good to know that your team will continue working on MHLO. Hopefully you can also resolve the handling of dynamic shapes in a way that reuses the onnx-mlir infra, which would unblock some of the MHLO PRs that have been blocked for longer than I would like.

@yaochengji
Member

Hopefully you can also resolve the handling of dynamic shapes in a way that reuses the onnx-mlir infra, which would unblock some of the MHLO PRs that have been blocked for longer than I would like.

Yes, we're actively working on this. I've already discussed it with @Connor-XY offline and figured out how to use this correctly. I expect @Connor-XY can resolve this soon.

@AlexandreEichenberger
Collaborator

AlexandreEichenberger commented Sep 15, 2022

I mentioned that we have a way to annotate code/makefile/xxx and then gather annotations to generate a report of coverage. We do this for our support of lowering to CPU and to our NNPA-based accelerator. This same mechanism can be used to generate a coverage report for a converter from ONNX to TOSA/MHLO/Torch-MLIR.

PR #1475 describes the format and will let you know which file/makefile to edit. We added a link to the reports on the main README.md page. But check the current code, as we may have modified the format a bit.

@yaochengji
Member

I mentioned that we have a way to annotate code/makefile/xxx and then gather annotations to generate a report of coverage.

Thanks. We'll take a look at this.

@AlexandreEichenberger
Collaborator

Any additional feedback? So far, the feedback has been mostly positive.

One thing we learned from the MHLO effort is that it is nice to have a switch that disables building the MHLO part. The CIs all work with everything on, but certain users may elect not to build parts that are not needed. This would also be a good feature to have for the torch-mlir converter.

@silvasean

Yes, Torch-MLIR has a build flag to disable building the MHLO path.

@AlexandreEichenberger
Collaborator

Yes, Torch-MLIR has a build flag to disable building the MHLO path.

I was asking if you would also support a flag disabling the building of Torch-MLIR within onnx-mlir.

@silvasean

Ah yes, that's a good idea.

@AlexandreEichenberger
Collaborator

AlexandreEichenberger commented Sep 20, 2022

As previously stated, I believe that unless we hear any additional concerns by today, we should go ahead with your RFC.

@AlexandreEichenberger
Collaborator

I believe you should go ahead and start PRs to implement your converter. Good luck.

@qedawkins
Author

Thank you all for the feedback and helpful discussion. I've opened a PR #1731 where we can begin ironing out the details and get an initial version of this merged.
