
[SwiftBindings] Binding process doc #2779

Open · wants to merge 2 commits into base: feature/swift-bindings

Conversation

stephen-hawley

This is a document to describe the process of binding Swift entities in C# and propose an architecture for handling it cleanly.

@matouskozak (Member) left a comment:

Nice high-level description!


1. Generate and consume the abi.json file
2. Extract and demangle the symbols from the binary
3. Iterate over every type and function and gnerate C# and/or supporting Swift code
Member:

Suggested change
3. Iterate over every type and function and gnerate C# and/or supporting Swift code
3. Iterate over every type and function and generate C# and/or supporting Swift code

1. Generate and consume the abi.json file
2. Extract and demangle the symbols from the binary
3. Iterate over every type and function and gnerate C# and/or supporting Swift code
4. Generate type datatbase entries
Member:

Suggested change
4. Generate type datatbase entries
4. Generate type database entries

Member:

How do you envision the type database structure and usage? In the context of this PR, are the handlers supposed to populate the database as they process the entity?

Author:

I envision there likely being two type databases. The first is a bind-time database, which would need maximal information about the types: Swift module, Swift name, C# namespace, C# name, entity type, blitability. But this is something that will be in its own document. The database would be populated with new entries after the abi.json file has been parsed but before binding. Older entries can be read in before the abi.json file is read. The second database is run-time; we want to minimize its size and start-up overhead. For the most part, the run-time database is to support the generic programming model, but again, this would be its own document.
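A bind-time entry along these lines might look like the following sketch. The field names here are hypothetical placeholders; the actual schema would be defined in the follow-up document mentioned above.

```json
{
  "swiftModule": "CryptoKit",
  "swiftName": "SymmetricKey",
  "csharpNamespace": "Apple.CryptoKit",
  "csharpName": "SymmetricKey",
  "entityType": "struct",
  "isBlittable": false
}
```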

The complexities fall into several broad categories:
- Type and member naming
- Multiple types being defined in multiple languages concurrently
- Marhshaling handled differently based on the type of the parameter and the type of the function
Member:

Suggested change
- Marhshaling handled differently based on the type of the parameter and the type of the function
- Marshaling handled differently based on the type of the parameter and the type of the function


In theory, the process of binding a Swift binary into C# should be as simple as:

1. Generate and consume the abi.json file
Member:

I would like to see the next level of detail here, similar to https://github.com/dotnet/runtimelab/tree/feature/swift-bindings/docs#functional-outline. It is not clear how dependencies are resolved.

In theory, the process of binding a Swift binary into C# should be as simple as:

1. Generate and consume the abi.json file
2. Extract and demangle the symbols from the binary
Member:

Do we want to consider this step as optional? For simpler bindings, this may not be necessary. Also, users might not know how to retrieve it from a framework.

Member:

I'm not sure about the UX of making it optional. The user will run the code without this step and then get a failure?

Author:

I figured that since we need to know the path to the dylib to get the abi.json, we're already there. Since we will need metadata accessors, we need the demangling there.


All of this makes the process of generating code challenging.

The complexities fall into several broad categories:
Member:

I think it's important to reflect on the components of our tooling. While having everything in a single project might seem easier, splitting them into multiple projects would force cleaner integration and make testing more granular.

- Implicit arguments
- Versioning based on the `@available` attribute.

For this reason I strongly recommend using code-generation tools that can work in a non-linear fashion. There are several ways to achieve this, but I would recommend the Dynamo framework from Binding Tools for Swift, as it can handle both C# and Swift and generates non-linearly. In addition, it shouldn't be a stretch to generate code in parallel on type boundaries.
Member:

I would start by identifying limitations with the string-based emitter. Based on those, we can add a thin model layer if needed. I think that the emitter itself shouldn't handle marshalling; that should be done before emitting.

Member:

I am all for reusing code which already works. However, I think that two things would be useful first:

  • Describing the limitations as Milos noted
  • Describing what Dynamo is, and how it will solve those limitations

Author:

Will do.
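Non-linear generation of the kind the document describes can be sketched as a writer with named sections that accept appends in any order but emit in a fixed order. This is a minimal illustration only, not Dynamo's actual API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Sketch of non-linear code writing: callers append lines to named
// sections in any order; Emit() concatenates sections in the fixed
// order given at construction time.
class SectionedWriter
{
    private readonly string[] order;
    private readonly Dictionary<string, StringBuilder> sections = new();

    public SectionedWriter(params string[] order) => this.order = order;

    public void Append(string section, string line)
    {
        if (!sections.TryGetValue(section, out var sb))
            sections[section] = sb = new StringBuilder();
        sb.AppendLine(line);
    }

    public string Emit()
    {
        var result = new StringBuilder();
        foreach (var name in order)
            if (sections.TryGetValue(name, out var sb))
                result.Append(sb);
        return result.ToString();
    }
}
```

With this shape, a handler can write a method body first and add its supporting pinvoke declaration later, yet the emitted file still comes out in declaration order.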


Because of this, I think we should adopt a strategy and factory pattern for handlers at various levels.

The general pattern would work like this:
Member:

I like this approach with custom handlers :)

```swift
public func generateAClass(name: String) -> SomeClass { }
```
The process will look at this, identify it as a top-level function, and select a handler factory for it.
The handler will create a context for the object, which would include: a class for the top-level object to live in (C# doesn't have top-level functions), a class to hold top-level pinvokes, and a function generation context with places for function argument declarations, generic declarations, function argument pre-marshaling code, pinvoke argument declarations, pinvoke argument expressions, post-marshaling code, the return type declaration, and a return expression.
Member:

We should be able to hold most of the information in declarations and the registrar.

The handler will create a context for the object, which would include: a class for the top-level object to live in (C# doesn't have top-level functions), a class to hold top-level pinvokes, and a function generation context with places for function argument declarations, generic declarations, function argument pre-marshaling code, pinvoke argument declarations, pinvoke argument expressions, post-marshaling code, the return type declaration, and a return expression.

The handler will execute a step to name the function and the associated pinvoke, including the entry point and library.
Then for each argument, it will gather information about each argument and from the function handler get a factory to build an argument handler for type `String`. This will in turn name the argument, generate the C# type and add it to the C# argument declaration. It will define the argument type for the pinvoke and add it to the C# pinvoke argument list. If needed, it will generate premarshal code and add it to the premarshal list and post marshal code, and finally an expression for calling the pinvoke.
Member:

Suggested change
Then for each argument, it will gather information about each argument and from the function handler get a factory to build an argument handler for type `String`. This will in turn name the argument, generate the C# type and add it to the C# argument declaration. It will define the argument type for the pinvoke and add it to the C# pinvoke argument list. If needed, it will generate premarshal code and add it to the premarshal list and post marshal code, and finally an expression for calling the pinvoke.
Then for each argument, it will gather the necessary information and from the function handler get a factory to build an argument handler for type `String`. This will in turn name the argument, generate the C# type and add it to the C# argument declaration. It will define the argument type for the pinvoke and add it to the C# pinvoke argument list. If needed, it will generate premarshal code and add it to the premarshal list and post marshal code, and finally an expression for calling the pinvoke.
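As an illustration of the steps this excerpt describes, the C# the handlers might emit for `generateAClass` could look roughly like the sketch below. This is heavily hedged: the entry point, `SwiftString`, and `SwiftObjectRegistry` names are hypothetical placeholders, the handle type is simplified to `IntPtr`, and real marshaling would be more involved.

```csharp
using System;
using System.Runtime.InteropServices;

public static class TopLevelFunctions            // top-level functions live here
{
    public static SomeClass GenerateAClass(string name)
    {
        // pre-marshal: convert the C# string to a Swift String payload
        using var swiftName = SwiftString.FromManaged(name);
        var handle = TopLevelPInvokes.PIfunc_generateAClass(swiftName.Payload);
        // post-marshal: map the returned handle to a C# object
        return SwiftObjectRegistry.GetOrCreate<SomeClass>(handle);
    }
}

internal static class TopLevelPInvokes           // top-level pinvokes live here
{
    [DllImport("libMyLibrary.dylib", EntryPoint = "...mangled symbol...")]
    internal static extern IntPtr PIfunc_generateAClass(IntPtr name);
}
```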


All of this makes the process of generating code challenging.

The complexities fall into several broad categories:
Member:

Do those complexities impose additional steps over the five you described, or do they just make the steps more complicated?


2. Aggregate information about that entity
3. Select a factory to create a handler for that entity
4. The handler will generate a context object for handling that entity
5. Execute a series of steps through the handler that will do work apropriate for each step.
Member:

NIT, same on the line below :)

Suggested change
5. Execute a series of steps through the handler that will do work apropriate for each step.
5. Execute a series of steps through the handler that will do work apropriate for each step

The handler will create a context for the object, which would include: a class for the top-level object to live in (C# doesn't have top-level functions), a class to hold top-level pinvokes, and a function generation context with places for function argument declarations, generic declarations, function argument pre-marshaling code, pinvoke argument declarations, pinvoke argument expressions, post-marshaling code, the return type declaration, and a return expression.

The handler will execute a step to name the function and the associated pinvoke, including the entry point and library.
Then for each argument, it will gather information about each argument and from the function handler get a factory to build an argument handler for type `String`. This will in turn name the argument, generate the C# type and add it to the C# argument declaration. It will define the argument type for the pinvoke and add it to the C# pinvoke argument list. If needed, it will generate premarshal code and add it to the premarshal list and post marshal code, and finally an expression for calling the pinvoke.
Member:

I got a bit lost, I think. So the handler doing all the work will be the top-level function handler? And then the top-level function handler will call the function handler to get a factory which will build an argument handler?


A similar process will be done for handling the return type and value. In this case, the pinvoke return type will be a `NativeHandle` and it will be used in conjunction with a registry to either retrieve an already existing C# object that is bound to that handle or it will build one through a factory.

After all this is done, the function handler will finish up by aggregating all the information, writing the C# method and writing the C# pinvoke.
Member:

In this example we do not need to generate Swift code, but what about cases where we will? Will that happen here as well?

Author:

It makes sense to do so. The reason is that in cases where marshaling is not 1:1 with C# capabilities, we will need to change the way that parameters are handled. I'll expound on this more in the docs, because it will make clear why non-linear code writing will make tasks much easier.
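The retrieve-or-build registry behavior described earlier (mapping a native handle to an already-existing C# object, or constructing one through a factory) could be as small as this sketch, assuming `IntPtr`-sized handles and ignoring lifetime management entirely.

```csharp
using System;
using System.Collections.Concurrent;

// Sketch of a handle registry: return the existing wrapper for a
// native handle, or build one via the supplied factory. Real code
// would also handle handle invalidation and object lifetime.
class HandleRegistry<T> where T : class
{
    private readonly ConcurrentDictionary<IntPtr, T> map = new();

    public T GetOrCreate(IntPtr handle, Func<IntPtr, T> factory) =>
        map.GetOrAdd(handle, factory);
}
```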

@vitek-karas (Member):

A general concern I have is that we haven't really described when/how we are going to generate Swift and how we are going to use/ship it. On top of that, trimming is still a major concern for me, especially the fact that we don't have a good way to trim generated Swift code.

Other interops solve this problem by generating the native code (in this case Swift) as basically the last step of building an app, after it's known what interop the app needs. This means that the Swift generation would happen at app-build time and not during the projection tool's runtime. There are probably other ways to solve it, but we haven't really discussed these yet.

I think we should solve these:

  • What is our position towards trimmability, and what behavior do we expect from that aspect of the product? Basically, how are we solving the scaling problem of pregenerated interop (the extreme case is the Win32 API, which has ~100K methods; the whole thing simply doesn't scale if pregenerated since it's way too big)?
  • How do we generate Swift and how do we ship it? For example, if we decide to pregenerate Swift upfront, how are we going to ship that in a NuGet package so that the user can consume a prepared binding library easily? How is that going to be integrated with the final app's build? Or do we postpone Swift generation to the app's build so that we don't have to distribute it with the binding library?
  • I would be interested in the general desired UX for consuming bindings. I expected this to be the same as referencing a NuGet package, but maybe we need something else?

@stephen-hawley (Author):

@vitek-karas - I understand your concern about trimming and packaging. I think both of those things are beyond the scope of this particular document.
To address them briefly: I would worry about binding on demand, as it would adversely affect build/deploy/launch times. One of the main takeaways from this past team week was customer concern about this.
There are several ways to approach trimming Swift:

  1. Trim on dylib boundaries - easy, but only does so much, especially with larger frameworks
  2. Deliver .o files or static libraries and link to them with NativeAOT which should trim out all dead Swift code
  3. Write every Swift entity with conditional compilation switches and, based on type usage in C#, recompile the existing binding so that it excludes unused types. This is obviously sub-optimal in terms of build times.
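For option 2, the app-side wiring might use NativeAOT's static-linking items, roughly as sketched below; the library and symbol-prefix names are placeholders.

```xml
<!-- Hedged sketch: statically link a Swift static library into a
     NativeAOT app so the native linker can dead-strip unused code. -->
<ItemGroup>
  <DirectPInvoke Include="MyLibrary" />
  <NativeLibrary Include="libMyLibrary.a" />
</ItemGroup>
```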

Labels: area-SwiftBindings (Swift bindings for .NET)