Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

which memory dynamic-linking libraries should use for host functions? #232

Open
yamt opened this issue Aug 7, 2024 · 16 comments
Open

which memory dynamic-linking libraries should use for host functions? #232

yamt opened this issue Aug 7, 2024 · 16 comments

Comments

@yamt
Copy link
Contributor

yamt commented Aug 7, 2024

https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md says:

Regardless of the kind, all modules accessing WASI APIs also export a linear memory with the name memory. Data pointers in WASI API calls are relative to this memory's index space.

i guess it's the convention used by many of host functions, not only WASI.

however, wasi-libc's libc.so doesn't export "memory".

actually, wasm-ld rejects -shared -export-memory.
https://github.com/llvm/llvm-project/blob/06a808c4f4014edfeb2517bddd0bcb63eb4a260b/lld/wasm/Driver.cpp#L658

am i missing something?

a. wasm-ld should be fixed.

b. application-abi.md doesn't apply to a module which is a shared library.
the memory exported from the "main" module should always be used.
i feel this a bit strange because the module calling wasi (eg. libc.so) doesn't know about the "main" module.

@sbc100
Copy link
Member

sbc100 commented Aug 7, 2024

The memory for a given program must be defined in exactly one module, from which it is exported. All the other modules then import that memory.

Currently the way this works is the main module always defines and exports the memory and all other modules (i.e. all shared libraries import the memory). In this case libc.so should be importing its memory.

@yamt
Copy link
Contributor Author

yamt commented Aug 7, 2024

do you mean b. is correct?

well, i agree that the memory exported by the main module and the memory imported by libc.so are normally same.
is it something a runtime linker should actively validate?

@sbc100
Copy link
Member

sbc100 commented Aug 7, 2024

I suppose it could/should validate that there is exactly one export called memory. Two different modules in the same program both exporting the linear memory would be an error I believe.

A while back we tried to rename memory to __linear_memory to be more descriptive (and match __indirect_function_table with two leading underscores) but for now we are still using just memory for the linear memory.

@sbc100
Copy link
Member

sbc100 commented Aug 7, 2024

https://github.com/WebAssembly/WASI/blob/main/legacy/application-abi.md was written before WASI had any concept of dynamic linking, so easy program is just one module. I will likely need to be updated if/when we want to support dynamic linking.

@yamt
Copy link
Contributor Author

yamt commented Aug 7, 2024

even with dynamic-linking, the main module still should export memory and __indirect_table, right?
currently it seems impossible for pie exeuctable because wasm-ld rejects -pie --export-table.
https://github.com/llvm/llvm-project/blob/04e7eaf91f3e9b6830feb2e08bb5b0ddb14c3174/lld/wasm/Driver.cpp#L632
do you consider it a bug?

@sbc100
Copy link
Member

sbc100 commented Aug 7, 2024

At yes, that does look odd. In emscripten for some reason we defined the table on the JS side and import it, but your are correct, the main module should really be exporting it.

@sbc100
Copy link
Member

sbc100 commented Aug 7, 2024

I took another look at emscripten and even attempted to convert it to using main-module-exported memory: emscripten-core/emscripten#22332.

However, don't think this can work today, at least as long as the main module is being built as -pie. This is because in -pie mode, like -shared, the memory and table offsets used for the static data and element segments are based on the imported __table_base and __memory_base globals. Since these are not known to the linker its not possible to great an initial table or memory that is big enough for the static data (since it would need to do __table_base + segments size and/or __memory_base + semgnet size to determine this value).

In theory it could work if we pass --initial-table=xx to the linker, but in practice we want the initial table to be just big enough to for elements in the main module, and we don't know how man table elements there are until after the linker has run.

The solution here I think is to avoid building the main module with PIC (emscripten-core/emscripten#12682).. since then the linker knowns exactly how big to make the initial table and memory.

@sbc100
Copy link
Member

sbc100 commented Aug 7, 2024

I do think we could remove the restriction in https://github.com/llvm/llvm-project/blob/04e7eaf91f3e9b6830feb2e08bb5b0ddb14c3174/lld/wasm/Driver.cpp#L632 but that still leaves the problem of what value to use for --initial-table. The current default of "use the size of the segments" doesn't work since it doesn't take into account __table_base.

@yamt
Copy link
Contributor Author

yamt commented Aug 8, 2024

I took another look at emscripten and even attempted to convert it to using main-module-exported memory: emscripten-core/emscripten#22332.

However, don't think this can work today, at least as long as the main module is being built as -pie. This is because in -pie mode, like -shared, the memory and table offsets used for the static data and element segments are based on the imported __table_base and __memory_base globals. Since these are not known to the linker its not possible to great an initial table or memory that is big enough for the static data (since it would need to do __table_base + segments size and/or __memory_base + semgnet size to determine this value).

i'm not familiar enough with emscripten to grasp the diff quickly.
from your description, i guess you are trying to make it stop importing, right?

what's wrong with making it both imported and exported? (as we did for wasi-threads)
at least for memory, i implemented it that way in toywasm. i haven't noticed any problems with it so far.

@yamt
Copy link
Contributor Author

yamt commented Aug 8, 2024

The memory for a given program must be defined in exactly one module, from which it is exported. All the other modules then import that memory.

Currently the way this works is the main module always defines and exports the memory and all other modules (i.e. all shared libraries import the memory). In this case libc.so should be importing its memory.

isn't it more consistent to make shared libraries export the imported memory as well?
i feel it's better to have less special cases for shared libraries.

@sbc100
Copy link
Member

sbc100 commented Aug 8, 2024

what's wrong with making it both imported and exported? (as we did for wasi-threads)

Nothing stops you from doing this but it seems redundant. As the embedder, if I am providing the memory and table as imports on the wasm file, why would I then want them to be re-expoted? I also have them, right? Its just add unnecessary exports.

At least for emscripten, I can't see a reason to both import and export.

@sbc100
Copy link
Member

sbc100 commented Aug 8, 2024

It seems like a reasonable solution here would be to say "A module must either import or export something called memory which will be the memory that wasi API pointers refer to".

For PIE/dynamic linking its up the embedder to create the memory before loading the module, and to set __memory_base and __table_base accordingly.

@yamt
Copy link
Contributor Author

yamt commented Aug 9, 2024

As the embedder, if I am providing the memory and table as imports on the wasm file, why would I then want them to be re-expoted?

"avoid being different from statically linked modules" is a good enough reason, IMO.

@yamt
Copy link
Contributor Author

yamt commented Aug 9, 2024

The memory for a given program must be defined in exactly one module, from which it is exported. All the other modules then import that memory.
Currently the way this works is the main module always defines and exports the memory and all other modules (i.e. all shared libraries import the memory). In this case libc.so should be importing its memory.

isn't it more consistent to make shared libraries export the imported memory as well? i feel it's better to have less special cases for shared libraries.

eg. otherwise caller.get_export("memory") style host function implementations will be unhappy.

@sbc100
Copy link
Member

sbc100 commented Aug 9, 2024

As the embedder, if I am providing the memory and table as imports on the wasm file, why would I then want them to be re-expoted?

"avoid being different from statically linked modules" is a good enough reason, IMO.

I'm not sure I agree. Adding a little complexity to the host seem better than ask all modules to redundantly export something to the host (if nothing else this add some unnecessary size to every single module).

yamt added a commit to yamt/tool-conventions that referenced this issue Sep 3, 2024
This is my attempt to capture the discussion in
WebAssembly#232
@yamt
Copy link
Contributor Author

yamt commented Sep 3, 2024

@sbc100 i tried to document it in #234. please take a look if/when you have time. thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants