Encoding multiple sections for JavaScript to read #128
Duplicating the sections feels very weird. It's also probably worse to examine a module with O(N) sections using binary-examining tools like `wasm-objdump` or LLVM's disassembler. We also can't assume gzip will fix our redundancies, because we still need to decode the wasm module into memory. Given that the only data we want to encode here is a list of strings, JSON seems like overkill. And given that we want to run some JS to instrument these specified exports, we are already adding more code than a one-liner. So I think, given the constraints, a trivial binary format is the right way to go: custom section preamble, a single LEB count of how many names are in the section, then for each name an LEB for the name length, followed by that many bytes of string data.
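The encode side of that proposed payload (a count, then length-prefixed names) is similarly small. A hypothetical sketch, for illustration only: the section preamble itself would be written by the producer toolchain, and the function names here are mine, not from any existing tool.

```javascript
// Encode an unsigned integer as little-endian LEB128 bytes.
function encodeLEB(value) {
  const bytes = [];
  do {
    let byte = value & 0x7f;
    value >>>= 7;
    if (value !== 0) byte |= 0x80; // high bit set: more bytes follow
    bytes.push(byte);
  } while (value !== 0);
  return bytes;
}

// Encode the section payload: LEB count, then per-name LEB length + UTF-8 bytes.
function encodePayload(names) {
  const encoder = new TextEncoder();
  const out = [...encodeLEB(names.length)];
  for (const name of names) {
    const utf8 = encoder.encode(name);
    out.push(...encodeLEB(utf8.length), ...utf8);
  }
  return new Uint8Array(out);
}
```

For two short names like `["foo", "bar"]` this produces 9 bytes: one count byte plus one length byte and three data bytes per name.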
On one hand, I somewhat agree about the "feels" part; on the other, the fact that it's allowed and even has a dedicated API (`WebAssembly.customSections`) for enumerating duplicates suggests it's an intended pattern.
One-liner was referring specifically to the decoding part: the actual wrapping is pretty simple in our case, to the point where implementing or pulling in something for binary decoding would probably end up larger and more complicated than the actual library code — hence JSON as an alternative (since it can at least be decoded natively in JS).
Agree. I'm just thinking the ratio of section overhead to payload here isn't worth it. Each section will just have a single string name, so for every name we have 11 bytes of extra memory (1 byte for section ID, 1 byte for section length, 1 byte for custom section name length, 8 bytes for "asyncify" string). Ballparking (based on nothing) the average export name length at 10-20 chars, we're using 1.5-2x the space with this scheme. It's possible that that's fine, but the O(N) extra space makes me more nervous than O(1) schemes, whether that's JSON or custom.
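A quick sanity check of that 1.5-2x ballpark (just arithmetic, nothing tool-specific):

```javascript
// Per-export overhead of the one-section-per-name scheme:
// section ID + section length + custom-section name length + "asyncify" string.
const overheadPerSection = 1 + 1 + 1 + 8; // 11 bytes

// Total size relative to just the name, for 10- and 20-char export names:
for (const nameLen of [10, 20]) {
  console.log(`${nameLen}-char name: ${(overheadPerSection + nameLen) / nameLen}x`);
}
// 10-char names cost 2.1x, 20-char names 1.55x — matching the 1.5-2x estimate.
```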
Right, what I mean is more that "100 bytes of overhead in a 200-byte file is worse than 100 bytes in a 10 MB file": how much the extra work and code matters is proportional to the effort and size of the rest of the program.
I think it's pretty trivial if the format is simple enough. Pulling and tweaking some LEB-decoding code:

```js
const textDecoder = new TextDecoder();
const result = [];
// m is the compiled WebAssembly.Module; customSections returns ArrayBuffers,
// so wrap each one in a Uint8Array for byte-level access.
WebAssembly.customSections(m, "asyncify").forEach(buffer => {
  const section = new Uint8Array(buffer);
  let idx = 0;
  function readLEB() {
    const bytes = [];
    while (true) {
      const byte = section[idx];
      idx++;
      bytes.push(byte & 0x7f);
      if (!(byte & 0x80)) {
        break;
      }
    }
    // Bytes are stored little-endian, so read them in significance order
    let val = 0;
    for (let i = 0; i < bytes.length; ++i) {
      const byte = bytes[bytes.length - i - 1];
      val <<= 7;
      val |= byte & 0x7f;
    }
    return val;
  }
  // Actual parsing logic: a count, then length-prefixed UTF-8 names
  const countStrings = readLEB();
  for (let i = 0; i < countStrings; ++i) {
    const len = readLEB();
    const b = section.subarray(idx, idx + len);
    idx += len;
    result.push(textDecoder.decode(b));
  }
});
```

So because we're talking about bytes, this is 789 characters. Throwing it at a minifier (https://javascript-minifier.com) squishes it down to 340. If this avoids 11 bytes per export, it pays for itself after ~31 asyncified exports. This is less zippable, but the resident memory impact should be similar. So it boils down to how many asyncified exports we expect on average, and how much we're willing to pay in the small cases. FWIW, the only reason I think JSON isn't fine here is the need to add escaping logic to the generators (it's also 3 bytes per additional name instead of 1...). It's also possible to just not have that logic in the producer and assert that those characters aren't part of the export names; C++ tools won't generate such names anyway.
@jgravelle-google These are good points. I agree that the ratio of section name to contents doesn't look great in this case. As I said, my hope is/was that gzip/Brotli would alleviate the file-size concern, and for decoding, most parsers skip over custom sections relatively quickly. That said, I'm inclined to change it to a single section myself for this case; I just want to do some quick measurements of the JSON vs. LEB128 approaches first.
This discussion originated in WebAssembly/binaryen#2327, and @kripken suggested discussing it with a wider audience to see if there's any consensus.
We want to add a custom tool-specific section (or sections) that can be easily read from JavaScript and, in our case, would contain the names of exported functions. The JavaScript wrapper would read these names and instrument the corresponding exports.
Currently, I'm emitting a same-named custom section (called `"asyncify"`) per each such export. My motivation was that the `WebAssembly.customSections(module, "asyncify")` API already allows easy enumeration of all such sections, which suggests this wasn't a coincidence and that it's okay to duplicate sections when it makes sense. Other potential alternatives could be either a single section containing a JSON-encoded list of names, or a single section with a custom binary encoding.
The current solution seems to be the best of both worlds in that it's both very easy to encode on the C++ side and very easy to decode on the JS side; the only concern is that it might not be very idiomatic to have duplicate sections.
Would love to hear any thoughts.