Skip to content

Implement changes of function reference proposal #2562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

zherczeg
Copy link
Contributor

No description provided.

@zherczeg
Copy link
Contributor Author

@sbc100 @keithw since I don't know enough about the project, these are just some random changes in the parser.

The aim is parsing https://github.com/WebAssembly/function-references/blob/main/test/core/call_ref.wast

It is doing something (parsing correctly is surely an exaggeration) with call_ref $ii and ref null $ii. The next item is ref.null $ii. It looks like it will need API changes, since a token type will not be enough. Would adding an index to it be a good idea?

Could you check that these changes makes sense? There is a Type::Reference enum in the code, but function reference introduces two other enums, and does not use Reference. This is strange to me.

Thank you for the help.

@zherczeg
Copy link
Contributor Author

This patch goes to a new direction. The 32 bit type filed of Var is split into two 16 bit fields. The second can be used to store WebAssembly types (negative 7 bit numbers). This way Var can be used in a more generic way.

@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from c804a98 to c40c098 Compare March 13, 2025 12:53
@zherczeg
Copy link
Contributor Author

There is something I don't understand. There is a Reference = -0x15 in type.h. However, in
https://github.com/WebAssembly/function-references/blob/main/proposals/function-references/Overview.md
(ref ht) is defined as -0x1c and (ref null ht) is -0x1d. Is the old code obsolete?

The key change of the patch is that Var can also behave like Type, and transformation between the two is possible after names are resolved.

Btw, may I ask how actively maintained the wabt project?

@sbc100
Copy link
Member

sbc100 commented Mar 13, 2025

I believe ht refers to Heap Type which would not be needed for function references. I believe heap types refer to things like structs and arrays in the GC proposal.

@zherczeg
Copy link
Contributor Author

That is possible. Unfortunately not everything is described clearly in the text. For example, ref.null ht: [] -> [(ref null ht)] Which encoding is used for ht here?

https://github.com/WebAssembly/function-references/blob/main/proposals/function-references/Overview.md

@zherczeg
Copy link
Contributor Author

I have tried to track down the v8 implementation for ref.null.

https://github.com/v8/v8/blob/main/src/wasm/function-body-decoder-impl.h#L386

It calls read_heap_type. It is in the same file.

https://github.com/v8/v8/blob/main/src/wasm/function-body-decoder-impl.h#L217

Starts with:
decoder->read_i33v(pc, "heap type");

So it looks like it uses the s33 decoding which the specification mentioned.

@zherczeg zherczeg force-pushed the function_ref branch 5 times, most recently from c94e5a1 to a959acb Compare March 14, 2025 16:37
@zherczeg
Copy link
Contributor Author

I have some good news! This test is fully working now (including the interpreter):
https://github.com/WebAssembly/function-references/blob/main/test/core/call_ref.wast

Only 9 fails remained, they all look like CallRef syntax change related, so some tests need to be updated.

The patch is quite large. There are some parts which are questionable. I will comment about these parts later to help review.

@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from 78596d6 to b79aead Compare March 15, 2025 06:49
@@ -181,7 +181,7 @@ class BinaryReaderLogging : public BinaryReaderDelegate {
Result OnCatchExpr(Index tag_index) override;
Result OnCatchAllExpr() override;
Result OnCallIndirectExpr(Index sig_index, Index table_index) override;
Result OnCallRefExpr() override;
Result OnCallRefExpr(Type sig_type) override;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think type sounds better than Index, although it could be an Index.

Index,
Name,
};

struct Var {
// Var can represent variables or types.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a major internal change. It does not increase the size of Var. I think it is better than constructing a Var/Type pair.


Location loc;

private:
void Destroy();

VarType type_;
// Can be set to Type::Enum types, 0 represent no optional type.
int16_t opt_type_;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This range should be enough for a long time.

@@ -63,6 +63,7 @@ void WriteS32Leb128(Stream* stream, T value, const char* desc) {
size_t ReadU32Leb128(const uint8_t* p, const uint8_t* end, uint32_t* out_value);
size_t ReadU64Leb128(const uint8_t* p, const uint8_t* end, uint64_t* out_value);
size_t ReadS32Leb128(const uint8_t* p, const uint8_t* end, uint32_t* out_value);
size_t ReadS33Leb128(const uint8_t* p, const uint8_t* end, uint64_t* out_value);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec adds this new type, the output is uint64_t. It could be merged with ReadS32Leb128 by adding a bool* argument.

@@ -220,18 +221,6 @@ class SharedValidator {
Result OnUnreachable(const Location&);

private:
struct FuncType {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved to type checker.

@@ -46,6 +46,8 @@ class Type {
FuncRef = -0x10, // 0x70
ExternRef = -0x11, // 0x6f
Reference = -0x15, // 0x6b
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is completely missing from the proposal. Maybe it was removed at some point? Currently Ref and Reference are considered equal, and randomly used one or the other. I wanted to wait a feedback before doing anything with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In v8 it is kStructRefCode:
https://chromium.googlesource.com/v8/v8/+/refs/heads/main/src/wasm/wasm-constants.h#46

Probably used by GC. Probably kArrayRefCode will be needed as well.

Result BinaryReaderLogging::name(Type type) { \
LOGF(#name "(%s)\n", type.GetName().c_str()); \
return reader_->name(type); \
#define DEFINE_TYPE(name) \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this might be reverted. I will check.

@@ -897,7 +897,7 @@ Result BinaryReaderObjdumpDisassemble::OnOpcodeType(Type type) {
if (!in_function_body) {
return Result::Ok;
}
if (current_opcode == Opcode::SelectT) {
if (current_opcode == Opcode::SelectT || current_opcode == Opcode::CallRef) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe more opcodes could go here (ref.null?). This is confusing for me, the two toString variants works differently, although they could be merged.

@@ -288,6 +286,40 @@ size_t ReadS32Leb128(const uint8_t* p,
}
}

size_t ReadS33Leb128(const uint8_t* p,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These readers don't check that the minimum amount of bytes is used for encoding. This is important for utf, not sure for leb.

@@ -545,6 +546,17 @@ Result ResolveFuncTypes(Module* module, Errors* errors) {
}

if (func) {
if (!func->local_type_list.empty()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of a bugfix. Named references has been never resolved for local types. Maybe we could do it on the compressed list, but that can be done in another patch.

@zherczeg zherczeg force-pushed the function_ref branch 4 times, most recently from 6587cf7 to 0f7353f Compare March 16, 2025 04:16
@@ -0,0 +1,19 @@
;;; TOOL: run-interp-spec
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out that call_ref.wast is already in the repository, and it is passing now. There are other tests in the directory (e.g. binary.wast) which is also passing now, they could be updated later as well.

@zherczeg zherczeg marked this pull request as ready for review March 16, 2025 04:37
@zherczeg
Copy link
Contributor Author

I think the patch is ready. There are lots of follow-up work ahead.

@@ -189,13 +189,32 @@ void ScriptValidator::PrintError(const Location* loc, const char* format, ...) {
errors_->emplace_back(ErrorLevel::Error, *loc, buffer);
}

static Result CheckType(Type actual, Type expected) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this is only used by comparing expected results. This means "module" context is not present, and expected values are immediate values. So maybe an even restricted checker could work.

@zherczeg zherczeg force-pushed the function_ref branch 3 times, most recently from 58411a5 to bf0400f Compare March 18, 2025 05:06
@zherczeg zherczeg force-pushed the function_ref branch 3 times, most recently from 2c100eb to f4dec03 Compare March 25, 2025 13:53
@@ -282,7 +284,7 @@ Result SharedValidator::OnElemSegmentElemType(const Location& loc,
if (elem.is_active) {
// Check that the type of the elem segment matches the table in which
// it is active.
result |= CheckType(loc, elem.table_type, elem_type, "elem segment");
result |= CheckType(loc, elem_type, elem.table_type, "elem segment");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just found this bug. The actual/expected arguments are swapped. The tests are updated accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this is pre-existing bug? If so maybe it can be split out into a separate PR?

@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from 1ce2902 to 0f93fb0 Compare March 26, 2025 03:34
@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from 7da12b4 to dae2b75 Compare April 4, 2025 09:52
@zherczeg
Copy link
Contributor Author

zherczeg commented Apr 4, 2025

Patch is rebased on the top of #2571 Thanks to the fixes, it is now < 900 lines of new code including test changes.

@zherczeg zherczeg force-pushed the function_ref branch 3 times, most recently from 3e533cb to 15fb9eb Compare April 4, 2025 13:08
@zherczeg zherczeg force-pushed the function_ref branch 4 times, most recently from 7de4b28 to 494ceb1 Compare April 15, 2025 04:56
@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from 14ba901 to 3ee857b Compare April 17, 2025 09:51
@zherczeg
Copy link
Contributor Author

This patch now fails with a fuzzer error. Reason:

https://github.com/WebAssembly/wabt/blob/main/fuzzers/read_binary_interp_fuzzer.cc#L37
The comment explicitly says that stop_on_first_error must be false.

However:
https://github.com/WebAssembly/wabt/blob/main/src/interp/binary-reader-interp.cc#L634
This check can fail now. The failed fuzzer test has a (ref 5) global type, and there is no type 5, so this is correct. Then the global is not added to module_.globals. Furthermore, the binary parsing does not stop as well.

This time the exports are parsed and this line crashes:
https://github.com/WebAssembly/wabt/blob/main/src/interp/binary-reader-interp.cc#L634

The validator_.OnExport does not fail, since it thinks 21 globals are available, although the parsing is failed after the first global, and all globals are skipped.

@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from d655b40 to acc1b66 Compare April 25, 2025 05:36
@zherczeg
Copy link
Contributor Author

I have a question. In this variant of the patch, Globals / Tables / etc. uses Type structures, and the parser may update the named references, and validate the index references. However, call_ref and ref.null uses Var structures. I don't know how types should be handled in general, since many WebAssembly structures uses Var, but types are optionally use references.

@zherczeg zherczeg force-pushed the function_ref branch 2 times, most recently from ea9bd86 to 09523fe Compare April 28, 2025 04:21
@zherczeg
Copy link
Contributor Author

This is the next patch for supporting the function references proposal. It focuses on improving the wabt supported elements of WebAssembly. It is a feature patch now, the bugfixes were extracted and landed as separate patches.

The key concept changes of this patch:

  • Type comparison (type-checker) requires the types hashmap (shared-validator). This circular dependency can be broken by a new type comparison class, which is used by both type-checker and shared-validator.

  • Vars can optionally store types now without increasing its structure size. This simplify passing information, but does not necessary fit for the design concepts of the wabt project.

  • RefNull and CallRef use (the new) Vars, instead of types. Many structures uses Vars in the internal representation, so this is not necessary new. However, using Vars for types is something new and debatable.

Please let me know your opinion about these design concepts. Since the parser can store pointers to types, and check/resolve them later, it is possible to use types only. But using Vars directly is not necessary a problem.

Support named references for globals, locals, tables, elems
Support named references for call_ref, ref_null
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants