Skip to content

[Strings] Make string a subtype of ext, not any #7373

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Mar 20, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,9 @@ Current Trunk
- Add an option to preserve imports and exports in the fuzzer (for fuzzer
harnesses where they only want Binaryen to modify their given testcases, not
generate new things in them).
- `string` is now a subtype of `ext` (rather than `any`). This allows better
transformations for strings, like an inverse of StringLowering, but will
error on codebases that depend on being able to pass strings into anyrefs.

v122
----
Expand Down
9 changes: 9 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,13 @@ There are a few differences between Binaryen IR and the WebAssembly language:
Binaryen IR is more structured than WebAssembly as noted earlier). Note
that Binaryen does support unreachable code in .wat text files, since as we
saw Binaryen only supports s-expressions there, which are structured.
* Binaryen supports a `stringref` type. This is similar to the currently-
frozen [stringref proposal], with the difference that the string type is a
subtype of `externref` rather than `anyref`. Doing so allows toolchains to
emit code in a form that uses [js string builtins] which Binaryen can then
"lift" into stringref in its internal IR, optimize (for example, a
concatenation of "a" and "b" can be optimized at compile time to "ab"), and
then "lower" that into js string builtins once more.
* Blocks
* Binaryen IR has only one node that contains a variable-length list of
operands: the block. WebAssembly on the other hand allows lists in loops,
Expand Down Expand Up @@ -1039,3 +1046,5 @@ Windows and OS X as most of the core devs are on Linux.
[minification]: https://kripken.github.io/talks/binaryen.html#/2
[unreachable]: https://github.com/WebAssembly/binaryen/issues/903
[binaryen_ir]: https://github.com/WebAssembly/binaryen/issues/663
[stringref proposal]: https://github.com/WebAssembly/stringref/blob/main/proposals/stringref/Overview.md
[js string builtins]: https://github.com/WebAssembly/js-string-builtins/blob/main/proposals/js-string-builtins/Overview.md
1 change: 1 addition & 0 deletions scripts/bundle_clusterfuzz.py
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,7 @@
'--disable-shared-everything',
'--disable-fp16',
'--disable-custom-descriptors',
'--disable-strings',
]

with tarfile.open(output_file, "w:gz") as tar:
Expand Down
1 change: 1 addition & 0 deletions scripts/clusterfuzz/run.py
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,7 @@
'--disable-shared-everything',
'--disable-fp16',
'--disable-custom-descriptors',
'--disable-strings',
]


Expand Down
13 changes: 8 additions & 5 deletions scripts/fuzz_opt.py
Original file line number Diff line number Diff line change
Expand Up @@ -151,10 +151,12 @@ def randomize_feature_opts():

# The shared-everything feature is new and we want to fuzz it, but it
# also currently disables fuzzing V8, so disable it most of the time.
# Same with custom descriptors.
# Same with custom descriptors and strings - all these cannot be run in
# V8 for now.
if random.random() < 0.9:
FEATURE_OPTS.append('--disable-shared-everything')
FEATURE_OPTS.append('--disable-custom-descriptors')
FEATURE_OPTS.append('--disable-strings')

print('randomized feature opts:', '\n ' + '\n '.join(FEATURE_OPTS))

Expand Down Expand Up @@ -815,8 +817,9 @@ def run(self, wasm, extra_d8_flags=[]):
def can_run(self, wasm):
# V8 does not support shared memories when running with
# shared-everything enabled, so do not fuzz shared-everything
# for now. It also does not yet support custom descriptors.
return all_disallowed(['shared-everything', 'custom-descriptors'])
# for now. It also does not yet support custom descriptors, nor
# strings.
return all_disallowed(['shared-everything', 'custom-descriptors', 'strings'])

def can_compare_to_self(self):
# With nans, VM differences can confuse us, so only very simple VMs
Expand Down Expand Up @@ -1573,7 +1576,7 @@ def can_run_on_wasm(self, wasm):
return False

# see D8.can_run
return all_disallowed(['shared-everything', 'custom-descriptors'])
return all_disallowed(['shared-everything', 'custom-descriptors', 'strings'])


# Check that the text format round-trips without error.
Expand Down Expand Up @@ -1758,7 +1761,7 @@ def can_run_on_wasm(self, wasm):
return False
if NANS:
return False
return all_disallowed(['shared-everything', 'custom-descriptors'])
return all_disallowed(['shared-everything', 'custom-descriptors', 'strings'])


# Test --fuzz-preserve-imports-exports, which never modifies imports or exports.
Expand Down
3 changes: 3 additions & 0 deletions src/literal.h
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,9 @@ class Literal {
// Externalized i31 references have a gcData containing the internal i31
// reference as its sole value even though internal i31 references do not
// have a gcData.
//
// Note that strings can be internalized, in which case they keep the same
// gcData, but their type becomes anyref.
std::shared_ptr<GCData> gcData;
// A reference to Exn data.
std::shared_ptr<ExnData> exnData;
Expand Down
12 changes: 10 additions & 2 deletions src/tools/execution-results.h
Original file line number Diff line number Diff line change
Expand Up @@ -303,11 +303,19 @@ struct ExecutionResults {
}

void printValue(Literal value) {
// Unwrap an externalized value to get the actual value.
if (Type::isSubType(value.type, Type(HeapType::ext, Nullable))) {
// Unwrap an externalized GC value to get the actual value, but not strings,
// which are normally a subtype of ext.
if (Type::isSubType(value.type, Type(HeapType::ext, Nullable)) &&
!value.type.isString()) {
value = value.internalize();
}

// An anyref literal is a string.
if (value.type.isRef() &&
value.type.getHeapType().isMaybeShared(HeapType::any)) {
value = value.externalize();
}

// Don't print most reference values, as e.g. funcref(N) contains an index,
// which is not guaranteed to remain identical after optimizations. Do not
// print the type in detail (as even that may change due to closed-world
Expand Down
27 changes: 14 additions & 13 deletions src/tools/fuzzing/fuzzing.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3399,6 +3399,10 @@ Expression* TranslateToFuzzReader::makeBasicRef(Type type) {
auto share = heapType.getShared();
switch (heapType.getBasic(Unshared)) {
case HeapType::ext: {
if (wasm.features.hasStrings() && share == Unshared && oneIn(2)) {
// Shared strings not yet supported.
return makeConst(Type(HeapType::string, NonNullable));
}
auto null = builder.makeRefNull(HeapTypes::ext.getBasic(share));
// TODO: support actual non-nullable externrefs via imported globals or
// similar.
Expand Down Expand Up @@ -3429,10 +3433,6 @@ Expression* TranslateToFuzzReader::makeBasicRef(Type type) {
HeapType::i31,
HeapType::struct_,
HeapType::array);
if (share == Unshared) {
// Shared strings not yet supported.
subtypeOpts.add(FeatureSet::Strings, HeapType::string);
}
auto subtype = pick(subtypeOpts).getBasic(share);
return makeConst(Type(subtype, nullability));
}
Expand Down Expand Up @@ -5376,11 +5376,16 @@ HeapType TranslateToFuzzReader::getSubType(HeapType type) {
.getBasic(share);
case HeapType::cont:
return pick(HeapTypes::cont, HeapTypes::nocont).getBasic(share);
case HeapType::ext:
return pick(FeatureOptions<HeapType>()
.add(FeatureSet::ReferenceTypes, HeapType::ext)
.add(FeatureSet::GC, HeapType::noext))
.getBasic(share);
case HeapType::ext: {
auto options = FeatureOptions<HeapType>()
.add(FeatureSet::ReferenceTypes, HeapType::ext)
.add(FeatureSet::GC, HeapType::noext);
if (share == Unshared) {
// Shared strings not yet supported.
options.add(FeatureSet::Strings, HeapType::string);
}
return pick(options).getBasic(share);
}
case HeapType::any: {
assert(wasm.features.hasReferenceTypes());
assert(wasm.features.hasGC());
Expand All @@ -5391,10 +5396,6 @@ HeapType TranslateToFuzzReader::getSubType(HeapType type) {
HeapType::struct_,
HeapType::array,
HeapType::none);
if (share == Unshared) {
// Shared strings not yet supported.
options.add(FeatureSet::Strings, HeapType::string);
}
return pick(options).getBasic(share);
}
case HeapType::eq:
Expand Down
2 changes: 1 addition & 1 deletion src/tools/fuzzing/heap-types.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -510,7 +510,7 @@ struct HeapTypeGeneratorImpl {
candidates.push_back(HeapTypes::any.getBasic(share));
break;
case HeapType::string:
candidates.push_back(HeapTypes::any.getBasic(share));
candidates.push_back(HeapTypes::ext.getBasic(share));
break;
case HeapType::none:
return pickSubAny(share);
Expand Down
27 changes: 23 additions & 4 deletions src/wasm/literal.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -76,10 +76,13 @@ Literal::Literal(std::shared_ptr<GCData> gcData, HeapType type)
type(type, gcData ? NonNullable : Nullable, gcData ? Inexact : Exact) {
// TODO: Use exact types for more than just nulls.
// The type must be a proper type for GC data: either a struct, array, or
// string; or an externalized version of the same; or a null.
// string; or an externalized version of the same; or a null; or an
// internalized string (which appears as an anyref).
assert((isData() && gcData) ||
(type.isMaybeShared(HeapType::ext) && gcData) ||
(type.isBottom() && !gcData));
(type.isBottom() && !gcData) ||
(type.isMaybeShared(HeapType::any) && gcData &&
gcData->type.isMaybeShared(HeapType::string)));
}

Literal::Literal(std::shared_ptr<ExnData> exnData)
Expand Down Expand Up @@ -153,6 +156,11 @@ Literal::Literal(const Literal& other) : type(other.type) {
case HeapType::nocont:
WASM_UNREACHABLE("null literals should already have been handled");
case HeapType::any:
// This must be an anyref literal, which is an internalized string.
assert(other.gcData &&
other.gcData->type.isMaybeShared(HeapType::string));
new (&gcData) std::shared_ptr<GCData>(other.gcData);
return;
case HeapType::eq:
case HeapType::func:
case HeapType::cont:
Expand All @@ -169,7 +177,8 @@ Literal::~Literal() {
if (type.isBasic()) {
return;
}
if (isNull() || isData() || type.getHeapType().isMaybeShared(HeapType::ext)) {
if (isNull() || isData() || type.getHeapType().isMaybeShared(HeapType::ext) ||
type.getHeapType().isMaybeShared(HeapType::any)) {
gcData.~shared_ptr();
} else if (isExn()) {
exnData.~shared_ptr();
Expand Down Expand Up @@ -652,13 +661,14 @@ std::ostream& operator<<(std::ostream& o, Literal literal) {
case HeapType::exn:
o << "exnref";
break;
case HeapType::any:
case HeapType::eq:
case HeapType::func:
case HeapType::cont:
case HeapType::struct_:
case HeapType::array:
WASM_UNREACHABLE("invalid type");
case HeapType::any:
// Anyref literals contain strings.
case HeapType::string: {
auto data = literal.getGCData();
if (!data) {
Expand Down Expand Up @@ -2868,6 +2878,11 @@ Literal Literal::externalize() const {
return Literal(std::make_shared<GCData>(heapType, Literals{*this}),
extType);
}
if (heapType.isMaybeShared(HeapType::any)) {
// Anyref literals turn into strings (if we add any other anyref literals,
// we will need to be more careful here).
return Literal(gcData, HeapTypes::string.getBasic(share));
}
return Literal(gcData, extType);
}

Expand All @@ -2883,6 +2898,10 @@ Literal Literal::internalize() const {
assert(gcData->values[0].type.getHeapType().isMaybeShared(HeapType::i31));
return gcData->values[0];
}
if (gcData->type.isMaybeShared(HeapType::string)) {
// Strings turn into anyref literals.
return Literal(gcData, HeapTypes::any.getBasic(share));
}
return Literal(gcData, gcData->type);
}

Expand Down
22 changes: 17 additions & 5 deletions src/wasm/wasm-type.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -407,6 +407,11 @@ std::optional<HeapType> getBasicHeapTypeLUB(HeapType::BasicHeapType a,
HeapType lubUnshared;
switch (HeapType(a).getBasic(Unshared)) {
case HeapType::ext:
if (bUnshared != HeapType::string) {
return std::nullopt;
}
lubUnshared = HeapType::ext;
break;
case HeapType::func:
case HeapType::cont:
case HeapType::exn:
Expand Down Expand Up @@ -437,9 +442,14 @@ std::optional<HeapType> getBasicHeapTypeLUB(HeapType::BasicHeapType a,
}
break;
case HeapType::array:
case HeapType::string:
lubUnshared = HeapType::any;
break;
case HeapType::string:
// String has already been handled: we sorted before in a way that ensures
// the type the string is compared to is of a higher index, which means it
// is a bottom type (string is the last type that is not a bottom type),
// but we have handled the case of either a or b being a bottom type
// earlier already.
case HeapType::none:
case HeapType::noext:
case HeapType::nofunc:
Expand Down Expand Up @@ -953,8 +963,9 @@ std::optional<HeapType> HeapType::getSuperType() const {
case none:
case exn:
case noexn:
case string:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes! This doesn't look like it was correct before, since it should have returned any. Do you know how this wasn't caught by tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test was just wrong, but it agreed with the code 😄

See line 1683 in the gtest modified in this PR.

return {};
case string:
return HeapType(ext).getBasic(share);
case eq:
return HeapType(any).getBasic(share);
case i31:
Expand Down Expand Up @@ -1021,12 +1032,12 @@ size_t HeapType::getDepth() const {
case HeapType::exn:
break;
case HeapType::eq:
case HeapType::string:
depth++;
break;
case HeapType::i31:
case HeapType::struct_:
case HeapType::array:
case HeapType::string:
depth += 2;
break;
case HeapType::none:
Expand Down Expand Up @@ -1070,9 +1081,9 @@ HeapType::BasicHeapType HeapType::getUnsharedBottom() const {
case i31:
case struct_:
case array:
case string:
case none:
return none;
case string:
case noext:
return noext;
case nofunc:
Expand Down Expand Up @@ -1530,8 +1541,9 @@ bool SubTyper::isSubType(HeapType a, HeapType b) {
aUnshared == HeapType::struct_ || aUnshared == HeapType::array ||
a.isStruct() || a.isArray();
case HeapType::i31:
case HeapType::string:
return aUnshared == HeapType::none;
case HeapType::string:
return aUnshared == HeapType::noext;
case HeapType::struct_:
return aUnshared == HeapType::none || a.isStruct();
case HeapType::array:
Expand Down
Loading