Fix: Regression in benchmarks JetStream, ARES, speedometer #1691 #1695

Open
justinmichaud wants to merge 1 commit into WebPlatformForEmbedded:wpe-2.46 from justinmichaud:eng/2.46-perf-fixes

@justinmichaud justinmichaud commented Jul 2, 2026

#1691

This patch makes some simple fixes to improve 32-bit performance on 2.46.

  1. Disable concurrent JIT if we cannot write 64-bit values atomically.

Most of the chips we run on have no problem doing 64-bit atomic writes, and in that case, we completely recover the perf lost from enabling concurrent JIT (and in fact gain perf).

We detect LPAE, and if it is present, skip the store barriers that were previously needed to guarantee we didn't dereference a bad cell.
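As a rough illustration of the detection step, the sketch below checks for the `lpae` flag on the `Features` line of `/proc/cpuinfo`-style text, which is where 32-bit ARM Linux reports it. This is only an assumption about how such a check could look; the names `cpuinfoHasFeature`, `shouldEnableConcurrentJIT`, and `readCpuinfo` are hypothetical and are not taken from the patch, which may detect LPAE differently.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `feature` appears as a whole word on a
// "Features" line of /proc/cpuinfo-style text.
bool cpuinfoHasFeature(const std::string& cpuinfoText, const std::string& feature)
{
    std::istringstream lines(cpuinfoText);
    std::string line;
    while (std::getline(lines, line)) {
        // Only consider lines that start with "Features".
        if (line.rfind("Features", 0) != 0)
            continue;
        auto colon = line.find(':');
        if (colon == std::string::npos)
            continue;
        // Split the flag list on whitespace and compare whole words.
        std::istringstream words(line.substr(colon + 1));
        std::string word;
        while (words >> word) {
            if (word == feature)
                return true;
        }
    }
    return false;
}

// Hypothetical policy: allow concurrent JIT only when LPAE is reported, since
// that is the case in which 64-bit aligned stores are treated as atomic.
bool shouldEnableConcurrentJIT(const std::string& cpuinfoText)
{
    return cpuinfoHasFeature(cpuinfoText, "lpae");
}

// Reads the real /proc/cpuinfo on Linux; yields an empty string if unavailable.
std::string readCpuinfo()
{
    std::ifstream file("/proc/cpuinfo");
    std::stringstream buffer;
    buffer << file.rdbuf();
    return buffer.str();
}
```

A caller would use `shouldEnableConcurrentJIT(readCpuinfo())` once at startup. Matching whole words avoids false positives from flags that merely contain the substring.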

  2. Clear value profiles

The 0 value is not the empty JSValue, which was polluting our profiles. We now clear them manually, and our profiling works much better.
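To show why this matters, here is a minimal sketch of a tag/payload encoding in which all-zero bits do not decode as "empty". The `EncodedValue` and `ValueProfileSketch` types and the `kEmptyTag` constant are illustrative assumptions for this example only, not the actual JavaScriptCore definitions.

```cpp
#include <array>
#include <cstdint>

// Assumed constant for this sketch: the tag that marks a slot as empty.
constexpr uint32_t kEmptyTag = 0xfffffffau;

// Illustrative 32-bit style encoding: a value is a (payload, tag) pair.
struct EncodedValue {
    uint32_t payload;
    uint32_t tag;
};

constexpr bool isEmpty(EncodedValue v) { return v.tag == kEmptyTag; }

// Hypothetical profile with a few sample buckets.
struct ValueProfileSketch {
    // Value-initialized, so every bucket starts as all-zero bits.
    std::array<EncodedValue, 4> buckets {};

    // Explicitly write the empty encoding into every bucket.
    void clear()
    {
        for (auto& bucket : buckets)
            bucket = EncodedValue { 0, kEmptyTag };
    }

    // Counts buckets that look like they hold a recorded value.
    unsigned numberOfSamples() const
    {
        unsigned count = 0;
        for (auto bucket : buckets)
            count += !isEmpty(bucket);
        return count;
    }
};
```

In this sketch a freshly zeroed profile reports every bucket as a sample, and only after `clear()` does it report none, which is the kind of pollution the explicit clearing avoids.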

  3. Tune various options

I tuned many thresholds based on what worked on my machine. This probably requires a second round of tuning on a smaller device.

  4. Turn off SIMD

SIMD::find is a regression on 32-bit due to some fallback paths.

Overall, on my Neoverse N1, 2.46 is 26% faster than 2.38 after this patch when excluding wasm subtests. Some subtests are still regressed and worth investigating, with regressions as large as 11%.

93f3188

| Build-Tests | Layout-Tests |
| --- | --- |
| ✅ 🛠 wpe-246-amd64-build | ✅ 🧪 wpe-246-amd64-layout |
| ✅ 🛠 wpe-246-arm32-build | ❌ 🧪 wpe-246-arm32-layout |

finish(dst.withOffset(TagOffset));
// CJIT is only enabled when LPAE is enabled (such as for armv8l). In this case,
// 64-bit aligned stores are atomic: https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architecture/Application-Level-Memory-Model/Memory-types-and-attributes-and-the-memory-order-model/Atomicity-in-the-ARM-architecture
// > In an implementation that includes the Large Physical Address Extension, LDRD and STRD accesses to 64-bit aligned locations are 64-bit single-copy atomic as seen by translation table walks and accesses to translation tables.

FWIW, I think the part we want to quote here is

> The system designer must ensure that all writable memory locations that might be used to hold translations, such as bulk SDRAM, can be accessed with 64-bit single-copy atomicity.

(later in the same document)

});
}
}

Ouch.

@aoikonomopoulos

Looks great to me.
