Context
Currently the interpreter hot loop does:
But it could be simplified to something like:
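As a rough illustration of the two loop shapes being compared, here is a minimal sketch. It is not the emulator's actual code: `execute_one`, `run_count_up`, and `run_count_down` are hypothetical names, and the instruction body is a placeholder.

```cpp
#include <cstdint>

// Hypothetical stand-in for executing one guest instruction.
inline void execute_one(uint64_t &acc) {
    acc += 3;
}

// Shape 1: count mcycle up and compare it against the tick end.
// Both mcycle and mcycle_tick_end stay live inside the loop.
uint64_t run_count_up(uint64_t mcycle, uint64_t mcycle_tick_end, uint64_t &acc) {
    while (mcycle < mcycle_tick_end) {
        execute_one(acc);
        ++mcycle;
    }
    return mcycle;
}

// Shape 2: compute the remaining cycles once, then count down to zero.
// Only a single counter is live inside the loop.
uint64_t run_count_down(uint64_t mcycle, uint64_t mcycle_tick_end, uint64_t &acc) {
    if (mcycle >= mcycle_tick_end) {
        return mcycle; // nothing to run
    }
    uint64_t remaining = mcycle_tick_end - mcycle;
    do {
        execute_one(acc);
    } while (--remaining != 0);
    return mcycle_tick_end;
}
```

In the count-down form the loop condition depends only on the decrement, so a compiler can typically branch on the flags of the subtract instead of emitting a separate compare, and `mcycle_tick_end` no longer needs a register inside the loop.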
This may reduce 1 instruction in the interpreter's hot inner loop for both `amd64`/`arm64` (by using the `SUB` instruction), see https://godbolt.org/z/MvPGYscaP as a PoC. But to do this, I will need to stop propagating `mcycle` on every memory access instruction, and maybe introduce an `mtime` CSR that gets incremented every RTC tick, in order to remove the need to propagate `mcycle` to the client device when using `rtc_cycle_to_time(a->read_mcycle())`.
Furthermore, this will free up a register currently reserved for `mcycle_tick_end`, making it usable inside the interpreter's hot loop, allowing the optimizer to perform better register allocation inside the hot loop.
When doing this, it's worth experimenting with increasing `RTC_FREQ_DIV_DEF` from `8192` to `16384`, since the interpreter outer loop will start performing a write to `mtime` every tick. Also, because the interpreter recently got 2x speedups, to the point where time inside the machine is advancing too fast when doing intensive computations, ideally the RTC frequency should have a value that attempts to make time pass closer to what would pass in the host.
This idea is something I've had for a while, and it has been briefly discussed internally. I am writing it down as an issue so I do not forget to attempt it someday.
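The `mtime` idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `state` struct, `cycle_to_time`, and `run` are hypothetical, the divider value is the `8192` figure quoted above, and the conversion is assumed to be a plain division of `mcycle` by the divider.

```cpp
#include <cstdint>

// Assumed divider, taken from the RTC_FREQ_DIV_DEF value quoted in the text.
constexpr uint64_t RTC_FREQ_DIV = 8192;

// Hypothetical machine state: only the two counters relevant here.
struct state {
    uint64_t mcycle = 0;
    uint64_t mtime = 0;
};

// Assumed conversion: one RTC tick every RTC_FREQ_DIV cycles.
constexpr uint64_t cycle_to_time(uint64_t mcycle) {
    return mcycle / RTC_FREQ_DIV;
}

// Outer loop advances one tick (or partial tick) at a time;
// the inner loop only counts remaining cycles down to zero.
void run(state &s, uint64_t mcycle_end) {
    while (s.mcycle < mcycle_end) {
        // End of the current tick, clamped to the requested end.
        uint64_t tick_end = (s.mcycle / RTC_FREQ_DIV + 1) * RTC_FREQ_DIV;
        if (tick_end > mcycle_end) {
            tick_end = mcycle_end;
        }
        uint64_t remaining = tick_end - s.mcycle; // always > 0 here
        do {
            // Execute one guest instruction (omitted in this sketch).
        } while (--remaining != 0);
        // mcycle and mtime are written once per outer iteration,
        // never per instruction.
        s.mcycle = tick_end;
        s.mtime = cycle_to_time(s.mcycle);
    }
}
```

With this shape a device would read `s.mtime` directly instead of deriving time from `mcycle`. Doubling the divider to `16384` would halve the number of outer iterations (and `mtime` writes) for the same cycle count, and would also make guest time advance half as fast per cycle.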