Opportunity to save 1 instruction from mcycle checks #308

Open
edubart opened this issue Jan 25, 2025 · 1 comment · May be fixed by #309
Labels: optimization

edubart commented Jan 25, 2025

Context

Currently the interpreter hot loop does:

while (mcycle < mcycle_tick_end) {
    // Fetch, decode, execute
    mcycle++;
}

But it could be simplified to something like:

uint64_t remaining = mcycle_tick_end - mcycle;
mcycle += remaining;
for (; remaining > 0; remaining--) {
    // Fetch, decode, execute
}

This may save one instruction in the interpreter's hot inner loop on both amd64 and arm64, because the countdown can be compiled to a single SUB whose result also drives the loop branch, instead of an increment followed by a separate compare; see https://godbolt.org/z/MvPGYscaP for a PoC. To do this, I will need to stop propagating mcycle on every memory access instruction, and maybe introduce an mtime CSR that gets incremented on every RTC tick, so that mcycle no longer has to be propagated to client devices that call rtc_cycle_to_time(a->read_mcycle()).

Furthermore, this will free up the register currently reserved for mcycle_tick_end, giving the optimizer one more register to allocate inside the interpreter's hot loop.
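To make the equivalence of the two loop shapes concrete, here is a minimal sketch. The Machine struct and its step() method are hypothetical stand-ins for the real interpreter state and the "fetch, decode, execute" body. The guard in run_countdown is my own addition: without it, the unsigned subtraction would wrap around whenever mcycle is already at or past mcycle_tick_end.

```cpp
#include <cstdint>

// Hypothetical stand-in for the interpreter state; step() replaces
// "fetch, decode, execute" and only counts how often it was called.
struct Machine {
    uint64_t mcycle = 0;
    uint64_t executed = 0;
    void step() { ++executed; }
};

// Current form: mcycle is incremented and compared on every iteration.
void run_compare(Machine &m, uint64_t mcycle_tick_end) {
    while (m.mcycle < mcycle_tick_end) {
        m.step();
        m.mcycle++;
    }
}

// Proposed form: advance mcycle once up front, then count down to zero.
void run_countdown(Machine &m, uint64_t mcycle_tick_end) {
    // Assumed guard: avoid unsigned wrap-around when there is nothing to run.
    if (m.mcycle >= mcycle_tick_end) {
        return;
    }
    uint64_t remaining = mcycle_tick_end - m.mcycle;
    // From here on, m.mcycle already holds the end-of-tick value, so code
    // inside the loop can no longer read the exact current cycle from it.
    m.mcycle += remaining;
    for (; remaining > 0; remaining--) {
        m.step();
    }
}
```

Both functions execute the same number of steps and leave mcycle at mcycle_tick_end. The difference is that the countdown version does not expose an up-to-date mcycle inside the loop, which is why mcycle propagation on memory accesses has to go away first.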

When doing this, it's worth experimenting with increasing RTC_FREQ_DIV_DEF from 8192 to 16384, since the interpreter's outer loop will start writing to mtime on every tick. Also, the interpreter recently got a 2x speedup, to the point where time inside the machine advances too fast during intensive computations; ideally the RTC frequency should be chosen so that time inside the machine passes at a rate closer to that of the host.
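As a rough illustration of the mtime idea, the sketch below runs the countdown loop one RTC tick at a time and bumps a counter at each tick boundary. Everything here is an assumption for illustration: the State struct, run_until, and the choice to model rtc_cycle_to_time as a plain integer division by a constant named RTC_FREQ_DIV are not taken from the emulator source.

```cpp
#include <cstdint>

// Assumed divisor, using the current default mentioned in the issue.
constexpr uint64_t RTC_FREQ_DIV = 8192;

// Assumption: cycle-to-time conversion is a plain integer division.
constexpr uint64_t rtc_cycle_to_time(uint64_t mcycle) {
    return mcycle / RTC_FREQ_DIV;
}

// Hypothetical machine state holding both counters.
struct State {
    uint64_t mcycle = 0;
    uint64_t mtime = 0;
};

// Hypothetical outer loop: runs tick by tick and writes mtime once per
// completed tick, so devices can read mtime instead of deriving it from mcycle.
void run_until(State &s, uint64_t mcycle_end) {
    while (s.mcycle < mcycle_end) {
        // Next RTC tick boundary strictly after the current cycle.
        uint64_t tick_end = (s.mcycle / RTC_FREQ_DIV + 1) * RTC_FREQ_DIV;
        const bool full_tick = tick_end <= mcycle_end;
        if (!full_tick) {
            tick_end = mcycle_end; // stop early, mid-tick
        }
        uint64_t remaining = tick_end - s.mcycle;
        s.mcycle += remaining;
        for (; remaining > 0; remaining--) {
            // Fetch, decode, execute
        }
        if (full_tick) {
            s.mtime++; // one write per RTC tick
        }
    }
}
```

If mtime starts out consistent with mcycle, it stays equal to rtc_cycle_to_time(mcycle) after every call. Under this model, doubling the divisor to 16384 halves both the number of mtime writes per executed cycle and the amount of guest time that passes per cycle.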

This idea is something I've had for a while, and it has been briefly discussed internally. I am writing it down as an issue so I do not forget to attempt it someday.


edubart commented Jan 25, 2025

This is also directly related to #104

edubart linked a pull request Jan 31, 2025 that will close this issue
edubart moved this from Todo to PR Available in Machine Emulator SDK Jan 31, 2025