Description
Design
When ZJIT pushes an interpreter frame, it should write only one metadata pointer. The other fields should be materialized lazily.
Frame push
ZJIT should bump ec->cfp as usual, but write only one field: cfp->jit_return. This field holds the address of the native stack slot that contains the return address.
- x86_64:
  - Save the address of the stack slot pushed by the `call` instruction, which holds the return address.
- arm64:
  - For JIT-to-JIT calls, save the address of the stack slot pushed by the callee's `Insn::FrameSetup`, which saves the link register.
  - For C calls, write the return address somewhere in the caller's native stack, and save the address of that stack slot.
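The sketch below is a toy model of the push described above. It is not ZJIT code. Assumptions: the native stack is simulated as a `Vec<u64>` whose indices stand in for addresses, with index 0 reserved as "null", and pushing onto a `Vec` of frames stands in for bumping ec->cfp. All names (`NativeStack`, `push_lightweight_frame`) are hypothetical. The point it illustrates is that the push writes exactly one field, `jit_return`.

```rust
/// Simulated native stack (hypothetical). "Addresses" are indices into `slots`;
/// index 0 is a placeholder so that 0 can mean "no slot" (null).
struct NativeStack {
    slots: Vec<u64>,
}

impl NativeStack {
    fn new() -> Self {
        NativeStack { slots: vec![0] }
    }

    /// Models what x86_64 `call` does: push the return address and
    /// return the address (index) of the slot that now holds it.
    fn push_return_address(&mut self, return_address: u64) -> usize {
        self.slots.push(return_address);
        self.slots.len() - 1
    }

    fn read(&self, addr: usize) -> u64 {
        self.slots[addr]
    }
}

/// Simplified control frame: pc/sp/iseq stay unwritten (0) until materialized.
#[allow(dead_code)]
#[derive(Default)]
struct ControlFrame {
    pc: u64,
    sp: u64,
    iseq: u64,
    /// Address of the native slot holding the return address; 0 once materialized.
    jit_return: usize,
}

/// Simplified execution context: pushing onto `frames` stands in for bumping ec->cfp.
struct ExecutionContext {
    frames: Vec<ControlFrame>,
}

impl ExecutionContext {
    /// Lightweight push: the only field written is jit_return.
    fn push_lightweight_frame(&mut self, return_slot: usize) {
        self.frames.push(ControlFrame {
            jit_return: return_slot,
            ..Default::default()
        });
    }

    fn cfp(&self) -> &ControlFrame {
        self.frames.last().expect("no frame pushed")
    }
}
```

In this model the real stack direction and the arm64 link-register handling are deliberately ignored. Only the "one store per push" property is represented.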
Prepare for calls
gen_prepare_call_with_gc and gen_prepare_non_leaf_call should update cfp->jit_return instead of saving PC/SP and spilling stack slots and locals.
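To make the difference in call-site cost concrete, here is a hypothetical, heavily simplified comparison. It does not reflect the actual bodies of gen_prepare_call_with_gc or gen_prepare_non_leaf_call. The "eager" function saves PC, spills live values to a simulated VM stack, and saves SP. The "lazy" function only updates `jit_return`. Each returns the number of stores it performs.

```rust
/// Simplified control frame for comparing the two strategies (hypothetical).
struct ControlFrame {
    pc: u64,
    sp: usize,
    jit_return: usize,
}

/// Eager preparation (simplified): save PC, spill live values to the VM stack,
/// and save SP. Returns the number of stores performed.
fn prepare_call_eager(
    cfp: &mut ControlFrame,
    vm_stack: &mut Vec<u64>,
    pc: u64,
    live_values: &[u64],
) -> usize {
    cfp.pc = pc;
    vm_stack.extend_from_slice(live_values);
    cfp.sp = vm_stack.len();
    2 + live_values.len()
}

/// Lightweight preparation: update only jit_return; live values stay on the native stack.
fn prepare_call_lazy(cfp: &mut ControlFrame, return_slot: usize) -> usize {
    cfp.jit_return = return_slot;
    1
}
```

With three live values, the eager path performs five stores in this model and the lazy path performs one. The lazy path's cost is constant regardless of how many stack slots and locals are live.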
How to materialize
When cfp->jit_return is not zero, ZJIT should retrieve compile-time metadata from it as follows:
- Read the native stack slot that `cfp->jit_return` points to, which holds the return address. Look up a `{ return_address => metadata }` hash table to get the metadata for the callsite.
  - The metadata should contain: PC, ISEQ, stack size, cme, env flags, location of self, and type/location of specval.
- The metadata should hold the offset from the `cfp->jit_return` address to the frame's base pointer.
  - Using this base pointer and the offsets in the metadata, ZJIT should be able to discover stack slots and locals on the native stack.
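The lookup steps above can be sketched as follows. This is a minimal model under stated assumptions. The native stack is simulated by a `Vec<u64>` indexed by slot, and offsets are measured in slots rather than bytes. `FrameMetadata` models only a few of the listed fields. The names `lookup_frame` and `read_locals` are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical per-callsite metadata; only a few of the listed fields are modeled.
#[allow(dead_code)]
#[derive(Clone, Debug)]
struct FrameMetadata {
    pc: u64,
    iseq: u64,
    stack_size: usize,
    /// Offset (in slots) from the jit_return slot to the frame's base pointer.
    bp_offset: isize,
    /// Offsets (in slots) of each local, relative to the base pointer.
    local_offsets: Vec<isize>,
}

/// Simulated native stack; "addresses" are indices, and 0 means null.
struct NativeStack {
    slots: Vec<u64>,
}

/// Resolves a lightweight frame's metadata and base pointer.
/// Returns None if the frame is already materialized (jit_return == 0)
/// or if the return address is not in the table.
fn lookup_frame(
    stack: &NativeStack,
    table: &HashMap<u64, FrameMetadata>,
    jit_return: usize,
) -> Option<(FrameMetadata, usize)> {
    if jit_return == 0 {
        return None;
    }
    // Dereference jit_return to get the return address, then key the table on it.
    let return_address = stack.slots[jit_return];
    let meta = table.get(&return_address)?.clone();
    let bp = (jit_return as isize + meta.bp_offset) as usize;
    Some((meta, bp))
}

/// Reads the frame's locals off the native stack using the metadata offsets.
fn read_locals(stack: &NativeStack, meta: &FrameMetadata, bp: usize) -> Vec<u64> {
    meta.local_offsets
        .iter()
        .map(|&off| stack.slots[(bp as isize + off) as usize])
        .collect()
}
```

Keying the table by return address means that every callsite needs its own entry. In exchange, the frame itself carries only one pointer.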
ZJIT should fully materialize the frame and set cfp->jit_return to 0 when it hands the frame's execution over to the interpreter. Otherwise, it may just query the metadata and leave the frame unmaterialized, e.g. when showing backtraces.
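The following hypothetical sketch contrasts the two modes. Querying reads metadata without touching the frame. Materializing writes the fields back and clears `jit_return`. The metadata lookup is abstracted as a closure, and copying stack slots and locals to the VM stack is omitted. All names are illustrative only.

```rust
/// Hypothetical metadata returned for a lightweight frame.
#[derive(Clone, Copy, Debug)]
struct Metadata {
    pc: u64,
    iseq: u64,
}

#[derive(Debug)]
struct ControlFrame {
    pc: u64,
    iseq: u64,
    jit_return: usize,
}

/// Query only (e.g. for a backtrace): the frame stays lightweight.
fn query_pc(cfp: &ControlFrame, lookup: impl Fn(usize) -> Metadata) -> u64 {
    if cfp.jit_return == 0 {
        cfp.pc
    } else {
        lookup(cfp.jit_return).pc
    }
}

/// Full materialization (e.g. before the interpreter resumes this frame).
fn materialize(cfp: &mut ControlFrame, lookup: impl Fn(usize) -> Metadata) {
    if cfp.jit_return == 0 {
        return; // already materialized
    }
    let meta = lookup(cfp.jit_return);
    cfp.pc = meta.pc;
    cfp.iseq = meta.iseq;
    // A real implementation would also copy stack slots and locals to the VM stack here.
    cfp.jit_return = 0;
}
```

After `materialize`, every later read can take the plain field path, so materialization only has to happen once per frame.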
When to materialize
The frame metadata is expected to be queried under the following conditions:
- On-Stack Replacement: An exception is raised, the longjmp exits a JIT frame, and the interpreter takes over execution of that `cfp`.
- Backtraces: An exception is raised, `Kernel#caller` is called, or `rb_profile_frames` is used by a profiler.
- Binding: The `rb_debug_inspector` API is used, and the Binding of a JIT frame is dynamically accessed.
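One way to read these conditions together with the materialize-vs-query rule above is as a small policy table. The mapping below is a hypothetical sketch, not a decided design. OSR hands execution to the interpreter, so it materializes. Backtraces only need line numbers, so they query. For Binding, we *assume* conservative materialization, because the Binding may read or write locals after the JIT code has moved on.

```rust
/// Hypothetical triggers, mirroring the list above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Trigger {
    OnStackReplacement,
    Backtrace,
    Binding,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Materialize,
    QueryOnly,
}

fn action_for(trigger: Trigger) -> Action {
    match trigger {
        // The interpreter resumes this cfp, so it needs real PC/SP/locals.
        Trigger::OnStackReplacement => Action::Materialize,
        // Line numbers only need the PC from the metadata.
        Trigger::Backtrace => Action::QueryOnly,
        // Assumption: materialize conservatively, since the Binding may access locals later.
        Trigger::Binding => Action::Materialize,
    }
}
```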
Open questions
- When a C function pushes a frame on top of a lightweight frame, can we leave the lightweight frame unmaterialized?
- Do we need to reserve the VM stack slots (`VM_ENV_DATA_SIZE` + stack size) so that we won't need to move the next frame's env, which might be referenced by pointers on the stack, when actually materializing the lightweight frame?
Prior art
Lazy frame push
This is what we successfully merged into YJIT, and it still exists in Ruby master. Unlike lightweight frames, it does not push a frame (it does not bump ec->cfp) before the call. Instead, it lazily pushes the frame in rb_yjit_lazy_push_frame, using metadata looked up by cfp->pc, when the callee method is about to raise an exception.
With lightweight frames, because we intend to bump ec->cfp, we shouldn't need anything equivalent to rb_yjit_lazy_push_frame. This would hopefully eliminate the check overhead at those call sites. When the VM actually queries backtraces to raise an exception, the frame metadata should be queried to retrieve line numbers for lightweight frames.
Frame outlining
This is what Alan and I experimented with in 2023. We used a tagged pointer in cfp->pc to mark it as an "outlined" frame. Every read of cfp->pc, cfp->sp, or cfp->iseq had a branch on whether cfp->pc is tagged or not. If it's a tagged pointer, it points to frame metadata to materialize the outlined frame. Because we made every read of pc/sp/iseq slower, the interpreter became slower. So we gave it up.
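The cost described above comes from a branch on every read. Here is a hypothetical model of the 2023 scheme, not the actual experiment's code. The low bit of `pc` tags an outlined frame, and the remaining bits index a metadata table. Real code would use an aligned pointer rather than a shifted index.

```rust
/// Hypothetical tag: low bit of pc set means "outlined frame".
const OUTLINED_TAG: u64 = 1;

struct OutlinedMeta {
    pc: u64,
}

struct ControlFrame {
    pc: u64,
}

/// Every pc read pays for this branch, even for ordinary interpreter frames.
fn read_pc(cfp: &ControlFrame, metas: &[OutlinedMeta]) -> u64 {
    if (cfp.pc & OUTLINED_TAG) != 0 {
        let index = (cfp.pc >> 1) as usize; // untag to find the metadata
        metas[index].pc
    } else {
        cfp.pc
    }
}
```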
Unlike frame outlining, lightweight frames optimistically skip the cfp->jit_return check on most cfp reads to avoid slowing down the interpreter (debug builds should assert that the frame is materialized). They check cfp->jit_return only under the conditions listed in "When to materialize". Hopefully, we will not need to add materialization checks in so many places that the interpreter becomes as slow as it was with frame outlining.
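The hot-path/cold-path split can be sketched with Rust's `debug_assert!`, which is compiled out unless debug assertions are enabled. This is a hypothetical illustration. The method names and the closure-based lookup are assumptions, not ZJIT's API.

```rust
struct ControlFrame {
    pc: u64,
    jit_return: usize,
}

impl ControlFrame {
    /// Hot path used by most interpreter reads: no branch on jit_return in
    /// release builds; debug builds catch reads of unmaterialized frames.
    fn pc(&self) -> u64 {
        debug_assert!(self.jit_return == 0, "read pc of an unmaterialized lightweight frame");
        self.pc
    }

    /// Cold path used only under the "When to materialize" conditions.
    fn pc_checked(&self, lookup: impl Fn(usize) -> u64) -> u64 {
        if self.jit_return != 0 {
            lookup(self.jit_return)
        } else {
            self.pc
        }
    }
}
```

The design bet is that the unchecked `pc()` path stays as cheap as it is today. Correctness then relies on every cold path using `pc_checked` (or materializing first), and the debug assertion exists to catch places that were missed.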