Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major refactoring of state access mechanism #310

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

diegonehab
Copy link
Contributor

@diegonehab diegonehab commented Jan 31, 2025

The guiding principle is to move all complex logic to the machine class, so it is centralized.

One key change is to the way peek works.
It now can peek any number of bytes inside the range, not just pages.
This allows another key change that makes machine::read_memory() fully general.
It can read any physical address range.
This includes going past PMA boundaries and even holes in the address space.
We can now use machine::read_memory() much more freely inside the state access classes.
Also, machine::read_word() has been rewritten on top of machine::read_memory(), so it is also fully general.

By the same token, the new machine::write_word() method is the counterpart to machine::read_word().
The word can be in any writeable area of the address space.
This includes the shadow state and the shadow uarch state.
(But does NOT include memory-mapped devices, the shadow tlb, shadow PMAs, or unoccupied memory regions.)
We can now use machine::read_write() to simplify state access classes.

Added new machine::write_shadow_tlb() and machine::read_shadow_tlb() that hide the complexity of converting between target physical addresses and host addresses.
The machine::write_tlb() does not perform the translation.
State access methods that have a machine simply call them directly.
Initialization of TLB from disk has been simplified to use machine::write_shadow_tlb();

All state access classes that need to reset the uarch now call machine::reset_uarch() directly.
(Other than the replay, of course).
There is a single implementation for it.
Since machine::read_memory() is now generic, the implementation of this reset_uarch() is much simpler in state access classes that need to log the operation.

Moved uarch/uarch-printf.c to uarch/third-party/printf/printf.c to make it clear it's not our code.
Minimized the amount of change to uarch/third-party/printf/printf.c.

I moved the implementation of all ECALLs to a single separate file uarch-ecall.c/h.
They are called ua_halt_ECALL(), ua_putchar_ECALL(), ua_mark_dirty_page_ECALL(), and ua_write_tlb_ECALL().
Changed uarch/third-party/printf/printf.c to use the ua_putchar_ECALL() and uarch-run.cpp to use ua_halt_ECALL().

The new ECALLs, mark_dirty_page and write_tlb, correspond to i_state_access::mark_dirty_page() and
i_state_access::write_tlb().
The ECALLs are issued when uarch is running interpret specialized with machine_uarch_bridge_state_access.
uarch-step.cpp translate the ECALL RISC-V instructions into calls to i_uarch_state_access::mark_dirty_page() and
i_uarch_state_access::write_tlb().
When running in with uarch_state_access or uarch_record_state_access, they forward the calls to machine::mark_dirty_page() and
machine::write_shadow_tlb()
When running with uarch_replay_state_access, i_uarch_state_access::mark_dirty_page() is a NOOP and
i_uarch_state_access::write_tlb() consumes a write access that writes the corresponding slot to the shadow.
This means that the translation to Solidity will have to implement the write_tlb() ECALL properly, but not the
mark_dirty_page() ECALL.

All shadows can now identify what any physical address in it refers too and what its name is.
Shadow state, shadow uarch state, shadow tlb, shadow pmas.
The peeks have been refactored to use a single function in shadow-peek.h.
This receives a functor that, given the physical address, returns the appropriate register.
Each shadow passes a functor that uses the machine to read that register.
This greatly simplified all of them.
The only complex logic is in shadow-peek.h.

Changed strict-aliasing.h to use __builtin_memcpy.
Created a uarch-strict-aliasing.h, on top of strict-aliasing.h, to replace direct dereference of reinterpret_cast volatile pointers.
One function was mandatory: the one that read 32bit words aligned to 16bits, used by fetch_insn() in interpret.
Dereferencing would not work for misaligned accesses.
Even for aligned ones, it was not guaranteed to work.

Added static method machine::get_what_name(paddr) that returns the name of whatever paddr (aligned to word) refers to.
I am not too happy with the name of this method...

Created i-prefer-shadow-state.h (resp. i-prefer-shadow-uarch-state.h)
When a state access inherits from one of these interfaces, it does not need to implement all the read/write register methods typically required.
Instead, it implements read_shadow_state() and write_shadow_state() (resp. read_shadow_uarch_state(), write_shadow_uarch_state()) methods.
These typically use machine::read_word() and machine::write_word() and machine::get_what_name() directly.
As a result, record/replay-step-state-access and the machine-uarch-bridge-state-access, as well as the uarch-record/replay-state-access.h are greatly simplified.

Generalized machine::get_node_hash() to deal with nodes smaller than the page size.
This allows it to be used in uarch-record-state-access.h.
Otherwise we would need to use machine::get_proof() and ignore everything other than the target hash.

Removed uarch-machine class: the machine class took over its duties.

Removed uarch_machine_bridge: we now an simply use machine::write_word() and machine::get_what_name().

Added new poor's man type name class to return names of cstdint types in a way that can be used in the uarch.

Added new pma_peek_pristine() to be reused in drivers. CLINT, PLIC, HTIF, etc.

Made uarch_halt_flag an uint64_t just like every other register we have.
Removed set_/reset_ and changed to read_/write_.

The machine_reg type now uses the constants from shadow-state.h and shadow-uarch-state.h.

I don't see why memory PMAs should be protected from replace_memory_range, so I removed them from the list.

Changed tests/Makefile to allow handpicking NUM_JOBS.
This is good for debugging.

Changed tests/lua/cartesi/tests/util.lua to fail more gracefully when ENV has not been set properly.

Modernized to C++17 type_traits with _v and _t instead of ::value and ::type.

Split out an i-interactive-state-access.h interface with poll_external_interrupts(), get_soft_yield(), and getchar().
interpret.cpp (and device_state_access) only call them inside an if constexpr that checks that they exist using a type trait.
Only the state access classes that need them inherit from the interface.

Split out an i-counters.h interface with increment_counter(), read_counter(), and write_counter().
Added similar methods to the machine class.
interpret.cpp only uses this methods inside an if constexpr that checks if they exist.
state_access inherits from the interface and simply forwards the calls to the machine class.
This replaces the machine_statistics with a single unordered_map where all counters are stored.
There is no machine-statistics.h anymore.

Renamed register access methods in i-uarch-state-access.h so they do not collide with methods in i-state-access.h.
In other words, added the uarch_ prefix to all read/write methods (e.g., read_uarch_x(), read_uarch_halt_flag() etc)

Used macros to define all the register read/write register methods in the state access interfaces.
There is a single implementation in the macro, and a bunch of calls to the macro to define the methods in the interface.

Moved poll_external_interrupts() implementation from state-access.h to machine::poll_external_interrupts().
State-access.h simply calls it.

Changed uarch/Makefile to pass -std=c++20 to C++ code it compiles.

Changed some of the error messages when logs are being verified.
I regret this, because it is kind of a pain to update the tests.
I think we should revisit these messages to make them more uniform, simpler, and helpful.
It will be important when we are debugging real issues.

Added DUMP_STATE_ACCESS and DUMP_UARCH_STATE_ACCESS options
State access classes now know their names, so this dump can tell which one is being called.
These still need to be polished a bit, but they are really useful in debugging.
Makefile now passes all DUMP options when building uarch.

Split out an i-accept-scoped-note.h interface with push_begin_bracket(), push_end_bracket(), and make_scoped_note().
All state access classes can inherit from it.
Doing so enables dumping of brackets automatically, even without "overriding" any of the methods.
Added a new DUMP_SCOPED_NOTE to dump the begin/end events.

Started adding back scoped notes to interpret.cpp.
Changed dump_insn in interpret.cpp to return a scoped note so we again have begin/end for each instruction.
Likewise, uarch now uses dumpInsn that returns a scoped note instead of directly using a scoped note.
Split DUMP_INSN into DUMP_INSN and DUMP_UARCH_INSN so we have finer control.

Simplified the mechanism of scoped notes and bracket notes.
Split push_bracket into push_begin_bracket and push_end_bracket in state access.
The scoped_note is now a separate template class in scoped-note.h.
It forwards calls to push_begin_bracket and push_end_bracket to the state access it controls

Created dump.h with a special D_PRINTF that calls fprintf outside of MICROARCHITECTURE, and printf otherwise.
Replaced all use of fprintf in interpret.cpp with calls to D_PRINTF.
i-state-access now has its own DSA_PRINTF static method that does nothing unless DUMP_STATE_ACCESS is defined.
This is much cleaner than sprinkling ifdefs everywhere in i-state-access.h.
i-uarch-state-access does the same with DUSA_PRINTF and DUMP_UARCH_STATE_ACCESS.
I prefer variadic macros for them, but the linter prefers variadic templates.
I believe the compiler eliminates the dummy variadic template anyway.

Renamed DUMP_HIST to DUMP_INSN_HIST.
Renamed DUMP_COUNTERS to DUMP_STATS.

--------- TODO -----------

We need to talk about how we used to need to add a dummy "read" access before a write "write" that was logged when
recording/replaying.
This was because of Solidity.
What is up with the implementation, in Solidity, of the consumption of a write access to something smaller than the leaf?

We should change PMA peek, read, write, contains etc to deal with physical addresses, rather than offsets.
We often have to subtract pma.get_start() when dealing with these functions, and then add again inside!
This is pretty confusing.
But this is something we have to talk about and agree before I do it, otherwise you guys will go crazy.

We need to centralize all checks against writes to x0 in interpret.cpp to a single function:
template write_x(STATE_ACCESS a, int i, uint64_t val)

We need to use the DUMP_STATE_ACCESS and DUMP_UARCH_STATE_ACCESS to compare what happens when we are recording/replaying vs when we are running full speed.
It would be nice if the same accesses happened, whenever possible.

We should probably allocate a special DID for empty PMAs used as sentinels.
Would simplify some code.

We should add a machine::poll_external_interrupts() so state access can use it directly instead of implementing it there.

We should split the machine::machine constructor into a bunch of different methods.
It is too large and complicated.

We should add all the new methods in the machine class to the C API (and the Lua binding).
This will allow us to test them in CI.
But we should put them in a separate machine-c-api-test.h include.
Otherwise we will burden normal users with the knowledge of their existence.

We need to thoroughly test new machine::read_memory() capabilities.

I am getting tired of reading both _size/_SIZE and _length/_LENGTH.
We should rename all of them to _length/LENGTH once and for all everywhere.
Also, there is log2
<stuff_size> and log2 everywhere.
We should rename all of them to log2_stuff_length everywhere.

Need to add comments to all some of the new functionality.
Since it is in flux, I hold until it isn't.

Still need to refactor replay/record-send-cmio-response

We should probably change the ua_write_tlb_ECALL() to receive the paddr of the slot, rather than a slot set and index.
This will make it more "future proof" when we change design of the TLB.
It will also simplify the implementation in solidity.

@diegonehab diegonehab added the enhancement New feature or request label Jan 31, 2025
@diegonehab diegonehab self-assigned this Jan 31, 2025
@diegonehab diegonehab force-pushed the refactor/uarch-state-accesses branch 6 times, most recently from 0267399 to 060b286 Compare February 5, 2025 14:09
src/access-log.h Show resolved Hide resolved
src/uarch-replay-state-access.h Show resolved Hide resolved
src/i-prefer-shadow-state.h Outdated Show resolved Hide resolved
#include "dump-uarch-state-access.h"
#include "meta.h"
#include "shadow-uarch-state.h"
#include "tlb.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see some unnecessary includes here and some other places, maybe later we have to turn on misc-include-cleaner in clang-tidy and clean some of them (may speed up compile times when developing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That thing was broken, right? I agree we can turn it on every so often so we can clean up some of these. But leaving it on doesn't work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, let's not leave it on, but could turn on locally after you finish refactoring, just to remove some unnecessary includes, like tlb.h is unnecessary here. This we way can slightly improve compile time when iterating, but not urgent.

src/i-uarch-state-access.h Outdated Show resolved Hide resolved
uarch/uarch-printf.c Outdated Show resolved Hide resolved
uarch/machine-uarch-bridge-state-access.h Show resolved Hide resolved
/// \returns 0 for success, non zero code for error.
/// \details The entire chunk must be inside the same memory range.
CM_API cm_error cm_read_memory(const cm_machine *m, uint64_t address, uint8_t *data, uint64_t length);
/// \details The data can be anywhere in the entire address space.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need tests for this new API behavior, do we have them?

Copy link
Contributor Author

@diegonehab diegonehab Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. No, we don't have tests. I wrote on the comment above that I still need to add test for the extra functionality.

@diegonehab diegonehab requested a review from edubart February 5, 2025 14:38
@diegonehab diegonehab force-pushed the refactor/uarch-state-accesses branch 4 times, most recently from 1b42f0e to 434e3d1 Compare February 7, 2025 11:05
@diegonehab diegonehab force-pushed the refactor/uarch-state-accesses branch from 434e3d1 to 71dbd0c Compare February 7, 2025 11:37
@diegonehab diegonehab force-pushed the refactor/uarch-state-accesses branch from 71dbd0c to e57e989 Compare February 7, 2025 15:46
// with this program (see COPYING). If not, see <https://www.gnu.org/licenses/>.
//

#include <tuple>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move inside the ifndef section?

#define I_ACCEPT_SCOPED_NOTE_H

/// \file
/// \brief State access interface
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file is not the state access interface anymore, the same comment is found in i-interactive-state-access.h, which may also need adjustment.

}
}

// Define read and write methods for each register in the shadow state
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's becoming much easier to add/remove/rename CSRs with these recent refactors 😃

/// \tparam A Type to which \p paddr and \p haddr are known to be aligned.
/// \param faddr Implementation-defined fast address.
/// \param val Value to be written.
/// \details \p haddr is ONLY valid when there is a host machine.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haddr is gone, also there is no mention for pma_index in function and in related read function.

DUMP_DEFS+=-DDUMP_EXCEPTIONS
DUMP_DEFS+=-DDUMP_INTERRUPTS
DUMP_DEFS+=-DDUMP_MMU_EXCEPTIONS
DUMP_DEFS+=-DDUMP_INVALID_MEM_ACCESS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no more DUMP_INVALID_MEM_ACCESS , should remove it from here, or add it back to C++ code?

// In the current implementation, this check is unnecessary
// But we may in the future change the data field to point to independently allocated pages
// This would break the code that uses binary search to find the page based on the address of its data
if (i > 0 && +m_pages[i - 1].data >= +m_pages[i].data) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the + prefix doing anything here?

// We can only do /aligned/ atomic writes.
// Therefore, TLB slot cannot be misaligned.
struct PACKED shadow_tlb_slot {
uint64_t vaddr_page; ///< Tag is target virtual address of start of page
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field names are not tag or value, right? Same applies for tlb.h

auto reg = shadow_tlb_get_what(paddr, set_index, slot_index);
uint64_t val = 0;
if (reg != shadow_tlb_what::unknown_) {
val = m.read_shadow_tlb(set_index, slot_index, reg);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using shadow_peek even for shadow TLB was clever.

};

static_assert(sizeof(uint64_t) >= sizeof(uintptr_t), "TLB expects host pointer fit in 64 bits");
//??D why?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was to simplify TLB shadow peek. Since you rewrote it, maybe we can relax this constrain. But current shadow peek semantics needs at least to have a minimum size of 8 bytes.

if (reg == shadow_state_what::unknown_) {
throw std::runtime_error("unhandled write to shadow state");
}
if (reg == shadow_state_what::x0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is redundant because write_reg already throws in this special case. Same for all other reg checks bellow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

2 participants