-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Major refactoring of state access mechanism #310
base: main
Are you sure you want to change the base?
Conversation
0267399
to
060b286
Compare
#include "dump-uarch-state-access.h" | ||
#include "meta.h" | ||
#include "shadow-uarch-state.h" | ||
#include "tlb.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see some unnecessary includes here and some other places, maybe later we have to turn on misc-include-cleaner
in clang-tidy
and clean some of them (may speed up compile times when developing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That thing was broken, right? I agree we can turn it on every so often so we can clean up some of these. But leaving it on doesn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, let's not leave it on, but could turn on locally after you finish refactoring, just to remove some unnecessary includes, like tlb.h
is unnecessary here. This we way can slightly improve compile time when iterating, but not urgent.
/// \returns 0 for success, non zero code for error. | ||
/// \details The entire chunk must be inside the same memory range. | ||
CM_API cm_error cm_read_memory(const cm_machine *m, uint64_t address, uint8_t *data, uint64_t length); | ||
/// \details The data can be anywhere in the entire address space. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need tests for this new API behavior, do we have them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. No, we don't have tests. I wrote on the comment above that I still need to add test for the extra functionality.
Using __builtin_memcpy instead.
The machine class took over the role.
1b42f0e
to
434e3d1
Compare
And also i-prefer-uarch-state-access.h
Cleanup several of the DUMP_*.* implementations
434e3d1
to
71dbd0c
Compare
71dbd0c
to
e57e989
Compare
// with this program (see COPYING). If not, see <https://www.gnu.org/licenses/>. | ||
// | ||
|
||
#include <tuple> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move inside the ifndef
section?
#define I_ACCEPT_SCOPED_NOTE_H | ||
|
||
/// \file | ||
/// \brief State access interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file is not the state access interface anymore, the same comment is found in i-interactive-state-access.h
, which may also need adjustment.
} | ||
} | ||
|
||
// Define read and write methods for each register in the shadow state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's becoming much easier to add/remove/rename CSRs with these recent refactors 😃
/// \tparam A Type to which \p paddr and \p haddr are known to be aligned. | ||
/// \param faddr Implementation-defined fast address. | ||
/// \param val Value to be written. | ||
/// \details \p haddr is ONLY valid when there is a host machine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haddr
is gone, also there is no mention for pma_index
in function and in related read function.
DUMP_DEFS+=-DDUMP_EXCEPTIONS | ||
DUMP_DEFS+=-DDUMP_INTERRUPTS | ||
DUMP_DEFS+=-DDUMP_MMU_EXCEPTIONS | ||
DUMP_DEFS+=-DDUMP_INVALID_MEM_ACCESS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no more DUMP_INVALID_MEM_ACCESS
, should remove it from here, or add it back to C++ code?
// In the current implementation, this check is unnecessary | ||
// But we may in the future change the data field to point to independently allocated pages | ||
// This would break the code that uses binary search to find the page based on the address of its data | ||
if (i > 0 && +m_pages[i - 1].data >= +m_pages[i].data) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the +
prefix doing anything here?
// We can only do /aligned/ atomic writes. | ||
// Therefore, TLB slot cannot be misaligned. | ||
struct PACKED shadow_tlb_slot { | ||
uint64_t vaddr_page; ///< Tag is target virtual address of start of page |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field names are not tag
or value
, right? Same applies for tlb.h
auto reg = shadow_tlb_get_what(paddr, set_index, slot_index); | ||
uint64_t val = 0; | ||
if (reg != shadow_tlb_what::unknown_) { | ||
val = m.read_shadow_tlb(set_index, slot_index, reg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using shadow_peek even for shadow TLB was clever.
}; | ||
|
||
static_assert(sizeof(uint64_t) >= sizeof(uintptr_t), "TLB expects host pointer fit in 64 bits"); | ||
//??D why? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was to simplify TLB shadow peek. Since you rewrote it, maybe we can relax this constrain. But current shadow peek semantics needs at least to have a minimum size of 8 bytes.
if (reg == shadow_state_what::unknown_) { | ||
throw std::runtime_error("unhandled write to shadow state"); | ||
} | ||
if (reg == shadow_state_what::x0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is redundant because write_reg
already throws in this special case. Same for all other reg checks bellow.
The guiding principle is to move all complex logic to the machine class, so it is centralized.
One key change is to the way peek works.
It now can peek any number of bytes inside the range, not just pages.
This allows another key change that makes machine::read_memory() fully general.
It can read any physical address range.
This includes going past PMA boundaries and even holes in the address space.
We can now use machine::read_memory() much more freely inside the state access classes.
Also, machine::read_word() has been rewritten on top of machine::read_memory(), so it is also fully general.
By the same token, the new machine::write_word() method is the counterpart to machine::read_word().
The word can be in any writeable area of the address space.
This includes the shadow state and the shadow uarch state.
(But does NOT include memory-mapped devices, the shadow tlb, shadow PMAs, or unoccupied memory regions.)
We can now use machine::read_write() to simplify state access classes.
Added new machine::write_shadow_tlb() and machine::read_shadow_tlb() that hide the complexity of converting between target physical addresses and host addresses.
The machine::write_tlb() does not perform the translation.
State access methods that have a machine simply call them directly.
Initialization of TLB from disk has been simplified to use machine::write_shadow_tlb();
All state access classes that need to reset the uarch now call machine::reset_uarch() directly.
(Other than the replay, of course).
There is a single implementation for it.
Since machine::read_memory() is now generic, the implementation of this reset_uarch() is much simpler in state access classes that need to log the operation.
Moved uarch/uarch-printf.c to uarch/third-party/printf/printf.c to make it clear it's not our code.
Minimized the amount of change to uarch/third-party/printf/printf.c.
I moved the implementation of all ECALLs to a single separate file uarch-ecall.c/h.
They are called ua_halt_ECALL(), ua_putchar_ECALL(), ua_mark_dirty_page_ECALL(), and ua_write_tlb_ECALL().
Changed uarch/third-party/printf/printf.c to use the ua_putchar_ECALL() and uarch-run.cpp to use ua_halt_ECALL().
The new ECALLs, mark_dirty_page and write_tlb, correspond to i_state_access::mark_dirty_page() and
i_state_access::write_tlb().
The ECALLs are issued when uarch is running interpret specialized with machine_uarch_bridge_state_access.
uarch-step.cpp translate the ECALL RISC-V instructions into calls to i_uarch_state_access::mark_dirty_page() and
i_uarch_state_access::write_tlb().
When running in with uarch_state_access or uarch_record_state_access, they forward the calls to machine::mark_dirty_page() and
machine::write_shadow_tlb()
When running with uarch_replay_state_access, i_uarch_state_access::mark_dirty_page() is a NOOP and
i_uarch_state_access::write_tlb() consumes a write access that writes the corresponding slot to the shadow.
This means that the translation to Solidity will have to implement the write_tlb() ECALL properly, but not the
mark_dirty_page() ECALL.
All shadows can now identify what any physical address in it refers too and what its name is.
Shadow state, shadow uarch state, shadow tlb, shadow pmas.
The peeks have been refactored to use a single function in shadow-peek.h.
This receives a functor that, given the physical address, returns the appropriate register.
Each shadow passes a functor that uses the machine to read that register.
This greatly simplified all of them.
The only complex logic is in shadow-peek.h.
Changed strict-aliasing.h to use __builtin_memcpy.
Created a uarch-strict-aliasing.h, on top of strict-aliasing.h, to replace direct dereference of reinterpret_cast volatile pointers.
One function was mandatory: the one that read 32bit words aligned to 16bits, used by fetch_insn() in interpret.
Dereferencing would not work for misaligned accesses.
Even for aligned ones, it was not guaranteed to work.
Added static method machine::get_what_name(paddr) that returns the name of whatever paddr (aligned to word) refers to.
I am not too happy with the name of this method...
Created i-prefer-shadow-state.h (resp. i-prefer-shadow-uarch-state.h)
When a state access inherits from one of these interfaces, it does not need to implement all the read/write register methods typically required.
Instead, it implements read_shadow_state() and write_shadow_state() (resp. read_shadow_uarch_state(), write_shadow_uarch_state()) methods.
These typically use machine::read_word() and machine::write_word() and machine::get_what_name() directly.
As a result, record/replay-step-state-access and the machine-uarch-bridge-state-access, as well as the uarch-record/replay-state-access.h are greatly simplified.
Generalized machine::get_node_hash() to deal with nodes smaller than the page size.
This allows it to be used in uarch-record-state-access.h.
Otherwise we would need to use machine::get_proof() and ignore everything other than the target hash.
Removed uarch-machine class: the machine class took over its duties.
Removed uarch_machine_bridge: we now an simply use machine::write_word() and machine::get_what_name().
Added new poor's man type name class to return names of cstdint types in a way that can be used in the uarch.
Added new pma_peek_pristine() to be reused in drivers. CLINT, PLIC, HTIF, etc.
Made uarch_halt_flag an uint64_t just like every other register we have.
Removed set_/reset_ and changed to read_/write_.
The machine_reg type now uses the constants from shadow-state.h and shadow-uarch-state.h.
I don't see why memory PMAs should be protected from replace_memory_range, so I removed them from the list.
Changed tests/Makefile to allow handpicking NUM_JOBS.
This is good for debugging.
Changed tests/lua/cartesi/tests/util.lua to fail more gracefully when ENV has not been set properly.
Modernized to C++17 type_traits with _v and _t instead of ::value and ::type.
Split out an i-interactive-state-access.h interface with poll_external_interrupts(), get_soft_yield(), and getchar().
interpret.cpp (and device_state_access) only call them inside an if constexpr that checks that they exist using a type trait.
Only the state access classes that need them inherit from the interface.
Split out an i-counters.h interface with increment_counter(), read_counter(), and write_counter().
Added similar methods to the machine class.
interpret.cpp only uses this methods inside an if constexpr that checks if they exist.
state_access inherits from the interface and simply forwards the calls to the machine class.
This replaces the machine_statistics with a single unordered_map where all counters are stored.
There is no machine-statistics.h anymore.
Renamed register access methods in i-uarch-state-access.h so they do not collide with methods in i-state-access.h.
In other words, added the uarch_ prefix to all read/write methods (e.g., read_uarch_x(), read_uarch_halt_flag() etc)
Used macros to define all the register read/write register methods in the state access interfaces.
There is a single implementation in the macro, and a bunch of calls to the macro to define the methods in the interface.
Moved poll_external_interrupts() implementation from state-access.h to machine::poll_external_interrupts().
State-access.h simply calls it.
Changed uarch/Makefile to pass -std=c++20 to C++ code it compiles.
Changed some of the error messages when logs are being verified.
I regret this, because it is kind of a pain to update the tests.
I think we should revisit these messages to make them more uniform, simpler, and helpful.
It will be important when we are debugging real issues.
Added DUMP_STATE_ACCESS and DUMP_UARCH_STATE_ACCESS options
State access classes now know their names, so this dump can tell which one is being called.
These still need to be polished a bit, but they are really useful in debugging.
Makefile now passes all DUMP options when building uarch.
Split out an i-accept-scoped-note.h interface with push_begin_bracket(), push_end_bracket(), and make_scoped_note().
All state access classes can inherit from it.
Doing so enables dumping of brackets automatically, even without "overriding" any of the methods.
Added a new DUMP_SCOPED_NOTE to dump the begin/end events.
Started adding back scoped notes to interpret.cpp.
Changed dump_insn in interpret.cpp to return a scoped note so we again have begin/end for each instruction.
Likewise, uarch now uses dumpInsn that returns a scoped note instead of directly using a scoped note.
Split DUMP_INSN into DUMP_INSN and DUMP_UARCH_INSN so we have finer control.
Simplified the mechanism of scoped notes and bracket notes.
Split push_bracket into push_begin_bracket and push_end_bracket in state access.
The scoped_note is now a separate template class in scoped-note.h.
It forwards calls to push_begin_bracket and push_end_bracket to the state access it controls
Created dump.h with a special D_PRINTF that calls fprintf outside of MICROARCHITECTURE, and printf otherwise.
Replaced all use of fprintf in interpret.cpp with calls to D_PRINTF.
i-state-access now has its own DSA_PRINTF static method that does nothing unless DUMP_STATE_ACCESS is defined.
This is much cleaner than sprinkling ifdefs everywhere in i-state-access.h.
i-uarch-state-access does the same with DUSA_PRINTF and DUMP_UARCH_STATE_ACCESS.
I prefer variadic macros for them, but the linter prefers variadic templates.
I believe the compiler eliminates the dummy variadic template anyway.
Renamed DUMP_HIST to DUMP_INSN_HIST.
Renamed DUMP_COUNTERS to DUMP_STATS.
--------- TODO -----------
We need to talk about how we used to need to add a dummy "read" access before a write "write" that was logged when
recording/replaying.
This was because of Solidity.
What is up with the implementation, in Solidity, of the consumption of a write access to something smaller than the leaf?
We should change PMA peek, read, write, contains etc to deal with physical addresses, rather than offsets.
We often have to subtract pma.get_start() when dealing with these functions, and then add again inside!
This is pretty confusing.
But this is something we have to talk about and agree before I do it, otherwise you guys will go crazy.
We need to centralize all checks against writes to x0 in interpret.cpp to a single function:
template write_x(STATE_ACCESS a, int i, uint64_t val)
We need to use the DUMP_STATE_ACCESS and DUMP_UARCH_STATE_ACCESS to compare what happens when we are recording/replaying vs when we are running full speed.
It would be nice if the same accesses happened, whenever possible.
We should probably allocate a special DID for empty PMAs used as sentinels.
Would simplify some code.
We should add a machine::poll_external_interrupts() so state access can use it directly instead of implementing it there.
We should split the machine::machine constructor into a bunch of different methods.
It is too large and complicated.
We should add all the new methods in the machine class to the C API (and the Lua binding).
This will allow us to test them in CI.
But we should put them in a separate machine-c-api-test.h include.
Otherwise we will burden normal users with the knowledge of their existence.
We need to thoroughly test new machine::read_memory() capabilities.
I am getting tired of reading both _size/_SIZE and _length/_LENGTH.
We should rename all of them to _length/LENGTH once and for all everywhere.
Also, there is log2<stuff_size> and log2 everywhere.
We should rename all of them to log2_stuff_length everywhere.
Need to add comments to all some of the new functionality.
Since it is in flux, I hold until it isn't.
Still need to refactor replay/record-send-cmio-response
We should probably change the ua_write_tlb_ECALL() to receive the paddr of the slot, rather than a slot set and index.
This will make it more "future proof" when we change design of the TLB.
It will also simplify the implementation in solidity.