Lynx005F

This PR adds redundancy to RedMulE:

  1. It always implements basic datapath redundancy, which can be selected in software with redundancy = 1 in the redmule config:

    • For the compute elements, two rows of compute elements "mirror" the same computation
    • For the FIFOs and Buffers, data is replicated serially (in time) or in parallel, depending on the location
    • Both of these are checked by a single output comparator that requires one data width of FFs
    • Parity bits are stored in the register file to ensure it does not get corrupted in between SW writes and HW reads.

    This part has very low data overhead, but only provides rough redundancy: datapath faults are detected,
    but control is generally vulnerable. Some parts of the W-Input datapath are also vulnerable.
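
    The two datapath mechanisms above (mirrored compute rows with an output comparator, and parity bits on stored words) can be sketched as a small behavioural model. This is only an illustration under assumptions: the function names, the integer dot product, and the `fault_mask` argument are made up for the sketch and are not taken from the RTL.

```python
def parity(word: int) -> int:
    """Even-parity bit of a non-negative word: 1 if the number of set bits is odd."""
    return bin(word).count("1") & 1


def store(word: int):
    """Store a word together with its parity bit (hypothetical register entry)."""
    return (word, parity(word))


def load(entry):
    """Return (word, fault_flag); the flag is set when the stored parity no longer matches."""
    word, stored_parity = entry
    return word, parity(word) != stored_parity


def mirrored_mac(x, w, fault_mask=0):
    """Run the same dot product on two 'rows' and compare the results.

    fault_mask is XORed into the first row's result to emulate a bit flip.
    Returns (row_a_result, mismatch_flag).
    """
    row_a = sum(a * b for a, b in zip(x, w)) ^ fault_mask
    row_b = sum(a * b for a, b in zip(x, w))
    return row_a, row_a != row_b


print(mirrored_mac([1, 2, 3], [4, 5, 6]))                # no fault injected
print(mirrored_mac([1, 2, 3], [4, 5, 6], fault_mask=1))  # one flipped bit
print(load((0b1011 ^ 0b0100, parity(0b1011))))           # word corrupted after the write
```

    A single parity bit detects any odd number of flipped bits in a word, and the comparator detects any difference between the two rows; neither mechanism corrects the fault, it only flags it.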

  2. It additionally implements control redundancy which can be enabled with the USE_REDUNDANCY parameter bit.

    • The redmule_scheduler and redmule_controller FSMs are replicated and outputs compared
    • The HCI / HWPE Modules (Muxes / Fifos) are replicated with a smaller data-path on the replica, allowing their control FSMs to also be protected
    • For the vulnerable parts of the W-Input datapath, parity bits or full duplication are used to ensure faults can be detected

    This introduces about 8% area overhead and achieves a high level of fault tolerance. With the included fault injection scripts,
    injecting on any signal with equal likelihood, the design terminates correctly for 99.99% of injected faults. (Uniform injection overstresses control signals compared to faults in the wild, and control is typically harder to protect.) To put that into perspective, a non-fault-tolerant RedMulE would terminate correctly for about 85% of injected faults due to masking.
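
    The control redundancy idea (replicate an FSM, feed both copies the same inputs, compare outputs every cycle) can be illustrated with a toy lockstep model. The states and the transition table below are hypothetical and much simpler than redmule_scheduler or redmule_controller; the sketch only shows how a comparator catches a diverging replica and how some upsets are masked.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    LOAD = 1
    COMPUTE = 2
    STORE = 3


# Hypothetical four-state loop, not the real scheduler FSM.
NEXT = {
    State.IDLE: State.LOAD,
    State.LOAD: State.COMPUTE,
    State.COMPUTE: State.STORE,
    State.STORE: State.IDLE,
}


class Fsm:
    def __init__(self):
        self.state = State.IDLE

    def step(self, start: bool) -> State:
        # Stay in IDLE until started, otherwise advance one state per cycle.
        if self.state is State.IDLE and not start:
            return self.state
        self.state = NEXT[self.state]
        return self.state


def run_lockstep(cycles, upset_cycle=None):
    """Step two replicas together; return the cycle of the first mismatch, or None."""
    main, replica = Fsm(), Fsm()
    for cycle in range(cycles):
        if cycle == upset_cycle:
            main.state = State.IDLE  # emulated upset: state register forced to IDLE
        if main.step(start=True) != replica.step(start=True):
            return cycle
    return None


print(run_lockstep(8))                  # fault-free run
print(run_lockstep(8, upset_cycle=2))   # upset while in COMPUTE: replicas diverge
print(run_lockstep(8, upset_cycle=4))   # upset while already in IDLE: masked
```

    The last call shows masking in miniature: the upset writes the value the register already holds, so the run finishes correctly without any detection logic being involved.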

Both parts report through the existing ECC fault registers, which software should read, and both abort the running operation when a fault is detected, to avoid stalls.
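
The abort-and-report behaviour can be pictured with a toy software-side model. Everything here is an assumption made for illustration: `AcceleratorModel`, `fault_status`, and `run_checked` are invented names, and the real register map and driver API are not shown in this PR description.

```python
class AcceleratorModel:
    """Toy stand-in for the accelerator; not the real register map."""

    def __init__(self):
        self.fault_status = 0  # hypothetical fault register, nonzero means a fault was seen

    def run(self, steps, fault_at=None):
        """Process `steps` work items and abort at the first detected fault."""
        done = 0
        for i in range(steps):
            if i == fault_at:
                self.fault_status |= 1  # record the fault ...
                break                   # ... and abort instead of stalling
            done += 1
        return done


def run_checked(acc, steps, fault_at=None):
    """Run an operation, then read the fault register as software is expected to."""
    done = acc.run(steps, fault_at)
    status = "fault" if acc.fault_status else "ok"
    return status, done


print(run_checked(AcceleratorModel(), 4))
print(run_checked(AcceleratorModel(), 4, fault_at=2))
```

What software does after seeing a fault (rerun, report, fall back) is a policy choice outside the scope of this sketch.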

The redundancy spheres overlap with the ECC Encode / Decode and are independent, i.e. any combination of memory ECC and redundancy results in a working design, even though some combinations might not be reasonable from a fault-tolerance perspective.
Non-regression tests have been updated to test all combinations of HW and SW parameters.
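
Covering all combinations amounts to a Cartesian product over the parameter values. A minimal sketch, assuming binary switches: USE_REDUNDANCY and redundancy are the names used above, while `USE_ECC` is a placeholder for the memory ECC switch, and the actual test scripts may enumerate the matrix differently.

```python
from itertools import product

# Hardware parameters (fixed at elaboration). USE_ECC is a placeholder name.
HW_PARAMS = {"USE_REDUNDANCY": (0, 1), "USE_ECC": (0, 1)}
# Software setting chosen at run time in the config.
SW_PARAMS = {"redundancy": (0, 1)}


def all_configs():
    """Return one dict per combination of hardware and software parameter values."""
    names = list(HW_PARAMS) + list(SW_PARAMS)
    values = list(HW_PARAMS.values()) + list(SW_PARAMS.values())
    return [dict(zip(names, combo)) for combo in product(*values)]


for cfg in all_configs():
    print(cfg)
```

With three binary switches this yields eight configurations, each of which would get its own build and test run.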

Currently, the soft clear signal in the register file can still cause wrong terminations (a stall where the fault is detected but no interrupt is sent), and many of the dependencies are not yet reviewed, so this is a draft PR.

@Lynx005F Lynx005F force-pushed the itemm/redundancy branch 6 times, most recently from af8aaf3 to 23c60df Compare July 4, 2024 17:19
@Lynx005F Lynx005F force-pushed the itemm/redundancy branch 3 times, most recently from fe15913 to 48b2eca Compare July 9, 2024 13:23
@Lynx005F
Author

Lynx005F commented Jul 9, 2024

I have now moved it to the refactored branch, where a lot of the intermediate changes are no longer visible. Any further modifications will most likely affect the streamer, and maybe the vulnerability analysis scripts.

@Lynx005F Lynx005F force-pushed the itemm/redundancy branch 2 times, most recently from ba580c6 to 6c8d4a9 Compare July 12, 2024 13:24
Maurus Item added 11 commits July 16, 2024 18:43
- Buffers now have a REP parameter which sets the number of replicas in the internal FSM.
- Replicas of FSM get voted per cycle
- Main data-path not replicated.
- Reprogrammed Address generators to fetch each element twice
- Changed SW to deal with new multiplication size
- Added voters for "double loaded" data
- Added option to synthesize a reduced datapath and create parity bits from it.
- Split control inputs so every even and odd row get another copy if replication is enabled
- Added w input parity path and voters.
- Abort the calculation if a fault has happened
- Check Regfile for faults
- Moved X Buffer Clock Gating inside X buffer
- Replicated X Buffer Clock Gating and adjusted FSM to fix vulnerability.
@Lynx005F
Author

This now includes everything fully finished except the deduplicator, which I would like to merge separately.

The deduplicator only improves performance on the memory side; functionality is the same. However, that part seems to be tricky to get right, so waiting for these additional 30 LOC would delay the whole thing by quite a bit.

@Lynx005F Lynx005F marked this pull request as ready for review July 30, 2024 06:43