Add Redundancy to RedMulE #28
base: astral-hci-v2.1
af8aaf3 to 23c60df
fe15913 to 48b2eca
I have now moved it to the refactored branch, where a lot of the intermediate changes are no longer visible. Any further modifications will most likely affect the streamer, and maybe the vulnerability analysis scripts.
ba580c6 to 6c8d4a9
- Buffers now have a REP parameter which sets the number of replicas in the internal FSM.
- Replicas of FSM get voted per cycle
- Main data-path not replicated.

- Reprogrammed Address generators to fetch each element twice
- Changed SW to deal with new multiplication size

- Added voters for "double loaded" data
- Added option to synthesize a reduced datapath and create parity bits from it.

- Split control inputs so every even and odd row get another copy if replication is enabled
- Added w input parity path and voters.

- Abort the calculation if a fault has happened
- Check Regfile for faults

- Moved X Buffer Clock Gating inside X buffer
- Replicated X Buffer Clock Gating and adjusted FSM to fix vulnerability.
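The per-cycle voting of FSM replicas mentioned in the commits above can be illustrated with a small software model. This is only a sketch under assumptions: it uses three replicas (the actual REP value is not stated here), a plain bitwise 2-of-3 majority, and hypothetical function names; the real voters are implemented in hardware and may differ.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three replica values."""
    return (a & b) | (a & c) | (b & c)


def vote_replicas(states: list[int]) -> tuple[int, bool]:
    """Return (voted_state, fault_seen) for three FSM replica states.

    Assumption: REP = 3. The voted value would be fed back to all
    replicas every cycle, so a single upset does not accumulate.
    """
    if len(states) != 3:
        raise ValueError("this sketch assumes REP = 3")
    voted = majority_vote(*states)
    fault_seen = any(s != voted for s in states)
    return voted, fault_seen


# One replica has a flipped bit; the vote masks it and flags the mismatch.
print(vote_replicas([0b0101, 0b0101, 0b0111]))  # (5, True)
```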
6c8d4a9 to 3fa5718
This now includes everything fully finished except the deduplicator, which I would like to merge separately. The deduplicator only improves performance on the memory side; functionality is the same. However, that part seems to be tricky to get right, so waiting for these additional 30 LOC would delay the whole thing by quite a bit.
- Automatic fault injection and analysis
04a23d1 to 3d2d762
This PR adds redundancy to RedMulE:
It always implements basic datapath redundancy, which can be selected in software with redundancy = 1 in the redmule config.
This part has very low data overhead, but only provides coarse protection: datapath faults will be detected,
but control is generally vulnerable. Some parts of the W-input datapath are also vulnerable.
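As a rough illustration of the detection idea, the commits in this PR describe fetching each element twice and voting on the "double loaded" data. The Python model below is a sketch under assumptions: the adjacent-copy interleaving, the function names, and the decision to forward the first copy are all hypothetical and not taken from the RTL.

```python
def duplicate_elements(row: list[int]) -> list[int]:
    """Emit every element twice, modelling an address generator
    that fetches each element two times in a row (assumed layout)."""
    return [x for x in row for _ in range(2)]


def check_duplicates(stream: list[int]) -> tuple[list[int], bool]:
    """Compare the two copies of each element.

    Returns the recovered elements and a flag that is True if any
    pair of copies disagreed, i.e. a datapath fault was detected.
    """
    if len(stream) % 2 != 0:
        raise ValueError("stream must contain an even number of entries")
    elements = []
    fault_detected = False
    for first, second in zip(stream[0::2], stream[1::2]):
        if first != second:
            fault_detected = True
        elements.append(first)
    return elements, fault_detected


stream = duplicate_elements([1, 2, 3])
stream[3] ^= 0b100  # flip one bit in the second copy of element 2
print(check_duplicates(stream))  # ([1, 2, 3], True)
```

With only two copies, a mismatch can be detected but not corrected, which matches the description that datapath faults are detected and the operation is aborted.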
It additionally implements control redundancy, which can be enabled with the USE_REDUNDANCY parameter bit. This introduces about 8% area overhead and achieves a high level of fault tolerance. With the included fault injection scripts,
injecting on any signal with equal likelihood (i.e. control signals are overstressed compared to faults in the wild, and control is typically harder to protect), it results in a correct termination for 99.99% of injected faults. To put that into perspective, a non-fault-tolerant RedMulE would correctly terminate for about 85% of injected faults due to masking.
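To make "any signal with equal likelihood" concrete, here is a minimal sketch of how such a campaign might sample injection targets and summarize results. It is not the PR's fault injection script: the signal names, widths, and helper names are hypothetical, and it assumes each signal is picked uniformly regardless of its width, which is what makes small control signals overrepresented relative to wide datapath signals.

```python
import random


def pick_injection(signals: dict[str, int], n_cycles: int,
                   rng: random.Random) -> tuple[str, int, int]:
    """Pick one fault as (signal name, bit index, cycle).

    Assumption: every signal is equally likely, independent of width.
    """
    name = rng.choice(sorted(signals))       # uniform over signals
    bit = rng.randrange(signals[name])       # uniform over that signal's bits
    cycle = rng.randrange(n_cycles)          # uniform over simulated cycles
    return name, bit, cycle


def correct_termination_rate(outcomes: list[bool]) -> float:
    """Fraction of injections after which the run terminated correctly."""
    return sum(outcomes) / len(outcomes)


# Hypothetical signal list: name -> bit width.
signals = {"fsm_state": 3, "x_buffer_data": 256, "clk_gate_en": 1}
rng = random.Random(0)
print(pick_injection(signals, n_cycles=1000, rng=rng))
print(correct_termination_rate([True, True, False, True]))  # 0.75
```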
Both parts use the existing registers for ECC faults, which should be read from software, and will abort an operation if a fault is detected to avoid stalls.
The redundancy spheres overlap with the ECC encode / decode and are independent, i.e. any combination of memory ECC and redundancy results in a working design, even though some combinations might not be reasonable from a fault-tolerance perspective.
Non-regression tests have been updated to test all combinations of HW and SW parameters.
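Enumerating every HW/SW combination can be sketched as a Cartesian product over the parameter values. This is an illustration only, not the PR's test flow: USE_REDUNDANCY and redundancy come from the description above, while USE_ECC is a hypothetical placeholder for the memory ECC switch, and all parameters are assumed to be binary.

```python
from itertools import product

# Hardware (elaboration-time) parameters. USE_ECC is a hypothetical name.
HW_PARAMS = {"USE_REDUNDANCY": (0, 1), "USE_ECC": (0, 1)}
# Software (runtime config) parameters.
SW_PARAMS = {"redundancy": (0, 1)}


def all_configs(*param_sets: dict) -> list[dict]:
    """Return one dict per combination of all given parameter values."""
    merged = {}
    for params in param_sets:
        merged.update(params)
    names = list(merged)
    value_lists = [merged[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


configs = all_configs(HW_PARAMS, SW_PARAMS)
print(len(configs))  # 8
for cfg in configs:
    print(cfg)
```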
Currently the soft clear signal in the register file can still cause wrong terminations (a stall where the fault is detected but no interrupt is sent), and many of the dependencies are not yet reviewed; as such, this is a draft PR.