In attempting to use the SP's "reset reason" feature in production for the first time, we've noticed that it has... issues.
Because of the RoT's control of the reset line, the reset reason is almost always "Pin" in practice.
This is because we report a single reset reason, collapsing several potential sources into one, when it should really be a set of reasons.
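To make the difference concrete, here is a small host-side Rust sketch comparing a decoder that picks one reason with one that reports the whole set. The names (`ResetReason`, `collapse`, `all_reasons`) are hypothetical and the bit positions are placeholders, not the SP's actual decoder or any datasheet layout. Whatever order a pick-one decoder checks the bits in, any additional latched reason gets hidden; this one checks the pin first purely for illustration.

```rust
/// Hypothetical platform-independent reset reasons.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ResetReason {
    Pin,
    PowerOn,
    Watchdog,
    Software,
}

// Placeholder raw bit positions, not taken from any datasheet.
const PIN: u32 = 1 << 0;
const POWER_ON: u32 = 1 << 1;
const WATCHDOG: u32 = 1 << 2;
const SOFTWARE: u32 = 1 << 3;

// Checked in this order; pin comes first purely for illustration.
const TABLE: [(u32, ResetReason); 4] = [
    (PIN, ResetReason::Pin),
    (POWER_ON, ResetReason::PowerOn),
    (WATCHDOG, ResetReason::Watchdog),
    (SOFTWARE, ResetReason::Software),
];

/// Collapsing decoder: returns only the first matching source.
fn collapse(raw: u32) -> Option<ResetReason> {
    TABLE.iter().find(|&&(bit, _)| raw & bit != 0).map(|&(_, r)| r)
}

/// Set decoder: returns every source whose bit is asserted.
fn all_reasons(raw: u32) -> Vec<ResetReason> {
    TABLE
        .iter()
        .filter(|&&(bit, _)| raw & bit != 0)
        .map(|&(_, r)| r)
        .collect()
}

fn main() {
    // Hypothetical case: the reset line was pulled, but a watchdog bit is also latched.
    let raw = PIN | WATCHDOG;
    assert_eq!(collapse(raw), Some(ResetReason::Pin));
    assert_eq!(all_reasons(raw), vec![ResetReason::Pin, ResetReason::Watchdog]);
}
```

The collapsing decoder reports only "Pin" here, which matches the symptom above; the set decoder keeps the watchdog bit visible.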
We also clear the reset reason immediately on boot, which discards what we know about earlier resets when several happen in succession. The hardware appears to accumulate reset reasons across reboots, so if we get a series of fast reboots, we can learn at least some information about all of them. We probably want to make the clearing conditional or delayed.
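The cost of clearing on boot can be modeled with a toy register. This sketch assumes, as the hardware appears to do, that each reset ORs its cause into a sticky status register that only software clears. `ResetStatus`, `boot`, and `run` are hypothetical names and the cause bits are placeholders.

```rust
// Placeholder cause bits, not from any datasheet.
const PIN: u32 = 1 << 0;
const WATCHDOG: u32 = 1 << 1;
const BROWNOUT: u32 = 1 << 2;

/// Toy model (assumption): the hardware ORs in each reset's cause and
/// only forgets when software clears the register.
struct ResetStatus {
    bits: u32,
}

impl ResetStatus {
    fn hardware_reset(&mut self, cause: u32) {
        self.bits |= cause;
    }

    fn read(&self) -> u32 {
        self.bits
    }

    fn clear(&mut self) {
        self.bits = 0;
    }
}

/// One boot: read the register, then optionally clear it right away.
fn boot(reg: &mut ResetStatus, clear_on_boot: bool) -> u32 {
    let seen = reg.read();
    if clear_on_boot {
        reg.clear();
    }
    seen
}

/// Runs a burst of fast resets and returns what the final boot observed.
fn run(causes: &[u32], clear_on_boot: bool) -> u32 {
    let mut reg = ResetStatus { bits: 0 };
    let mut last_seen = 0;
    for &cause in causes {
        reg.hardware_reset(cause);
        last_seen = boot(&mut reg, clear_on_boot);
    }
    last_seen
}

fn main() {
    let burst = [BROWNOUT, WATCHDOG, PIN];
    // Clearing on boot: only the most recent cause survives.
    assert_eq!(run(&burst, true), PIN);
    // Deferred clearing: the whole burst is still visible.
    assert_eq!(run(&burst, false), BROWNOUT | WATCHDOG | PIN);
}
```

Unless each boot manages to log what it read somewhere durable before clearing, the final boot in the eager case sees only the last cause.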
There is no way to actually ask a production image for the reset reason. It's only exposed in an unused Idol call, so people have been getting it through `humility ipc`, which of course is disabled in prod.
My current perspective is that we should do the following:
- Always report the raw hardware reset bits alongside a bitset of platform-independent interpreted reasons. That way, a curious engineer with a datasheet can map them back to hardware behaviors.
- Only clear the reset reason once we have reason to believe we're stable. The hackiest way to do this would be to clear it, say, 60 seconds after boot. A better way would be to have the control plane collect and clear the data.
- Speaking of which, we need a way to get that data out over the network. It could be hacked into something like `gimlet-inspector` for now, but it ought to be available in a more standardized form. Perhaps an ereport?
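A minimal host-side sketch of the first two proposals, under stated assumptions: `Reasons`, `ResetReport`, `decode`, `ClearPolicy`, and `should_clear` are hypothetical names, the raw bit positions are placeholders rather than values from any datasheet, and real firmware would use `core` instead of `std`. The report carries the untouched register value next to the interpreted bitset, and clearing is gated either on uptime (the hacky option) or on an acknowledgement from the control plane.

```rust
use std::time::Duration;

/// Platform-independent interpreted reasons, stored as a bitset.
/// This encoding is our own (hypothetical), not a hardware layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct Reasons(u8);

impl Reasons {
    const PIN: Reasons = Reasons(1 << 0);
    const WATCHDOG: Reasons = Reasons(1 << 1);
    const SOFTWARE: Reasons = Reasons(1 << 2);
    const POWER_ON: Reasons = Reasons(1 << 3);

    fn insert(&mut self, other: Reasons) {
        self.0 |= other.0;
    }

    fn contains(self, other: Reasons) -> bool {
        self.0 & other.0 == other.0
    }
}

/// What we would report: the untouched register value for anyone with a
/// datasheet, plus our platform-independent interpretation of it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ResetReport {
    raw: u32,
    reasons: Reasons,
}

// Placeholder raw bit positions; a real port would take these from the
// part's reference manual.
const RAW_PIN: u32 = 1 << 22;
const RAW_POR: u32 = 1 << 23;
const RAW_SOFT: u32 = 1 << 24;
const RAW_WDG: u32 = 1 << 26;

const RAW_TO_REASON: [(u32, Reasons); 4] = [
    (RAW_PIN, Reasons::PIN),
    (RAW_POR, Reasons::POWER_ON),
    (RAW_SOFT, Reasons::SOFTWARE),
    (RAW_WDG, Reasons::WATCHDOG),
];

/// Keeps the raw bits and sets one interpreted bit per recognized source.
/// Unrecognized raw bits are still visible in `raw`.
fn decode(raw: u32) -> ResetReport {
    let mut reasons = Reasons::default();
    for (bit, reason) in RAW_TO_REASON {
        if raw & bit != 0 {
            reasons.insert(reason);
        }
    }
    ResetReport { raw, reasons }
}

/// When it is acceptable to clear the hardware register.
enum ClearPolicy {
    /// The hacky option: clear once uptime passes a threshold.
    AfterUptime(Duration),
    /// The preferred option: clear only after the control plane has
    /// collected the report and acknowledged it.
    OnAck,
}

fn should_clear(policy: &ClearPolicy, uptime: Duration, acked: bool) -> bool {
    match policy {
        ClearPolicy::AfterUptime(min) => uptime >= *min,
        ClearPolicy::OnAck => acked,
    }
}

fn main() {
    // A reset that latched both the pin and watchdog bits keeps both.
    let report = decode(RAW_PIN | RAW_WDG);
    assert_eq!(report.raw, RAW_PIN | RAW_WDG);
    assert!(report.reasons.contains(Reasons::PIN));
    assert!(report.reasons.contains(Reasons::WATCHDOG));
    assert!(!report.reasons.contains(Reasons::SOFTWARE));

    let hacky = ClearPolicy::AfterUptime(Duration::from_secs(60));
    assert!(!should_clear(&hacky, Duration::from_secs(5), false));
    assert!(should_clear(&hacky, Duration::from_secs(61), false));
    assert!(!should_clear(&ClearPolicy::OnAck, Duration::from_secs(3600), false));
    assert!(should_clear(&ClearPolicy::OnAck, Duration::from_secs(1), true));
}
```

Keeping `raw` alongside `reasons` means a bit that `decode` doesn't know about is still recoverable by someone reading the datasheet, rather than silently dropped.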