-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revisit error strategy #618
Comments
I believe we can broadly categorise node errors into user errors and operation errors. User facing errorsThese are caused by external forces acting upon the node aka some form of user interaction, and require that the failure is communicated back to the user. These are restricted to the RPC/web-interface. Here we need to be careful to not expose internal errors that the user can
Essentially, our RPC and web services errors are split into internal and user errors. User errors are likely limited to
and maybe some others I haven't thought of. Almost everything else would be Operation errorsInternal errors which can imo be split into two categories:
To allow for (1) we may need some error variants, but I suspect we'll find most of our existing operational errors lead to a node shutdown. In my experience it is often better to let components fail completely, and rather invest in a robust restart process. You require such a process in any case to cater for the (2) errors, and deciding on how to recover from (1) in a safe way can often be more trouble than its worth. Proposed approachStart at the top-level binary and work out which errors simply cause node death. An obvious example is our store initialization aka database setup and validation. If that fails there is literally nothing to be done. Replace such errors with |
We should also decide whether to treat each node component as isolated. Or put differently: does the I'm pretty strongly in the the latter camp since we already do things like assume the This implies that certain RPC interfaces are considered fully internal, and don't need to separate internal/user errors. Though we probably should be more specific here since |
Yes, I think we can assume that the same entity would run these components (potentially on different machines) and so they would fully trust each other. |
For those I would especially make sure that they contain helpful details and give the user some idea of what could be done to fix an RPC call, if anything. Not exactly sure how errors translate to RPC, but assuming they're stringified we would probably want to generate a report at this boundary (e.g. using
That's probably a good strategy. These errors might expose internal details of the node or the system the node is running on, which could be considered an information leak and potentially be exploited. |
We currently follow the
thiserror
for libs andanyhow
for binaries rule-of-thumb. However, our crates in this repo are more a way of separating concerns and components -- and are not really intended as libraries (excepting the RPC/proto maybe).This can be seen in our error handling where we almost never match on error variants, but instead wrap and bubble them up. One notable exception is in RPC methods where we need to differentiate between internal and user-facing errors.
We should re-evaluate our error handling strategy. At minimum we could probably merge a bunch of the various error enums e.g.
store
errors are per method, but these could probably just be a single enum.A more extreme approach could be to default to
anyhow::Result
unless required otherwise. RPC errors would likely just be split into external/internal wrappers of anyhow, and if a component becomes interested in a specific error condition then that variant could be added.The text was updated successfully, but these errors were encountered: