-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Simplify and optimize quorum calculation; fix blob bug. #4978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
linera-execution/src/system.rs
Outdated
| self.committees.set(committees); | ||
| let admin_id = self | ||
|
|
||
| let net_description = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it would still be a good idea to have this committee reading logic extracted into a function somewhere, so that we don't have to duplicate all this code wherever we need to read committees... But I guess it's good enough for now, with just two places where we do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll give it a try and deduplicate them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like your approach!
The current `Committee` implementation is a bit complicated, and sometimes sets the quorum threshold higher than necessary. The assumption is that $$N \geq 3 f + 1$$, where $$f$$ is the fault tolerance, i.e. the maximum total votes of the faulty validators. The _validity threshold_ is $$f + 1$$. Given $$N$$, the highest possible value for $$f$$ is therefore $$\lceil N / 3 \rceil - 1$$. The _quorum threshold_ $$q$$ is minimal such that any two quorum intersect in at least a validity threshold (i.e. have at least one honest validator in common), i.e. $$2 q - N \geq f + 1$$. Thus $$q = \lceil (N + f + 1) / 2 \rceil$$. In particular, if e.g. $$N = 3$$, then $$f = 0$$ and $$q = 2$$. This change revealed an issue with the certificate handling logic in the worker: `committees_for` returns `ViewErrors` instead of `BlobsNotFound`/`EventsNotFound`, so the client fails to send the validators the admin chain if needed. This was fixed; I also kept minor cleanups I made while debugging this. I got a few stack overflows, so I doubled the client's Tokio thread stack size. CI - The fixes (but maybe not the quorum change) should be backported to `testnet_conway` and released in a new SDK and validator hotfix. - [reviewer checklist](https://github.com/linera-io/linera-protocol/blob/main/CONTRIBUTING.md#reviewer-checklist)
The current `Committee` implementation is a bit complicated, and sometimes sets the quorum threshold higher than necessary. The assumption is that $$N \geq 3 f + 1$$, where $$f$$ is the fault tolerance, i.e. the maximum total votes of the faulty validators. The _validity threshold_ is $$f + 1$$. Given $$N$$, the highest possible value for $$f$$ is therefore $$\lceil N / 3 \rceil - 1$$. The _quorum threshold_ $$q$$ is minimal such that any two quorum intersect in at least a validity threshold (i.e. have at least one honest validator in common), i.e. $$2 q - N \geq f + 1$$. Thus $$q = \lceil (N + f + 1) / 2 \rceil$$. In particular, if e.g. $$N = 3$$, then $$f = 0$$ and $$q = 2$$. This change revealed an issue with the certificate handling logic in the worker: `committees_for` returns `ViewErrors` instead of `BlobsNotFound`/`EventsNotFound`, so the client fails to send the validators the admin chain if needed. This was fixed; I also kept minor cleanups I made while debugging this. I got a few stack overflows, so I doubled the client's Tokio thread stack size. CI - The fixes (but maybe not the quorum change) should be backported to `testnet_conway` and released in a new SDK and validator hotfix. - [reviewer checklist](https://github.com/linera-io/linera-protocol/blob/main/CONTRIBUTING.md#reviewer-checklist)
Partial backport of #4978. ## Motivation The current `Committee` implementation is a bit complicated, and sometimes sets the quorum threshold higher than necessary. ## Proposal #4978 revealed an issue with the certificate handling logic in the worker: `committees_for` returns `ViewErrors` instead of `BlobsNotFound`/`EventsNotFound`, so the client fails to send the validators the admin chain if needed. This was fixed; I also kept minor cleanups I made while debugging this. I got a few stack overflows, so I doubled the client's Tokio thread stack size. ## Test Plan CI ## Release Plan - The fixes (but maybe not the quorum change) should be - released in a new SDK and - released in a validator hotfix. ## Links - PR to main: #4978 - [reviewer checklist](https://github.com/linera-io/linera-protocol/blob/main/CONTRIBUTING.md#reviewer-checklist)
Motivation
The current
Committeeimplementation is a bit complicated, and sometimes sets the quorum threshold higher than necessary.Proposal
The assumption is that$$N \geq 3 f + 1$$ , where $$f$$ is the fault tolerance, i.e. the maximum total votes of the faulty validators. The validity threshold is $$f + 1$$ . Given $$N$$ , the highest possible value for $$f$$ is therefore $$\lceil N / 3 \rceil - 1$$ .
The quorum threshold$$q$$ is minimal such that any two quorum intersect in at least a validity threshold (i.e. have at least one honest validator in common), i.e. $$2 q - N \geq f + 1$$ . Thus $$q = \lceil (N + f + 1) / 2 \rceil$$ .
In particular, if e.g.$$N = 3$$ , then $$f = 0$$ and $$q = 2$$ .
This change revealed an issue with the certificate handling logic in the worker:
committees_forreturnsViewErrorsinstead ofBlobsNotFound/EventsNotFound, so the client fails to send the validators the admin chain if needed.This was fixed; I also kept minor cleanups I made while debugging this. I got a few stack overflows, so I doubled the client's Tokio thread stack size.
Test Plan
CI
Release Plan
testnet_conwayand released in a new SDK and validator hotfix.Links