@afck afck commented Nov 17, 2025

Motivation

The current Committee implementation is a bit complicated, and sometimes sets the quorum threshold higher than necessary.

Proposal

The assumption is that $$N \geq 3 f + 1$$, where $$N$$ is the total number of votes and $$f$$ is the fault tolerance, i.e. the maximum total votes of the faulty validators. The validity threshold is $$f + 1$$. Given $$N$$, the highest possible value for $$f$$ is therefore $$\lceil N / 3 \rceil - 1$$.

The quorum threshold $$q$$ is the minimal value such that any two quorums intersect in at least a validity threshold (i.e. have at least one honest validator in common). Two quorums of weight $$q$$ overlap in at least $$2 q - N$$ votes, so the condition is $$2 q - N \geq f + 1$$. Thus $$q = \lceil (N + f + 1) / 2 \rceil$$.

For example, if $$N = 3$$, then $$f = 0$$ and $$q = 2$$.
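
The threshold formulas above can be sketched in Rust as follows. This is an illustration under stated assumptions, not the actual `Committee` code: the `Thresholds` struct and the `thresholds` function are hypothetical names, and votes are assumed to be `u64` weights small enough that the sums do not overflow.

```rust
/// Thresholds derived from the total voting weight (hypothetical helper type,
/// not part of the real `Committee` implementation).
#[derive(Debug, PartialEq, Eq)]
struct Thresholds {
    /// Maximum total votes of faulty validators, `f`.
    fault_tolerance: u64,
    /// Validity threshold, `f + 1`.
    validity: u64,
    /// Quorum threshold, `q`.
    quorum: u64,
}

/// Computes the thresholds for a total of `n` votes.
/// Returns `None` for an empty committee.
fn thresholds(n: u64) -> Option<Thresholds> {
    if n == 0 {
        return None;
    }
    // Largest f with n >= 3f + 1: ceil(n / 3) - 1, which equals floor((n - 1) / 3).
    let f = (n - 1) / 3;
    let validity = f + 1;
    // Smallest q with 2q - n >= f + 1: ceil((n + f + 1) / 2),
    // written as integer division of (n + f + 2) by 2.
    let quorum = (n + f + 2) / 2;
    Some(Thresholds {
        fault_tolerance: f,
        validity,
        quorum,
    })
}
```

For a total of 3 votes this yields a fault tolerance of 0, a validity threshold of 1 and a quorum threshold of 2, matching the example above.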

This change revealed an issue with the certificate handling logic in the worker: `committees_for` returns `ViewError`s instead of `BlobsNotFound`/`EventsNotFound`, so the client fails to send the admin chain to the validators when needed.
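
To illustrate why the error type matters, here is a minimal sketch of mapping low-level lookup failures to specific "not found" errors that a caller can act on. All types and the `check_lookups` function are hypothetical stand-ins (only the variant names `BlobsNotFound` and `EventsNotFound` come from the description above); the real worker code may look quite different.

```rust
/// Hypothetical ID types standing in for the real blob and event identifiers.
type BlobId = u64;
type EventId = u64;

/// Hypothetical low-level storage error, standing in for a generic view error.
#[derive(Debug, PartialEq)]
enum StorageError {
    MissingBlob(BlobId),
    MissingEvent(EventId),
    Other(String),
}

/// Hypothetical worker-level error; the first two variant names follow the PR text.
#[derive(Debug, PartialEq)]
enum WorkerError {
    BlobsNotFound(Vec<BlobId>),
    EventsNotFound(Vec<EventId>),
    Storage(String),
}

/// Collects every missing ID from a batch of lookups, so that the caller
/// learns about all of them at once instead of seeing an opaque storage error.
fn check_lookups(results: Vec<Result<(), StorageError>>) -> Result<(), WorkerError> {
    let mut missing_blobs = Vec::new();
    let mut missing_events = Vec::new();
    for result in results {
        match result {
            Ok(()) => {}
            Err(StorageError::MissingBlob(id)) => missing_blobs.push(id),
            Err(StorageError::MissingEvent(id)) => missing_events.push(id),
            // Anything else is a genuine storage failure and is returned immediately.
            Err(StorageError::Other(msg)) => return Err(WorkerError::Storage(msg)),
        }
    }
    // Reporting events before blobs is an arbitrary choice made for this sketch.
    if !missing_events.is_empty() {
        return Err(WorkerError::EventsNotFound(missing_events));
    }
    if !missing_blobs.is_empty() {
        return Err(WorkerError::BlobsNotFound(missing_blobs));
    }
    Ok(())
}
```

A client that receives a specific "not found" error with the list of IDs can upload exactly the missing data and retry, whereas a generic storage error gives it nothing to act on.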

This was fixed; I also kept minor cleanups I made while debugging this. I got a few stack overflows, so I doubled the client's Tokio thread stack size.
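
As a rough illustration of the stack size change, the sketch below runs a deeply recursive function on a thread with a doubled stack. It uses only the standard library's `std::thread::Builder::stack_size`; Tokio's runtime builder exposes a comparable `thread_stack_size` setting. The constant `BASE_STACK_SIZE` and both function names are hypothetical, and the actual sizes used in the client are not stated here.

```rust
use std::thread;

/// Hypothetical base stack size in bytes; the PR does not state the actual value.
const BASE_STACK_SIZE: usize = 2 * 1024 * 1024;

/// Runs `f` on a new thread whose stack is twice `BASE_STACK_SIZE`
/// and returns its result.
fn run_with_doubled_stack<T, F>(f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(2 * BASE_STACK_SIZE)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}

/// Deliberately recursive function: each call adds one stack frame.
fn recursion_depth(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        1 + recursion_depth(n - 1)
    }
}
```

Calling `run_with_doubled_stack(|| recursion_depth(10_000))` executes the recursion on the larger stack. A bigger stack only raises the limit; it does not remove the underlying deep recursion.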

Test Plan

CI

Release Plan

  • The fixes (but maybe not the quorum change) should be backported to testnet_conway and released in a new SDK and validator hotfix.

Links

@afck afck requested review from bart-linera and ma2bd November 17, 2025 17:00
@afck afck changed the title from "Simplify and optimize quorum calculation." to "Simplify and optimize quorum calculation; fix blob bug." Nov 19, 2025
self.committees.set(committees);
let admin_id = self

let net_description = self
Contributor

I feel like it would still be a good idea to have this committee reading logic extracted into a function somewhere, so that we don't have to duplicate all this code wherever we need to read committees... But I guess it's good enough for now, with just two places where we do that.

Contributor Author

Good point, I'll give it a try and deduplicate them.

Contributor Author

Done in 66c6773. I also addressed @deuszx's comments on #4986 in this commit.

It's a bit painful to convert the errors, and to collect all the missing event IDs and blob IDs, but I guess it makes sense.

Contributor

Nice, I like your approach!

@afck afck added this pull request to the merge queue Nov 19, 2025
Merged via the queue into linera-io:main with commit b654b69 Nov 19, 2025
34 checks passed
@afck afck deleted the quorum branch November 19, 2025 17:23
afck added a commit to afck/linera-protocol that referenced this pull request Nov 19, 2025
The current `Committee` implementation is a bit complicated, and
sometimes sets the quorum threshold higher than necessary.

The assumption is that $$N \geq 3 f + 1$$, where $$f$$ is the fault
tolerance, i.e. the maximum total votes of the faulty validators. The
_validity threshold_ is $$f + 1$$. Given $$N$$, the highest possible
value for $$f$$ is therefore $$\lceil N / 3 \rceil - 1$$.

The _quorum threshold_ $$q$$ is minimal such that any two quorums
intersect in at least a validity threshold (i.e. have at least one
honest validator in common), i.e. $$2 q - N \geq f + 1$$. Thus $$q =
\lceil (N + f + 1) / 2 \rceil$$.

In particular, if e.g. $$N = 3$$, then $$f = 0$$ and $$q = 2$$.

This change revealed an issue with the certificate handling logic in the
worker: `committees_for` returns `ViewErrors` instead of
`BlobsNotFound`/`EventsNotFound`, so the client fails to send the
validators the admin chain if needed.

This was fixed; I also kept minor cleanups I made while debugging this.
I got a few stack overflows, so I doubled the client's Tokio thread
stack size.

CI

- The fixes (but maybe not the quorum change) should be backported to
`testnet_conway` and released in a new SDK and validator hotfix.

- [reviewer
checklist](https://github.com/linera-io/linera-protocol/blob/main/CONTRIBUTING.md#reviewer-checklist)
afck added a commit that referenced this pull request Nov 19, 2025
Partial backport of #4978.

## Motivation

The current `Committee` implementation is a bit complicated, and
sometimes sets the quorum threshold higher than necessary.

## Proposal

#4978 revealed an issue with the certificate handling logic in the
worker: `committees_for` returns `ViewErrors` instead of
`BlobsNotFound`/`EventsNotFound`, so the client fails to send the
validators the admin chain if needed.

This was fixed; I also kept minor cleanups I made while debugging this.
I got a few stack overflows, so I doubled the client's Tokio thread
stack size.

## Test Plan

CI

## Release Plan

- The fixes (but maybe not the quorum change) should be
    - released in a new SDK and
    - released in a validator hotfix.

## Links

- PR to main: #4978
- [reviewer
checklist](https://github.com/linera-io/linera-protocol/blob/main/CONTRIBUTING.md#reviewer-checklist)