Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not pass value to instance #371

Open
wants to merge 30 commits into
base: alan
Choose a base branch
from

Conversation

GalRogozinski
Copy link
Contributor

@GalRogozinski GalRogozinski commented Mar 24, 2024

Description

Instance will now fetch consensus data only if it is the round leader.
Timer starts before data is fetched to ensure it is synchronized across committee nodes.

@GalRogozinski
Copy link
Contributor Author

#370 was opened while working on this

@@ -121,7 +132,7 @@ func (i *Instance) ProcessMsg(msg *SignedMessage) (decided bool, decidedValue []
}
return err
case RoundChangeMsgType:
return i.uponRoundChange(i.StartValue, msg, i.State.RoundChangeContainer, i.config.GetValueCheckF())
return i.uponRoundChange(msg, i.State.RoundChangeContainer, i.config.GetValueCheckF())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed we're passing i.StartValue, i.State.RoundChangeContainer, and i.config.GetValueCheckF() as arguments to the function. Given these are accessed via the instance i, could someone help clarify the rationale behind explicitly passing these as parameters instead of accessing them directly within the uponRoundChange method?

Is there a specific design choice or benefit we're aiming for by structuring the call this way? For instance, does this approach enhance testability, readability, or separation of concerns in a way that accessing them directly within the function would not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto
i.uponProposal
i.uponPrepare
i.UponCommit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GalRogozinski I agree with this, either only pass and not save (if its only used within these function) or save and don't pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is out of scope of PR, but can be done

@@ -121,7 +132,7 @@ func (i *Instance) ProcessMsg(msg *SignedMessage) (decided bool, decidedValue []
}
return err
case RoundChangeMsgType:
return i.uponRoundChange(i.StartValue, msg, i.State.RoundChangeContainer, i.config.GetValueCheckF())
return i.uponRoundChange(msg, i.State.RoundChangeContainer, i.config.GetValueCheckF())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case CommitMsgType:
	decided, decidedValue, aggregatedCommit, err = i.UponCommit(msg, i.State.CommitContainer)
	if decided {
		i.State.Decided = decided
		i.State.DecidedValue = decidedValue
	}
	return err

not related to the change, but we should check for err before assign

@@ -24,6 +24,7 @@ type Instance struct {
// forceStop will force stop the instance if set to true
forceStop bool
StartValue []byte
CdFetcher *types.DataFetcher `json:"-"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we rename CdFetcher to CDFetcher for clarity? 'CD' stands for 'Consensus Data', and capitalizing the acronym could improve readability and align with our naming conventions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I even prefer the whole ConsensusDataFetcher, or simply DataFetcher.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets to this, I prefer DataFetcher or Fetcher. anyway if using acronym always capital letters for all words.

ssv/attester.go Outdated
}

if err := r.BaseRunner.decide(r, input); err != nil {
if err := r.BaseRunner.decide(r, attestationFetcher); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we passing r to r.BaseRunner.decide? Can't it access r internally?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for
aggregator
proposer
and more

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the BaseRunner can't access the runner internally. But, also, the decide function doesn't need the runner as far as I understand. So, I think this could be simplified to r.BaseRunner.decide(fetcher) too

@@ -5,6 +5,7 @@ import (
)

// EmptyValue tests a starting an instance for an empty value (not passing value check)
// Instance starts but with an empty value and no proposal gets broadcasted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way no proposals gets broadcasted?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@olegshmuelov I assume if CDFetcher fails fetching..

@@ -6,6 +6,7 @@ import (
)

// InvalidValue tests a starting an instance for an invalid value (not passing value check)
// Instance starts but with an empty value and no proposal gets broadcasted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -5,6 +5,7 @@ import (
)

// NilValue tests a starting an instance for a nil value (not passing value check)
// Instance starts but with an empty value and no proposal gets broadcasted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

@MatheusFranco99 MatheusFranco99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work!! Just left a suggestion

Comment on lines 185 to 197
func chooseValueToPropose(roundChangeMsg *SignedMessage, cdFetcher *types.DataFetcher) ([]byte, error) {
var valueToPropose []byte
if roundChangeMsg.Message.RoundChangePrepared() {
valueToPropose = roundChangeMsg.FullData
} else {
cd, err := cdFetcher.GetConsensusData()
if err != nil {
return nil, err
}
valueToPropose = cd
}
return valueToPropose, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this function can be called more than once during the Round-Change phase. If so, we would also wait for the Beacon Node more than once. Is there a way to save the data, e.g. storing it in the i.StartValue, so it can be returned faster?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of cdFetcher is that implementation can do it in a way where it calls it asynchronously and even caches values. But this is an implementation detail.

@olegshmuelov

}
if err = valueCheckF(value); err != nil {
fmt.Printf("%s\n", err.Error())
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also return if CreateProposal returns an error?

@@ -46,6 +46,12 @@ func (c *Contributions) HashTreeRootWith(hh ssz.HashWalker) error {
return nil
}

// DataFetcher asynchronusly fetches data from the beacon node upon instantiation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// DataFetcher asynchronusly fetches data from the beacon node upon instantiation
// DataFetcher asynchronously fetches data from the beacon node upon instantiation

@@ -46,6 +46,12 @@ func (c *Contributions) HashTreeRootWith(hh ssz.HashWalker) error {
return nil
}

// DataFetcher asynchronusly fetches data from the beacon node upon instantiation
type DataFetcher struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not interface? Using concrete types is not flexible and inconvenient for testing and when spec repo is used by other repositories

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nkryuchkov @GalRogozinski
Usually I support using concrete types. As they are more predictable than interfaces. but this time you are using this concrete type as an interface so it might make more sense for it to be an interface ( but then also moved to the runners ).
the rule is -
Return concrete types
Receive interfaces

@GalRogozinski GalRogozinski changed the base branch from main to alan June 13, 2024 08:33
@@ -56,16 +57,26 @@ func (i *Instance) ForceStop() {
}

// Start is an interface implementation
func (i *Instance) Start(value []byte, height Height) {
func (i *Instance) Start(cdFetcher *types.DataFetcher, height Height, valueCheckF ProposedValueCheckF) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use the i.GetConfig().GetValueCheck() instead of using an argument? It seems a duplication to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, if value check only used here then maybe we don't need it as part of Config.

Comment on lines +72 to +73
fmt.Printf("%s\n", err.Error())
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error should be returned and not printed, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently we printed errors in main before...
not sure why...
Maybe we should add errors and test them?
I will open an issue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +176 to +178
if proposer(state, config, roundChangeMsg.QBFTMessage.Round) != state.CommitteeMember.OperatorID {
return nil, errors.New("not proposer")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same check is done below. Any reason for duplicating it or we can remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where? I don't see it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

196-197

Comment on lines +213 to +224
func chooseValueToPropose(roundChangeMsg *Message, cdFetcher *types.DataFetcher, preparedValue []byte) ([]byte, error) {
var valueToPropose []byte
if roundChangeMsg.RoundChangePrepared() {
valueToPropose = preparedValue
} else {
cd, err := cdFetcher.GetConsensusData()
if err != nil {
return nil, err
}
valueToPropose = cd
}
return valueToPropose, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the CDFetcher may update its value, we need to make sure that two different proposals for the same round can't be done. One way to do that is to check, in the validation of Round-Change messages, that the msg.Round is strictly above our current round.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand.. we call this once per round

if err := c.GetConfig().GetValueCheckF()(value); err != nil {
return errors.Wrap(err, "value invalid")
}
func (c *Controller) StartNewInstance(height Height, cdFetcher *types.DataFetcher) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not move the fetching logic outside of controller and pass an already fetched value? this of course also means determining that we are leaders outside of the controller. IMO it makes more sense than passing a fetcher. less changes to controller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking again this might not work if we need to fetch data after round change? but probably we should fetch data at the beginning of the slot regardless.

// chooseValueToPropose
// If justifiedRoundChangeMsg has no prepare justification choose state value
// If justifiedRoundChangeMsg has prepare justification choose prepared value
func chooseValueToPropose(roundChangeMsg *Message, cdFetcher *types.DataFetcher, preparedValue []byte) ([]byte, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since its only used in one place, this shouldn't be a method IMO, or at least the name should reflect that a long-awaited fetch is part of this function.

Copy link
Contributor

@y0sher y0sher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you think its possible to do the fetching outside of the qbft controller?

@iurii-ssv
Copy link

iurii-ssv commented Oct 31, 2024

@GalRogozinski (if/once you get back to this code) could you clarify/confirm whether this PR (after finished and merged) will support and be 100% compatible with the solution for the issue I described here (which we'll probably want to address at some point) ?

I haven't looked through the code of this PR too much yet, but from cursory overview I think

  • it couples/bundles the start of QBFT instance and the call to fetch starting data (when leader)
  • while the more flexible/unassuming approach (for the case when we are QBFT leader especially, but also in general) would be to 1) start QBFT instance with non-blocking Instance.Start call and let it finish all the setup it needs for Instance to become functional in terms of being able to communicate with other instances in the cluster (and participate round-changes for example) 2) and make it such that Instance can just "get injected" with StartValue without knowing when/whether/whom by it will happen (ofc this way there will be a new set of conditions to handle - when StartValue is needed but is not there yet); StartValue name also kinda becomes just Value and Instance will need to process value injection as it sees fits depending on his internal state at the time of the injection (which would be just a simple thread-safe Instance.SetValue call) depending on whether he is still leader, how long he still has till timeout in the current round - for example he can even yield his round voluntarily if timeout happens soon (if we add it as an option, as I describe in proposer-runner: QBFT liveness choked by Beacon node ssv#1825)

lets consider the following scenario that shows when/why "value injection" might be needed:

  • we have block proposer duty, lets say we found a way to start all QBFT instances for the relevant cluster "roughly at the same time" (which btw current SSV node implementation doesn't do as my comment describes it proposer-runner: QBFT liveness choked by Beacon node ssv#1825 (comment))
  • to make it concrete, lets say cluster is 4 nodes for this example, all 4 have now started counting time to the first round timeout waiting for leader to fetch his data (which he is in the process of)
  • leader couldn't fetch his data, and even worse - he is synchronously blocked on a call to Beacon node
  • now 3 nodes start round 1 change, and stuck leader of round 1 can't further participate in QBFT for this cluster because he is just stuck (his whole duty is blocked/stuck btw waiting for Instance.Start func to return) - all because of QBFT instance took it upon itself to go fetch the data
  • and Instance can't really recover from this type of scenario (even if DataFetcher function has timeout/retry logic) to start/keep participating in QBFT because technically Instance has never started and is thus useless to QBFT cluster

@GalRogozinski
Copy link
Contributor Author

Hey @iurii-ssv,

Yeah, this PR is suppose to allow making async calls.
We stopped working on it due to Alan fork... which is a shame in my opinion because I think this change should be part of a fork so all nodes have similar behavior

So currently we need to wait for the next fork

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consensus consensus related issues fork
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants