Merged
Changes from 4 commits
22 commits
2 changes: 1 addition & 1 deletion data/transactions/logic/assembler.go
@@ -2738,7 +2738,7 @@ func AssembleString(text string) (*OpStream, error) {
 	return AssembleStringWithVersion(text, assemblerNoVersion)
 }

-// MustAssemble assembles a program an panics on error. It is useful for
+// MustAssemble assembles a program and panics on error. It is useful for
 // defining globals.
 func MustAssemble(text string) []byte {
 	ops, err := AssembleString(text)
176 changes: 176 additions & 0 deletions heartbeat/README.md
@@ -0,0 +1,176 @@
# Block Payouts, Suspensions, and Heartbeats

Running a validator node on Algorand is a relatively lightweight operation. Therefore, participation
in consensus was not compensated. There was an expectation that financially motivated holders of Algos
would run nodes in order to help secure their holdings.

Although simple participation is not terribly resource intensive, running _any_ service with high
uptime becomes expensive when one considers that it should be monitored for uptime, be somewhat
over-provisioned to handle unexpected load spikes, and plans need to be in place to restart in the
face of hardware failure (or the accounts should leave consensus properly).

With those burdens in mind, fewer Algo holders chose to run participation nodes than would be
preferred to provide security against well-financed bad actors. To alleviate this problem, a
mechanism to reward block proposers has been created. With these _block payouts_ in place, large
Algo holders are incentivized to run participation nodes in order to earn more Algos, increasing
security for the entire Algorand network.

With the financial incentive to run participation nodes comes the risk that some nodes may be
operated without sufficient care. Therefore, a mechanism has been added to _suspend_ nodes that appear to
be performing poorly (or not at all). Appearances can be deceiving, however. Since Algorand is a
probabilistic consensus protocol, pure chance might lead to a node appearing to be delinquent. A new
transaction type, the _heartbeat_, allows a node to explicitly indicate that it is online even if it
does not propose blocks due to "bad luck".

# Payouts

Payouts are made in every block, if the proposer has opted into receiving them, has an Algo balance
in an appropriate range, and has not been suspended for poor behavior since opting-in. The size of
the payout is indicated in the block header, and comes from the `FeeSink`. The block payout consists
of two components. First, a portion of the block fees (currently 50%) is paid to the proposer.
This component incentivizes fuller blocks, which lead to larger payouts. Second, a _bonus_ payout is
made according to an exponentially decaying formula. This bonus is (intentionally) unsustainable
from protocol fees. It is expected that the Algorand Foundation will seed the `FeeSink` with
sufficient funds to allow the bonuses to be paid out according to the formula for several years. If
the `FeeSink` has insufficient funds for the sum of these components, the payout will be as high as
possible while maintaining the `FeeSink`'s minimum balance. These calculations are performed in
`endOfBlock` in `eval/eval.go`.
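
As a rough illustration of the description above, the two components and the `FeeSink` cap could be combined as follows. This is a minimal sketch under stated assumptions, not the code in `endOfBlock`: the function name, its parameters, and the sample amounts are hypothetical, and overflow handling is omitted.

```go
package main

import "fmt"

// proposerPayout is a hypothetical helper that combines the fee share and the
// bonus, then caps the result so the FeeSink keeps its minimum balance.
// All amounts are in microAlgos.
func proposerPayout(blockFees, bonus, feeSinkBalance, minBalance, feePercent uint64) uint64 {
	// First component: a percentage of the fees collected in the block.
	// Second component: the bonus for this block.
	payout := blockFees*feePercent/100 + bonus

	// The FeeSink must not drop below its minimum balance.
	var available uint64
	if feeSinkBalance > minBalance {
		available = feeSinkBalance - minBalance
	}
	if payout > available {
		payout = available
	}
	return payout
}

func main() {
	// Well-funded FeeSink: both components are paid in full.
	fmt.Println(proposerPayout(2_000, 10_000_000, 1_000_000_000, 100_000, 50))
	// Nearly drained FeeSink: the payout is as high as the cap allows.
	fmt.Println(proposerPayout(2_000, 10_000_000, 5_000_000, 100_000, 50))
}
```

The cap is applied last so that a shortfall reduces the payout rather than failing the block, which matches the "as high as possible" wording above.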

To opt in to receiving block payouts, an account includes an extra fee in the `keyreg`
transaction. The amount is controlled by the consensus parameter `Payouts.GoOnlineFee`. When such a
fee is included, a new account state bit, `IncentiveEligible`, is set to true.

Even when an account is `IncentiveEligible` there is a proposal-time check of the account's online
stake. If the account has too much or too little, no payout is performed (though
`IncentiveEligible` remains true). As explained below, this check occurs in `agreement` code in
`payoutEligible()`. The balance check is performed on the _online_ stake, that is, the stake from 320
rounds earlier, so a clever proposer cannot move Algos in the round it proposes in order to receive
the payout. Finally, in an interesting corner case, a proposing account could be closed at proposal
time, since voting is based on the earlier balance. Such an account receives no payout, even if its
balance was in the proper range 320 rounds ago.
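
The proposal-time check described above might be sketched like this. The struct, its field names, the inclusive bounds, and the sample range are assumptions made for illustration; the actual check is in `payoutEligible()` in `agreement`.

```go
package main

import "fmt"

// proposer holds the facts consulted at proposal time. The field names are
// hypothetical and exist only for this sketch.
type proposer struct {
	IncentiveEligible bool   // set by a keyreg that paid the extra fee
	Closed            bool   // account no longer exists in the proposal round
	OnlineStake       uint64 // stake as of the lookback round, in microAlgos
}

// payoutAllowed reports whether the proposer should receive the block payout.
// minStake and maxStake stand in for the protocol's allowed balance range.
func payoutAllowed(p proposer, minStake, maxStake uint64) bool {
	if !p.IncentiveEligible || p.Closed {
		return false
	}
	// The range is checked against the lookback stake, not the current balance.
	return p.OnlineStake >= minStake && p.OnlineStake <= maxStake
}

func main() {
	const minStake, maxStake = 30_000_000_000, 70_000_000_000_000 // illustrative bounds

	inRange := proposer{IncentiveEligible: true, OnlineStake: 50_000_000_000}
	tooSmall := proposer{IncentiveEligible: true, OnlineStake: 1_000_000}
	closed := proposer{IncentiveEligible: true, Closed: true, OnlineStake: 50_000_000_000}

	fmt.Println(payoutAllowed(inRange, minStake, maxStake))
	fmt.Println(payoutAllowed(tooSmall, minStake, maxStake))
	fmt.Println(payoutAllowed(closed, minStake, maxStake))
}
```

Note that a failed range check only withholds the payout for that block; in this sketch, as in the text, `IncentiveEligible` is left untouched.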

A surprising complication in the implementation of these payouts is that when a block is prepared by
a node, it does not know which account is the proposer. Until now, `algod` could prepare a single
block which would be used by any of the accounts it was participating for. The block would be
handed off to `agreement` which would manipulate the block only to add the appropriate block seed
(which depended upon the proposer). That interaction between `eval` and `agreement` was widened
(see `WithProposer()`) to allow `agreement` to modify the block to include the proper `Proposer`,
and to zero the `ProposerPayout` if the account that proposed was not actually eligible to receive a
payout.

# Suspensions

Accounts can be _suspended_ for poor behavior. There are two forms of poor behavior that can lead
to suspension. First, an account is considered _absent_ if it fails to propose as often as it
should. Second, an account can be suspended for failing to respond to a _challenge_ issued by the
network at random.

## Absenteeism

An account can be expected to propose once every `n = TotalOnlineStake/AccountOnlineStake` rounds.
For example, a node with 2% of online stake ought to propose once every 50 rounds. Of course the
actual proposer is chosen by random sortition. To make false positive suspensions unlikely, a node
is considered absent if it fails to produce a block over the course of `10n` rounds.
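
The rule above can be expressed in a few lines. This is a sketch only: `isAbsent`, `lastSeen`, and the strict comparison at the boundary are assumptions, with `lastSeen` standing in for the most recent round in which the account proposed or was credited with a heartbeat.

```go
package main

import "fmt"

// absenceFactor is the multiple of the expected interval after which an
// account is treated as absent (10 in the text above).
const absenceFactor = 10

// expectedInterval returns n = TotalOnlineStake/AccountOnlineStake.
// The caller must ensure accountOnline is non-zero.
func expectedInterval(totalOnline, accountOnline uint64) uint64 {
	return totalOnline / accountOnline
}

// isAbsent reports whether more than 10n rounds have passed since lastSeen.
func isAbsent(totalOnline, accountOnline, lastSeen, current uint64) bool {
	if accountOnline == 0 {
		return false // nothing is expected of an account with no online stake
	}
	allowed := absenceFactor * expectedInterval(totalOnline, accountOnline)
	return lastSeen+allowed < current
}

func main() {
	// An account with 2% of online stake: n = 50, so 500 rounds are allowed.
	fmt.Println(expectedInterval(100, 2))
	fmt.Println(isAbsent(100, 2, 1000, 1500)) // exactly 500 rounds since lastSeen
	fmt.Println(isAbsent(100, 2, 1000, 1501)) // 501 rounds since lastSeen
}
```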

The suspension mechanism is implemented in `generateKnockOfflineAccountsList` in `eval/eval.go`. It
is closely modeled on the mechanism that knocks accounts offline if their voting keys have expired.
An absent account is added to the `AbsentParticipationAccounts` list of the block header. When
evaluating a block, accounts in `AbsentParticipationAccounts` are suspended by changing their
`Status` to `Offline` and setting `IncentiveEligible` to false, but retaining their voting keys.

### Keyreg and `LastHeartbeat`

As described so far, 320 rounds after a `keyreg` to go online, an account suddenly is expected to
have proposed more recently than 10 times its new expected interval. That would be impossible, since
it was not online until that round. Therefore, when a `keyreg` is used to go online and become
`IncentiveEligible`, the account's `LastHeartbeat` field is set 320 rounds into the future. In
effect, the account is treated as though it proposed in the first round it is online.

### Large Algo increases and `LastHeartbeat`

A similar problem can occur when an online account receives Algos. 320 rounds after receiving the
new Algos, the account's expected proposal interval will shrink. If, for example, such an account
increases by a factor of 10, then it is reasonably likely that it will not have proposed recently
enough, and will be suspended immediately. To mitigate this risk, any time the balance of an online,
`IncentiveEligible` account more than doubles from a single `Pay`, its `LastHeartbeat` is advanced
to 320 rounds past the current round.

## Challenges

The absenteeism checks quickly suspend a high-value account if it becomes inoperative. For example,
an account with 2% of stake can be marked absent after 500 rounds (about 24 minutes). After
suspension, the effect on consensus is mitigated after 320 more rounds (about 15
minutes). Therefore, the suspension mechanism makes Algorand significantly more robust in the face
of operational errors.

However, the absenteeism mechanism is very slow to notice small accounts. An account with 30,000
Algos might represent 1/100,000 or less of total stake. It would only be considered absent after a
million or more rounds without a proposal. At current network speeds, this is about a month. With such
slow detection, a financially motivated entity might make the decision to run a node even if they lack
the wherewithal to run the node with excellent uptime. A worst case scenario might be a node that is
turned off daily, overnight. Such a node would generate profit for the runner, would probably never
be marked offline by the absenteeism mechanism, yet would impact consensus negatively. Algorand
can't make progress with 1/3 of nodes offline at any given time for a nightly rest.
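
The round counts quoted in this section can be converted to wall-clock time with a small helper. The round time of 2.88 seconds is an assumption inferred from "500 rounds (about 24 minutes)"; it is not a protocol constant, and actual round times vary.

```go
package main

import (
	"fmt"
	"time"
)

// assumedRoundTime is inferred from the text ("500 rounds (about 24 minutes)").
// It is an assumption for this sketch, not a consensus parameter.
const assumedRoundTime = 2880 * time.Millisecond

// roundsToDuration converts a number of rounds to an approximate duration.
func roundsToDuration(rounds uint64) time.Duration {
	return time.Duration(rounds) * assumedRoundTime
}

func main() {
	// 500: absence threshold for a 2% account; 320: balance lookback;
	// 1,000,000: absence threshold for an account with 1/100,000 of stake.
	for _, rounds := range []uint64{500, 320, 1_000_000} {
		d := roundsToDuration(rounds)
		fmt.Printf("%d rounds is roughly %s (%.1f days)\n", rounds, d, d.Hours()/24)
	}
}
```

Under this assumption a million rounds is 800 hours, a little over 33 days, which is where the "about a month" figure comes from.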

To combat this scenario, the network generates random _challenges_ periodically. Every
`Payouts.ChallengeInterval` rounds (currently 1000), a randomly selected portion (currently 1/32) of
all online accounts are challenged. They must _heartbeat_ within `Payouts.ChallengeGracePeriod`
rounds (currently 200), or they will be subject to suspension. With the current consensus
parameters, nodes can be expected to be challenged daily. When suspended, accounts must `keyreg`
with the `GoOnlineFee` in order to receive block payouts again, so it becomes unprofitable for
these low-stake nodes to operate with poor uptimes.
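
To make the timing concrete, the sketch below computes the challenge window for a given round and the expected number of challenges per account per day. It assumes, hypothetically, that challenges fall on rounds that are multiples of the interval and that the grace deadline is inclusive; the helper names and the figure of 30,000 rounds per day are likewise assumptions.

```go
package main

import "fmt"

// Values quoted in the text above.
const (
	challengeInterval    = 1000 // Payouts.ChallengeInterval
	challengeGracePeriod = 200  // Payouts.ChallengeGracePeriod
	challengedOneIn      = 32   // one in 32 online accounts is challenged
)

// lastChallenge returns the most recent challenge round at or before current,
// assuming challenges are issued on multiples of the interval.
func lastChallenge(current uint64) uint64 {
	return current - current%challengeInterval
}

// graceDeadline returns the round by which a challenged account must heartbeat.
func graceDeadline(challenge uint64) uint64 {
	return challenge + challengeGracePeriod
}

// expectedChallengesPerDay estimates how often one account is challenged.
func expectedChallengesPerDay(roundsPerDay float64) float64 {
	return roundsPerDay / challengeInterval / challengedOneIn
}

func main() {
	c := lastChallenge(45_678)
	fmt.Println(c, graceDeadline(c))
	// Roughly 30,000 rounds per day is an assumed network speed.
	fmt.Println(expectedChallengesPerDay(30_000))
}
```

At about 30,000 rounds per day there are about 30 challenges, each hitting 1/32 of accounts, so a given account expects slightly under one challenge per day.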

# Heartbeats

The absenteeism mechanism is subject to rare false positives. The challenge mechanism explicitly
requires an affirmative response from nodes to indicate they are operating properly on behalf of a
challenged account. Both of these needs are addressed by a new transaction type --- _Heartbeat_. A
Heartbeat transaction contains a signature (`HbProof`) of a recent block seed (`HbSeed`) under the
participation key of the account (`HbAddress`) in question. Note that the account being heartbeat
for is _not_ the `Sender` of the transaction, which can be any address.

Review comment (Contributor): Great clarification to call out

It is relatively easy for a bad actor to emit Heartbeats for its accounts without actually
participating. However, there is no financial incentive to do so. Pretending to be operational when
offline does not earn block payouts. Furthermore, running a server to monitor the blockchain to
notice challenges and gather the recent block seed is not significantly cheaper than simply running a
functional node. It is _already_ possible for malicious, well-resourced accounts to cause consensus
difficulties by going online without actually participating. Heartbeats do not mitigate that
risk. But these mechanisms have been designed to avoid _motivating_ such behavior, so that they can
accomplish their actual goal of noticing poor behavior stemming from _inadvertent_ operational
problems.

## Free Heartbeats

Challenges occur frequently, so it is important that `algod` can easily send Heartbeats as
required. How should these transactions be paid for? Many accounts, especially high-value accounts,
would not want to keep their spending keys available for automatic use by `algod`. Further, creating
(and keeping funded) a low-value side account to pay for Heartbeats would be an annoying operational
overhead. Therefore, when required by challenges, heartbeat transactions do not require a fee.
As a result, any account, even an unfunded logicsig, can send heartbeats for an account under
challenge.

The conditions for a free Heartbeat are:

1. The Heartbeat is not part of a larger group, and has a zero `GroupID`.
1. The `HbAddress` is Online and under challenge with the grace period at least half over.
1. The `HbAddress` is `IncentiveEligible`.
1. There is no `Note`, `Lease`, or `RekeyTo`.

Review comment (Contributor), on the grace-period condition:

I'll look for this in the code - but why does the grace period have to be at least half over?

Reply (Contributor Author):

I figured it would be nice to let things play out for a little while first. Any challenged node that proposes during the first half of the grace period doesn't have to heartbeat. I suppose that's not likely to be common. 1/32 accounts are challenged - so maybe we get three nodes heartbeating during the 100 rounds that constitute half a grace period?

We could increase the grace period (at most 500, it must be less than half our ability to lookup headers, which is 1001). That would delay suspensions, but who cares? We could also make it 3/4 of grace period? Or grace period minus some constant that gives nodes enough time to heartbeat. 50?

Or just leave it. I don't have any strong feelings. But I will add a unit test to confirm nobody ever makes the grace period more than half of our header lookup ability.
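
The free-Heartbeat conditions listed above amount to a single predicate. The sketch below is illustrative only: the struct and its field names are hypothetical, and treating "at least half over" as `current >= challenge + grace/2` is an assumption about the boundary.

```go
package main

import "fmt"

// heartbeatContext gathers the facts needed to decide whether a Heartbeat may
// be sent without a fee. All field names are hypothetical.
type heartbeatContext struct {
	InGroup           bool   // transaction has a non-zero GroupID
	Online            bool   // HbAddress is Online
	IncentiveEligible bool   // HbAddress is IncentiveEligible
	UnderChallenge    bool   // HbAddress is currently challenged
	ChallengeRound    uint64 // round in which the challenge was issued
	CurrentRound      uint64
	GracePeriod       uint64 // Payouts.ChallengeGracePeriod
	HasNote           bool
	HasLease          bool
	HasRekeyTo        bool
}

// freeHeartbeatAllowed applies the four conditions in order.
func freeHeartbeatAllowed(h heartbeatContext) bool {
	if h.InGroup {
		return false
	}
	if !h.Online || !h.UnderChallenge {
		return false
	}
	// The grace period must be at least half over.
	if h.CurrentRound < h.ChallengeRound+h.GracePeriod/2 {
		return false
	}
	if !h.IncentiveEligible {
		return false
	}
	return !h.HasNote && !h.HasLease && !h.HasRekeyTo
}

func main() {
	h := heartbeatContext{
		Online: true, IncentiveEligible: true, UnderChallenge: true,
		ChallengeRound: 45_000, CurrentRound: 45_100, GracePeriod: 200,
	}
	fmt.Println(freeHeartbeatAllowed(h))

	h.CurrentRound = 45_050 // too early: grace period not yet half over
	fmt.Println(freeHeartbeatAllowed(h))
}
```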

## Heartbeat Service

The Heartbeat Service (`heartbeat/service.go`) watches the state of all accounts for which `algod`
has participation keys. If any of those accounts meets the requirements above, a heartbeat
transaction is sent, starting with the round following half a grace period from the challenge. It
uses a (presumably unfunded) logicsig that does nothing except preclude rekey operations.

The heartbeat service does _not_ heartbeat if an account is unlucky and in danger of being considered
absent. We presume such false positives to be so unlikely that, if they occur, the node must be
brought back online manually. It would be reasonable to consider:

1. Making heartbeats free for accounts that are "nearly absent".

or

2. Allowing for paid heartbeats by the heartbeat service when configured with access to a funded
account's spending key.

20 changes: 20 additions & 0 deletions ledger/eval/eval.go
@@ -608,6 +608,7 @@ func (cs *roundCowState) Move(from basics.Address, to basics.Address, amt basics
 	if overflowed {
 		return fmt.Errorf("overspend (account %v, data %+v, tried to spend %v)", from, fromBal, amt)
 	}
+	fromBalNew = cs.autoHeartbeat(fromBal, fromBalNew)
 	err = cs.putAccount(from, fromBalNew)
 	if err != nil {
 		return err
@@ -636,6 +637,7 @@ func (cs *roundCowState) Move(from basics.Address, to basics.Address, amt basics
 	if overflowed {
 		return fmt.Errorf("balance overflow (account %v, data %+v, was going to receive %v)", to, toBal, amt)
 	}
+	toBalNew = cs.autoHeartbeat(toBal, toBalNew)
 	err = cs.putAccount(to, toBalNew)
 	if err != nil {
 		return err
@@ -645,6 +647,24 @@ func (cs *roundCowState) Move(from basics.Address, to basics.Address, amt basics
 	return nil
 }

+// autoHeartbeat compares `before` and `after`, returning a new AccountData
+// based on `after` but with an updated `LastHeartbeat` if `after` shows enough
+// balance increase to risk a false positive suspension for absenteeism.
+func (cs *roundCowState) autoHeartbeat(before, after ledgercore.AccountData) ledgercore.AccountData {
+	// No need to adjust unless account is suspendable
+	if after.Status != basics.Online || !after.IncentiveEligible {
+		return after
+	}
+
+	// Adjust only if balance has doubled
+	twice, o := basics.OMul(before.MicroAlgos.Raw, 2)
+	if !o && twice < after.MicroAlgos.Raw {
+		lookback := agreement.BalanceLookback(cs.ConsensusParams())
+		after.LastHeartbeat = cs.Round() + lookback
+	}
+	return after
+}
+
 func (cs *roundCowState) ConsensusParams() config.ConsensusParams {
 	return cs.proto
 }
78 changes: 72 additions & 6 deletions test/e2e-go/features/incentives/whalejoin_test.go
@@ -24,6 +24,7 @@ import (

 	"github.com/stretchr/testify/require"

+	v2 "github.com/algorand/go-algorand/daemon/algod/api/server/v2"
 	"github.com/algorand/go-algorand/daemon/algod/api/server/v2/generated/model"
 	"github.com/algorand/go-algorand/data/basics"
 	"github.com/algorand/go-algorand/data/transactions"
@@ -36,8 +37,8 @@ import (
 // TestWhaleJoin shows a "whale" with more stake than is currently online can go
 // online without immediate suspension. This tests for a bug we had where we
 // calculated expected proposal interval using the _old_ totals, rather than
-// the totals following the keyreg. So big joiner could be expected to propose
-// in the same block they joined.
+// the totals following the keyreg. So big joiner was being expected to propose
+// in the same block it joined.
 func TestWhaleJoin(t *testing.T) {
 	partitiontest.PartitionTest(t)
 	defer fixtures.ShutdownSynchronizedTest(t)
@@ -185,19 +186,84 @@ func TestBigJoin(t *testing.T) {
 	// is looking for.
 }

+// TestBigIncrease shows that when an incentive-eligible account receives a lot
+// of algos, it is not immediately suspended. We also check the details of the
+// mechanism - that LastHeartbeat is incremented when such an account doubles
+// its balance in a single pay.
+func TestBigIncrease(t *testing.T) {
+	partitiontest.PartitionTest(t)
+	defer fixtures.ShutdownSynchronizedTest(t)
+
+	t.Parallel()
+	a := require.New(fixtures.SynchronizedTest(t))
+
+	var fixture fixtures.RestClientFixture
+	const lookback = 32
+	fixture.FasterConsensus(protocol.ConsensusFuture, time.Second/2, lookback)
+	fixture.Setup(t, filepath.Join("nettemplates", "Payouts.json"))
+	defer fixture.Shutdown()
+
+	// Overview of this test:
+	// 0. spend wallet01 down so it has a very small percent of stake
+	// 1. rereg wallet01 so it is suspendable
+	// 2. move half of wallet15's money to wallet01
+	// 3. check that account01's LastHeartbeat is set to 32 rounds later
+	// 4. wait about 40 rounds to ensure account01 stays online
+
+	clientAndAccount := func(name string) (libgoal.Client, model.Account) {
+		c := fixture.GetLibGoalClientForNamedNode(name)
+		accounts, err := fixture.GetNodeWalletsSortedByBalance(c)
+		a.NoError(err)
+		a.Len(accounts, 1)
+		fmt.Printf("Client %s is %v\n", name, accounts[0].Address)
+		return c, accounts[0]
+	}
+
+	c1, account01 := clientAndAccount("Node01")
+	c15, account15 := clientAndAccount("Node15")
+
+	// We need to spend 01 down so that it has nearly no stake. That way, it
+	// certainly will not have proposed by pure luck just before the critical
+	// round. If we don't do that, 1/16 of stake is enough that it will probably
+	// have a fairly recent proposal, and not get knocked off.
+	pay(&fixture, a, c1, account01.Address, account15.Address, 99*account01.Amount/100)
+
+	rekeyreg(&fixture, a, c1, account01.Address)
+
+	// Wait lookback rounds
+	wait(&fixture, a, lookback)
+
+	tx := pay(&fixture, a, c15, account15.Address, account01.Address, 50*account15.Amount/100)
+	data, err := c15.AccountData(account01.Address)
+	a.NoError(err)
+	a.EqualValues(*tx.ConfirmedRound+lookback, data.LastHeartbeat)
+
+	wait(&fixture, a, lookback+5)
+	data, err = c15.AccountData(account01.Address)
+	a.NoError(err)
+	a.Equal(basics.Online, data.Status)
+	a.True(data.IncentiveEligible)
+}
+
 func wait(f *fixtures.RestClientFixture, a *require.Assertions, count uint64) {
 	res, err := f.AlgodClient.Status()
 	a.NoError(err)
 	round := res.LastRound + count
 	a.NoError(f.WaitForRoundWithTimeout(round))
 }

-func zeroPay(f *fixtures.RestClientFixture, a *require.Assertions,
-	c libgoal.Client, address string) {
-	pay, err := c.SendPaymentFromUnencryptedWallet(address, address, 1000, 0, nil)
+func pay(f *fixtures.RestClientFixture, a *require.Assertions,
+	c libgoal.Client, from string, to string, amount uint64) v2.PreEncodedTxInfo {
+	pay, err := c.SendPaymentFromUnencryptedWallet(from, to, 1000, amount, nil)
 	a.NoError(err)
-	_, err = f.WaitForConfirmedTxn(uint64(pay.LastValid), pay.ID().String())
+	tx, err := f.WaitForConfirmedTxn(uint64(pay.LastValid), pay.ID().String())
 	a.NoError(err)
+	return tx
 }
+
+func zeroPay(f *fixtures.RestClientFixture, a *require.Assertions,
+	c libgoal.Client, address string) {
+	pay(f, a, c, address, address, 0)
+}

 // Go offline, but return the key material so it's easy to go back online