From 94f6f5eab99d6e26e6151e774b4ced01fd199469 Mon Sep 17 00:00:00 2001
From: Nicolas Frisby
Date: Thu, 21 Nov 2024 13:12:20 -0800
Subject: [PATCH] website: draft CardanoPraosBasics.md and CivicTime.md

---
 .../for-developers/CardanoPraosBasics.md      | 154 ++++++++++++
 .../contents/for-developers/CivicTime.md      | 230 ++++++++++++++++++
 docs/website/sidebars.js                      |   2 +
 3 files changed, 386 insertions(+)
 create mode 100644 docs/website/contents/for-developers/CardanoPraosBasics.md
 create mode 100644 docs/website/contents/for-developers/CivicTime.md

diff --git a/docs/website/contents/for-developers/CardanoPraosBasics.md b/docs/website/contents/for-developers/CardanoPraosBasics.md
new file mode 100644
index 0000000000..017c7be245
--- /dev/null
+++ b/docs/website/contents/for-developers/CardanoPraosBasics.md
@@ -0,0 +1,154 @@
+# Basics of Ouroboros Praos in Cardano
+
+2024 November | nick.frisby@iohk.io | nicolas.frisby@moduscreate.com
+
+## The Main Theorem
+
+If both of the following hold
+
+- {PraosGrinding}.
+  The adversary is not expending an enormous amount of hashing power[^GrindingPrereq] (ie hashes per second), on the order of magnitude of the entire contemporary Bitcoin network.
+
+- {PraosHonestNetwork}.
+  There is always a set X (that may vary with time) of honest Cardano mainnet nodes that satisfy all of the following.
+
+  - {PraosSelection}.
+    Each node in X promptly selects the best valid chain that has propagated to it.
+
+  - {PraosSemisynchrony}.
+    The best selection among X will always propagate to all of X within much less than 20 seconds, despite the adversary's best effort to delay it.
+
+  - {PraosHonestMinting}.
+    At the onset of a slot that elects some honest stake pools, for each such pool, exactly one node in X promptly mints exactly one block that extends the node's current selection.
+    That mechanism is the only way that honest stake pools with more than half of the (effective[^GrindingWiggle]) delegated stake mint blocks.
+
+  - {PraosEpochStructure}.
+    The on-chain protocol state maintains an evolving nonce that accumulates every block's election proof.
+    The elections during epoch E on a chain are determined by the value of that evolving nonce that was snapshotted on that chain at least 129600 slots (ie 36 hr)[^GenesisBump] before E and a stake distribution that was snapshotted on that chain at least 172800 slots (ie 48 hr) before the nonce was snapshotted.
+
+  - {PraosProtocolStability}.
+    Similarly, the protocol itself (eg parameters) cannot vary on a single chain with less than 129600 slots (ie 36 hr)[^GenesisBump] of warning.
+
+then the Praos paper proves a vanishingly small failure probability for each of the following.
+
+- {PraosCommonPrefix}.
+  The 2161st youngest block on any selection in X is already and always will be on every other selection in the (evolving) X.
+  (Seek out the pre-computed table of worst-case _settlement times_ in other Cardano documentation.)
+
+- {PraosChainGrowth}.
+  The 2161st youngest block on any selection in X is not older than 129600 slots (ie 36 hr).
+  (When all stake is behaving honestly, the age of the 2161st youngest block will be approximately 43200 slots (ie 12 hr).)
+
+- {PraosChainQuality}.
+  Every span of 43200 slots (ie 12 hr) on a selection in X contains at least one block minted by an honest node.
+
+[^GrindingPrereq]: Remark.
+    The PhD dissertation https://eprint.iacr.org/2021/1698 established that an adversary with less than roughly 9.5% stake will not successfully grind, regardless of hashing power.
+    Generally, though, more hashing power and more stake would both benefit the adversary.
+
+[^GrindingWiggle]: Elaboration.
+    A grinding adversary is effectively amplifying their portion of the stake.
+    The Praos security argument requires that even their _effective_ relative stake is below half.
+
+[^GenesisBump]: Warning.
+    Ouroboros Genesis increases this by 43200 slots (ie 12 more hours).
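The slot counts in the theorem above are all small multiples of one quantity. The following is a minimal sketch of that arithmetic, not code from the node. It assumes k = 2160 blocks and 1/f = 20, which is the ratio implied by the "6k/f slots = 259200 slots" figure in CivicTime.md, plus the post-Byron slot duration of one second; the names `K`, `SLOTS_PER_BLOCK_INTERVAL`, and `slots_to_hours` are hypothetical.

```python
# Hypothetical constants for illustration only; they are not read from any node configuration.
K = 2160                        # security parameter, in blocks
SLOTS_PER_BLOCK_INTERVAL = 20   # 1/f, assuming an active slot coefficient f of 1/20
SLOT_SECONDS = 1                # slot duration in every era after Byron


def slots_to_hours(slots: int) -> float:
    """Convert a slot count to hours, assuming one-second slots."""
    return slots * SLOT_SECONDS / 3600


# k/f slots: the PraosChainQuality span and the Genesis bump.
chain_quality_span = K * SLOTS_PER_BLOCK_INTERVAL
# 3k/f slots: the nonce snapshot delay and the PraosChainGrowth bound.
nonce_delay = 3 * K * SLOTS_PER_BLOCK_INTERVAL
# 4k/f slots: the delay between the stake snapshot and the nonce snapshot.
stake_delay = 4 * K * SLOTS_PER_BLOCK_INTERVAL

for name, slots in [("chain quality span", chain_quality_span),
                    ("nonce delay", nonce_delay),
                    ("stake delay", stake_delay)]:
    print(f"{name}: {slots} slots = {slots_to_hours(slots)} hr")
```

Writing the values this way makes it easier to see that a change to k or f would move all of these bounds together.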
+
+## Grinding Attack
+
+In a {GrindingAttack}, the adversary attempts to choose the nonce for some epoch.
+Suppose both of the following hold simultaneously, where C is the best selection in X and C^2160 is C less its 2160 youngest blocks.
+
+- C has between 1 and 2160 blocks after the slot that snapshots the target nonce.
+
+- The adversary has some private chains that are longer than C and fit the schema "C^2160 … H? … D …" where H is the youngest honest block on this chain (or else C^2160 if none) and D is the youngest adversarial block on this chain that's before the snapshot.
+  (The schema requires C^2160 because of PraosCommonPrefix and/or PraosEpochStructure.)
+
+The adversary can harvest a combinatorial number of nonces from each such chain with a different H.
+The method is to drop any adversarial block after H up to and including D, but not so many that the result is not longer than C anymore, else X would ignore the resulting chain.
+Crucially, each such chain snapshots a different nonce.
+And preferably the chain still includes at least one block after the snapshot slot, so that extensions of it don't alter the nonce.
+
+The adversary has to expend their hashing power to calculate those nonces and additional compute power to evaluate them to find the most preferable nonce (eg the one that maximizes how many of the 432000 slots in the next epoch elect the adversary) --- that additional compute power has not yet been accounted for in any grinding analysis.
+If the adversary finds a chain with a good nonce, it can release it to X, thereby improving the nonce for the target epoch.
+And it can do so again, if it can find a chain that both has a better nonce and is even longer.
+
+However, the grinding calculations can only happen in a particular time interval.
+Each such chain includes some honest blocks, at least C^2160 and more than that unless the adversary is gaining elections faster than C is gaining blocks.
+Therefore, these calculations cannot start until the wall clock is somewhat close to the snapshot slot.
+The calculations also have a deadline, since the chain C is growing during them.
+If C^2160 reaches the intersection of C and some attacking chain or C just becomes longer than the best extension of that attacking chain the adversary can muster, X will never switch to that attacking chain.
+
+The duration of the grinding interval depends on various factors, but generally becomes longer if the honest chain is growing slower or the adversary is gaining elections faster and/or already had a big lead.
+If the adversary wanted to remain covert, it'd need to continue contributing (most of) its elections to X's selections.
+On Cardano, that typically means the time interval would not last significantly more than 12 hr.
+Moreover, Praos would have to explicitly fail for the interval to last more than 36 hr, according to PraosChainGrowth.
+(TODO that previous sentence doesn't consider that the adversary might be able to start grinding early.)
+If the adversary doesn't have much stake, then it's extremely unlikely they'll have many excess elections.
+That both decreases their search space and also reduces the interval since it makes them more dependent on orphaned honest blocks, which prevent the calculations from starting earlier than those orphans arrive.
+
+(TODO integrate the fact that 100 blocks already generates enough chains to require years of hashes to compute the nonces?)
+
+## Protocol and Leader Schedule Stability
+
+The delays required by PraosEpochStructure and PraosProtocolStability, when combined with PraosCommonPrefix and PraosChainGrowth, ensure that all of X will always be executing the same protocol with the same leader schedule for the current epoch.
+This does not actually seem to be a fundamental requirement for the protocol itself (since the headers/blocks that use some nonce identify their preceding chain) --- and it is in particular not crucial to the preceding GrindingAttack explanation.
+
+Instead, the Praos authors merely relied on this property to tackle the Praos proofs.
+It similarly simplifies all reasoning about the protocol unless Praos has already failed in some way.
+That includes third-party (community) tooling, etc.
+So these two requirements provide some convenience to developers at the acceptable cost of a lower bound on governance delays.
+
+Notably, these requirements enable the Header-Body Split optimization: the honest node can validate a block header even before having fetched or validated the preceding 2160 blocks.
+(Seek out more details in other Cardano Consensus documentation.)
+
+## Unspecified Behaviors When Praos Fails
+
+PraosCommonPrefix is both trivial and crucial to enforce within the node---without it, the node would have to process/retain incoming data regardless of how historical it is (eg headers/blocks).
+
+PraosChainGrowth, on the other hand, has been indirectly assumed in various places (notably PraosEpochStructure and PraosProtocolStability), but is not enforced anywhere.
+It would also be trivial to enforce, but it has so far been considered unnecessary and even potentially inconvenient.
+
+For example, suppose Amazon had a very bad outage for a couple of days, and the Cardano network's "X" nodes _almost_ satisfied PraosChainGrowth but came up just one block short.
+If the node were to enforce PraosChainGrowth (eg it refused to select chains that violate it), then the community would have no choice but to invoke the off-chain disaster recovery plan, which has significant instantaneous costs and long-term reputational costs.
+Perhaps it's better for nodes to make a best effort and possibly limp along in that scenario, crossing sparse intervals of chains as best they can (if the adversary allows it).
+
+On the other hand, some unavoidable disasters could knock out the (public) infrastructure Cardano relies on, such that there can be no X.
+EG a https://en.wikipedia.org/wiki/Carrington_Event solar storm could partition the Cardano network such that no set of connected nodes mints on behalf of a third of the delegated stake.
+A PraosChainGrowth violation would be inevitable in that scenario.
+In such an extreme scenario, it's easy to assume the nodes' behavior is irrelevant.
+However, today's node can proceed (adversary permitting) as long as there's at least _one_ block in every 36 hr period.
+Who is to say the Cardano community would never want the option to preserve that weak history of the chain instead of invoking the disaster recovery plan?
+
+One major downside to the current scenario is that almost no specifications for nodes, community tooling, etc were ever intended to scope over behaviors when Praos fails.
+Thus some simplifications/optimizations/etc are likely to fail in surprising ways during disasters, including those that are fundamentally unavoidable.
+This indirectly suggests that the community itself might effectively be partitioned --- whether that was immediately recognized or only manifested in surprising disagreements among tooling after some delay --- if the historical Cardano chain itself violated PraosChainGrowth.
+
+Thus, the _most careful_ option would be to enforce PraosChainGrowth in the node.
+This does force the community to execute the disaster recovery plan even in "slight" disasters.
+But doing so ensures that essentially all of the Cardano infrastructure can _safely_ continue to take PraosChainGrowth and PraosChainQuality for granted.
+
+(Like PraosChainGrowth, PraosChainQuality is similarly assumed at least in PraosEpochStructure, but it inherently cannot be enforced.)
+
+## Protocol Refinements/Extensions
+
+Subsequent work on the Ouroboros protocols refines/extends the Praos theorem.
+
+- PraosSemisynchrony indirectly precludes X from containing nodes that are still syncing.
+  Praos itself does not provide any protections for nodes to catch up to the honest network, ie to (re)join X; it merely ensures the protocol can benefit from nodes that manage to do so.
+  As of the Ouroboros Genesis protocol, X still excludes syncing nodes, but now an honest node will eventually be able to (re)join X after it's had access to X for long enough.
+  The specific Cardano implementation of Genesis additionally requires that the syncing node is never eclipsed during its sync.
+
+- Ouroboros Peras sometimes expedites settlement times[^SettlementHashingPower] such that {PerasCommonPrefix} and {PerasChainGrowth} will reduce the 2161 count of PraosCommonPrefix and PraosChainGrowth in intervals during which Peras voting was not disrupted.
+
+- The count of 2160 blocks was chosen for Cardano's PraosCommonPrefix etc in order to absorb the risk of the GrindingAttack.
+  If that attack were somehow mitigated, the count could be reduced.
+  Improved anti-grinding measures (eg involving a _verifiable delay function_ in the evolving nonce) will therefore also expedite settlement[^SettlementHashingPower].
+  The 2161 count will likely not be as reduced as it is in Peras, but it will be reduced _always_ instead of _sometimes_.
+
+- At the very least, PraosHonestMinting requires an accurate wall clock.
+  Most Cardano nodes currently use the Network Time Protocol (NTP) for that, but that's regarded as a vulnerability.
+  Ouroboros Chronos replaces NTP with a bespoke on-chain clock.
+  Even without the on-chain mechanics, Chronos informs how the Praos node should handle blocks that arrive well before the onset of their slot, for example.
+
+[^SettlementHashingPower]: Remark.
+    Expediting settlement also reduces the search space and grinding interval of the GrindingAttack.
diff --git a/docs/website/contents/for-developers/CivicTime.md b/docs/website/contents/for-developers/CivicTime.md
new file mode 100644
index 0000000000..2d7ce9acea
--- /dev/null
+++ b/docs/website/contents/for-developers/CivicTime.md
@@ -0,0 +1,230 @@
+# Civic Time in the Cardano Node
+
+## Introduction
+
+This document discusses how civic time (eg a value the wall clock could report) relates to the Cardano node, both the current design details and the higher-level fundamental needs.
+
+It also raises the question of what behaviors the node should exhibit when the Praos security argument has failed.
+
+## Pre-Ouroboros Chronos
+
+The Ouroboros Chronos protocol has not been implemented, but some of its basic rules have already been implemented.
+Today's node behaviors related to Chronos can be summarized as follows (see [`./HandlingBlocksFromTheFuture.md`](./HandlingBlocksFromTheFuture.md) for details).
+
+- The node trusts NTP.
+  (A full Ouroboros Chronos implementation would not.)
+
+- The node silently ignores the wall clock moving backwards by a small amount.
+  It crashes if the wall clock moves backward by a large amount.
+  (This would be an NTP failure/attack vector.)
+
+- The node enforces a small bound for acceptable clock skew with respect to some peer's apparent clock.
+
+- The key indicator of a peer's apparent wall clock is the reception of a header from that peer whose slot onset is ahead of the local wall clock.
+  If the header's earliness is beyond the acceptable clock skew, the peer is considered buggy or dishonest; the node disconnects with prejudice.
+  If it's instead within bounds, the ChainSync client for that peer is paused until the header is no longer ahead of the local wall clock.
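The clock skew rule in the last bullet above amounts to a three-way decision per received header. The following is a minimal sketch of that decision, not the node's ChainSync client. The names `HeaderAction` and `classify_header` are hypothetical, times are plain seconds, and treating an earliness exactly equal to the bound as "within bounds" is an assumption.

```python
from enum import Enum


class HeaderAction(Enum):
    PROCESS = "process"        # slot onset is not ahead of the local wall clock
    PAUSE = "pause"            # early, but within the acceptable clock skew
    DISCONNECT = "disconnect"  # earlier than the clock skew bound allows


def classify_header(slot_onset: float, wall_clock: float, max_skew: float) -> HeaderAction:
    """Decide what to do with a header, given its slot onset and the local wall clock.

    All three arguments are in seconds; max_skew is a hypothetical parameter
    standing in for the node's small clock skew bound.
    """
    earliness = slot_onset - wall_clock
    if earliness <= 0:
        return HeaderAction.PROCESS
    if earliness <= max_skew:
        # The real client would resume once the wall clock reaches slot_onset.
        return HeaderAction.PAUSE
    return HeaderAction.DISCONNECT


print(classify_header(slot_onset=105.0, wall_clock=104.0, max_skew=2.0))
```

In this sketch a paused peer costs nothing but a delay of at most `max_skew` seconds, whereas the disconnect case never waits for the header's slot to arrive.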
+
+An honest node will not fetch a block before validating the corresponding header, so the above rule prevents a node from receiving a block before the wall clock has reached the onset of its slot.
+
+*Aside*.
+It is possible that a block from the future is already in the database when the node starts.
+That's a corner case that seems unlikely to matter in practice.
+
+## Time Translations
+
+The Ouroboros Praos protocol and the Cardano details built around it are almost exclusively defined in terms of slots rather than common civic measures of time, such as POSIX time, UTC, etc.
+Each block explicitly inhabits some slot.
+The same is true for block headers and the election proofs therein.
+For the sake of determinism, transactions are labeled with a range of slots in which the transaction can be valid.
+
+There are only two exceptions.
+
+- The node's (commodity) operating system cannot compute the current slot, merely the current civic time.
+
+- The Plutus interface exposes the validity range as POSIX time, in the [`txInfoValidRange` field](https://plutus.cardano.intersectmbo.org/haddock/latest/plutus-ledger-api/PlutusLedgerApi-V1.html#t:TxInfo) (see this [blog post](https://iohk.io/en/blog/posts/2022/12/08/time-handling-on-cardano-part-2-use-cases/) for high-level background).
+  The Consensus Layer design would have been simpler if the Plutus API provided the validity range in terms of slots, but that ship has sailed.
+  (And developers writing Plutus scripts are almost certainly relieved that it did.)
+  The "Slot to time translation" paragraph within the Alonzo Ledger specification [`alonzo-ledger.pdf`](https://github.com/intersectmbo/cardano-ledger/releases/latest/download/alonzo-ledger.pdf) explains that this design prevents script authors from assuming the wrong translations between slot and civic time.
+
+Because of those exceptions, the Consensus Layer must use the wall clock and/or the translation back and forth between slots and civic time.
+Those uses are listed in the following table.
+
+  - Every use involves some translation, whether it be from slot-to-civic or vice versa (which is always the wall clock as a slot).
+
+  - This table excludes some uses that the Consensus Team is in the process of removing (eg those that could be replaced by annotating validated headers with their slot onset).
+
+  - The rightmost column of the table judges whether the use involves some entity that obviously determines which ledger state to use for the translations.
+    The column is for the benefit of a section below, but the key idea is that the translations are inherently chain-dependent, but users and most developers are blissfully unaware of that and, moreover, such a dependency is fundamentally contrary to human intuitions about time.
+
+  - The last few rows make explicit Consensus features that are not yet implemented but will also involve civic time.
+
+| Component | Use | Reads wall clock (whether translated to slot) | Which slots are translated to civic | Whether the relevant ledger state is "obviously" determined |
+| - | - | - | - | - |
+| ChainSync | enforce the clock skew bound | Yes (raw) | header's slot | Yes, header's intersection [^HeaderBodySplit] |
+| ChainSel | define Plutus `txInfoValidRange`s when validating a block | No | validity range | Yes, the block |
+| The Mint | checking whether to mint | Yes (translated) | none | Yes, the new block |
+| The Mint, via `getSnapshotFor` | recheck validity range[^MempoolAdd] | Yes (translated) | none | Yes, the new block |
+| Genesis State Machine, aka GSM | detect that the selection is an old chain | Yes (raw) | selection's slot | Yes, the selection |
+| hypothetically[^MempoolAdd]: Mempool (add and resync) | enforce the validity range | Yes (translated) | none | No, dangling txs do not |
+| Mempool (only add[^NoResyncTranslation]) | define Plutus `txInfoValidRange` when validating a dangling tx | No | validity range | No, dangling txs do not |
+| LocalStateQuery | the `GetInterpreter` query | No | arbitrary slots | ⚠No⚠[^GetInterpreterWild] |
+| Ouroboros Peras | ? analogs of ChainSync + The Mint (+ Mempool?) for votes | ? Yes (both) | ? vote slot | ?[^GiorgosIdea] |
+| Ouroboros Leios | ? analogs of ChainSync + The Mint (+ Mempool?) for IBs and EBs | ? Yes (both) | ? header slot | ?[^GiorgosIdea] |
+| Mithril | ? analogs of ChainSync + The Mint (+ Mempool?) for votes | ? Yes (raw?) | ? | ? |
+
+[^HeaderBodySplit]: Elaboration.
+    The node is allowed to ignore headers that intersect more than k blocks back from its current selection.
+    So a relevant header's intersection will be one of the youngest k+1 blocks of the selection.
+    Also, the first k+1 headers after that intersection are completely sufficient to determine whether the node ought to switch to the chain with those headers.
+    Thus the youngest k+1 blocks of the selection must be able to translate the slot times of a chain of up to k+1 relevant headers that extend that block but not its successor on the selection.
+    (Being able to translate one ChainGrowth stability window into the future suffices to ensure the second part.)
+
+[^MempoolAdd]: Explanation.
+    Today's Mempool doesn't actually use the wall clock to enforce the validity range; it instead uses the slot after the selection's tip.
+    But it might be preferable to use the wall clock.
+    The recheck in the Mint means there's no risk of minting a block with a stale tx.
+
+[^GetInterpreterWild]: Warning.
+    The consumer of the `GetInterpreter` result could use it for anything.
+    Remarkably, the fact that the LocalStateQuery mini protocol forced the consumer to explicitly acquire a concrete ledger state (aka a point) before it could even issue the query is not obvious to the user of the consumer.
+    One notable known use is the CLI tool's computation of the upcoming leadership schedule.
+    That tool's UX does not currently loudly indicate that the result depends on which ledger state answered the query and that the output could therefore be different if the same question is asked, even during the same epoch.
+
+    This same thing is true of some other queries, but they all have arguments that are obviously chain-dependent.
+
+[^NoResyncTranslation]: Explanation.
+    Only Plutus scripts require `txInfoValidRange`, but resyncing the Mempool doesn't re-execute Plutus scripts.
+    See "Two-Phase Transaction Validation for Phase-2 Scripts" in the Alonzo ledger spec.
+
+[^GiorgosIdea]: Note.
+    We briefly raised this concern/request with Giorgos Panagiotakos.
+    He brainstormed an idea of perhaps recording the relevant nonce in vote/header.
+
+## Some Time Translations Cannot Be Predicted
+
+Different eras of the chain can have different slot durations.
+Therefore it must be the responsibility of the Hard Fork Combinator (HFC) to define translation back and forth between slots and civic time.
+Even with all available information, some translations involving the future cannot be predicted.
+
+Recall that the Byron era of Cardano sets the slot duration to 20 seconds, while all other eras so far set it to one second.
+Because eras can have different slot durations, a ledger state is fundamentally unable to correctly translate times _arbitrarily_ ahead of its own slot.
+Even if that ledger state were to be perfectly recent (ie its slot is very near the wall clock), eras that have not yet been implemented/designed/even considered could have an unpredictable slot duration.
+
+Using a ledger state that is perfectly recent but otherwise arbitrary, how far ahead could the HFC do translations that are necessarily correct?
+Ultimately, it depends on how quickly the network could fork to an era with a different slot duration.
+In practice, that's a matter of months on Cardano mainnet[^EmergencyRapidity].
+In theory --- assuming only that the Praos security argument holds and that honest stakeholders wouldn't vote for the hard fork unless their nodes were already ready for it --- it's one stability window less than the lower bound on the duration between the current era's voting deadline and the proposal being enacted.
+
+  - In Byron, that lower bound was originally 2k slots = 4320 slots = 86400 seconds = one day, but was doubled to two days before the fork to Shelley happened.
+    So the lower bound on translations was at least one day ahead, since the Byron stability window was also 2k slots.
+
+  - After Byron and before Conway, it was 6k/f slots = 259200 slots = 259200 seconds = 3 days.
+    So the lower bound on translations was at least 1.5 days ahead, since the post-Byron stability window has always been 3k/f slots.
+
+  - Since Conway, it's been one epoch = 10k/f slots = 432000 slots = 432000 seconds = 5 days.
+    So the lower bound on translations is currently at least 3.5 days ahead, since the stability window is still 3k/f slots.
+
+Those lower bounds are for an arbitrary ledger state, and so they're the "worst-case".
+For specific ledger states, their detailed location in the epoch can increase the lower bound by up to one epoch, which has always been 5 days in every era so far.
+
+The limit is one stability window less than the post-voting buffer because --- assuming only the Praos security argument --- that's the upper bound on the age of a block the honest Praos node might need to discard as part of a rollback.
+In particular, until the oldest such block is after the voting deadline, a rollback could switch to a chain with a different voting outcome in that epoch.
+For example, switching from a chain in which the next era is only a few slots away to one in which it is at least another epoch away.
+
+*Aside*.
+On the other hand, when using a (non-recent) ledger state in a historical era, the HFC could theoretically use its knowledge about the upcoming eras in order to translate even further ahead.
+For example, the HFC could safely assume the one second duration for slots between some given Babbage ledger state and however many slots it might possibly take before that ledger state could transition from Babbage to Conway and from Conway to whatever comes after, because that mystery era is the first time the slot duration could change.
+The HFC does not do this in practice, since the extra complexity is not worthwhile; it's not necessary for syncing nodes to be smarter than caught-up nodes.
+
+[^EmergencyRapidity]: Technicality.
+    It could perhaps be weeks or days in an emergency, but a change to the slot duration would almost certainly be avoided in that case.
+
+## What About Chain Growth Failures?
+
+At a high level, today's HFC was derived as follows.
+The exact thought process was not recorded during the design work, but this is a plausible reconstruction.
+
+  - {ImmutableTranslations}.
+    Because of humans' general assumptions about time, it would be prohibitively confusing if the node (even its internal interfaces) might give different translations of the same slots/civic times.
+    So a switch from `t1` to `t2` for the result of some translation is unacceptable.
+    At the very least, the node should refuse to do a translation that might be invalidated by on-chain governance in the meantime (eg several weeks into the future).
+
+  - {NonemptyForecastRange}.
+    On the other hand, the node should be able to translate slot/civic times slightly ahead of its tip.
+    At the very least, some users presumably want some capacity to plan ahead (eg to smartly schedule their node's brief downtime).
+
+  - {RollbackInsensitiveTranslations}.
+    As a specific subcase of the first requirement, the node should even refuse to do translations that might be invalidated by a rollback.
+
+  - {MonotonicTranslations}.
+    It would also be too confusing if a node agreed to translate some slot/civic time and then subsequently refused to translate that same slot/civic time.
+    So `Just t1 -> Nothing` is also unacceptable for some translation, not merely `Just t1 -> Just t2`.
+
+  - {DespiteChainGrowthViolation}[^Anachrony].
+    All of the above must hold even in the presence of a Chain Growth violation, except switching from a chain that violates Chain Growth to one that does not is allowed to violate the RollbackInsensitiveTranslations subcase of ImmutableTranslations.
+
+[^Anachrony]: Clarification.
+    This was not originally a "requirement", but the behavior was eventually discovered and has so far been accepted as reasonable and potentially even desirable.
+
+**{Iteration 1}**.
+The ImmutableTranslations and NonemptyForecastRange requirements are simple to achieve without the rest.
+
+  - Refuse to translate a slot/civic time that is after the enactment of a governance outcome if using a ledger state that is before the corresponding voting deadline.
+
+  - Require that a governance outcome is enacted at least X slots after the corresponding voting deadline.
+    Thus a ledger state can always translate at least X slots ahead of it.
+
+**{Iteration 2}**.
+The RollbackInsensitiveTranslations requirement can be additionally supported with simple changes.
+
+  - Refuse to translate a slot/civic time that is after the enactment of a governance outcome if using a ledger state that is less than one stability window after the corresponding voting deadline.
+    Just as with the third phase of the Praos epoch structure (which fixes the nonce for the next epoch), this one stability window buffer ensures --- via Chain Growth --- that there will be so many blocks after the deadline that no rollback could alter the governance outcome.
+
+  - Accordingly require that X = one stability window + Y.
+    Thus a ledger state can always translate at least Y slots ahead of it.
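The refusal rules of Iteration 1 and Iteration 2 can be written as two small predicates. The following is a minimal sketch under simplifying assumptions, not the HFC's implementation. It considers a single pending governance outcome, it treats a target slot equal to the enactment slot as "after the enactment", and the function and parameter names are hypothetical. The example numbers are the Conway-era values quoted earlier in this document (X = 432000 slots, stability window = 129600 slots).

```python
def can_translate_iteration1(ledger_slot: int, target_slot: int,
                             voting_deadline: int, enactment_slot: int) -> bool:
    """Iteration 1: refuse post-enactment targets while the ledger state is before the deadline."""
    if target_slot < enactment_slot:
        return True  # the pending governance outcome cannot affect this target
    return ledger_slot >= voting_deadline


def can_translate_iteration2(ledger_slot: int, target_slot: int,
                             voting_deadline: int, enactment_slot: int,
                             stability_window: int) -> bool:
    """Iteration 2: additionally wait one stability window after the deadline."""
    if target_slot < enactment_slot:
        return True
    return ledger_slot >= voting_deadline + stability_window


# Conway-era numbers from this document; the deadline slot itself is arbitrary.
STABILITY_WINDOW = 129600
X = 432000
Y = X - STABILITY_WINDOW            # 302400 slots, ie 3.5 days
deadline = 1_000_000
enactment = deadline + X

# Worst case for Iteration 2: a ledger state just short of one stability window past the deadline.
worst_ledger = deadline + STABILITY_WINDOW - 1
print(can_translate_iteration2(worst_ledger, worst_ledger + Y, deadline, enactment, STABILITY_WINDOW))
print(can_translate_iteration2(worst_ledger, worst_ledger + Y + 1, deadline, enactment, STABILITY_WINDOW))
```

In this model the worst-case ledger state can still translate exactly Y slots ahead, which matches the "at least Y slots" guarantee stated for Iteration 2.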
+
+The MonotonicTranslations requirement is motivated by the following scenario.
+The node may switch from a chain that is at least one stability window past the voting deadline to a chain that is not.
+With such a switch, the node's translation for an argument after the outcome is enacted would switch from `Just t1` to `Nothing`, according to the above rules.
+Due to Praos Chain Growth, the first chain must already have enough blocks to prevent rollbacks from reaching the voting deadline (since it returned `Just t1`).
+And the selection rule would therefore ensure that the second chain would have at least as many blocks after the deadline.
+So once the second chain grows past the threshold slot again, the translation would switch back from `Nothing` to `Just t1` (ie the same translation).
+However, that intermittent `Nothing` is exactly what MonotonicTranslations prohibits.
+
+**{Iteration 3}**.
+A pivot from slot counting to block counting can achieve this, since a chain switch can't roll back more than k blocks nor decrease the block count.
+
+  - Keep X = one stability window + Y.
+
+  - Refuse to translate a slot/civic time that is after the enactment of a governance outcome if using a ledger state that is less than k+1 blocks after the voting deadline.
+
+**{Iteration 4} (latest)**.
+In order to satisfy the DespiteChainGrowthViolation requirement, today's HFC includes a radical rule.
+
+  - Silently ignore the on-chain governance --- ie the HFC continues with the current era _despite the on-chain governance outcome having signaled the transition to the next era_ --- if the stability window after the voting deadline contains fewer than k+1 blocks (ie violates Chain Growth).
+    (TODO this is today's intended behavior, but a bug is counting all blocks after the voting deadline instead of only those in the subsequent stability window.)
+
+  - Keep X = one stability window + Y.
+
+  - Refuse to translate a slot/civic time that is after the enactment of a governance outcome if using a ledger state that is both less than k+1 blocks after the voting deadline and also less than one stability window past the voting deadline.
+    Ledger states that are more than a stability window after the deadline but have fewer than k+1 blocks after the deadline do translations assuming the next epoch is in the same era (regardless of the actual on-chain governance outcome).
+
+Altogether, Iteration 4 ensures that the node will satisfy ImmutableTranslations, NonemptyForecastRange, and MonotonicTranslations as its selection grows without rolling back any blocks (thereby excluding RollbackInsensitiveTranslations), even if those extensions violate Praos Chain Growth.
+
+The Consensus Team would like to remove at least the DespiteChainGrowthViolation requirement and the corresponding possibility of ignoring the on-chain governance for the following reasons.
+
+  - The occasional clarification to colleagues that "the HFC might override the on-chain governance" has always been met with (reasonable) alarm and confusion.
+
+  - Even despite the extreme measure of overriding the on-chain governance, the HFC still is unable to ensure _all_ of its desiderata in the presence of a Chain Growth violation.
+    (It's not yet clear whether it would ever be possible to do so.)
+
+  - It seems unlikely that the Ledger Team would agree that it is worthwhile to upstream the block counting logic such that Chain Growth violations prohibit the governance outcomes relevant to the HFC.
+    Making the on-chain governance itself detect Chain Growth violations (and specifying as much in the community documentation about governance) seems more reasonable than enabling HFC ledger states that are incoherent in surprising ways (ie increased major protocol version but still in the same era).
+    (For the record, the Consensus Team would be on board with this if others consider this option worthwhile.)
+
+  - The rightmost column in the table above indicates that the most fundamental elements of the node (headers and blocks) inherently determine which ledger state must determine their time translations and therefore can freely rely on time translations (likely via a hybrid of Iteration 1 and Iteration 2) that vary depending on which ledger state is used.
+
+  - The other node functions in that table (Mempool and user queries, for now) could prevent sensitivity to rollbacks by merely translating times according to the node's immutable tip ledger state instead of ever overriding the on-chain governance.
+
+It is not yet clear what all the resulting node behaviors would be during a Chain Growth violation --- nor whether it matters!
+The node's behavior outside of the Praos security argument is very rarely discussed or even considered.
diff --git a/docs/website/sidebars.js b/docs/website/sidebars.js
index 12e6a7c82a..da14732285 100644
--- a/docs/website/sidebars.js
+++ b/docs/website/sidebars.js
@@ -35,6 +35,8 @@ const sidebars = {
         'for-developers/index',
         'for-developers/Glossary',
         'for-developers/ComponentDiagram',
+        'for-developers/CardanoPraosBasics',
+        'for-developers/CivicTime',
         'for-developers/AbstractProtocol',
         'for-developers/AddingAnEra',
         'for-developers/ChainSync',