Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/control: Show last consensus height seen by block history #6013

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

martintomazic
Copy link
Contributor

@martintomazic martintomazic commented Jan 27, 2025

What & Why

#5998

How to test

  1. Delete data/runtime/[runtimeID] (this way we trigger reindex)
  2. oasis-node control status -a unix:path/to/data/internal.sock and observe whilst history reindex is happening, consensus height should be increasing e.g.:
      "storage": {
        "status": "initializing",
        "last_finalized_round": 18446744073709551615
      },
      "history": {
        "last_round": 8911898,
        "last_height": 25095662
      },

Copy link

netlify bot commented Jan 27, 2025

Deploy Preview for oasisprotocol-oasis-core canceled.

Name Link
🔨 Latest commit ff6fd84
🔍 Latest deploy log https://app.netlify.com/sites/oasisprotocol-oasis-core/deploys/679ff518ed68a20008936982

Comment on lines 346 to 356
// Fetch last consensus height seen by block history.
height, err := rt.History().LastConsensusHeight()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be useful for oasis-node to report the current reindex round

Notice I report consensus height instead of runtime round. I would argue this is also more appropriate, since reindex iterates over the block height...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we show the last height and the last round?
Would percent of indexed blocks help the node operators?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, all of these would be useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we show the last height and the last round?

I have added those. Have a bit of a mixed feelings about percentage...

  1. Do we want to add it to consensus sync and storage syncing as well (to be consistent)?
  2. Should we only show it if during reindex/syncing?
  3. The value is not deterministic so we cannot expose it as status, [reindex/syncing]_progress: 90.1% looks good?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Do we want to add it to consensus sync and storage syncing as well (to be consistent)?

I said it would be nice to have percentages, so that you know how far are you.

  1. Should we only show it if during reindex/syncing?

Probably no. If we could show 100%, that would mean that the syncing is not in progress.

  1. The value is not deterministic so we cannot expose it as status

Not sure what you are trying to say here.

Comment on lines 192 to 197
// BlockHistoryLastConsensusHeight is last consensus height seen by block history.
BlockHistoryLastConsensusHeight int64 `json:"block_history_last_consensus_height"`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so that the node operator could estimate when the reindex will finish.

I am not sure current node control status is helpful unless node operator is very familiar with oasis?

Hackish alternative may be to only show it if during "history reindex" status right below it, giving more context to the operator:

diff --git a/go/worker/common/api/api.go b/go/worker/common/api/api.go
index 41e0a46ff..0e050b421 100644
--- a/go/worker/common/api/api.go
+++ b/go/worker/common/api/api.go
@@ -107,6 +107,9 @@ type Status struct {
        // Status is a concise status of the committee node.
        Status StatusState `json:"status"`
 
+       // HistoryReindexHeight is last consensus height seen by block history.
+       HistoryReindexHeight *int64 `json:"history_reindex_height,omitempty"`
+
        // ActiveVersion is the currently active version.
        ActiveVersion *version.Version `json:"active_version"`
 
diff --git a/go/worker/common/committee/node.go b/go/worker/common/committee/node.go
index 9f06bcd15..668e43365 100644
--- a/go/worker/common/committee/node.go
+++ b/go/worker/common/committee/node.go
@@ -275,6 +275,20 @@ func (n *Node) GetStatus() (*api.Status, error) {
        var status api.Status
        status.Status = n.getStatusStateLocked()
 
+       // Include last consensus height if during history reindex.
+       if status.Status == api.StatusStateWaitingHistoryReindex {
+               height, err := n.Runtime.History().LastConsensusHeight()
+               if err != nil {
+                       n.logger.Warn("failed to retrieve last consensus height seen by block history",
+                               "err", err,
+                               "id", n.Runtime.ID(),
+                       )
+               } else {
+                       status.HistoryReindexHeight = &height
+
+               }
+       }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure current node control status is helpful unless node operator is very familiar with oasis?

The CLI can provide a nicer computed overlay from this information. But it would be good if the node could estimate the average reindex speed and then an ETA as the range of blocks to reindex is known.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it would be good if the node could estimate the average reindex speed and then an ETA as the range of blocks to reindex is known.

I would prefer to tackle ETA in another PR here is why:

  1. consensus reindex >= storage sync >> history reindex (time needed for my testnet node to sync from start)
  2. ETA also depends if you are doing reindex during consensus sync or if you start with already synced node. Need to do some more experiments just to be sure...

@martintomazic martintomazic marked this pull request as ready for review January 27, 2025 23:34
@martintomazic martintomazic force-pushed the martin/feature/show-history-reindex-round branch from 47f3130 to 1368dfc Compare January 27, 2025 23:36
@@ -189,6 +189,9 @@ type RuntimeStatus struct {
// Storage contains the storage worker status in case this node is a storage node.
Storage *storageWorker.Status `json:"storage,omitempty"`

// BlockHistoryLastConsensusHeight is last consensus height seen by block history.
BlockHistoryLastConsensusHeight int64 `json:"block_history_last_consensus_height"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite long field name. I would try to find a shorter name or move it under history.
Should this field be grouped with LastRetainedRound?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should shorten it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think the cleanest way would probably be to show last round and height as seen by the block history right below storage field. I am thinking about exposing this status from the roothash/api/BlockHistory interface (see):

      "latest_round": 10160924,
      "latest_hash": "85867c19bd6ad43521432e187962e3bb19a9731c79cd64f195821bfd90bd7b24",
      "latest_time": "2025-01-31T11:08:05+01:00",
      "latest_state_root": {
        "ns": "000000000000000000000000000000000000000000000000a6d1e3ebf60dff6c",
        "version": 10160924,
        "root_type": 1,
        "hash": "a914f8c8baf05a47757c1d7f7caecebdf614c8b3d1e20aa33d6855ec96a50ba1"
      },
      "genesis_round": 2995927,
      "genesis_hash": "c9f3ca654531b775d944c85e1f00e76944aaf5de1902acfa082ff76e852dba5e",
      "last_retained_round": 10145927,
      "last_retained_hash": "fad686c7eb077ec1a1a5a1ba1c27027a3788b003c94130e78c7dfb25a1d0e8cd",
      // other fields
      "storage": {
        "status": "syncing rounds",
        "last_finalized_round": 10160923
      },
      "history": {
        "last_round": 10160923,
        "last_height": 25058099
      },

Should this field be grouped with LastRetainedRound?

  1. LastRetainedRound is actually earliest (not latest) runtime block that is also synced to the storage.
    i. You may still get an error and show the latest retained round from the block history even if you don't have it synced to the storage... See. Is this a bug?
  2. History reindex is actually happening and finishes before storage round syncing so ideally this should be separate grouping.

I am thinking could we move last_retained_round and last_retained_hash to storage? This is breaking obviously. More importantly the storage status is received from the storage committee worker, whilst LastRetainedRound and hash are technically overwritten, directly from the runtime storage (see, described in 1.i). As from the docstring the former is synced + finalized, whilst if reading from the storage directly it is only guaranteed to be synced? Not sure what is the difference (if any)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

History reindex is actually happening and finishes before storage round syncing so ideally this should be separate grouping.

Maybe we show status of both. One part should probably be under storage, other under history. Although I'm not sure if users will understand the difference.

Should this field be grouped with LastRetainedRound?

The code which fetches these fields should also be together.

This is breaking

Breaking for who? I think it is not a problem if we change the status. If node operators depend on it, they can check the changelog and modify their code.

Comment on lines 346 to 356
// Fetch last consensus height seen by block history.
height, err := rt.History().LastConsensusHeight()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we show the last height and the last round?
Would percent of indexed blocks help the node operators?

Copy link

codecov bot commented Jan 28, 2025

Codecov Report

Attention: Patch coverage is 75.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 64.64%. Comparing base (87c6072) to head (1368dfc).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
go/oasis-node/cmd/node/node_control.go 75.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6013      +/-   ##
==========================================
- Coverage   65.12%   64.64%   -0.49%     
==========================================
  Files         631      631              
  Lines       64419    64427       +8     
==========================================
- Hits        41956    41647     -309     
- Misses      17549    17867     +318     
+ Partials     4914     4913       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kostko kostko linked an issue Jan 28, 2025 that may be closed by this pull request
@martintomazic martintomazic force-pushed the martin/feature/show-history-reindex-round branch from 1368dfc to ff6fd84 Compare February 2, 2025 22:43
Comment on lines +236 to +243
// HistoryStatus is the status as observed by the block history.
type HistoryStatus struct {
// LastRound is the round of the latest block.
LastRound uint64 `json:"last_round"`
// LastHeight is the consensus height of the latest block.
LastHeight int64 `json:"last_height"`
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also want a status like we have for storage? e.g. initializing, reindexing blocks, syncing blocks?

If so this logic would have to be moved to consensus roothash, or we add a method to query it. Albeit, feels weird adding this functionality to roothash/api/Backend...?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also want a status like we have for storage? e.g. initializing, reindexing blocks, syncing blocks?

Yes, I would like to have percentages (and maybe optional status).

From you example, I would like to have at least 3 fields, two that say which range is reindexBlocks working on (800 - 1000), and percentages of reindexed blocks (30/200 = 15%). When this is done, the node will see that too many blocks are still to be indexed, so it will update status by updating range (1000 - 1100) and percentage (0/100 = 0%).

Having field LastRound might be a problem when you start doing things in parallel, as some lower rounds might not be processed yet. Range might be better.

@@ -0,0 +1,8 @@
go/oasis_node: Add runtime block history status message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
go/oasis_node: Add runtime block history status message
go/oasis-node: Add runtime block history status message

go/oasis_node: Add runtime block history status message

A new field `history` has been added to the `oasis-node control status`
output under the runtime status section. This field displays the latest round
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
output under the runtime status section. This field displays the latest round
output under the runtime status section. This field displays the round

@@ -48,6 +48,8 @@ type BlockHistory interface {

// LastConsensusHeight returns the last consensus height which was seen
// by block history.
//
// This method can return height for block not yet synced to storage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole comment is a bit confusing, as the first part says seen while the second says that the block (which one, consensus or runtime?) doesn't need to be synced.

Can we change to

Suggested change
// This method can return height for block not yet synced to storage.
// LastConsensusHeight returns the consensus height of the last committed block.

@@ -57,6 +59,7 @@ type BlockHistory interface {
GetCommittedBlock(ctx context.Context, round uint64) (*block.Block, error)

// GetBlock returns the block at a specific round.
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be consistent and add new line in all places or nowhere.

Comment on lines 346 to 356
// Fetch last consensus height seen by block history.
height, err := rt.History().LastConsensusHeight()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Do we want to add it to consensus sync and storage syncing as well (to be consistent)?

I said it would be nice to have percentages, so that you know how far are you.

  1. Should we only show it if during reindex/syncing?

Probably no. If we could show 100%, that would mean that the syncing is not in progress.

  1. The value is not deterministic so we cannot expose it as status

Not sure what you are trying to say here.

@@ -189,6 +189,9 @@ type RuntimeStatus struct {
// Storage contains the storage worker status in case this node is a storage node.
Storage *storageWorker.Status `json:"storage,omitempty"`

// BlockHistoryLastConsensusHeight is last consensus height seen by block history.
BlockHistoryLastConsensusHeight int64 `json:"block_history_last_consensus_height"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

History reindex is actually happening and finishes before storage round syncing so ideally this should be separate grouping.

Maybe we show status of both. One part should probably be under storage, other under history. Although I'm not sure if users will understand the difference.

Should this field be grouped with LastRetainedRound?

The code which fetches these fields should also be together.

This is breaking

Breaking for who? I think it is not a problem if we change the status. If node operators depend on it, they can check the changelog and modify their code.

Comment on lines +236 to +243
// HistoryStatus is the status as observed by the block history.
type HistoryStatus struct {
// LastRound is the round of the latest block.
LastRound uint64 `json:"last_round"`
// LastHeight is the consensus height of the latest block.
LastHeight int64 `json:"last_height"`
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also want a status like we have for storage? e.g. initializing, reindexing blocks, syncing blocks?

Yes, I would like to have percentages (and maybe optional status).

From you example, I would like to have at least 3 fields, two that say which range is reindexBlocks working on (800 - 1000), and percentages of reindexed blocks (30/200 = 15%). When this is done, the node will see that too many blocks are still to be indexed, so it will update status by updating range (1000 - 1100) and percentage (0/100 = 0%).

Having field LastRound might be a problem when you start doing things in parallel, as some lower rounds might not be processed yet. Range might be better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

control status: Show current history reindex round number
3 participants