Calculate and compare CRC when writing and reading ledger snapshots #1319

Draft: wants to merge 17 commits into main

geo2a (Contributor) commented Nov 21, 2024

Description

Fixes #892

In this PR, we change how disk snapshots of the ledger state are written and read. When a snapshot is taken and written to disk, an additional file with the .checksum extension is written alongside it. The checksum file contains a string that represents the CRC32 checksum of the snapshot.

When a snapshot is read from disk, the checksum is calculated again and compared to the stored one. If the two checksums differ, readSnapshot returns the ReadSnapshotDataCorruption error, indicating data corruption.

In both directions, the checksum is calculated incrementally, alongside reading or writing the data. On write, we use the [hPutAllCRC](https://input-output-hk.github.io/fs-sim/fs-api/src/System.FS.CRC.html#hPutAllCRC) function from fs-sim, and on read we modify the readIncremental function to compute the checksum as the data is read.
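To make the incremental scheme concrete, here is a minimal Python sketch; it is not the Haskell code in this PR. It uses `zlib.crc32` (standard CRC-32) as a stand-in for the CRC from fs-sim, folds each chunk into a running CRC on both write and read, and stores the result in a sidecar .checksum file. Storing the CRC as lowercase hex text is an assumption (the description only says "a string"), and `write_snapshot`, `read_snapshot`, `checksum_path`, `CHUNK_SIZE` and `SnapshotDataCorruption` are hypothetical names.

```python
import tempfile
import zlib
from pathlib import Path
from typing import Iterable

CHUNK_SIZE = 64 * 1024  # hypothetical read buffer size


class SnapshotDataCorruption(Exception):
    """Hypothetical stand-in for the ReadSnapshotDataCorruption error."""


def checksum_path(snapshot: Path) -> Path:
    # "snapshot" -> "snapshot.checksum", next to the snapshot itself
    return snapshot.with_name(snapshot.name + ".checksum")


def write_snapshot(snapshot: Path, chunks: Iterable[bytes]) -> int:
    """Write the serialised state chunk by chunk, updating the CRC as we go."""
    crc = 0  # zlib.crc32's initial value
    with snapshot.open("wb") as f:
        for chunk in chunks:
            f.write(chunk)
            crc = zlib.crc32(chunk, crc)  # fold this chunk into the running CRC
    # Assumption: the checksum is stored as lowercase hex text.
    checksum_path(snapshot).write_text(f"{crc:x}")
    return crc


def read_snapshot(snapshot: Path) -> bytes:
    """Read the snapshot incrementally and compare against the .checksum file."""
    crc = 0
    parts = []
    with snapshot.open("rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            parts.append(chunk)
            crc = zlib.crc32(chunk, crc)
    expected = int(checksum_path(snapshot).read_text(), 16)
    if crc != expected:
        raise SnapshotDataCorruption(f"expected {expected:08x}, got {crc:08x}")
    return b"".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        snap = Path(d) / "snapshot"
        write_snapshot(snap, [b"ledger ", b"state"])
        assert read_snapshot(snap) == b"ledger state"
        snap.write_bytes(b"ledger stale")  # simulate on-disk corruption
        try:
            read_snapshot(snap)
        except SnapshotDataCorruption as err:
            print("corruption detected:", err)
```

Because CRC-32 can be updated chunk by chunk, the checksum of the streamed data equals the checksum of the whole file, so neither side ever needs the complete snapshot in memory just to checksum it.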

To enable seamless integration into cardano-node, we make the check optional. When initialising the ledger state from a snapshot in initLedgerDB, we issue a warning if a snapshot's checksum file is missing, rather than failing as we do for invalid snapshots.

The db-analyser tool ignores the checksum files by default when reading snapshots. We add the --disk-snapshot-checksum flag to enable the check. When writing a snapshot to disk, e.g. as a result of the --store-ledger analysis, db-analyser always calculates the checksum and writes it to the snapshot's .checksum file.
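The optional check described above can be sketched as a small policy function. This is a hypothetical Python illustration, not the Haskell implementation: `ChecksumPolicy`, `load_snapshot` and `SnapshotDataCorruption` are made-up names, the checksum is assumed to be stored as hex text, and we assume that db-analyser with --disk-snapshot-checksum treats a missing checksum file the same way initLedgerDB does (the description does not say).

```python
import tempfile
import zlib
from enum import Enum
from pathlib import Path
from typing import Callable


class ChecksumPolicy(Enum):
    """Hypothetical names for the two behaviours described above."""
    CHECK = "check"    # initLedgerDB; assumed also for --disk-snapshot-checksum
    IGNORE = "ignore"  # db-analyser's default


class SnapshotDataCorruption(Exception):
    """Hypothetical stand-in for the ReadSnapshotDataCorruption error."""


def load_snapshot(
    snapshot: Path,
    policy: ChecksumPolicy,
    warn: Callable[[str], None] = print,
) -> bytes:
    data = snapshot.read_bytes()  # whole file at once, for brevity
    if policy is ChecksumPolicy.IGNORE:
        return data  # the .checksum file is not consulted at all
    sidecar = snapshot.with_name(snapshot.name + ".checksum")
    if not sidecar.exists():
        # Missing checksum: warn and carry on instead of rejecting the snapshot.
        warn(f"no checksum file for {snapshot.name}; skipping the check")
        return data
    if zlib.crc32(data) != int(sidecar.read_text(), 16):
        # Checksum present but wrong: report corruption.
        raise SnapshotDataCorruption(snapshot.name)
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        snap = Path(d) / "snapshot"
        snap.write_bytes(b"ledger state")
        load_snapshot(snap, ChecksumPolicy.CHECK)  # prints a warning, still loads
        snap.with_name("snapshot.checksum").write_text(f"{zlib.crc32(b'ledger state'):x}")
        assert load_snapshot(snap, ChecksumPolicy.CHECK) == b"ledger state"
```

Distinguishing "missing" from "wrong" is what keeps the rollout smooth: snapshots written before this change presumably have no .checksum file and keep loading, while a checksum that is present but does not match is treated as corruption.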

Effects on Performance:

Running db-analyser to read a ledger snapshot and store a snapshot of the state at the next slot takes about 1.6 seconds longer on the feature branch than on main on my machine (101.3 s vs. 99.7 s). See the comment below for the logs.

To evaluate the effect precisely, we need a micro-benchmark of reading and writing snapshots with and without the checksum calculation.
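As a rough idea of what such a micro-benchmark could measure, here is a Python stand-in; it is not part of this PR and does not use the project's snapshot encoder or fs-sim. It times writing the same random buffer chunk by chunk with and without a running `zlib.crc32`, syncing to disk each time and taking the best of several runs. The buffer size, chunk size and run count are arbitrary assumptions, and only the write path is shown; the read path could be timed the same way.

```python
import os
import tempfile
import time
import zlib
from pathlib import Path


def timed_write(path: Path, data: bytes, chunk_size: int, with_crc: bool) -> float:
    """Write `data` chunk by chunk, optionally folding each chunk into a CRC32.
    Returns the elapsed wall-clock time in seconds."""
    view = memoryview(data)  # slicing a memoryview avoids copying
    crc = 0
    start = time.perf_counter()
    with path.open("wb") as f:
        for i in range(0, len(view), chunk_size):
            chunk = view[i:i + chunk_size]
            f.write(chunk)
            if with_crc:
                crc = zlib.crc32(chunk, crc)
        f.flush()
        os.fsync(f.fileno())  # include the cost of reaching the disk
    return time.perf_counter() - start


def bench(size: int = 16 * 1024 * 1024, chunk_size: int = 64 * 1024,
          runs: int = 3) -> dict[bool, float]:
    """Best-of-`runs` write time without (False) and with (True) the CRC."""
    data = os.urandom(size)  # random bytes standing in for a serialised ledger state
    results = {}
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "snapshot"
        for with_crc in (False, True):
            results[with_crc] = min(
                timed_write(path, data, chunk_size, with_crc) for _ in range(runs)
            )
    return results


if __name__ == "__main__":
    r = bench()
    overhead = (r[True] - r[False]) / r[False] * 100
    print(f"without CRC {r[False]:.3f}s, with CRC {r[True]:.3f}s ({overhead:+.1f}%)")
```

Taking the minimum over runs filters out one-off scheduling and caching noise, which matters when the expected difference is small relative to the total I/O time.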

geo2a force-pushed the 892-checksum-snaphot-file branch 2 times, most recently from fe08707 to b001c90 (November 28, 2024 08:14)
geo2a force-pushed the 892-checksum-snaphot-file branch 2 times, most recently from 28e8320 to 31b892a (November 29, 2024 11:16)
geo2a (Contributor, Author) commented Nov 29, 2024

I've compared the performance of reading and then writing a snapshot using db-analyser. The feature branch is about 1.6 s slower (101.3 s vs. 99.7 s), but with a single run on each branch it is not clear whether the difference is significant.

I would like to spend more time on the micro-benchmarks we currently have, and possibly add one for reading and writing ledger state snapshots.

Logs of `db-analyser` with timings

Feature branch: 101.325841s

time $(cabal list-bin db-analyser) --db ~/Workspace/IOG/cardano-node-data/mainnet/state/mainnet-db cardano --config ~/Workspace/IOG/cardano-node-data/mainnet/mainnet-config.json --analyse-from 112526498 --store-ledger 112526499 --verbose
[0.459957s] TraceImmutableDBEvent (ChunkValidationEvent (StartedValidatingChunk 5285 5285))
[0.869950s] TraceImmutableDBEvent (ChunkValidationEvent (ValidatedChunk 5285 5285))
[0.871115s] TraceImmutableDBEvent (ValidatedLastLocation 5285 (Tip {tipSlotNo = SlotNo 114172812, tipIsEBB = IsNotEBB, tipBlockNo = BlockNo 9827240, tipHash = fcb4668176a65b29cf64c31a0bf7c4d4558f61d17774c4f9c3c1366435be8a7f}))
[59.829891s] Started StoreLedgerStateAt (SlotNo 112526499) LedgerReapply
[60.150611s] TraceImmutableDBEvent (TraceCacheEvent (TraceCurrentChunkHit 5285 0))
[60.150929s] TraceImmutableDBEvent (TraceCacheEvent (TraceCurrentChunkHit 5285 0))
[60.151203s] TraceImmutableDBEvent (TraceCacheEvent (TracePastChunkMiss 5209 0))
[60.179937s] TraceImmutableDBEvent (TraceCacheEvent (TracePastChunkHit 5209 1))
[60.180243s] TraceImmutableDBEvent (TraceCacheEvent (TracePastChunkHit 5209 1))
[101.325411s] Snapshot stored at SlotNo 112526500
[101.325568s] Snapshot was created at SlotNo 112526500 because there was no block forged at requested SlotNo 112526499
[101.325841s] Done
ImmutableDB tip: At (Block {blockPointSlot = SlotNo 114172812, blockPointHash = fcb4668176a65b29cf64c31a0bf7c4d4558f61d17774c4f9c3c1366435be8a7f})
[101.325945s] TraceImmutableDBEvent DBClosed
$(cabal list-bin db-analyser) --db  cardano --config  --analyse-from 11252649  102.92s user 6.79s system 107% cpu 1:41.95 total

Main branch: 99.685827s

time $(cabal list-bin db-analyser) --db ~/Workspace/IOG/cardano-node-data/mainnet/state/mainnet-db cardano --config ~/Workspace/IOG/cardano-node-data/mainnet/mainnet-config.json --analyse-from 112526498 --store-ledger 112526499 --verbose
[0.466244s] TraceImmutableDBEvent (ChunkValidationEvent (StartedValidatingChunk 5285 5285))
[0.879898s] TraceImmutableDBEvent (ChunkValidationEvent (ValidatedChunk 5285 5285))
[0.881057s] TraceImmutableDBEvent (ValidatedLastLocation 5285 (Tip {tipSlotNo = SlotNo 114172812, tipIsEBB = IsNotEBB, tipBlockNo = BlockNo 9827240, tipHash = fcb4668176a65b29cf64c31a0bf7c4d4558f61d17774c4f9c3c1366435be8a7f}))
[57.514297s] Started StoreLedgerStateAt (SlotNo 112526499) LedgerReapply
[57.837578s] TraceImmutableDBEvent (TraceCacheEvent (TraceCurrentChunkHit 5285 0))
[57.837830s] TraceImmutableDBEvent (TraceCacheEvent (TraceCurrentChunkHit 5285 0))
[57.838031s] TraceImmutableDBEvent (TraceCacheEvent (TracePastChunkMiss 5209 0))
[57.858723s] TraceImmutableDBEvent (TraceCacheEvent (TracePastChunkHit 5209 1))
[57.858963s] TraceImmutableDBEvent (TraceCacheEvent (TracePastChunkHit 5209 1))
[99.685293s] Snapshot stored at SlotNo 112526500
[99.685447s] Snapshot was created at SlotNo 112526500 because there was no block forged at requested SlotNo 112526499
[99.685721s] Done
ImmutableDB tip: At (Block {blockPointSlot = SlotNo 114172812, blockPointHash = fcb4668176a65b29cf64c31a0bf7c4d4558f61d17774c4f9c3c1366435be8a7f})
[99.685827s] TraceImmutableDBEvent DBClosed
$(cabal list-bin db-analyser) --db  cardano --config  --analyse-from 11252649  102.60s user 6.13s system 108% cpu 1:40.31 total
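For a rough breakdown of where the difference shows up, the marker timestamps from the two logs above can be split into phases. The sketch below copies in only the trace lines it needs. Splitting at "Started StoreLedgerStateAt" assumes the time before it is dominated by loading the --analyse-from snapshot, which the long gap in the logs suggests but does not prove; the phase names are illustrative.

```python
import re

# Marker trace lines copied from the two runs above (other lines omitted).
FEATURE_LOG = """\
[59.829891s] Started StoreLedgerStateAt (SlotNo 112526499) LedgerReapply
[101.325411s] Snapshot stored at SlotNo 112526500
[101.325841s] Done
"""

MAIN_LOG = """\
[57.514297s] Started StoreLedgerStateAt (SlotNo 112526499) LedgerReapply
[99.685293s] Snapshot stored at SlotNo 112526500
[99.685721s] Done
"""

TRACE = re.compile(r"^\[(\d+(?:\.\d+)?)s\] (.*)$")
MARKERS = ("Started", "Snapshot stored", "Done")


def marker_times(log: str) -> dict[str, float]:
    """Return the timestamp (seconds since start) of each marker trace."""
    times = {}
    for line in log.splitlines():
        m = TRACE.match(line)
        if not m:
            continue
        for marker in MARKERS:
            if m.group(2).startswith(marker):
                times[marker] = float(m.group(1))
    return times


def phases(log: str) -> dict[str, float]:
    t = marker_times(log)
    return {
        # Everything before the analysis starts (presumably mostly snapshot loading).
        "before analysis": t["Started"],
        # Reapplying blocks up to the target slot and writing the new snapshot.
        "analysis": t["Snapshot stored"] - t["Started"],
        "total": t["Done"],
    }


feature, main = phases(FEATURE_LOG), phases(MAIN_LOG)
for name in feature:
    diff = feature[name] - main[name]
    print(f"{name:16} feature {feature[name]:8.3f}s  main {main[name]:8.3f}s  diff {diff:+.3f}s")
```

On these numbers the feature branch is about 2.3 s slower before the analysis starts, about 0.7 s faster in the analysis phase, and 1.6 s slower overall. Neither run passed --disk-snapshot-checksum, and one run per branch is not enough to separate the checksum's cost from run-to-run noise.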

Development: successfully merging this pull request may close the issue "[FEAT] - checksum when deserializing the ledger snapshot file".

4 participants