Blockchain transactions file compression #1919
Conversation
The total size of this file can be reduced by approximately 14-15%.
The estimation was removed because its results are not accurate.
This reverts commit 3756efd.
This reverts commit ec4e774.
Pull request overview
This PR implements compression for serialized blockchain transactions to reduce file size by approximately 18%, while maintaining backward compatibility. The implementation uses the minlz compression library with a bit-flag in the transaction size field to indicate compression status.
Key Changes:
- Added transaction compression using minlz with automatic fallback to uncompressed storage when compression doesn't reduce size
- Modified writeTransaction to return the number of bytes written for accurate offset tracking
- Implemented backward-compatible size encoding using the highest bit as a compression flag
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/state/blockreadwriter.go | Core compression logic including encoding/decoding, compression/decompression functions, and updated read/write methods |
| pkg/state/snapshot_applier.go | Updated to handle new return value from writeTransaction |
| pkg/state/common_test.go | Updated test helper to handle new return value from writeTransaction |
| pkg/state/address_transactions_test.go | Updated test calls to handle new return value from writeTransaction |
| pkg/state/blockreadwriter_test.go | Refactored writeBlocks helper to use returned byte count instead of manual calculation, removed unused protobuf parameter |
| go.mod | Added dependencies: minlz v1.0.1 for compression, updated golang/snappy to v1.0.0 |
| go.sum | Updated checksums for new and updated dependencies |
| .golangci-strict.yml | Added replace-allow-list for syndtr/goleveldb dependency |
```go
	return 0, errors.New("ReadAt did not read 4 bytes")
	return 0, false, err
} else if n != uint32Size {
	return 0, false, errors.New("ReadAt did not read 4 bytes")
```
Copilot AI · Nov 28, 2025
The error message is outdated and no longer matches the code. The error states "ReadAt did not read 4 bytes", but the code now uses uint32Size constant instead of the hardcoded value. The error message should be updated to reflect this:
```go
return 0, false, errors.Errorf("ReadAt did not read %d bytes", uint32Size)
```

Alternatively, if uint32Size is always 4, you could keep "4", but it is better to be consistent with the code.
Suggested change:

```diff
-return 0, false, errors.New("ReadAt did not read 4 bytes")
+return 0, false, errors.Errorf("ReadAt did not read %d bytes", uint32Size)
```
Good suggestion, why not apply it?
Will do later
```go
// Append tx size at the beginning reusing txBytes slice.
txBytesTotal := binary.BigEndian.AppendUint32(txBytes[:0], encodedSize)
```
Copilot AI · Nov 28, 2025
[nitpick] Reusing the input slice by resetting its length (txBytes[:0]) may lead to unexpected behavior. While this appears safe in the current call chain (where txBytes comes from fresh marshal operations), it creates a fragile dependency on the calling context. If the caller retains a reference to txBytes, the underlying array will be modified.
Consider documenting this behavior or using a safer approach:
```go
// Option 1: Allocate a new slice (safer, slight performance cost).
txBytesTotal := binary.BigEndian.AppendUint32(nil, encodedSize)

// Option 2: Document the behavior.
// Note: This function may reuse the input txBytes slice's underlying array for optimization.
```

Suggested change:

```diff
-// Append tx size at the beginning reusing txBytes slice.
-txBytesTotal := binary.BigEndian.AppendUint32(txBytes[:0], encodedSize)
+// Append tx size at the beginning using a new slice to avoid modifying the input slice.
+txBytesTotal := binary.BigEndian.AppendUint32(nil, encodedSize)
```
```go
func writeTransactionUncompressed(txBytes []byte) ([]byte, error) {
	var (
		txBytesTotal     = txBytes
		txBytesTotalSize = len(txBytes) + uint32Size
	)
	if cap(txBytesTotal) < txBytesTotalSize {
		txBytesTotal = make([]byte, txBytesTotalSize)
	}
	txBytesTotal = txBytesTotal[:txBytesTotalSize]
	txSizeU32, err := safecast.Convert[uint32](len(txBytes))
	if err != nil {
		return nil, err
	}
	encodedSize, err := encodeSize(txSizeU32, false)
	if err != nil {
		return nil, err
	}
	copy(txBytesTotal[uint32Size:], txBytes)                           // copy original data
	binary.BigEndian.PutUint32(txBytesTotal[:uint32Size], encodedSize) // put size at the beginning
	return txBytesTotal, nil
}
```
Copilot AI · Nov 28, 2025
Potential data corruption issue when reusing the input slice. If cap(txBytesTotal) >= txBytesTotalSize, the function attempts to reuse the input txBytes slice, then copies it to itself at line 383 (copy(txBytesTotal[uint32Size:], txBytes)). Since txBytesTotal and txBytes are the same slice in this case, this creates overlapping source and destination regions, which can lead to data corruption.
Consider always allocating a new slice or ensuring the copy is safe:
```go
txBytesTotal := make([]byte, txBytesTotalSize)
copy(txBytesTotal[uint32Size:], txBytes)
binary.BigEndian.PutUint32(txBytesTotal[:uint32Size], encodedSize)
return txBytesTotal, nil
```
pkg/state/blockreadwriter.go (outdated)
```go
const maxUint32BitMask = math.MaxUint32 - math.MaxInt32

func encodeSize(size uint32, compressed bool) (uint32, error) {
	if size > math.MaxInt32 {
		return 0, errors.Errorf("size overflow: %d > %d", size, math.MaxInt32)
	}
	if compressed {
		return size | maxUint32BitMask, nil
	}
	return size, nil
}

func decodeSize(encodedSize uint32) (uint32, bool) {
	if encodedSize&maxUint32BitMask != 0 {
		size := encodedSize &^ maxUint32BitMask // clear highest bit
```
Copilot AI · Nov 28, 2025
The bit mask derivation is confusing. maxUint32BitMask = math.MaxUint32 - math.MaxInt32 equals 2147483648 (0x80000000), the highest bit of a uint32, yet the name suggests a mask for the "max uint32" range.
The logic works correctly for setting and checking the highest bit to indicate compression, but the constant's name and calculation method are misleading. Consider:
```go
const compressionBitMask = 1 << 31 // 0x80000000 - highest bit indicates compression
```

This makes the intent clearer: the highest bit is a flag for compression status.
Suggested change:

```diff
-const maxUint32BitMask = math.MaxUint32 - math.MaxInt32
+const compressionBitMask = 1 << 31 // 0x80000000 - highest bit indicates compression

 func encodeSize(size uint32, compressed bool) (uint32, error) {
 	if size > math.MaxInt32 {
 		return 0, errors.Errorf("size overflow: %d > %d", size, math.MaxInt32)
 	}
 	if compressed {
-		return size | maxUint32BitMask, nil
+		return size | compressionBitMask, nil
 	}
 	return size, nil
 }

 func decodeSize(encodedSize uint32) (uint32, bool) {
-	if encodedSize&maxUint32BitMask != 0 {
-		size := encodedSize &^ maxUint32BitMask // clear highest bit
+	if encodedSize&compressionBitMask != 0 {
+		size := encodedSize &^ compressionBitMask // clear highest bit
```
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.
```go
bb := bytebufferpool.Get()
defer bytebufferpool.Put(bb)
var (
	err error
	res = make(proto.Transactions, 0, count)
)
for range count {
	if len(data) < uint32Size {
		return nil, errors.New("invalid tx size: insufficient bytes for size prefix")
	}
	size, compressed := decodeSize(binary.BigEndian.Uint32(data[0:uint32Size]))
	if int(size)+uint32Size > len(data) {
		return nil, errors.New("invalid tx size: exceeds bytes slice bounds")
	}
	txBytes := data[uint32Size : size+uint32Size]
	if compressed {
		bb.B = bb.B[:cap(bb.B)] // expand length to capacity
		bb.B, err = minlz.Decode(bb.B, txBytes)
		if err != nil {
			return nil, errors.Wrap(err, "failed to decompress transaction bytes")
		}
		txBytes = bb.B
	}
	tx, txErr := txUnmarshal(txBytes, scheme)
	if txErr != nil {
		return nil, errors.Wrap(txErr, "failed to deserialize transaction bytes")
	}
	res = append(res, tx)
	data = data[uint32Size+size:]
}
```
Copilot AI · Nov 28, 2025
The new compression/decompression logic in unmarshalTransactions lacks dedicated test coverage. While existing tests will exercise this code path, there are no tests that specifically verify: 1) compression bit encoding/decoding works correctly, 2) compressed and uncompressed transactions can be read back correctly, 3) backward compatibility with uncompressed data, and 4) the fallback to uncompressed when compression doesn't reduce size. Consider adding unit tests for encodeSize/decodeSize and integration tests that verify both compressed and uncompressed transaction storage.
Reduces the size of the file with serialized transactions by approximately 18%. The changes are backward compatible.