
Conversation

@nickeskov
Collaborator

Reduces the size of the file with serialized transactions by approximately 18%. Changes are backward compatible.

This reverts commit ec4e774.
Copilot finished reviewing on behalf of nickeskov November 28, 2025 13:52
Contributor

Copilot AI left a comment


Pull request overview

This PR implements compression for serialized blockchain transactions to reduce file size by approximately 18%, while maintaining backward compatibility. The implementation uses the minlz compression library with a bit-flag in the transaction size field to indicate compression status.

Key Changes:

  • Added transaction compression using minlz with automatic fallback to uncompressed storage when compression doesn't reduce size
  • Modified writeTransaction to return the number of bytes written for accurate offset tracking
  • Implemented backward-compatible size encoding using the highest bit as a compression flag

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.

  • pkg/state/blockreadwriter.go: Core compression logic including encoding/decoding, compression/decompression functions, and updated read/write methods
  • pkg/state/snapshot_applier.go: Updated to handle the new return value from writeTransaction
  • pkg/state/common_test.go: Updated test helper to handle the new return value from writeTransaction
  • pkg/state/address_transactions_test.go: Updated test calls to handle the new return value from writeTransaction
  • pkg/state/blockreadwriter_test.go: Refactored the writeBlocks helper to use the returned byte count instead of manual calculation; removed the unused protobuf parameter
  • go.mod: Added minlz v1.0.1 for compression; updated golang/snappy to v1.0.0
  • go.sum: Updated checksums for new and updated dependencies
  • .golangci-strict.yml: Added replace-allow-list entry for the syndtr/goleveldb dependency


return 0, errors.New("ReadAt did not read 4 bytes")
return 0, false, err
} else if n != uint32Size {
return 0, false, errors.New("ReadAt did not read 4 bytes")

Copilot AI Nov 28, 2025


The error message is outdated and no longer matches the code. The error states "ReadAt did not read 4 bytes", but the code now uses uint32Size constant instead of the hardcoded value. The error message should be updated to reflect this:

return 0, false, errors.Errorf("ReadAt did not read %d bytes", uint32Size)

Alternatively, if uint32Size is always 4, you could keep "4", but it is better to stay consistent with the code.

Suggested change
return 0, false, errors.New("ReadAt did not read 4 bytes")
return 0, false, errors.Errorf("ReadAt did not read %d bytes", uint32Size)

Collaborator


Good suggestion, why not apply it?

Collaborator Author


Will do later

Comment on lines +411 to +412
// Append tx size at the beginning reusing txBytes slice.
txBytesTotal := binary.BigEndian.AppendUint32(txBytes[:0], encodedSize)

Copilot AI Nov 28, 2025


[nitpick] Reusing the input slice by resetting its length (txBytes[:0]) may lead to unexpected behavior. While this appears safe in the current call chain (where txBytes comes from fresh marshal operations), it creates a fragile dependency on the calling context. If the caller retains a reference to txBytes, the underlying array will be modified.

Consider documenting this behavior or using a safer approach:

// Option 1: Allocate new slice (safer, slight performance cost)
txBytesTotal := binary.BigEndian.AppendUint32(nil, encodedSize)

// Option 2: Add documentation
// Note: This function may reuse the input txBytes slice's underlying array for optimization.
Suggested change
// Append tx size at the beginning reusing txBytes slice.
txBytesTotal := binary.BigEndian.AppendUint32(txBytes[:0], encodedSize)
// Append tx size at the beginning using a new slice to avoid modifying the input slice.
txBytesTotal := binary.BigEndian.AppendUint32(nil, encodedSize)

Comment on lines +366 to +386
func writeTransactionUncompressed(txBytes []byte) ([]byte, error) {
var (
txBytesTotal = txBytes
txBytesTotalSize = len(txBytes) + uint32Size
)
if cap(txBytesTotal) < txBytesTotalSize {
txBytesTotal = make([]byte, txBytesTotalSize)
}
txBytesTotal = txBytesTotal[:txBytesTotalSize]
txSizeU32, err := safecast.Convert[uint32](len(txBytes))
if err != nil {
return nil, err
}
encodedSize, err := encodeSize(txSizeU32, false)
if err != nil {
return nil, err
}
copy(txBytesTotal[uint32Size:], txBytes) // copy original data
binary.BigEndian.PutUint32(txBytesTotal[:uint32Size], encodedSize) // put size at the beginning
return txBytesTotal, nil
}

Copilot AI Nov 28, 2025


Potential data corruption issue when reusing the input slice. If cap(txBytesTotal) >= txBytesTotalSize, the function attempts to reuse the input txBytes slice, then copies it to itself at line 383 (copy(txBytesTotal[uint32Size:], txBytes)). Since txBytesTotal and txBytes are the same slice in this case, this creates overlapping source and destination regions, which can lead to data corruption.

Consider always allocating a new slice or ensuring the copy is safe:

txBytesTotal := make([]byte, txBytesTotalSize)
copy(txBytesTotal[uint32Size:], txBytes)
binary.BigEndian.PutUint32(txBytesTotal[:uint32Size], encodedSize)
return txBytesTotal, nil

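One point worth checking when weighing this comment: the Go specification guarantees that the built-in copy handles overlapping source and destination (memmove semantics), so the shift-by-prefix pattern itself does not corrupt data. A minimal reproduction of the overlap in question (buffer sizes are illustrative):

```go
package main

import "fmt"

func main() {
	// Backing array with room for an 8-byte payload plus a 4-byte prefix.
	buf := make([]byte, 8, 12)
	for i := range buf {
		buf[i] = byte(i + 1) // 01 02 ... 08
	}
	total := buf[:12]

	// Overlapping shift: dst starts 4 bytes past src in the same array.
	// The Go spec states "The source and destination may overlap",
	// so copy behaves like memmove here.
	copy(total[4:], buf[:8])
	fmt.Printf("% X\n", total[4:]) // 01 02 03 04 05 06 07 08
}
```

The reviewer's broader point about readability still stands: always allocating makes the intent obvious at the cost of one allocation per transaction.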
Comment on lines 345 to 359
const maxUint32BitMask = math.MaxUint32 - math.MaxInt32

func encodeSize(size uint32, compressed bool) (uint32, error) {
if size > math.MaxInt32 {
return 0, errors.Errorf("size overflow: %d > %d", size, math.MaxInt32)
}
if compressed {
return size | maxUint32BitMask, nil
}
return size, nil
}

func decodeSize(encodedSize uint32) (uint32, bool) {
if encodedSize&maxUint32BitMask != 0 {
size := encodedSize &^ maxUint32BitMask // clear highest bit

Copilot AI Nov 28, 2025


The bit mask calculation is incorrect. maxUint32BitMask = math.MaxUint32 - math.MaxInt32 equals 2147483648 (0x80000000), which is the highest bit of a uint32. However, the naming suggests this is meant to be a mask for the "max uint32" range, which is confusing.

The logic appears to work correctly for setting/checking the highest bit to indicate compression, but the constant name and calculation method are misleading. Consider:

const compressionBitMask = 1 << 31 // 0x80000000 - highest bit indicates compression

This makes the intent clearer: the highest bit is a flag for compression status.

Suggested change
const maxUint32BitMask = math.MaxUint32 - math.MaxInt32
func encodeSize(size uint32, compressed bool) (uint32, error) {
if size > math.MaxInt32 {
return 0, errors.Errorf("size overflow: %d > %d", size, math.MaxInt32)
}
if compressed {
return size | maxUint32BitMask, nil
}
return size, nil
}
func decodeSize(encodedSize uint32) (uint32, bool) {
if encodedSize&maxUint32BitMask != 0 {
size := encodedSize &^ maxUint32BitMask // clear highest bit
const compressionBitMask = 1 << 31 // 0x80000000 - highest bit indicates compression
func encodeSize(size uint32, compressed bool) (uint32, error) {
if size > math.MaxInt32 {
return 0, errors.Errorf("size overflow: %d > %d", size, math.MaxInt32)
}
if compressed {
return size | compressionBitMask, nil
}
return size, nil
}
func decodeSize(encodedSize uint32) (uint32, bool) {
if encodedSize&compressionBitMask != 0 {
size := encodedSize &^ compressionBitMask // clear highest bit

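The two spellings denote the same value; only the naming differs, as the comment says. A quick check of the equivalence and of the encode/decode round trip (a sketch mirroring the logic shown in the diff, not the PR's code):

```go
package main

import (
	"fmt"
	"math"
)

const (
	maxUint32BitMask   = uint32(math.MaxUint32 - math.MaxInt32) // as written in the PR
	compressionBitMask = uint32(1) << 31                        // the reviewer's suggested spelling
)

func main() {
	// Both constants are 0x80000000, the highest bit of a uint32.
	fmt.Println(maxUint32BitMask == compressionBitMask)

	// Round trip: set the flag, then test it and clear it to recover the size.
	size := uint32(1234)
	encoded := size | compressionBitMask
	fmt.Println(encoded&compressionBitMask != 0, encoded&^compressionBitMask)
}
```

Since the values are identical, renaming is purely a readability change with no effect on the on-disk format.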
@nickeskov nickeskov requested a review from Copilot November 28, 2025 14:34
Copilot finished reviewing on behalf of nickeskov November 28, 2025 14:37
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.



Comment on lines +768 to +797
bb := bytebufferpool.Get()
defer bytebufferpool.Put(bb)
var (
err error
res = make(proto.Transactions, 0, count)
)
for range count {
if len(data) < uint32Size {
return nil, errors.New("invalid tx size: insufficient bytes for size prefix")
}
size, compressed := decodeSize(binary.BigEndian.Uint32(data[0:uint32Size]))
if int(size)+uint32Size > len(data) {
return nil, errors.New("invalid tx size: exceeds bytes slice bounds")
}
txBytes := data[uint32Size : size+uint32Size]
if compressed {
bb.B = bb.B[:cap(bb.B)] // expand length to capacity
bb.B, err = minlz.Decode(bb.B, txBytes)
if err != nil {
return nil, errors.Wrap(err, "failed to decompress transaction bytes")
}
txBytes = bb.B
}
tx, txErr := txUnmarshal(txBytes, scheme)
if txErr != nil {
return nil, errors.Wrap(txErr, "failed to deserialize transaction bytes")
}
res = append(res, tx)
data = data[uint32Size+size:]
}

Copilot AI Nov 28, 2025


The new compression/decompression logic in unmarshalTransactions lacks dedicated test coverage. While existing tests will exercise this code path, there are no tests that specifically verify: 1) compression bit encoding/decoding works correctly, 2) compressed and uncompressed transactions can be read back correctly, 3) backward compatibility with uncompressed data, and 4) the fallback to uncompressed when compression doesn't reduce size. Consider adding unit tests for encodeSize/decodeSize and integration tests that verify both compressed and uncompressed transaction storage.


@nickeskov nickeskov merged commit e911565 into master Dec 1, 2025
21 checks passed
@nickeskov nickeskov deleted the blockchain-transactions-file-compression branch December 1, 2025 09:14

Labels: enhancement (New feature or request)

3 participants