Blockchain transactions file compression #1919
Conversation
The total size of this file can be reduced by approximately 14-15%.
The estimation was removed because its results are not accurate.
This reverts commit 3756efd.
This reverts commit ec4e774.
Pull request overview
This PR implements compression for serialized blockchain transactions to reduce file size by approximately 18%, while maintaining backward compatibility. The implementation uses the minlz compression library with a bit-flag in the transaction size field to indicate compression status.
Key Changes:
- Added transaction compression using minlz with automatic fallback to uncompressed storage when compression doesn't reduce size
- Modified writeTransaction to return the number of bytes written for accurate offset tracking
- Implemented backward-compatible size encoding using the highest bit as a compression flag
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/state/blockreadwriter.go | Core compression logic including encoding/decoding, compression/decompression functions, and updated read/write methods |
| pkg/state/snapshot_applier.go | Updated to handle new return value from writeTransaction |
| pkg/state/common_test.go | Updated test helper to handle new return value from writeTransaction |
| pkg/state/address_transactions_test.go | Updated test calls to handle new return value from writeTransaction |
| pkg/state/blockreadwriter_test.go | Refactored writeBlocks helper to use returned byte count instead of manual calculation, removed unused protobuf parameter |
| go.mod | Added dependencies: minlz v1.0.1 for compression, updated golang/snappy to v1.0.0 |
| go.sum | Updated checksums for new and updated dependencies |
| .golangci-strict.yml | Added replace-allow-list for syndtr/goleveldb dependency |
```go
	return 0, errors.New("ReadAt did not read 4 bytes")
	return 0, false, err
} else if n != uint32Size {
	return 0, false, errors.New("ReadAt did not read 4 bytes")
```
Copilot AI · Nov 28, 2025
The error message is outdated and no longer matches the code. The error states "ReadAt did not read 4 bytes", but the code now uses uint32Size constant instead of the hardcoded value. The error message should be updated to reflect this:
```go
return 0, false, errors.Errorf("ReadAt did not read %d bytes", uint32Size)
```

Alternatively, if uint32Size is always 4, you could keep "4", but it is better to be consistent with the code.
Suggested change:

```diff
-return 0, false, errors.New("ReadAt did not read 4 bytes")
+return 0, false, errors.Errorf("ReadAt did not read %d bytes", uint32Size)
```
Good suggestion, why not apply it?
Will do later
```go
// Append tx size at the beginning reusing txBytes slice.
txBytesTotal := binary.BigEndian.AppendUint32(txBytes[:0], encodedSize)
```
Copilot AI · Nov 28, 2025
[nitpick] Reusing the input slice by resetting its length (txBytes[:0]) may lead to unexpected behavior. While this appears safe in the current call chain (where txBytes comes from fresh marshal operations), it creates a fragile dependency on the calling context. If the caller retains a reference to txBytes, the underlying array will be modified.
Consider documenting this behavior or using a safer approach:
```go
// Option 1: Allocate a new slice (safer, slight performance cost).
txBytesTotal := binary.BigEndian.AppendUint32(nil, encodedSize)

// Option 2: Document the behavior.
// Note: This function may reuse the input txBytes slice's underlying array for optimization.
```

Suggested change:

```diff
-// Append tx size at the beginning reusing txBytes slice.
-txBytesTotal := binary.BigEndian.AppendUint32(txBytes[:0], encodedSize)
+// Append tx size at the beginning using a new slice to avoid modifying the input slice.
+txBytesTotal := binary.BigEndian.AppendUint32(nil, encodedSize)
```
```go
func writeTransactionUncompressed(txBytes []byte) ([]byte, error) {
	var (
		txBytesTotal     = txBytes
		txBytesTotalSize = len(txBytes) + uint32Size
	)
	if cap(txBytesTotal) < txBytesTotalSize {
		txBytesTotal = make([]byte, txBytesTotalSize)
	}
	txBytesTotal = txBytesTotal[:txBytesTotalSize]
	txSizeU32, err := safecast.Convert[uint32](len(txBytes))
	if err != nil {
		return nil, err
	}
	encodedSize, err := encodeSize(txSizeU32, false)
	if err != nil {
		return nil, err
	}
	copy(txBytesTotal[uint32Size:], txBytes)                           // copy original data
	binary.BigEndian.PutUint32(txBytesTotal[:uint32Size], encodedSize) // put size at the beginning
	return txBytesTotal, nil
}
```
Copilot AI · Nov 28, 2025
Potential data corruption issue when reusing the input slice. If cap(txBytesTotal) >= txBytesTotalSize, the function attempts to reuse the input txBytes slice, then copies it to itself at line 383 (copy(txBytesTotal[uint32Size:], txBytes)). Since txBytesTotal and txBytes are the same slice in this case, this creates overlapping source and destination regions, which can lead to data corruption.
Consider always allocating a new slice or ensuring the copy is safe:
```go
txBytesTotal := make([]byte, txBytesTotalSize)
copy(txBytesTotal[uint32Size:], txBytes)
binary.BigEndian.PutUint32(txBytesTotal[:uint32Size], encodedSize)
return txBytesTotal, nil
```
pkg/state/blockreadwriter.go (outdated)
```go
const maxUint32BitMask = math.MaxUint32 - math.MaxInt32

func encodeSize(size uint32, compressed bool) (uint32, error) {
	if size > math.MaxInt32 {
		return 0, errors.Errorf("size overflow: %d > %d", size, math.MaxInt32)
	}
	if compressed {
		return size | maxUint32BitMask, nil
	}
	return size, nil
}

func decodeSize(encodedSize uint32) (uint32, bool) {
	if encodedSize&maxUint32BitMask != 0 {
		size := encodedSize &^ maxUint32BitMask // clear highest bit
```
Copilot AI · Nov 28, 2025
The bit mask derivation is confusing. maxUint32BitMask = math.MaxUint32 - math.MaxInt32 equals 2147483648 (0x80000000), the highest bit of a uint32, yet the name suggests a mask for the "max uint32" range.
The logic works correctly for setting and checking the highest bit to indicate compression, but the constant's name and calculation method are misleading. Consider:
```go
const compressionBitMask = 1 << 31 // 0x80000000 - highest bit indicates compression
```

This makes the intent clearer: the highest bit is a flag for compression status.
Suggested change:

```diff
-const maxUint32BitMask = math.MaxUint32 - math.MaxInt32
+const compressionBitMask = 1 << 31 // 0x80000000 - highest bit indicates compression

 func encodeSize(size uint32, compressed bool) (uint32, error) {
 	if size > math.MaxInt32 {
 		return 0, errors.Errorf("size overflow: %d > %d", size, math.MaxInt32)
 	}
 	if compressed {
-		return size | maxUint32BitMask, nil
+		return size | compressionBitMask, nil
 	}
 	return size, nil
 }

 func decodeSize(encodedSize uint32) (uint32, bool) {
-	if encodedSize&maxUint32BitMask != 0 {
-		size := encodedSize &^ maxUint32BitMask // clear highest bit
+	if encodedSize&compressionBitMask != 0 {
+		size := encodedSize &^ compressionBitMask // clear highest bit
```
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.
```go
bb := bytebufferpool.Get()
defer bytebufferpool.Put(bb)
var (
	err error
	res = make(proto.Transactions, 0, count)
)
for range count {
	if len(data) < uint32Size {
		return nil, errors.New("invalid tx size: insufficient bytes for size prefix")
	}
	size, compressed := decodeSize(binary.BigEndian.Uint32(data[0:uint32Size]))
	if int(size)+uint32Size > len(data) {
		return nil, errors.New("invalid tx size: exceeds bytes slice bounds")
	}
	txBytes := data[uint32Size : size+uint32Size]
	if compressed {
		bb.B = bb.B[:cap(bb.B)] // expand length to capacity
		bb.B, err = minlz.Decode(bb.B, txBytes)
		if err != nil {
			return nil, errors.Wrap(err, "failed to decompress transaction bytes")
		}
		txBytes = bb.B
	}
	tx, txErr := txUnmarshal(txBytes, scheme)
	if txErr != nil {
		return nil, errors.Wrap(txErr, "failed to deserialize transaction bytes")
	}
	res = append(res, tx)
	data = data[uint32Size+size:]
}
```
Copilot AI · Nov 28, 2025
The new compression/decompression logic in unmarshalTransactions lacks dedicated test coverage. While existing tests will exercise this code path, there are no tests that specifically verify: 1) compression bit encoding/decoding works correctly, 2) compressed and uncompressed transactions can be read back correctly, 3) backward compatibility with uncompressed data, and 4) the fallback to uncompressed when compression doesn't reduce size. Consider adding unit tests for encodeSize/decodeSize and integration tests that verify both compressed and uncompressed transaction storage.
Reduces the size of the file with serialized transactions by approximately 18%. The changes are backward compatible.