fix: O(1) write performance via append-only serialization & optional history pruning#30
Open
trbielec wants to merge 5 commits intosupabase:masterfrom
Open
fix: O(1) write performance via append-only serialization & optional history pruning#30trbielec wants to merge 5 commits intosupabase:masterfrom
trbielec wants to merge 5 commits intosupabase:masterfrom
Conversation
- Fix buffer overread in JSONB import (use AMbytes instead of AMstr) - Fix stack overflow vulnerability by adding check_stack_depth() - Fix data corruption on updates via explicit cache invalidation - Fix silent data loss when exporting Text objects in Lists
- Switch from -O0 to -O2 to improve performance - Add AUTOMERGE_STATIC=1 option for deployment on managed PostgreSQL platforms
Adds comprehensive tests covering: - Stale cache invalidation (data corruption fix) - JSONB buffer safety (buffer overread fix) - Text list export correctness (data loss fix) - Numeric precision handling - Commit message type compatibility
- Add comprehensive installation instructions with static/dynamic linking options - Document usage examples and API reference - Add known limitations section - Update .gitignore to exclude PostgreSQL build artifacts (*.o, *.bc, *.so, results/)
Optimization 1: Append-Only Serialization - Transforms update complexity from O(N) to O(1) - Stores base document state and calculates deltas on save - Achieves constant-time updates regardless of document size - Performance: 75,600x faster for 10MB documents Optimization 2: History Pruning - Adds squash_history() function to compress documents - Discards historical operations while preserving final state - Reduces storage by up to 99.98% for operation-heavy documents - Warning: Breaks sync history - use only for archival/inactive docs Bug Fixes: - Fixed missing SET_VARSIZE in append-only path (underflow bug) - Fixed AMgetChanges to handle empty change sets - Fixed base_heads safety via temporary document reload - Added Marks preservation in squash_history Testing: - All 4 regression tests passing (100%) - Performance tests included and verified - Memory safety confirmed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Overview
This PR addresses the fundamental scalability limitation of the extension: the linear$O(N)$ serialization cost during updates. By implementing an append-only storage strategy, write operations are now constant time regardless of document size. Additionally, it introduces a
squash_historyfunction to manage storage growth.These changes transform the extension from a proof-of-concept into a high-performance engine capable of handling large (>10MB) real-time documents.
Changes
1. Append-Only Serialization (O(1) Writes)
AMsave) on everyUPDATE, the extension now caches the original binary blob (base_data) and appends only the new binary changes (AMgetChanges) generated by the current operation.base_headsto prevent Automerge iterator invalidation bugs, ensuring memory safety during delta calculation.2. History Pruning (
squash_history)automerge.squash_history(doc)3. Critical Bug Fixes (Pre-emptive)
VARSIZEunderflow bug in the flattening logic that could corrupt memory during TOAST operations.Performance Benchmarks
Performance tests were conducted on a 10MB document to measure
UPDATElatency (PostgreSQL 18.1, -O2 build).Verification
automerge,errors,teams,production_fixes).test_optimization_performance.sqlandtest_history_pruning.sqlto verify performance gains and data integrity after squashing.Impact
This PR removes the primary bottleneck preventing
pg_crdtfrom being used in production for collaborative editing of non-trivial documents. It allows the database to handle high-frequency updates without CPU saturation.