@satyakigh satyakigh commented Dec 23, 2025

1. MDB_BAD_RSLOT (Fork Detection Issues)

Error Message:

MDB_BAD_RSLOT: Invalid reuse of reader locktable slot: 
The reader lock pid X, txn Y, doesn't match env pid Z

Cause:
LMDB uses memory-mapped files with process-specific reader lock slots. When a process forks (common in VS Code's extension host), the child inherits the parent's LMDB environment, but the reader lock table still references the parent's PID. Any read operation in the child then fails with this error.


2. MDB_CORRUPTED / MDB_PAGE_NOTFOUND

Error Messages:

MDB_CORRUPTED: Located page was wrong type
MDB_PAGE_NOTFOUND: Requested page not found

Cause:

  • Stale memory mappings after fork
  • Incomplete writes from crashes
  • Concurrent access issues across processes
  • File system corruption

These errors indicate that the B+ tree structure is inconsistent: either a page pointer references a page of the wrong type, or a referenced page doesn't exist.

3. Closed Database Errors

Error Message:

Can not renew a transaction from a closed database

Cause:
Operations attempted on an LMDB environment that was closed (either explicitly or due to error recovery). The renewReadTxn function tries to reuse a transaction from an invalidated environment.


4. MDB_BAD_TXN

Error Message:

MDB_BAD_TXN: Transaction must abort, has a child, or is invalid

Cause:
Transaction state corruption, often from:

  • Using transactions across fork boundaries
  • Nested transaction issues
  • Cursor operations on aborted transactions

Implemented Fixes

Fix 1: Proactive Fork Detection

  • Tracks the PID under which the environment was opened (openPid)
  • Before every database operation, checks whether the current PID differs from openPid
  • If a fork is detected, reopens the environment before running the operation
  • Updates all store handles with fresh database references

Addresses: MDB_BAD_RSLOT errors (2,856 occurrences)
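
The PID check described in Fix 1 can be sketched roughly as follows. This is a minimal illustration, not the PR's code: only openPid comes from the description, while ForkGuard, getPid, reopen, and ensureCurrentProcess are hypothetical names. The PID source is injected so the sketch does not depend on any runtime global; in Node it would presumably be () => process.pid.

```typescript
// Hypothetical sketch of proactive fork detection.
// Only `openPid` is named in the PR description; everything else is illustrative.
class ForkGuard {
  private openPid: number;
  private readonly getPid: () => number;
  private readonly reopen: () => void;

  constructor(getPid: () => number, reopen: () => void) {
    this.getPid = getPid;
    this.reopen = reopen;
    // Remember which process opened the environment.
    this.openPid = getPid();
  }

  // Returns true when a PID change was detected and the environment was reopened.
  ensureCurrentProcess(): boolean {
    const pid = this.getPid();
    if (pid === this.openPid) {
      return false;
    }
    this.reopen(); // fresh environment in the child process
    this.openPid = pid; // update PID tracking so the next call is a no-op
    return true;
  }
}
```

Calling ensureCurrentProcess() before each operation keeps the check cheap in the common case: a single integer comparison when no fork has happened.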


Fix 2: Reactive Error Recovery

  • Catches errors from LMDB operations
  • Classifies errors into fork-related vs corruption-related
  • Triggers appropriate recovery strategy
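
One way to picture the classification step is a function that inspects the error message. The sketch below is an assumption-laden illustration: classifyLmdbError and RecoveryKind are hypothetical names, and the mapping of error codes to categories is inferred from the causes listed above rather than taken from the PR's code.

```typescript
// Hypothetical sketch: classify an error by the LMDB code in its message.
// The mapping is an assumption based on the causes listed in this description.
type RecoveryKind = "fork" | "corruption" | "other";

function classifyLmdbError(err: unknown): RecoveryKind {
  const message = err instanceof Error ? err.message : String(err);
  if (message.includes("MDB_BAD_RSLOT")) {
    return "fork";
  }
  if (message.includes("MDB_CORRUPTED") || message.includes("MDB_PAGE_NOTFOUND")) {
    return "corruption";
  }
  // Anything else is left to the caller's default handling.
  return "other";
}
```

A caller could then switch on the returned kind to pick a recovery strategy.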

Fix 3: Environment Recreation

  • Creates a fresh LMDB environment with new memory mappings
  • Updates PID tracking
  • Recreates all database handles
  • Existing store references remain valid (they get updated handles)

Fix 4: Store Handle Updates

  • Allows in-place update of the underlying database handle
  • Callers holding store references don't need new references
  • Enables seamless recovery without breaking existing code
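
Fixes 3 and 4 together amount to swapping the handle inside each store object rather than replacing the object. A minimal sketch, assuming a simplified Db interface: Db, Store, updateHandle, and recreateEnvironment are hypothetical names, not the PR's classes or the LMDB binding's API.

```typescript
// Hypothetical sketch: stores keep their identity while their handle is replaced.
interface Db {
  get(key: string): string | undefined;
}

class Store {
  private db: Db;

  constructor(db: Db) {
    this.db = db;
  }

  // Swap the underlying handle in place; callers keep the same Store object.
  updateHandle(db: Db): void {
    this.db = db;
  }

  get(key: string): string | undefined {
    return this.db.get(key);
  }
}

// After reopening the environment, push a fresh handle into every existing store.
function recreateEnvironment(
  stores: Map<string, Store>,
  openDb: (name: string) => Db,
): void {
  stores.forEach((store, name) => {
    store.updateHandle(openDb(name));
  });
}
```

Because the Store object itself never changes, code that captured a reference before recovery keeps working afterwards.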

Fix 5: Operation Wrapping

  • Every operation calls validateDatabase() first (fork detection)
  • If the operation fails, onError() triggers recovery
  • Telemetry wrapping provides observability
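
The wrapping order can be sketched as below. validateDatabase and onError are the names used in this description; wrapOperation and its signature are hypothetical. Whether the real implementation rethrows or retries after recovery is not stated here, so the sketch assumes it rethrows.

```typescript
// Hypothetical sketch of the wrapper: validate first, run, recover on failure.
// Assumption: the original error is rethrown after recovery is triggered.
function wrapOperation<T>(
  validateDatabase: () => void,
  onError: (err: unknown) => void,
  operation: () => T,
): T {
  validateDatabase(); // fork detection before touching the database
  try {
    return operation();
  } catch (err) {
    onError(err); // trigger recovery (for example, environment recreation)
    throw err;
  }
}
```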

@satyakigh satyakigh requested a review from a team as a code owner December 23, 2025 20:11
atennak1 previously approved these changes Dec 24, 2025

@atennak1 atennak1 left a comment

LGTM pending unit test fix
