Merge bitcoin/bitcoin#30125: test: improve BDB parser (handle internal/overflow pages, support all page sizes)

d45eb39 test: compare BDB dumps of test framework parser and wallet tool (Sebastian Falbesoner)
01ddd9f test: complete BDB parser (handle internal/overflow pages, support all page sizes) (Sebastian Falbesoner)

Pull request description:

  This PR adds missing features to our test framework's BDB parser, with the goal of hopefully being able to read all legacy wallets created with current and past versions of Bitcoin Core. This could be useful both for making review of bitcoin/bitcoin#26606 easier and for possibly improving our functional tests for the wallet BDB-ro parser by additionally validating it against an alternative implementation. The second commit introduces a test that creates a legacy wallet with huge label strings (in order to create overflow pages, i.e. pages needed for key/value data that is larger than the page size) and compares the dump outputs of the wallet tool and the extended test framework BDB parser.
  It can be exercised via `$ ./test/functional/tool_wallet.py --legacy`. BDB support has to be compiled in (obviously).
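
  As a quick illustration of what the extended parser provides, the key-value pairs of a legacy wallet file can also be dumped directly from the test framework — a minimal sketch (the wallet path below is a placeholder):
  ```python
  from test_framework.bdb import dump_bdb_kv

  # dump all key-value pairs of a legacy wallet's BDB file
  # (the path is hypothetical)
  kv = dump_bdb_kv('/tmp/bdb_ro_test/wallet.dat')
  for key, value in kv.items():
      print(key.hex(), value.hex())
  ```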

  For some manual tests regarding different page sizes, the following patch can be used:
  ```diff
  diff --git a/src/wallet/bdb.cpp b/src/wallet/bdb.cpp
  index 38cca32f80..1bf39323d3 100644
  --- a/src/wallet/bdb.cpp
  +++ b/src/wallet/bdb.cpp
  @@ -395,6 +395,7 @@ void BerkeleyDatabase::Open()
                               DB_BTREE,                                 // Database type
                               nFlags,                                   // Flags
                               0);
  +            pdb_temp->set_pagesize(1<<9); /* valid BDB pagesizes are from 1<<9 (=512) to 1<<16 (=65536) */

               if (ret != 0) {
                   throw std::runtime_error(strprintf("BerkeleyDatabase: Error %d, can't open database %s", ret, strFile));
  ```
  I verified that the newly introduced test passes with all valid page sizes between 512 and 65536.
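
  (For reference, the valid sizes are exactly the powers of two from 1<<9 to 1<<16 — the same set the parser's sanity check accepts; a one-liner to enumerate them:)
  ```python
  VALID_PAGESIZES = [1 << e for e in range(9, 17)]
  # -> [512, 1024, 2048, 4096, 8192, 16384, 32768, 65536]
  ```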

ACKs for top commit:
  achow101:
    ACK d45eb39
  furszy:
    utACK d45eb39
  brunoerg:
    code review ACK d45eb39

Tree-SHA512: 9f8ac80452545f4fcd24a17ea6f9cf91b487cfb1fcb99a0ba9153fa4e3b239daa126454e26109fdcb72eb1c76a4ee3b46fd6af21dc318ab67bd12b3ebd26cfdd
achow101 committed Jan 29, 2025
2 parents 1d6c6e9 + d45eb39 commit 1e0c5bd
Showing 2 changed files with 144 additions and 39 deletions.
137 changes: 98 additions & 39 deletions test/functional/test_framework/bdb.py
@@ -6,44 +6,55 @@
Utilities for working directly with the wallet's BDB database file
This is specific to the configuration of BDB used in this project:
- pagesize: 4096 bytes
- Outer database contains single subdatabase named 'main'
- btree
- btree leaf pages
- btree internal, leaf and overflow pages
Each key-value pair is two entries in a btree leaf. The first is the key, the one that follows
Each key-value pair is two entries in a btree leaf, which optionally refers to overflow pages
if the data doesn't fit into a single page. The first entry is the key, the one that follows
is the value. And so on. Note that the entry data is itself not in the correct order. Instead
entry offsets are stored in the correct order and those offsets are needed to then retrieve
the data itself.
the data itself. Note that this implementation currently only supports reading databases that
are in the same endianness as the host.
Page format can be found in BDB source code dbinc/db_page.h
This only implements the deserialization of btree metadata pages and normal btree pages. Overflow
pages are not implemented but may be needed in the future if dealing with wallets with large
transactions.
`db_dump -da wallet.dat` is useful to see the data in a wallet.dat BDB file
"""

import struct

# Important constants
PAGESIZE = 4096
PAGE_HEADER_SIZE = 26
OUTER_META_PAGE = 0
INNER_META_PAGE = 2

# Page type values
BTREE_INTERNAL = 3
BTREE_LEAF = 5
OVERFLOW_DATA = 7
BTREE_META = 9

# Record type values
RECORD_KEYDATA = 1
RECORD_OVERFLOW_DATA = 3

# Some magic numbers for sanity checking
BTREE_MAGIC = 0x053162
DB_VERSION = 9

# Deserializes a leaf page into a dict.
# Btree internal pages have the same header, for those, return None.
# For the btree leaf pages, deserialize them and put all the data into a dict
def dump_leaf_page(data):
SUBDATABASE_NAME = b'main'

# Deserializes an internal, leaf or overflow page into a dict.
# In addition to the common page header fields, the result contains an 'entries'
# array of dicts with the following fields, depending on the page type:
# internal page [BTREE_INTERNAL]:
# - 'page_num': referenced page number (used to find further pages to process)
# leaf page [BTREE_LEAF]:
# - 'record_type': record type, must be RECORD_KEYDATA or RECORD_OVERFLOW_DATA
# - 'data': binary data (key or value payload), if record type is RECORD_KEYDATA
# - 'page_num': referenced overflow page number, if record type is RECORD_OVERFLOW_DATA
# overflow page [OVERFLOW_DATA]:
# - 'data': binary data (part of key or value payload)
def dump_page(data):
page_info = {}
page_header = data[0:26]
_, pgno, prev_pgno, next_pgno, entries, hf_offset, level, pg_type = struct.unpack('QIIIHHBB', page_header)
@@ -56,20 +67,35 @@ def dump_leaf_page(data):
page_info['entry_offsets'] = struct.unpack('{}H'.format(entries), data[26:26 + entries * 2])
page_info['entries'] = []

if pg_type == BTREE_INTERNAL:
# Skip internal pages. These are the internal nodes of the btree and don't contain anything relevant to us
return None
assert pg_type in (BTREE_INTERNAL, BTREE_LEAF, OVERFLOW_DATA)

assert pg_type == BTREE_LEAF, 'A non-btree leaf page has been encountered while dumping leaves'
if pg_type == OVERFLOW_DATA:
assert entries == 1
page_info['entries'].append({'data': data[26:26 + hf_offset]})
return page_info

for i in range(0, entries):
entry = {}
offset = page_info['entry_offsets'][i]
entry = {'offset': offset}
page_data_header = data[offset:offset + 3]
e_len, pg_type = struct.unpack('HB', page_data_header)
entry['len'] = e_len
entry['pg_type'] = pg_type
entry['data'] = data[offset + 3:offset + 3 + e_len]
record_header = data[offset:offset + 3]
offset += 3
e_len, record_type = struct.unpack('HB', record_header)

if pg_type == BTREE_INTERNAL:
assert record_type == RECORD_KEYDATA
internal_record_data = data[offset:offset + 9]
_, page_num, _ = struct.unpack('=BII', internal_record_data)
entry['page_num'] = page_num
elif pg_type == BTREE_LEAF:
assert record_type in (RECORD_KEYDATA, RECORD_OVERFLOW_DATA)
entry['record_type'] = record_type
if record_type == RECORD_KEYDATA:
entry['data'] = data[offset:offset + e_len]
elif record_type == RECORD_OVERFLOW_DATA:
overflow_record_data = data[offset:offset + 9]
_, page_num, _ = struct.unpack('=BII', overflow_record_data)
entry['page_num'] = page_num

page_info['entries'].append(entry)

return page_info
@@ -115,37 +141,70 @@ def dump_meta_page(page):
return metadata

# Given the dict from dump_leaf_page, get the key-value pairs and put them into a dict
def extract_kv_pairs(page_data):
def extract_kv_pairs(page_data, pages):
out = {}
last_key = None
for i, entry in enumerate(page_data['entries']):
data = b''
if entry['record_type'] == RECORD_KEYDATA:
data = entry['data']
elif entry['record_type'] == RECORD_OVERFLOW_DATA:
next_page = entry['page_num']
while next_page != 0:
opage = pages[next_page]
opage_info = dump_page(opage)
data += opage_info['entries'][0]['data']
next_page = opage_info['next_pgno']

# By virtue of these all being pairs, even number entries are keys, and odd are values
if i % 2 == 0:
out[entry['data']] = b''
last_key = entry['data']
last_key = data
else:
out[last_key] = entry['data']
out[last_key] = data
return out

# Extract the key-value pairs of the BDB file given in filename
def dump_bdb_kv(filename):
# Read in the BDB file and start deserializing it
pages = []
with open(filename, 'rb') as f:
data = f.read(PAGESIZE)
while len(data) > 0:
pages.append(data)
data = f.read(PAGESIZE)
# Determine pagesize first
data = f.read(PAGE_HEADER_SIZE)
pagesize = struct.unpack('I', data[20:24])[0]
assert pagesize in (512, 1024, 2048, 4096, 8192, 16384, 32768, 65536)

# Sanity check the meta pages
dump_meta_page(pages[OUTER_META_PAGE])
dump_meta_page(pages[INNER_META_PAGE])
# Read rest of first page
data += f.read(pagesize - PAGE_HEADER_SIZE)
assert len(data) == pagesize

# Fetch the kv pairs from the leaf pages
# Read all remaining pages
while len(data) > 0:
pages.append(data)
data = f.read(pagesize)

# Sanity check the meta pages, read root page
outer_meta_info = dump_meta_page(pages[OUTER_META_PAGE])
root_page_info = dump_page(pages[outer_meta_info['root']])
assert root_page_info['pg_type'] == BTREE_LEAF
assert len(root_page_info['entries']) == 2
assert root_page_info['entries'][0]['data'] == SUBDATABASE_NAME
assert len(root_page_info['entries'][1]['data']) == 4
inner_meta_page = int.from_bytes(root_page_info['entries'][1]['data'], 'big')
inner_meta_info = dump_meta_page(pages[inner_meta_page])

# Fetch the kv pairs from the pages
kv = {}
for i in range(3, len(pages)):
info = dump_leaf_page(pages[i])
if info is not None:
info_kv = extract_kv_pairs(info)
pages_to_process = [inner_meta_info['root']]
while len(pages_to_process) > 0:
curr_page_no = pages_to_process.pop()
assert curr_page_no <= outer_meta_info['last_pgno']
info = dump_page(pages[curr_page_no])
assert info['pg_type'] in (BTREE_INTERNAL, BTREE_LEAF)
if info['pg_type'] == BTREE_INTERNAL:
for entry in info['entries']:
pages_to_process.append(entry['page_num'])
elif info['pg_type'] == BTREE_LEAF:
info_kv = extract_kv_pairs(info, pages)
kv = {**kv, **info_kv}
return kv
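
To make the leaf-record layout decoded above more concrete, here is a small worked example with made-up bytes (a sketch, not data from a real wallet): each entry referenced by an entry offset starts with a 3-byte header of little-endian length and record type, and for an overflow reference the following 9 bytes carry the first overflow page number.

```python
import struct

# hypothetical 12-byte leaf entry referencing overflow page 7
raw = bytes([
    0x00, 0x00,              # entry length (not meaningful for overflow records)
    0x03,                    # record type: RECORD_OVERFLOW_DATA
    0x00,                    # unused byte
    0x07, 0x00, 0x00, 0x00,  # first overflow page number (7, little-endian)
    0x00, 0x00, 0x00, 0x00,  # total data length (ignored by this parser)
])
e_len, record_type = struct.unpack('HB', raw[0:3])
_, page_num, _ = struct.unpack('=BII', raw[3:12])
assert record_type == 3 and page_num == 7
```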
46 changes: 46 additions & 0 deletions test/functional/tool_wallet.py
@@ -6,18 +6,23 @@

import os
import platform
import random
import stat
import string
import subprocess
import textwrap

from collections import OrderedDict

from test_framework.bdb import dump_bdb_kv
from test_framework.messages import ser_string
from test_framework.test_framework import BitcoinTestFramework
from test_framework.util import (
assert_equal,
assert_greater_than,
sha256sum_file,
)
from test_framework.wallet import getnewdestination


class ToolWalletTest(BitcoinTestFramework):
@@ -545,6 +550,44 @@ def test_dump_unclean_lsns(self):
self.stop_node(0)
self.assert_tool_output("The dumpfile may contain private keys. To ensure the safety of your Bitcoin, do not share the dumpfile.\n", "-wallet=unclean_lsn", f"-dumpfile={wallet_dump}", "dump")

def test_compare_legacy_dump_with_framework_bdb_parser(self):
self.log.info("Verify that legacy wallet database dump matches the one from the test framework's BDB parser")
wallet_name = "bdb_ro_test"
self.start_node(0)
# add some really large labels (above twice the largest valid page size) to create BDB overflow pages
self.nodes[0].createwallet(wallet_name)
wallet_rpc = self.nodes[0].get_wallet_rpc(wallet_name)
generated_labels = {}
for i in range(10):
address = getnewdestination()[2]
large_label = ''.join([random.choice(string.ascii_letters) for _ in range(150000)])
wallet_rpc.setlabel(address, large_label)
generated_labels[address] = large_label
# fill the keypool to create BDB internal pages
wallet_rpc.keypoolrefill(1000)
self.stop_node(0)

wallet_dumpfile = self.nodes[0].datadir_path / "bdb_ro_test.dump"
self.assert_tool_output("The dumpfile may contain private keys. To ensure the safety of your Bitcoin, do not share the dumpfile.\n", "-wallet={}".format(wallet_name), "-dumpfile={}".format(wallet_dumpfile), "dump")

expected_dump = self.read_dump(wallet_dumpfile)
# remove extra entries from wallet tool dump that are not actual key/value pairs from the database
del expected_dump['BITCOIN_CORE_WALLET_DUMP']
del expected_dump['format']
del expected_dump['checksum']
bdb_ro_parser_dump_raw = dump_bdb_kv(self.nodes[0].wallets_path / wallet_name / "wallet.dat")
bdb_ro_parser_dump = OrderedDict()
assert any([len(bytes.fromhex(value)) >= 150000 for value in expected_dump.values()])
for key, value in sorted(bdb_ro_parser_dump_raw.items()):
bdb_ro_parser_dump[key.hex()] = value.hex()
assert_equal(bdb_ro_parser_dump, expected_dump)

# check that all labels were created with the correct address
for address, label in generated_labels.items():
key_bytes = b'\x04name' + ser_string(address.encode())
assert key_bytes in bdb_ro_parser_dump_raw
assert_equal(bdb_ro_parser_dump_raw[key_bytes], ser_string(label.encode()))

def run_test(self):
self.wallet_path = self.nodes[0].wallets_path / self.default_wallet_name / self.wallet_data_filename
self.test_invalid_tool_commands_and_args()
@@ -561,6 +604,9 @@ def run_test(self):
self.test_dump_createfromdump()
self.test_chainless_conflicts()
self.test_dump_very_large_records()
if not self.options.descriptors and self.is_bdb_compiled() and not self.options.swap_bdb_endian:
self.test_compare_legacy_dump_with_framework_bdb_parser()


if __name__ == '__main__':
ToolWalletTest(__file__).main()
