Skip to content

Table files

Niclas Olofsson edited this page May 18, 2019 · 5 revisions

Original notes on table format

LevelDB stor data in stored table files in levels 0..max-level. Each level increase the table file size by a factor of 10. Table files use the extension .tbland they are numbered sequentially and increased automatically. Files are merged, and old files removed when compacting happens.

Table file format

A table file contain several sections of data in the following layout

[data block 1]
[data block 2]
...
[data block N]

[meta block 1]
...
[meta block K]
[metaindex block]

[index block]

[Footer]        (fixed size; starts at file_size - sizeof(Footer))

The format in formal specification (individual types described below)

table_file :=
    data = data_block*
    metaindex = metaindex
    index = block_index
    footer = footer;

BlockHandle

A block handle points to a position and lenght of a block of data in the file.

block_handle :=
    offset:   varint64
    size:     varint64

The varints are regular formatted google style varints from protobuf.

Data blocks

The sequence of key/value pairs in the file are stored in sorted order and partitioned into a sequence of data blocks. These blocks come one after another with start at the beginning of the file.

data_block:= 
    block_data = data*;
    restart_index = restart_index;

data :=
    shared = varint64;
    non_shared = varint64;
    data_size = varint64;
    user_data = byte[data_size];

restart_index :=
    restart_points = uint32[count];
    count = uint32;

Restart indexes and key compression

The restart index is used to compress key stored in the data block. Since data is stored in sorted order it allows for parts of the keys to share bytes with the previous key. To make scanning fast, restartpoints are stored at even intervalls, and at that point the full key is stored. This can then be used to jump between blocks of data for faster search.

Meta and metaindex blocks

After data blocks we find a set of meta blocks. The supported meta block types are described in the original documentation and I won't go into it here.

The metaindex block contains one entry for every other meta block where the key is the name of the meta block and the value is a BlockHandle pointing to that meta block. This block follow immediate afer all the meta blocks themselves.

Notice that the key in the index in this case is an UTF8 encoded string containing the name of the meta block.

Index block

The index block contains one entry per data block, where the key is a string >= last key in that data block and before the first key in the successive data block. The value is the BlockHandle for the data block. It ends with a restart index.

block_index := 
    indexes = index*;
    restart_index = restart_index;

restart_index :=
    restart_points = uint32[count];
    count = uint32;

index :=
    shared = varint64;
    non_shared = varint64;
    size = varint64;
    data_handle = block_handle;

This is essentially the same format as for a regular data block above.

Footer

At the very end of the file is a fixed length footer that contains the BlockHandle of the metaindex and index blocks as well as a magic number.

footer :=
    metaindex_handle: block_handle;     // Block handle for metaindex
    index_handle:     block_handle;     // Block handle for index
    padding:          zero*;            // zeroed bytes to make fixed length
                                        // (40==2*sizeof(BlockHandle max length)
    magic:            fixed64;          // == 0xdb4775248b80fb57

When reading a table file, this is what we start with. From this we get the index handle that we use to search for a key (using min, max search), and then read the actual data block.

Clone this wiki locally