Minor doc improvements.
akrylysov committed Apr 4, 2020
1 parent eeec217 commit 359a3e1
Showing 1 changed file with 15 additions and 14 deletions: `docs/design.md` (29 changes).
```diff
@@ -18,7 +18,7 @@ It aims to provide fast point lookups by indexing keys in an on-disk hash table.
 
 Two key components of Pogeb are a write-ahead log (WAL) and a hash table index.
 The WAL stores key-value pairs on disk in append-only files.
-The on-disk hash table allows performing constant time lookups from keys to key-value pairs in the WAL.
+The on-disk hash table allows constant time lookups from keys to key-value pairs in the WAL.
 
 ## Write-ahead log
 
```

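A toy, in-memory Go analogue of the two components may help: a byte slice stands in for the append-only WAL files and a Go map stands in for the on-disk hash table index. Everything here is hypothetical (the `toyDB` type, its methods and its 4-byte record header are not Pogreb's API or WAL record format); it only illustrates that writes always append to the log while the index points each key at its latest record.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// toyDB is a hypothetical, in-memory analogue of the design above:
// wal plays the role of the append-only write-ahead log and
// index plays the role of the hash table index.
type toyDB struct {
	wal   []byte            // append-only log of encoded key-value pairs
	index map[string]uint32 // key -> offset of its latest record in wal
}

func newToyDB() *toyDB {
	return &toyDB{index: make(map[string]uint32)}
}

// Put appends a record (hypothetical layout: 2-byte key size, 2-byte value
// size, key, value) and points the index at it. Old records are never rewritten.
func (db *toyDB) Put(key, value []byte) {
	offset := uint32(len(db.wal))
	var hdr [4]byte
	binary.LittleEndian.PutUint16(hdr[0:2], uint16(len(key)))
	binary.LittleEndian.PutUint16(hdr[2:4], uint16(len(value)))
	db.wal = append(db.wal, hdr[:]...)
	db.wal = append(db.wal, key...)
	db.wal = append(db.wal, value...)
	db.index[string(key)] = offset
}

// Get uses the index to jump straight to the key's latest record in the log.
func (db *toyDB) Get(key []byte) ([]byte, bool) {
	offset, ok := db.index[string(key)]
	if !ok {
		return nil, false
	}
	keySize := uint32(binary.LittleEndian.Uint16(db.wal[offset : offset+2]))
	valueSize := uint32(binary.LittleEndian.Uint16(db.wal[offset+2 : offset+4]))
	start := offset + 4 + keySize
	return db.wal[start : start+valueSize], true
}

func main() {
	db := newToyDB()
	db.Put([]byte("a"), []byte("1"))
	db.Put([]byte("a"), []byte("2")) // the first record stays in the log
	v, _ := db.Get([]byte("a"))
	fmt.Println(string(v)) // prints "2"
}
```

In this sketch a lookup is one map probe followed by one slice read, no matter how many records the log holds.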
```diff
@@ -54,9 +54,9 @@ Record
 
 The Record Type field is either `Put` (0) or `Delete` (1).
 
-## Hash index
+## Hash table index
 
-Pogreb uses two files to store the hash index on disk - "main" and "overflow" index files.
+Pogreb uses two files to store the hash table on disk - "main" and "overflow" index files.
 
 Each index file holds an array of buckets.
 
```

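The Record Type values can be expressed as typed Go constants, with validation of the byte read back from a record. The identifiers (`recordType`, `recordTypePut`, `recordTypeDelete`, `parseRecordType`) are hypothetical names chosen for this sketch; only the numeric values 0 and 1 come from the design doc.

```go
package main

import "fmt"

// recordType models the Record Type field: Put is 0 and Delete is 1.
// The Go identifiers are hypothetical; only the values come from the doc.
type recordType uint8

const (
	recordTypePut    recordType = 0
	recordTypeDelete recordType = 1
)

func (t recordType) String() string {
	switch t {
	case recordTypePut:
		return "Put"
	case recordTypeDelete:
		return "Delete"
	default:
		return fmt.Sprintf("recordType(%d)", uint8(t))
	}
}

// parseRecordType rejects any byte that is not a known record type.
func parseRecordType(b byte) (recordType, error) {
	t := recordType(b)
	if t != recordTypePut && t != recordTypeDelete {
		return 0, fmt.Errorf("invalid record type %d", b)
	}
	return t, nil
}

func main() {
	t, _ := parseRecordType(1)
	fmt.Println(t) // prints "Delete"
}
```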
```diff
@@ -69,7 +69,7 @@ Index
 
 ### Bucket
 
-Each bucket is an array of slots followed by an optional file pointer to the overflow bucket (stored in the "overflow"
+A bucket is an array of slots followed by an optional file pointer to the overflow bucket (stored in the "overflow"
 index).
 The number of slots in a bucket is 31 - that is the maximum number of slots that is possible to fit in 512
 bytes.
```

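The figure of 31 slots can be checked with a little arithmetic, but only under sizes this excerpt does not state: the sketch assumes 16-byte slots and an 8-byte overflow file pointer. With those assumed sizes, 31 slots take 31 × 16 + 8 = 504 bytes, while a 32nd slot would need 520 bytes, more than 512.

```go
package main

import "fmt"

// Only bucketSize comes from the design doc; slotSize and pointerSize are
// assumptions chosen for illustration.
const (
	bucketSize  = 512 // bytes per bucket
	slotSize    = 16  // assumed bytes per slot
	pointerSize = 8   // assumed bytes for the overflow bucket file pointer
)

// maxSlots returns how many whole slots fit in a bucket after reserving
// room for the overflow pointer.
func maxSlots(bucket, slot, pointer int) int {
	return (bucket - pointer) / slot
}

func main() {
	n := maxSlots(bucketSize, slotSize, pointerSize)
	fmt.Println(n, n*slotSize+pointerSize) // prints "31 504"
}
```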
```diff
@@ -104,8 +104,8 @@ For example, a hash table with *L=0* contains between 0 and 1 buckets; *L=3* con
 
 *S* is the index of the "split" bucket (initially *S=0*).
 
-Collisions are resolved using the bucket chaining method.
-Overflow buckets are stored in an "overflow index" file and form a linked list.
+Collisions are resolved using the bucket chaining technique.
+The "overflow" index file stores overflow buckets that form a linked list.
 
 ### Lookup
 
```

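Bucket chaining can be sketched in memory, assuming two Go slices stand in for the "main" and "overflow" index files and slice indices stand in for file pointers. The `slot` fields (including the `used` flag) and the `insert` and `chainLength` helpers are hypothetical; the sketch only shows a full bucket linking to a newly created overflow bucket, which is how the doc describes handling a bucket with all of its slots occupied.

```go
package main

import "fmt"

const slotsPerBucket = 31

// slot is a hypothetical in-memory slot: a key hash, a WAL position and a
// flag marking whether the slot is occupied.
type slot struct {
	hash   uint32
	offset uint32
	used   bool
}

// bucket holds its slots plus a "file pointer" to the next overflow bucket,
// modelled as an index into index.overflow (-1 means no overflow bucket).
type bucket struct {
	slots [slotsPerBucket]slot
	next  int
}

// index stands in for the "main" and "overflow" index files.
type index struct {
	main     []bucket
	overflow []bucket
}

func newIndex(numBuckets int) *index {
	idx := &index{main: make([]bucket, numBuckets)}
	for i := range idx.main {
		idx.main[i].next = -1
	}
	return idx
}

// insert stores a slot in the first free position of the chain that starts
// at main bucket pos, linking a new overflow bucket when the chain is full.
func (idx *index) insert(pos int, hash, offset uint32) {
	b := &idx.main[pos]
	for {
		for i := range b.slots {
			if !b.slots[i].used {
				b.slots[i] = slot{hash: hash, offset: offset, used: true}
				return
			}
		}
		if b.next == -1 {
			// Record the link before append, which may reallocate idx.overflow.
			b.next = len(idx.overflow)
			idx.overflow = append(idx.overflow, bucket{next: -1})
		}
		b = &idx.overflow[b.next]
	}
}

// chainLength counts the buckets in the chain starting at main bucket pos.
func (idx *index) chainLength(pos int) int {
	n := 1
	for next := idx.main[pos].next; next != -1; next = idx.overflow[next].next {
		n++
	}
	return n
}

func main() {
	idx := newIndex(1)
	for i := uint32(0); i < 40; i++ {
		idx.insert(0, i, i*10) // every item collides in bucket 0
	}
	fmt.Println(idx.chainLength(0), len(idx.overflow)) // prints "2 1"
}
```

The link is written before `append` because `append` may move the overflow slice to a new backing array, and the updated `next` value must be part of what gets copied.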
```diff
@@ -131,9 +131,10 @@ h(key) -> | Bucket 1 | -> | Slot 0 | Slot 1 | ... | Slot N |
 
 To get the position of the bucket:
 
-- Hash the key (Pogreb uses the 32-bit version of MurmurHash3).
-- Use 2<sup>L</sup> bits of the hash to get the position of the bucket - `hash % math.Pow(2, L)`.
-- If the calculated position comes before the split bucket *S*, the position is `hash % math.Pow(2, L+1)`.
+1. Hash the key (Pogreb uses the 32-bit version of MurmurHash3).
+2. Use 2<sup>L</sup> bits of the hash to get the position of the bucket - `hash % math.Pow(2, L)`.
+3. Set the position to `hash % math.Pow(2, L+1)` if the previously calculated position comes before the
+split bucket *S*.
 
 The lookup function reads a bucket at the given position from the index file and performs a linear search to find a slot
 with the required hash.
```

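The position calculation and the linear search can be sketched in Go, starting from an already-computed 32-bit hash (MurmurHash3 itself is left out). `hash % math.Pow(2, L)` is written here as a bit mask: `math.Pow` returns a `float64`, and for unsigned integers `hash mod 2^L` equals `hash & (2^L - 1)`, which keeps the low *L* bits of the hash. The names `bucketIndex`, `slot` and `findSlot` are hypothetical, and the search runs over an in-memory slice rather than a bucket read from the index file.

```go
package main

import "fmt"

// bucketIndex returns the bucket position for a 32-bit key hash in a table
// with level L and split bucket S. hash % 2^L is written as a bit mask,
// which keeps the low L bits of the hash.
func bucketIndex(hash uint32, level uint8, split uint32) uint32 {
	pos := hash & ((1 << level) - 1)
	if pos < split {
		// Buckets before S were already split in this round, so one more
		// bit of the hash picks between the bucket and its new sibling.
		pos = hash & ((1 << (level + 1)) - 1)
	}
	return pos
}

// slot is a hypothetical in-memory slot: a key hash plus the position of
// the key-value pair in the WAL.
type slot struct {
	hash   uint32
	offset uint32
}

// findSlot linearly scans the occupied slots of one bucket for a hash.
// Different keys can share a hash, so a full lookup would still read the
// record from the WAL and compare keys before returning a value.
func findSlot(slots []slot, hash uint32) (int, bool) {
	for i, s := range slots {
		if s.hash == hash {
			return i, true
		}
	}
	return -1, false
}

func main() {
	// L=2, S=1: bucket 0 has been split into buckets 0 and 4.
	fmt.Println(bucketIndex(4, 2, 1), bucketIndex(5, 2, 1)) // prints "4 1"
}
```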
```diff
@@ -155,12 +156,12 @@ If the bucket has all of its slots occupied, a new overflow bucket is created.
 When the number of items in the hash table exceeds the load factor threshold (70%), the split operation is performed on
 the split bucket *S*:
 
-- A new bucket is allocated at the end of the index file.
-- The split bucket index *S* is incremented.
-- If *S* points to 2<sup>L</sup>, *S* is reset to 0 and *L* is incremented.
-- The items from the old split bucket are separated between the newly allocated bucket and the old split bucket by
+1. Allocate a new bucket at the end of the index file.
+2. Increment the split bucket index *S*.
+3. Increment *L* and reset *S* to 0 if *S* points to 2<sup>L</sup>.
+4. Divide items from the old split bucket between the newly allocated bucket and the old split bucket by
 recalculating the positions of the keys in the hash table.
-- The number of buckets *N* is incremented.
+5. Increment the number of buckets *N*.
 
 ### Removal
 
```

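The split steps can be sketched with an in-memory linear hash table. This is an illustration under stated assumptions, not Pogreb's code: each bucket is a plain Go slice of hashes (flattening a bucket and its overflow chain into one list), the table starts with one bucket at *L=0*, *S=0*, and the 70% threshold is assumed to be measured as items / (*N* × 31). The `table` type and its methods are hypothetical.

```go
package main

import "fmt"

const (
	slotsPerBucket = 31  // slots per bucket, as stated in the design
	loadFactor     = 0.7 // split threshold (70%)
)

// table is a hypothetical in-memory linear hash table. Each bucket is a
// slice of hashes, which flattens a bucket and its overflow chain into one list.
type table struct {
	buckets    [][]uint32
	level      uint8  // L
	split      uint32 // S
	numBuckets int    // N
	count      int    // number of items
}

func newTable() *table {
	return &table{buckets: make([][]uint32, 1), numBuckets: 1}
}

// bucketIndex is the position calculation from the Lookup section.
func bucketIndex(hash uint32, level uint8, split uint32) uint32 {
	pos := hash & ((1 << level) - 1)
	if pos < split {
		pos = hash & ((1 << (level + 1)) - 1)
	}
	return pos
}

func (t *table) insert(hash uint32) {
	pos := bucketIndex(hash, t.level, t.split)
	t.buckets[pos] = append(t.buckets[pos], hash)
	t.count++
	// Assumption: load is measured as items / (N * slotsPerBucket).
	if float64(t.count)/float64(t.numBuckets*slotsPerBucket) > loadFactor {
		t.splitBucket()
	}
}

// splitBucket follows the five steps of the split operation.
func (t *table) splitBucket() {
	old := t.split
	// 1. Allocate a new bucket at the end of the index.
	t.buckets = append(t.buckets, nil)
	// 2. Increment the split bucket index S.
	t.split++
	// 3. Start a new round once S reaches 2^L.
	if t.split == 1<<t.level {
		t.level++
		t.split = 0
	}
	// 4. Rehash the old split bucket's items with the updated L and S; each
	// item either stays or moves to the bucket allocated in step 1.
	items := t.buckets[old]
	t.buckets[old] = nil
	for _, h := range items {
		pos := bucketIndex(h, t.level, t.split)
		t.buckets[pos] = append(t.buckets[pos], h)
	}
	// 5. Increment the number of buckets N.
	t.numBuckets++
}

func main() {
	t := newTable()
	for h := uint32(0); h < 1000; h++ {
		t.insert(h)
	}
	fmt.Println("buckets:", t.numBuckets, "L:", t.level, "S:", t.split)
}
```

Only the old split bucket is rehashed, and each of its items either stays put or moves to the bucket appended in step 1, so a split touches one bucket rather than the whole table.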