docs: improve day 2 documentation & add guidance for task 1 (#19)
* feat(docs): improve day 2 documentation & add guidance for task 1

* Apply suggestions from code review

---------

Co-authored-by: Alex Chi Z <[email protected]>
xzhseh and skyzh committed Jul 11, 2023
1 parent a5ac71c commit 26b8e6c
Showing 1 changed file with 23 additions and 4 deletions.
27 changes: 23 additions & 4 deletions mini-lsm-book/src/02-sst.md
@@ -24,12 +24,17 @@ The SST builder is similar to block builder -- users will call `add` on the buil
inside SST builder and split block when necessary. Also, you will need to maintain block metadata `BlockMeta`, which
includes the first key in each block and the offset of each block. The `build` function will encode the SST, write
everything to disk using `FileObject::create`, and return an `SsTable` object. Note that in part 2, you don't need to
actually write the data to the disk.
Just store everything in memory as a vector until we implement a block cache (Day 4, Task 5).

The encoding of SST is like:

```
-------------------------------------------------------------------------------------------
|         Block Section         |          Meta Section         |          Extra          |
-------------------------------------------------------------------------------------------
| data block | ... | data block | meta block | ... | meta block | meta block offset (u32) |
-------------------------------------------------------------------------------------------
```
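
The layout above can be sketched as follows. This is only an illustration under stated assumptions: the `BlockMeta` fields, the meta encoding (offset as `u32`, key length as `u16`, then the key bytes, all big-endian), and the helper names `encode_meta`, `decode_meta`, `encode_sst`, and `read_metas` are hypothetical and not the starter code's API. Blocks are passed in as already-encoded bytes paired with their first key.

```rust
/// Hypothetical metadata for one data block (field names are illustrative).
#[derive(Debug, Clone, PartialEq)]
pub struct BlockMeta {
    /// Byte offset of the data block inside the SST.
    pub offset: u32,
    /// First key stored in the block.
    pub first_key: Vec<u8>,
}

/// Assumed meta encoding per entry: offset (u32) | key_len (u16) | key bytes.
pub fn encode_meta(metas: &[BlockMeta], buf: &mut Vec<u8>) {
    for meta in metas {
        buf.extend_from_slice(&meta.offset.to_be_bytes());
        buf.extend_from_slice(&(meta.first_key.len() as u16).to_be_bytes());
        buf.extend_from_slice(&meta.first_key);
    }
}

/// Inverse of `encode_meta`; expects a well-formed meta section.
pub fn decode_meta(mut buf: &[u8]) -> Vec<BlockMeta> {
    let mut metas = Vec::new();
    while !buf.is_empty() {
        let offset = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]);
        let key_len = u16::from_be_bytes([buf[4], buf[5]]) as usize;
        let first_key = buf[6..6 + key_len].to_vec();
        metas.push(BlockMeta { offset, first_key });
        buf = &buf[6 + key_len..];
    }
    metas
}

/// Lays out: data blocks | meta blocks | meta block offset (u32).
/// Each input item is (first key of the block, encoded block bytes).
pub fn encode_sst(blocks: &[(Vec<u8>, Vec<u8>)]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut metas = Vec::with_capacity(blocks.len());
    for (first_key, data) in blocks {
        // Record where this block starts before appending it.
        metas.push(BlockMeta {
            offset: out.len() as u32,
            first_key: first_key.clone(),
        });
        out.extend_from_slice(data);
    }
    let meta_offset = out.len() as u32;
    encode_meta(&metas, &mut out);
    out.extend_from_slice(&meta_offset.to_be_bytes());
    out
}

/// Reads the trailing u32 to locate the meta section, then decodes it.
/// Returns the decoded metas and the meta section's offset.
pub fn read_metas(sst: &[u8]) -> (Vec<BlockMeta>, usize) {
    let footer = sst.len() - 4;
    let meta_offset = u32::from_be_bytes([
        sst[footer],
        sst[footer + 1],
        sst[footer + 2],
        sst[footer + 3],
    ]) as usize;
    (decode_meta(&sst[meta_offset..footer]), meta_offset)
}
```

Because the meta block offset sits at a fixed position at the end, a reader can find the meta section without scanning the data blocks.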

You also need to implement the `estimated_size` function of `SsTableBuilder`, so that the caller can know when it can start
@@ -39,6 +44,17 @@ more data than meta block, we can simply return the size of data blocks for `est
You can also align blocks to 4KB boundary so as to make it possible to do direct I/O in the future. This is an optional
optimization.
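
As a rough sketch of the optional alignment, assuming each block is appended to an in-memory buffer, padding could look like this. The names `BLOCK_ALIGN`, `align_up`, and `pad_to_alignment` are hypothetical and are not part of the starter code.

```rust
/// Assumed alignment boundary: 4 KB.
pub const BLOCK_ALIGN: usize = 4096;

/// Rounds `len` up to the next multiple of `BLOCK_ALIGN`.
pub fn align_up(len: usize) -> usize {
    (len + BLOCK_ALIGN - 1) / BLOCK_ALIGN * BLOCK_ALIGN
}

/// Appends zero padding so that the next block starts on a 4 KB boundary.
pub fn pad_to_alignment(buf: &mut Vec<u8>) {
    let padded = align_up(buf.len());
    buf.resize(padded, 0);
}
```

Calling `pad_to_alignment` after writing each block keeps every block offset a multiple of 4096, at the cost of some wasted space.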

The recommended sequence to finish **Task 1** is as follows:

- Implement `SsTableBuilder` in `src/table/builder.rs`.
  - Before implementing `SsTableBuilder`, you may want to take a look at `FileObject` and `BlockMeta` in `src/table.rs`.
  - For `FileObject`, you should at least implement `read`, `size`, and `create` (no disk I/O needed) before day 4.
  - For `BlockMeta`, you may want to add some extra fields when encoding / decoding the `BlockMeta` to / from a buffer.
- Implement `SsTable` in `src/table.rs`.
  - As above, you do not need to worry about `BlockCache` until day 4.

After finishing **Task 1**, you should be able to pass all the current tests except two iterator tests.

## Task 2 - SST Iterator

Like `BlockIterator`, you will need to implement an iterator over an SST. Note that you should load data on demand. For
@@ -53,15 +69,18 @@ which block might possibly contain the key. It is possible that the key doesn't
block iterator will be invalid immediately after a seek. For example,

```
----------------------------------
| block 1 | block 2 | block meta |
----------------------------------
| a, b, c | e, f, g | 1: a, 2: e |
----------------------------------
```

If we do `seek(b)` in this SST, it is quite simple -- using binary search, we can know block 1 contains keys `a <= keys
< e`. Therefore, we load block 1 and seek the block iterator to the corresponding position.

But if we do `seek(d)`, we will position to block 1, but seeking `d` in block 1 will reach the end of the block.
Therefore, we should check if the iterator is invalid after the seek, and switch to the next block if necessary.
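
The seek logic can be sketched over in-memory blocks as below. This is a simplified model, not the real iterator: each block is assumed to be a non-empty sorted list of keys, and `find_block_idx` and `seek_to_key` are hypothetical helper names. An actual implementation would load the block on demand and use the block iterator instead of a second binary search.

```rust
/// Index of the block that may contain `key`: the last block whose first key
/// is <= `key`, or block 0 if `key` is smaller than every first key.
pub fn find_block_idx(first_keys: &[&[u8]], key: &[u8]) -> usize {
    first_keys
        .partition_point(|first| *first <= key)
        .saturating_sub(1)
}

/// Returns (block index, entry index) of the first key >= `key`,
/// or `None` if every key in the SST is smaller than `key`.
/// Assumes every block holds at least one key.
pub fn seek_to_key(blocks: &[Vec<&[u8]>], key: &[u8]) -> Option<(usize, usize)> {
    if blocks.is_empty() {
        return None;
    }
    // In this model the block meta is just the first key of each block.
    let first_keys: Vec<&[u8]> = blocks.iter().map(|b| b[0]).collect();
    let mut blk = find_block_idx(&first_keys, key);

    // Seek inside the chosen block.
    let mut idx = blocks[blk].partition_point(|k| *k < key);

    // The "block iterator" is invalid after the seek: switch to the next block.
    if idx == blocks[blk].len() {
        blk += 1;
        if blk == blocks.len() {
            return None;
        }
        idx = 0;
    }
    Some((blk, idx))
}
```

With the two blocks from the example, seeking `b` lands in block 1 at the second entry, while seeking `d` runs off the end of block 1 and moves to the first entry of block 2 (indices in the code are zero-based).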

## Extra Tasks
