- Plain old filesystem (probably won’t scale with the level of caching granularity we might want to aspire to)
-
- The sled author recommends comparing against PingCAP’s TitanDB for the large blob workload:
  > one DB that might be interesting to benchmark for your workload against sled is PingCAP’s titandb, which separates keys from values, and I think they might move values even less over time, so that could be one of the better options
  >
  > they use the WiscKey DB architecture of keeping keys in an LSM but values out of it so they can avoid moving values as often. this is the same approach used in the Badger DB for the Go ecosystem that is nice in some situations
  >
  > — Tyler Neely (@spacejam)
- TiKV for distributed stores would be interesting, though probably excessive (there’s not as much chance of conflict in a largely content‐addressed database)
- Investigate which of these options would need an accompanying large blob storage system (certainly SQLite) and then don’t bother using those options because that would be way too much work. The sled author’s remarks on the topic:
  > the general reason why most DBs will suggest punting to a FS for that kind of thing is because over time, databases will tend to rearrange items internally in order to perform defragmentation etc… sled will actually just spill over to using a file itself, but as I’m writing this it will only split leaf nodes when they hit 17 children, so if you have 17 1gb values all clustered on one leaf, sled will need to read that whole 17gb file in before serving your data
  >
  > having said that, it isn’t that much work to change how sled splits leaf nodes to increase this granularity, so each 1gb value would get its own file
  >
  > DBs usually assume they can copy values over and over pretty cheaply, and for most DB’s this is indeed sort of a nightmare workload
  >
  > but right now, sled only splits leafs when they exceed the 16 child limit. in general I want to make this more size based, but as of right now it’s up to 16 values per node, and this stuff happens per-node rather than per-key
  >
  > anyway, the bottom line is, measure it 🙂
  >
  > as the creator of the thing I’ll always be aware of ways of making it better, but maybe for you it’s already good enough
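In the spirit of “measure it”: a rough sketch (not actual project code) of the smallest version of that benchmark, storing a blob in sled keyed by its BLAKE3 hash and reading it back. It assumes the `sled` and `blake3` crates; the paths are placeholders, and the interesting part is timing this with multi‐gigabyte values to see the leaf‐splitting behaviour described above.

```rust
// Sketch of a content-addressed blob round-trip through sled, for benchmarking only.
// Assumes the `sled` and `blake3` crates; the paths here are placeholders.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db = sled::open("/tmp/blob-bench")?; // throwaway database for the experiment
    let blob = std::fs::read("./file")?;     // ideally something in the gigabyte range

    // Content-address the blob: the key is its 32-byte BLAKE3 hash.
    let key = blake3::hash(&blob);
    db.insert(key.as_bytes(), blob.as_slice())?;

    // Reading it back is the part worth timing: with large values clustered on one
    // leaf, sled may need to pull in the whole leaf before serving a single value.
    let fetched = db.get(key.as_bytes())?.expect("just inserted");
    assert_eq!(blake3::hash(&fetched), key);
    Ok(())
}
```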
- 0x00/uu:…
  - A universally unique random identifier.
- 0x01/b3:…
  - A BLAKE3 hash.
- …
  - Extensible! Let’s hope we never need more than 255 hash functions.
Hash/UUID payload is 31 bytes so that the reference as a whole is 256 bits.
TODO: Come up with a name for uu:… identifiers that doesn’t clash with standard 128 bit UUIDs.
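A minimal sketch of that layout, purely for illustration: the type and method names below are invented, and drawing exactly 31 bytes from BLAKE3’s XOF output is just one possible way to make the whole reference 32 bytes (truncating the default 32‐byte hash would be another).

```rust
// Illustration of the 1-byte tag + 31-byte payload reference layout (names invented).
use rand::Rng;

const PAYLOAD_LEN: usize = 31;

#[derive(Clone, Copy, PartialEq, Eq)]
struct Ref {
    tag: u8,                    // 0x00 = uu:…, 0x01 = b3:…, the rest reserved
    payload: [u8; PAYLOAD_LEN], // hash or random identifier
}

impl Ref {
    // 0x01 / b3:… — BLAKE3 is an XOF, so asking for exactly 31 bytes of output is
    // one way to keep the whole reference at 256 bits.
    fn blake3_of(data: &[u8]) -> Ref {
        let mut payload = [0u8; PAYLOAD_LEN];
        blake3::Hasher::new()
            .update(data)
            .finalize_xof()
            .fill(&mut payload);
        Ref { tag: 0x01, payload }
    }

    // 0x00 / uu:… — a universally unique random identifier.
    fn random() -> Ref {
        let mut payload = [0u8; PAYLOAD_LEN];
        rand::thread_rng().fill(&mut payload[..]);
        Ref { tag: 0x00, payload }
    }

    // The full 32-byte (256-bit) reference: tag byte followed by the payload.
    fn to_bytes(self) -> [u8; 32] {
        let mut out = [0u8; 32];
        out[0] = self.tag;
        out[1..].copy_from_slice(&self.payload);
        out
    }
}
```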
# Hash a constant
$ mew ref 1234
b3:…

# Import and hash a file
$ mew ref -f ./file
b3:…

# Hash a file without importing into the forest
$ mew ref -n -f ./file
b3:…

# Download and import a file
$ mew ref -f https://example.com/example-1.0.tar.gz
b3:…

# Generate a unique random identifier
$ mew ref -u
uu:…
gRPC and Cap’n Proto are the main contenders here. Maybe figure out if gRPC could be used with Cap’n Proto payloads, or hand‐roll something based on Cap’n Proto + QUIC/HTTP 3.
Somewhere between a purely functional shell and Dhall; implementation tightly integrated with the build store interface.
Ensuring adherence to object‐capability principles is Very Important™.
It’d be nice to specify the language grammar as a hybrid parser‐pretty printer. I’ve always wanted to do that.
I’d like to think about how to reconcile Roslyn‐style 1:1 concrete‐abstract syntax mapping that preserves comments and whitespace with Wadler‐style pretty‐printing, which to me feels much more structured than just twiddling the whitespace fields according to a large library of rules.
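To make the Roslyn half of that concrete, here is a small illustrative sketch (invented names, no real parser library) of a token that carries its surrounding trivia: the concrete text round‐trips byte‐for‐byte, while a Wadler‐style printer would only consult the token text and re‐derive layout.

```rust
// Illustrative only: a token that keeps leading/trailing trivia (whitespace, comments)
// so the original source can be reproduced exactly.
#[derive(Debug, Clone)]
struct Trivia(String); // a run of whitespace or a comment, kept verbatim

#[derive(Debug, Clone)]
struct Token {
    leading: Vec<Trivia>,  // trivia appearing before the token text
    text: String,          // the token text itself, e.g. an identifier or keyword
    trailing: Vec<Trivia>, // trivia up to the end of the line
}

impl Token {
    // Lossless view: exactly what the user wrote, trivia included.
    fn to_source(&self) -> String {
        let mut out = String::new();
        for t in &self.leading {
            out.push_str(&t.0);
        }
        out.push_str(&self.text);
        for t in &self.trailing {
            out.push_str(&t.0);
        }
        out
    }
}

fn main() {
    let token = Token {
        leading: vec![Trivia("  ".into()), Trivia("# keep this comment\n".into())],
        text: "hello".into(),
        trailing: vec![Trivia(" ".into())],
    };
    print!("{}", token.to_source()); // concrete syntax, round-tripped
    println!("{}", token.text);      // what a pretty-printer would actually consult
}
```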
TreeSpec : Type
Dir : Map PathComponent TreeSpec → TreeSpec
Blob : TreeSpec
Executable : TreeSpec
Tree : TreeSpec → Type
get : (p : PathComponent) → ∀ts ⇒ Tree (Dir ts) → (e : p ∈ ts) ⇒ Tree (value e)
get? : PathComponent → ∀ts ⇒ Tree (Dir ts) → Option (∃t · Tree t)
execute : Tree Executable → …
execute? : ∀t ⇒ Tree t → Option …
— guaranteed to contain bin/hello, which you can execute (including
— from build rules), and share/doc/hello/README, a plain blob, but can
— also contain arbitrary other trees in addition
hello-tree : Tree (Dir {
"bin" = Dir {"hello" = Executable},
"share" = Dir {"doc" = Dir {"hello" = Dir {"README" = Blob}}},
})
— hello-tree ▻ get "bin" : Tree (Dir {"hello" = Executable})
— execute (hello-tree ▻ get "bin" ▻ get "hello") : …
Could probably use row types for this:
— essentially mapping from identifiers to the specified type
Row : Type → Type
⦃
bin : Dir ⦃hello : Executable⦄,
share : Dir ⦃doc : Dir ⦃hello : Dir ⦃README : Blob⦄⦄⦄,
⦄ : Row TreeSpec
Then we can get nicer bin : … syntax by punning on the fact that rows would also be used to specify record types.
Adapton seems like it should have insights that are applicable to modelling Ansible/Terraform‐style reconciliation of configuration with state in a pure system.
Shiz’s LLVM + clang + LLD + elftoolchain + compiler-rt + libc++ Linux toolchain work is probably worth referencing.
This will probably be really annoying in the early stages of hacking, depending on the latency.
It would be good to integrate the bors setup with git-test to test all the commits of a pull request rather than just the HEAD.
See above, though tapping a YubiKey a few times when pushing to the public repository isn’t too bad.
GitHub supports ICE, and it would be nice to have the root of trust for binary builds under our direct control.
This would require manually administering build and VCS machines, prevent the use of the existing bors implementation, and substantially increase the barrier to contribution, so it should be done carefully: ideally people would still be able to contribute via GitHub issues and pull requests and have them automatically mirrored to the self‐hosted infrastructure.