Commit b571a65

committed
wip
1 parent e8dfe8d commit b571a65

30 files changed

+1286
-224
lines changed

README.md

Lines changed: 136 additions & 50 deletions
@@ -1,13 +1,37 @@
11
# Versioned Binary Application Record Encoding (VBARE)
22

3-
Fearless schema migrations at theoretical maximum performance
3+
_Simple schema evolution with maximum performance_
44

5-
VBARE is a tiny extension to BARE
5+
VBARE is a tiny extension to [BARE](https://baremessages.org/) that provides a way of handling schema evolution.
66

77
## Preface: What is BARE?
88

9-
- https://baremessages.org/
10-
- https://www.ietf.org/archive/id/draft-devault-bare-11.html
9+
> BARE is a simple binary representation for structured application data.
10+
>
11+
> - Messages are encoded in binary and compact in size. Messages do not contain
12+
> schema information — they are not self-describing.
13+
>
14+
> - BARE is optimized for small messages. It is not optimized for encoding
15+
> large amounts of data in a single message, or efficiently reading a message
16+
> with fields of a fixed size. However, all types are aligned to 8 bits,
17+
> which does exchange some space for simplicity.
18+
>
19+
> - BARE's approach to extensibility is conservative: messages encoded today
20+
> will be decodable tomorrow, and vice-versa. But extensibility is still
21+
> possible; implementations can choose to decode user-defined types at a
22+
> higher level and map them onto arbitrary data types.
23+
>
24+
> - The specification is likewise conservative. Simple implementations of
25+
> message decoders and encoders can be written inside of an afternoon.
26+
>
27+
> - An optional DSL is provided to document message schemas and provide a
28+
> source for code generation. However, if you prefer, you may also define
29+
> your schema using the type system already available in your programming
30+
> language.
31+
>
32+
> [Source](https://baremessages.org/)
33+
34+
Also see the [IETF specification](https://www.ietf.org/archive/id/draft-devault-bare-11.html).
1135

1236
## Project goals
1337

@@ -18,6 +42,7 @@ VBARE is a tiny extension to BARE
1842
non-goals:
1943

2044
- data compactness -> that's what gzip is for
45+
- provide an rpc layer -> this is trivial to do yourself based on your specific requirements
2146

2247
## Use cases
2348

@@ -31,83 +56,144 @@ non-goals:
3156
- Every message has a version associated with it
3257
- either pre-negotiated (via something like an HTTP request query parameter/handshake) or embedded in the message itself
3358
- Applications provide functions to upgrade between protocol versions
34-
- There is no migration in the schema itself, just copy and paste the schema to write the new one
59+
- There are no evolution semantics in the schema itself, just copy and paste the schema to write the new one
3560

36-
## Migration philosophy
61+
## Evolution philosophy
3762

3863
- declare discrete versions with predefined version indexes
39-
- manual migrations simplify the application logic by putting complex defaults in your app code
64+
- manual evolutions simplify the application logic by putting complex defaults in your app code
4065
- stop making big breaking v1 -> v2 changes, make much smaller changes with more flexibility
4166
- reshaping structures is important -- not just changing types and names
4267

43-
## Code examples
68+
## Specification
4469

45-
## Current users
70+
### Versions
4671

47-
- Rivet
48-
- Network protocol
49-
- All internal communication
50-
- All data stored at rest
51-
- RivetKit
52-
- Protocol for communicating with clients
72+
Each schema version is a monotonically incrementing <TODO: integer type>.
5373

54-
## FAQ
74+
### Embedded version
5575

56-
### Why not include RPC?
76+
Embedded version works by inserting a <TODO: integer type> integer at the beginning of the buffer. This integer defines which version of the schema is being used.
5777

58-
- why the fuck does your protocol need to define an rpc schema
59-
- keep it simple, use a union
78+
The layout looks like this:
6079

61-
### Why is copying the entire schema for every version better than using decorators for gradual migrations
80+
```
81+
TODO
82+
```
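
Until the layout is filled in, the idea can be sketched as follows. This is an illustration only: the integer type is still a TODO in the text above, so the sketch assumes an unsigned 16-bit little-endian prefix, and `encodeWithVersion` / `decodeWithVersion` are hypothetical helper names rather than part of any VBARE implementation.

```typescript
// Assumption: a 2-byte little-endian unsigned version prefix.
// The README leaves the actual integer type as a TODO.
const VERSION_BYTES = 2;

// Prepend the schema version to an already-encoded BARE payload.
function encodeWithVersion(version: number, payload: Uint8Array): Uint8Array {
  if (!Number.isInteger(version) || version < 0 || version > 0xffff) {
    throw new RangeError(`version out of range: ${version}`);
  }
  const out = new Uint8Array(VERSION_BYTES + payload.length);
  new DataView(out.buffer).setUint16(0, version, true);
  out.set(payload, VERSION_BYTES);
  return out;
}

// Split a buffer back into its version and the remaining payload bytes.
function decodeWithVersion(buf: Uint8Array): { version: number; payload: Uint8Array } {
  if (buf.length < VERSION_BYTES) {
    throw new RangeError("buffer too short to contain a version prefix");
  }
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  return { version: view.getUint16(0, true), payload: buf.subarray(VERSION_BYTES) };
}
```

The decoder reads the prefix first, then hands the remaining bytes to whichever schema version the prefix names.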
6283

63-
### Isn't copying the schema going to result in a lot of duplicate code?
84+
### Pre-negotiated version
6485

65-
- yes. after enough pain and suffering of running production APIs, this is what you will end up doing manually, but in a much more painful way.
66-
- having schema versions also makes it much easier to reason about how clients are connecting to your system/the state of an application. incremental migrations don't let you consider other properties/structures.
67-
- this also lets you reshape your structures.
86+
Oftentimes, you specify the protocol version outside of the message itself. For example, if making an HTTP request with the version in the path like `POST /v3/users`, we can extract version 3 from the path. In this case, VBARE does not insert a version into the buffer; it simply acts as a step function for upgrading/downgrading versioned data structures.
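
A minimal sketch of extracting the version from a path such as `/v3/users`, assuming the version is always the first path segment in the form `v<number>`. `versionFromPath` is a hypothetical helper written for this example and is not part of VBARE.

```typescript
// Hypothetical helper: pull the protocol version out of a path such as "/v3/users".
// Returns undefined when the first path segment is not of the form "v<number>".
function versionFromPath(path: string): number | undefined {
  const match = /^\/v(\d+)(?:\/|$)/.exec(path);
  return match ? Number(match[1]) : undefined;
}
```

The returned number would then select which schema version to decode the body with, before any upgrade steps run.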
6887

69-
### Why copying instead of decorators for migrations?
88+
## Implementations
7089

71-
- decorators are limited and get very complicated
72-
- it's unclear what version of the protocol a decorator takes effect -- this is helpful
73-
- generated sdks become more and more bloated with every change
74-
- you need a validation build step for your validators
75-
- things you can do with manual migrations
90+
- [TypeScript](./typescript/)
91+
- [Example Code](./typescript/examples/basic/src/migrator.ts)
92+
- [Rust](./rust/)
93+
- [Example Code](./rust/examples/basic/src/lib.rs)
7694

77-
### Don't migration steps get repetitive?
95+
([Full list of BARE implementations](https://baremessages.org/))
7896

79-
- most of the time, structures will match exactly. most languages can provide a 1:1 migration.
80-
- the most egregious offenders will be deeply nested structures, but even that isn't terrible
97+
_Adding an implementation takes less than an hour -- it's really that simple._
98+
99+
## Current users
100+
101+
- [Rivet Engine](https://github.com/rivet-dev/engine)
102+
- [Data at rest](https://github.com/rivet-dev/engine/tree/bbdf1c1c49e307ba252186aa4d75a9452d74fca7/sdks/schemas/data)
103+
- Internal network protocols ([tunnel](https://github.com/rivet-dev/engine/tree/bbdf1c1c49e307ba252186aa4d75a9452d74fca7/sdks/schemas/epoxy-protocol), [Epoxy](https://github.com/rivet-dev/engine/tree/bbdf1c1c49e307ba252186aa4d75a9452d74fca7/sdks/schemas/epoxy-protocol), [UPS](https://github.com/rivet-dev/engine/tree/bbdf1c1c49e307ba252186aa4d75a9452d74fca7/sdks/schemas/ups-protocol))
104+
- Public network protocol ([runner](https://github.com/rivet-dev/engine/tree/bbdf1c1c49e307ba252186aa4d75a9452d74fca7/sdks/schemas/runner-protocol))
105+
- [RivetKit](https://github.com/rivet-dev/rivetkit)
106+
- [Client protocol](https://github.com/rivet-dev/rivetkit/tree/b81d9536ba7ccad4449639dd83a770eb7c353617/packages/rivetkit/schemas/client-protocol)
107+
- [Persisted state](https://github.com/rivet-dev/rivetkit/tree/b81d9536ba7ccad4449639dd83a770eb7c353617/packages/rivetkit/schemas/actor-persist)
108+
- [File system driver](https://github.com/rivet-dev/rivetkit/tree/b81d9536ba7ccad4449639dd83a770eb7c353617/packages/rivetkit/schemas/file-system-driver)
109+
110+
## Embedded vs Negotiated Version
111+
112+
TODO
113+
114+
## Clients vs Servers
115+
116+
- Only servers need to have the evolution steps
117+
- clients just send their version
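
One way to picture the server side is a chain of upgrade steps applied from the client's version up to the latest one. The sketch below assumes versions start at 1 and that each step upgrades by exactly one version; `UpgradeStep` and `upgradeToLatest` are hypothetical names for illustration, not the API of the TypeScript or Rust implementations.

```typescript
// A step upgrades data from one schema version to the next.
type UpgradeStep = (data: unknown) => unknown;

// Hypothetical server-side helper. `steps[0]` upgrades v1 -> v2, `steps[1]`
// upgrades v2 -> v3, and so on, so the latest version is `steps.length + 1`.
function upgradeToLatest(data: unknown, clientVersion: number, steps: UpgradeStep[]): unknown {
  const latest = steps.length + 1;
  if (!Number.isInteger(clientVersion) || clientVersion < 1 || clientVersion > latest) {
    throw new RangeError(`unsupported version: ${clientVersion}`);
  }
  let current = data;
  // Skip the steps that the client's version has already passed.
  for (const step of steps.slice(clientVersion - 1)) {
    current = step(current);
  }
  return current;
}
```

A client on the latest version skips every step, which is why older versions pay for more steps than newer ones.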
118+
119+
## Downsides
120+
121+
- extensive migration code
122+
- the older the version the more migration steps (though these migration steps should be effectively free)
123+
- migration steps are not portable across languages, but only the server needs to implement the migration steps, so this is usually only done once.
81124

82125
## Comparison
83126

84127
- Protobuf (versioned: yes)
85128
- unbelievably poorly designed protocol
86-
- makes it your problem by making everything optional
129+
- makes migrations your problem at runtime by making everything optional
87130
- even worse, makes properties have a default value (e.g. integers) which leads to subtle bugs with serious consequences
88-
- tracking indexes in the file is ass
131+
- tracking field numbers in a file is a pain in the ass
89132
- Cap'n'proto (versioned: yes)
90-
- simplicity
91-
- quality of output languages
133+
- includes the rpc layer as part of the library, this is out of the scope of what we want in our schema design
134+
- of the schema languages we evaluated, this provides by far the most flexible schema migrations
135+
- has poor language support. technically most major languages are supported, but the quality of the implementations is lacking. i suspect this is largely due to the complexity of capnproto itself compared to other protocols.
136+
- generics are cool. but we opt for simplicity with more repetition.
137+
- the learning curve seems the steepest of any tool we evaluated
138+
- cap'n'web (versioned: no)
139+
- this is focused on rpc with json. not relevant to what we needed.
92140
- cbor/messagepack/that mongodb one (versioned: self-describing)
93-
- requires encoding the entire key
141+
- does not have a schema, it's completely self-describing
142+
- requires encoding the entire key, not suitable for our needs
94143
- Flatbuffers (versioned: yes)
144+
- intended as a high performance encoding similar to protobuf
95145
- still uses indexes like protobuf, unless you use structs
96-
- structs are ass
97-
- codegen is ass
98-
- https://crates.io/crates/bebop & https://crates.io/crates/borsh (versioned: TODO)
99-
- provides cross platform
100-
- TODO: more complicated than i'd like
101-
- bebop includes an extra ??? step
146+
- to achieve what we wanted, we'd have to use just structs
147+
- schema evolution works similarly to protobuf
148+
- also requires writing field numbers in the file
149+
- https://crates.io/crates/bebop (versioned: no)
150+
- provides cross platform compact self-contained binary encoding
151+
- rpc is split out in to a separate package, which i like because i don't want to use someone else's rpc
152+
- includes json-over-bebop which is nice. currently we rely on cbor for this.
153+
- could not find docs on schema evolution
154+
- considered bebop instead of bare, but bare seemed significantly simpler and more focused
155+
- https://crates.io/crates/borsh (versioned: no)
156+
- provides cross platform compact self-contained binary encoding
157+
- considered borsh instead of bare, but bare seemed significantly simpler and more focused
102158
- rust options like postcard/etc (versioned: no)
159+
- also provides self-contained binary encoding
103160
- not cross platform
104161

105-
## Implementations
162+
other details not included in this evaluation:
163+
- number compression (ie static 64 bits vs using minimal bits)
164+
- zero-copy ser/de
165+
- json support & extensions
166+
- rpc
167+
168+
## FAQ
169+
170+
### Why is copying the entire schema for every version better than using decorators for gradual migrations?
171+
172+
- decorators are limited and get very complicated
173+
- it's unclear at which version of the protocol a decorator takes effect -- explicit versions make this obvious
174+
- generated sdks become more and more bloated with every change
175+
- you need a validation build step for your validators
176+
- things you can do with manual migrations
177+
178+
### Why not include RPC?
179+
180+
RPC interfaces are trivial to implement yourself. Libraries that provide RPC interfaces tend to add extra bloat & cognitive load over things like abstracting transports, compatibility with the language's async runtime, and complex codegen to implement handlers.
181+
182+
Usually, you just want a `ToServer` and `ToClient` union that looks like this: [ToClient example](https://github.com/rivet-dev/rivetkit/blob/b81d9536ba7ccad4449639dd83a770eb7c353617/packages/rivetkit/schemas/client-protocol/v1.bare#L34), [ToServer example](https://github.com/rivet-dev/rivetkit/blob/b81d9536ba7ccad4449639dd83a770eb7c353617/packages/rivetkit/schemas/client-protocol/v1.bare#L56)
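
In TypeScript, such a pair of unions might look like the sketch below. The message names (`Ping`, `Echo`, `Pong`, `EchoReply`) and the `handleToServer` function are hypothetical and chosen only for illustration; the linked schemas define their own messages.

```typescript
// Hypothetical message shapes; the linked schemas define their own.
type ToServer =
  | { tag: "Ping"; nonce: number }
  | { tag: "Echo"; text: string };

type ToClient =
  | { tag: "Pong"; nonce: number }
  | { tag: "EchoReply"; text: string };

// A single handler over the union stands in for a generated RPC layer.
function handleToServer(msg: ToServer): ToClient {
  switch (msg.tag) {
    case "Ping":
      return { tag: "Pong", nonce: msg.nonce };
    case "Echo":
      return { tag: "EchoReply", text: msg.text };
  }
}
```

Because the switch is exhaustive over `tag`, adding a new `ToServer` variant produces a compile error until the handler covers it.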
183+
184+
185+
### Isn't copying the schema going to result in a lot of duplicate code?
186+
187+
- yes. after enough pain and suffering of running production APIs, this is what you will end up doing manually, but in a much more painful way.
188+
- having schema versions also makes it much easier to reason about how clients are connecting to your system/the state of an application. incremental migrations don't let you consider other properties/structures.
189+
- this also lets you reshape your structures.
190+
191+
### Don't migration steps get repetitive?
192+
193+
- most of the time, structures will match exactly. most languages can provide a 1:1 migration.
194+
- the most complicated migration steps will be very deeply nested structures that changed, but that's pretty simple
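
As a rough illustration, here is what a v1 -> v2 step could look like for the `basic` fixture schemas in this commit. The TypeScript types are hand-written to mirror `v1.bare` and `v2.bare` (with `u64` modeled as `bigint`), and the defaults chosen for `createdAt`, `tags`, and `settings` are assumptions made for this sketch; the example code in the repository may differ.

```typescript
// Hand-written types mirroring fixtures/tests/basic/v1.bare.
interface TodoV1 { id: number; title: string; done: boolean }
interface AppV1 { todos: TodoV1[] }

// Hand-written types mirroring fixtures/tests/basic/v2.bare (u64 as bigint).
type TodoStatusV2 = "OPEN" | "IN_PROGRESS" | "DONE";
interface TodoV2 {
  id: bigint;
  title: string;
  status: TodoStatusV2;
  createdAt: bigint;
  tags: string[];
}
interface AppV2 { todos: Map<bigint, TodoV2>; settings: Map<string, string> }

// Fields that match are copied 1:1; new fields get explicit defaults.
function upgradeTodoV1ToV2(todo: TodoV1): TodoV2 {
  return {
    id: BigInt(todo.id), // u32 -> u64
    title: todo.title,
    status: todo.done ? "DONE" : "OPEN", // bool -> enum
    createdAt: BigInt(0), // assumed default: v1 has no creation time
    tags: [], // assumed default: v1 has no tags
  };
}

// Reshape the list of todos into a map keyed by id.
function upgradeAppV1ToV2(app: AppV1): AppV2 {
  const todos = new Map<bigint, TodoV2>();
  for (const todo of app.todos) {
    const next = upgradeTodoV1ToV2(todo);
    todos.set(next.id, next);
  }
  return { todos, settings: new Map() }; // assumed default: empty settings
}
```

Most of the step is a field-by-field copy; only the changed fields and the list-to-map reshape need real logic.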
106195

107-
| Language | BARE | VBARE |
108-
| --- | --- | --- |
109-
| TypeScript | X | X |
110-
| Rust | X | X |
196+
## License
111197

112-
[Full list of BARE implementations](https://baremessages.org/)
198+
MIT
113199

fixtures/tests/basic/v1.bare

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
type Todo struct {
2+
id: u32
3+
title: string
4+
done: bool
5+
}
6+
7+
type App struct {
8+
todos: list<Todo>
9+
}
10+

fixtures/tests/basic/v2.bare

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
// - Introduce TodoId u64; Todo.id: u32 -> TodoId.
2+
// - Convert Todo.done: bool -> TodoStatus enum { OPEN, IN_PROGRESS, DONE }.
3+
// - Add created_at: u64 and tags: list<string> to Todo.
4+
// - Change App.todos: list<Todo> -> map<TodoId, Todo> for fast lookup.
5+
// - Add App.settings: map<string, string> to demonstrate maps.
6+
7+
type TodoId u64
8+
9+
enum TodoStatus {
10+
OPEN
11+
IN_PROGRESS
12+
DONE
13+
}
14+
15+
type Todo struct {
16+
id: TodoId
17+
title: string
18+
status: TodoStatus
19+
created_at: u64
20+
tags: list<string>
21+
}
22+
23+
type App struct {
24+
todos: map<TodoId, Todo>
25+
settings: map<string, string>
26+
}

fixtures/tests/basic/v3.bare

Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
// - Extract TodoDetail struct; move title and tags into detail field.
2+
// - Add Priority enum; add priority to Todo.
3+
// - Convert tags: list<string> -> map<TagId, Tag> (list -> map).
4+
// - Add assignee as structured data (Assignee struct) replacing implicit absence.
5+
// - Add history: list<Change> capturing lifecycle events.
6+
// - Replace App.settings map<string,string> with structured AppConfig.
7+
// - Add Boards: map<BoardId, Board> referencing todos by id with columns.
8+
9+
type TodoId u64
10+
type UserId u64
11+
type TeamId u64
12+
type TagId u32
13+
type BoardId u32
14+
15+
enum TodoStatus {
16+
OPEN
17+
IN_PROGRESS
18+
DONE
19+
}
20+
21+
enum Priority {
22+
LOW
23+
MEDIUM
24+
HIGH
25+
CRITICAL
26+
}
27+
28+
enum AssigneeKind {
29+
NONE
30+
USER
31+
TEAM
32+
}
33+
34+
type Assignee struct {
35+
kind: AssigneeKind
36+
user_id: optional<UserId>
37+
team_id: optional<TeamId>
38+
}
39+
40+
type Tag struct {
41+
id: TagId
42+
name: string
43+
color: optional<string>
44+
}
45+
46+
type TodoDetail struct {
47+
title: string
48+
tags: map<TagId, Tag>
49+
}
50+
51+
type Change struct {
52+
at: u64
53+
kind: ChangeKind
54+
}
55+
56+
enum ChangeKind {
57+
CREATED
58+
UPDATED
59+
STATUS_CHANGED
60+
ASSIGNED
61+
TAGGED
62+
}
63+
64+
type Todo struct {
65+
id: TodoId
66+
status: TodoStatus
67+
created_at: u64
68+
priority: Priority
69+
assignee: Assignee
70+
detail: TodoDetail
71+
history: list<Change>
72+
}
73+
74+
enum Theme {
75+
LIGHT
76+
DARK
77+
SYSTEM
78+
}
79+
80+
type AppConfig struct {
81+
theme: Theme
82+
features: map<string><bool>
83+
}
84+
85+
type Board struct {
86+
id: BoardId
87+
name: string
88+
// Map of column name -> ordered list of TodoId
89+
columns: map<string, list<TodoId>>
90+
}
91+
92+
type App struct {
93+
todos: map<TodoId, Todo>
94+
config: AppConfig
95+
boards: map<BoardId, Board>
96+
}

rust/Cargo.toml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
[workspace]
2-
members = ["bare-gen", "vbare-compiler", "vbare"]
2+
members = ["vbare-gen", "vbare-compiler", "vbare", "examples/basic"]
33
resolver = "2"
44

55
[workspace.package]
@@ -19,4 +19,4 @@ proc-macro2 = "1.0"
1919
quote = "1.0"
2020
serde = { version = "1.0", features = ["derive"] }
2121
serde_bare = "0.5"
22-
syn = "2.0"
22+
syn = "2.0"
