
Expand and clarify consitency/durability docs in store.wit #56


Open · wants to merge 9 commits into base: main
68 changes: 52 additions & 16 deletions imports.md
@@ -23,20 +23,50 @@ the common denominator for all data types defined by different key-value stores
ensuring compatibility between different key-value stores. Note: the clients will be expecting
serialization/deserialization overhead to be handled by the key-value store. The value could be
a serialized object from JSON, HTML or vendor-specific data types like AWS S3 objects.</p>
<p>Data consistency in a key value store refers to the guarantee that once a write operation
completes, all subsequent read operations will return the value that was written.</p>
<p>Any implementation of this interface must have enough consistency to guarantee &quot;reading your
writes.&quot; In particular, this means that the client should never get a value that is older than
the one it wrote, but it MAY get a newer value if one was written around the same time. These
guarantees only apply to the same client (which will likely be provided by the host or an
external capability of some kind). In this context a &quot;client&quot; is referring to the caller or
guest that is consuming this interface. Once a write request is committed by a specific client,
all subsequent read requests by the same client will reflect that write or any subsequent
writes. Another client running in a different context may or may not immediately see the result
due to the replication lag. As an example of all of this, if a value at a given key is A, and
the client writes B, then immediately reads, it should get B. If something else writes C in
quick succession, then the client may get C. However, a client running in a separate context may
still see A or B.</p>
<h2>Consistency</h2>
<p>An implementation of this interface MUST be eventually consistent, but is not required to
provide any consistency guarantees beyond that. Practically speaking, eventual consistency is
among the weakest of consistency models, guaranteeing only that values will not be produced
&quot;from nowhere&quot;, i.e. any value read is guaranteed to have been written to that key at some
earlier time. Beyond that, there are no guarantees, and thus a portable component must neither
expect nor rely on anything else.</p>
<p>In the future, additional interfaces may be added to <code>wasi:keyvalue</code> with stronger guarantees,
which will allow components to express their requirements by importing whichever interface(s)
provides matching (or stronger) guarantees. For example, a component requiring strict
serializability might import a (currently hypothetical) <code>strict-serializable-store</code> interface
with a similar signature to <code>store</code> but with much stronger semantic guarantees. On the other
end, a host might either support implementations of both the <code>store</code> and
<code>strict-serializable-store</code> or just the former, in which case the host would immediately reject
a component which imports the unsupported interface.</p>
<p>Here are a few examples of behavior which a component developer might wish to rely on but which
are <em>NOT</em> guaranteed by an eventually consistent system (e.g. a distributed system composed of
multiple replicas, each of which may receive writes in a different order, making no attempt to
converge on a global consensus):</p>
<ul>
<li>
<p>Read-your-own-writes: eventual consistency does <em>NOT</em> guarantee that a write to a given key
followed by a read from the same key will retrieve the same or newer value.</p>
</li>
<li>
<p>Convergence: eventual consistency does <em>NOT</em> guarantee that any two replicas will agree on the
value for a given key -- even after all writes have had time to propagate to all replicas.</p>
</li>
<li>
<p>Last-write-wins: eventual consistency does <em>NOT</em> guarantee that the most recent write will
take precedence over an earlier one; old writes may overwrite newer ones temporarily or
permanently.</p>
</li>
</ul>
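The failure modes listed above can be made concrete with a toy simulation. The following Python sketch is purely illustrative; the `Replica` and `LwwReplica` classes and the timestamps are invented for this example and are not part of `wasi:keyvalue`. It models two replicas that receive the same writes in different orders, first with no conflict resolution and then with last-write-wins on (skewed) client timestamps.

```python
# Toy model of two eventually consistent replicas. Everything here is
# invented for illustration; it is not part of the wasi:keyvalue API.

class Replica:
    """Applies writes in whatever order the network delivers them."""
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value  # the last *arriving* write wins locally

    def get(self, key):
        return self.data.get(key)

# Convergence failure: the same two writes reach the replicas in
# different orders, and with no conflict resolution the replicas
# disagree permanently.
a, b = Replica(), Replica()
a.apply("k", "B"); a.apply("k", "C")   # replica A sees B, then C
b.apply("k", "C"); b.apply("k", "B")   # replica B sees C, then B
print(a.get("k"), b.get("k"))          # C B -- no agreement, ever

class LwwReplica(Replica):
    """Resolves conflicts by last-write-wins on client timestamps."""
    def apply(self, key, value, ts):
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def get(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

# Last-write-wins failure: a writer with a fast clock stamps an *older*
# write with a *larger* timestamp, so the newer write loses on every
# replica -- old data overwrites new data permanently.
c, d = LwwReplica(), LwwReplica()
for r in (c, d):
    r.apply("k", "old", ts=105)  # written first, but clock runs fast
    r.apply("k", "new", ts=100)  # written later, with an accurate clock
print(c.get("k"), d.get("k"))    # old old
```

Both scenarios are legal behavior for an eventually consistent store, which is why a portable component must not rely on convergence or on newer writes winning.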
<h2>Durability</h2>
<p>This interface does not currently make any hard guarantees about the durability of values
stored. A valid implementation might rely on an in-memory hash table, the contents of which are
lost when the process exits. Alternatively, another implementation might synchronously persist
all writes to disk -- or even to a quorum of disk-backed nodes at multiple locations -- before
returning a result for a <code>set</code> call. Finally, a third implementation might persist values
asynchronously on a best-effort basis without blocking <code>set</code> calls, in which case an I/O error
could occur after the component instance which originally made the call has exited.</p>
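The third durability model described above can be sketched as follows. This is a hypothetical Python analogy with invented names (`BestEffortStore`, `persist`), not the real interface, which is defined in WIT: `set` reports success immediately while persistence proceeds in the background and may fail after the caller is gone.

```python
import queue
import threading

class BestEffortStore:
    """Sketch of the third durability model: set() returns before the
    value is durable; persistence is asynchronous and best-effort.
    All names here are invented for illustration."""

    def __init__(self, persist):
        self.mem = {}                 # current in-memory view
        self.pending = queue.Queue()  # writes awaiting persistence
        self.persist = persist        # e.g. writes to disk; may raise
        threading.Thread(target=self._flush, daemon=True).start()

    def set(self, key, value):
        self.mem[key] = value           # visible to get() immediately
        self.pending.put((key, value))  # durability comes later, maybe
        return "ok"                     # success reported even if the
                                        # disk write later fails

    def get(self, key):
        return self.mem.get(key)

    def _flush(self):
        while True:
            key, value = self.pending.get()
            try:
                self.persist(key, value)
            except OSError:
                pass  # I/O error surfaces after set() already succeeded
            finally:
                self.pending.task_done()

# The caller observes success and may exit; whether the value survives
# a process restart depends entirely on the background flush.
disk = []
store = BestEffortStore(lambda k, v: disk.append((k, v)))
store.set("k", "v")
store.pending.join()  # wait here only to make the example deterministic
print(store.get("k"), disk)
```

In a real implementation the flush could fail long after the component instance that called `set` has exited, which is exactly the case the paragraph above describes.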
<p>Future versions of <code>wasi:keyvalue</code> may provide ways to query and control the durability and
consistency provided by the backing implementation.</p>
<hr />
<h3>Types</h3>
<h4><a id="error"></a><code>variant error</code></h4>
@@ -74,7 +104,7 @@ there are no more keys to fetch.
<h4><a id="bucket"></a><code>resource bucket</code></h4>
<p>A bucket is a collection of key-value pairs. Each key-value pair is stored as a entry in the
bucket, and the bucket itself acts as a collection of all these entries.</p>
<p>It is worth noting that the exact terminology for bucket in key-value stores can very
<p>It is worth noting that the exact terminology for bucket in key-value stores can vary
depending on the specific implementation. For example:</p>
<ol>
<li>Amazon DynamoDB calls a collection of key-value pairs a table</li>
@@ -85,7 +115,13 @@ depending on the specific implementation. For example:</p>
<li>Memcached calls a collection of key-value pairs a slab</li>
<li>Azure Cosmos DB calls a collection of key-value pairs a container</li>
</ol>
<h2>In this interface, we use the term <a href="#bucket"><code>bucket</code></a> to refer to a collection of key-value pairs</h2>
<p>In this interface, we use the term <a href="#bucket"><code>bucket</code></a> to refer to a connection to a collection of
key-value pairs.</p>
<h2>Note that opening two <a href="#bucket"><code>bucket</code></a> resources using the same identifier MAY result in connections
to two separate replicas in a distributed database, and that writes to one of those
resources are not guaranteed to be readable from the other resource promptly (or ever, in
the case of a replica failure or message reordering). See the <code>Consistency</code> section of the
<code>store</code> interface documentation for details.</h2>
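The note above has a practical consequence for portable components: two handles opened with the same identifier must be treated as potentially independent views. The toy Python model below illustrates this; the `Host` and `Bucket` classes are invented for the example and replication between replicas is deliberately not modeled.

```python
class Host:
    """Toy host in which each open() may attach a handle to a different
    replica of the same logical bucket. Invented for illustration."""

    def __init__(self, replica_count=2):
        # Un-synchronized replicas; replication lag is not modeled, so
        # a write to one replica may never reach the others.
        self.replicas = [{} for _ in range(replica_count)]
        self.next_replica = 0

    def open(self, identifier):
        replica = self.replicas[self.next_replica % len(self.replicas)]
        self.next_replica += 1
        return Bucket(replica)

class Bucket:
    def __init__(self, replica):
        self.replica = replica

    def set(self, key, value):
        self.replica[key] = value

    def get(self, key):
        return self.replica.get(key)

host = Host()
b1 = host.open("my-bucket")  # same identifier...
b2 = host.open("my-bucket")  # ...possibly a different replica
b1.set("k", "v")
print(b1.get("k"), b2.get("k"))  # the write is not visible via b2
```

A component that needs a write through one handle to be observable through another cannot rely on this interface alone.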
<h3>Functions</h3>
<h4><a id="open"></a><code>open: func</code></h4>
<p>Get the bucket with the specified identifier.</p>
68 changes: 52 additions & 16 deletions watch-service.md
@@ -21,20 +21,50 @@ the common denominator for all data types defined by different key-value stores
ensuring compatibility between different key-value stores. Note: the clients will be expecting
serialization/deserialization overhead to be handled by the key-value store. The value could be
a serialized object from JSON, HTML or vendor-specific data types like AWS S3 objects.</p>
<p>Data consistency in a key value store refers to the guarantee that once a write operation
completes, all subsequent read operations will return the value that was written.</p>
<p>Any implementation of this interface must have enough consistency to guarantee &quot;reading your
writes.&quot; In particular, this means that the client should never get a value that is older than
the one it wrote, but it MAY get a newer value if one was written around the same time. These
guarantees only apply to the same client (which will likely be provided by the host or an
external capability of some kind). In this context a &quot;client&quot; is referring to the caller or
guest that is consuming this interface. Once a write request is committed by a specific client,
all subsequent read requests by the same client will reflect that write or any subsequent
writes. Another client running in a different context may or may not immediately see the result
due to the replication lag. As an example of all of this, if a value at a given key is A, and
the client writes B, then immediately reads, it should get B. If something else writes C in
quick succession, then the client may get C. However, a client running in a separate context may
still see A or B.</p>
<h2>Consistency</h2>
<p>An implementation of this interface MUST be eventually consistent, but is not required to
provide any consistency guarantees beyond that. Practically speaking, eventual consistency is
among the weakest of consistency models, guaranteeing only that values will not be produced
&quot;from nowhere&quot;, i.e. any value read is guaranteed to have been written to that key at some
earlier time. Beyond that, there are no guarantees, and thus a portable component must neither
expect nor rely on anything else.</p>
<p>In the future, additional interfaces may be added to <code>wasi:keyvalue</code> with stronger guarantees,
which will allow components to express their requirements by importing whichever interface(s)
provides matching (or stronger) guarantees. For example, a component requiring strict
serializability might import a (currently hypothetical) <code>strict-serializable-store</code> interface
with a similar signature to <code>store</code> but with much stronger semantic guarantees. On the other
end, a host might either support implementations of both the <code>store</code> and
<code>strict-serializable-store</code> or just the former, in which case the host would immediately reject
a component which imports the unsupported interface.</p>
<p>Here are a few examples of behavior which a component developer might wish to rely on but which
are <em>NOT</em> guaranteed by an eventually consistent system (e.g. a distributed system composed of
multiple replicas, each of which may receive writes in a different order, making no attempt to
converge on a global consensus):</p>
<ul>
<li>
<p>Read-your-own-writes: eventual consistency does <em>NOT</em> guarantee that a write to a given key
followed by a read from the same key will retrieve the same or newer value.</p>
</li>
<li>
<p>Convergence: eventual consistency does <em>NOT</em> guarantee that any two replicas will agree on the
value for a given key -- even after all writes have had time to propagate to all replicas.</p>
</li>
<li>
<p>Last-write-wins: eventual consistency does <em>NOT</em> guarantee that the most recent write will
take precedence over an earlier one; old writes may overwrite newer ones temporarily or
permanently.</p>
</li>
</ul>
<h2>Durability</h2>
<p>This interface does not currently make any hard guarantees about the durability of values
stored. A valid implementation might rely on an in-memory hash table, the contents of which are
lost when the process exits. Alternatively, another implementation might synchronously persist
all writes to disk -- or even to a quorum of disk-backed nodes at multiple locations -- before
returning a result for a <code>set</code> call. Finally, a third implementation might persist values
asynchronously on a best-effort basis without blocking <code>set</code> calls, in which case an I/O error
could occur after the component instance which originally made the call has exited.</p>
<p>Future versions of <code>wasi:keyvalue</code> may provide ways to query and control the durability and
consistency provided by the backing implementation.</p>
<hr />
<h3>Types</h3>
<h4><a id="error"></a><code>variant error</code></h4>
@@ -72,7 +102,7 @@ there are no more keys to fetch.
<h4><a id="bucket"></a><code>resource bucket</code></h4>
<p>A bucket is a collection of key-value pairs. Each key-value pair is stored as a entry in the
bucket, and the bucket itself acts as a collection of all these entries.</p>
<p>It is worth noting that the exact terminology for bucket in key-value stores can very
<p>It is worth noting that the exact terminology for bucket in key-value stores can vary
depending on the specific implementation. For example:</p>
<ol>
<li>Amazon DynamoDB calls a collection of key-value pairs a table</li>
@@ -83,7 +113,13 @@ depending on the specific implementation. For example:</p>
<li>Memcached calls a collection of key-value pairs a slab</li>
<li>Azure Cosmos DB calls a collection of key-value pairs a container</li>
</ol>
<h2>In this interface, we use the term <a href="#bucket"><code>bucket</code></a> to refer to a collection of key-value pairs</h2>
<p>In this interface, we use the term <a href="#bucket"><code>bucket</code></a> to refer to a connection to a collection of
key-value pairs.</p>
<h2>Note that opening two <a href="#bucket"><code>bucket</code></a> resources using the same identifier MAY result in connections
to two separate replicas in a distributed database, and that writes to one of those
resources are not guaranteed to be readable from the other resource promptly (or ever, in
the case of a replica failure or message reordering). See the <code>Consistency</code> section of the
<code>store</code> interface documentation for details.</h2>
<h3>Functions</h3>
<h4><a id="open"></a><code>open: func</code></h4>
<p>Get the bucket with the specified identifier.</p>
71 changes: 54 additions & 17 deletions wit/store.wit
@@ -7,22 +7,52 @@
/// ensuring compatibility between different key-value stores. Note: the clients will be expecting
/// serialization/deserialization overhead to be handled by the key-value store. The value could be
/// a serialized object from JSON, HTML or vendor-specific data types like AWS S3 objects.
///
/// ## Consistency
///
/// Data consistency in a key value store refers to the guarantee that once a write operation
/// completes, all subsequent read operations will return the value that was written.
///
/// Any implementation of this interface must have enough consistency to guarantee "reading your
/// writes." In particular, this means that the client should never get a value that is older than
/// the one it wrote, but it MAY get a newer value if one was written around the same time. These
/// guarantees only apply to the same client (which will likely be provided by the host or an
/// external capability of some kind). In this context a "client" is referring to the caller or
/// guest that is consuming this interface. Once a write request is committed by a specific client,
/// all subsequent read requests by the same client will reflect that write or any subsequent
/// writes. Another client running in a different context may or may not immediately see the result
/// due to the replication lag. As an example of all of this, if a value at a given key is A, and
/// the client writes B, then immediately reads, it should get B. If something else writes C in
/// quick succession, then the client may get C. However, a client running in a separate context may
/// still see A or B.
/// An implementation of this interface MUST be eventually consistent, but is not required to
/// provide any consistency guarantees beyond that. Practically speaking, eventual consistency is
/// among the weakest of consistency models, guaranteeing only that values will not be produced
/// "from nowhere", i.e. any value read is guaranteed to have been written to that key at some
/// earlier time. Beyond that, there are no guarantees, and thus a portable component must neither
/// expect nor rely on anything else.
///
/// In the future, additional interfaces may be added to `wasi:keyvalue` with stronger guarantees,
/// which will allow components to express their requirements by importing whichever interface(s)
/// provides matching (or stronger) guarantees. For example, a component requiring strict
/// serializability might import a (currently hypothetical) `strict-serializable-store` interface
/// with a similar signature to `store` but with much stronger semantic guarantees. On the other
/// end, a host might either support implementations of both the `store` and
/// `strict-serializable-store` or just the former, in which case the host would immediately reject
/// a component which imports the unsupported interface.
///
/// Here are a few examples of behavior which a component developer might wish to rely on but which
/// are _NOT_ guaranteed by an eventually consistent system (e.g. a distributed system composed of
/// multiple replicas, each of which may receive writes in a different order, making no attempt to
/// converge on a global consensus):
///
/// - Read-your-own-writes: eventual consistency does _NOT_ guarantee that a write to a given key
/// followed by a read from the same key will retrieve the same or newer value.
///
/// - Convergence: eventual consistency does _NOT_ guarantee that any two replicas will agree on the
/// value for a given key -- even after all writes have had time to propagate to all replicas.
Comment on lines +37 to +38

Member: I might be missing your point here, but I thought that Eventual Consistency did mean that eventually all replicas will converge on the same value... you just don't know how long it'll take.

Author: I thought so too, but see @Mossaka's comment above:

> Let's say, replica A believes k=v1 arrives first due to clock skew, and replica B believes k=v2 arrives first. Then because of the Last-Write-Wins conflict resolution mechanism, these two replicas will permanently have conflicting values for the same key.

Member: Yeah, I can see how that could arise in a fully-weak consistency model; but in that case we should not say "Eventual Consistency" above. That being said, are we aware of any particular kv-store implementations we'd like to allow that aren't even Eventual Consistency? I had thought EC was sort of the "lower bound" for traditional KV Stores. If we start talking about "caches", then I can see this happening, but I guess that's a question: even if we're not making durability guarantees, do we want implementations that actively evict keys (as opposed to only losing them on crashes)?

Author: @Mossaka seems to be saying that eventual consistency is that weak:

> It is important to recognize that while the keyvalue system aims for replicas to have the same view of the data, realistically, a consistent state in eventual consistency does not guarantee that all replicas will converge to exactly the same value for every key.

If someone can point me to an authoritative definition of what "eventual consistency" means and what it does and does not include, I'm happy to use that as a reference and update this document to be consistent with it. So far, it seems that everyone has their own, incompatible idea of what it means.

Maybe there isn't a precise, widely-accepted definition? In that case, I can note that in the docs here, e.g. "Although 'eventual consistency' has no precise, widely-accepted definition, here we define it to mean..." Or just not use the term at all?

Member: I'd be happy to read any alternative definitions, but the Wikipedia article does clearly describe convergence over time.

Author: Yeah, that's what I thought it meant and what I want it to mean. @Mossaka can you explain where your assertion that "a consistent state in eventual consistency does not guarantee that all replicas will converge to exactly the same value for every key" came from? It seems to contradict what the Wikipedia article is claiming.
///
/// - Last-write-wins: eventual consistency does _NOT_ guarantee that the most recent write will
/// take precedence over an earlier one; old writes may overwrite newer ones temporarily or
/// permanently.
///
/// ## Durability
///
/// This interface does not currently make any hard guarantees about the durability of values
Collaborator: I think it's okay to leave the durability wide open. I am wondering, in your case 3 - the async set calls scenario - whether we want to emphasize that the implementation should still guarantee "read your writes" data consistency.

Now, there is a question of "what happens if an async I/O error occurs right after the set call completes successfully": a weak point of the current specification, and I was hoping that we could address this one.

In a strict interpretation of the spec, once set is Ok, the handle SHOULD behave as if the value is now present. A get on the same handle SHOULD return the new value.

If the store experiences a critical I/O failure that causes data corruption or data loss, there are currently no instructions on how the store should respond. Should it return Err(error::other(...)) on subsequent get calls?

I think there are two possible ways to extend the specification to address the above concerns:

1. Handle defunct after errors: we could define that once a bucket handle experiences a critical I/O error, all further operations on that handle must return an error. That is, if a store fails after set, it would no longer provide a consistent view for subsequent get operations. This does not violate the "read your writes" guarantee since the handle is considered defunct.

2. A best-effort guarantee tied to success conditions: the specification could define that "read your writes" holds as long as the store does not fail irrecoverably between operations. A get operation should return Err(error::other("I/O failure")) to reflect the error condition from the store.

Member (@lukewagner, Jun 20, 2025): @Mossaka Based on the previous discussion above, I think there are performance reasons not to require "read your writes" (even when reads follow writes on the same bucket handle). In particular, if the implementation of write sends the written values out over the network to a primary/writer node, and the implementation of read sends a request over the network to a read replica (distinct from the primary writer node), then you won't have "read your writes" without maintaining extra cached copies or making extra network requests. Thus, I think even when there is not an irrecoverable error, we shouldn't say that "read your writes" holds.
/// stored. A valid implementation might rely on an in-memory hash table, the contents of which are
Collaborator: For in-memory stores, we probably want to emphasize that the data might be lost if the store crashes, and the best-effort guarantee described in my comment above should apply to our specification - stating that the "read your writes" consistency contract should only apply to stores operating under normal conditions.
/// lost when the process exits. Alternatively, another implementation might synchronously persist
/// all writes to disk -- or even to a quorum of disk-backed nodes at multiple locations -- before
/// returning a result for a `set` call. Finally, a third implementation might persist values
/// asynchronously on a best-effort basis without blocking `set` calls, in which case an I/O error
/// could occur after the component instance which originally made the call has exited.
///
/// Future versions of `wasi:keyvalue` may provide ways to query and control the durability and
/// consistency provided by the backing implementation.
interface store {
/// The set of errors which may be raised by functions in this package
variant error {
@@ -56,7 +86,7 @@ interface store {
/// A bucket is a collection of key-value pairs. Each key-value pair is stored as a entry in the
/// bucket, and the bucket itself acts as a collection of all these entries.
///
/// It is worth noting that the exact terminology for bucket in key-value stores can very
/// It is worth noting that the exact terminology for bucket in key-value stores can vary
/// depending on the specific implementation. For example:
///
/// 1. Amazon DynamoDB calls a collection of key-value pairs a table
@@ -67,7 +97,14 @@ depending on the specific implementation. For example:
/// 6. Memcached calls a collection of key-value pairs a slab
/// 7. Azure Cosmos DB calls a collection of key-value pairs a container
///
/// In this interface, we use the term `bucket` to refer to a collection of key-value pairs
Collaborator: I found the wording "connection to a collection of key-value pairs" instead of "a collection of key-value pairs" to be a bit strange - it now implies a networked view instead of a logical container. What does this say to a downstream implementation that does not involve networking, e.g. a filesystem implementation?

Author: I used that wording to emphasize the fact that you can have two bucket resource handles pointing to the same key-value collection but connected to different replicas in an eventually consistent distributed system, in which case they'll see that collection from different points of view such that values may arrive in different orders, etc. In other words, I'm trying to emphasize that each handle represents a potentially unique view of the collection which is not necessarily consistent with another view, despite being opened with the same name.

It might help to use two different terms for these concepts, e.g. "bucket" could refer to the collection while "bucket-view" refers to a specific view of the collection, similar to the distinction between a value and a pointer to a value in a programming language.

In the interest of minimizing further changes to this PR, though, would it help to change "connection to a collection of key-value pairs" to "view of a collection of key-value pairs" (and likewise replace "connection" with "view" anywhere else it appears)?

Collaborator: Thanks for clarifying. I am okay to merge this PR as is, because we can always update the spec if other people find this confusing.
/// In this interface, we use the term `bucket` to refer to a connection to a collection of
/// key-value pairs.
///
/// Note that opening two `bucket` resources using the same identifier MAY result in connections
/// to two separate replicas in a distributed database, and that writes to one of those
/// resources are not guaranteed to be readable from the other resource promptly (or ever, in
/// the case of a replica failure or message reordering). See the `Consistency` section of the
/// `store` interface documentation for details.
resource bucket {
/// Get the value associated with the specified `key`
///