Skip to content

Conversation

@ma2bd
Copy link
Contributor

@ma2bd ma2bd commented Nov 10, 2025

Motivation

Prevent DB corruption

Proposal

  • Add a lease management API to DirectWritableKeyValueStore
  • Use it in journaling.rs

TODO:

  • tests
  • add journaling to the storage-service?
  • deduplicate code
  • make the TTL configurable
  • in scylladb we could also use USING TTL

Test Plan

CI + TBD

Release Plan

  • These changes should be backported to the latest testnet branch, then
    • be released in a validator hotfix.

.await
{
tracing::error!(
"Failed to renew lease for UUID {uuid} and root key {root_key_owned:?}: {err}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: discard lease

let (applied,) = row?;
if !applied {
// Delete failed - UUID doesn't match or row doesn't exist
tracing::warn!("Failed to discard lease");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: better message


// Check if the LWT was applied
let rows = result.into_rows_result()?;
let mut rows = rows.rows::<(bool,)>()?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure about this

@ma2bd ma2bd requested a review from MathieuDutSik November 11, 2025 03:51
Copy link
Contributor

@MathieuDutSik MathieuDutSik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment:

  • To make the examples compile under wasm32, the best is to make journaling, lru and the like not available in wasm32, where anyway they are not used.
  • After those corrections, the DynamoDB end-to-end test fail randomly and the ScyllaDB test fail due to syntax error in the request.
  • The obtain_lease is returning a Result<Result<_,_>,_>. Such constructions were previously proposed and refused for linera-storage in favor of returning a Result<Option<_>,_>. I am find with doing such a construction, but better be clear about that.
  • To put the data in the journaling seems preposterous. Why not instead create a LeaseDatabase? That seems cleaner.
  • To have the tokio::spawn in open_exclusive which is not async is a break with previous practice. Could we instead have a boolean like need_lease in the store and whenever we need a database access we set it up. That is the lease is obtained at the first time we do an access.
  • Independent of this PR, we have in journaling.rs the entry Journal = 1, which is contrary to our practices.
  • In journaling, the database error (pattern "Err(err)" is simply eaten instead of being propagated upstream with ?.
  • The uuid in journaling.rs is a random number. Why not instead put the Thread ID? Isn't what we are really after.
  • Both for DynamoDB and ScyllaDB, we are going to implement the lease with a key/value. Whjy not use the existing system. Adding more attributes (in DynamoDB) and keys (in ScyllaDB) seems like a waste. If it were up to me, I would write a single key [255] and the value would be the thread-id and the end-time.
  • The open_exclusive is doing the job of spanning the thread. But the threads that does the cancellation token is done in the same thread that creates it.

Why not create a special (key, value), never deletes it.

  • If absent, create it with (thread_id, duration).
  • If present, if same thread_id, go ahead, if not, check the duration.

Do that check at the first access of an async operation in a LeaseDatabase connector.

@ma2bd
Copy link
Contributor Author

ma2bd commented Nov 11, 2025

  • To make the examples compile under wasm32, the best is to make journaling, lru and the like not available in wasm32, where anyway they are not used.

yes

  • After those corrections, the DynamoDB end-to-end test fail randomly and the ScyllaDB test fail due to syntax error in the request.

yeah claude must have hallucinated some of the DB queries

  • The obtain_lease is returning a Result<Result<_,_>,_>. Such constructions were previously proposed and refused for linera-storage in favor of returning a Result<Option<_>,_>. I am find with doing such a construction, but better be clear about that.

I'm not seeing which Result<Option<_>,_> you mean in particular. It looks like in many cases Ok(None) means "the value was missing". If we change obtain_lease to use Result<Option<_>,_> instead of Result<Result<(), _>>, we will have that Ok(None) means "everything's fine" and Ok(Some(LeaseConflict)) means "there was an error" which is quite confusing.

  • To put the data in the journaling seems preposterous. Why not instead create a LeaseDatabase? That seems cleaner.

We can discuss but reusing DirectWritableKeyValueStore made the PR much smaller. Also, in practice, journals are used for the shared remote database that we want to protect with a lease.

  • To have the tokio::spawn in open_exclusive which is not async is a break with previous practice. Could we instead have a boolean like need_lease in the store and whenever we need a database access we set it up. That is the lease is obtained at the first time we do an access.

tokio::spawn is not async so it works. We want the lease to be managed immediately after open_exclusive not after the write DB write. We could also make open_exclusive async.

  • Independent of this PR, we have in journaling.rs the entry Journal = 1, which is contrary to our practices.

unrelated and not changeable without migration

  • In journaling, the database error (pattern "Err(err)" is simply eaten instead of being propagated upstream with ?.

probably intended

  • The uuid in journaling.rs is a random number. Why not instead put the Thread ID? Isn't what we are really after.

The thread ID is not stable for a given root key. We could certainly add more information in addition to the UUID.

  • Both for DynamoDB and ScyllaDB, we are going to implement the lease with a key/value. Why not use the existing system. Adding more attributes (in DynamoDB) and keys (in ScyllaDB) seems like a waste. If it were up to me, I would write a single key [255] and the value would be the thread-id and the end-time.

This won't allow the atomic test-and-update, which is what this PR is about.

  • The open_exclusive is doing the job of spanning the thread. But the threads that does the cancellation token is done in the same thread that creates it.

so what? define "thread"

Why not create a special (key, value), never deletes it.

  • If absent, create it with (thread_id, duration).
  • If present, if same thread_id, go ahead, if not, check the duration.

Do that check at the first access of an async operation in a LeaseDatabase connector.

not sure where you're going with this

@ma2bd ma2bd changed the title Add notion of lease in journaling Add a notion of lease in journaling Nov 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants