Make state management more explicit by alexjg · Pull Request #496 · automerge/automerge-repo

alexjg · 2025-09-12T13:58:20Z

Currently the lifecycle of a document is a little tricky to follow. We have an xstate state machine in DocHandle, but the events which modify that state machine are emitted by various asynchronous processes which have a reference to the document scattered throughout the system. This makes it hard to answer questions like "why is this document unavailable" because we don't have a single place in the codebase where we have all the information used to make that decision at the same time.

This commit is an attempt to reorganise state management to make these kinds of questions easier to answer. Roughly speaking it looks like this:

* Replace the CollectionSynchronizer and DocSynchronizer classes with a single DocumentPhasor class, which manages the lifecycle of a single document.
* Move all the state logic out of DocHandle. DocHandles are now just a place for users to register interest in a document and get notified of changes. All actual state changes are routed through the DocumentPhasor.

The "DocumentPhasor" state machine manages a set of Phases, each of which represents different stages of the document lifecycle ("loading", "requesting", "ready", "unavailable"). Modifications to the phasor are achieved by passing events to the DocumentPhasor.handleEvent method, which returns an object describing what has changed.
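As a rough illustration of the shape described above, here is a minimal sketch of a phasor whose `handleEvent` returns a description of what changed. Only `DocumentPhasor`, `handleEvent`, and the four phase names come from the description; the event names, the result type, and the transition rules are assumptions made up for this example and will not match the PR's actual code.

```typescript
// Hypothetical sketch. The events and transition rules are invented for
// illustration; only the four phase names come from the PR description.
type PhaseName = "loading" | "requesting" | "ready" | "unavailable"

// Assumed event shapes (not taken from the PR).
type PhasorEvent =
  | { type: "load_complete"; found: boolean }
  | { type: "peer_responded"; hasDoc: boolean }

// Assumed result shape: what changed as a consequence of one event.
interface HandleEventResult {
  stateChange: { before: PhaseName; after: PhaseName } | null
}

class DocumentPhasorSketch {
  private phase: PhaseName = "loading"

  currentPhase(): PhaseName {
    return this.phase
  }

  // Every state change goes through this one method, so the reason for a
  // transition can always be found here.
  handleEvent(event: PhasorEvent): HandleEventResult {
    const before = this.phase
    this.phase = this.next(before, event)
    return {
      stateChange:
        before === this.phase ? null : { before, after: this.phase },
    }
  }

  // Pure transition function: current phase + event -> next phase.
  private next(phase: PhaseName, event: PhasorEvent): PhaseName {
    switch (event.type) {
      case "load_complete":
        if (phase !== "loading") return phase
        return event.found ? "ready" : "requesting"
      case "peer_responded":
        if (phase !== "requesting") return phase
        return event.hasDoc ? "ready" : "unavailable"
    }
  }
}
```

Because the result object is returned rather than emitted from scattered callbacks, a caller can log or assert on each transition at the single point where events are dispatched.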

This PR is still in progress because:

a) I think the state machine logic is pretty verbose and can probably be made more succinct.
b) I need to write up a few things which are not covered by tests relating to the metrics we use for monitoring the sync server.
c) I need to go through in detail looking for things which are not covered by tests in the original implementation as this is a large change.


msakrejda

I didn't have time to look at the full thing this morning, but I made it through most of DocumentPhasor. In general, I really like this approach: I think it achieves the goals you laid out in the overview. I left some nitpicks and a bunch of questions, and I think I need to read it over another couple of times to fully understand it, but this is a great direction.

msakrejda · 2025-09-12T14:34:31Z

-  unload() {
-    this.#machine.send({ type: UNLOAD })
-  }
+  unload() {}

  /** Called by the repo to reuse an unloaded handle. */
-  reload() {
-    this.#machine.send({ type: RELOAD })
-  }
+  reload() {}


If these are no-ops now, should they be deprecated?

msakrejda · 2025-09-12T14:37:44Z


-  /** Called by the repo when the document is deleted. */
+  /** Called by the repo when the document is deleted.
+   * @deprecated Use Repo#delete instead


Should we be deprecating this? Since you can update a document via its handle, it seems reasonable to want to delete it via its handle, too, no? No strong feelings, just wondering.

msakrejda · 2025-09-12T15:10:36Z

+//
+// ### Connections
+//
+// Each `DocumentPhasor` manages it's own set of connected peers. The caller


Suggested change

-// Each `DocumentPhasor` manages it's own set of connected peers. The caller
+// Each `DocumentPhasor` manages its own set of connected peers. The caller

msakrejda · 2025-09-12T15:13:31Z

+// sync state and a share policy before it can be used to send messages (or not,
+// as the case maybe). Callers should provide these via the


... (or not, as the case maybe).

What does this mean?

msakrejda · 2025-09-12T15:24:38Z

+// Phase transitions always happen after applying the new event and as a result of the
+// `Phase.transition` method, which makes it easy to understand why transitions occur.


I find the wording here confusing. Is this equivalent to the following?

Suggested change

-// Phase transitions always happen after applying the new event and as a result of the
-// `Phase.transition` method, which makes it easy to understand why transitions occur.
+// Phase transitions always happen as a result of the `Phase.transition` method, which may
+// happen after applying a new event. This makes it easy to understand why transitions occur.

And localChange never generates a phase transition?

msakrejda · 2025-09-12T15:43:57Z

+        this.#outboundEphemeralMessages.push(event.message)
+        break
+      }
+      case "reload": {


This event does not affect what peers we know about, right?

msakrejda · 2025-09-12T15:45:32Z

+        }
+        break
+      }
+      case "peer_removed":


On peer_added, we tell our current phase about the peer. Do we need a corresponding relay to the current phase here?

msakrejda · 2025-09-12T15:46:45Z

+        const exhaustivenessCheck: never = event
+        throw new Error(`Unhandled event type: ${exhaustivenessCheck}`)


Do we want event or event.type here?
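For context on this question, here is a minimal sketch of the `never` exhaustiveness pattern. The event union is hypothetical, not the PR's actual event type. Interpolating the whole object into a template string yields `[object Object]` at runtime, while reading `.type` directly from a value narrowed to `never` is a compile error, so the sketch serialises the value with `JSON.stringify` as one possible middle ground.

```typescript
// Hypothetical event union, invented for this example.
type SketchEvent =
  | { type: "peer_added"; peerId: string }
  | { type: "peer_removed"; peerId: string }

function describeEvent(event: SketchEvent): string {
  switch (event.type) {
    case "peer_added":
      return `added ${event.peerId}`
    case "peer_removed":
      return `removed ${event.peerId}`
    default: {
      // Compile-time check: this assignment fails to type-check if a new
      // variant is added to SketchEvent without a matching case above.
      const exhaustivenessCheck: never = event
      // Runtime fallback: only reachable if an untyped value slips through.
      // JSON.stringify keeps the message readable for object-shaped events.
      throw new Error(
        `Unhandled event type: ${JSON.stringify(exhaustivenessCheck)}`
      )
    }
  }
}
```

The same consideration applies to any other `never` check whose value is an object rather than a string.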

msakrejda · 2025-09-12T15:53:34Z

+    let stateChange = null
+    if (before.phase?.name !== this.#currentPhase.name) {
+      stateChange = {
+        before: before.phase?.name || "loading", // TODO: introduce a "starting" phase?


No strong feelings on a "starting" phase yet, but I'm a fan of sentinel values like that in general...
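To make the sentinel idea concrete, here is a small sketch of how the `before` value could be computed with an explicit "starting" phase instead of defaulting to "loading". The function and type names are hypothetical and the PR does not define a "starting" phase; this only illustrates the TODO in the hunk above.

```typescript
// Hypothetical: "starting" is the sentinel proposed in the TODO, not an
// existing phase in the PR.
type LifecyclePhase =
  | "starting"
  | "loading"
  | "requesting"
  | "ready"
  | "unavailable"

interface StateChange {
  before: LifecyclePhase
  after: LifecyclePhase
}

// Returns null when nothing changed; otherwise reports the transition,
// using "starting" when there was no previous phase.
function computeStateChange(
  before: LifecyclePhase | undefined,
  after: LifecyclePhase
): StateChange | null {
  const previous: LifecyclePhase = before === undefined ? "starting" : before
  if (previous === after) return null
  return { before: previous, after }
}
```

With a sentinel, the first transition is reported as "starting" to "loading" rather than being suppressed because both sides read as "loading".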

msakrejda · 2025-09-12T15:56:18Z

+          break
+        default:
+          const exhaustiveCheck: never = transition
+          throw new Error(`Unhandled transition: ${exhaustiveCheck}`)


Similar to above, should this be transition.to?

msakrejda

I took another look at this, focusing on the phases. I still think it broadly makes sense, though the interface between DocumentPhasor and the phases seems pretty intricate. Maybe that's necessary complexity, but I wonder if we can avoid some of it. E.g., loadRunning seems like a quirk.

I also feel like I'm not quite clear on the sync protocol: are there docs on that?

msakrejda · 2025-09-19T22:16:45Z

+      case "not_requesting_due_to_sharepolicy":
+        if (shouldShare) {
+          // This would happen if the sharePolicy changed since the last time we checked
+          // somehow


Does this mean that sharePolicyChanged above should not be empty? Or is that not relevant here?

msakrejda · 2025-09-19T23:54:51Z

+            state: "awaiting_send",
+          })
+          return this.generateMessage({
+            docId: this.#documentId,


this.generateMessage doesn't take a docId?

msakrejda · 2025-09-20T00:03:08Z

+//
+// Initially a phasor is in a "network not ready" state. This means that it will not mark
+// documents as unavailable until the "network_ready" event has been received. Callers
+// should dispatch this event as soon as the network subsystem says it is ready.


I don't think this refactor should change the state machine, but in the long term, it would be nice to support the network going in and out of the "ready" state. Have you considered how that fits in here?
