Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More responsive to network outages #360

Merged

Conversation

paulgb
Copy link
Member

@paulgb paulgb commented Dec 13, 2024

This PR introduces a number of improvements that make YSweetProvider more responsive to network conditions.

The main issue this attempts to address is that browsers are surprisingly inconsistent in how long it takes to terminate a WebSocket connection when the network is lost. I have observed the timeout taking anywhere from 10s (Firefox) to 95s (Chrome) when using a VPN. (Without the VPN it is a bit better, presumably because the VPN makes the network status more opaque to the browser.)

Passive / active connection monitoring

This PR introduces passive connection monitoring that switches into an active heartbeat when message volume is low.

When the server is sending the client lots of messages (e.g. frequent awareness updates), we know the connection is alive, and don't have to send any sort of heartbeat to confirm it.

When we haven't received a message for a certain time threshold (MAX_TIMEOUT_BETWEEN_HEARTBEATS), we don't know whether it's because the connection was lost or just a lull in events, so we start to send an active heartbeat. This uses the same mechanism on the server introduced for detecting local unsaved changes: we simply re-send the latest localVersion (formerly lastSyncSent) instead of incrementing it (since the underlying data hasn't changed).

Browser connection status monitoring

Browsers also provide a mechanism for being notified about online/offline state. It's not entirely reliable, but we can use it as an additional signal to accelerate reconnection. The way it works is:

  • When we receive and offline event, we immediately send a heartbeat, which means we only have to wait MAX_TIMEOUT_WITHOUT_RECEIVING_HEARTBEAT instead of up to MAX_TIMEOUT_BETWEEN_HEARTBEATS + MAX_TIMEOUT_WITHOUT_RECEIVING_HEARTBEAT before showing as offline.
  • When we receive an online event, if we are currently waiting to retry a network operation (either obtaining credentials or connecting to a websocket), we bypass the wait and go directly to making the request.

Copy link

vercel bot commented Dec 13, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
y-sweet-demos ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 13, 2024 7:50pm
2 Skipped Deployments
Name Status Preview Comments Updated (UTC)
y-sweet-debugger ⬜️ Ignored (Inspect) Visit Preview Dec 13, 2024 7:50pm
y-sweet-gendocs ⬜️ Ignored (Inspect) Visit Preview Dec 13, 2024 7:50pm

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to f9e68ca in 45 seconds

More details
  • Looked at 272 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. js-pkg/client/src/sleeper.ts:6
  • Draft comment:
    The reject function is defined but never used. Consider removing it to avoid confusion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The reject function is defined but never used. This is unnecessary and could be misleading.
2. js-pkg/client/src/provider.ts:333
  • Draft comment:
    Ensure reconnectSleeper is not already in use before reassigning it to avoid potential issues.
  • Reason this comment was not posted:
    Comment did not seem useful.
3. js-pkg/client/src/provider.ts:343
  • Draft comment:
    Ensure reconnectSleeper is not already in use before reassigning it to avoid potential issues.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_13iEYblxATMs42Ye


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@jakelazaroff jakelazaroff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To sum up the comments, I think we can simplify this a little bit by:

  1. Rather than being able to "set" and "clear" a heartbeat, just have one operation — "reset", which clears any heartbeat timeout and sets another one. That gets called on websocket connection and on receiving a message. (We might still want a "clear" that gets called in situations where we don't want to set another one, like actually going offline).
  2. Rather than just looking for a connection timeout on specific sync messages, set it on every sent message. This means we'll initiate the reconnection flow if it takes too long to get a response to any message. (This isn't technically true: if we send two messages and then get one response, it will clear the connection timeout without setting another, but we'd fall back to the heartbeat anyway in that scenario).

Also, I can't comment on lines that haven't changed, but I think we should clear both the heartbeat and connection timeout in websocketClose and websocketError

js-pkg/client/src/provider.ts Outdated Show resolved Hide resolved
Comment on lines 233 to 236
let hasLocalChanges = this.ackedVersion !== this.localVersion
if (hasLocalChanges === this.hasLocalChanges) {
return
}
Copy link
Contributor

@jakelazaroff jakelazaroff Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just occurred to me now — could we remove a bit of state by making a getter to derive hasLocalChanges? Like, within the class:

  get hasLocalChanges() {
    return this.ackedVersion !== this.localVersion
  }

Thinking it over now, I'm not actually 100% clear when local-changes should get emitted — it seems like we go through this code path when both sending and receiving a sync message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that, done. Unfortunately it adds a bit of complexity at the former call sites because we want to emit if and only if the hasLocalChanges value would change, but I do think it's an improvement overall.

// doesn't mean we entirely true the browser, since it can be wrong
// (e.g. in the case that the connection is over localhost).
this.checkSync()
this.setConnectionTimeout()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there's a drawback to just calling this inside send rather than here and in the setHeartbeat callback? It would get called in both paths anyway via checkSync, plus that way we'd initiate the reconnection flow if the server takes too long to respond to any message (rather than just specific sync messages). We could rename MAX_TIMEOUT_WITHOUT_RECEIVING_HEARTBEAT to MAX_RESPONSE_TIMEOUT or something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trouble with that is that not every message necessarily has a response, so we might encounter false positives.

That said, I have moved the call into checkSync, since we know that every checkSync call should return a response. This also should speed up offline detection if a user is interacting with the state when the connection is dropped.

Comment on lines 187 to 203
private clearHeartbeat() {
if (this.heartbeatHandle) {
clearTimeout(this.heartbeatHandle)
this.heartbeatHandle = null
}
}

private setHeartbeat() {
if (this.heartbeatHandle) {
return
}
this.heartbeatHandle = setTimeout(() => {
this.checkSync()
this.setConnectionTimeout()
this.heartbeatHandle = null
}, MAX_TIMEOUT_BETWEEN_HEARTBEATS)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to think if there's ever a scenario in which we want to bail out of setting a heartbeat if there's one queued, rather than just resetting it? I think we basically always want a heartbeat going and to just keep "moving the timer up" when we receive messages from the server.

I'd change these methods like this:

Suggested change
private clearHeartbeat() {
if (this.heartbeatHandle) {
clearTimeout(this.heartbeatHandle)
this.heartbeatHandle = null
}
}
private setHeartbeat() {
if (this.heartbeatHandle) {
return
}
this.heartbeatHandle = setTimeout(() => {
this.checkSync()
this.setConnectionTimeout()
this.heartbeatHandle = null
}, MAX_TIMEOUT_BETWEEN_HEARTBEATS)
}
private clearHeartbeat() {
if (this.heartbeatHandle) {
clearTimeout(this.heartbeatHandle)
this.heartbeatHandle = null
}
}
private resetHeartbeat() {
this.clearHeartbeat()
this.heartbeatHandle = setTimeout(() => {
this.checkSync()
this.setConnectionTimeout()
this.heartbeatHandle = null
}, MAX_TIMEOUT_BETWEEN_HEARTBEATS)
}

And then we should probably call clearHeartbeat in websocketClose and websocketError.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 2ece974 in 20 seconds

More details
  • Looked at 28 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. js-pkg/client/src/provider.ts:272
  • Draft comment:
    setConnectionTimeout is called multiple times unnecessarily, which can lead to redundant operations. Consider refactoring to ensure it's only called when needed.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_PHWYw7aXZvzIJ0js


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on eff7cd2 in 27 seconds

More details
  • Looked at 92 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. js-pkg/client/src/provider.ts:228
  • Draft comment:
    The comment here is misleading. It suggests that the local version is incremented before emitting, but the emission happens after the increment. Consider revising the comment to accurately reflect the method's behavior.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. js-pkg/client/src/provider.ts:242
  • Draft comment:
    The comment here is misleading. It suggests that the local version is incremented before emitting, but the method does not increment the local version. Consider revising the comment to accurately reflect the method's behavior.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_okh5oQZ2BCRje4Qz


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 036b06e in 3 seconds

More details
  • Looked at 13 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. js-pkg/client/src/provider.ts:244
  • Draft comment:
    The change from (version === this.localVersion) to version === this.localVersion is stylistic and doesn't affect functionality. Ensure this aligns with your coding standards.
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The PR description mentions a change in the way localVersion is handled, but the code change is minimal and doesn't seem to address the main points of the description.

Workflow ID: wflow_I37hJoJPfVxHeozt


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 81f66ff in 3 seconds

More details
  • Looked at 36 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. js-pkg/client/src/provider.ts:190
  • Draft comment:
    The resetHeartbeat method is a good addition for managing heartbeats. Ensure that this method is consistently used wherever a heartbeat reset is needed, such as in websocketOpen and receiveMessage. This helps maintain a single point of control for heartbeat logic.
  • Reason this comment was not posted:
    Confidence changes required: 20%
    The PR introduces a new method resetHeartbeat which calls clearHeartbeat and then sets a new timeout. This is a good approach to ensure that the heartbeat is reset properly. However, the resetHeartbeat method is called in multiple places, and it's important to ensure that the logic is consistent and doesn't introduce any unintended side effects.

Workflow ID: wflow_5HUULJRmWSnFMp2A


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 649289c in 20 seconds

More details
  • Looked at 20 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. js-pkg/client/src/provider.ts:489
  • Draft comment:
    Consider adding a check to ensure only one reconnection attempt is made at a time to prevent multiple concurrent reconnection attempts.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_RY5N2YkbzzTroLux


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@paulgb
Copy link
Member Author

paulgb commented Dec 13, 2024

To sum up the comments, I think we can simplify this a little bit by:

  1. Rather than being able to "set" and "clear" a heartbeat, just have one operation — "reset", which clears any heartbeat timeout and sets another one. That gets called on websocket connection and on receiving a message. (We might still want a "clear" that gets called in situations where we don't want to set another one, like actually going offline).
  2. Rather than just looking for a connection timeout on specific sync messages, set it on every sent message. This means we'll initiate the reconnection flow if it takes too long to get a response to any message. (This isn't technically true: if we send two messages and then get one response, it will clear the connection timeout without setting another, but we'd fall back to the heartbeat anyway in that scenario).

Also, I can't comment on lines that haven't changed, but I think we should clear both the heartbeat and connection timeout in websocketClose and websocketError

Thanks for the review! I think I did all of this, except for one thing: instead of setting the connection timeout only on every message type, I do it on every sync message. This way if we send some message that the server doesn't need to acknowledge, we don't accidentally disconnect. This change still cleans up the logic a bit.

@paulgb paulgb requested a review from jakelazaroff December 13, 2024 17:23
js-pkg/client/src/provider.ts Outdated Show resolved Hide resolved
encoding.writeVarUint8Array(encoder, encoding.toUint8Array(versionEncoder))

this.send(encoding.toUint8Array(encoder))

this.updateSyncedState()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just making sure this is intentional — we do not want to emit the local-changes event here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's right, this function is now responsible for checking if the current localVersion has been seen by the server. Since it is no longer responsible for incrementing localVersion, it can no longer lead to the local sync status changing -- it is the responsibility of the caller to increment localVersion and emit the event if necessary.

js-pkg/client/src/provider.ts Show resolved Hide resolved
js-pkg/client/src/provider.ts Show resolved Hide resolved
Copy link
Contributor

@jakelazaroff jakelazaroff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🤘🏻

@paulgb paulgb merged commit 80fec8d into main Dec 16, 2024
7 checks passed
@paulgb paulgb deleted the paul/dis-3029-make-y-sweet-more-responsive-to-network-disconnections branch December 16, 2024 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants