Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core, api-client): Add abort controller to notifications api call (WPB-11013) #6577

Merged
merged 17 commits into from
Oct 16, 2024

Conversation

thisisamir98
Copy link
Contributor

@thisisamir98 thisisamir98 commented Oct 10, 2024

Fix for Missing Messages Due to Race Condition in WebSocket/API Call Handling

Problem:

A message sent by a user was visible Android or IOS clients, but not in the web app.

We handle WebSocket connections using reconnecting-websocket and API calls using axios and axios-retry. The process involves connecting to a WebSocket, locking it, fetching new messages from the /notifications endpoint using the last known notification ID, decrypting the messages, inserting them into IndexedDB, and finally unlocking the WebSocket to allow queued messages to be processed.

Identified Issue:

The root cause of the bug seems to be a race condition involving the /notifications API call. While an AbortController was in place to check if the WebSocket connection dropped, we never canceled the ongoing request to /notifications. This led to a situation where multiple API requests would fire if the network went offline and then came back online, as shown in the second attached screenshot. These multiple requests resulted in multiple sets of messages being processed simultaneously, causing inconsistencies.

Fix:

By implementing the abort functionality on the API call, we now ensure that the /notifications request is properly canceled when the WebSocket disconnects. This should prevent multiple API responses from arriving at different times and help resolve the race condition that was likely causing the issue.

Race Condition Explanation:

The issue arises when the WebSocket disconnects and reconnects, leading to multiple API calls being initiated to /notifications. If the WebSocket disconnects and reconnects multiple times, each reconnection initiates a new API call to fetch notifications since the last known notification ID. The problem occurs when these multiple API requests resolve at different times but carry overlapping or identical sets of messages (e.g., message IDs 0 to 500).

These overlapping messages are then decrypted and processed in separate loops, which can lead to duplicate processing, especially if two responses arrive at different times, leading to the same messages being processed twice.

Scenario:

  1. WebSocket connects and begins to receive notifications.
  2. An API call to /notifications is made to fetch new messages (IDs 0 to 500).
  3. The WebSocket disconnects unexpectedly, but the ongoing API call to /notifications continues and is not canceled.
  4. The WebSocket reconnects, and a new API call is made to /notifications for the same range (IDs 0 to 500).
  5. Now there are two pending API requests for the same range of messages.
  6. Both requests resolve at different times. The first request returns after 2 seconds, and the second returns after 4 seconds, both carrying messages from 0 to 500.
  7. Both sets of messages are processed in separate decryption loops, resulting in duplicate decryption for the same messages, which may lead to errors or inconsistencies in the message handling process.

Visualization of the Race Condition:

graph TD
    A(WebSocket Connects) --> B[/notifications API Call Started/]
    B --> C(WebSocket Disconnects)
    
    C --> D1(Ongoing API Call 1 Continues)
    C --> E(WebSocket Reconnects)
    E --> F[/notifications API Call 2 Started/]
    
    D1 --> G1(API Call 1 Resolves after 2s)
    F --> G2(API Call 2 Resolves after 4s)
    
    G1 --> H1[Messages 0-500 Arrive from Call 1]
    G2 --> H2[Messages 0-500 Arrive from Call 2]
    
    H1 --> I1{Start Decryption Loop for Call 1}
    H2 --> I2{Start Decryption Loop for Call 2}

    I1 --> J1[Processing Messages 0-500 from Call 1]
    I2 --> J2[Processing Messages 0-500 from Call 2]

    J1 --> K{Duplicate Message Processing?}
    J2 --> K{Duplicate Message Processing?}

    K --> L(Duplicate Decryption Error)
Loading

image

@thisisamir98
Copy link
Contributor Author

PR is blocked by jakearchibald/idb#327

@thisisamir98 thisisamir98 changed the title fix: Add abort controller to notifications api call (WPB-11013) fix(core, api-client): Add abort controller to notifications api call (WPB-11013) Oct 14, 2024
Copy link

sonarcloud bot commented Oct 14, 2024

@thisisamir98
Copy link
Contributor Author

For now we will continue with skipLibCheck: true because jakearchibald/idb#327 is not resolved yet

@thisisamir98 thisisamir98 merged commit 3a5a308 into main Oct 16, 2024
5 checks passed
@thisisamir98 thisisamir98 deleted the WPB-11013 branch October 16, 2024 09:29
@paulwire paulwire added the echoes: bugs Technical or functional defects in the product label Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
echoes: bugs Technical or functional defects in the product type: bug / fix 🐞
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants