Skip to content

WSClient.reConnect: timer leak causes unbounded reconnect loops and memory growth #177

@noodle-bag

Description

@noodle-bag

Description

WSClient.reConnect() in @larksuiteoapi/node-sdk@1.59.0 leaks setTimeout handles, causing unbounded parallel reconnect loops and memory growth over time.

Root Cause

In WSClient.reConnect(), the loopReConnect recursive function stores its setTimeout ID in this.reconnectInterval:

this.reconnectInterval = setTimeout(() => {
    loopReConnect.bind(this)(count);
}, reconnectInterval);

When reConnect(isStart=true) is called externally (e.g. by a health monitor), it attempts to clear the previous loop:

if (this.reconnectInterval) {
    clearTimeout(this.reconnectInterval);
}

However, this.reconnectInterval only holds the last setTimeout ID. If loopReConnect has recursed multiple times, all prior timer handles are overwritten and can never be cleared. The old loop continues running independently.

Each external restart thus spawns a new infinite reconnect loop without fully canceling the old one.

Impact

After running for 2-3 days with periodic external restarts (every ~35 minutes), we observed:

  • 62,000+ ws connect failed log entries over 3 days
  • Dozens of parallel loopReConnect loops running concurrently (visible from different retry counters: 66, 364, 1446, 1241, etc. appearing simultaneously)
  • Memory growth from ~750MB to 1.9GB peak (RSS)
  • Unnecessary CPU and network overhead from futile reconnect attempts

The actual WebSocket connection established via the isStart=true path works fine — all leaked loops are from orphaned reconnect chains.

Reproduction

  1. Create a WSClient with default config (reconnectCount: -1, i.e. infinite)
  2. Let it connect successfully
  3. Externally call reConnect(isStart=true) multiple times (simulating a health-monitor restart)
  4. Observe that old loopReConnect loops are not canceled and continue running in parallel

Suggested Fix

Option A: Use an AbortController or a monotonically increasing generation counter. Before starting a new reconnect loop, increment the counter and have each loopReConnect iteration check if its generation is still current — if not, bail out.

Option B: Track all pending timer IDs (e.g. in a Set) and clear them all in reConnect(isStart=true).

Example (Option A):

// In reConnect(isStart=true):
this.reconnectGeneration = (this.reconnectGeneration || 0) + 1;
const currentGeneration = this.reconnectGeneration;

// In loopReConnect, before each iteration:
if (this.reconnectGeneration !== currentGeneration) {
    return; // Stale loop, exit
}

Environment

  • @larksuiteoapi/node-sdk: 1.59.0
  • Node.js: v22 (Linux x86_64)
  • OS: Ubuntu 24.04

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions