-
Notifications
You must be signed in to change notification settings - Fork 42
Description
Description
WSClient.reConnect() in @larksuiteoapi/node-sdk@1.59.0 leaks setTimeout handles, causing unbounded parallel reconnect loops and memory growth over time.
Root Cause
In WSClient.reConnect(), the loopReConnect recursive function stores its setTimeout ID in this.reconnectInterval:
this.reconnectInterval = setTimeout(() => {
loopReConnect.bind(this)(count);
}, reconnectInterval);When reConnect(isStart=true) is called externally (e.g. by a health monitor), it attempts to clear the previous loop:
if (this.reconnectInterval) {
clearTimeout(this.reconnectInterval);
}However, this.reconnectInterval only holds the last setTimeout ID. If loopReConnect has recursed multiple times, all prior timer handles are overwritten and can never be cleared. The old loop continues running independently.
Each external restart thus spawns a new infinite reconnect loop without fully canceling the old one.
Impact
After running for 2-3 days with periodic external restarts (every ~35 minutes), we observed:
- 62,000+
ws connect failedlog entries over 3 days - Dozens of parallel
loopReConnectloops running concurrently (visible from different retry counters: 66, 364, 1446, 1241, etc. appearing simultaneously) - Memory growth from ~750MB to 1.9GB peak (RSS)
- Unnecessary CPU and network overhead from futile reconnect attempts
The actual WebSocket connection established via the isStart=true path works fine — all leaked loops are from orphaned reconnect chains.
Reproduction
- Create a
WSClientwith default config (reconnectCount: -1, i.e. infinite) - Let it connect successfully
- Externally call
reConnect(isStart=true)multiple times (simulating a health-monitor restart) - Observe that old
loopReConnectloops are not canceled and continue running in parallel
Suggested Fix
Option A: Use an AbortController or a monotonically increasing generation counter. Before starting a new reconnect loop, increment the counter and have each loopReConnect iteration check if its generation is still current — if not, bail out.
Option B: Track all pending timer IDs (e.g. in a Set) and clear them all in reConnect(isStart=true).
Example (Option A):
// In reConnect(isStart=true):
this.reconnectGeneration = (this.reconnectGeneration || 0) + 1;
const currentGeneration = this.reconnectGeneration;
// In loopReConnect, before each iteration:
if (this.reconnectGeneration !== currentGeneration) {
return; // Stale loop, exit
}Environment
@larksuiteoapi/node-sdk: 1.59.0- Node.js: v22 (Linux x86_64)
- OS: Ubuntu 24.04