Problem description
We've discovered a memory leak in Node.js applications using gRPC with xDS (@grpc/grpc-js v1.12.5 and @grpc/grpc-js-xds v1.12.1). Applications showed a consistent pattern of memory growth, increasing by approximately 300 MB within 24 hours of running and eventually crashing when their containers hit memory limits.
Reproduction steps
The issue occurred consistently when xDS was enabled.
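A minimal sketch of the kind of setup involved (service name is a placeholder, and this is not an exact repro: reaching an xDS control plane also requires a valid bootstrap config, e.g. via the GRPC_XDS_BOOTSTRAP environment variable):

```ts
import * as grpc from '@grpc/grpc-js';
import * as grpcJsXds from '@grpc/grpc-js-xds';

// register() installs the xds: resolver and load balancing policies.
grpcJsXds.register();

// Any long-lived client against an xds: target showed the memory growth.
const client = new grpc.Client(
  'xds:///my-service',
  grpc.credentials.createInsecure()
);
```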
Environment

OS name, version and architecture: Debian 4.19.208-1 x86_64 GNU/Linux
Node version: v20.17.0
Node installation method: nvm
Compiler version (if applicable): N/A
Package name and version: @grpc/grpc-js v1.12.5 and @grpc/grpc-js-xds v1.12.1
Additional context
We have recently started to use xDS with gRPC, and in our Node applications we're using the following packages:

@grpc/grpc-js: v1.12.5
@grpc/grpc-js-xds: v1.12.1
With xDS enabled, we noticed that all Node applications exhibit the same memory utilization pattern: within 24 hours of running, the memory footprint of the application increases by about 300 MB.
At some point the containers hit the max memory limit, crash and are recreated.
With xDS disabled, we do not observe this pattern. "Disabled" here means that the grpc-js-xds package is still loaded; however, the endpoint used for gRPC does not use the xDS protocol.
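An illustrative sketch of the distinction (hypothetical targets; only the URI scheme differs):

```ts
// xDS enabled: the channel resolves and load-balances via the xDS
// control plane, which can also start LRS load reporting.
const xdsClient = new grpc.Client(
  'xds:///my-service',
  grpc.credentials.createInsecure()
);

// "xDS disabled" in our sense: @grpc/grpc-js-xds is still loaded and
// registered, but a non-xDS scheme is used, so the xDS code paths
// (including LrsCallState) are never exercised.
const plainClient = new grpc.Client(
  'dns:///my-service.example.com:50051',
  grpc.credentials.createInsecure()
);
```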
Below is a chart of memory utilization for a particular Node application. In this example, xDS was turned on for the application just before 6:00 PM on January 9th. Over the following 24-hour period, memory utilization climbed abnormally. Just after 6:00 PM on January 10th, xDS was disabled and the containers were restarted; for the next 2+ days the application's memory profile was normal.
This is one particular Node application, but we observed this pattern on all Node applications where xDS was enabled.
After analyzing heap snapshots of affected application instances, we identified a large number of LrsCallState objects with a high total retained memory size (249 instances, with 631 MB retained in this example).
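For anyone trying to reproduce the analysis: one way to capture comparable snapshots is Node's built-in v8 module (our capture mechanism differed slightly; this is just a sketch):

```ts
import { writeHeapSnapshot } from 'v8';

// Writes a .heapsnapshot file that can be opened in Chrome DevTools'
// Memory tab; sorting by retained size is what surfaced LrsCallState.
const file = writeHeapSnapshot();
console.log(`heap snapshot written to ${file}`);
```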
After reviewing the implementation, we expected to see only one instance of the LrsCallState class, since it is held by the XdsSingleServerClient, of which we found only one instance:
https://github.com/grpc/grpc-node/blob/master/packages/grpc-js-xds/src/xds-client.ts#L822
After deeper review, we found that this lrsCallState attribute is unset here:
https://github.com/grpc/grpc-node/blob/master/packages/grpc-js-xds/src/xds-client.ts#L941
And then recreated here:
https://github.com/grpc/grpc-node/blob/master/packages/grpc-js-xds/src/xds-client.ts#L956
Normally, an unset instance with no remaining references would be garbage collected; however, LrsCallState holds a NodeJS.Timeout that is created with a setInterval call here:
https://github.com/grpc/grpc-node/blob/master/packages/grpc-js-xds/src/xds-client.ts#L719
When an instance is unset, its statsTimer is not cleared, so the interval keeps running and remains referenced from the event loop's list of active timers. Because the timer's callback holds a back-reference to the LrsCallState instance, the instance and its associated resources are never collected, resulting in a memory leak.
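The mechanism reduces to the following self-contained sketch (hypothetical names; LeakyCallState stands in for LrsCallState):

```ts
class LeakyCallState {
  // Stand-in for the stats and stream state LrsCallState retains.
  private stats = new Array(1_000_000).fill(0);
  private statsTimer: NodeJS.Timeout;

  constructor() {
    // The interval callback closes over `this`, so the event loop's
    // active-timer list holds a reference to this instance.
    this.statsTimer = setInterval(() => this.sendStats(), 1000);
  }

  private sendStats() {
    // Stand-in for LRS load reporting.
    void this.stats;
  }
}

let state: LeakyCallState | null = new LeakyCallState();
// Stand-in for the unset/recreate cycle in xds-client.ts: the old
// instance becomes unreachable from our code, but its timer is never
// cleared, so it can never be garbage collected.
state = null;
state = new LeakyCallState();
```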
We created a patched version of the grpc-js-xds package, with the changes in this draft PR:
#2891
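Applied to the reduction above, the fix amounts to clearing the interval before the reference is dropped (simplified sketch with approximate names; the actual change is in the PR):

```ts
class FixedCallState {
  private statsTimer: NodeJS.Timeout;

  constructor() {
    this.statsTimer = setInterval(() => this.sendStats(), 1000);
  }

  private sendStats() { /* stand-in for LRS load reporting */ }

  destroy() {
    // Clearing the interval removes the timer's back-reference, so the
    // instance becomes collectable once our own reference is dropped.
    clearInterval(this.statsTimer);
  }
}

let state: FixedCallState | null = new FixedCallState();
state.destroy(); // the patch clears the timer at the point the state is unset
state = new FixedCallState();
```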
With the patched xDS client, after 3 full days we see memory utilization tracking much closer to the normal profile for our Node applications.