Resource leak when using IPC subscription #589
Labels: bug, investigating, p2 (standard priority)
Describe the bug
We have detected what appears to be a resource leak in the Greengrass nucleus related to IPC subscriptions.
When the Python code below is run in a Greengrass component, repeatedly creating an IPC client, setting up a topic subscription, and then closing the subscription and the client, the underlying resources do not appear to be freed.
Expected Behavior
Resources in the Greengrass nucleus should be released when the client is closed.
Current Behavior
The heap of the Greengrass Java process is eventually exhausted.
Reproduction Steps
If the Greengrass heap is limited to around 100 MB, memory is exhausted after about 15-20 minutes of running the snippet in a Greengrass component, which we can observe by enabling the JVM's Native Memory Tracking feature.
The snippet is the most compact reproduction we found of the problem we see on our production devices. There, the leak takes several weeks to manifest, since we create clients far less frequently than the snippet does.
Analyzing JVM memory dumps identifies com.aws.greengrass.builtin.services.pubsub.PubSubIPCEventStreamAgent as the object retaining almost all memory. Digging into the references of this class reveals hundreds of thousands of java.util.concurrent.ConcurrentHashMap$Node objects, which in turn reference com.aws.greengrass.builtin.services.pubsub.SubscriptionTrie. This class appears to hold the topic name of each generated subscription.
Studying the IPC documentation, I can't see anything obviously wrong with our snippet: both the Greengrass IPC client and the subscription operation are closed. Shouldn't this free all resources?
I raised this issue with the Nucleus team, but they say it must be a problem in the Python SDK, since the client is not disconnected: aws-greengrass/aws-greengrass-nucleus#1650
Possible Solution
No response
Additional Information/Context
I have also tried a version of the snippet in which the client is created once, outside the loop, and I see the same behavior.
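A sketch of that shared-client variant, under the same assumptions about the awsiotsdk GreengrassCoreIPCClientV2 interface (subscribe_to_topic returning a (response, operation) tuple, operation.close() returning a Future). It is not the reporter's code, and churn_on_shared_client is a hypothetical name. Only the subscription is opened and closed per iteration; the single client is closed once at the end.

```python
import concurrent.futures
from typing import Any


def churn_on_shared_client(
    client: Any, topic: str, iterations: int, close_timeout: float = 10.0
) -> int:
    """Subscribe and unsubscribe repeatedly on one long-lived client.

    Returns how many subscriptions were opened (and closed).
    """
    opened = 0
    try:
        for _ in range(iterations):
            _, operation = client.subscribe_to_topic(
                topic=topic,
                on_stream_event=lambda event: None,  # messages are ignored
            )
            opened += 1
            future = operation.close()
            # Wait for the close to finish before opening the next stream.
            if isinstance(future, concurrent.futures.Future):
                future.result(timeout=close_timeout)
    finally:
        # The one shared client is closed only after the loop ends.
        client.close()
    return opened
```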
SDK version used
1.19.0
Environment details (OS name and version, etc.)
Linux 5.15.61-v8+ #1579 SMP PREEMPT 2022 aarch64 GNU/Linux