-
Notifications
You must be signed in to change notification settings - Fork 248
feat: Fix GC pressure when fetching batches from JVM #2662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2662 +/- ##
============================================
+ Coverage 56.12% 59.19% +3.07%
- Complexity 976 1447 +471
============================================
Files 119 147 +28
Lines 11743 13747 +2004
Branches 2251 2361 +110
============================================
+ Hits 6591 8138 +1547
- Misses 4012 4387 +375
- Partials 1140 1222 +82 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@EmilyMatt fyi |
|
I am moving this to draft until #2664 is complete |
ScanExec|
I know you're quite busy @andygrove, but is it possible to use your awesome memory profiling feature to see if there's a meaningful difference with this PR? |
Yes, I'll focus on this today. I'm also considering moving the documentation into a separate PR. Another consideration is whether to add yet another config option to choose the behavior here, but I am wary of adding more configs that we have to explain. |
|
I moved the docs to a new PR: #2668 |
|
@EmilyMatt Do you have any tips for finding a good repro for the GC pressure issue? I am trying to reproduce this locally so that I can demonstrate the benefit. |
Unfortunately I was also unable to reproduce this locally.
|
Thanks @EmilyMatt. Yes, with the unified pool, we will spill to disk and that will release the JVM wrapper objects, so maybe this is not an issue now. Thanks for helping me understand the issue. This has resulted in improved documentation in the contributor guide that explains this issue. https://datafusion.apache.org/comet/contributor-guide/ffi.html I will close this PR and will close issue #2661 but feel free to reopen if this is still an issue for you. |
Which issue does this PR close?
Closes #2661
Rationale for this change
The current approach to passing batches from the JVM to native code creates GC pressure because the JVM retains
ArrowArrayandArrowSchemawrapper objects for each batch that is being accumulated on the native side in operators such asSortExecandShuffleWriterExec.What changes are included in this PR?
ScanExecto that JVM resources can be gargage-collectedHow are these changes tested?