feat: Fix GC pressure when fetching batches from JVM #2662

andygrove · 2025-10-29T15:46:45Z

Which issue does this PR close?

Rationale for this change

The current approach to passing batches from the JVM to native code creates GC pressure because the JVM retains ArrowArray and ArrowSchema wrapper objects for each batch that is being accumulated on the native side in operators such as SortExec and ShuffleWriterExec.

What changes are included in this PR?

Always take a deep copy in ScanExec so that JVM resources can be garbage-collected
Add FFI guide to contributors guide
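To illustrate the rationale above, here is a minimal toy model in Python. It is not Comet code: `JvmBatchWrapper`, `NativeBatch`, `import_zero_copy`, and `import_deep_copy` are hypothetical names invented for this sketch. It assumes that a zero-copy import keeps a reference back to the JVM-side wrapper, while a deep copy does not, and it counts how many wrappers are still alive while an operator accumulates batches.

```python
import gc
import weakref


class JvmBatchWrapper:
    """Hypothetical stand-in for the JVM-side ArrowArray/ArrowSchema wrapper of one batch."""

    def __init__(self, values):
        self.values = list(values)


class NativeBatch:
    """Hypothetical stand-in for a batch held by a native operator."""

    def __init__(self, values, owner=None):
        self.values = values
        self.owner = owner  # non-None means the JVM wrapper must stay alive


def import_zero_copy(wrapper):
    # Shares the wrapper's buffer, so the wrapper is pinned for as long as the batch lives.
    return NativeBatch(wrapper.values, owner=wrapper)


def import_deep_copy(wrapper):
    # Copies the data into memory owned by the consumer; nothing refers back to the wrapper.
    return NativeBatch(list(wrapper.values), owner=None)


def live_wrappers_after_accumulating(import_fn, num_batches):
    """Accumulate batches as a sort or shuffle writer would and count wrappers still alive."""
    accumulated = []
    tracked = []
    for i in range(num_batches):
        wrapper = JvmBatchWrapper(range(i, i + 4))
        tracked.append(weakref.ref(wrapper))
        accumulated.append(import_fn(wrapper))
        del wrapper  # the producer is done with this batch
    gc.collect()
    # `accumulated` is still in scope here, mimicking an operator that has not emitted output yet.
    return sum(1 for ref in tracked if ref() is not None)


if __name__ == "__main__":
    print("zero-copy:", live_wrappers_after_accumulating(import_zero_copy, 100))
    print("deep copy:", live_wrappers_after_accumulating(import_deep_copy, 100))
```

In this model every wrapper stays alive under zero-copy import until the accumulating operator lets go of its batches, while under deep copy each wrapper becomes collectable as soon as its batch has been imported. The trade-off, which the sketch does not measure, is the cost of the extra copy.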

How are these changes tested?

codecov-commenter · 2025-10-29T16:08:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.19%. Comparing base (f09f8af) to head (34baeed).
⚠️ Report is 650 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2662      +/-   ##
============================================
+ Coverage     56.12%   59.19%   +3.07%     
- Complexity      976     1447     +471     
============================================
  Files           119      147      +28     
  Lines         11743    13747    +2004     
  Branches       2251     2361     +110     
============================================
+ Hits           6591     8138    +1547     
- Misses         4012     4387     +375     
- Partials       1140     1222      +82


andygrove · 2025-10-29T18:04:01Z

@EmilyMatt fyi

andygrove · 2025-10-29T19:19:33Z

I am moving this to draft until #2664 is complete

mbutrovich · 2025-10-30T00:44:29Z

I know you're quite busy @andygrove, but is it possible to use your awesome memory profiling feature to see if there's a meaningful difference with this PR?

andygrove · 2025-10-30T14:42:23Z

I know you're quite busy @andygrove, but is it possible to use your awesome memory profiling feature to see if there's a meaningful difference with this PR?

Yes, I'll focus on this today. I'm also considering moving the documentation into a separate PR. Another consideration is whether to add yet another config option to choose the behavior here, but I am wary of adding more configs that we have to explain.

andygrove · 2025-10-30T15:49:47Z

I moved the docs to a new PR: #2668

andygrove · 2025-10-31T13:48:01Z

@EmilyMatt Do you have any tips for finding a good repro for the GC pressure issue? I am trying to reproduce this locally so that I can demonstrate the benefit.

EmilyMatt · 2025-11-02T07:44:13Z

@EmilyMatt Do you have any tips for finding a good repro for the GC pressure issue? I am trying to reproduce this locally so that I can demonstrate the benefit.

Unfortunately I was also unable to reproduce this locally.
The images I sent previously were saved on my machine from a while back^^
I do have the following pointers:

Use multiple sequential scan operators feeding something that ends with a loop that fully consumes its input (e.g., IcebergCompat -> Union -> Shuffle Write).
Use a lot of data with a lot of RAM, but few CPU cores.
Use an unbounded memory pool. I think this issue is more prevalent without spilling, because the operators will accumulate a lot of data without returning.

andygrove · 2025-11-03T19:01:33Z

@EmilyMatt Do you have any tips for finding a good repro for the GC pressure issue? I am trying to reproduce this locally so that I can demonstrate the benefit.

Unfortunately I was also unable to reproduce this locally. The images I sent previously were saved on my machine from a while back^^ I do have the following pointers:

Use multiple sequential scan operators feeding something that ends with a loop that fully consumes its input (e.g., IcebergCompat -> Union -> Shuffle Write).

Use a lot of data with a lot of RAM, but few CPU cores.

Use an unbounded memory pool. I think this issue is more prevalent without spilling, because the operators will accumulate a lot of data without returning.

Thanks @EmilyMatt.

Yes, with the unified pool, we will spill to disk and that will release the JVM wrapper objects, so maybe this is not an issue now. Thanks for helping me understand the issue. This has resulted in improved documentation in the contributor guide that explains this issue.
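As a rough sketch of why spilling bounds the problem, the toy model below counts the peak number of JVM wrappers pinned by an accumulating operator. It is illustrative only: `peak_retained_wrappers` and its parameters are hypothetical, and it assumes that a spill drops every buffered batch at once, which real operators may not do.

```python
def peak_retained_wrappers(num_batches, batch_bytes, pool_limit_bytes=None):
    """Return the peak number of wrappers pinned by an accumulating operator.

    pool_limit_bytes=None models an unbounded pool, which never spills.
    """
    held = 0
    held_bytes = 0
    peak = 0
    for _ in range(num_batches):
        if pool_limit_bytes is not None and held_bytes + batch_bytes > pool_limit_bytes:
            # Spill: buffered batches are written out and dropped, releasing their wrappers.
            held = 0
            held_bytes = 0
        held += 1
        held_bytes += batch_bytes
        peak = max(peak, held)
    return peak


if __name__ == "__main__":
    print("unbounded pool:", peak_retained_wrappers(1000, 8))
    print("bounded pool:", peak_retained_wrappers(1000, 8, pool_limit_bytes=64))
```

With an unbounded pool the peak grows with the number of batches; with a limit it is capped at roughly the limit divided by the batch size, which matches the observation that the issue is more prevalent without spilling.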

https://datafusion.apache.org/comet/contributor-guide/ffi.html

I will close this PR and will close issue #2661 but feel free to reopen if this is still an issue for you.

andygrove added 3 commits October 29, 2025 09:38

Add config for forcing deep copy

ad50c7f

docs

54fd64d

fix

6b36701

andygrove mentioned this pull request Oct 29, 2025

chore: Remove CopyExec [WIP] #2639

Closed

Add draft FFI docs

635b4d3

andygrove marked this pull request as draft October 29, 2025 19:05

format

4e33fc6

andygrove added 7 commits October 29, 2025 13:28

update

6d2405e

fix

1e3648d

Merge branch 'ffi-guide' of github.com:andygrove/datafusion-comet int…

80ebc8b

…o ffi-guide

merge

c246779

prep for review

e4a03c2

Merge branch 'ffi-guide' into deep-copy-config

b3a6c17

prep for review

8b19df1

andygrove changed the title from "feat: Add config for forcing deep copy in native ScanExec" to "feat: Fix GC pressure when fetching batches from JVM" Oct 29, 2025

andygrove added 4 commits October 29, 2025 14:01

prep for review

98f8c93

prep for review

2615042

prep for review

8bf8afa

prep for review

18e27a5

andygrove marked this pull request as ready for review October 29, 2025 20:05

andygrove marked this pull request as draft October 30, 2025 14:41

andygrove added 2 commits October 30, 2025 09:40

revert docs

ad08eca

Revert

34baeed

andygrove closed this Nov 3, 2025

andygrove deleted the deep-copy-config branch November 3, 2025 21:12
