
Conversation

@tschatzl (Contributor) commented Oct 22, 2025

Hi all,

please review these changes that implement the `UseGCOverheadLimit` functionality for G1 (and make the Parallel GC implementation produce similar output).

The `UseGCOverheadLimit` feature makes a GC prematurely return `null` for the triggering allocation if both the GC CPU usage limit and the heap usage limit have been exceeded for some time. This avoids the VM limping along in an endless cycle of garbage collections until a "real" OOME would eventually be thrown.
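As a minimal sketch of the condition being checked, assuming the thresholds exposed by the existing `GCTimeLimit` (default 98) and `GCHeapFreeLimit` (default 2) flags; the helper itself is hypothetical, not the actual HotSpot code:

```c++
// Hypothetical sketch, not the actual HotSpot code. Both limits must be
// violated at once: the VM spends nearly all of its CPU time in GC *and*
// the collections recover almost no memory.
static bool gc_overhead_limit_exceeded(double gc_cpu_usage_percent,
                                       double free_heap_percent) {
  const double gc_time_limit   = 98.0; // -XX:GCTimeLimit, in percent
  const double heap_free_limit = 2.0;  // -XX:GCHeapFreeLimit, in percent
  return gc_cpu_usage_percent > gc_time_limit &&
         free_heap_percent < heap_free_limit;
}
```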

What is important here is how this works (derived from the Parallel GC implementation):

Check overheads at the end of the (initial) garbage collection (before upgrading) to see whether we have been over the limits for a number of consecutive GCs. If so, keep doing GCs without actually allocating memory for the allocation request, to keep on measuring GC CPU usage. Measuring the correct CPU usage is important in case the application is able to free memory on the OOME: otherwise GC CPU usage goes down again, the threshold might not be exceeded at the next check, and the counter of consecutive over-limit GCs is reset. In that case we may ping-pong between exceeding and not exceeding the CPU usage limit all the time.
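A rough sketch of that bookkeeping, assuming a consecutive-GC threshold like Parallel's `AdaptiveSizePolicyGCTimeLimitThreshold` (default 5); names and structure are illustrative, not the actual G1 or Parallel code:

```c++
// Illustrative sketch only. should_fail_allocation() would be called at
// the end of each (initial, pre-upgrade) collection with the result of
// the overhead check for that GC.
class GCOverheadChecker {
  unsigned _consecutive_exceeded;
  static const unsigned Threshold = 5; // cf. AdaptiveSizePolicyGCTimeLimitThreshold

public:
  GCOverheadChecker() : _consecutive_exceeded(0) { }

  // Returns true if the allocation request should fail with null so that
  // an OOME reaches the application while GCs keep running and GC CPU
  // usage keeps being measured.
  bool should_fail_allocation(bool limits_exceeded_this_gc) {
    if (limits_exceeded_this_gc) {
      _consecutive_exceeded++;
    } else {
      // A single collection back under the limits resets the count; this
      // is why CPU usage must keep being measured while allocations are
      // failed, to avoid the ping-pong described above.
      _consecutive_exceeded = 0;
    }
    return _consecutive_exceeded >= Threshold;
  }
};
```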

I do not have an opinion on whether it is a good idea for the application to try to handle OOMEs and recover after the overhead has been determined to be too large for quite some time; however, this seems to be the only valid reason why Parallel does not simply always return `null` after any subsequent GC once the overhead limit has been exceeded for the first time.

Note that G1 and Parallel measure CPU time differently (e.g. G1 uses the last 10 GCs and takes concurrent work into account, while Parallel uses the last 32 GCs as "long term cpu usage"). The memory usage calculation also differs, so the overall sensitivity is different. There is nothing that can be done about this, imo.
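To make the sensitivity difference concrete, a sketch of such a windowed "long term cpu usage" average; the window sizes (10 for G1, 32 for Parallel) come from the paragraph above, everything else is illustrative:

```c++
#include <cstddef>
#include <deque>
#include <numeric>

// Illustrative moving window over per-GC CPU usage samples; not the
// actual HotSpot data structure.
class GCCpuUsageWindow {
  std::deque<double> _samples;
  const std::size_t _window; // e.g. 10 for G1, 32 for Parallel

public:
  explicit GCCpuUsageWindow(std::size_t window) : _window(window) { }

  void add_sample(double gc_cpu_percent) {
    _samples.push_back(gc_cpu_percent);
    if (_samples.size() > _window) {
      _samples.pop_front(); // forget the oldest GC
    }
  }

  // Average over the last N GCs; the larger Parallel window reacts more
  // slowly to changes in GC overhead than the smaller G1 one.
  double average() const {
    if (_samples.empty()) return 0.0;
    return std::accumulate(_samples.begin(), _samples.end(), 0.0) /
           (double)_samples.size();
  }
};
```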

Testing: tier1-5 without any new OOMEs due to this feature (it is enabled by default in release builds, as in Parallel); test case

Thanks,
Thomas


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Integration blocker

 ⚠️ Dependency #27932 must be integrated first

Issue

  • JDK-8212084: Implement UseGCOverheadLimit for G1 (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27936/head:pull/27936
$ git checkout pull/27936

Update a local copy of the PR:
$ git checkout pull/27936
$ git pull https://git.openjdk.org/jdk.git pull/27936/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27936

View PR using the GUI difftool:
$ git pr show -t 27936

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27936.diff

@bridgekeeper bot commented Oct 22, 2025

👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into pr/27932 will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk bot commented Oct 22, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk bot changed the title from "8212084" to "8212084: Implement UseGCOverheadLimit for G1" on Oct 22, 2025
@openjdk bot commented Oct 22, 2025

@tschatzl The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@tschatzl force-pushed the submit/8212084-usegcoverheadlimit branch from e9d8963 to 389e5a2 on October 23, 2025 07:09
@tschatzl closed this Oct 23, 2025
