
Fix root scanning #165

Merged: 9 commits merged into master from fix-root-scanning on Jul 7, 2022
Conversation

@wenyuzhao (Member) commented Jun 28, 2022:

  • Remove incorrect MMTkRootScanWorkScope
  • Optimize CodeCache roots scanning
  • Remove unused compute_*_roots functions

Performance comparison

Machine: Shrew (Core i9-9900K, Coffee Lake, 8 cores / 16 threads)

Immix

http://squirrel.anu.edu.au/plotty-public/wenyuz/v8/p/3k2Ucr

[Screenshot: Immix performance results, 2022-06-30 11:03 am]

MarkCompact

http://squirrel.anu.edu.au/plotty-public/wenyuz/v8/p/WjS3eX

[Screenshot: MarkCompact performance results, 2022-06-30 11:03 am]

@k-sareen (Collaborator) commented:

Wow! Nice work! 40% improvement on time.stw for Immix. Weird that MarkCompact is a bit slower for some benchmarks, though. I've also noticed that this PR is quite similar to #141. Should we consolidate them, or try to merge #141 after this one? I will be able to properly test this PR after I'm done with my MPLR deadline (the actual deadline is 15th July, but ideally I'll be done before the ~10th). I apologize for the delay.

@wenyuzhao (Member, Author) commented:

> Wow! Nice work! 40% improvement on time.stw for Immix. Weird that MarkCompact is a bit slower for some benchmarks, though. I've also noticed that this PR is quite similar to #141. Should we consolidate them, or try to merge #141 after this one? I will be able to properly test this PR after I'm done with my MPLR deadline (the actual deadline is 15th July, but ideally I'll be done before the ~10th). I apologize for the delay.

I think the two PRs are different, although they share some common changes to the CodeCache. This PR optimizes CodeCache scanning and does not affect correctness, while #141 fixes a memory leak caused by incorrect root scanning. Also, when I was working on #141, I hadn't yet figured out how to scan the CodeCache efficiently.

After this PR, I will continue working on #141. It still has some difficulty with MarkCompact at the moment...

@qinsoon (Member) left a comment:

Very impressive performance improvement. I have some questions about the code, especially about the thread-local NMETHOD_SLOTS. Also, with the merging of #163, this PR will need to adapt to those changes.

openjdk/mmtkHeap.cpp (review thread resolved)

static CODE_CACHE_ROOTS: spin::Lazy<Mutex<HashMap<Address, Vec<Address>>>> =
    spin::Lazy::new(|| Mutex::new(HashMap::new()));
static CODE_CACHE_ROOTS_SIZE: AtomicUsize = AtomicUsize::new(0);
Member:
I don't think we need this as a separate global variable. Whenever you change this SIZE, you have to acquire a lock to change the CODE_CACHE_ROOTS. And in the only place where you read the SIZE, you also acquire a lock to read the CODE_CACHE_ROOTS.

Member:

Also, in mmtk_register_nmethod(), you modify the SIZE before acquiring the lock on the ROOTS, which can make the SIZE and the ROOTS inconsistent.

Member Author:

Yes, SIZE is not strictly necessary; it's just an optimization. Because CODE_CACHE_ROOTS is a collection of Vec<Address>s, at root-scanning time these vectors have to be merged into a single buffer. Recording the total SIZE ahead of time and using it to allocate the merged buffer may prevent frequent vector reallocation.

Consistency is ensured by separating the writes from the reads: SIZE is only updated during mutator phases, and it is only read during pauses.


Apart from tracking SIZE, I also tried putting each vector in the cached roots (HashMap<Address, Vec<Address>>) into a separate work packet. Doing so caused a slight performance drop, since it can produce too many vector clones and too many work packets, and most of the vectors are small.
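
As a rough sketch of the merge-into-one-buffer approach described above (the function, its signature, and the Address alias are illustrative stand-ins, not this PR's actual code):

use std::collections::HashMap;

type Address = usize; // stand-in for MMTk's Address type, just for this sketch

// Hypothetical pause-time merge: `total` is the pre-computed CODE_CACHE_ROOTS_SIZE
// value, and `roots` holds the per-nmethod slot vectors from CODE_CACHE_ROOTS.
fn merge_code_cache_roots(roots: &HashMap<Address, Vec<Address>>, total: usize) -> Vec<Address> {
    // Allocate the merged buffer once with the final capacity, so appending
    // the cached vectors never triggers a reallocation.
    let mut merged = Vec::with_capacity(total);
    for slots in roots.values() {
        merged.extend_from_slice(slots);
    }
    merged
}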

Member:

> Yes, SIZE is not strictly necessary; it's just an optimization. Because CODE_CACHE_ROOTS is a collection of Vec<Address>s, at root-scanning time these vectors have to be merged into a single buffer. Recording the total SIZE ahead of time and using it to allocate the merged buffer may prevent frequent vector reallocation.

You can just calculate the total size of the Vec<Address>s before creating the merged vec:

let code_cache_roots = CODE_CACHE_ROOTS.lock().unwrap();
// Sum the lengths first so the merged vec is allocated once.
let size: usize = code_cache_roots.values().map(|vec| vec.len()).sum();
let mut vec = Vec::with_capacity(size);
for roots in code_cache_roots.values() {
    for r in roots {
        vec.push(*r);
    }
}

My point is that if you have to acquire the lock on CODE_CACHE_ROOTS whenever you access CODE_CACHE_ROOTS_SIZE, then there is little point in having a separate count.

Member Author:

I'm afraid calculating the size in a loop may slow down GC time. If I can compute the total size cheaply ahead of time, simply using the pre-calculated size here is the fastest option.

This makes me realize that if I move the SIZE increments to after acquiring the lock, I can use relaxed rather than sequentially consistent atomic increments. This solves the consistency issue and makes the increments cheaper, while still keeping a precise pre-computed size counter.
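
As a rough illustration of that fix (the function name and parameters below are hypothetical stand-ins, not the real mmtk_register_nmethod):

use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

type Address = usize; // stand-in for MMTk's Address type, just for this sketch

// Hypothetical registration path: the counter is only updated while the roots
// lock is held, so a pause-time reader that takes the same lock always sees a
// SIZE that matches the map's contents.
fn register_nmethod_slots(
    roots: &Mutex<HashMap<Address, Vec<Address>>>,
    total_size: &AtomicUsize,
    nm: Address,
    slots: Vec<Address>,
) {
    let mut map = roots.lock().unwrap();
    // Relaxed ordering is sufficient: the mutex already orders this update with
    // respect to any reader that acquires the same lock.
    total_size.fetch_add(slots.len(), Ordering::Relaxed);
    map.insert(nm, slots);
}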

Contributor:

If you always access CODE_CACHE_ROOTS_SIZE while holding the lock on CODE_CACHE_ROOTS, you can merge them into one Mutex<T>, like this:

struct CodeCacheRoots {
    hash_map: HashMap<Address, Vec<Address>>,
    size: usize,
}

lazy_static! {
    static ref CODE_CACHE_ROOTS: Mutex<CodeCacheRoots> = Mutex::new(CodeCacheRoots {
        hash_map: HashMap::new(),
        size: 0,
    });
}

At the use site, you lock the mutex and get access to both hash_map and size:

{
    let mut lock_guard = CODE_CACHE_ROOTS.lock().unwrap();
    // The guard dereferences automatically, so the field is updated directly.
    lock_guard.size += slots.len();
    lock_guard.hash_map.insert(nm, slots);
}
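
The pause-time read side could then look something like the following sketch (assuming the CodeCacheRoots declaration above; Address is treated as a Copy type):

// Sketch of the pause-time read side, assuming the CodeCacheRoots struct and the
// CODE_CACHE_ROOTS static from the suggestion above.
fn merged_code_cache_roots() -> Vec<Address> {
    let lock_guard = CODE_CACHE_ROOTS.lock().unwrap();
    // `size` always matches `hash_map` because both are only updated under this lock.
    let mut merged = Vec::with_capacity(lock_guard.size);
    for slots in lock_guard.hash_map.values() {
        merged.extend_from_slice(slots);
    }
    merged
}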

mmtk/src/lib.rs (review thread resolved, outdated)
mmtk/src/api.rs (review thread resolved)
@qinsoon (Member) left a comment:

LGTM

mmtk/Cargo.toml (review thread resolved, outdated)
@k-sareen merged commit 7a99b72 into master on Jul 7, 2022
@k-sareen deleted the fix-root-scanning branch on Apr 27, 2023