
Fix root scanning #165

Merged: 9 commits merged into master from fix-root-scanning on Jul 7, 2022
Conversation

@wenyuzhao (Member) commented Jun 28, 2022:

  • Remove incorrect MMTkRootScanWorkScope
  • Optimize CodeCache roots scanning
  • Remove unused compute_*_roots functions

Performance comparison

Machine: Shrew (Core i9-9900K, Coffee Lake, 8 cores / 16 threads)

Immix

http://squirrel.anu.edu.au/plotty-public/wenyuz/v8/p/3k2Ucr

[Screenshot: Immix performance results, 2022-06-30 11:03 am]

MarkCompact

http://squirrel.anu.edu.au/plotty-public/wenyuz/v8/p/WjS3eX

[Screenshot: MarkCompact performance results, 2022-06-30 11:03 am]

@k-sareen (Collaborator) commented:

Wow! Nice work! 40% improvement on time.stw for Immix. Weird that MarkCompact is a bit slower for some benchmarks, though. I've also noticed that this PR is quite similar to #141. Should we consolidate them, or try to merge #141 after this one? I will be able to properly test this PR after I'm done with my MPLR deadline (the actual deadline is 15th July, but ideally I'll be done before the ~10th). I apologize for the delay.

@wenyuzhao (Member, Author) commented:

> Wow! Nice work! 40% improvement on time.stw for Immix. Weird that MarkCompact is a bit slower for some benchmarks, though. I've also noticed that this PR is quite similar to #141. Should we consolidate them, or try to merge #141 after this one? I will be able to properly test this PR after I'm done with my MPLR deadline (the actual deadline is 15th July, but ideally I'll be done before the ~10th). I apologize for the delay.

I think the two PRs are different, although they share some common changes to the CodeCache. This PR optimizes CodeCache scanning and does not affect correctness, while #141 fixes a memory leak caused by incorrect root scanning. Also, when I was working on #141, I hadn't yet figured out how to scan the CodeCache efficiently.

After this PR, I will continue working on #141. It still has some difficulty with MarkCompact at the moment...

@qinsoon (Member) left a comment:

Very impressive performance improvement. I have some questions about the code, especially about the thread-local NMETHOD_SLOTS. Also, with the merging of #163, this PR will need to adapt to those changes.

openjdk/mmtkHeap.cpp (review thread resolved)

static CODE_CACHE_ROOTS: spin::Lazy<Mutex<HashMap<Address, Vec<Address>>>> =
    spin::Lazy::new(|| Mutex::new(HashMap::new()));
static CODE_CACHE_ROOTS_SIZE: AtomicUsize = AtomicUsize::new(0);
Member:
I don't think we need this as a separate global variable. Whenever you change this SIZE, you have to acquire a lock to change the CODE_CACHE_ROOTS. And in the only place where you read the SIZE, you also acquire a lock to read the CODE_CACHE_ROOTS.

Member:

Also, in mmtk_register_nmethod(), you modify the SIZE before acquiring the lock on the ROOTS, which can make the SIZE and the ROOTS inconsistent.

Member Author:

Yes, SIZE is not strictly necessary; it's just an optimization. Because CODE_CACHE_ROOTS is a collection of Vec<Address>s, at root-scanning time these vectors have to be merged into a single buffer. Recording the total SIZE ahead of time and using it to allocate the merged buffer may prevent frequent vector reallocation.

Consistency is ensured by separating the writes from the reads: SIZE is only updated during mutator phases, and it is only read during pauses.


Apart from tracking SIZE, I also tried putting each vector in the cached roots (HashMap<Address, Vec<Address>>) into a separate work packet. Doing so caused a slight performance drop, since it can produce too many vector clones and too many work packets, and most of the vectors are small.
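
As a rough sketch of the merge-into-one-buffer approach described above (the function, its signature, and the Address alias are illustrative stand-ins, not this PR's actual code):

use std::collections::HashMap;

type Address = usize; // stand-in for MMTk's Address type, just for this sketch

// Hypothetical pause-time merge: `total` is the pre-computed CODE_CACHE_ROOTS_SIZE
// value, and `roots` holds the per-nmethod slot vectors from CODE_CACHE_ROOTS.
fn merge_code_cache_roots(roots: &HashMap<Address, Vec<Address>>, total: usize) -> Vec<Address> {
    // Allocate the merged buffer once with the final capacity, so appending
    // the cached vectors never triggers a reallocation.
    let mut merged = Vec::with_capacity(total);
    for slots in roots.values() {
        merged.extend_from_slice(slots);
    }
    merged
}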

Member:

> Yes, SIZE is not strictly necessary; it's just an optimization. Because CODE_CACHE_ROOTS is a collection of Vec<Address>s, at root-scanning time these vectors have to be merged into a single buffer. Recording the total SIZE ahead of time and using it to allocate the merged buffer may prevent frequent vector reallocation.

You can just calculate the total size of the Vec<Address>s before creating the merged vec:

let code_cache_roots = CODE_CACHE_ROOTS.lock().unwrap();
// Sum the lengths first so the merged vec is allocated once.
let size: usize = code_cache_roots.values().map(|vec| vec.len()).sum();
let mut vec = Vec::with_capacity(size);
for roots in code_cache_roots.values() {
    for r in roots {
        vec.push(*r);
    }
}

My point is that if you have to acquire the lock on CODE_CACHE_ROOTS whenever you access CODE_CACHE_ROOTS_SIZE, then there is little point in having a separate count.

Member Author:

I'm afraid calculating the size in a loop may slow down GC time. If I can compute the total size cheaply ahead of time, simply using the pre-calculated size here is the fastest option.

This makes me realize that if I move the SIZE increments to after acquiring the lock, I can use relaxed rather than sequentially consistent atomic increments. This solves the consistency issue and makes the increments cheaper, while still keeping a precise pre-computed size counter.
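
As a rough illustration of that fix (the function name and parameters below are hypothetical stand-ins, not the real mmtk_register_nmethod):

use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

type Address = usize; // stand-in for MMTk's Address type, just for this sketch

// Hypothetical registration path: the counter is only updated while the roots
// lock is held, so a pause-time reader that takes the same lock always sees a
// SIZE that matches the map's contents.
fn register_nmethod_slots(
    roots: &Mutex<HashMap<Address, Vec<Address>>>,
    total_size: &AtomicUsize,
    nm: Address,
    slots: Vec<Address>,
) {
    let mut map = roots.lock().unwrap();
    // Relaxed ordering is sufficient: the mutex already orders this update with
    // respect to any reader that acquires the same lock.
    total_size.fetch_add(slots.len(), Ordering::Relaxed);
    map.insert(nm, slots);
}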

Contributor:

If you always access CODE_CACHE_ROOTS_SIZE while holding the lock on CODE_CACHE_ROOTS, you can merge them into one Mutex<T>, like this:

struct CodeCacheRoots {
    hash_map: HashMap<Address, Vec<Address>>,
    size: usize,
}

lazy_static! {
    static ref CODE_CACHE_ROOTS: Mutex<CodeCacheRoots> = Mutex::new(CodeCacheRoots {
        hash_map: HashMap::new(),
        size: 0,
    });
}

At the use site, you lock the mutex and get access to both hash_map and size:

{
    let mut lock_guard = CODE_CACHE_ROOTS.lock().unwrap();
    // The guard dereferences automatically, so the field is updated directly.
    lock_guard.size += slots.len();
    lock_guard.hash_map.insert(nm, slots);
}
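
The pause-time read side could then look something like the following sketch (assuming the CodeCacheRoots declaration above; Address is treated as a Copy type):

// Sketch of the pause-time read side, assuming the CodeCacheRoots struct and the
// CODE_CACHE_ROOTS static from the suggestion above.
fn merged_code_cache_roots() -> Vec<Address> {
    let lock_guard = CODE_CACHE_ROOTS.lock().unwrap();
    // `size` always matches `hash_map` because both are only updated under this lock.
    let mut merged = Vec::with_capacity(lock_guard.size);
    for slots in lock_guard.hash_map.values() {
        merged.extend_from_slice(slots);
    }
    merged
}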

mmtk/src/lib.rs (review thread resolved, outdated)
mmtk/src/api.rs (review thread resolved)
@qinsoon (Member) left a comment:

LGTM

mmtk/Cargo.toml (review thread resolved, outdated)
@k-sareen merged commit 7a99b72 into master on Jul 7, 2022
@k-sareen deleted the fix-root-scanning branch on Apr 27, 2023