Skip to content

Conversation

@allanjude
Copy link
Contributor

Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.

Motivation and Context

Replace dbuf_evict_one() with dbuf_evict_many() to more efficiently evict dbuf objects without looping over the same locked objects over and over.

Description

In the previous code, dbuf_evict_thread() would called dbuf_evict_one() in a look while dbuf_cache_above_lowater().

dbuf_evict_one() would select a random sublist from the dbuf cache, then walk it from the tail forward, attempting to acquire the lock on each object until it succeeded, then evict that object and return.

As the name suggests, it would evict only a single object from the cache. However, evicting one object is not likely to bring us below the desired low water mark, so dbuf_evict_one() will be called again, where it will loop over all of the same busy objects again, until it founds one it can evict.

This has been replaced with dbuf_evict_many() which takes a specific sublist as a parameter, as well as a desired amount of data to evict. It then walks the sublist from the tail forward, evicting what it can until the number of bytes evicted satisfies the input parameter or the head of the sublist is reached.

The dbuf_evict_thread now runs is parallel as well, allowing it to keep up with demand more easily. For the dbuf cache, if the single thread was not able to keep up, ZFS would shift the work of evicting some items to each incoming I/O thread. While that is still the case it should be seen much less often now that dbuf_evict is more efficient and no longer bottlenecked to a single thread.

How Has This Been Tested?

Performance benchmarks while the dbuf cache is full

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Comment on lines 1009 to +1007
if (size > dbuf_cache_target_bytes()) {
if (size > dbuf_cache_hiwater_bytes())
dbuf_evict_one();
dbuf_evict();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets assume we have 10 user threads calling this. I suppose each of them will try to create own task sets to evict the same full amount of extra dbuf caches using all the same CPUs. In best case it may end up with empty dbuf cache. I am not sure I greatly like the design of one main eviction thread calling bunch of other taskqs, but each client thread doing that definitely looks weird. I think if user threads has to do evictions, they should do it directly, just doing more than one buffer at a time to be more efficient, as you have said.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as I complained before, you are evicting everything extra from every additional caller thread if highwater is reached. But now you are making that thread to queue it to the same set of taskqs as the main eviction path (which may have some sense if main eviction thread started only one task and now waiting for it, but I am not sure is enough). I think we could either remove this eviction path, or if we keep it (), make it always execute synchronously without taskq and may be evict only one buffer same as it was originally here.

@0mp 0mp force-pushed the parallel_dbuf_evict branch 3 times, most recently from 54888d8 to 883cd30 Compare September 12, 2024 11:59
@0mp
Copy link
Contributor

0mp commented Sep 12, 2024

I have updated the patch with a different logic for picking the default maximum number of dbuf eviction threads. The new logic aims to pick the number that is one-eighth of the available CPUs, with a minimum of 2 and a maximum of 16.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Sep 13, 2024
@0mp 0mp mentioned this pull request Oct 23, 2024
13 tasks
@amotin amotin added the Status: Revision Needed Changes are required for the PR to be accepted label Oct 29, 2024
@0mp 0mp force-pushed the parallel_dbuf_evict branch from 883cd30 to 47c490b Compare November 13, 2024 14:57
@github-actions github-actions bot removed the Status: Revision Needed Changes are required for the PR to be accepted label Nov 13, 2024
@0mp
Copy link
Contributor

0mp commented Nov 13, 2024

I've rebased onto latest master. I'll address the feedback soon.

@amotin amotin added the Status: Revision Needed Changes are required for the PR to be accepted label Nov 13, 2024
@alex-stetsenko alex-stetsenko force-pushed the parallel_dbuf_evict branch 2 times, most recently from 1985db5 to 04f57eb Compare November 21, 2024 09:11
@github-actions github-actions bot removed the Status: Revision Needed Changes are required for the PR to be accepted label Nov 21, 2024
@alex-stetsenko alex-stetsenko force-pushed the parallel_dbuf_evict branch 3 times, most recently from 5615676 to 882150d Compare January 21, 2025 13:47
@alex-stetsenko alex-stetsenko force-pushed the parallel_dbuf_evict branch 2 times, most recently from b996a50 to 3805ae8 Compare January 30, 2025 09:04
@adamdmoss
Copy link
Contributor

adamdmoss commented Feb 2, 2025

Similarly to my issue with the parallel ARC eviction PR, I can pretty quickly end up with the dbuf_evict thread spinning forever consuming one core with this PR.
shot from perftop (the only tool I had at hand)

dbuf_evict(void)
{
int64_t bytes = (zfs_refcount_count(&dbuf_caches[DB_DBUF_CACHE].size) -
dbuf_cache_lowater_bytes());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you are making every thread to evict everything extra. At best you might end up evicting all the cache.

@amotin amotin added Status: Revision Needed Changes are required for the PR to be accepted and removed Status: Code Review Needed Ready for review and testing labels Mar 5, 2025
In the previous code, dbuf_evict_thread() would called dbuf_evict_one()
in a look while dbuf_cache_above_lowater().

dbuf_evict_one() would select a random sublist from the dbuf cache,
then walk it from the tail forward, attempting to acquire the lock on
each object until it succeeded, then evict that object and return.

As the name suggests, it would evict only a single object from the
cache. However, evicting one object is not likely to bring us below the
desired low water mark, so dbuf_evict_one() will be called again, where
it will loop over all of the same busy objects again, until it founds
one it can evict.

This has been replaced with dbuf_evict_many() which takes a specific
sublist as a parameter, as well as a desired amount of data to evict.
It then walks the sublist from the tail forward, evicting what it can
until the number of bytes evicted satisfies the input parameter or
the head of the sublist is reached.

The dbuf_evict_thread now runs is parallel as well, allowing it to
keep up with demand more easily. For the dbuf cache, if the single
thread was not able to keep up, ZFS would shift the work of evicting
some items to each incoming I/O thread. While that is still the case
it should be seen much less often now that dbuf_evict is more efficient
and no longer bottlenecked to a single thread.

Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Co-authored-by: Allan Jude <[email protected]>
Co-authored-by: Mateusz Piotrowski <[email protected]>
Signed-off-by: Alexander Stetsenko <[email protected]>
Signed-off-by: Allan Jude <[email protected]>
Signed-off-by: Mateusz Piotrowski <[email protected]>
@github-actions github-actions bot removed the Status: Revision Needed Changes are required for the PR to be accepted label Mar 17, 2025
@amotin amotin added the Status: Revision Needed Changes are required for the PR to be accepted label Jun 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Revision Needed Changes are required for the PR to be accepted

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants