Making force merge threadpool 1/8th of total cores #17255

Open
gbbafna wants to merge 1 commit into main
Conversation

gbbafna (Collaborator) commented Feb 5, 2025

Description

Currently the force merge thread pool is bounded to a size of 1, irrespective of the total cores available. This makes force merges very slow and prevents them from scaling with the number of cores.

This PR increases the thread count to 1/8th of the total cores, allowing force merges to run faster while still being capped at 12.5% of the overall available CPU.
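For illustration, here is a minimal sketch of the sizing rule described above; the helper name and placement are assumptions for readability, not the exact diff in this PR:

```java
// Illustrative sketch only; the helper name is hypothetical, not the exact PR code.
static int oneEighthAllocatedProcessors(final int allocatedProcessors) {
    // 8 cores  -> 1 thread (unchanged from today's default)
    // 16 cores -> 2 threads
    // 64 cores -> 8 threads, still only ~12.5% of the available CPU
    return Math.max(1, allocatedProcessors / 8);
}
```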

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions bot (Contributor) commented Feb 5, 2025

✅ Gradle check result for abc0a90: SUCCESS

codecov bot commented Feb 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.45%. Comparing base (302a3fd) to head (4ec5931).
Report is 3 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17255      +/-   ##
============================================
- Coverage     72.47%   72.45%   -0.03%     
+ Complexity    65618    65551      -67     
============================================
  Files          5291     5291              
  Lines        304347   304331      -16     
  Branches      44182    44181       -1     
============================================
- Hits         220578   220503      -75     
- Misses        65670    65759      +89     
+ Partials      18099    18069      -30     


peternied (Member) left a comment

Nice! This will be a big improvement for customers that depend on force merges and have had trouble scaling them up.

Bukhtawar (Collaborator) left a comment

LGTM, much needed. Should we also make it dynamic?

gbbafna (Collaborator, Author) commented Feb 7, 2025

LGTM, much needed. Should we also make it dynamic?

OpenSearch threadpools are already dynamic starting 2.17. Reference
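For example, a runtime override might look like the following, assuming the dynamic threadpool support referenced above exposes the existing `thread_pool.force_merge.size` key through the cluster settings API (the exact key and mechanism are assumptions here, not confirmed by this thread):

```
PUT _cluster/settings
{
  "persistent": {
    "thread_pool.force_merge.size": 2
  }
}
```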

github-actions bot (Contributor) commented Feb 7, 2025

❌ Gradle check result for 846824f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

sohami (Collaborator) commented Feb 7, 2025

LGTM, much needed. Should we also make it dynamic?

OpenSearch threadpools are already dynamic starting 2.17. Reference

With dynamic, I understand we can change it without a restart. But for setting the default behavior, do we have any benchmarks that show the improvement from increasing the threadpool size, in terms of disk IOPS utilization and force merge time reduction? My question would be: why 1/8th as the default, and not 1/4th or 1/16th?

gbbafna (Collaborator, Author) commented Feb 11, 2025

With dynamic, I understand we can change it without a restart. But for setting the default behavior, do we have any benchmarks that show the improvement from increasing the threadpool size, in terms of disk IOPS utilization and force merge time reduction? My question would be: why 1/8th as the default, and not 1/4th or 1/16th?

We want to start with 1/8th, as that will consume 12.5% of the CPU. 1/4th would be too high, as it would have a sizeable impact on indexing, and 1/16th would be too low.

We have seen many OpenSearch clusters with >=16 CPUs experiencing slow force merges, whereas clusters with more nodes but fewer cores per node perform them much better. This is why we want to make this a factor of the core count.
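To make the trade-off concrete, here is a small standalone illustration (assumed numbers only) comparing the candidate divisors on a 32-core node:

```java
// Illustration only: compare candidate force_merge pool sizes on a 32-core node.
public class ForceMergeSizingComparison {
    public static void main(String[] args) {
        int cores = 32;
        for (int divisor : new int[] { 16, 8, 4 }) {
            int threads = Math.max(1, cores / divisor);
            // Prints: 1/16 -> 2 threads (6.2%), 1/8 -> 4 threads (12.5%), 1/4 -> 8 threads (25.0%)
            System.out.printf("1/%d of %d cores -> %d threads (%.1f%% of CPU)%n",
                divisor, cores, threads, 100.0 * threads / cores);
        }
    }
}
```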

github-actions bot (Contributor)
✅ Gradle check result for 846824f: SUCCESS

jainankitk (Collaborator) commented
@gbbafna - This change would be fine if force_merge were the only merge running on the nodes, but the default Lucene merges also run during indexing. Ideally, the threshold should account for both and limit the behavior based on the number of threads already being consumed for merges.

Also, a merge operation is not just CPU bound but disk I/O bound as well. Allowing concurrent force_merge operations without considering I/O seems risky to me. In addition, a force_merge operation can intermittently consume up to 2x the disk space of the segments being merged. With a single thread, that validation is easy; I am not sure how we correctly validate it during concurrent force_merge operations.

github-actions bot (Contributor)
❌ Gradle check result for 4ec5931: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

gbbafna (Collaborator, Author) commented Feb 12, 2025

@gbbafna - This change would be fine if force_merge were the only merge running on the nodes, but the default Lucene merges also run during indexing. Ideally, the threshold should account for both and limit the behavior based on the number of threads already being consumed for merges.

Also, a merge operation is not just CPU bound but disk I/O bound as well. Allowing concurrent force_merge operations without considering I/O seems risky to me. In addition, a force_merge operation can intermittently consume up to 2x the disk space of the segments being merged. With a single thread, that validation is easy; I am not sure how we correctly validate it during concurrent force_merge operations.

You are right about this being risky. But I would rather have it as a default and advise users to have disk space and IO proportionate to their CPU, which they should have anyway. When users scale up CPU without scaling disk space and IO proportionately, both of those are going to be bottlenecks for other operations as well. If they are not proportionate, the advice would be to change the defaults via yml or not run force merges concurrently.
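For instance, an operator whose disk IOPS or free space are not provisioned in proportion to CPU could pin the pool back to a single thread via the existing node setting (illustrative snippet, not part of this PR):

```yaml
# opensearch.yml: keep the force_merge pool at one thread on nodes where
# disk space / IOPS are not provisioned in proportion to CPU.
thread_pool.force_merge.size: 1
```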

jainankitk (Collaborator) commented
You are right about this being risky. But I would rather have it as a default and advise users to have disk space and IO proportionate to their CPU, which they should have anyway. When users scale up CPU without scaling disk space and IO proportionately, both of those are going to be bottlenecks for other operations as well. If they are not proportionate, the advice would be to change the defaults via yml or not run force merges concurrently.

IMO, defaults should be the ones that work for most use cases. I am assuming this change is driven by the performance tuning needs of a few customers. There are many more customers for whom the default of 1 works and who don't need this to change. Also, given that OpenSearch threadpools are dynamic now, we should tune the number of force_merge threads for the customers that need more, and learn from those tunings before changing the default.

github-actions bot (Contributor)
✅ Gradle check result for 4ec5931: SUCCESS

gbbafna (Collaborator, Author) commented Feb 12, 2025

You are right about this being risky. But I would rather have it as a default and advise users to have disk space and IO proportionate to their CPU, which they should have anyway. When users scale up CPU without scaling disk space and IO proportionately, both of those are going to be bottlenecks for other operations as well. If they are not proportionate, the advice would be to change the defaults via yml or not run force merges concurrently.

IMO, defaults should be the ones that work for most use cases. I am assuming this change is driven by the performance tuning needs of a few customers. There are many more customers for whom the default of 1 works and who don't need this to change. Also, given that OpenSearch threadpools are dynamic now, we should tune the number of force_merge threads for the customers that need more, and learn from those tunings before changing the default.

I don't think it is working for most use cases. We have heard from numerous users that force merges are not scaling for them, despite a lot of capacity remaining unused on their clusters. For the majority of users, who use <16 cores, this change will not have any effect. For the rest, it only takes effect when there is a large amount of force merging to be done. Hence I think the pros outweigh the cons here. I do hear your concerns and will call them out in our documentation.

ashking94 (Member) left a comment

LGTM

Labels: backport 2.x (Backport to 2.x branch)
9 participants