Add support for {F, async} as a data_size option #19

Merged
merged 2 commits into from
Mar 20, 2025

Conversation

martinsumner (Contributor) commented Feb 27, 2025

This is intended for backends where it is convenient to return a function (because the size is too expensive to calculate synchronously), but expensive to recalculate continuously (as would happen with {F, dynamic}, where the size is recalculated for every user CLI request for handoff/transfer status).

It would also be possible for the F() in {F, async} to return {F0, dynamic} should this be required.
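A backend's data_size callback under this scheme might look like the following minimal Erlang sketch. Only the {F, async} and {F0, dynamic} return shapes come from this proposal; the module name, the {Size, bytes} result shape, and the helper function are hypothetical.

```erlang
%% Illustrative sketch only: the {F, async} / {F0, dynamic} shapes are from
%% this PR; the module name, helper, and result shape are assumptions.
-module(backend_size_sketch).
-export([data_size/1]).

data_size(State) ->
    %% Too expensive to compute synchronously, so hand back a fun that the
    %% handoff manager can evaluate once, when the transfer is prompted.
    F = fun() ->
            Size = expensive_size_query(State),
            {Size, bytes}
        end,
    {F, async}.

expensive_size_query(_State) ->
    %% Placeholder for a slow fold/scan over the store.
    1024.
```

If recalculation on every subsequent status request were required, F() could instead return {F0, dynamic}, deferring to a second fun.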

martinsumner (Contributor, Author) commented Feb 28, 2025

So the approach of using {F, async} rather than {F, dynamic} means that an operator call for handoff (or transfer) status will not result in a fresh query. The {F, async} will result in a single query when the riak_core_handoff_manager prompts the outbound transfer, and so it will reflect the value at the start of the transfer (when the snapshot is taken).
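The difference between the two shapes could be sketched as follows; resolve_data_size/1 is a hypothetical helper written to illustrate the evaluation timing, not actual riak_core_handoff_manager code.

```erlang
%% Illustrative only: how a handoff manager might treat the two shapes.
-module(size_resolution_sketch).
-export([resolve_data_size/1]).

resolve_data_size({F, dynamic}) ->
    %% Re-evaluated on every status request: always fresh, but the
    %% (potentially expensive) query runs repeatedly.
    F();
resolve_data_size({F, async}) ->
    %% Evaluated once, when the outbound transfer is prompted; the result
    %% is a snapshot taken at the start of the transfer.
    case F() of
        {F0, dynamic} -> F0();   %% F() may itself hand back a dynamic fun
        Size -> Size
    end;
resolve_data_size(Size) when is_integer(Size) ->
    Size.
```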

The downside is that if an outbound request is constrained by an inbound transfer limit, then each time the request is re-scheduled, after failing because of the max_concurrency at the receiver, the query will be re-run. It needs to be re-run, in the sense that otherwise the size would not be accurate.

This limit is applied after the riak_core.forced_ownership_handoff has been applied - so the validate_size will not be called on anything filtered at this stage.

So if there is an issue with repeated calls to validate_size (and re-running of the size query), then the riak_core.forced_ownership_handoff limit should be set so as not to exceed the riak_core.handoff_concurrency limit. This is not the default, but it might be necessary if (say) a single node is joining and receiving handoffs from many nodes.
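Under that assumption, an advanced.config fragment might look like the sketch below. The setting names come from the discussion above; the values and the 10-second timer are illustrative only, not recommendations.

```erlang
%% Illustrative advanced.config fragment: keep forced_ownership_handoff at
%% or below handoff_concurrency, so that transfers rescheduled after a
%% max_concurrency rejection do not repeatedly re-run the size query.
[
 {riak_core,
  [
   {forced_ownership_handoff, 2},
   {handoff_concurrency, 2},
   %% Alternatively, lengthen the management timer (milliseconds) to
   %% reduce how often rescheduling, and hence the query, occurs.
   {vnode_management_timer, 10000}
  ]}
].
```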

Alternatively the riak_core.vnode_management_timer can be increased.

There is no obvious alternative, as there is no other point to which the size calculation can be deferred, without a significant refactoring of the chain of processes involved in prompting handoff.

As the riak_core_handoff_manager is a singleton process, even in this worst case, only a single CPU per node may be occupied - as the manager can only pick up one vnode at a time. Also the total number of CPUs busied in the cluster will be riak_core.forced_ownership_handoff - riak_core.handoff_concurrency.

@martinsumner martinsumner merged commit 1728625 into openriak-3.2 Mar 20, 2025
1 check passed
@martinsumner martinsumner deleted the nhse-o32-orkv.i29-asyncfun branch March 20, 2025 17:02