Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worker pool stats broken #6

Open
martinsumner opened this issue Mar 14, 2024 · 5 comments
Open

Worker pool stats broken #6

martinsumner opened this issue Mar 14, 2024 · 5 comments

Comments

@martinsumner
Copy link

When a worker is checked out from a pool (i.e. through a call to riak_core_worker_pool:poolboy_checkout/3 the timestamp of the checkout is updated on the list of checkouts. A queue_time stat is updated by comparing the time the job was added to the queue with now.

When a worker is checked back into the pool (i.e. through a call to riak_core_worker_pool:poolboy_checkin/4, a work_time stat is updated by comparing the checkout time for this pid with now.

However, if work is completed, and there is work queued handle_event({checkin, Worker} bypasses the poolboy_checkin/4 and poolboycheckout/3 functions, and simply allocates the next work in the queue directly to the checked in process.

This means when there is a large queue, and workers are continually recycled fresh work - the stats no longer make sense. For example, only when the queue is emptied will the work_time be recorded, and this will not be the time for the individual piece of work but the time since the worker first became busy.

@martinsumner
Copy link
Author

@aef-
Copy link
Member

aef- commented Aug 9, 2024

Sorry to piggy back on this but we're planning on removing the logging @ https://github.com/nhs-riak/riak_core/blob/nhse-develop/src/riak_core_worker_pool.erl#L253 . Have you all found it helpful?

@martinsumner
Copy link
Author

It isn't particularly important. There are quite a few things where we log at regular intervals, even when confirming there's nothing happening. One of the suggestions at the moment, is to tag all such logs so that in the logger config they can be handled differently (i.e. we might then by default move them from console.log to a metrics.log).

Is the issue just the background noise of these recurring logs?

@aef-
Copy link
Member

aef- commented Aug 9, 2024

Exactly, we've yet to find a use case for it.

@martinsumner
Copy link
Author

Perhaps the right answer is for me to get this done - basho/riak_kv#1883. The node worker pool is only used by the tictac aae ecosystem (for tree rebuilds, aae_folds, reaper, eraser, nextgenrepl full-sync) - so it might be better to have the option not to start it rather than disable the log.

I have a bit of a backlog at the moment, so don't want to commit to when I may get that done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants