Couch stats resource tracker v3 rebase main #5602
base: main
Conversation
This reverts commit 020743f.
src/couch_stats/CSRT.md (Outdated)

```
%% RPC API
-export([
```
These two functions are no longer available. The replacement functions are `rpc_run/1` and `rpc_unsafe_run/1`.
Thanks, fixed up.
src/couch_stats/CSRT.md (Outdated)

```
]).

%% Aggregate Query API
-export([
```
I think this list is outdated.
```
% Aggregate Query API
-export([
    active/0,
    active/1,
    active_coordinators/0,
    active_coordinators/1,
    active_workers/0,
    active_workers/1,
    find_by_nonce/1,
    find_by_pid/1,
    find_by_pidref/1,
    find_workers_by_pidref/1,
    query_matcher/1,
    query_matcher/2
]).
-export([
    query/1,
    from/1,
    group_by/1,
    group_by/2,
    sort_by/1,
    sort_by/2,
    count_by/1,
    options/1,
    unlimited/0,
    with_limit/1,
    run/1,
    unsafe_run/1
]).
```
I've updated this in debe52f as well as linked out to the excellent csrt_query docs and Eunit tests you wrote.
Looks great, lots of goodies here!
There are some changes requested and extra notes in the review comments.
- Thanks for writing the CSRT.md! Lots of details there, but consider adding something that a CouchDB upstream user might look at in the docs, focusing on what problem this feature might solve for them and how they could use the HTTP API and the reports to solve it. I think that sort of got lost in the implementation details (ets tables, define macros, etc). Something high level like: "You may have noticed your nodes are throwing timeout errors or there is excessive resource usage. Here is how to use this csrt feature to find out what's inducing that".
- I see there is an HTTP API interface and a report logger. Is one introspection method the primary one? We should probably have the HTTP docs in, plus some docs blurb on how to use the new feature.
- For the configuration section in default.ini, make sure the values match the description in the docs (off by default) and use commented-out values. Maybe add some instruction in the comments about when it would make sense to enable or disable a feature. We have a config docs section; those config items should probably go in there.
- It seems we can track per-coordinator and per-worker stats; do we emit them as the requests are running or when they finish? After the requests finish, do we keep the stats around to inspect some aggregate via the HTTP API, or is that mostly done by some log/report aggregator, so that using this feature implies having a log/report aggregator to parse the reports and build graphs from there? If we do keep them around, do they get cleaned up after some time?
- The CSRT application is about the size of the couch_stats application itself, but with its own supervisor, hrl files, and a separate csrt_utils module. Besides a single-line callback in `increment_counter` (`maybe_track_local_counter/2`) they don't interact much, so it should probably be its own separate application. Then it can use a more conventional name of `couch_resource_tracker`, or `couch_rt` to keep it shorter.
rel/overlay/etc/default.ini (Outdated)

```
enable = true
enable_init_p = true
enable_reporting = true
```
In the default ini we normally don't have actual values, just defaults from the code.
These particular values are enabled for CI purposes during this PR review, the correct default values are immediately below these and will be deleted prior to merging. See my top level PR response to your comments, I've included a local diff I have ready to flip this, but again, we need this to get a green CI with CSRT enabled.
rel/overlay/etc/default.ini (Outdated)

```
fabric_rpc__all_docs = true
fabric_rpc__changes = true
fabric_rpc__get_all_security = true
fabric_rpc__map_view = true
fabric_rpc__open_doc = true
fabric_rpc__open_shard = true
fabric_rpc__reduce_view = true
fabric_rpc__update_docs = true
```
Usually we have commented-out values here from the code. Not sure if we should make them enabled or disabled. When would a user choose to enable/disable them?

Also, use the more common convention of `m:f` instead of double-underscore: https://www.erlang.org/doc/system/funs.html#syntax-of-funs. The typical syntax is `fun m:f/a`, but we don't care about the arity, I think.
These `[csrt.init_p]` values are awkward... and unfortunately do need to be set here by default. All of the other values in this `default.ini` have default values to be set to and disabled, but these need to be set to true to have a lookup enabled for these settings and others.

The original idea with `[csrt.init_p]` is that this would prevent the counter increments from ever triggering, to prevent the tracking space for those counters from being enabled, but the folsom should_notify stuff isn't used directly anymore, and besides, the metrics are being picked up from the declaration of `stats_descriptions.cfg`.

We need a positive lookup table for these values to return true to indicate they are the desired subset of operations spawned in `rexi_server:init_p/3`, as otherwise we'd create metrics for all `fabric_rpc:*` functions. Alternatively, this could be a static map in the `csrt.hrl` file, but we need a way to add more to these later. I'm admittedly not a huge fan of this particular aspect of the init_p logic, but like I said, we need a positive lookup table somewhere.
As for the double underscore, we've talked about this in the last PR: there's zero precedent for the use of `:` in the codebase, and the only key character that isn't `[a-z0-9_]` is `||*` in `partitioned||*`, eg:

```
(! 11775)-> grep '^[a-z;].* = ' rel/overlay/etc/default.ini | sed 's/^\([a-z;][^ ]*\) =.*$/\1/' | grep -v ' ' | grep '[^a-z0-9_;]'
partitioned||*
```

I was not able to find documentation that `:` is valid for default.ini, so I did not use that. Because of the `||` use, I used it in IOQ in the past (https://github.com/apache/couchdb-ioq/blob/main/include/ioq.hrl#L26) but found `||` to be ugly, so I went with double underscore.
Anyways... let's just remove this entirely: the dynamic config toggle approach with the specific metrics didn't work out as I had wanted, so let's just go back to hardcoding these directly. That'll simplify this, eliminate the ini section entirely, and also get rid of the ugly string conversion in `csrt_util:fabric_conf_key/1`.
Here's the patch to hardcode it, and I'll drop out the other components in a bit:
```
(! 11789)-> git diff src/couch_stats/src/csrt.erl
diff --git a/src/couch_stats/src/csrt.erl b/src/couch_stats/src/csrt.erl
index 6c098bfea..3532a3b2b 100644
--- a/src/couch_stats/src/csrt.erl
+++ b/src/couch_stats/src/csrt.erl
@@ -431,8 +431,25 @@ maybe_inc(Stat, Val) ->
     end.

 -spec should_track_init_p(Stat :: [atom()]) -> boolean().
-should_track_init_p([Mod, Func, spawned]) ->
-    is_enabled_init_p() andalso csrt_util:should_track_init_p(Mod, Func);
+%% "Example to extend CSRT"
+%% should_track_init_p([fabric_rpc, foo, spawned]) ->
+%%     is_enabled_init_p();
+should_track_init_p([fabric_rpc, all_docs, spawned]) ->
+    is_enabled_init_p();
+should_track_init_p([fabric_rpc, changes, spawned]) ->
+    is_enabled_init_p();
+should_track_init_p([fabric_rpc, get_all_security, spawned]) ->
+    is_enabled_init_p();
+should_track_init_p([fabric_rpc, map_view, spawned]) ->
+    is_enabled_init_p();
+should_track_init_p([fabric_rpc, open_doc, spawned]) ->
+    is_enabled_init_p();
+should_track_init_p([fabric_rpc, open_shard, spawned]) ->
+    is_enabled_init_p();
+should_track_init_p([fabric_rpc, reduce_view, spawned]) ->
+    is_enabled_init_p();
+should_track_init_p([fabric_rpc, update_docs, spawned]) ->
+    is_enabled_init_p();
 should_track_init_p(_Metric) ->
     false.
```
```
; against the provided db for any requests that induce IO in quantities
; greater than the provided threshold on any one of: ioq_calls, rows_read
; docs_read, get_kp_node, get_kv_node, or changes_processed.
[csrt_logger.dbnames_io]
```
Are these clustered db names or shards? Or both? And then, are they per-node or per-cluster? Say `foo/bar = 200` means that on a node, during worker request processing, it read more than that many docs or kp nodes?
I've added additional documentation for this over in CSRT.md at the `## -define(CSRT_MATCHERS_DBNAMES, "csrt_logger.dbnames_io").` section, and similarly `# CSRT Logger Matcher Enablement and Thresholds` for the default matchers.

You can specify a dbname or a shard name, and that will match at either the coordinator context or the RPC worker context, respectively. So if you match on db `foo`, that is a clustered dbname that exists at the coordinator level, so it'll match coordinators for requests against db `foo`; whereas RPC workers operate on the db shards themselves, so you'd specify `shards/80000000-ffffffff/foo.1744078714` to match on a particular RPC worker, but you also need `csrt:is_enabled_rpc_reporting()` to trigger logs for RPC workers. The idea is that until a properly expressive config syntax is defined for mapping complex matchers to default.ini, we instead have the ability to easily set specific target thresholds on the dimensions and then specify whether or not to enable RPC reporting.

As it says in the comment, this will trigger on any one field being greater than the threshold. Again, this addresses the expressiveness of the ini settings and what we can map easily into `ets:fun2ms`: this is a `key => val` config lookup on a dbname to an integer threshold, as a simple way to register a heavy IO matcher against a database. Ideally, this becomes obsolete with more expressive matchers, but pragmatically it's a very useful way to be able to config:set a logger matcher dynamically and get useful data.
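For example (a sketch using the section and config names documented in CSRT.md, not a prescribed workflow), registering a heavy IO matcher for db `foo` dynamically over remsh might look like:

```erlang
%% Threshold of 1000 on any one of the tracked IO fields for db "foo",
%% plus RPC reporting so shard-level worker matches also generate reports.
config:set("csrt_logger.dbnames_io", "foo", "1000").
config:set("csrt", "enable_rpc_reporting", "true").
```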
I think it's a bit confusing to have both there, so maybe add a comment about these being either a shard name if using the RPC worker matcher or a clustered db name if using coordinator matchers.
```
;changes_processed = false
;ioq_calls = false

[csrt_logger.matchers_threshold]
```
How is this different from `csrt_logger.dbnames_io`? They both look like thresholds.
See the CSRT.md docs; these are for the default matchers. There's `[csrt_logger.matchers_enabled]` and `[csrt_logger.matchers_threshold]` so that for each of the default matchers we can check if it's enabled and also fetch its threshold value.
Maybe add a short comment next to the section about the difference between the two.
src/couch_stats/src/csrt_util.erl (Outdated)

```
tutc() ->
    tutc(tnow()).

%% Convert a integer system time in milliseconds into UTC RFC 3339 format
```
Maybe a clarification is needed: the `Time0` argument we get is not in milliseconds but in `native` units, right? And it's the monotonic timestamp from the same node? So we add the offset + `Time0` in native, then convert it to rfc3339 with 3-digit millisecond precision.

I think something like this one-liner could work then:

```
calendar:system_time_to_rfc3339(Time0 + erlang:time_offset(), [{unit, native}, {offset, "z"}]).
```
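For illustration, a hedged sketch of that one-liner in place, assuming the `tutc/1` name from the hunk above:

```erlang
%% Convert a native-unit monotonic timestamp from this node into an
%% RFC 3339 string by adding the node's current time offset first.
tutc(Time0) ->
    calendar:system_time_to_rfc3339(Time0 + erlang:time_offset(), [
        {unit, native},
        {offset, "z"}
    ]).
```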
Aha, good catch and suggestion! I've fixed that and updated the comment to drop mention of milliseconds. That was from when the deltas were in milliseconds rather than native format, I switched that over a while back.
src/couch_stats/src/csrt_util.erl (Outdated)

```
    %% possible positive integer value delta.
    1;
make_dt(A, B, Unit) when is_integer(A) andalso is_integer(B) andalso B > A ->
    case erlang:convert_time_unit(abs(B - A), native, Unit) of
```
If we assert `B > A` then we probably don't need the `abs` call?
Yes, `abs` is necessary, see my other comments.
src/couch_stats/src/csrt_util.erl (Outdated)

```
-spec fabric_conf_key(Key :: atom()) -> string().
fabric_conf_key(Key) ->
    %% Double underscore to separate Mod and Func
    "fabric_rpc__" ++ atom_to_list(Key).
```
Use `:`? Double-underscore is an unusual convention for `module:function` references.
I mentioned this in the top level PR comment, but the use of double underscore is specifically because that's what existing keys in `default.ini` use; there are zero instances in `default.ini` of a key with a colon in it, and I had trouble finding clarity in the documentation, so I went with an obviously correct approach.

That said, over in #5602 (comment) I came up with an alternative approach I'll get wrapped up tomorrow that'll eliminate the need for this function, so I'll delete this soon.
Simpler is better, but for what it's worth the ":" should work as well. I verified it both in a config:get() remsh and in an HTTP PUT/GET context:

```
% curl -XPUT -H'Content-type:application/json' $DB/_node/_local/_config/bogus/m2:f2 -d'"200"'
{
    "m2:f2": "200",
    "m:f": "100"
}
% http $DB/_node/_local/_config/bogus
{
    "m:f": "100"
}

> config:set("bogus", "m:f", "100").
> config:get("bogus").
[{"m:f","100"}]
```
src/rexi/src/rexi_monitor.erl (Outdated)

```
@@ -35,6 +35,7 @@ start(Procs) ->
 %% messages from our mailbox.
 -spec stop(pid()) -> ok.
 stop(MonitoringPid) ->
+    unlink(MonitoringPid),
```
Why do we need to unlink?
Not unlinking is a bug which consumed many many hours of my life.
This was an artifact of an earlier implementation in #4812 that dropped the use of selective receives and needed to handle all of the messages, which exposed a number of places where we get unexpected messages in the mailbox. I found the old issues discussing this bug and others, many of which have since been fixed, this one included, I believe. I'll drop this out of the PR. Here are the old issues:
src/couch_stats/src/couch_stats.erl (Outdated)

```
%% Should maybe_track_local happen before or after notify?
%% If after, only currently tracked metrics declared in the app's
%% stats_description.cfg will be trackable locally. Pros/cons.
ok = maybe_track_local_counter(Name, Value),
```
This seems to be the only integration point with the couch_stats application. And even here, csrt could be turned off or crash and we'd still keep going. There is a separate supervisor, separate utils module, separate .hrl files. So I think it makes sense to have it in a separate application. Then it can also have a more conventional name like `couch_resource_tracker`, or `couch_rt` if that's too long.
I've addressed this more in my top level PR comment in #5602 (comment)
This cleans up the usability of the `csrt:proc_window/3` API, as you often want to wrap `csrt:get_resource/1` with `csrt:to_json/1`, but prior to this patch that would cause a stack trace, complicating a common workflow like:

```
> [{PR, csrt:to_json(csrt:get_resource(PR))} || {PR, _, _} <- csrt:proc_window(ioq_calls, 5, 1000)].
```
Thanks for the feedback @iilyak and @nickva, I'll respond directly to the various other comments, and some higher level responses here:
Yeah, most of the CSRT.md docs were written while Ilya was updating the HTTP API and Query API, so I focused on documenting the internals. In debe52f I've updated the docs with direct examples and sample output, as well as linking out to the Query and HTTP API documentation from Ilya.
To be specific, there's a real time query API that allows for filtered querying and aggregations of the live running system. So basically we track the data in real time in a manner that allows for live querying and aggregations, but also for simple report logging of heavy usage requests, with considerable granularity in the desired reporting.
Yes, the …
We create a context at the point where we want to start tracking. When a process creates a context, we spawn a dedicated tracker process to monitor the caller, such that when the process exits or closes out the context, the tracker process fetches the compiled CSRT logger ets match specs and runs them against the final accumulated rctx. Earlier versions of CSRT tried to address the short lived process issue to aid in visibility, but it became awkward to handle and query around whether the process was alive or not; for instance, that suddenly means you can't just sum on …
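As a rough sketch of that tracker pattern (the report helper below is hypothetical; the table name matches the `?CSRT_ETS` value documented in CSRT.md):

```erlang
%% The tracker monitors the context-owning pid; on 'DOWN' it runs the
%% compiled logger matchers against the final rctx, then cleans up ets.
spawn_tracker({Pid, _Ref} = PidRef) ->
    spawn(fun() ->
        MonRef = erlang:monitor(process, Pid),
        receive
            {'DOWN', MonRef, process, Pid, _Reason} ->
                maybe_generate_report(PidRef), %% hypothetical matcher/report step
                ets:delete(csrt_server, PidRef)
        end
    end).
```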
You mentioned this in the other PR, but I don't agree, and I still think this makes sense to be contained within couch_stats. As for using a dedicated supervisor, as I elaborated on in the prior PR at #5491 (comment), I believe the use of a dedicated transient supervisor should be the standard operating procedure for all new experimental gen_servers introduced into the codebase. The use of a dedicated supervisor here is for isolation purposes that allow for progressively deploying new functionality in a safe manner, and I hope to see this pattern be a common workflow in the future, with …
Alright, I went ahead and reworked the `should_track_init_p` logic to hardcode the tracked operations, as in the patch above. This also removed the ugly string conversion in `csrt_util:fabric_conf_key/1`.
I talked with @nickva out of band and realized I confused myself on the need for using absolute values around the time deltas for negative timestamps (negative negative numbers confused me), but he's correct that the use of `abs` is unnecessary there given the `B > A` guard.
Okay, over in 38608f7 I've dropped the extraneous `[csrt.init_p]` ini section. You can actually create that matcher now, but it needs to be registered directly by way of remsh; for example, the following dynamically creates a CSRT logger matcher that satisfies the constraint:

```
(node1@127.0.0.1)16> rr(csrt_server).
[coordinator,rctx,rpc_worker,st]
(node1@127.0.0.1)17> ets:fun2ms(fun(#rctx{dbname = <<"foo">>, type=#coordinator{mod='chttpd_db', func='handle_changes_req'}, ioq_calls=IC, js_filter=JF}=R) when IC > 1000 andalso JF > 100 -> R end).
[{#rctx{started_at = '_',updated_at = '_',pid_ref = '_',
        nonce = '_',
        type = #coordinator{mod = chttpd_db,
                            func = handle_changes_req,method = '_',path = '_'},
        dbname = <<"foo">>,username = '_',db_open = '_',
        docs_read = '_',docs_written = '_',rows_read = '_',
        changes_returned = '_',ioq_calls = '$1',js_filter = '$2',
        js_filtered_docs = '_',get_kv_node = '_',get_kp_node = '_'},
 [{'andalso',{'>','$1',1000},{'>','$2',100}}],
 ['$_']}]
(node1@127.0.0.1)18> csrt_logger:register_matcher("custom_foo", ets:fun2ms(fun(#rctx{dbname = <<"foo">>, type=#coordinator{mod='chttpd_db', func='handle_changes_req'}, ioq_calls=IC, js_filter=JF}=R) when IC > 1000 andalso JF > 100 -> R end)).
ok
```

That'll dynamically compile the matchspec and push it out by way of persistent_term to be picked up by the tracker pids to decide whether or not to generate a process lifecycle report. The tricky bit is mapping these matchers into the default.ini config syntax. I would love to see a simple translation layer that basically lets us use Mango syntax for declaring these filters, as that would make it much easier to express within the ini files, but also it would allow us to dynamically construct the logger matcher specs on the fly for a given HTTP request, eg you could …
```
sed -I '' 's/csrt.hrl/couch_srt.hrl/g' src/*/{src,test/eunit}/*.erl
sed -I '' 's/csrt:/couch_srt:/g' src/*/{src,test/eunit}/*.erl
sed -I '' 's/csrt_\([a-z_]*\):/couch_srt_\1:/g' src/*/{src,test/eunit}/*.erl
Cleanup remaining 'csrt\(_[a-z_]*\)\?:' references
Hook in couch_srt app
sed -I '' 's/^-module(csrt/-module(couch_srt/g' src/*/{src,test/eunit}/*.erl
More cleanup
```
Over in 38a53a0 I migrated CSRT into the `couch_srt` application.
That commit creates a clean separation between CSRT and couch_stats. One thing that I'd like feedback on in particular: I preserved the name "CSRT" as I think it's a good name for the system and I've grown accustomed to referring to it as such, while the public Erlang APIs all use the `couch_srt` naming. I like the balance of the conventional `couch_srt` application name with the shorter CSRT name in docs and discussion. I kept that as an isolated commit to have a logical commit containing all the direct sed changes, and then I waited to run the remaining cleanup afterwards. Tomorrow I'll add some additional updates to the overview documentation to better illustrate when/where/how users would initially enable and utilize CSRT. Aside from the additional documentation updates, I think I've addressed all outstanding issues I'm aware of; any further feedback on this PR?
This PR supersedes #5491 and includes @iilyak's excellent HTTP updates on top of it, as well as some final cleanup and documentation from me. I've copied the contents of `CSRT.md` into the PR description here:

Couch Stats Resource Tracker (CSRT)
CSRT (Couch Stats Resource Tracker) is a real time stats tracking system that tracks the quantity of resources induced at the process level in a live queryable manner, and also generates process lifetime reports containing statistics on the total resource load of a request, as a function of things like dbs/docs opened, view and changes rows read, changes returned vs processed, Javascript filter usage, duration, and more. This system is a paradigm shift in CouchDB visibility and introspection, allowing for expressive real time querying capabilities to introspect, understand, and aggregate CouchDB internal resource usage, as well as powerful filtering facilities for conditionally generating reports on "heavy usage" requests or "long/slow" requests. CSRT also extends `recon:proc_window` with `csrt:proc_window`, allowing for the same style of battle hardened introspection with Recon's excellent `proc_window`, but with the sample window over any of the CSRT tracked CouchDB stats!
CSRT does this by piggy-backing off of the existing metrics tracked by way of `couch_stats:increment_counter` at the time when the local process induces those metrics inc calls; CSRT then updates an ets entry containing the context information for the local process, such that global aggregate queries can be performed against the ets table, as well as the generation of the process resource usage reports at the conclusion of the process's lifecycle. The ability to do aggregate querying in realtime, in addition to the process lifecycle reports for post facto analysis over time, is a cornerstone of CSRT that is the result of a series of iterations until a robust and scalable approach was built.

The real time querying is achieved by way of a global ets table with `read_concurrency`, `write_concurrency`, and `decentralized_counters` enabled.
enabled.Great care was taken to ensure that zero concurrent writes to the same key
occure in this model, and this entire system is predicated on the fact that
incremental updates to
ets:update_counters
provides really fast andefficient updates in an atomic and isolated fashion when coupled with
decentralized counters and write concurrency. Each process that calls
couch_stats:increment_counter
tracks their local context in CSRT as well, withzero concurrent writes from any other processes. Outside of the context setup
and teardown logic, only operations to
ets:update_counter
are performed, oneper process invocation of
couch_stats:increment_counter
, and one forcoordinators to update worker deltas in a single batch, resulting in a 1:1 ratio
of ets calls to real time stats updates for the primary workloads.
The primary achievement of CSRT is the core framework itself for concurrent process local stats tracking and real time RPC delta accumulation in a scalable manner that allows for real time aggregate querying and process lifecycle reports. It took several versions to find a scalable and robust approach that induced minimal impact on maximum system throughput. Now that the framework is in place, it can be extended to track any further desired process local uses of `couch_stats:increment_counter`. That said, the currently selected set of stats to track was heavily influenced by the challenges in retroactively understanding the quantity of resources induced by a query like `/db/_changes?since=$SEQ`, or similarly, `/db/_find`.
CSRT started as an extension of the Mango execution stats logic to `_changes` feeds to get proper visibility into the quantity of docs read and filtered per changes request, but then the focus inverted with the realization that we should instead use the existing stats tracking mechanisms that have already been deemed critical information to track, which then also allows for the real time tracking and aggregate query capabilities. The Mango execution stats can be ported into CSRT itself and just become one subset of the stats tracked as a whole, and similarly, any additional desired stats tracking can be easily added and will be picked up in the RPC deltas and process lifetime reports.
CSRT Config Keys
-define(CSRT, "csrt").
Primary CSRT config namespace: contains core settings for enabling different
layers of functionality in CSRT, along with global config settings for limiting
data volume generation.
-define(CSRT_MATCHERS_ENABLED, "csrt_logger.matchers_enabled").
Config toggles for enabling specific builtin logger matchers; see the dedicated section below on `CSRT Logger Matcher Enablement and Thresholds`.
.-define(CSRT_MATCHERS_THRESHOLD, "csrt_logger.matchers_threshold").
Config settings for defining the primary `Threshold` value of the builtin logger matchers; see the dedicated section below on `CSRT Logger Matcher Enablement and Thresholds`.
.-define(CSRT_MATCHERS_DBNAMES, "csrt_logger.dbnames_io").
Config section for setting `$db_name = $threshold`, resulting in instantiating a "dbname_io" logger matcher for each `$db_name` that will generate a CSRT lifecycle report for any contexts that induced more operations on any one field of `ioq_calls|get_kv_node|get_kp_node|docs_read|rows_read` than `$threshold` on database `$db_name`.

This is basically a simple matcher for finding heavy IO requests on a particular database, in a manner amenable to key/value pair specifications in this .ini file, until a more sophisticated declarative model exists. In particular, it's not easy to sequentially generate matchspecs by way of `ets:fun2ms/1`, and so an alternative mechanism for either dynamically assembling an `#rctx{}` to match against, or generating the raw matchspecs themselves, is warranted.
-define(CSRT_INIT_P, "csrt.init_p").
Config toggles for tracking counters on the spawning of RPC `fabric_rpc` workers by way of `rexi_server:init_p`. This allows us to conditionally enable new metrics for the desired RPC operations in an expandable manner, without having to add new stats for every single potential RPC operation. These are for the individual metrics to track; the feature is enabled by way of the config toggle `config:get(?CSRT, "enable_init_p")`, and these configs can be left alone for the most part until new operations are tracked.
CSRT Code Markers
-define(CSRT_ETS, csrt_server).
This is the reference to the CSRT ets table; it's managed by `csrt_server`, so that's where the name originates from.
-define(MATCHERS_KEY, {csrt_logger, all_csrt_matchers}).
This marker is where the active matchers are written in `persistent_term`, for concurrent and parallel access to the logger matchers in the CSRT tracker processes for lifecycle reporting.
CSRT Process Dictionary Markers
-define(PID_REF, {csrt, pid_ref}).
This marker is for storing the core `PidRef` identifier. The key idea here is that a CSRT context lifecycle is contained within the given `PidRef`, meaning that a `Pid` can instantiate different CSRT lifecycles and pass those to different workers.

This is specifically necessary for long running processes that need to handle many CSRT context lifecycles over the course of that individual process's own lifecycle. In practice, this is immediately needed for the actual coordinator lifecycle tracking, as `chttpd` uses a worker pool of http request handlers that can be re-used, so we need a way to create a CSRT lifecycle corresponding to the given request currently being serviced. This is also intended to be used in other long running processes, like IOQ or `couch_js` pids, such that we can track the specific context inducing the operations on the `couch_file` pid or indexer or replicator or whatever.
pid or indexer or replicator or whatever.Worker processes have a more clear cut lifecycle, but either style of process
can be exit'ed in a manner that skips the ability to do cleanup operations, so
additionally there's a dedicated tracker process spawned to monitor the process
that induced the CSRT context such that we can do the dynamic logger matching
directly in these tracker processes and also we can properly cleanup the ets
entries even if the Pid crashes.
-define(TRACKER_PID, {csrt, tracker}).
A handle to the spawned tracker process that does cleanup and logger matching reports at the end of the process lifecycle. We store a reference to the tracker pid so that for explicit context destruction, like in `chttpd` workers after a request has been serviced, we can stop the tracker and perform the expected cleanup directly.
-define(DELTA_TA, {csrt, delta_ta}).
This stores our last delta snapshot, to track progress since the last incremental streaming of stats back to the coordinator process; it is updated with the latest value after the next delta is made. Eg this stores `T0` so we can do `T1 = get_resource()`, `make_delta(T0, T1)`, and then we save `T1` as the new `T0` for use in our next delta.
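A sketch of that cycle, using the pdict key above and API names documented elsewhere in this doc:

```erlang
%% Take a delta since the last snapshot, then roll the snapshot forward.
T0 = get({csrt, delta_ta}),
T1 = csrt:get_resource(),
Delta = csrt_util:rctx_delta(T0, T1),
put({csrt, delta_ta}, T1),
%% Delta is then shipped back in the rexi response for accumulation.
```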
-define(LAST_UPDATED, {csrt, last_updated}).
This stores the integer corresponding to the `erlang:monotonic_time()` value of the most recent `updated_at` value. Basically this lets us utilize a pdict value to turn `updated_at` tracking into an incremental operation that can be chained in the existing atomic `ets:update_counter` and `ets:update_element` calls.

The issue being that our updates are of the form `+2 to ioq_calls for $pid_ref`, which ets performs in a guaranteed `atomic` and `isolated` manner. The strict use of the atomic operations for tracking these values is why this system works efficiently at scale. This means that we can increment counters on all of the stats counter fields in a batch, very quickly, but for tracking `updated_at` timestamps we'd need to either do an extra ets call to get the last `updated_at` value, or do an extra ets call to `ets:update_element` to set the `updated_at` value to `csrt_util:tnow()`. The core problem with this is that the batch inc operation is essentially the only write operation performed after the initial context setting of dbname/handler/etc; this means that we'd literally double the number of ets calls induced to track CSRT updates, just for tracking the `updated_at`. So instead, we rely on the fact that the local process corresponding to `$pid_ref` is the only process doing updates, so we know the last `updated_at` value will be the last time this process updated the data. We track that value in the pdict, take a delta between `tnow()` and `updated_at`, and then `updated_at` becomes a value we can sneak into the other integer counter updates we're already performing!
Primary Config Toggles
CSRT (?CSRT="csrt") Config Settings
config:get(?CSRT, "enable", false).
Core enablement toggle for CSRT, defaults to false. Enabling this setting initiates local CSRT stats collection as well as shipping deltas in RPC responses to accumulate in the coordinator.
This does not trigger the new RPC spawn metrics, and it does not enable
reporting for any of the rctx types.
NOTE: you MUST have all nodes in the cluster running a CSRT aware CouchDB
before you enable it on any node, otherwise the old version nodes won't know
how to handle the new RPC formats including an embedded Delta payload.
config:get(?CSRT, "enable_init_p", false).
Enables tracking of new metric counters for different `fabric_rpc` operation types, to track spawn rates of RPC work induced across the cluster. There are corresponding config lookups in the `?CSRT_INIT_P` namespace for keys of the form `atom_to_list(Mod) ++ "__" ++ atom_to_list(Fun)`, eg `"fabric_rpc__open_doc"`, for enabling the specific RPC endpoints.

However, those individual settings can be ignored, and this top level config toggle is what should be used in general, as the function specific config toggles predominantly exist to enable tracking a subset of total RPC operations in the cluster, and new endpoints can be added here.
config:get(?CSRT, "enable_reporting", false).
This is the primary toggle for enabling CSRT process lifetime reports containing detailed information about the quantity of work induced by the given request/worker/etc. This is the top level toggle for enabling any reporting, and there also exists `config:get(?CSRT, "enable_rpc_reporting", false).` to disable the reporting of any individual RPC workers, leaving the coordinator responsible for generating a report with the accumulated deltas.
config:get(?CSRT, "enable_rpc_reporting", false).
This enables the possibility of RPC workers generating reports. They still need
to hit the configured thresholds to induce a report, but this will generate CSRT
process lifetime reports for individual RPC workers that trigger the configured
logger thresholds. This allows for quantifying per node resource usage when
desired, as otherwise the reports are at the http request level and don't
provide per node stats.
The key idea here is that having RPC level CSRT process lifetime reporting is incredibly useful, but it can also generate large quantities of data. For example, a view query on a Q=64 database will stream results from 64 shard replicas, resulting in at least 64 RPC reports, plus any that might have been generated from RPC workers that "lost" the race for shard replica. This is very useful, but a lot of data given the verbose nature of funneling it through the RSyslog reports; the ability to write directly to something like ClickHouse or another columnar store would be great.

Until there's an efficient storage mechanism to stream the results to, the rsyslog entries work great and are very practical, but care must be taken to not generate too much data: aggregate workloads generate at least `Qx` more reports than the individual report per http request from the coordinator.
This setting exists as a way to either a) utilize the logger matcher configured
thresholds to allow for any rctx's to be recorded when they induce heavy
operations, either Coordinator or RPC worker; or b) to only log workloads at
the coordinator level.
NOTE: this setting exists because we lack an expressive enough config declaration to easily chain the matchspec constructions, as `ets:fun2ms/1` is a special compile time parse transform macro that requires the full definition to be specified directly; it cannot be iteratively constructed. That said, you can register matchers through remsh with more specific and fine grained pattern matching, and a more expressive system for defining matchers is being explored.
config:get_boolean(?CSRT, "should_truncate_reports", true)
Enables truncation of the CSRT process lifetime reports to not include any fields that are zero at the end of the process lifetime, eg don't include `js_filter=0` in the report if the request did not induce Javascript filtering. This can be disabled if you really care about consistent fields in the report logs, but this is a log space saving mechanism, similar to disabling RPC reporting by default, as it's a simple way to reduce overall volume.
config:get(?CSRT, "randomize_testing", true).
This is a `make eunit` only feature toggle that will induce randomness into the cluster's `csrt:is_enabled()` state, specifically to utilize the test suite to exercise edge case scenarios and failures when CSRT is only conditionally enabled, ensuring that it gracefully and robustly handles errors without fallout to the underlying http clients.
The idea here is to introduce randomness into whether CSRT is enabled across all the nodes, to simulate clusters with heterogeneous CSRT enablement and also to ensure that CSRT works properly when toggled on/off without causing any unexpected fallout to the client requests.
This is a config toggle specifically so that the actual CSRT tests can disable it for making accurate assertions about resource usage tracking; it is not intended to be used directly.
config:get_integer(?CSRT, "query_limit", ?QUERY_LIMIT)
Limit the quantity of rows that can be loaded in an http query.
CSRT_INIT_P (?CSRT_INIT_P="csrt.init_p") Config Settings
config:get(?CSRT_INIT_P, ModFunName, false).
These config toggles conditionally enable additional tracking of RPC endpoints of interest; it's a way to selectively enable tracking for a subset of RPC operations, in a way we can extend later to add more. The `ModFunName` is of the form `atom_to_list(Mod) ++ "__" ++ atom_to_list(Fun)`, eg `"fabric_rpc__open_doc"`, and right now only exists for the `fabric_rpc` module.

NOTE: this is a bit awkward and isn't meant to be used directly; instead, utilize `config:set(?CSRT, "enable_init_p", "true").` to enable or disable these as a whole.
The current set of operations, as copied in from `default.ini`:
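For reference, these are the init_p keys as they appeared in the `default.ini` hunk reviewed earlier in this PR (enabled there for CI purposes; off by default):

```
fabric_rpc__all_docs
fabric_rpc__changes
fabric_rpc__get_all_security
fabric_rpc__map_view
fabric_rpc__open_doc
fabric_rpc__open_shard
fabric_rpc__reduce_view
fabric_rpc__update_docs
```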
CSRT Logger Matcher Enablement and Thresholds
There are currently six builtin default loggers designed to make it easy to do filtering on heavy resource usage inducing and long running requests. These are designed as a simple baseline of useful matchers, declared in a manner amenable to `default.ini` based constructs. More expressive matcher declarations are being explored, and matchers of arbitrary complexity can be registered directly through remsh. The default matchers are all designed around an integer config Threshold that triggers on a specific field, eg docs read, or on a delta of fields for long requests and changes requests that process many rows but return few.
The current default matchers are listed in the two subsections below; one of them, for example, triggers when a changes request returns less than was necessarily loaded to complete the request (eg find heavy filtered changes requests reading many rows but returning few).
Each of the default matchers has an enablement setting in `config:get(?CSRT_MATCHERS_ENABLED, Name)` for toggling it, and a corresponding threshold value setting in `config:get(?CSRT_MATCHERS_THRESHOLD, Name)` that is an integer value corresponding to the specific nature of that matcher.
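As a sketch (section names per the config keys documented above), enabling a builtin matcher and adjusting its threshold from remsh might look like:

```erlang
%% Enable the ioq_calls default matcher and raise its threshold from the
%% documented default of 10000 to 20000 IOQ calls.
config:set("csrt_logger.matchers_enabled", "ioq_calls", "true").
config:set("csrt_logger.matchers_threshold", "ioq_calls", "20000").
```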
CSRT Logger Matcher Enablement (?CSRT_MATCHERS_ENABLED)
config:get_boolean(?CSRT_MATCHERS_ENABLED, "docs_read", false)
Enable the `docs_read` builtin matcher, with a default `Threshold=1000`, such that any request that reads more than `Threshold` docs will generate a CSRT process lifetime report with a summary of its resource consumption.

This is different from the `rows_read` filter in that a view with `?limit=1000` will read 1000 rows, but the same request with `?include_docs=true` will also induce an additional 1000 docs read.
config:get_boolean(?CSRT_MATCHERS_ENABLED, "rows_read", false)
Enable the `rows_read` builtin matcher, with a default `Threshold=1000`, such that any request that reads more than `Threshold` rows will generate a CSRT process lifetime report with a summary of its resource consumption.

This is different from the `docs_read` filter so that we can distinguish between heavy view requests with lots of rows and heavy requests with lots of docs.
config:get_boolean(?CSRT_MATCHERS_ENABLED, "docs_written", false)
Enable the `docs_written` builtin matcher, with a default `Threshold=500`, such that any request that writes more than `Threshold` docs will generate a CSRT process lifetime report with a summary of its resource consumption.
config:get_boolean(?CSRT_MATCHERS_ENABLED, "ioq_calls", false)
Enable the `ioq_calls` builtin matcher, with a default `Threshold=10000`, such that any request that induces more than `Threshold` IOQ calls will generate a CSRT process lifetime report with a summary of its resource consumption.
config:get_boolean(?CSRT_MATCHERS_ENABLED, "long_reqs", false)
Enable the `long_reqs` builtin matcher, with a default `Threshold=60000`, such that any request where the last CSRT rctx `updated_at` timestamp is at least `Threshold` milliseconds greater than the `started_at` timestamp will generate a CSRT process lifetime report with a summary of its resource consumption.
CSRT Logger Matcher Threshold (?CSRT_MATCHERS_THRESHOLD)
config:get_integer(?CSRT_MATCHERS_THRESHOLD, "docs_read", 1000)
Threshold for the `docs_read` logger matcher, defaults to `1000` docs read.

config:get_integer(?CSRT_MATCHERS_THRESHOLD, "rows_read", 1000)

Threshold for the `rows_read` logger matcher, defaults to `1000` rows read.

config:get_integer(?CSRT_MATCHERS_THRESHOLD, "docs_written", 500)

Threshold for the `docs_written` logger matcher, defaults to `500` docs written.

config:get_integer(?CSRT_MATCHERS_THRESHOLD, "ioq_calls", 10000)

Threshold for the `ioq_calls` logger matcher, defaults to `10000` IOQ calls made.

config:get_integer(?CSRT_MATCHERS_THRESHOLD, "long_reqs", 60000)

Threshold for the `long_reqs` logger matcher, defaults to `60000` milliseconds.

Core CSRT API
The `csrt(.erl)` module is the primary entry point into CSRT, containing API functionality for tracking the lifecycle of processes, inducing metric tracking over that lifecycle, and also a variety of functions for aggregate querying.

It's worth noting that the CSRT context tracking functions are specifically designed to not `throw` and to be safe in the event of unexpected CSRT failures or edge cases. The aggregate query API has some callers that will actually throw, but aside from this, core CSRT operations will not bubble up exceptions, and will either return the error value, or catch the error and move on rather than chaining further errors.
PidRef API
These are CRUD operations around creating and storing the CSRT `PidRef` handle.

Context Lifecycle API
These are the CRUD functions for handling a CSRT context lifecycle, where a lifecycle context is created in a `chttpd` coordinator process by way of `csrt:create_coordinator_context/2`, or in `rexi_server:init_p` by way of `csrt:create_worker_context/3`. Additional functions are exposed for setting context specific info like username/dbname/handler. `get_resource` fetches the context being tracked corresponding to the given `PidRef`.

Public API
The "Public" or miscellaneous API for lack of a better name. These are various
functions exposed for wider use and/or testing purposes.
Stats Collection API
This is the stats collection API, utilized by way of `couch_stats:increment_counter` to do local process tracking, and also in `rexi` for adding and extracting delta contexts and then accumulating those values.

NOTE: `make_delta/0` is a "destructive" operation that will induce a new delta by way of the last local pdict's rctx delta snapshot, and then update it to the most recent version. Two individual rctx snapshots for a PidRef can safely generate an actual delta by way of `csrt_util:rctx_delta/2`.
.TODO: RPC/QUERY DOCS
Recon API Ports of https://github.com/ferd/recon/releases/tag/2.5.6
This is a "port" of
recon:proc_window
tocsrt:proc_window
, allowing forproc_window
style aggregations/sorting/filtering but with the stats fieldscollected by CSRT! This is also a direct port of
recon:proc_window
in that itutilizes the same underlying logic and effecient internal data structures as
recon:proc_window
, but rather only changes the Sample function:In particular, our change is
Sample = fun() -> pid_ref_attrs(AttrName) end,
,and in fact, if recon upstream parameterized the option of
AttrName
orSampleFunction
, this could be reimplemented as:This implementation is being highlighted here because
recon:proc_window/3
isbattle hardened and
recon_lib:sliding_window
uses an effecient internal datastructure for storing the two samples that has been proven to work in production
systems with millions of active processes, so swapping the
Sample
functionwith a CSRT version allows us to utilize the production grade recon
functionality, but extended out to the particular CouchDB statistics we're
esepecially interested in.
And on a fun note: any further stats tracking fields added to CSRT tracking will
automatically work with this too.
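To make the shape of the port concrete, here's a hedged sketch; the wrapper below is hypothetical (recon does not currently expose a parameterized Sample function), but it shows the one-line swap the text above describes:

```erlang
%% Hypothetical sketch: if recon parameterized the Sample function, the CSRT
%% port would only need to swap in a CSRT-backed sampler.
proc_window(AttrName, Num, Time) ->
    Sample = fun() -> pid_ref_attrs(AttrName) end, %% the CSRT sampler
    %% recon_proc_window_with_sample/3 is a hypothetical parameterized helper
    recon_proc_window_with_sample(Sample, Num, Time).
```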
Core types and Maybe types
Before we look at the `#rctx{}` record fields, let's examine the core datatypes defined by CSRT for use in Dialyzer typespecs. There are more, but these are the essentials and demonstrate the "maybe" typespec approach utilized in CSRT.

Let's say we have a `-type foo() :: #foo{}` and `-type maybe_foo() :: foo() | undefined`; we can then construct functions of the form `-spec get_foo(id()) -> maybe_foo()` and use Dialyzer to statically assert that all callers of `get_foo/1` handle the `maybe_foo()` data type rather than just `foo()`, and ensure that all subsequent callers do as well.

This approach of `-spec maybe_<Type> :: <Type> | undefined` is utilized throughout CSRT and has greatly aided in the development, refactoring, and static analysis of this system. Here's a useful snippet for running Dialyzer while hacking on CSRT:

Above we have the core `pid_ref()` data type, which is just a tuple with a `pid()` and a `reference()`, and naturally, `maybe_pid_ref()` handles the optional presence of a `pid_ref()`, allowing for our APIs like `csrt:get_resource(maybe_pid_ref())` to handle the ambiguity of the presence of a `pid_ref()`.

We define our core `rctx()` data type as an empty `#rctx{}`, or the more specific `coordinator_rctx()` or `rpc_worker_rctx()`, such that we can be specific about the `rctx()` type in functions that need to distinguish. And then, as expected, we have the notion of `maybe_rctx()`.
.#rctx{}
This is the core data structure utilized to track a CSRT context for a coordinator or rpc_worker process, represented by the `#rctx{}` record, and stored in the `?CSRT_ETS` table keyed on `{keypos, #rctx.pid_ref}`.

The Metadata fields store labeling data for the given process being tracked, such as started_at and updated_at timings, the primary `pid_ref` id key, the type of the process context, and some additional information like username, dbname, and the nonce of the coordinator request.

The Stats Counters fields are `non_neg_integer()` monotonically increasing counters corresponding to the `couch_stats` metrics counters we're interested in tracking at a process level cardinality. The use of these purely integer counter fields, represented by a record stored in an ets table, is the cornerstone of CSRT and why it's able to operate at high throughput and high concurrency, as `ets:update_counter/{3,4}` takes increment operations to be performed atomically and in isolation, in a manner which does not require fetching and loading the data directly. We then take care to batch the accumulation of delta updates into a single `update_counter` call and even sneak in the `updated_at` tracking as an integer counter update without inducing an extra ets call.

NOTE: the typespecs for these fields include `'_'` atoms as possible types, as that is the matchspec wildcard any of the fields can be set to when using an existing `#rctx{}` record to search with.

Metadata

We use `csrt_util:tnow()` for time tracking, which is a `native` format `erlang:monotonic_time()` integer that, notably, can be and often is a negative value. You must either take a delta or convert the time to get it into a useable format, as one might suspect by the use of `native`.

We make use of `erlang:monotonic_time/0` as per the recommendation in https://www.erlang.org/doc/apps/erts/time_correction.html#how-to-work-with-the-new-api for the suggested way to "Measure Elapsed Time", as quoted:

So our `csrt_util:tnow/0` is implemented as the following, and we store timestamps in `native` format as long as possible to avoid precision loss at higher units of time, eg 300 microseconds is zero milliseconds.

We store timestamps in the node's local erlang representation of time, specifically to be able to efficiently do time deltas, and then we track time deltas from the local node's perspective to not send timestamps across the wire. We then utilize `calendar:system_time_to_rfc3339` to convert the local node's native time representation to its corresponding time format when we generate the process life cycle reports or send an http response.

NOTE: because we do an inline definition and assignment of the `#rctx.started_at` and `#rctx.updated_at` fields to `csrt_util:tnow()`, we must declare `#rctx.updated_at` after `#rctx.started_at` to avoid fundamental time incongruities.
#rctx.started_at = csrt_util:tnow() :: integer() | '_',
A static value corresponding to the local node's Erlang monotonic_time at which
this context was created.
#rctx.updated_at = csrt_util:tnow() :: integer() | '_',
A dynamic value corresponding to the local node's Erlang monotonic_time at which this context was updated. Note: unlike `#rctx.started_at`, this value will update over time, and in the process lifecycle reports the `#rctx.updated_at` value corresponds to the point at which the context was destroyed, allowing for calculation of the total duration of the request/context.
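As a quick illustration (a sketch, assuming the field names above and a fetched `Rctx`), the total duration falls out of the two native-unit timestamps:

```erlang
%% Native-unit delta between the final updated_at and started_at,
%% converted to milliseconds via erlang:convert_time_unit/3.
Duration = erlang:convert_time_unit(
    Rctx#rctx.updated_at - Rctx#rctx.started_at, native, millisecond).
```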
#rctx.pid_ref :: maybe_pid_ref() | {'_', '_'} | '_',

The primary identifier used to track the resources consumed by a given `pid()` for a specific context identified with a `make_ref()`, combined together as a unit, as a given `pid()`, eg in the `chttpd` worker pool, can have many contexts over time.
#rctx.nonce :: nonce() | undefined | '_',
The `Nonce` value of the http request being serviced by the `coordinator_rctx()`, used as the primary grouping identifier of workers across the cluster, as the `Nonce` is funneled through `rexi_server`.
.#rctx.type :: rctx_type() | undefined | '_',
A subtype classifier for the `#rctx{}` contexts, right now only supporting `#rpc_worker{}` and `#coordinator{}`, but CSRT was designed to accommodate additional context types like `#view_indexer{}`, `#search_indexer{}`, `#replicator{}`, `#compactor{}`, `#etc{}`.
.#rctx.dbname :: dbname() | undefined | '_',
The database name, filled in at some point after the initial context creation by way of `csrt:set_context_dbname/{1,2}`.
.#rctx.username :: username() | undefined | '_',
The requester's username, filled in at some point after the initial context creation by way of `csrt:set_context_username/{1,2}`.
.Stats Counters
All of these stats counters are strictly `non_neg_integer()` counter values that are monotonically increasing, as we only induce positive counter increment calls in CSRT. Not all of these values will be nonzero, eg if the context doesn't induce Javascript filtering of documents, it won't inc the `#rctx.js_filter` field. The `"should_truncate_reports"` config value described in this document will conditionally exclude the zero valued fields from being included in the process life cycle report.
#rctx.db_open = 0 :: non_neg_integer() | '_',
The number of `couch_server:open/2` invocations induced by this context.
invocations induced by this context.#rctx.docs_read = 0 :: non_neg_integer() | '_',
The number of `couch_db:open_doc/3` invocations induced by this context.
invocations induced by this context.#rctx.docs_written = 0 :: non_neg_integer() | '_',
A phony metric counting docs written by the context, induced by `csrt:docs_written(length(Docs0))` in `fabric_rpc:update_docs/3` as a way to count the magnitude of docs written, as the actual document writes happen in the `#db.main_pid` `couch_db_updater` pid and subprocess tracking is not yet supported in CSRT.

This can be replaced with direct counting once passthrough contexts work.
#rctx.rows_read = 0 :: non_neg_integer() | '_',
A value tracking multiple possible metrics corresponding to rows streamed in
aggregate operations. This is used for view_rows/changes_rows/all_docs/etc.
#rctx.changes_returned = 0 :: non_neg_integer() | '_',
The number of `fabric_rpc:changes_row/2` invocations induced by this context, specifically tracking the number of changes rows streamed back to the client request, allowing for distinguishing between the number of changes processed to fulfill a request versus the number actually returned in the http response.
#rctx.ioq_calls = 0 :: non_neg_integer() | '_',
A phony metric counting invocations of `ioq:call/3` induced by this context. As with `#rctx.docs_written`, we need a proxy metric to represent these calls until CSRT context passing is supported, so that the `ioq_server` pid can return its own delta back to the worker pid.
#rctx.js_filter = 0 :: non_neg_integer() | '_',
A phony metric counting the number of `couch_query_servers:filter_docs_int/5` (eg ddoc_prompt) invocations induced by this context. This is called by way of `csrt:js_filtered(length(JsonDocs))`, which both increments `js_filter` by 1, and `js_filtered_docs` by the length of the docs, so we can track the magnitude of docs and doc revs being filtered.
#rctx.js_filtered_docs = 0 :: non_neg_integer() | '_',
A phony metric counting the quantity of documents filtered by way of `couch_query_servers:filter_docs_int/5` (eg ddoc_prompt) invocations induced by this context. This is called by way of `csrt:js_filtered(length(JsonDocs))`, which both increments `#rctx.js_filter` by 1, and `#rctx.js_filtered_docs` by the length of the docs, so we can track the magnitude of docs and doc revs being filtered.
#rctx.get_kv_node = 0 :: non_neg_integer() | '_',
This metric tracks the number of invocations of `couch_btree:get_node/2` in which the `NodeType` returned by `couch_file:pread_term/2` is `kv_node`, instead of `kp_node`.

This provides a mechanism to quantify the impact of document count and document size on the logarithmic complexity btree algorithms as those values become larger and the database btrees grow.
#rctx.get_kp_node = 0 :: non_neg_integer() | '_'
This metric tracks the number of invocations of `couch_btree:get_node/2` in which the `NodeType` returned by `couch_file:pread_term/2` is `kp_node`, instead of `kv_node`.

This provides a mechanism to quantify the impact of document count and document size on the logarithmic complexity btree algorithms as those values become larger and the database btrees grow.
%% "Example to extend CSRT"
%%write_kv_node = 0 :: non_neg_integer() | '',
%%write_kp_node = 0 :: non_neg_integer() | ''