metrics: fix input metrics double-counting with conditional routing#11948
metrics: fix input metrics double-counting with conditional routing#11948zshuang0316 wants to merge 2 commits into
Conversation
|
@codex please review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b956a77dd3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| ret = flb_input_chunk_append_raw_local(ins, | ||
| ret = flb_input_chunk_append_raw_local_flags(ins, | ||
| FLB_INPUT_LOGS, | ||
| FLB_INPUT_CHUNK_SKIP_INPUT_METRICS, |
There was a problem hiding this comment.
Avoid dropping metrics when the base append is skipped
In conditional routing under mem_buf_limit or storage pause, one of these routed appends can successfully create a route-specific chunk and then pause the input via flb_input_chunk_protect; because the route copy is marked FLB_INPUT_CHUNK_SKIP_INPUT_METRICS, the only planned input-counter update is the later unconditional append. That append runs after this loop and returns before counting when flb_input_buf_paused() is true, so chunks already accepted for conditional routes are delivered without ever being reflected in fluentbit_input_records_total / _bytes_total.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b956a77dd3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| ret = flb_input_chunk_append_raw_local(ins, | ||
| ret = flb_input_chunk_append_raw_local_flags(ins, | ||
| FLB_INPUT_LOGS, | ||
| FLB_INPUT_CHUNK_SKIP_INPUT_METRICS, |
There was a problem hiding this comment.
Count route copies if the base append can fail
When conditional/per-record routing is active and a route-copy append succeeds but the later unconditional append fails (for example, the route copy pushes the input over mem_buf_limit/storage limits and flb_input_chunk_protect() pauses it before the base append runs), this flag suppresses the only input metrics for data that remains queued for that route. The function returns an error, but the per-route chunk has already been appended and had its direct outputs applied, so fluentbit_input_records_total / bytes can under-report delivered records on that failure path; count the original ingestion before splitting or account/undo the successful route copies when the base append fails.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Codex Review: Didn't find any major issues. More of your lovely PRs please.
Reviewed commit: a06ca3ca09
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe PR introduces a ChangesInput Metrics Suppression for Route-Split Copies
Sequence Diagram(s)sequenceDiagram
participant ProcessInternal as input_log_append_processed_internal
participant MetricsCounter
participant SplitRoute as split_and_append_route_payloads
participant AppendFlags as flb_input_chunk_append_raw_flags
participant InputChunk as input_chunk_append_raw
ProcessInternal->>MetricsCounter: pre-add records/bytes once (FLB_HAVE_METRICS)
note over ProcessInternal: base_flags = FLB_INPUT_CHUNK_SKIP_INPUT_METRICS
ProcessInternal->>SplitRoute: split batch per conditional route
SplitRoute->>AppendFlags: append route copy + FLB_INPUT_CHUNK_SKIP_INPUT_METRICS
AppendFlags->>InputChunk: forward flags
InputChunk-->>AppendFlags: skip input metrics (flag set)
ProcessInternal->>AppendFlags: base append + base_flags
AppendFlags->>InputChunk: forward flags
InputChunk-->>AppendFlags: skip input metrics (flag set)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a06ca3ca09
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| cmt_counter_add(ins->cmt_records, ts, records, 1, (char *[]) {name}); | ||
| cmt_counter_add(ins->cmt_bytes, ts, buf_size, 1, (char *[]) {name}); | ||
|
|
||
| flb_metrics_sum(FLB_METRIC_N_RECORDS, records, ins->metrics); | ||
| flb_metrics_sum(FLB_METRIC_N_BYTES, buf_size, ins->metrics); |
There was a problem hiding this comment.
Count only after a route append is accepted
With conditional routing enabled, these counters are incremented before split_and_append_route_payloads() has successfully appended anything; if the first route-copy append is rejected (for example input_chunk_append_raw() returns -1 because the input is over its memory/ring-buffer limit and no space can be released, or chunk creation/preparation fails), the function returns -1 at the split failure path without running the base append. That records data as ingested even though no chunk was accepted, whereas the previous accounting happened after the append path passed those guards.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/flb_input_log.c`:
- Around line 1505-1530: The metrics accounting using cmt_counter_add and
flb_metrics_sum is currently executed before split_and_append_route_payloads()
is invoked. If split_and_append_route_payloads() fails and returns -1, the
function exits immediately but input metrics have already been incremented even
though no append actually completed. Move the cmt_counter_add and
flb_metrics_sum calls to execute only after split_and_append_route_payloads()
succeeds, ensuring metrics are only recorded when the append operation actually
completes. Apply this same fix to the other occurrences mentioned in the comment
at the related code sections.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b87e7395-ab76-4223-a9dd-63c44eb34ad3
📒 Files selected for processing (3)
include/fluent-bit/flb_input_chunk.hsrc/flb_input_chunk.csrc/flb_input_log.c
Add FLB_INPUT_CHUNK_SKIP_INPUT_METRICS and flags-aware helpers flb_input_chunk_append_raw_flags() and flb_input_chunk_append_raw_local_flags(). The existing public functions forward flags=0 so all current callers are unaffected. The flag is used by the routing layer to suppress per-copy metric updates when the batch is accounted once at the ingestion boundary instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: zshuang0316 <zshuang0316@163.com>
a06ca3c to
ad2beeb
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ad2beebdbf
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (conditional_active == FLB_TRUE && accepted == FLB_TRUE && | ||
| records > 0 && buf_size > 0) { |
There was a problem hiding this comment.
Preserve zero-record fallback when counting input metrics
When conditional routing is active, all appends now use FLB_INPUT_CHUNK_SKIP_INPUT_METRICS, so this block is the only place input records/bytes are counted. For callers that pass records == 0 and rely on the existing fallback in input_chunk_append_raw() to derive the log record count from the MessagePack payload, the append still succeeds but this guard skips both record and byte metrics entirely. Please derive the count before this check, or otherwise mirror the lower-layer zero-record recovery before suppressing the append-time metrics.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Codex Review: Didn't find any major issues. Keep them coming!
Reviewed commit: 098f6c8e3d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
With conditional routing active, each batch is fanned out into one chunk copy per matching route plus a base append. Every append called flb_input_chunk_append_raw(), which incremented cmt_records/cmt_bytes, so a batch matching N routes was counted N+1 times in fluentbit_input_records_total and fluentbit_input_bytes_total. Mark all per-route copies and the base append with FLB_INPUT_CHUNK_SKIP_INPUT_METRICS, then account the batch once in input_log_append_processed_internal() after at least one append is accepted. Counting after acceptance means a fully rejected batch (over mem_buf_limit with no space to free) is not recorded, while a partially accepted batch is not lost if a route copy pauses the input or the base append later fails. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: zshuang0316 <zshuang0316@163.com>
ad2beeb to
098f6c8
Compare
In
input_log_append_processed_internal()the ingestion is additive:split_and_append_route_payloads()creates one chunk copy per matching route, and then an additional unconditional append of the original buffer
always runs so non-conditional routes still receive the data. Each of those appends incremented the input counters
inside
input_chunk_append_raw(), so a record matching N routes was counted N+1 times. This is most visible within_opentelemetry(OTLP logs) when more than one route matches, where the totals roughly double.This is a metrics-accounting fix only — routing behavior and chunk delivery are unchanged.
Fix
FLB_INPUT_CHUNK_SKIP_INPUT_METRICSflag and thread it through the internal append path via newflb_input_chunk_append_raw_flags()/flb_input_chunk_append_raw_local_flags()helpers. The existing publicfunctions are unchanged (they forward
flags = 0), so no other callers are affected.split_and_append_route_payloads(). Theoriginal ingestion is still counted exactly once by the unconditional append, which preserves the existing pause /
empty-buffer guards in
input_chunk_append_raw().After the change,
fluentbit_input_records_totalequals the original record count andfluentbit_input_bytes_totalequals the original byte size regardless of how many routes match. Metrics, traces,profiles and blobs inputs are unaffected — they append once with no route splitting.
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
Bug Fixes
New Features
Documentation