Skip to content

metrics: fix input metrics double-counting with conditional routing#11948

Open
zshuang0316 wants to merge 2 commits into
fluent:masterfrom
zshuang0316:fix-input-counter
Open

metrics: fix input metrics double-counting with conditional routing#11948
zshuang0316 wants to merge 2 commits into
fluent:masterfrom
zshuang0316:fix-input-counter

Conversation

@zshuang0316

@zshuang0316 zshuang0316 commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

In input_log_append_processed_internal() the ingestion is additive: split_and_append_route_payloads()
creates one chunk copy per matching route, and then an additional unconditional append of the original buffer
always runs so non-conditional routes still receive the data. Each of those appends incremented the input counters
inside input_chunk_append_raw(), so a record matching N routes was counted N+1 times. This is most visible with
in_opentelemetry (OTLP logs) when more than one route matches, where the totals roughly double.

This is a metrics-accounting fix only — routing behavior and chunk delivery are unchanged.

Fix

  • Add an FLB_INPUT_CHUNK_SKIP_INPUT_METRICS flag and thread it through the internal append path via new
    flb_input_chunk_append_raw_flags() / flb_input_chunk_append_raw_local_flags() helpers. The existing public
    functions are unchanged (they forward flags = 0), so no other callers are affected.
  • Suppress input-level accounting on the per-route copy appends in split_and_append_route_payloads(). The
    original ingestion is still counted exactly once by the unconditional append, which preserves the existing pause /
    empty-buffer guards in input_chunk_append_raw().

    After the change, fluentbit_input_records_total equals the original record count and
    fluentbit_input_bytes_total equals the original byte size regardless of how many routes match. Metrics, traces,
    profiles and blobs inputs are unaffected — they append once with no route splitting.

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes

    • Prevented inflated input-level record/byte metrics during conditional routing by suppressing metrics for internal route-copy appends and updating batch metrics only once per ingested batch.
  • New Features

    • Added new “raw append” APIs that accept caller-provided flags, including local variants, to control whether input-level metric accounting is performed.
  • Documentation

    • Introduced a new flag macro to explicitly mark append operations that should skip input-level metrics.

@zshuang0316

Copy link
Copy Markdown
Contributor Author

@codex please review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b956a77dd3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/flb_input_log.c
ret = flb_input_chunk_append_raw_local(ins,
ret = flb_input_chunk_append_raw_local_flags(ins,
FLB_INPUT_LOGS,
FLB_INPUT_CHUNK_SKIP_INPUT_METRICS,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid dropping metrics when the base append is skipped

In conditional routing under mem_buf_limit or storage pause, one of these routed appends can successfully create a route-specific chunk and then pause the input via flb_input_chunk_protect; because the route copy is marked FLB_INPUT_CHUNK_SKIP_INPUT_METRICS, the only planned input-counter update is the later unconditional append. That append runs after this loop and returns before counting when flb_input_buf_paused() is true, so chunks already accepted for conditional routes are delivered without ever being reflected in fluentbit_input_records_total / _bytes_total.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b956a77dd3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/flb_input_log.c
ret = flb_input_chunk_append_raw_local(ins,
ret = flb_input_chunk_append_raw_local_flags(ins,
FLB_INPUT_LOGS,
FLB_INPUT_CHUNK_SKIP_INPUT_METRICS,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Count route copies if the base append can fail

When conditional/per-record routing is active and a route-copy append succeeds but the later unconditional append fails (for example, the route copy pushes the input over mem_buf_limit/storage limits and flb_input_chunk_protect() pauses it before the base append runs), this flag suppresses the only input metrics for data that remains queued for that route. The function returns an error, but the per-route chunk has already been appended and had its direct outputs applied, so fluentbit_input_records_total / bytes can under-report delivered records on that failure path; count the original ingestion before splitting or account/undo the successful route copies when the base append fails.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex review

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex Review: Didn't find any major issues. More of your lovely PRs please.

Reviewed commit: a06ca3ca09

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@zshuang0316 zshuang0316 marked this pull request as draft June 15, 2026 15:46
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2db12589-d0f8-43e0-b739-b306dd26e771

📥 Commits

Reviewing files that changed from the base of the PR and between ad2beeb and 098f6c8.

📒 Files selected for processing (1)
  • src/flb_input_log.c

📝 Walkthrough

Walkthrough

The PR introduces a FLB_INPUT_CHUNK_SKIP_INPUT_METRICS flag and _flags variants of the raw chunk append APIs. The internal input_chunk_append_raw function gains a flags parameter that gates metric accounting. In the conditional routing path, per-route duplicate copies suppress input metrics so each batch is counted exactly once.

Changes

Input Metrics Suppression for Route-Split Copies

Layer / File(s) Summary
Public API contracts: macro and _flags function declarations
include/fluent-bit/flb_input_chunk.h
Adds FLB_INPUT_CHUNK_SKIP_INPUT_METRICS macro and declares flb_input_chunk_append_raw_flags and flb_input_chunk_append_raw_local_flags, each taking an extra flags parameter alongside the existing signatures.
Core implementation: flags threading and metric gating
src/flb_input_chunk.c
Extends internal input_chunk_append_raw with a flags parameter; gates input-level metric updates on FLB_INPUT_CHUNK_SKIP_INPUT_METRICS; propagates cr->flags from the ring-buffer collector; refactors flb_input_chunk_append_raw and flb_input_chunk_append_raw_local into thin wrappers around the new _flags variants.
Caller update: conditional routing metrics de-duplication
src/flb_input_log.c
In split_and_append_route_payloads, switches per-route copy appends to _flags variants with FLB_INPUT_CHUNK_SKIP_INPUT_METRICS and tracks success via new any_appended out-parameter. In input_log_append_processed_internal, pre-accounts batch metrics once before the conditional split and passes base_flags = FLB_INPUT_CHUNK_SKIP_INPUT_METRICS to the base append to prevent double-counting.

Sequence Diagram(s)

sequenceDiagram
  participant ProcessInternal as input_log_append_processed_internal
  participant MetricsCounter
  participant SplitRoute as split_and_append_route_payloads
  participant AppendFlags as flb_input_chunk_append_raw_flags
  participant InputChunk as input_chunk_append_raw

  ProcessInternal->>MetricsCounter: pre-add records/bytes once (FLB_HAVE_METRICS)
  note over ProcessInternal: base_flags = FLB_INPUT_CHUNK_SKIP_INPUT_METRICS

  ProcessInternal->>SplitRoute: split batch per conditional route
  SplitRoute->>AppendFlags: append route copy + FLB_INPUT_CHUNK_SKIP_INPUT_METRICS
  AppendFlags->>InputChunk: forward flags
  InputChunk-->>AppendFlags: skip input metrics (flag set)

  ProcessInternal->>AppendFlags: base append + base_flags
  AppendFlags->>InputChunk: forward flags
  InputChunk-->>AppendFlags: skip input metrics (flag set)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • fluent/fluent-bit#11144: Both PRs modify src/flb_input_log.c's routing/append flow to adjust how routing-related metrics are computed—this PR prevents double-counting input metrics during route-copy appends, while the retrieved PR adds/updates metrics for unmatched (dropped) routes.

Suggested reviewers

  • edsiper
  • cosmo0920

🐇 A hop through the metrics maze,
No double-counts to cause a daze!
Each record tallied just once true,
With flags to guide the append through.
The route may split, the chunks may fly —
But input counters never lie! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'metrics: fix input metrics double-counting with conditional routing' accurately describes the main change: fixing a metrics double-counting issue with conditional routing in input-level counters.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a06ca3ca09

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/flb_input_log.c Outdated
Comment on lines +1523 to +1527
cmt_counter_add(ins->cmt_records, ts, records, 1, (char *[]) {name});
cmt_counter_add(ins->cmt_bytes, ts, buf_size, 1, (char *[]) {name});

flb_metrics_sum(FLB_METRIC_N_RECORDS, records, ins->metrics);
flb_metrics_sum(FLB_METRIC_N_BYTES, buf_size, ins->metrics);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Count only after a route append is accepted

With conditional routing enabled, these counters are incremented before split_and_append_route_payloads() has successfully appended anything; if the first route-copy append is rejected (for example input_chunk_append_raw() returns -1 because the input is over its memory/ring-buffer limit and no space can be released, or chunk creation/preparation fails), the function returns -1 at the split failure path without running the base append. That records data as ingested even though no chunk was accepted, whereas the previous accounting happened after the append path passed those guards.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex code update, please review

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/flb_input_log.c`:
- Around line 1505-1530: The metrics accounting using cmt_counter_add and
flb_metrics_sum is currently executed before split_and_append_route_payloads()
is invoked. If split_and_append_route_payloads() fails and returns -1, the
function exits immediately but input metrics have already been incremented even
though no append actually completed. Move the cmt_counter_add and
flb_metrics_sum calls to execute only after split_and_append_route_payloads()
succeeds, ensuring metrics are only recorded when the append operation actually
completes. Apply this same fix to the other occurrences mentioned in the comment
at the related code sections.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b87e7395-ab76-4223-a9dd-63c44eb34ad3

📥 Commits

Reviewing files that changed from the base of the PR and between 47db630 and a06ca3c.

📒 Files selected for processing (3)
  • include/fluent-bit/flb_input_chunk.h
  • src/flb_input_chunk.c
  • src/flb_input_log.c

Comment thread src/flb_input_log.c Outdated
Add FLB_INPUT_CHUNK_SKIP_INPUT_METRICS and flags-aware helpers
flb_input_chunk_append_raw_flags() and
flb_input_chunk_append_raw_local_flags(). The existing public functions
forward flags=0 so all current callers are unaffected. The flag is used
by the routing layer to suppress per-copy metric updates when the batch
is accounted once at the ingestion boundary instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: zshuang0316 <zshuang0316@163.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ad2beebdbf

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/flb_input_log.c Outdated
Comment on lines +1555 to +1556
if (conditional_active == FLB_TRUE && accepted == FLB_TRUE &&
records > 0 && buf_size > 0) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve zero-record fallback when counting input metrics

When conditional routing is active, all appends now use FLB_INPUT_CHUNK_SKIP_INPUT_METRICS, so this block is the only place input records/bytes are counted. For callers that pass records == 0 and rely on the existing fallback in input_chunk_append_raw() to derive the log record count from the MessagePack payload, the append still succeeds but this guard skips both record and byte metrics entirely. Please derive the count before this check, or otherwise mirror the lower-layer zero-record recovery before suppressing the append-time metrics.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex please review

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex Review: Didn't find any major issues. Keep them coming!

Reviewed commit: 098f6c8e3d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

With conditional routing active, each batch is fanned out into one chunk
copy per matching route plus a base append. Every append called
flb_input_chunk_append_raw(), which incremented cmt_records/cmt_bytes,
so a batch matching N routes was counted N+1 times in
fluentbit_input_records_total and fluentbit_input_bytes_total.

Mark all per-route copies and the base append with
FLB_INPUT_CHUNK_SKIP_INPUT_METRICS, then account the batch once in
input_log_append_processed_internal() after at least one append is
accepted. Counting after acceptance means a fully rejected batch (over
mem_buf_limit with no space to free) is not recorded, while a partially
accepted batch is not lost if a route copy pauses the input or the base
append later fails.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: zshuang0316 <zshuang0316@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant