@jinyongchoi jinyongchoi commented Dec 19, 2025

This commit adds a 'Data Reliability and Recovery' hint to the Tail input plugin documentation.

It clarifies the behavior of the database offset mechanism during unexpected shutdowns (e.g., system crash, power loss). Specifically, it explains that while Fluent Bit guarantees at-least-once delivery, there is a possibility of slight offset lag and minimal data duplication upon recovery. This ensures users understand that no data is lost even in these scenarios.

fluent/fluent-bit#11269

Summary by CodeRabbit

  • Documentation
    • Added guidance on data reliability and recovery for the Tail input, describing at-least-once delivery, restart-from-last-committed-checkpoint behavior, and potential minimal re-ingestion after unexpected shutdowns (added in two locations).
    • Clarified unicode.encoding guidance: noted that "auto" may fail in some environments, can misguess encoding, and recommended explicitly using UTF-16LE or UTF-16BE when guessing is unreliable.
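
The encoding point above can be illustrated with a minimal sketch. It uses Python's built-in codecs, not Fluent Bit's detection logic, and the sample string is made up. It shows that UTF-16 bytes written without a byte order mark decode without error under the wrong byte order, which is why a guess can silently produce wrong text and an explicit setting is safer.

```python
# Illustrative only: Python's codecs stand in for any UTF-16 decoder.
text = "log line 1"  # made-up sample content

# Encode as little-endian UTF-16 without a byte order mark (BOM).
raw_le = text.encode("utf-16-le")

# Decoding with the matching byte order recovers the original text.
decoded_ok = raw_le.decode("utf-16-le")

# Decoding the same bytes as big-endian raises no error for this input;
# it silently yields different, wrong characters.
decoded_wrong = raw_le.decode("utf-16-be")

print(decoded_ok == text)
print(decoded_wrong == text)
```

Because the wrong byte order fails silently for input like this, a reader of the decoded output has no error to react to, only garbled text.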

@jinyongchoi jinyongchoi requested review from a team as code owners December 19, 2025 14:15
coderabbitai bot commented Dec 19, 2025

Walkthrough

Added documentation to the Tail input describing data reliability and recovery semantics (at-least-once delivery, restart resumption from last committed checkpoint, and possible minimal re-ingestion after unexpected shutdowns) and clarified unicode.encoding auto-detection limitations with a recommendation to prefer explicit UTF-16LE/UTF-16BE variants when encoding guessing is uncertain. The data reliability block and the unicode.encoding guidance were inserted in two locations within the file.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation Updates: pipeline/inputs/tail.md | Added informational blocks on data reliability and recovery (at-least-once delivery, restart resume from last committed checkpoint, potential minimal re-ingestion after crashes) in two places; expanded guidance on unicode.encoding auto-detection limitations in both occurrences and recommended explicit UTF-16LE/UTF-16BE usage when unsure. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly addresses the main change - adding a data reliability note to the in_tail plugin documentation, which matches the PR objectives perfectly. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfca97e and c017872.

📒 Files selected for processing (1)
  • pipeline/inputs/tail.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • pipeline/inputs/tail.md

@jinyongchoi jinyongchoi force-pushed the fix/11265-in-tail-data-loss branch from 72849b6 to 8079262 Compare December 19, 2025 14:18
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fb7ce2 and 72849b6.

📒 Files selected for processing (1)
  • pipeline/inputs/tail.md (1 hunks)
🧰 Additional context used
🪛 GitHub Check: runner / vale
pipeline/inputs/tail.md

[failure] 100-100:
[vale] reported by reviewdog 🐶
[FluentBit.MayMightCan] Use 'can' for permissions or 'might' for possibility.

Raw Output:
{"message": "[FluentBit.MayMightCan] Use 'can' for permissions or 'might' for possibility.", "location": {"path": "pipeline/inputs/tail.md", "range": {"start": {"line": 100, "column": 114}}}, "severity": "ERROR"}


[failure] 98-98:
[vale] reported by reviewdog 🐶
[FluentBit.MayMightCan] Use 'can' for permissions or 'might' for possibility.

Raw Output:
{"message": "[FluentBit.MayMightCan] Use 'can' for permissions or 'might' for possibility.", "location": {"path": "pipeline/inputs/tail.md", "range": {"start": {"line": 98, "column": 138}}}, "severity": "ERROR"}


[failure] 98-98:
[vale] reported by reviewdog 🐶
[FluentBit.Latin] Use 'for example' instead of 'e.g.,'.

Raw Output:
{"message": "[FluentBit.Latin] Use 'for example' instead of 'e.g.,'.", "location": {"path": "pipeline/inputs/tail.md", "range": {"start": {"line": 98, "column": 26}}}, "severity": "ERROR"}
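
As a rough illustration of what the two reported rules look for, here is a hedged sketch in Python. The regular expressions, the `RULES` table, and the `lint` helper are assumptions made for this example; the real checks are defined in the repository's Vale configuration and may match differently. The sample sentence is made up and is not the text of tail.md.

```python
import re

# Hypothetical approximations of the two style rules reported above.
# These patterns are assumptions, not the project's actual Vale rules.
RULES = {
    "MayMightCan": re.compile(r"\bmay\b", re.IGNORECASE),
    "Latin": re.compile(r"\be\.g\.,?"),
}

def lint(line):
    """Return (rule, 1-based column, matched text) per hit, ordered by column."""
    hits = []
    for rule, pattern in RULES.items():
        for match in pattern.finditer(line):
            hits.append((rule, match.start() + 1, match.group(0)))
    return sorted(hits, key=lambda hit: hit[1])

# Made-up sample sentence used only to exercise the sketch.
sample = "After a crash (e.g., power loss), the offset may lag."
findings = lint(sample)
for rule, column, text in findings:
    print(f"{rule} at column {column}: {text!r}")
```

Rewriting the sample with "for example" and "might" would leave `lint` with nothing to report, which mirrors the fix the linter asks for.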

🪛 LanguageTool
pipeline/inputs/tail.md

[style] ~98-~98: ‘lag behind’ might be wordy. Consider a shorter alternative.
Context: ...ivery. The database offset may slightly lag behind the actual processed position if an **u...

(EN_WORDINESS_PREMIUM_LAG_BEHIND)

@jinyongchoi jinyongchoi force-pushed the fix/11265-in-tail-data-loss branch from 8079262 to dfca97e Compare December 19, 2025 14:21
@jinyongchoi
Contributor Author

Following our discussion, I have updated the documentation to address the data reliability behavior during unexpected shutdowns (related to fluent/fluent-bit#11269).

Summary of changes:
I added a Hint block in the tail input plugin documentation under the Database file section. This update clarifies that:

  1. Fluent Bit guarantees at-least-once delivery.
  2. In scenarios of unexpected shutdowns (e.g., system crash, power loss), the database offset might slightly lag behind the actual processed position.
  3. Upon restart, the system resumes from the last committed checkpoint. This ensures no data is lost, although a small amount of data might be re-ingested (duplicated).

This addition ensures that users are aware of this behavior as a known corner case, as suggested.
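
The three points above can be sketched as a toy simulation. This is a simplified model, not Fluent Bit's implementation: the `run` function and its `commit_every` and `crash_after` parameters are hypothetical names introduced for illustration, and an in-memory integer stands in for the database offset.

```python
def run(lines, start, commit_every, crash_after=None):
    """Deliver lines from `start`, checkpointing every `commit_every` lines.

    Returns (delivered, committed_offset). If `crash_after` is set, the run
    stops abruptly after that many deliveries, without a final checkpoint.
    """
    delivered = []
    committed = start
    for pos in range(start, len(lines)):
        delivered.append(lines[pos])      # record is sent downstream
        processed = pos + 1
        if crash_after is not None and len(delivered) == crash_after:
            return delivered, committed   # crash: checkpoint lags behind
        if (processed - start) % commit_every == 0:
            committed = processed         # periodic checkpoint
    return delivered, len(lines)          # clean shutdown commits everything

lines = [f"line-{i}" for i in range(10)]

# First run crashes after 7 deliveries, before the next checkpoint is written.
first, checkpoint = run(lines, start=0, commit_every=3, crash_after=7)

# Restart resumes from the last committed checkpoint.
second, _ = run(lines, start=checkpoint, commit_every=3)

all_delivered = first + second
duplicates = sorted(
    item for item in set(all_delivered) if all_delivered.count(item) > 1
)
print("checkpoint at crash:", checkpoint)
print("duplicates:", duplicates)
print("missing:", sorted(set(lines) - set(all_delivered)))
```

In this model every line is delivered at least once, and only the lines processed after the last checkpoint are delivered twice. A consumer that needs exactly-once behavior would have to deduplicate on its side.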

Thanks!

@esmerel esmerel left a comment

looks pretty good to me also

This commit adds a 'Data Reliability and Recovery' hint to the Tail input plugin documentation.

It clarifies the behavior of the database offset mechanism during unexpected shutdowns (e.g., system crash, power loss). Specifically, it explains that while Fluent Bit guarantees at-least-once delivery, there is a possibility of slight offset lag and minimal data duplication upon recovery. This ensures users understand that no data is lost even in these scenarios.

refs: fluent/fluent-bit#11269

Signed-off-by: jinyong.choi <[email protected]>
@jinyongchoi jinyongchoi force-pushed the fix/11265-in-tail-data-loss branch from dfca97e to c017872 Compare December 23, 2025 04:35