@anurag-ris90 anurag-ris90 commented Sep 29, 2025

MirrorMaker2: Detect log truncation (fail-fast) and auto-recover on topic reset

Summary

This PR extends MirrorMaker 2 to handle two important failure scenarios for cross-cluster replication:

  1. Silent data loss (log truncation)

    • Detect when a source partition's earliest available offset has advanced past the expected offset (messages were truncated due to retention).
    • On detection, MirrorMaker logs a clear error (TRUNCATION-DETECTED) with the topic, partition, expected offset, and beginning offset, then fails fast to surface the issue to operators.
  2. Topic reset (deletion & recreation)

    • Detect topic recreation by observing a backward change in the beginning offset or partition disappearance/reappearance.
    • On detection, MirrorMaker logs TOPIC-RESET-DETECTED (with timestamp and topic), then automatically resubscribes and seeks to the beginning offset, clearing offset translation state as needed.
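
The two detection rules above can be sketched as a single comparison of three offsets. This is a minimal illustration, not the code in this PR: the class name, the `Verdict` enum, and the `NO_PREVIOUS` sentinel are hypothetical, and partition disappearance/reappearance is not modeled.

```java
/** Hypothetical sketch: classifies one partition from its expected and beginning offsets. */
final class OffsetCheckSketch {

    enum Verdict { OK, TRUNCATION_DETECTED, TOPIC_RESET_DETECTED }

    /** Sentinel (assumption) meaning "no beginning offset has been observed yet". */
    static final long NO_PREVIOUS = -1L;

    /**
     * @param expectedOffset    next offset the mirror task expects to read
     * @param previousBeginning beginning offset seen on the previous check, or NO_PREVIOUS
     * @param currentBeginning  beginning offset reported by the source cluster now
     */
    static Verdict classify(long expectedOffset, long previousBeginning, long currentBeginning) {
        // A beginning offset that moves backward suggests the topic was deleted and recreated.
        if (previousBeginning != NO_PREVIOUS && currentBeginning < previousBeginning) {
            return Verdict.TOPIC_RESET_DETECTED;
        }
        // A beginning offset past the expected offset means records were removed before being read.
        if (currentBeginning > expectedOffset) {
            return Verdict.TRUNCATION_DETECTED;
        }
        return Verdict.OK;
    }
}
```

The reset check runs first because a recreated topic would otherwise look like an ordinary partition with a low beginning offset. A topic whose beginning offset was already 0 before recreation would not be caught by this comparison alone, which is presumably why the description also lists partition disappearance/reappearance as a signal.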

Design rationale

  • Calls consumer.beginningOffsets(...) before processing a batch and compares the result against the expected offsets.
  • Minimal, localized changes within connect/mirror classes to avoid broad impact.
  • Maintains clear SLF4J logging markers to help automation scripts identify events.
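
A rough sketch of where such a pre-batch check could sit is shown here. To keep it self-contained, a `Function` stands in for the consumer's `beginningOffsets(...)` lookup and partitions are plain strings such as `"orders-0"`; those choices, the class and method names, and the exact message layout are assumptions for illustration, not the PR's implementation.

```java
import java.util.Collection;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a fail-fast check run before each batch is processed. */
final class PreBatchCheckSketch {

    /**
     * @param expectedOffsets  next offset expected per partition (key format is an assumption)
     * @param beginningOffsets stand-in for a lookup such as consumer.beginningOffsets(partitions)
     * @throws IllegalStateException if any partition's beginning offset is past its expected offset
     */
    static void verifyBeforeBatch(Map<String, Long> expectedOffsets,
                                  Function<Collection<String>, Map<String, Long>> beginningOffsets) {
        // One lookup for all partitions in the batch keeps the check cheap.
        Map<String, Long> beginnings = beginningOffsets.apply(expectedOffsets.keySet());
        for (Map.Entry<String, Long> entry : expectedOffsets.entrySet()) {
            Long beginning = beginnings.get(entry.getKey());
            if (beginning != null && beginning > entry.getValue()) {
                // Message layout is illustrative; only the TRUNCATION-DETECTED token comes from the description.
                throw new IllegalStateException(String.format(
                        "TRUNCATION-DETECTED partition=%s expected=%d beginning=%d",
                        entry.getKey(), entry.getValue(), beginning));
            }
        }
    }
}
```

Throwing from the check, rather than logging and continuing, is what makes the failure visible to whatever supervises the task.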

Test plan

  • An integration demonstration is included in the project repo (docker-compose and run_challenge.sh) and covers:
    • Normal replication: producer -> MM2 -> DR consumer
    • Truncation simulation: pause MM2, wait for retention.ms to expire, resume -> expect TRUNCATION-DETECTED and failure
    • Topic reset simulation: delete and recreate topic -> expect TOPIC-RESET-DETECTED and auto-resubscribe
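
The expectations in the three scenarios above could be checked mechanically by scanning the MirrorMaker log for the two markers. The sketch below assumes the log is available as a list of lines and that the markers appear verbatim; the class, enum, and method names are hypothetical and are not taken from run_challenge.sh.

```java
import java.util.List;

/** Hypothetical sketch: decides whether a captured log matches a scenario's expected marker. */
final class ScenarioCheckSketch {

    static final String TRUNCATION_MARKER = "TRUNCATION-DETECTED";
    static final String RESET_MARKER = "TOPIC-RESET-DETECTED";

    enum Scenario { NORMAL, TRUNCATION, TOPIC_RESET }

    static boolean contains(List<String> logLines, String marker) {
        return logLines.stream().anyMatch(line -> line.contains(marker));
    }

    static boolean passes(Scenario scenario, List<String> logLines) {
        switch (scenario) {
            case TRUNCATION:
                return contains(logLines, TRUNCATION_MARKER);
            case TOPIC_RESET:
                return contains(logLines, RESET_MARKER);
            default:
                // Normal replication should produce neither marker.
                return !contains(logLines, TRUNCATION_MARKER) && !contains(logLines, RESET_MARKER);
        }
    }
}
```

A fuller check for the truncation scenario would also confirm that the process exited, since the description expects a failure there and not only a log line.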

Notes

@github-actions github-actions bot added the following labels on Sep 29, 2025: triage (PRs from the community), streams, core (Kafka Broker), producer, consumer, tools, connect, performance, kraft, mirror-maker-2, dependencies (Pull requests that update a dependency file), storage (Pull requests that target the storage module), tiered-storage (Related to the Tiered Storage feature), KIP-932 (Queues for Kafka), build (Gradle build or GitHub Actions), docker (Official Docker image), generator (RPC and Record code generator), transactions (Transactions and EOS), clients, group-coordinator
Contributor

@LiamClarkeNZ LiamClarkeNZ left a comment

You have a massive number of changed files in here. Looking at the changes, I think you need to merge in the latest apache:trunk branch, as your actual changes are lost in the noise.

Once you've done so, ping me and I'll re-review.

@github-actions github-actions bot removed the triage (PRs from the community) label on Oct 5, 2025