-
Notifications
You must be signed in to change notification settings - Fork 425
feat: introduce EnterStagingRequest for RegionEngine
#7261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: WenyXu <[email protected]>
7c4ec53 to
4400b66
Compare
4400b66 to
7a3f9f6
Compare
fd2f7be to
c512891
Compare
Signed-off-by: WenyXu <[email protected]>
c512891 to
bb51c1d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces the EnterStagingRequest API for the RegionEngine, enabling safe partition rule changes through a carefully orchestrated staging mode transition. The implementation adds a new intermediate EnteringStaging state that stalls writes during the transition, flushes memtables with the old partition expression, creates isolated staging manifests with the new partition expression, and then resumes writes in staging mode.
Key changes include:
- New
EnterStagingRequestAPI andRegionLeaderState::EnteringStagingstate for safe partition expression updates - Modified manifest update signature to use
is_staging: boolparameter instead of region state - Enhanced flush handling with new
FlushReason::EnterStagingvariant andis_stagingparameter propagation
Reviewed Changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/store-api/src/region_request.rs |
Adds EnterStagingRequest struct and extends RegionRequest enum with EnterStaging variant |
src/mito2/src/worker/handle_enter_staging.rs |
New file implementing core enter staging logic including request handling, manifest creation, and result processing |
src/mito2/src/worker/handle_write.rs |
Adds write stalling support for EnteringStaging state to queue requests during transition |
src/mito2/src/region.rs |
Adds EnteringStaging state, staging_partition_expr field, and state transition methods |
src/mito2/src/manifest/manager.rs |
Changes update() signature to use is_staging boolean, adds staging manifest tracking and clear_staging_manifests() |
src/mito2/src/manifest/checkpointer.rs |
Simplifies maybe_do_checkpoint() by removing state parameter (staging check moved to caller) |
src/mito2/src/flush.rs |
Adds FlushReason::EnterStaging variant and is_staging field to RegionFlushTask |
src/mito2/src/worker/handle_flush.rs |
Propagates is_staging parameter and updates flush handling to skip metadata updates in staging mode |
src/mito2/src/worker/handle_manifest.rs |
Updates all update_manifest() calls to pass is_staging parameter |
src/mito2/src/worker/handle_alter.rs |
Updates flush task creation to pass is_staging parameter |
src/mito2/src/worker.rs |
Adds EnterStaging request handling and background notification processing |
src/mito2/src/region/opener.rs |
Initializes staging_partition_expr field and refactors manifest options creation |
src/mito2/src/manifest/storage.rs |
Adds manifest_dir() helper function |
src/mito2/src/error.rs |
Adds StagingPartitionExprMismatch error for partition expression validation |
src/mito2/src/engine/staging_test.rs |
Comprehensive tests for enter staging functionality including partition expression validation and write stalling |
src/mito2/src/engine/listener.rs |
Adds NotifyEnterStagingResultListener for test coordination |
src/mito2/src/request.rs |
Adds EnterStagingRequest to worker request types and EnterStagingResult for background notifications |
src/metric-engine/src/engine.rs |
Adds EnterStaging request handling for physical regions |
src/datanode/src/region_server.rs |
Maps EnterStaging request to RegionChange::None |
| Test files (multiple) | Updates existing tests to use new manifest update signature |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| e; | ||
| "Failed to clear staging manifest files for region {}", | ||
| region.region_id | ||
| ); |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When clearing staging manifests fails, the function returns early but the region is still in the EnteringStaging state (set on line 104). This leaves the region in an inconsistent state where it's marked as entering staging but the operation failed. The region state should be reverted back to Writable before returning, or this error should trigger a state cleanup.
| ); | |
| ); | |
| // Revert region state to Writable before returning. | |
| region.set_state(RegionLeaderState::Writable); |
|
|
||
| let version = region.version(); | ||
| if !version.memtables.is_empty() { | ||
| // If memtable is empty, we can't enter staging directly and need to flush |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on line 64 has incorrect logic. It says "If memtable is empty, we can't enter staging directly" but the condition checks !version.memtables.is_empty(), which means it's checking if memtable is NOT empty. The comment should say "If memtable is not empty, we can't enter staging directly and need to flush all memtables first."
| // If memtable is empty, we can't enter staging directly and need to flush | |
| // If memtable is not empty, we can't enter staging directly and need to flush |
| }; | ||
|
|
||
| if let Err(e) = region.switch_state_to_staging(RegionLeaderState::EnteringStaging) { | ||
| error!(e; "Failed to switch region state to entering staging"); |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message on line 211 says "Failed to switch region state to entering staging", but at this point the function is actually switching from EnteringStaging to Staging state. The error message should say "Failed to switch region state to staging" to accurately reflect what's happening.
| error!(e; "Failed to switch region state to entering staging"); | |
| error!(e; "Failed to switch region state to staging"); |
| common_runtime::spawn_global(async move { | ||
| let mut manager = region.manifest_ctx.manifest_manager.write().await; | ||
| if let Err(e) = manager.clear_staging_manifests().await { | ||
| error!(e; "Failed to clear staging manifests after failed to switch region state to entering staging"); |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message says "Failed to clear staging manifests after failed to switch region state to entering staging". This is misleading because the code is called when switching from EnteringStaging to Staging fails, not when entering the EnteringStaging state. It should say "Failed to clear staging manifests after failed to switch region state to staging".
| error!(e; "Failed to clear staging manifests after failed to switch region state to entering staging"); | |
| error!(e; "Failed to clear staging manifests after failed to switch region state to staging"); |
| normal_count_before, 1, | ||
| "Normal manifest directory should initially contain one file" | ||
| // One file for table creation | ||
| // One files for flush operation |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammatical error: "One files" should be "One file" (singular form).
| // One files for flush operation | |
| // One file for flush operation |
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
#6558
What's changed and what's your intention?
This PR introduces
EnterStagingRequestas a new region operation that enables safe partition rule changes by transitioning regions into staging mode with isolated manifest management. This allows zero-downtime partition expression updates while maintaining data consistency.The
EnterStagingRequestfollows a carefully designed flow:FlushReason::EnterStagingStagingmodePR Checklist
Please convert it to a draft if some of the following conditions are not met.