feat: gc worker metasrv scheduler #6985
Conversation
MichaelScofield left a comment
Lack of integration test?
Pull Request Overview
This PR implements a comprehensive garbage collection (GC) system for managing file lifecycle in the mito2 storage engine. The changes introduce metasrv-coordinated GC scheduling, file removal rate tracking, and improved manifest management for tracking deleted files.
Key changes:
- Added GC scheduler on metasrv to coordinate garbage collection across datanodes
- Implemented file removal rate tracking to monitor GC pressure and prioritize regions
- Refactored manifest management to use reference-based statistics and removed hardcoded file retention policies
- Changed `lingering_time` from required to optional, allowing immediate file deletion when set to `None`
Reviewed Changes
Copilot reviewed 44 out of 45 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/store-api/src/region_engine.rs | Added file_removal_rate field to RegionManifestInfo and derived Hash for RegionRole |
| src/mito2/src/region.rs | Added file_removal_rate to ManifestStats and exposed it in region statistics |
| src/mito2/src/manifest/manager.rs | Refactored to use ManifestStats reference instead of individual atomic fields |
| src/mito2/src/manifest/action.rs | Added file removal rate calculation and removed TTL-based file eviction logic |
| src/mito2/src/gc.rs | Refactored GC worker to accept region references and made lingering_time optional |
| src/mito2/src/sst/file_purger.rs | Added gc_enabled parameter to determine purger type selection |
| src/meta-srv/src/gc/*.rs | New GC scheduling infrastructure including scheduler, tracker, candidate selection, and mailbox communication |
| src/datanode/src/heartbeat/handler/gc_worker.rs | Updated to handle GC instructions with improved validation |
| tests-integration/tests/http.rs | Reduced default lingering times for faster testing |
| src/mito2/src/gc/worker_test.rs | New integration tests for GC worker functionality |
src/mito2/src/manifest/action.rs
Outdated
```rust
fn file_removed_cnt_after(&self, t_ms: i64) -> (u64, Option<i64>) {
    let mut cnt = 0;
    let mut min_ts_after: Option<i64> = None;
    for record in &self.removed_files {
        if record.removed_at >= t_ms {
            cnt += record.file_ids.len();
        }
        min_ts_after = match min_ts_after {
            Some(ts) => Some(ts.min(record.removed_at)),
            None => Some(record.removed_at),
        };
    }
    (cnt as u64, min_ts_after)
}
```
Copilot AI, Nov 5, 2025
The min_ts_after calculation is incorrect. It computes the minimum timestamp across ALL removed files, not just those after t_ms. This should only track the minimum timestamp of files matching the condition removed_at >= t_ms. The current logic will return timestamps from before the threshold, leading to incorrect file removal rate calculations.
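A minimal sketch of the corrected loop, moving the timestamp tracking inside the condition (types and field names as in the hunk above):

```rust
fn file_removed_cnt_after(&self, t_ms: i64) -> (u64, Option<i64>) {
    let mut cnt = 0;
    let mut min_ts_after: Option<i64> = None;
    for record in &self.removed_files {
        // Only records at or after the threshold contribute to either value.
        if record.removed_at >= t_ms {
            cnt += record.file_ids.len();
            // Track the minimum timestamp among matching records only.
            min_ts_after = Some(match min_ts_after {
                Some(ts) => ts.min(record.removed_at),
                None => record.removed_at,
            });
        }
    }
    (cnt as u64, min_ts_after)
}
```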
src/mito2/src/manifest/action.rs
Outdated
```rust
/// Count the number of files removed after the given timestamp. Also return the minimum
/// timestamp of all removed files.
```
Copilot AI, Nov 5, 2025
The documentation says 'minimum timestamp of all removed files' but based on the function name and first part of the comment, it should say 'minimum timestamp of removed files after the given timestamp' to match the intended behavior.
```rust
// expect long running queries to be finished(or at least be able to notify it's using a deleted file) within a reasonable time
lingering_time: Some(Duration::from_secs(60)),
```
Copilot AI, Nov 5, 2025
Corrected 'it's' to 'they're' for grammatical correctness: since 'queries' is plural, the comment should read 'they're using a deleted file'.
```rust
    file_removal_rate, ..
} => *file_removal_rate as f64 * self.config.file_removal_rate_weight,
// Metric engine doesn't have file_removal_rate, also this should be unreachable since metrics engine doesn't support gc
RegionManifestInfo::Metric { .. } => 0.0,
```
Better to panic?
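One way to act on this, shown as a sketch rather than the PR's actual resolution, is to fail loudly in debug builds while keeping the neutral score in release:

```rust
// GC is not supported by the metric engine, so this arm should be
// unreachable; surface scheduling bugs in debug builds.
RegionManifestInfo::Metric { .. } => {
    debug_assert!(false, "metric engine regions should never be GC candidates");
    0.0
}
```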
```rust
}

impl GcScheduler {
    /// Calculate GC priority score for a region based on various metrics.
```
It's better to document the algorithm.
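For illustration, the doc comment could spell out the scoring formula. The free-function shape and `sst_num_weight` below are hypothetical; only `file_removal_rate`, `sst_num`, and `file_removal_rate_weight` appear in the PR:

```rust
/// Calculate the GC priority score for a region.
///
/// The score is a weighted sum of per-region pressure signals, e.g.
/// `score = file_removal_rate * file_removal_rate_weight + sst_num * sst_num_weight`.
/// Regions are sorted by score descending and the top candidates are
/// dispatched to their datanodes for GC.
fn calc_priority_score(
    file_removal_rate: u64,
    sst_num: u64,
    file_removal_rate_weight: f64,
    sst_num_weight: f64,
) -> f64 {
    file_removal_rate as f64 * file_removal_rate_weight + sst_num as f64 * sst_num_weight
}
```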
```rust
/// The mailbox to send messages.
pub(crate) mailbox: MailboxRef,
/// The server address.
pub(crate) server_addr: String,
```
Whose server address?
```rust
let dn_stats = self.meta_peer_client.get_all_dn_stat_kvs().await?;
let mut table_to_region_stats: HashMap<TableId, Vec<RegionStat>> = HashMap::new();
for (_dn_id, stats) in dn_stats {
    let mut stats = stats.stats;
```
Aren’t the stats already sorted by timestamp? I’m not sure, but it seems like they should be.
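If ordering is not guaranteed, one defensive option is to keep only the most recent stat per region explicitly. A self-contained sketch; `RegionStat` here is a minimal stand-in and `timestamp_millis` is an assumed field name:

```rust
use std::collections::HashMap;

// Minimal stand-in for the real RegionStat; only the fields needed
// for deduplication are shown.
struct RegionStat {
    region_id: u64,
    timestamp_millis: i64,
}

// Keep only the most recent stat per region, regardless of input order.
fn dedup_latest(stats: Vec<RegionStat>) -> Vec<RegionStat> {
    let mut latest: HashMap<u64, RegionStat> = HashMap::new();
    for stat in stats {
        let newer = latest
            .get(&stat.region_id)
            .map_or(true, |existing| stat.timestamp_millis > existing.timestamp_millis);
        if newer {
            latest.insert(stat.region_id, stat);
        }
    }
    latest.into_values().collect()
}
```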
```rust
use crate::service::mailbox::{Channel, MailboxRef};

#[async_trait::async_trait]
pub(crate) trait SchedulerCtx: Send + Sync {
```
nit: it would be nice to document the trait and its functions.
```rust
fn default() -> Self {
    Self {
        enable: false,
        max_concurrent_tables: 10,
```
Concurrency seems high; let's be conservative in the initial implementation.
```rust
impl GcSchedulerOptions {
    /// Validates the configuration options.
    pub fn validate(&self) -> Result<()> {
        if self.max_concurrent_tables == 0 {
```
Replacing all these validations with ensure! looks better.
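GreptimeDB's codebase already uses snafu, so each check could collapse into an `ensure!`. A sketch; the `InvalidArgumentsSnafu` selector name is assumed, not necessarily the crate's actual error variant:

```rust
use snafu::ensure;

pub fn validate(&self) -> Result<()> {
    // Fail fast with a descriptive error instead of an open-coded `if`.
    ensure!(
        self.max_concurrent_tables > 0,
        InvalidArgumentsSnafu {
            error: "max_concurrent_tables must be greater than 0",
        }
    );
    Ok(())
}
```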
```rust
while let Some(event) = self.receiver.recv().await {
    match event {
        Event::Tick => {
            info!("Received gc tick");
```
Annoying log
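A sketch of the usual remedy, demoting the per-tick message to debug level:

```rust
// Ticks fire on every scheduling interval; log them at debug level
// so they do not flood info-level output.
debug!("Received gc tick");
```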
```rust
// 6 hours, for unknown expel time, which is when this file get removed from manifest, it should rarely happen, can keep it longer
unknown_file_lingering_time: Duration::from_secs(60 * 60 * 6),
// expect long running queries to be finished(or at least be able to notify it's using a deleted file) within a reasonable time
lingering_time: Some(Duration::from_secs(60)),
```
Why make the lingering time smaller?
| "Successfully deleted {} unused files for region {}", | ||
| unused_len, region_id | ||
| ); | ||
| // TODO(discord9): update region manifest about deleted files |
The todo is outdated.
I'm wondering whether running the GC worker concurrently with migration or repartition tasks could lead to any concurrency issues. This is something I'm particularly concerned about. @waynexia @WenyXu @fengjiachun

I've noticed that the current logic relies on certain assumptions, for example that a region resides on the local datanode. However, these assumptions might not always hold true. It's possible that the conditions are valid at the time of verification but become invalid during execution, leading to constraint violations. The core question is: if these assumptions or checks are invalidated during execution due to ongoing migration or repartition operations, what would be the resulting behavior? If such consistency cannot be guaranteed, then it would be safer to execute these processes sequentially rather than in parallel.
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
gc worker metasrv trigger, which triggers the gc worker on datanodes depending on `file_removal_rate` and `sst_num`.

TODO: changing the way the datanode updates `removed_files` is done, but we may also need to persist the actually deleted files in `RegionEdit` to reduce the number of delete operations sent to the object store.

TODO:
PR Checklist
Please convert it to a draft if some of the following conditions are not met.