-
Notifications
You must be signed in to change notification settings - Fork 473
Persist: Add new API for run based spine replacement #33044
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a new API for run-based spine replacement in the persist client, extending the existing merge resolution functionality to support fine-grained replacement of specific runs within hollow batches rather than just entire batches.
- Adds
RunLocation
struct to identify specific runs within hollow batches usingSpineId
andRunId
pairs - Introduces a new
apply_merge_res_checked
method that can replace individual runs within batches - Extends
FueledMergeRes
withinputs
andnew_active_compaction
fields to support the new replacement logic
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
File | Description |
---|---|
src/persist-client/src/internal/trace.rs | Core implementation of run-based spine replacement with new data structures and methods |
src/persist-client/src/internal/state_serde.json | Test data updates reflecting the new run-based structure |
src/persist-client/src/internal/state_diff.rs | Updates to populate new fields in FueledMergeRes |
src/persist-client/src/internal/state.rs | Test utilities and method renaming for compatibility |
src/persist-client/src/internal/machine.rs | Updates to construct FueledMergeRes with new fields |
src/persist-client/src/internal/compact.rs | Updates to construct FueledMergeRes with new fields |
src/persist-client/src/cli/admin.rs | Updates to construct FueledMergeRes with new fields |
f035a90
to
23e21c8
Compare
23e21c8
to
f497c4a
Compare
f497c4a
to
ae7312a
Compare
ae7312a
to
2bb9775
Compare
2bb9775
to
a15c651
Compare
a15c651
to
85910d2
Compare
85910d2
to
0274b1f
Compare
0274b1f
to
ee85994
Compare
ee85994
to
3c38325
Compare
3c38325
to
a97ff26
Compare
a97ff26
to
a946d3c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good changes! If nightlies are happy then so am I.
This PR could use a description also.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)] | ||
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)] | ||
pub enum CompactionInput { | ||
/// We don't know what our inputs where, this should only be used for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// We don't know what our inputs where, this should only be used for | |
/// We don't know what our inputs were; this should only be used for |
Legacy, | ||
/// This compaction output is a total replacement for all batches in this id range. | ||
#[allow(dead_code)] | ||
IdRange(BTreeSet<SpineId>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could just be a SpineId
instead of a set, since a single id is already a range. (Optional, but would save you some validation below.)
let old_batch_diff_sum = Self::diffs_sum::<D>(batch.parts.iter(), metrics); | ||
let old_diffs_sum = Self::diffs_sum_for_runs::<D>(batch, &run_ids, metrics); | ||
|
||
if let (Some(old_diffs_sum), Some(new_diffs_sum)) = (old_diffs_sum, new_diffs_sum) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also assert that the old_diffs_sum
is None
when the new_diffs_some
is None
... for new diffs None
should always mean 0, and it would be bad if we let that replace data with nonzero sum!
let Some((i, batch)) = part else { | ||
return ApplyMergeResult::NotAppliedNoMatch; | ||
}; | ||
let replacement_range = i..i + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let replacement_range = i..i + 1; | |
let replacement_range = i..(i + 1); |
} | ||
|
||
let parts = &self.parts[replacement_range.clone()]; | ||
let id = SpineId(parts.first().unwrap().id.0, parts.last().unwrap().id.1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is oddly baroque... we're only targeting a single batch. Why a single-element range that we then unwrap?
.prop_filter("non-empty batch", |(batch, _)| batch.run_meta.len() >= 1) | ||
.prop_flat_map(|(batch, to_replace)| { | ||
let batch_len = batch.run_meta.len(); | ||
(0..batch_len).prop_flat_map(move |start| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd extract this gen out to a let
above so you get proper formatting.
(1..=max_len).prop_map(move |len| { | ||
let range = start..(start + len); | ||
(batch_clone.clone(), to_replace_clone.clone(), range) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test still bakes in the "must replace a contiguous range of runs from the original batch" idea that the function under test no longer requires. Might as well choose a random subset instead and get better coverage?
Looks like you need to opt out the new test from I find the scalability benchmark result sort of implausible, given the limited stuff you've changed... maybe spurious? |
Motivation
Tips for reviewer
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.