Commit b02d08b

feat: late materialization of vectors in filtered vector search
KNN search is performed when a vector index is not present. When a table is only partially covered by a vector index, we take the union of an ANN search over the indexed data and a KNN search over the unindexed data. If the table is completely unindexed, the search is just a KNN search over the data.

Prior to this commit, the KNN portion of a filtered vector search scanned all columns and then removed results that did not match the filter. For large vectors, this overfetches a lot of data from storage. When filters are selective, it is more efficient to read the filter column (typically much smaller than the vector), apply the filter, and then select the matching vectors by row ID.

This patch implements that strategy, along with an adaptive mechanism for deciding when to apply it. The scanner has a new configuration concept for specifying the filter selectivity at which it becomes cheaper to do a scan. We compute a target row count based on that threshold and scan the filter column for matches. If we encounter more matches than the target, we give up and switch to a scan.
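
The adaptive decision described in this message can be sketched roughly as follows. This is a minimal illustration and not the code from this commit: `Plan`, `target_row_count`, and `plan_knn` are hypothetical names, the filter column is modeled as a plain iterator of `i64` values, and row IDs are assumed to be the positions in that iterator.

```rust
/// Outcome of the filter-column pass (hypothetical type, not part of Lance).
#[derive(Debug, PartialEq)]
enum Plan {
    /// Few enough rows matched: fetch only these vectors by row ID.
    TakeByRowId(Vec<u64>),
    /// Too many rows matched: fall back to scanning all columns.
    FullScan,
}

/// Turn a selectivity threshold (a fraction in 0.0..=1.0) into the maximum
/// number of matches we tolerate before a full scan is assumed to be cheaper.
fn target_row_count(total_rows: usize, selectivity_threshold: f64) -> usize {
    (total_rows as f64 * selectivity_threshold).ceil() as usize
}

/// Scan only the (small) filter column and collect matching row IDs.
/// Gives up as soon as the number of matches would exceed the target.
fn plan_knn(
    filter_column: impl IntoIterator<Item = i64>,
    total_rows: usize,
    selectivity_threshold: f64,
    predicate: impl Fn(i64) -> bool,
) -> Plan {
    let target = target_row_count(total_rows, selectivity_threshold);
    let mut matches = Vec::new();
    for (row_id, value) in filter_column.into_iter().enumerate() {
        if predicate(value) {
            if matches.len() == target {
                // One more match would exceed the target: stop early.
                return Plan::FullScan;
            }
            matches.push(row_id as u64);
        }
    }
    Plan::TakeByRowId(matches)
}
```

With 8 rows and a threshold of 0.5, the target is 4 matches: a filter that matches 2 rows yields `Plan::TakeByRowId`, while a threshold of 0.25 (target 2) combined with a filter that matches 4 rows yields `Plan::FullScan`. Stopping early bounds the wasted work of the filter-column pass when the filter turns out not to be selective.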
1 parent 96cfdf2 commit b02d08b

4 files changed: +710 -94 lines changed


rust/lance-datafusion/src/exec.rs

Lines changed: 15 additions & 0 deletions
@@ -411,6 +411,21 @@ pub struct ExecutionSummaryCounts {
     pub all_counts: HashMap<String, usize>,
 }
 
+impl ExecutionSummaryCounts {
+    /// Create a new ExecutionSummaryCounts with all values initialized to zero
+    pub fn new() -> Self {
+        Self::default()
+    }
+
+    /// Create a new ExecutionSummaryCounts with only custom counts
+    pub fn with_counts(counts: impl IntoIterator<Item = (impl Into<String>, usize)>) -> Self {
+        Self {
+            all_counts: counts.into_iter().map(|(k, v)| (k.into(), v)).collect(),
+            ..Default::default()
+        }
+    }
+}
+
 fn visit_node(node: &dyn ExecutionPlan, counts: &mut ExecutionSummaryCounts) {
     if let Some(metrics) = node.metrics() {
         for (metric_name, count) in metrics.iter_counts() {

rust/lance-tools/src/meta.rs

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ impl LanceToolFileMetadata {
             .open_file(&path, &CachedFileSize::unknown())
             .await?;
         let file_metadata = FileReader::read_all_metadata(&file_scheduler).await?;
-        let lance_tool_file_metadata = LanceToolFileMetadata { file_metadata };
+        let lance_tool_file_metadata = Self { file_metadata };
         Ok(lance_tool_file_metadata)
     }
 }

rust/lance/src/dataset.rs

Lines changed: 1 addition & 7 deletions
@@ -8873,12 +8873,7 @@ mod tests {
     }
 
     fn make_tx(read_version: u64) -> Transaction {
-        Transaction::new(
-            read_version,
-            Operation::Append { fragments: vec![] },
-            None,
-            None,
-        )
+        Transaction::new(read_version, Operation::Append { fragments: vec![] }, None)
     }
 
     async fn delete_external_tx_file(ds: &Dataset) {
@@ -8939,7 +8934,6 @@ mod tests {
             ds.load_indices().await.unwrap().as_ref().clone(),
             &tx_file,
             &ManifestWriteConfig::default(),
-            None,
         )
         .unwrap();
         let location = write_manifest_file(
