Commit b02d08b
feat: late materialization of vectors in filtered vector search
A KNN search is performed when no vector index is present. When a table
is only partially covered by a vector index, we perform a union of an ANN
search over the indexed data and a KNN search over the unindexed data.
If the table is completely unindexed, the query is just a KNN search over
all of the data.
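The planning rule above can be sketched as follows. This is only an illustration: `SearchPlan`, `choose_plan`, and `merge_top_k` are hypothetical names and not the Lance API. The `AnnOnly` case for a fully indexed table is an assumption, since the message describes only the unindexed and partially indexed cases.

```rust
/// Hypothetical plan kinds, chosen from how much of the table is indexed.
#[derive(Debug, PartialEq, Eq)]
enum SearchPlan {
    /// No indexed data: flat KNN over everything.
    KnnOnly,
    /// Everything indexed (assumed case, not stated in the commit message).
    AnnOnly,
    /// Partial coverage: ANN over indexed data unioned with KNN over the rest.
    AnnUnionKnn,
}

fn choose_plan(indexed_rows: usize, unindexed_rows: usize) -> SearchPlan {
    match (indexed_rows, unindexed_rows) {
        (0, _) => SearchPlan::KnnOnly,
        (_, 0) => SearchPlan::AnnOnly,
        _ => SearchPlan::AnnUnionKnn,
    }
}

/// Union the two candidate lists of (row_id, distance) and keep the k closest.
fn merge_top_k(ann: Vec<(u64, f32)>, knn: Vec<(u64, f32)>, k: usize) -> Vec<(u64, f32)> {
    let mut all: Vec<(u64, f32)> = ann.into_iter().chain(knn).collect();
    // total_cmp gives a total order over f32, so NaN distances cannot panic the sort.
    all.sort_by(|a, b| a.1.total_cmp(&b.1));
    all.truncate(k);
    all
}
```

For a partially indexed table, `choose_plan` picks the union, and `merge_top_k` combines the ANN and KNN candidates into one result set ordered by distance.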
Prior to this commit, when executing the KNN portion of a filtered vector
search, we scanned all columns and then removed the results that did not
match the filter. For large vectors, this overfetches a lot of data from
storage.
When filters are selective, it is more efficient to read the filter
column (typically much smaller than the vector), apply the filter, and
then select matching vectors by row ID.
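A minimal sketch of this late-materialization order, using an in-memory stand-in for a columnar table. The `Table` struct, the `u32` filter column, and the function names are assumptions made for illustration; the real scanner works against storage, not `Vec`s.

```rust
/// Hypothetical two-column table: a small filter column and a large vector column.
struct Table {
    filter_col: Vec<u32>,
    vectors: Vec<Vec<f32>>,
}

/// Squared L2 distance between two vectors of equal length.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Returns the k nearest matching rows as (row_id, distance),
/// plus the number of vectors that had to be read.
fn knn_late_materialized(
    table: &Table,
    pred: impl Fn(u32) -> bool,
    query: &[f32],
    k: usize,
) -> (Vec<(usize, f32)>, usize) {
    // Step 1: scan only the filter column and collect matching row IDs.
    let row_ids: Vec<usize> = table
        .filter_col
        .iter()
        .enumerate()
        .filter(|(_, v)| pred(**v))
        .map(|(i, _)| i)
        .collect();

    // Step 2: fetch vectors by row ID, touching only the rows that matched.
    let mut scored: Vec<(usize, f32)> = row_ids
        .iter()
        .map(|&id| (id, l2_sq(&table.vectors[id], query)))
        .collect();
    let vectors_read = scored.len();

    // Step 3: keep the k closest matches.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    (scored, vectors_read)
}
```

The second return value makes the saving visible: with a selective filter, the number of vectors read is the number of matching rows rather than the size of the table.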
This patch implements that strategy, along with an adaptive mechanism for
deciding when to apply it. The scanner has a new configuration setting for
specifying the filter selectivity at which a full scan becomes cheaper. We
compute a target row count from that threshold and scan the filter column
for matches. If we encounter more matches than the target, we give up and
switch to a full scan.
1 parent 96cfdf2 commit b02d08b
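The adaptive fallback described in the commit message can be sketched like this. The names `FilterScan`, `scan_filter_adaptively`, and `selectivity_threshold` are hypothetical, and rounding the target up with `ceil` is an assumption; the message does not say how the target row count is rounded or what the setting is called.

```rust
/// Outcome of scanning the filter column (hypothetical type).
#[derive(Debug, PartialEq)]
enum FilterScan {
    /// The filter was selective enough: fetch vectors for these row IDs only.
    LateMaterialize(Vec<usize>),
    /// Too many rows matched: abandon the row-ID list and scan all columns.
    FallBackToFullScan,
}

fn scan_filter_adaptively(
    filter_col: &[u32],
    pred: impl Fn(u32) -> bool,
    selectivity_threshold: f64,
) -> FilterScan {
    // Target row count derived from the threshold (rounding up is an assumption).
    let target = (filter_col.len() as f64 * selectivity_threshold).ceil() as usize;

    let mut row_ids = Vec::new();
    for (row_id, value) in filter_col.iter().enumerate() {
        if pred(*value) {
            row_ids.push(row_id);
            // Give up as soon as the match count exceeds the target.
            if row_ids.len() > target {
                return FilterScan::FallBackToFullScan;
            }
        }
    }
    FilterScan::LateMaterialize(row_ids)
}
```

With 8 rows and a threshold of 0.25, the target is 2 rows: a filter matching two rows yields their row IDs, while a filter matching every row bails out on the third match without reading the rest of the column.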
File tree
4 files changed, +710 -94 lines changed
- rust
  - lance-datafusion/src
  - lance-tools/src
  - lance/src
    - dataset