-
Thank you for your proposal! Here are some comments after a brief review.
I think we should also store the max number of buckets of each filter, i.e. how many buckets are stored in one filter?
Also, IMHO: before "expansion" (BTW, I prefer to use "rehashing" rather than "expansion")? Or before "giving up"?
All filters should be useful. If we only care about the last filter, I think we should delete all the other filters. Here we should locate TWO filters (maybe one, if the two buckets land in the same filter) via the item's two buckets (located by its two hash values).
-
I don't think a one-to-one mapping between filters and the rehashing count is a good idea. Even before any rehashing, we can have multiple filters. And after one rehashing, we can increase the number of filters from e.g. 2 to something like 4.
-
1. Introduction
In #2534, we plan to support the cuckoo filter.
This document outlines the design for implementing a Cuckoo Filter in kvrocks. The Cuckoo Filter is a high-performance, probabilistic data structure used for approximate set membership testing. It allows for fast checks to see if an item is in a set and, unlike Bloom Filters, supports item deletion.
A key requirement for this implementation is support for counting duplicate items (via the CF.COUNT command). To robustly handle scenarios where a single item is inserted many more times than its two candidate buckets can hold, we have chosen a sharded (or chained) expansion strategy. This approach ensures functional correctness and system stability under all conditions, trading some query performance for feature completeness and robust handling of high-frequency items.
2. Core Concepts
In a Cuckoo Filter, each item is reduced to a small fingerprint that can live in one of two candidate buckets derived from the item's hash. If both candidate buckets are full on insertion, an existing fingerprint is evicted to make room. This evicted item is then re-inserted into its alternate location, which may cause a cascade of evictions. This process continues until all items find a home or a maximum number of evictions is reached.
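A minimal, self-contained sketch of this insertion/eviction loop (in-memory only; the real implementation would read and write RocksDB buckets, and all names and constants here are illustrative):

```cpp
#include <array>
#include <cstdint>
#include <random>
#include <vector>

constexpr size_t kBucketSize = 4;   // slots per bucket (illustrative)
constexpr int kMaxIterations = 20;  // eviction limit before giving up

using Bucket = std::array<uint8_t, kBucketSize>;  // 0 marks an empty slot
using Filter = std::vector<Bucket>;               // one sub-filter

// Partial-key cuckoo hashing: alt = index XOR hash(fp). This requires a
// power-of-two bucket count so the mapping is self-inverse, letting us
// recover the alternate bucket from either index. A real implementation
// must also ensure a fingerprint is never 0, since 0 means "empty".
uint64_t AltIndex(uint64_t index, uint8_t fp, uint64_t n_buckets) {
  return (index ^ (fp * 0x5bd1e995ULL)) & (n_buckets - 1);
}

bool TryPut(Bucket &b, uint8_t fp) {
  for (auto &slot : b) {
    if (slot == 0) { slot = fp; return true; }
  }
  return false;
}

// Insert fp whose primary bucket is i1, evicting on collision up to the
// limit. On failure the caller triggers an expansion (see 4.1.1).
bool InsertWithEviction(Filter &f, uint8_t fp, uint64_t i1) {
  static std::mt19937 rng{std::random_device{}()};
  uint64_t idx = i1;
  for (int kick = 0; kick < kMaxIterations; ++kick) {
    if (TryPut(f[idx], fp) || TryPut(f[AltIndex(idx, fp, f.size())], fp)) return true;
    // Both candidate buckets are full: swap our fingerprint with a random
    // victim, then try to re-home the victim in its alternate bucket.
    std::swap(fp, f[idx][rng() % kBucketSize]);
    idx = AltIndex(idx, fp, f.size());
  }
  return false;
}
```

The XOR-based alternate index is the standard partial-key cuckoo hashing trick: it lets an evicted fingerprint be re-homed without access to the original item.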
3. Data Structure Design
To efficiently implement the Cuckoo Filter in kvrocks, we will adopt a "Bucket as Key" storage model, combined with a sharded (or chained) expansion strategy. This model maps each bucket of each sub-filter to an individual key-value pair in RocksDB, ensuring high I/O performance and cache efficiency.
3.1. Metadata
The metadata stores the overall configuration for the Cuckoo Filter chain. Each Cuckoo Filter instance corresponds to a single metadata key-value pair in RocksDB.
CuckooFilterMetadata Structure
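A plausible layout for this struct, with field names chosen purely for illustration (and including the per-filter bucket count raised in the review comments above):

```cpp
#include <cstdint>

// Illustrative layout only; the actual kvrocks encoding may differ.
struct CuckooFilterMetadata {
  uint64_t capacity;        // user-requested capacity of the first sub-filter
  uint32_t bucket_size;     // slots per bucket (default 2 in this design)
  uint32_t max_iterations;  // eviction limit before triggering expansion
  uint32_t expansion;       // growth factor applied to each new sub-filter
  uint32_t n_filters;       // number of sub-filters in the chain
  uint64_t n_buckets;       // bucket count of the newest sub-filter (power of two)
  uint64_t num_items;       // total fingerprints stored
  uint64_t num_deletes;     // total deletions
};
```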
3.2. Bucket Data
Each bucket within each sub-filter is stored as a separate key-value pair. This granular approach is the key to performance.
Key Format: cf:{namespace_key}:{filter_index}:{bucket_index}
Value Format: the raw contents of one bucket; for illustration, a fixed-size array of bucket_size fingerprints, one byte each, with 0 marking an empty slot.
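A sketch of the corresponding key and value helpers (one-byte fingerprints are an assumption, not a fixed part of the design):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Key for one bucket, following the format above.
std::string BucketKey(const std::string &ns_key, uint32_t filter_index, uint64_t bucket_index) {
  return "cf:" + ns_key + ":" + std::to_string(filter_index) + ":" + std::to_string(bucket_index);
}

// Pack a bucket into a RocksDB value: one byte per slot, 0 meaning empty.
std::string EncodeBucket(const std::vector<uint8_t> &slots) {
  return std::string(slots.begin(), slots.end());
}

std::vector<uint8_t> DecodeBucket(const std::string &value) {
  return std::vector<uint8_t>(value.begin(), value.end());
}
```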
3.3. Storage Layout in RocksDB
The following diagram illustrates the storage layout for a Cuckoo Filter that has expanded once (n_filters = 2). It details both the Metadata entry and the structure of the individual "Bucket as Key" entries.
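In place of the diagram, a textual illustration (the metadata key format cf_meta:{namespace_key} is an assumption of this sketch):

```
cf_meta:{namespace_key}   -> CuckooFilterMetadata (n_filters = 2, ...)
cf:{namespace_key}:0:0    -> bucket 0 of sub-filter 0 (frozen)
cf:{namespace_key}:0:1    -> bucket 1 of sub-filter 0
...
cf:{namespace_key}:1:0    -> bucket 0 of sub-filter 1 (the write tail)
cf:{namespace_key}:1:1    -> bucket 1 of sub-filter 1
...
```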
4. Operational Flow
4.1. Add Operation (CF.ADD)
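As a rough sketch of the control flow implied by sections 2 and 5, using the same in-memory model and hypothetical helpers (InsertWithEviction from the section 2 sketch, Expand from 4.1.1; bucket counts are assumed to be powers of two):

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Bucket = std::array<uint8_t, 4>;
using Filter = std::vector<Bucket>;

bool InsertWithEviction(Filter &f, uint8_t fp, uint64_t i1);  // see section 2
void Expand(std::vector<Filter> &filters, uint32_t expansion);  // see 4.1.1

// CF.ADD, top level: write only to the newest ("tail") sub-filter; older
// sub-filters are frozen (see section 5).
bool Add(std::vector<Filter> &filters, uint8_t fp, uint64_t hash) {
  Filter &tail = filters.back();
  if (InsertWithEviction(tail, fp, hash & (tail.size() - 1))) return true;
  // Eviction limit reached in the tail: expand the chain and retry once.
  Expand(filters, /*expansion=*/2);
  Filter &fresh = filters.back();
  return InsertWithEviction(fresh, fp, hash & (fresh.size() - 1));
}
```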
4.1.1. Internal Expansion Flow
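Sketched against the same in-memory model, assuming each new sub-filter is expansion times larger than its predecessor (kept a power of two so the XOR index trick from section 2 continues to work):

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Bucket = std::array<uint8_t, 4>;
using Filter = std::vector<Bucket>;

// Append a new, larger sub-filter to the tail of the chain. Existing
// sub-filters are left untouched ("frozen"), so no data is rehashed.
void Expand(std::vector<Filter> &filters, uint32_t expansion) {
  uint64_t next_buckets = filters.back().size() * expansion;
  filters.emplace_back(next_buckets, Bucket{});  // all slots start empty
  // In RocksDB terms this would mean bumping n_filters in the metadata and
  // writing it back atomically together with any pending bucket updates.
}
```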
4.3. Exists Operation (CF.EXISTS)
The lookup walks the sub-filter chain. For each sub-filter, compute the item's two candidate buckets and check them for its fingerprint; on a match, return True immediately and terminate the operation.
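A sketch of this chained lookup, reusing the types and AltIndex helper from the section 2 sketch:

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Bucket = std::array<uint8_t, 4>;
using Filter = std::vector<Bucket>;

uint64_t AltIndex(uint64_t index, uint8_t fp, uint64_t n_buckets);  // section 2

bool Contains(const Bucket &b, uint8_t fp) {
  for (uint8_t slot : b) {
    if (slot == fp) return true;
  }
  return false;
}

// CF.EXISTS: walk the chain, stop at the first sub-filter holding the
// fingerprint. Cost grows linearly with n_filters (see section 5).
bool Exists(const std::vector<Filter> &filters, uint8_t fp, uint64_t hash) {
  for (const Filter &f : filters) {
    uint64_t i1 = hash & (f.size() - 1);
    if (Contains(f[i1], fp) || Contains(f[AltIndex(i1, fp, f.size())], fp)) return true;
  }
  return false;
}
```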
4.4. Count Operation (CF.COUNT)
This operation is the primary reason for choosing the chained expansion strategy, as it correctly handles high-frequency duplicate items.
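Unlike EXISTS, COUNT has no early exit; a sketch against the same model:

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Bucket = std::array<uint8_t, 4>;
using Filter = std::vector<Bucket>;

uint64_t AltIndex(uint64_t index, uint8_t fp, uint64_t n_buckets);  // section 2

// CF.COUNT: every sub-filter and both candidate buckets must be scanned,
// since duplicates of one item may be spread across the whole chain.
uint64_t Count(const std::vector<Filter> &filters, uint8_t fp, uint64_t hash) {
  uint64_t total = 0;
  for (const Filter &f : filters) {
    uint64_t i1 = hash & (f.size() - 1);
    uint64_t i2 = AltIndex(i1, fp, f.size());
    for (uint8_t slot : f[i1]) total += (slot == fp);
    if (i2 != i1) {
      for (uint8_t slot : f[i2]) total += (slot == fp);
    }
  }
  return total;
}
```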
4.5. Delete Operation (CF.DELETE)
The delete operation is similar to EXISTS but only needs to find and remove the first occurrence of the item.
If a matching fingerprint is found in either candidate bucket of any sub-filter, clear that slot, write the bucket back, return True immediately, and terminate the operation.
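A sketch against the same model; note that only one copy is removed per call:

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Bucket = std::array<uint8_t, 4>;
using Filter = std::vector<Bucket>;

uint64_t AltIndex(uint64_t index, uint8_t fp, uint64_t n_buckets);  // section 2

// CF.DELETE: remove a single copy of the fingerprint, so repeated inserts
// need repeated deletes and CF.COUNT stays consistent.
bool Delete(std::vector<Filter> &filters, uint8_t fp, uint64_t hash) {
  for (Filter &f : filters) {
    uint64_t i1 = hash & (f.size() - 1);
    for (uint64_t idx : {i1, AltIndex(i1, fp, f.size())}) {
      for (uint8_t &slot : f[idx]) {
        if (slot == fp) {
          slot = 0;  // 0 marks an empty slot
          return true;
        }
      }
    }
  }
  return false;  // nothing to delete
}
```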
5. Disadvantages and Trade-offs
This design is a pragmatic compromise, and it's important to understand its inherent limitations.
Query Performance Degradation: The most significant trade-off is that the performance of read operations (EXISTS, COUNT, DELETE) degrades linearly as the number of sub-filters (n_filters) increases. Each expansion adds another set of lookups to every query.
Low Utilization of Older Sub-Filters: The "write-to-tail" strategy means that once a sub-filter triggers an expansion, it becomes "frozen." It will never receive new insertions, even if space is freed up by DELETE operations. This can lead to poor overall space utilization, as older sub-filters may become sparse over time while new ones are created.
Hotspot-Driven Expansion: As identified, frequent insertion of a single hot item can rapidly trigger multiple expansions. This creates numerous sub-filters that are largely dedicated to holding this single item, further exacerbating the space utilization problem. To mitigate this, it is highly recommended to use a larger bucket_size (e.g., 4 or more) instead of the default of 2 if this usage pattern is anticipated. A larger bucket size increases the capacity for duplicate items within a single sub-filter, delaying the need for expansion: with bucket_size = 4, a single item can occupy up to 2 × 4 = 8 slots in one sub-filter before forcing an expansion, versus 4 slots with the default.
6. Concurrency Safety
Atomic Writes: All write operations must be atomic to ensure data consistency. This is achieved by using RocksDB's WriteBatch.
Locking: To prevent race conditions during a read-modify-write cycle, a lock should be acquired at the application level for the user key. This is especially critical during expansion to ensure that concurrent operations do not interfere with each other.
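A sketch of how these two points combine into a guarded read-modify-write cycle; the per-key lock table is purely illustrative (kvrocks would use its own locking utilities), while WriteBatch and DB::Write are the standard RocksDB APIs:

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

#include <rocksdb/db.h>
#include <rocksdb/write_batch.h>

// Illustrative per-user-key lock table; the real code would use the
// project's existing lock manager.
std::mutex &LockFor(const std::string &user_key) {
  static std::unordered_map<std::string, std::mutex> locks;
  static std::mutex table_mutex;
  std::lock_guard<std::mutex> guard(table_mutex);
  return locks[user_key];
}

rocksdb::Status AtomicUpdate(rocksdb::DB *db, const std::string &user_key,
                             const std::string &bucket_key,
                             const std::string &new_bucket,
                             const std::string &meta_key,
                             const std::string &new_meta) {
  std::lock_guard<std::mutex> guard(LockFor(user_key));
  // All mutations for one command go into a single batch, so either the
  // bucket update and the metadata update both land, or neither does.
  rocksdb::WriteBatch batch;
  batch.Put(bucket_key, new_bucket);
  batch.Put(meta_key, new_meta);
  return db->Write(rocksdb::WriteOptions(), &batch);
}
```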