Bitsets with more than 16m values #32
Yes, you're correct about the max range: https://docs.rs/hi_sparse_bitset/0.6.0/hi_sparse_bitset/config/index.html.

Implementing BitBlock: you can add a larger configuration. You probably want a 512-bit SIMD block type implementing `BitBlock` (the code below refers to it as `block_512`).

Implementing Config: now you need to make a `Config` struct with that:

```rust
pub struct _512bit;

impl Config for _512bit {
    type Level0BitBlock = block_512;
    // Here we use u16 instead of u8, since u8 cannot address 512 elements.
    // You can still use u8 here if you are somehow sure that you'll have <256 blocks in level 1.
    type Level0BlockIndices = [u16; 512];

    type Level1BitBlock = block_512;
    // The data level can have 512*512 blocks max. We need u32 for that.
    // Again, if you can guarantee that you'll have fewer blocks than the theoretical max,
    // you can use a smaller type.
    type Level1BlockIndices = [u32; 512];

    type DataBitBlock = block_512;

    // This is the default one, used in all pre-defined configs.
    // It has an effect only on `Reduce`.
    type DefaultCache = FixedCache<32>;
}

// This is optional. For SmallBitSet.
impl SmallConfig for _512bit {
    // Same element type as in Level1BlockIndices.
    // 12 is the size of the inlined storage for index-pointers.
    //
    // Usually SIMD types have a big alignment - like 64 bytes here.
    // A block roughly looks like:
    // SmallBlock{ mask: SIMD, mask_pops: Level1MaskU64Populations, small_indices: Level1SmallBlockIndices }
    // Level1SmallBlockIndices + Level1MaskU64Populations should fit in 64 bytes,
    // otherwise the Block instantly becomes 64 bytes bigger due to alignment.
    type Level1SmallBlockIndices = [u32; 12];
    // See https://docs.rs/hi_sparse_bitset/0.6.0/hi_sparse_bitset/config/trait.SmallConfig.html#associatedtype.Level1MaskU64Populations
    type Level1MaskU64Populations = [u16; 8];
}
```

I wanted to make a macro for that, but decided not to pollute the crate namespace with it...

P.S. Alternatively, it is theoretically possible to increase the level depth (currently 3 in total), which would greatly increase the max range, but at the price of performance and memory overhead. Configurable depth is not implemented, though, and I would prefer not to implement it, at least until Rust gets better support for meta-programming.
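To show how such a configuration would plug in, here is a minimal usage sketch. It assumes the `_512bit` config above compiles (i.e. that a suitable 512-bit `block_512` type exists) and that the usual `Default` + `insert`/`contains` surface is available:

```rust
use hi_sparse_bitset::BitSet;

// Hypothetical alias over the custom configuration defined above.
type BigBitSet = BitSet<_512bit>;

fn demo() {
    let mut set = BigBitSet::default();
    // Indices beyond the ~16.7M limit of the predefined _256bit config.
    set.insert(40_000_000);
    set.insert(49_999_999);
    assert!(set.contains(40_000_000));
    assert!(!set.contains(123));
}
```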
What do you mean? What are "in-memory indexes"?
Thanks for the detailed response. I will probably not attempt to implement a 512-bit block myself, as I don't use nightly and do not have the time to get familiar with the implementation. I hoped to test the library without that much up-front work.
I was referring to a data structure that serves a similar purpose to a database index, but lives in memory and is not persisted - it is built from scratch at application startup. For example, assume you have some data along these lines:
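A hypothetical sketch of such an entity, based on the properties mentioned below (field names and types are assumptions):

```rust
// Illustrative only - names and types are assumptions, not the original example.
type Date = u32; // e.g. days since some epoch

#[derive(Clone, Copy, PartialEq, Eq)]
enum SystemType { SystemType1, SystemType2 /* ... */ }

#[derive(Clone, Copy, PartialEq, Eq)]
struct Target(u32); // opaque target identifier

struct Entity {
    start_date: Date,
    system_type: SystemType,
    targets: Vec<Target>,
    display_name: String, // the field result pages are ordered by
    // ...more filterable properties
}
```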
We have up to 50 million such entities, and we need to be able to query entities restricted by some of these properties - e.g. by start_date being in some range, system_type being SystemType1 or SystemType2, targets containing a particular Target, etc. Any combination of these restrictions can be issued by the frontend, but they are all logically AND-ed. The result set has to be ordered by a display field (a separate index serves that purpose) and narrowed down to a page specified by the frontend. It's perfectly possible for a query to result in a couple million entities, and the backend needs to be able to order them and return the page consisting of the first 50, the last 50, or any 50 in the middle. (The first couple of pages will typically be requested, but the API needs to support getting any page.)

Modeling this in an SQL database proved very challenging - we tried sqlite and postgres - because a significant percentage of these entities change from day to day based on external data. These daily updates were unacceptably slow because the 30+ indexes for searching and ordering made inserts and updates crawl. Neither rebuilding the database from scratch (with index building delayed until the end) nor updating only the entities that changed from the previous day resulted in satisfactory performance. Also, even with all the indexes, the retrieval performance wasn't exactly stellar, and the more complex queries could take several seconds to complete.

We are now prototyping an approach that just keeps everything (data and indexes) in memory, as we are running on servers with a lot of memory for other reasons. If we ignore sorting for a moment, a lookup index is essentially a map from a property value to a bitset of matching row ids. Here I'd like to replace the plain fixed-size bitset with hi_sparse_bitset. Sorting and pagination also present interesting challenges, but this comment has already gotten too long.
Ok, I think I can do that for you. What is your target platform?
From what you described, it looks like you want to intersect a number of per-property bitsets and iterate the result. I don't fully understand what you specifically need beyond that, though.

Also, if your intersection will be HUGE - like hundreds of thousands of elements - you can split the intersection session with a Cursor, and chew through the intersection in a few attempts. That trick will work even if the source bitsets change between sessions.

Otherwise, the described task should fit hi_sparse_bitset ideally.
Thanks, but please note that I cannot promise that I will actually use it. FWIW we are currently targeting x86-64-v3.

Fast iteration over the result set matters to us; extracting a sorted page currently looks like this:

```rust
fn extract_sorted(&self, order_by: &OrderBy, results: FixedBitSet, range: &Range<usize>) -> Vec<Rowid> {
    // maps row ids to their sort position for the given field
    let order_index: &Vec<u32> = &self.order_indexes[order_by];
    let mut results: Vec<u32> = results.ones().map(|rowid| rowid as u32).collect();
    results.par_sort_unstable_by_key(|&rowid| order_index[rowid as usize]);
    results[range.clone()].to_vec()
}
```

Here the sorting step takes a non-negligible amount of time when the result set is large, and even collecting into the intermediate `Vec<u32>` is not free.
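The alternative alluded to in the next paragraph appears to be the reverse strategy: walk the pre-sorted order index and probe the result bitset for membership until the requested page is filled. A hedged sketch of that idea (the `order_rowids` field and its layout are assumptions):

```rust
fn extract_sorted_by_probe(
    &self,
    order_by: &OrderBy,
    results: &FixedBitSet,
    range: &Range<usize>,
) -> Vec<u32> {
    // Hypothetical: rowids listed in display order for the given field.
    let order_rowids: &Vec<u32> = &self.order_rowids[order_by];
    order_rowids
        .iter()
        .copied()
        .filter(|&rowid| results.contains(rowid as usize))
        .skip(range.start)
        .take(range.end - range.start)
        .collect()
}
```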
This is especially fast if the result set covers most of the data and we are looking for a page near the beginning. I might even do one or the other depending on the size of the result set. But this should illustrate why fast membership tests and fast iteration matter here.

I'm sure this is not a theoretically optimal design, but the idea is to find out how much can be achieved with a comparatively small amount of maintainable code under the constraints of our setup. Compressed bitmaps definitely look like an attractive solution, but I think they would require rethinking the parts of the design that rely heavily on cheap random access into the bitsets.
It's just that all results are lazy, and I still have not implemented an efficient way to collect a lazy result into a `BitSet`. You could try it as is, but you'll likely take a performance hit if you call `contains()` on a lazy result.

I don't understand how your sorted algorithm works, but if your order index were also a bitset, you could do something like:

```rust
fn extract_sorted(&self, order_by: &OrderBy, results: FixedBitSet, range: &Range<usize>) -> Vec<Rowid> {
    let order_index: &BitSetSmall = &self.order_indexes[order_by];
    let filtered_result = order_index & results;
    filtered_result.iter().take(range.end).skip(range.start).collect()
}
```

Maybe there is an even faster solution... Remember, you can make your own binary operation.
Ok, I'll make it.
Ahh, nevermind my last suggestion about using a bitset as the order index.
Looks like you need the thing I'm currently working on - HiSparseArray. It's like BitSet, but with a T at the data level instead of a bitblock.
For now, you can mimic that with an indirection array (not sure how that works out memory-wise, though) or a HashMap:
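A minimal sketch of those two stand-ins (all names and layout here are illustrative, not crate API):

```rust
use std::collections::HashMap;

// Option 1: plain HashMap from rowid to T - simple, but with hashing overhead per lookup.
type SparseMap<T> = HashMap<u32, T>;

// Option 2: indirection array - a dense Vec of values plus a large-but-cheap-to-index
// table mapping rowid -> slot in that Vec, with u32::MAX marking "absent".
struct IndirectionArray<T> {
    slots: Vec<u32>, // indexed by rowid; length = max rowid + 1
    values: Vec<T>,
}

impl<T> IndirectionArray<T> {
    fn get(&self, rowid: u32) -> Option<&T> {
        match self.slots.get(rowid as usize) {
            Some(&slot) if slot != u32::MAX => self.values.get(slot as usize),
            _ => None,
        }
    }
}
```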
Here is the custom 512-bit implementation - hope it'll work fine.
Note that I don't plan to use bitmaps for the order index. The order index is pre-created with a more sophisticated variant of:

```rust
let name_order: Vec<usize> = (0..data.len())
    .sorted_by_key(|&ind| data[ind].name.clone()) // itertools' sorted_by_key
    .collect();
```

So e.g. if the entities' names are ["c", "a", "b"], name_order comes out as [1, 2, 0] - the row ids in display order.
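For context against the earlier `extract_sorted` snippet (which indexed `order_index[rowid]` to get a sort position), this `name_order` is the inverse mapping - position to rowid. Assuming `name_order: Vec<usize>` as above, one can be derived from the other:

```rust
// position_of[rowid] = display position of that row (hypothetical derivation).
let mut position_of = vec![0u32; name_order.len()];
for (pos, &rowid) in name_order.iter().enumerate() {
    position_of[rowid] = pos as u32;
}
```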
Hmm... Yes, I see. You'll need an efficient `contains()` then.

Just to note: due to the specifics of how constructing a `BitSet` from a lazy result works, it pays off to keep the materialized set as small as possible - something like:

```rust
fn extract_sorted(
    &self,
    order_by: &OrderBy,
    // Consider that we did not collect the intersection into a BitSet yet,
    // but have something lazy like Reduce<...>.
    results: impl BitSetInterface,
    range: &Range<usize>,
) -> Vec<Rowid> {
    let order_index: &(Vec<u32>, BitSet) = &self.order_indexes[order_by];
    // If your `results` already IS a BitSet - skip this and keep your initial implementation.
    // This bunch of intersections is there to minimize the resulting BitSet
    // and speed up its construction.
    let results = BitSet::from(results & BitSetRange(range.start, range.end) & &order_index.1);
    order_index.0.iter()
        .filter(|&&rowid| results.contains(rowid as usize))
        .copied()
        .take(range.end)
        .skip(range.start)
        .collect()
}
```

By the way, how many sets do you expect to be intersected, and how many elements do you expect in the resulting intersection?
In this branch https://github.com/tower120/hi_sparse_bitset/tree/collect I implemented collecting a lazy result into a `BitSet`:

```rust
// make a lazy intersection like this
let intersection = reduce(ops::And, bitsets.iter());
// or like this (slightly faster due to the fixed bitset count)
let intersection = apply(ops::And, &bitset1, &bitset2);
// or like this (same as apply)
let intersection = &bitset1 & &bitset2;

// and then materialize it
let bitset = BitSet::from(intersection);
```

Now with the 512-bit config you should have everything you need for your tests.
Just wanted to drop you a note that I will likely not be using hi_sparse_bitset after all. I hope that this issue will still be useful to others who wish to use your library for representing bitsets with more than 2^24 distinct values. I'd like to take some time to answer your questions about my use case, in case they turn out to be relevant for the design of your crate.
The typical number of intersections is relatively small - no more than 5 or so, with 1-3 being typical. But some of the bitsets being intersected are themselves the results of unions, and the number of bitsets being unioned can be in the hundreds. Take, for example, a filter that restricts start_date to a one-year range and adds two further property restrictions.
This requires intersecting three bitsets (i.e. performing two intersections), but computing the first bitset requires calculating a union of 365 bitsets, assuming every date has some entities with that date. Currently I do that with code that looks like this:

```rust
fn lookup_ranged(&self, index_name: &str, range: impl RangeBounds<u32>, set: &mut FixedBitSet) {
    match &self.lookup_indexes[index_name] {
        ByValue::FewValues(index) => {
            for (_, rowids) in index.range(range) {
                *set |= rowids;
            }
        }
        ByValue::ManyValues(index) => {
            set.extend(
                index
                    .range(range)
                    .flat_map(|(_, rowids)| rowids.as_ref())
                    .map(|rowid| rowid.as_usize()),
            );
        }
    }
}
```

This approach requires different code paths depending on whether the index has few or many distinct values. Here "few" means fewer than 100 (an arbitrarily chosen limit), as exemplified by properties like "status"; "many" is the case of properties like "date" (and even more so something like "datetime", which is not hard to imagine). With few distinct values I can just keep the rowids in one bitset per value and union them with `|=`; with many distinct values I keep plain lists of rowids instead. This is the area where I hoped hi_sparse_bitset would help.
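For reference, a hypothetical sketch of the `ByValue` index type that `lookup_ranged` above pattern-matches on - the key type, container choices, and the `Rowid` definition are assumptions:

```rust
use std::collections::BTreeMap;
use fixedbitset::FixedBitSet;

#[derive(Clone, Copy)]
struct Rowid(u32);
impl Rowid {
    fn as_usize(self) -> usize { self.0 as usize }
}

// One lookup index per property, keyed by an encoded property value.
enum ByValue {
    // Few distinct values (e.g. "status"): one bitset of rowids per value.
    FewValues(BTreeMap<u32, FixedBitSet>),
    // Many distinct values (e.g. "date"): a sorted list of rowids per value.
    ManyValues(BTreeMap<u32, Box<[Rowid]>>),
}
```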
Worst case would be in the millions; the more typical case is in the thousands. This is because the frontend typically requests the first page of 50 or so results. So if the query filter selects e.g. 1% of entities (say 500k out of 50 million), then getting the first 50 means iterating through ~5000 entities by sorted index. If the filters select few entities (less than 65k in the current code), the performance is good enough either way.

The worst case by far is a filter that selects (say) 99% of entities combined with a page near the end. This is currently done by going through everything by ordered index, testing each row for membership in the result set.
I see - well, good luck then. If you consider using compressed bitsets in the future, I would just like to mention that I found a solution for a configurable-depth hierarchical structure - so in future versions you'll be able to configure not only the block width but the level count as well (from 2 to 4 in the current prototype). That would allow configuring the range anywhere from 4096 to 68_719_476_736 (i.e. from 64^2 up to 512^4).

P.S. I'm surprised that you don't have performance issues with intersection and union on plain bitsets. That issue drove me to hierarchical bitsets in the first place...
I think the secret is that the small number of intersections I do just happens to perform fast enough at my data size. And when I need a larger number of unions, I do them by going through row numbers pre-stored in a vector. It's far from free, but it allows a response time of 4-30ms, which is great for my use case (and far, far better than what we had with a somewhat simple-minded SQL-based approach). Good luck with the library!
I'm trying to test hi_sparse_bitset on some custom in-memory indexes. BitSet is generic over a configuration type, and the crate offers 64-bit, 128-bit, and 256-bit configurations. If my understanding is correct, the largest one, _256bit, supports up to 16777216 distinct values. Is that correct?

For my use case, I need to support up to 50m entities. Can I add a larger configuration? I looked at the definitions of the existing ones, but it was not exactly obvious how to come up with a new one.