btw @Frank McSherry one thing I was wanting to ping you about at some point. I was looking at the "hierarchical bit set" crate hibitset from this "specs" ECS system and it reminded me of datafrog, but using spare bitsets instead of vecs of tuples.
we have unfortunately had to suspend work on polonius/datafrog until we can ship the first version of NLL
but I've been wondering if we'll hit scalability problems — in some cases we have some awfully large bit sets :)
anyway if/when that happens, I was toying with whether we could integrate something like hibitset (or a similar approach) into datafrog...
...one of the key operations in ECS being something like "iterate over all entities that have a foo and bar component", basically a JOIN operation
There may be a connection. The EmptyHeaded folks at Stanford have based a DB on similar algorithmic behavior, but with layout optimizations based on using bitsets for representing dense regions of sparse sets. The hibitset link doesn't have any examples or docs, it seems; do you have another link where you learned more?
tbh I've not looked closely at it. I was talking to @Kyrenite (Twitter handle) at RustConf and they were telling me about it
from what I understood you have a sort of btree of bitsets
and you want to find the intersection quickly
so you keep some "fingers" pointing in to them, and when you find that the "next index" from one of the bitsets is ahead of the others, you can skip whole subtrees
something something waves hands
anyway NLL is looking closer and closer so I hope that we'll be turning back to polonius in a month or two (maybe faster?)
torkleyy is generally active on the specs gitter if you want to ask any questions / informal docs
last update i saw was a comment they made on reddit
Ah, didn't think that anybody would know hibitset :) I don't work on it very often, so docs and readme aren't very shiny, but the crate itself works. You should note that hibitset is not just a usual bitset, but a multi-layer bitset for a sparse data structures. Currently, we're using 4 layers and that's also the unstable bit of the crate; we want to make the number of layers generic, then I'll stabilize that crate.
I've read the comments/docs on hibitset. It looks like a judy array / adaptive radix tree. There are some times when this is good, and some times when it is not great. The HyPer people in Munich have had good luck with ARTs, but you need to engineer them a lot more than it looks like the repo is engineered.
hmm is there no simd in hibitset ? :thinking:
SIMD all the things
I've had such SIMD sorting networks on my todo list of things to try in :frog: — the 40y old sorting vs hashing idea, which might be getting interesting with AVX-512 (for DB people and their u64) but for our u32s AVX-2 could be enough whistles