Hey @Aaron Hilland @Tyson Nottingham, since you've both been doing work in this area I wanted to give you a heads up that I'm experimenting with an alternative implementation of lazy DefPathTable decoding along the lines this description: https://github.com/rust-lang/rust/pull/80177#issuecomment-749484088
I have a promising implementation of a hashtable that can be used directly in its binary format (i.e. without any upfront decoding) and making DefPaths "structured" turned out to be pretty simple. Overall this seems to simplify things quite a bit. The on-disk hashtable adds some complexity -- but it is a generic data structure that might come in handy for other use cases too.
Let me know if you'd like to get a ping when things have progressed far enough for a performance testing PR.
Sure! Sounds neat. :)
I just created a public repo for the hash table implementation at https://github.com/michaelwoerister/odht.
It uses robin hood hashing (i.e. what libstd used before HashBrown) and seems to perform pretty well (although HashBrown can be up to 2x faster)
It should not be too hard to change the underlying implementation in the future, if needed
Nice. I'll try to check it out sometime this week and let you know if I run into any problems.
Great. I'm interested in any kind of feedback, especially concerning the public API.
I just opened https://github.com/rust-lang/rust/pull/81635, which implements the "structured DefPathHash" pre-requisite.
@mw, I haven't read the whole implementation, but I can give a little feedback on the ODHT API. By the way, is your intent for it to be general purpose and possibly release it on crates.io, or for it to be purpose-built for DefPathTable decoding? I might be less inclined to nitpick if it's the latter.
Now to feedback:
Is it possible yet to use const generics to enforce constraint that RawKey and RawValue are byte arrays? If not, is it possible to easily enforce it some other way?
I don't think it's possible for clients of odht to implement ByteArray directly for byte arrays as suggested since both the trait and byte arrays themselves are foreign. You can newtype the byte arrays and impl ByteArray on that I suppose. Requiring ByteArray: Default adds another wrinkle for byte arrays longer than 32 elements. In any case, not a big deal if we just care about the DefPathTable issue. By the way, ByteArray also isn't implementable by clients because it's currently in a private module.
Raw is a frequently overloaded term. Is it worth using something more descriptive, albeit longer? Maybe Encoded?
The documentation with respect to endianness or platform-independence could mention that the ODHT implementation supports being able to encode on, for example, an LE platform and correctly decode on a BE platform, but (I believe) that it depends on the provided hashing and encoding functions working in a platform-independent way to do that. And similarly for moving between 32-bit vs 64-bit platforms.
Overall, seems good!
Awesome, thank you for the feedback, @Tyson Nottingham!
rustc_hashcrate). It's much easier to work on a standalone crate than on something that is part of the compiler (because of build times). It also encourages loose coupling.
smallveccrate had found a workaround (i.e. use fixed size arrays), so I used that. I haven't tried hard to find a different way of enforcing that
RawValueare fixed size arrays.
ByteArray. Rather it's the library's job to provide implementations up to a reasonable size (as it is the case for the
smallveccrate, I think). I wonder if that is too restrictive. We could switch from the
ByteArraytrait to an
Encodedtrait with a single
fn as_le_bytes_with_no_padding(&self) -> &[u8]method (and enforce that the type has an alignment of 1). That would allow for types like
PackedFingerprintto be used directly on little-endian platforms. But I am not sure if that is worth the trouble. LLVM is able to basically optimize away the encoding/decoding step if the byte representation is the same. And I like that fixed size byte arrays make it hard to get the alignment wrong and make it rather explicit what's expected.
Encodedis better than
Raw, especially for the type names in the
RawTableit makes less sense, maybe.
RawTableis called that because it works only with raw bytes and does not know about what types are encoded in them.
min_const_generics are stable on nightly right now so should be possible to use, but maybe not for this use case
Last updated: Oct 21 2021 at 22:01 UTC