Do we have any documentation/pattern/proc_macro already on the best way to write SIMD-specific variants?
What do you mean by “variants”?
you have some function you can reimplement using arch-specific simd (or even assembly directly)
you would like to select at runtime which one to use
The students trying to add NEON support in rav1e showed me that what I consider self-explanatory and basic is not
There are crates on crates.io for doing that, but the std arch docs should cover the rest; otherwise there is also the RFC
@Luca Barbato cfg-if 0.1.10 recently got support so it works inside methods, you can also do runtime detection of course
Do you have other crates in mind?
What you're using SIMD for greatly affects the sort of API you want to build.
I recently released 0.1 of the wide crate, which aims to have an f32x4 type that is as close as possible to being a drop-in replacement for normal f32. It supports all traits and methods that f32 does except for eq/ord. I'm honestly not sure how I'll handle those.
You can just use intrinsics/asm and runtime detection, but runtime detection has a small cost, so usually you need to run your check once, do a fair amount of SIMD to make up for the cost of the check (not just 1 add or something), and then have fallbacks too and all that. I wouldn't do that myself. I've always stuck to compile-time checks only
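The check-once-then-do-real-work pattern above could look roughly like the sketch below. It follows the shape shown in the std::arch docs; the function names (`sum`, `sum_scalar`, `sum_avx`) are made up for illustration, and the AVX path is only compiled on x86_64, so every other target just gets the scalar fallback.

```rust
/// Portable fallback: plain scalar sum.
fn sum_scalar(data: &[f32]) -> f32 {
    data.iter().sum()
}

/// AVX variant (hypothetical name). Only sound to call once AVX is known to be present.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn sum_avx(data: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    unsafe {
        let mut acc = _mm256_setzero_ps();
        let chunks = data.chunks_exact(8);
        let tail = chunks.remainder();
        for c in chunks {
            // Unaligned load of 8 lanes, accumulated lane-wise.
            acc = _mm256_add_ps(acc, _mm256_loadu_ps(c.as_ptr()));
        }
        let mut lanes = [0.0f32; 8];
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
        // Horizontal reduction plus the leftover elements.
        lanes.iter().sum::<f32>() + tail.iter().sum::<f32>()
    }
}

/// One detection per call, amortized over the whole slice.
pub fn sum(data: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was just verified at runtime.
            return unsafe { sum_avx(data) };
        }
    }
    sum_scalar(data)
}
```

The SIMD path adds lanes in a different order than the scalar one, so for general floating-point input the two results can differ in the last bits.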
Also, if you use arrays of 4 and align them to 16 bytes then _most_ of the work will be done for you by LLVM
of course that's with optimizations on; you need to write it by hand if you want debug performance too
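A minimal sketch of the "array of 4, aligned to 16" idea. `F32x4` here is a hypothetical type for illustration, not the actual type from the wide crate. Whether LLVM really turns the lane-wise add into a single vector instruction depends on the target and optimization level; nothing in the code guarantees it.

```rust
use std::ops::Add;

/// Four f32 lanes, forced to 16-byte alignment (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(16))]
pub struct F32x4(pub [f32; 4]);

impl Add for F32x4 {
    type Output = F32x4;

    #[inline]
    fn add(self, rhs: F32x4) -> F32x4 {
        let a = self.0;
        let b = rhs.0;
        // Plain scalar code; with optimizations on, LLVM may vectorize this.
        F32x4([a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]])
    }
}
```

Something like `F32x4([1.0, 2.0, 3.0, 4.0]) + F32x4([10.0, 20.0, 30.0, 40.0])` then reads like ordinary arithmetic, and looking at the release-mode assembly is the only way to confirm what was actually generated.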
unfortunately it's kinda specific to the SIMD set you want. A NEON-oriented library will probably end up different from an SSE-based library
In my case I need to boost the dispatch part, not the writing part (we share asm code with another project).
just gotta use std. Unfortunately, detecting it at runtime demands that you interface with the OS to handle all the edge cases properly. You can check the implementation of the is_x86_feature_detected! macro if you want to get into the details there.
that part is done, I'm thinking of making the whole experience more streamlined
e.g. auto-populate/auto-generate impl blocks and provide facilities to instantiate the right variant to call around the code
since currently the experience isn't better than the C way of making a struct of fn pointers, populating it, and then calling through the struct
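For reference, the "struct of fn, populate it, call through it" approach could be sketched in Rust as below. All names (`Kernels`, `sad`, `sad_scalar`, `sad_avx2`) are hypothetical and this is not how rav1e does it; the kernel is a sum of absolute differences picked purely as an example, and the table is filled once via `std::sync::OnceLock`.

```rust
use std::sync::OnceLock;

/// Table of function pointers, filled once at first use (hypothetical layout).
pub struct Kernels {
    pub name: &'static str,
    pub sad: fn(&[u8], &[u8]) -> u32,
}

/// Scalar sum of absolute differences over the common prefix of both slices.
fn sad_scalar(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(&x, &y)| x.abs_diff(y) as u32).sum()
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sad_avx2(a: &[u8], b: &[u8]) -> u32 {
    use std::arch::x86_64::*;
    let n = a.len().min(b.len());
    unsafe {
        let mut acc = _mm256_setzero_si256();
        let mut i = 0;
        while i + 32 <= n {
            let va = _mm256_loadu_si256(a.as_ptr().add(i) as *const __m256i);
            let vb = _mm256_loadu_si256(b.as_ptr().add(i) as *const __m256i);
            // Four 64-bit partial sums per 32 bytes.
            acc = _mm256_add_epi64(acc, _mm256_sad_epu8(va, vb));
            i += 32;
        }
        let mut lanes = [0u64; 4];
        _mm256_storeu_si256(lanes.as_mut_ptr() as *mut __m256i, acc);
        // Reduce the partial sums, then handle the tail with the scalar code.
        lanes.iter().sum::<u64>() as u32 + sad_scalar(&a[i..n], &b[i..n])
    }
}

/// Safe wrapper so the AVX2 variant fits the plain `fn` pointer type.
#[cfg(target_arch = "x86_64")]
fn sad_avx2_checked(a: &[u8], b: &[u8]) -> u32 {
    // SAFETY: only stored in the table after AVX2 has been detected.
    unsafe { sad_avx2(a, b) }
}

impl Kernels {
    fn detect() -> Kernels {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2") {
                return Kernels { name: "avx2", sad: sad_avx2_checked };
            }
        }
        Kernels { name: "scalar", sad: sad_scalar }
    }

    /// Runs detection on the first call only, then returns the cached table.
    pub fn get() -> &'static Kernels {
        static TABLE: OnceLock<Kernels> = OnceLock::new();
        TABLE.get_or_init(Kernels::detect)
    }
}
```

Callers then write `(Kernels::get().sad)(&a, &b)`. Every new kernel needs a field, a wrapper, and a line in `detect`, which is exactly the boilerplate that would be nice to auto-generate.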
@Luca Barbato check out the “multiversion” crate
That's quite similar to what I had in mind, I wanted to use attributes and impl blocks, this is probably even better :)
(even if it is a bit overkill in the way it works... )