Stream: t-libs/stdarch

Topic: simd-variants for impls

Luca Barbato (Oct 01 2019 at 09:07, on Zulip):

Do we have any documentation/pattern/proc_macro already on what's the best way to write simd-specific variants?

gnzlbg (Oct 01 2019 at 09:23, on Zulip):

What do you mean by “variants”?

Luca Barbato (Oct 01 2019 at 09:38, on Zulip):

you have some function you can reimplement using arch-specific simd (or even assembly directly)

Luca Barbato (Oct 01 2019 at 09:38, on Zulip):

you would like to select at runtime which one to use

Luca Barbato (Oct 01 2019 at 09:42, on Zulip):

The students trying to add NEON support in rav1e showed me that what I consider self-explanatory and basic is not

gnzlbg (Oct 01 2019 at 13:54, on Zulip):

There are crates for doing that, but the std::arch docs should cover the rest; otherwise there is also the RFC

Lokathor (Oct 01 2019 at 16:05, on Zulip):

@Luca Barbato cfg-if 0.1.10 recently gained support for use inside methods; you can also do runtime detection, of course

Luca Barbato (Oct 01 2019 at 16:37, on Zulip):

Do you have other crates in mind?

Lokathor (Oct 01 2019 at 21:12, on Zulip):

What you're using SIMD for greatly affects the sort of API you want to build.

I recently released 0.1 of the wide crate which aims to have an f32x4 type that is as close as possible to being a drop-in replacement for normal f32. It supports all traits and methods that f32 does except for eq/ord. I'm honestly not sure how I'll handle those.

You can just use intrinsics/asm and runtime detection, but runtime detection has a small cost, so usually you need to run your check once, do a fair amount of SIMD to make up for the cost of the check (not just 1 add or something), and then have fallbacks too and all that. I wouldn't do that myself. I've always stuck to compile-time checks only.
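
For illustration, a compile-time-only selection might look roughly like the sketch below. The function name `add4` is hypothetical and not taken from any crate mentioned in this thread; it assumes an x86_64 target where SSE2 is part of the compile-time feature set, with a scalar definition used everywhere else. It uses plain `#[cfg]` attributes rather than cfg-if to stay dependency-free.

```rust
// Chosen at compile time when the build targets x86_64 with SSE2 enabled.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
    // Safety: the cfg above guarantees the required feature is enabled for
    // the whole build, and both pointers refer to four valid f32 values.
    unsafe {
        let v = _mm_add_ps(_mm_loadu_ps(a.as_ptr()), _mm_loadu_ps(b.as_ptr()));
        let mut out = [0.0f32; 4];
        _mm_storeu_ps(out.as_mut_ptr(), v);
        out
    }
}

// Portable fallback compiled on every other target.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse2")))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Only one of the two definitions exists in any given build, so callers pay no detection cost; the trade-off is that the binary cannot adapt to the CPU it ends up running on.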

Lokathor (Oct 01 2019 at 21:12, on Zulip):

Also, if you use arrays of 4 and align it to 16 then _most_ of the work will be done for you by llvm

Lokathor (Oct 01 2019 at 21:13, on Zulip):

of course that's with optimizations on, you need to write it by hand if you want debug performance too
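
A minimal sketch of the "array of 4, aligned to 16" idea follows. The type name `F32x4` is hypothetical and this is not the wide crate's implementation; it only shows the layout plus a plain lane-by-lane `Add`, leaving any vectorization to LLVM when optimizations are on.

```rust
use std::ops::Add;

/// Four f32 lanes with 16-byte alignment, matching a 128-bit SIMD register.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(16))]
pub struct F32x4(pub [f32; 4]);

impl Add for F32x4 {
    type Output = F32x4;

    #[inline]
    fn add(self, rhs: F32x4) -> F32x4 {
        // Plain scalar code; the optimizer is free to turn this into a
        // single vector add, but nothing here forces it to.
        F32x4([
            self.0[0] + rhs.0[0],
            self.0[1] + rhs.0[1],
            self.0[2] + rhs.0[2],
            self.0[3] + rhs.0[3],
        ])
    }
}
```

With this in place, `F32x4([1.0, 2.0, 3.0, 4.0]) + F32x4([10.0, 20.0, 30.0, 40.0])` yields `F32x4([11.0, 22.0, 33.0, 44.0])`, and `std::mem::align_of::<F32x4>()` is 16.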

Lokathor (Oct 01 2019 at 21:14, on Zulip):

unfortunately it's kinda specific to the SIMD set you want. A neon oriented library will probably end up different from an sse based library

Luca Barbato (Oct 02 2019 at 10:15, on Zulip):

In my case I need to boost the dispatch part, not the writing part (we share asm code with another project).

Lokathor (Oct 04 2019 at 01:58, on Zulip):

just gotta use std. Unfortunately, detecting it at runtime demands that you interface with the OS to handle all the edge cases properly. you can check the implementation of the is_x86_feature_detected! macro if you want to get into the details there.
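
As a rough sketch of what runtime detection with std looks like, the example below checks for AVX with `is_x86_feature_detected!` and falls back to scalar code otherwise. The function names `sum`, `sum_avx` and `sum_scalar` are made up for illustration, and the kernel is deliberately trivial.

```rust
/// Portable fallback: sums a slice one element at a time.
pub fn sum_scalar(data: &[f32]) -> f32 {
    data.iter().sum()
}

/// AVX variant: accumulates eight lanes at a time, then reduces.
/// Must only be called after AVX support has been confirmed.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn sum_avx(data: &[f32]) -> f32 {
    use std::arch::x86_64::{
        _mm256_add_ps, _mm256_loadu_ps, _mm256_setzero_ps, _mm256_storeu_ps,
    };
    let mut acc = _mm256_setzero_ps();
    let mut chunks = data.chunks_exact(8);
    for chunk in &mut chunks {
        // Unaligned load of eight f32 values from the current chunk.
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(chunk.as_ptr()));
    }
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    // Horizontal reduction plus whatever did not fill a full chunk.
    lanes.iter().sum::<f32>() + chunks.remainder().iter().sum::<f32>()
}

/// Picks an implementation at runtime based on what the CPU supports.
pub fn sum(data: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // Safety: AVX availability was just verified.
            return unsafe { sum_avx(data) };
        }
    }
    sum_scalar(data)
}
```

Here the check runs on every call to `sum`, which is only reasonable when each call does enough work to amortize it.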

Luca Barbato (Oct 06 2019 at 13:38, on Zulip):

that part is done, I'm thinking about making the whole experience more streamlined

Luca Barbato (Oct 06 2019 at 13:40, on Zulip):

e.g. auto-populate/auto-generate impl blocks and provide facilities to instantiate the right variant to call around the code

Luca Barbato (Oct 06 2019 at 13:41, on Zulip):

since currently the experience isn't better than the C way of making a struct of fn pointers, populating it and then calling through the struct
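
The "struct of fn" pattern described above can be sketched in Rust roughly as follows. All names (`Dispatch`, `dispatch`, `sum_avx2`, and so on) are hypothetical, and the AVX2 body is a stand-in for real intrinsics or external asm; the point is only the table that is populated once and then called through.

```rust
use std::sync::OnceLock;

/// Table of function pointers, filled in once according to CPU features.
pub struct Dispatch {
    pub name: &'static str,
    pub sum: fn(&[u32]) -> u32,
}

fn sum_scalar(data: &[u32]) -> u32 {
    let mut acc = 0u32;
    for &x in data {
        acc = acc.wrapping_add(x);
    }
    acc
}

// Placeholder for a hand-written kernel: the same loop, compiled with AVX2
// enabled so the compiler may use wider registers.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2_impl(data: &[u32]) -> u32 {
    let mut acc = 0u32;
    for &x in data {
        acc = acc.wrapping_add(x);
    }
    acc
}

// Safe wrapper so the variant fits a plain `fn` pointer.
#[cfg(target_arch = "x86_64")]
fn sum_avx2(data: &[u32]) -> u32 {
    // Safety: this function is only placed in the table after AVX2 has
    // been detected in `Dispatch::detect`.
    unsafe { sum_avx2_impl(data) }
}

impl Dispatch {
    /// Runs feature detection and builds the table.
    pub fn detect() -> Dispatch {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2") {
                return Dispatch { name: "avx2", sum: sum_avx2 };
            }
        }
        Dispatch { name: "scalar", sum: sum_scalar }
    }
}

/// Returns the process-wide table, running detection only on first use.
pub fn dispatch() -> &'static Dispatch {
    static TABLE: OnceLock<Dispatch> = OnceLock::new();
    TABLE.get_or_init(Dispatch::detect)
}
```

A call site then looks like `(dispatch().sum)(&[1, 2, 3, 4])`. Every new kernel needs a field, a wrapper and a branch in `detect`, which is the boilerplate a macro-based approach would aim to generate.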

gnzlbg (Oct 10 2019 at 05:54, on Zulip):

@Luca Barbato check out the “multiversion” crate

Luca Barbato (Oct 10 2019 at 06:56, on Zulip):

That's quite similar to what I had in mind, I wanted to use attributes and impl blocks, this is probably even better :)

Luca Barbato (Oct 10 2019 at 07:02, on Zulip):

(even if it is a bit overkill in the way it works... )

Last update: Jul 02 2020 at 11:35UTC