Stream: general

Topic: abi of llvm half


gnzlbg (Apr 25 2019 at 09:19, on Zulip):

@rkruppe Do you know what the ABI of LLVM half type is on other language frontends?

gnzlbg (Apr 25 2019 at 09:20, on Zulip):

AFAICT all assume that its layout is the same as i16's and interface by passing i16 through ABIs. The SysV spec does not mention half, so I have no idea whether this is correct.

rkruppe (Apr 25 2019 at 09:21, on Zulip):

Here be dragons

rkruppe (Apr 25 2019 at 09:22, on Zulip):

Because C doesn't have a standard or even nonstandard-yet-widely-adopted half type and many targets (including x86_64) have no special support for it, its ABI is likely a wild west mess.

gnzlbg (Apr 25 2019 at 09:53, on Zulip):

@rkruppe so clang errors when half is used in APIs

gnzlbg (Apr 25 2019 at 09:53, on Zulip):

https://godbolt.org/z/gnPqjw

rkruppe (Apr 25 2019 at 09:53, on Zulip):

that's... more sane than I had feared

rkruppe (Apr 25 2019 at 09:54, on Zulip):

more careful, to be precise

gnzlbg (Apr 25 2019 at 09:54, on Zulip):

it only allows passing them by memory

gnzlbg (Apr 25 2019 at 09:56, on Zulip):

in Rust we can use i16 to interface with the LLVM intrinsics just fine

gnzlbg (Apr 25 2019 at 09:57, on Zulip):

it appears that i16 and f16 have the same layout

gnzlbg (Apr 25 2019 at 09:59, on Zulip):

for stuff like fadd we do need a bitcast

rkruppe (Apr 25 2019 at 10:01, on Zulip):

right but we could have Rust intrinsics like fadd_f16h(i16, i16) -> i16 that expand to bitcast -> fadd -> bitcast

gnzlbg (Apr 25 2019 at 10:01, on Zulip):

sure, that's what I meant - I was asked to write a proper constructive reply to the RFC

rkruppe (Apr 25 2019 at 10:01, on Zulip):

If we really wanted to avoid introducing a new primitive type for binary16

gnzlbg (Apr 25 2019 at 10:01, on Zulip):

did not want to put anything wrong in there

gnzlbg (Apr 25 2019 at 10:02, on Zulip):

you mentioned that implementing the type in a library could be problematic if it extends to f32, but I don't think that's the only implementation possible

gnzlbg (Apr 25 2019 at 10:02, on Zulip):

like sure, a library could do that, and then it would have the problems you mention, but it could also just implement arithmetic without extending to f32 AFAICT

gnzlbg (Apr 25 2019 at 10:03, on Zulip):

that would most certainly incur a perf cost in most targets

rkruppe (Apr 25 2019 at 10:03, on Zulip):

Yeah that would be even worse for performance on the affected targets (those without native f16 instructions)

gnzlbg (Apr 25 2019 at 10:03, on Zulip):

but then using f32 internally could be done with a fast_math cargo feature

rkruppe (Apr 25 2019 at 10:03, on Zulip):

I don't understand what problem you're trying to solve by that

rkruppe (Apr 25 2019 at 10:04, on Zulip):

Expanding to f32 for arithmetic is perfectly fine

gnzlbg (Apr 25 2019 at 10:04, on Zulip):

I thought IEEE754 required that to be opt-in

rkruppe (Apr 25 2019 at 10:04, on Zulip):

Ok hold on

gnzlbg (Apr 25 2019 at 10:04, on Zulip):

Otherwise the results would be different between a target that does that, and one that does have f16 arithmetic in hardware

gnzlbg (Apr 25 2019 at 10:05, on Zulip):

IIRC that's IEEE754:2008 10.4

gnzlbg (Apr 25 2019 at 10:05, on Zulip):

(in C it would be fine though, because C does not guarantee that a float is really 32-bit wide)

rkruppe (Apr 25 2019 at 10:08, on Zulip):

If you explicitly round back to bfloat16 after every single operation, then some operations are automatically correctly rounded and the rest are off in deterministic ways (independent of optimizer choices). So there's a reasonable level of predictability and reproducibility there, just not across different hardware, but that's already not true currently for f{32,64} for various reasons.

gnzlbg (Apr 25 2019 at 10:10, on Zulip):

I guess my question is whether, e.g., rn(a: f16 + b: f16) produces the same result as rn(a: f16 as f32 + b: f16 as f32) as f16, where a rn happens for the 32-bit addition, and then I suppose another one happens in the conversion back to f16

rkruppe (Apr 25 2019 at 10:11, on Zulip):

IIRC for addition that does produce the same result. Multiplication and sqrt and so on I'm not sure about
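
As an illustrative aside, the "widen to f32, operate, round back after every operation" scheme under discussion can be sketched in plain Rust. The conversions below are hand-rolled for illustration; a real library (e.g. the half crate) provides battle-tested versions of the same thing.

```rust
// Sketch: f16 ("binary16") arithmetic emulated by widening to f32,
// operating, and rounding back to f16 after the operation.

/// Widen f16 bits to an f32 value (zeros, subnormals, inf/NaN included).
fn f16_to_f32(bits: u16) -> f32 {
    let sign = ((bits as u32) >> 15) << 31;
    let exp = ((bits >> 10) & 0x1F) as u32;
    let frac = (bits & 0x3FF) as u32;
    if exp == 0x1F {
        // Inf or NaN: keep the payload bits.
        return f32::from_bits(sign | 0x7F80_0000 | (frac << 13));
    }
    let magnitude = if exp == 0 {
        // Zero or f16 subnormal: value is frac * 2^-24.
        (frac as f32) * (2.0f32).powi(-24)
    } else {
        // Normal: rebias the exponent (f32 bias 127 - f16 bias 15 = 112).
        f32::from_bits(((exp + 112) << 23) | (frac << 13))
    };
    if sign != 0 { -magnitude } else { magnitude }
}

/// Round an f32 back to f16 bits with round-to-nearest-even.
fn f32_to_f16(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exp = ((bits >> 23) & 0xFF) as i32;
    let frac = bits & 0x7F_FFFF;
    if exp == 0xFF {
        // Inf or NaN (quieted).
        return sign | 0x7C00 | if frac != 0 { 0x200 } else { 0 };
    }
    let unbiased = exp - 127;
    if unbiased > 15 {
        return sign | 0x7C00; // overflow to infinity
    }
    if unbiased >= -14 {
        // Normal f16 result: drop 13 mantissa bits, rounding to nearest
        // even; a rounding carry correctly bumps the exponent.
        let base = (((unbiased + 15) as u32) << 10) | (frac >> 13);
        let rem = frac & 0x1FFF;
        let round = (rem > 0x1000 || (rem == 0x1000 && base & 1 == 1)) as u32;
        return sign | (base + round) as u16;
    }
    if unbiased < -25 {
        return sign; // too small even for subnormals: rounds to +-0
    }
    // Subnormal f16 result: shift the significand (implicit 1 restored).
    let full = frac | 0x80_0000;
    let shift = (-unbiased - 1) as u32; // 14..=24
    let mant = full >> shift;
    let rem = full & ((1u32 << shift) - 1);
    let half = 1u32 << (shift - 1);
    let round = (rem > half || (rem == half && mant & 1 == 1)) as u32;
    sign | (mant + round) as u16
}

/// Addition: widen, add in f32, round back. For addition specifically,
/// this double rounding matches a directly rounded f16 add.
fn f16_add(a: u16, b: u16) -> u16 {
    f32_to_f16(f16_to_f32(a) + f16_to_f32(b))
}

fn main() {
    assert_eq!(f16_add(0x3C00, 0x3C00), 0x4000); // 1.0 + 1.0 == 2.0
    assert_eq!(f16_add(0x3C00, 0x3800), 0x3E00); // 1.0 + 0.5 == 1.5
    // 2048.0 + 1.0: 2049 is a tie between 2048 and 2050; rounds to even.
    assert_eq!(f16_add(0x6800, 0x3C00), 0x6800);
}
```

The last assertion shows the explicit round-back step in action: the f32 intermediate 2049.0 is exact, and the single rounding to f16 happens in f32_to_f16.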

rkruppe (Apr 25 2019 at 10:12, on Zulip):

Furthermore, if we demand an actual correctly rounded (= software) implementation of everything, then I am really skeptical whether implementing that is economical (i.e., it would make me lean towards not providing the type portably at all). And why should we demand that? It's likely not going to help achieve reproducibility, because on hardware with native support for it there are going to be all the same obstacles to reproducibility as we currently have with f32 and f64.

gnzlbg (Apr 25 2019 at 10:12, on Zulip):

I'm not suggesting that we should, only that we could provide different implementations

rkruppe (Apr 25 2019 at 10:13, on Zulip):

Probably including not-quite-compliant hardware that you can't paper over except by massive slowdown, as with x87. Only nobody cares about x87 but bfloat16 implementations are hot right now.

gnzlbg (Apr 25 2019 at 10:14, on Zulip):

I think we shouldn't paper over x87, but we shouldn't make it constrain our design space either

gnzlbg (Apr 25 2019 at 10:15, on Zulip):

We can fix the precision of x87 to 24 or 53 mantissa bits by default, and offer a flag to let users do whatever they want, or provide an f80 type that explicitly does that, or...

gnzlbg (Apr 25 2019 at 10:15, on Zulip):

anyhow this is a different topic

gnzlbg (Apr 25 2019 at 10:15, on Zulip):

I think you have answered the question

gnzlbg (Apr 25 2019 at 10:16, on Zulip):

I think that while a library can do the as f32 itself, if we were to provide intrinsics for f16 or bfloat16, these would deal in i16, and whether things are cast to f32 or not would be up to rustc to decide

gnzlbg (Apr 25 2019 at 10:16, on Zulip):

if we were to do that, I expect the intrinsics that rustc provides to be portable

gnzlbg (Apr 25 2019 at 10:16, on Zulip):

such that a library wouldn't need to do the casts

rkruppe (Apr 25 2019 at 10:18, on Zulip):

It wouldn't need to, but it could (with fp_contract related caveats noted). That was my point: even with only target specific intrinsics in core::arch, a third party crate could provide a decent (if non-optimal) portable implementation of bfloat16

gnzlbg (Apr 25 2019 at 10:21, on Zulip):

To check that I understood: those fp_contract caveats would only hold if LLVM can't apply them properly to whatever IR we would generate, right?

gnzlbg (Apr 25 2019 at 10:21, on Zulip):

I think you have raised a similar point that I did not fully get. You mention that an advantage of a native type would be "arithmetic operators".

gnzlbg (Apr 25 2019 at 10:22, on Zulip):

How is that different from implementing the std::ops (e.g. where Add::add just forwards to core::intrinsics::fadd_f16 or similar) ?

rkruppe (Apr 25 2019 at 10:24, on Zulip):

To check that I understood. Those fp_contract caveats only would hold if LLVM can't apply them properly to whatever IR we would generate right ?

Right, if we generate IR that does fadd on bfloat16 or whatever without explicitly casting back and forth, then we're giving LLVM backends the option of not rounding after every operation. If we have the rounding steps explicitly in there, either because we used a "pure library implementation" or because rustc intrinsics expand f16 adds to cast; fadd float; cast, then I don't think they can be eliminated.

rkruppe (Apr 25 2019 at 10:25, on Zulip):

How is that different from implementing the std::ops (e.g. where Add::add just forwards to core::intrinsics::fadd_f16 or similar) ?

You can provide intrinsics and a wrapper type that implements them, but at that point, as far as the user is concerned, you might as well expose the type instead of the intrinsics, and then the details are mostly noise (including the decision of whether it's an actual literal primitive type or a core-provided library type that's in the prelude)

rkruppe (Apr 25 2019 at 10:26, on Zulip):

Well except for literals but ¯\_(ツ)_/¯

gnzlbg (Apr 25 2019 at 10:27, on Zulip):

A couple of things.

So if we were to go the intrinsics way, I'd expect a core::f16 module containing f16 operations; one would be core::f16::as_f32(i16) -> f32, and another would be core::f16::add(i16, i16) -> i16.

I don't expect rustc to expand add to cast, fadd, cast, but to an fadd on f16 (with bitcasts). If that does not work on all targets right now, then the library would need to work around it on those targets (rustc could just emit an unimplemented! there).
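
The wrapper-type idea mentioned here can be sketched as follows. Note that f16_add_bits is a hypothetical stand-in for the proposed core::f16::add intrinsic; it is emulated below via f32, and the bit-level conversions are deliberately simplified to handle only normal, exactly representable values.

```rust
use std::ops::Add;

// A library F16 newtype whose Add impl forwards to an intrinsic-style
// function operating on the raw i16/u16 bit pattern.

#[derive(Clone, Copy, PartialEq, Debug)]
struct F16(u16);

/// Stand-in for a hypothetical rustc intrinsic like core::f16::add.
/// Emulated via f32; assumes normal, exactly representable values.
fn f16_add_bits(a: u16, b: u16) -> u16 {
    to_f16_bits(from_f16_bits(a) + from_f16_bits(b))
}

/// Widen normal f16 bits to f32 (no zero/subnormal/inf/NaN handling).
fn from_f16_bits(bits: u16) -> f32 {
    let b = bits as u32;
    f32::from_bits(
        ((b & 0x8000) << 16)              // sign
        | ((((b >> 10) & 0x1F) + 112) << 23) // rebiased exponent
        | ((b & 0x3FF) << 13),            // mantissa
    )
}

/// Truncate f32 back to f16 bits (assumes the value fits exactly).
fn to_f16_bits(x: f32) -> u16 {
    let b = x.to_bits();
    (((b >> 16) & 0x8000)
        | ((((b >> 23) & 0xFF) - 112) << 10)
        | ((b >> 13) & 0x3FF)) as u16
}

impl Add for F16 {
    type Output = F16;
    fn add(self, rhs: F16) -> F16 {
        // Forward to the intrinsic-style function on raw bits.
        F16(f16_add_bits(self.0, rhs.0))
    }
}

fn main() {
    let one = F16(0x3C00);  // 1.0
    let half = F16(0x3800); // 0.5
    assert_eq!(one + half, F16(0x3E00)); // 1.5
}
```

From the user's perspective this reads like a primitive type with operators, which is rkruppe's point above: once the wrapper exists, exposing the type rather than the intrinsics is mostly a detail.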

gnzlbg (Apr 25 2019 at 10:28, on Zulip):

I think the issues in favor of a native type are literals and as casts. But C++ does not have these issues, because of user-defined literals and casts

gnzlbg (Apr 25 2019 at 10:28, on Zulip):

And many rust libraries would benefit from those

gnzlbg (Apr 25 2019 at 10:29, on Zulip):

A problem with providing a native f16 type is that we would need to pick a Rust ABI, it wouldn't be usable in extern "C", etc. etc.

gnzlbg (Apr 25 2019 at 10:29, on Zulip):

I suppose these problems are minor

rkruppe (Apr 25 2019 at 10:30, on Zulip):

Very. We don't need to pick any Rust ABI because the Rust ABI is unstable, and rejecting it in extern "C" is like two lines of code added to improper_ctypes.

gnzlbg (Apr 25 2019 at 10:30, on Zulip):

but in the end our f16 primitive type implementation wouldn't be much better than that of a library, and would be much harder to iterate on, since it can't be in a crate

gnzlbg (Apr 25 2019 at 10:30, on Zulip):

(because of a cyclic dep with core at least)

gnzlbg (Apr 25 2019 at 10:30, on Zulip):

although we could hack that

gnzlbg (Apr 25 2019 at 10:31, on Zulip):

I just don't see many advantages either way. We already have a crate for f16; if the issues with casts, perf, and literals are real, we might just want to extend the language to fix those for the half crate, and all other crates that would benefit.

rkruppe (Apr 25 2019 at 10:32, on Zulip):

Oh sure, for prototyping it would probably be great to do as much out of tree as possible, but there's still the question of what interface we ultimately want to expose to end users, and that should IMO be unclouded by development process concerns.

gnzlbg (Apr 25 2019 at 10:32, on Zulip):

As in: adding a new primitive type is a big change that, for f16 and bfloat16, benefits a minority, while user-defined literals would be a more self-contained change that would benefit everyone

rkruppe (Apr 25 2019 at 10:32, on Zulip):

User-defined literals are a gigantic can of worms and I don't want to touch it

gnzlbg (Apr 25 2019 at 10:32, on Zulip):

I think C++ solves this nicely.

gnzlbg (Apr 25 2019 at 10:33, on Zulip):

trait UDL { const fn from_literal_string(s: &str) -> Self; }

rkruppe (Apr 25 2019 at 10:33, on Zulip):

tiredly gestures at the extensive pre-RFC discussions

gnzlbg (Apr 25 2019 at 10:34, on Zulip):

oh i wasn't aware of those

gnzlbg (Apr 25 2019 at 10:34, on Zulip):

https://internals.rust-lang.org/t/pre-rfc-custom-literals-via-traits/8050 ?

rkruppe (Apr 25 2019 at 10:37, on Zulip):

That is the one I had in mind but I can't swear there isn't more. If there is it's hopefully cross-referenced.

gnzlbg (Apr 25 2019 at 10:39, on Zulip):

ok so I read the RFC and was like, this is what I had in mind; then I read the comments, and it's like "UDLs should solve units of measure" :laughter_tears:

Last update: Nov 22 2019 at 00:50UTC