Stream: t-compiler

Topic: fast-math #21690


pnkfelix (Oct 01 2019 at 12:55, on Zulip):

Oh, also, since I don't see it linked anywhere in this topic thus far: #21690

centril (Oct 01 2019 at 12:55, on Zulip):

Sometimes the spec needs to change (if that is even possible), but such a change should be intentional and go through the regular channels (i.e. FCP / RFC / ...).

pnkfelix (Oct 01 2019 at 12:55, on Zulip):

and in fact maybe I'll add it to the topic

rkruppe (Oct 01 2019 at 12:56, on Zulip):

I agree re: "sometimes the spec is wrong, sometimes the impl is wrong, sometimes you could argue either". I brought up the ambiguity of the spec under a very strict reading as more ammo for the arguing :)

rkruppe (Oct 01 2019 at 13:06, on Zulip):

Speaking of, it's kind of funny to me that https://github.com/rust-lang-nursery/reference/pull/607 is not just the only explicit definition of any fp operation's semantics, but probably also the only one which would explicitly contradict -ffast-math

centril (Oct 01 2019 at 13:09, on Zulip):

The step from "use this newtype" to "write ALL your math out as intrinsics" seems too steep to be acceptable, so I fear people will either fall back to no FMFs at all (bad since they won't get C-competitive performance) or use the blunt "fast-math" hammer and try to work around the issues it causes as they go (bad since they jeopardize their program's reliability).

It seems to me a small minority of users will want to use a lot of tweaks a lot of the time. If you only do it sometimes, then using some intrinsics doesn't feel very onerous. Also, you can still implement your own Add and similar impls, which should make it even less onerous. Maybe user-defined literals (which folks want for different things anyway, e.g. units and such) and some enhanced inference would make it feel even more first-class.
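
For concreteness, here is a minimal sketch of the newtype-plus-own-Add-impls idea, assuming the unstable `fadd_fast`/`fmul_fast` intrinsics from `core::intrinsics` (nightly only); the two-letter name `Ff` is purely illustrative, not a proposed API:

```rust
// Sketch only: a newtype whose arithmetic opts into fast-math flags via the
// unstable core intrinsics. Requires a nightly compiler.
#![feature(core_intrinsics)]
#![allow(internal_features)]

use core::intrinsics::{fadd_fast, fmul_fast};
use std::ops::{Add, Mul};

/// Illustrative two-letter wrapper type.
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
struct Ff(f32);

impl Add for Ff {
    type Output = Ff;
    fn add(self, rhs: Ff) -> Ff {
        // SAFETY: these intrinsics are UB on non-finite inputs or results,
        // so the caller must keep values finite.
        Ff(unsafe { fadd_fast(self.0, rhs.0) })
    }
}

impl Mul for Ff {
    type Output = Ff;
    fn mul(self, rhs: Ff) -> Ff {
        Ff(unsafe { fmul_fast(self.0, rhs.0) })
    }
}

fn main() {
    let y = Ff(2.0) * Ff(3.0) + Ff(1.0);
    println!("{:?}", y); // Ff(7.0)
}
```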

On the flip-side, if you provide too many of these baked into the language I think that is contributing to decision paralysis.

rkruppe (Oct 01 2019 at 13:30, on Zulip):

I don't know how small a minority it actually would be if we provided comprehensible and convenient knobs. Contraction is widely desired (to be able to use FMA instructions), and trading accuracy of built-in functions for performance is super common in some domains such as graphics, while algebraic rewrites are more unconstrained and scary (e.g., more likely to cause numeric accuracy problems in practice) and mostly relevant for automatic vectorization. I think the fact that -ffast-math in C is mostly a binary matter is also partially due to history and bad flag naming.
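
As a point of reference, per-operation contraction is already expressible on stable Rust via `f32::mul_add`/`f64::mul_add`, which performs the multiply-add with a single rounding and lowers to an FMA instruction when the target has one (falling back to a libm call otherwise). A small sketch:

```rust
// A dot product using explicit fused multiply-add. Each x.mul_add(y, acc)
// computes x * y + acc with a single rounding; on targets with an FMA unit
// (e.g. x86_64 built with -C target-feature=+fma) it lowers to one instruction.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).fold(0.0f32, |acc, (&x, &y)| x.mul_add(y, acc))
}

fn main() {
    assert_eq!(dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]), 32.0);
}
```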

Decision paralysis is a real risk, but can be mitigated by presentation. I mentioned shortcuts for common options earlier, and the docs could put them front-and-center (with e.g. one sentence pointing out that more control is available and link to a document with the details). Besides, solving decision paralysis by making most options so unattractive that people never consider them isn't exactly a great solution ;-)

centril (Oct 01 2019 at 13:33, on Zulip):

Besides, solving decision paralysis by making most options so unattractive that people never consider them isn't exactly a great solution ;-)

Granted, but I think "so unattractive" is rather debatable ^^ -- a newtype with a short, say 2-letter, name for the constructor + Add-and-friends impls + maybe user-defined literals feels attractive-ish

rkruppe (Oct 01 2019 at 13:36, on Zulip):

I refrained from writing a whole response to that suggestion but in brief this is a significant amount of boilerplate for any one library to hand-roll. And if it gets centralized into a shared library, more generality and thus more boilerplate is needed. (And none of this does anything to address the other problems with newtypes, which you didn't claim but let's not forget that it's still controversial whether newtypes are a decent option no matter who writes them.)

Josh Triplett (Oct 01 2019 at 13:49, on Zulip):

I think we should have a couple of types, one to opt into optimizations that increase precision or are similarly safe, and one to carefully opt into flexible precision. The precise semantics of the former should be selectable via global flags; the latter should be controlled by global flags but limitable locally. Personal opinion.

Josh Triplett (Oct 01 2019 at 13:51, on Zulip):

I would call the former types f32 and f64 (and eventually f16b), and I don't know what I'd call the latter types. I absolutely understand the argument for calling the former types r32 and r64 instead though.

gnzlbg (Oct 01 2019 at 13:56, on Zulip):

You can make correct unsafe code exhibit unsafe behavior by changing any defined feature in the language.

Which is why we don’t do that?

gnzlbg (Oct 01 2019 at 14:05, on Zulip):

I think we should have a couple of types, one to opt into optimizations that increase precision or are similarly safe, and one to carefully opt into flexible precision. The precise semantics of the former should be selectable via global flags; the latter should be controlled by global flags but limitable locally. Personal opinion.

The problem is specifying which optimizations each of the types and flags allows/disallows, how to satisfy the granularity requirements of users without breaking existing code, and how to compose these flags with existing code.

Those global flags can cause UB in any of your dependencies, add another couple of rows to the test matrices that numeric crates need to cover, and provide no barrier of defense for crates that are known not to work with those flags enabled; ad-hoc cfg-like fixes to detect the flags and e.g. compile_error! on them cause ecosystem splits; etc.

gnzlbg (Oct 01 2019 at 14:08, on Zulip):

IMO the assumption that the caller or the binary builder can correctly choose which optimizations are sound for all code in a binary is flawed

gnzlbg (Oct 01 2019 at 14:08, on Zulip):

As flawed as offering a global option to make all pointers “noalias” or similar

Josh Triplett (Oct 01 2019 at 14:12, on Zulip):

You can make correct unsafe code exhibit unsafe behavior by changing any defined feature in the language.

Which is why we don’t do that?

See above. Sometimes the spec is wrong and the implementation is correct.

Josh Triplett (Oct 01 2019 at 14:17, on Zulip):

how to satisfy the granularity requirement of users

I would argue that to a first approximation, users don't have that fine of granularity requirements. "Exactly IEEE 754" and "feel free to make it more precise, just never less" are two obvious convergence points.

As I said, it's debatable which of those two types should be called f32/f64.

provide no barrier of defense for crates that are known not to work with those flags enabled; ad-hoc cfg-like fixes to detect the flags and e.g. compile_error! on them cause ecosystem splits; etc.

There would of course be a well-supported mechanism: use a type that guarantees IEEE 754.

centril (Oct 01 2019 at 14:24, on Zulip):

I would argue that to a first approximation, users don't have that fine of granularity requirements. "Exactly IEEE 754" and "feel free to make it more precise, just never less" are two obvious convergence points.

@Josh Triplett I think that is a balance well struck :+1:

centril (Oct 01 2019 at 14:24, on Zulip):

(I just disagree on f32 tho as you know ^^)

Josh Triplett (Oct 01 2019 at 14:25, on Zulip):

I'm aware. But I also think it's useful to get down to the smallest and most precise (heh) point of disagreement, rather than leaving that point...floating. ;)

centril (Oct 01 2019 at 14:25, on Zulip):

:rofl:

Josh Triplett (Oct 01 2019 at 14:26, on Zulip):

So, you would support having r32 and r64 types that defaulted to contraction and increased precision, even target-specific precision?

Josh Triplett (Oct 01 2019 at 14:26, on Zulip):

You just wouldn't support making f32 and f64 those types?

centril (Oct 01 2019 at 14:27, on Zulip):

That seems right, yeah

Josh Triplett (Oct 01 2019 at 14:28, on Zulip):

Would you support crate-local flags to make the f types have that behavior, and just not global flags?

centril (Oct 01 2019 at 14:30, on Zulip):

At first instance I would not. I think it would be better, due to e.g. interactions with generics and higher-order functions, that the f* types have a single canonical meaning no matter what

Josh Triplett (Oct 01 2019 at 14:31, on Zulip):

Interesting. I find that a rather compelling argument against crate-local flags, though not against global flags.

Josh Triplett (Oct 01 2019 at 14:32, on Zulip):

Also, would you support building in the r types so that they can have the appropriate LLVM behavior enabled for them?

rkruppe (Oct 01 2019 at 14:32, on Zulip):

"Global" flags would be rustc command line flags and so would not be truly global (i.e., apply through the entire crate graph) due to separate compilation

Josh Triplett (Oct 01 2019 at 14:33, on Zulip):

@rkruppe Something something std-aware cargo. ;)

rkruppe (Oct 01 2019 at 14:33, on Zulip):

Even then someone can build one crate with one set of flags and another crate with different flags and link them

rkruppe (Oct 01 2019 at 14:34, on Zulip):

We'd need a mechanism like global allocators or panic runtimes (not to be confused with panic strategy) to truly enforce "one crate graph = one semantics"

Josh Triplett (Oct 01 2019 at 14:35, on Zulip):

Fair point.

centril (Oct 01 2019 at 14:35, on Zulip):

Modulo what @rkruppe said I'd agree with you @Josh Triplett

centril (Oct 01 2019 at 14:36, on Zulip):

(but there are other arguments wrt. back compat and "undoing the expectations that crates were written with" that are unrelated to separate compilation)

Josh Triplett (Oct 01 2019 at 14:37, on Zulip):

I should also say that to a first approximation having the r types wouldn't be terrible. I'd then end up effectively telling people (with a footnote for the details) that r32 and r64 are the floating-point types and f32 and f64 are the slow floating-point types that you shouldn't use unless you have reason to know you need them.

rkruppe (Oct 01 2019 at 14:37, on Zulip):

We're probably all aware but for the record: https://github.com/rust-lang/rfcs/pull/2686

Josh Triplett (Oct 01 2019 at 14:38, on Zulip):

It would be annoying to go through and fix everything to use generics though.

Josh Triplett (Oct 01 2019 at 14:39, on Zulip):

And any algorithm that works fine with extra precision (which is to say, just about every algorithm) should work with r32 and r64.

rkruppe (Oct 01 2019 at 14:39, on Zulip):

I should also say that to a first approximation having the r types wouldn't be terrible. I'd then end up effectively telling people (with a footnote for the details) that r32 and r64 are the floating-point types and f32 and f64 are the slow floating-point types that you shouldn't use unless you have reason to know you need them.

Well, we probably can't get around providing a third option (at least) for the parts of -ffast-math that don't fit under your definition of rN (higher precision). In particular, numerically unsafe algebraic transformations (e.g. reassociation) would not be covered by it but are often a key component of the large performance gains which motivate people to compile with -ffast-math.
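
For readers less familiar with the numerics: floating-point addition is not associative, which is why reassociation sits in the "numerically unsafe" bucket. A tiny demonstration:

```rust
// Floating-point addition is not associative, so regrouping a sum can change
// its value. Default Rust keeps the written grouping; a fast-math style
// reassociation flag would allow the optimizer to pick either one.
fn main() {
    let (a, b, c) = (1e20f64, -1e20f64, 1.0f64);
    assert_eq!((a + b) + c, 1.0); // a and b cancel exactly, then + 1.0
    assert_eq!(a + (b + c), 0.0); // c is absorbed into b before cancellation
}
```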

Josh Triplett (Oct 01 2019 at 14:41, on Zulip):

I don't have as much of a problem with the idea that if you want things like reassociation that lower precision you need to use special operations or a specific block.

rkruppe (Oct 01 2019 at 14:42, on Zulip):

Sure.

Josh Triplett (Oct 01 2019 at 14:42, on Zulip):

Those should definitely be case-by-case opt in.

Josh Triplett (Oct 01 2019 at 14:42, on Zulip):

Only if they don't break your algorithm.

Josh Triplett (Oct 01 2019 at 14:44, on Zulip):

But it should be easy to get FMA or multiple operations in a higher precision register with minimal effort and without rewriting any code (or at most, changing types).

rkruppe (Oct 01 2019 at 14:45, on Zulip):

I bring it up because it's a highly related matter and ideally we'd have a coherent story for all of this rather than several things which were designed completely separately and don't fit together conceptually

Josh Triplett (Oct 01 2019 at 14:45, on Zulip):

Also, there should be trivial infallible conversions via into and from to go between f and r types.
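
A rough sketch of what such conversions could look like, using a hypothetical user-level `R32` newtype as a stand-in for a built-in r32 (illustrative only, not a proposed API):

```rust
// Lossless, infallible conversions in both directions between the strict
// IEEE type and the hypothetical relaxed type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct R32(f32);

impl From<f32> for R32 {
    fn from(x: f32) -> R32 {
        R32(x)
    }
}

impl From<R32> for f32 {
    fn from(x: R32) -> f32 {
        x.0
    }
}

fn main() {
    let strict: f32 = 1.5;
    let relaxed: R32 = strict.into();
    let back: f32 = relaxed.into();
    assert_eq!(strict, back);
}
```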

rkruppe (Oct 01 2019 at 14:47, on Zulip):

This is not to say these things all need to use the same mechanism, but we'd often be telling people not just "these are the fast types and these are the slow ones" but also mention next that there's also <whatever other fast-math-y things we add>

Josh Triplett (Oct 01 2019 at 14:50, on Zulip):

I understand. I get that people do want to carefully opt into "go even faster at the expense of accuracy" mode.

Josh Triplett (Oct 01 2019 at 14:50, on Zulip):

And in that mode, you do need control over "how much inaccuracy".

Josh Triplett (Oct 01 2019 at 14:51, on Zulip):

Unlike the r types, where it makes sense to enable all "more precision" optimisations at once.

rkruppe (Oct 01 2019 at 14:54, on Zulip):

I actually have a soft spot for C-standard-like "contraction at the source level only" which can be deterministic in ways other optimizations (even contraction, when the FMAs are formed by the optimizer) aren't. But yeah it's a sensible option to group them all together.

rkruppe (Oct 01 2019 at 14:59, on Zulip):

Anyway, if we're going to end up with local controls of some of these optimizations in some form, then I do wonder whether we could use the same mechanism for all of them. New types, built-in or not, have drawbacks that have been discussed extensively. Plus, to pick up your point about getting FMAs and higher-precision registers without rewriting code, changing types everywhere can still be really involved -- a module-level annotation is much easier.

Josh Triplett (Oct 01 2019 at 21:11, on Zulip):

A module level annotation seems like the worst of both worlds to me.

Josh Triplett (Oct 01 2019 at 21:11, on Zulip):

Not global, possible type system interactions/limitations...

Josh Triplett (Oct 01 2019 at 21:43, on Zulip):

To your other point: I do want a coherent picture here. But I'd like the common cases to be simple.

Josh Triplett (Oct 01 2019 at 21:44, on Zulip):

(And I'm much less concerned about things that would require reprogramming the floating-point unit's behavior.)

rkruppe (Oct 02 2019 at 08:22, on Zulip):

I'm curious why "not global" is a drawback? The C precedent is a (translation-unit-)global flag, but if we're all in agreement that there should be some opt-in to the optimizations from the code author (e.g., by using a different data type) then why do we need a global(-ish) flag in addition? Why not just always apply the optimizations to the code that opts in?

nikomatsakis (Oct 22 2019 at 12:53, on Zulip):

Hey participants of this thread! It looks like some really useful information was exchanged here. I'm not quite sure what, because the thread is really long and I'm trying to catch up on hundreds of messages. =) Do you think somebody could try to write a summary of what was said? (I don't expect a consensus was reached, but just outlining the major points would be amazing.)

I'm not 100% sure, admittedly, where to push such a summary. This seems like largely a lang-design thing -- I think perhaps that we could create a directory on the lang-team repo to push summaries on interesting topics that should be considered in the future. (I would happily create such a directory to house this summary).

cc @rkruppe @centril @Josh Triplett @Alexander Droste @simulacrum

Last update: Nov 20 2019 at 02:35UTC