Stream: t-lang/wg-unsafe-code-guidelines

Topic: are optimizations allowed to change program results?


gnzlbg (Apr 29 2019 at 11:31, on Zulip):

In some Rust targets, like i586, optimizations currently change the program results, e.g., when FP arithmetic is involved

gnzlbg (Apr 29 2019 at 11:32, on Zulip):

For example, when the x87 FPU is used and the precision of the FPU registers is not set appropriately (doing so is expensive), sample programs produce different results at different optimization levels (e.g. -C opt-level=0 vs -C opt-level=3).
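A minimal sketch of the kind of program this describes. The function name and the input values are illustrative assumptions, not taken from the linked issue. With strict f64 arithmetic the intermediate product overflows to infinity; if the product stays in an 80-bit x87 register, it may not overflow, and the division may bring it back into range.

```rust
use std::hint::black_box;

/// Multiplies by `b` and divides by `b` again.
/// With strict f64 evaluation, `1e308 * 10.0` overflows to infinity.
/// With x87 extended precision the intermediate may survive, so the
/// final result may be finite instead.
pub fn scale_and_unscale(a: f64, b: f64) -> f64 {
    // black_box keeps the compiler from folding the arithmetic at compile time.
    let product = black_box(a) * black_box(b);
    product / black_box(b)
}
```

Calling `scale_and_unscale(1e308, 10.0)` on a target that uses SSE2 for f64 should yield infinity; on an x87-only target the answer may depend on whether the intermediate is spilled to memory, which is exactly what the optimization level can change.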

gnzlbg (Apr 29 2019 at 11:32, on Zulip):

https://github.com/rust-lang/unsafe-code-guidelines/issues/123

RalfJ (Apr 29 2019 at 11:33, on Zulip):

The very definition of a correct optimization is that it does not change what the program does. Or rather, more carefully stated: every behavior of the optimized program must have been a possible behavior of the source program.
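One way to picture this definition is as a subset check over sets of observable behaviors. The sketch below is only a toy model under that assumption: behaviors are represented as strings, and the function name is made up for illustration.

```rust
use std::collections::HashSet;

/// Returns true when every behavior of the optimized program is also a
/// behavior that the source program allows (the optimized set is a subset
/// of the source set).
pub fn is_correct_optimization(source: &HashSet<&str>, optimized: &HashSet<&str>) -> bool {
    optimized.is_subset(source)
}
```

In this model, a non-deterministic source program with behaviors {"prints 1", "prints 2"} may be compiled to a program that only ever prints 2, but not to one that prints 3.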

RalfJ (Apr 29 2019 at 11:33, on Zulip):

So the way I see it, either whatever happens on x87 is unsound, or the semantics are not what we think they are.

RalfJ (Apr 29 2019 at 11:33, on Zulip):

Debug builds and release builds behaving differently is not an issue per se, it just means that the program is non-deterministic and the compiler chose different executions.

gnzlbg (Apr 29 2019 at 11:33, on Zulip):

I think it might make sense to, as part of the glossary of the UCGs, define what an optimization is, because we already guarantee some (e.g. niche optimizations)

RalfJ (Apr 29 2019 at 11:33, on Zulip):

niche optimizations aren't really optimizations in this sense, yeah

RalfJ (Apr 29 2019 at 11:34, on Zulip):

it's more part of the data layout stuff that is partially unspecified
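For readers unfamiliar with the term, a small sketch of what the niche "optimization" looks like as a layout fact. The helper names are illustrative. Because a reference is never null, `Option<&u8>` can use the null pattern for `None` and needs no extra space, whereas `u32` has no spare bit pattern.

```rust
use std::mem::size_of;

/// `Option<&T>` is guaranteed to have the same size as `&T`,
/// because the null pointer value is used to represent `None`.
pub fn option_ref_has_no_overhead() -> bool {
    size_of::<Option<&u8>>() == size_of::<&u8>()
}

/// Every bit pattern of `u32` is a valid value, so `Option<u32>`
/// needs additional space for the discriminant.
pub fn option_u32_needs_extra_space() -> bool {
    size_of::<Option<u32>>() > size_of::<u32>()
}
```

This is a property of how the type is laid out in memory, not a transformation of program behavior.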

gnzlbg (Apr 29 2019 at 11:34, on Zulip):

and maybe answer the fundamental question of whether rust optimizations are allowed to change rust program semantics or not (without going into FP), and maybe RFCing that

gnzlbg (Apr 29 2019 at 11:34, on Zulip):

@RalfJ indeed, niche optimizations are a bad example

RalfJ (Apr 29 2019 at 11:34, on Zulip):

and maybe answer the fundamental question of whether rust optimizations are allowed to change rust program semantics or not (without going into FP), and maybe RFCing that

uh. of course they are not. that's like asking if the rust compiler is allowed to erase your hard disk. we don't need an RFC for that, there are decades of research on what a compiler is.^^

RalfJ (Apr 29 2019 at 11:35, on Zulip):

it might be worth repeating that here to have it documented, but having an RFC on this to me feels like having an RFC on the value of pi.^^

gnzlbg (Apr 29 2019 at 11:47, on Zulip):

indeed | EDIT: this was part of a private thread, so what follows does not make much sense here - the idea being explored here was how miri could implement fp non-determinism, e.g., by storing multiple values per float (e.g. an f64 and an f80 one)

gnzlbg (Apr 29 2019 at 11:48, on Zulip):

so per FP stack you don't only need two values, f64 and f80, you need all possible values

RalfJ (Apr 29 2019 at 11:49, on Zulip):

"FP stack"?

gnzlbg (Apr 29 2019 at 11:49, on Zulip):

e.g. if you want to diagnose if x > y { unsafe { foo() } } and see if that can be called, where x and y are the results of chains of N and M FP arithmetic operations

gnzlbg (Apr 29 2019 at 11:49, on Zulip):

each of which could have been performed with a different precision, so you have many possible x and many possible y values

gnzlbg (Apr 29 2019 at 11:50, on Zulip):

and are probably interested in "will this branch always or never be taken"

gnzlbg (Apr 29 2019 at 11:51, on Zulip):

that is, if you want to avoid executing the program an infinite number of times, you'd have to track all FP values that are getting accumulated
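A rough sketch of the bookkeeping described above. This is not how miri works; it is a toy model with made-up names. Since Rust has no f80 type, the sketch assumes f32 as the "narrow" precision and f64 as the "wide" one, and lets every operation keep either result.

```rust
use std::collections::BTreeSet;

/// All values a float might hold, stored as f64 bit patterns so they fit in a set.
#[derive(Clone, Debug)]
pub struct Possible(BTreeSet<u64>);

impl Possible {
    /// A value that is known exactly.
    pub fn exact(v: f32) -> Self {
        let mut set = BTreeSet::new();
        set.insert((v as f64).to_bits());
        Possible(set)
    }

    /// Applies `op` to every pair of candidates, keeping both the wide (f64)
    /// result and the result rounded to the narrow (f32) precision.
    pub fn combine(&self, rhs: &Possible, op: fn(f64, f64) -> f64) -> Possible {
        let mut out = BTreeSet::new();
        for &a in &self.0 {
            for &b in &rhs.0 {
                let wide = op(f64::from_bits(a), f64::from_bits(b));
                out.insert(wide.to_bits());
                out.insert(((wide as f32) as f64).to_bits());
            }
        }
        Possible(out)
    }
}

#[derive(Debug, PartialEq)]
pub enum Branch {
    Always,
    Never,
    Sometimes,
}

/// Classifies `if x > y` over all combinations of candidate values.
pub fn greater_than(x: &Possible, y: &Possible) -> Branch {
    let (mut taken, mut skipped) = (false, false);
    for &a in &x.0 {
        for &b in &y.0 {
            if f64::from_bits(a) > f64::from_bits(b) {
                taken = true;
            } else {
                skipped = true;
            }
        }
    }
    match (taken, skipped) {
        (true, false) => Branch::Always,
        (false, true) => Branch::Never,
        // Mixed outcomes (the sets built above are never empty).
        _ => Branch::Sometimes,
    }
}
```

The candidate sets can double with every operation, which illustrates why exploring all executions quickly becomes infeasible for long chains of arithmetic.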

RalfJ (Apr 29 2019 at 11:52, on Zulip):

sure, the usual problem with non-determinism is that it can be infeasible to explore all possible executions

gnzlbg (Apr 29 2019 at 12:24, on Zulip):

so I asked on #llvm in IRC and it seems that fixing this for i586 would be too expensive, and even then might not work properly

gnzlbg (Apr 29 2019 at 12:25, on Zulip):

the best bet would be to make all loads and stores of f32 and f64 in i586 volatile, instead of playing with rounding modes
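A minimal sketch of the idea in source-level Rust, assuming the goal is simply to force a value out of an extended-precision register and back. The helper name is hypothetical, and this is not what an LLVM backend change would look like; it only shows the effect of a volatile store followed by a volatile load.

```rust
use std::ptr;

/// Forces `x` through a 64-bit memory slot. On a target that keeps
/// intermediates in wider registers, the store rounds the value to f64.
pub fn round_through_memory(x: f64) -> f64 {
    let mut slot = 0.0_f64;
    // SAFETY: `slot` is a valid, aligned, initialized local variable.
    unsafe {
        ptr::write_volatile(&mut slot, x);
        ptr::read_volatile(&slot)
    }
}
```

On targets that already compute in f64 this is an identity function with extra memory traffic. Note that rounding an extended-precision result a second time on store can still differ from a single correctly rounded f64 operation.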

gnzlbg (Apr 29 2019 at 12:33, on Zulip):

that would not produce the same results as SSE, but that's actually something we could implement in miri

rkruppe (Apr 29 2019 at 20:07, on Zulip):

Yes, it's not fundamentally difficult to force some arbitrary deterministic behavior out of x87 if you accept a big performance hit. But that doesn't solve the actual underlying question for UCG: what degree of non-determinism in floating point is possible (and hence, how much can results vary between different executions). Some amount is clearly needed to account for platform differences and compiler optimizations (see above, also: NaN bit patterns) but presumably there should be some limits even in the worst case?
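To make the NaN bit pattern point concrete, here is a small sketch with illustrative function names. It produces a NaN at run time and exposes its raw bits; which bits come out is not something the code can rely on, since it may depend on the hardware and on whether the compiler folded the division.

```rust
use std::hint::black_box;

/// Computes 0.0 / 0.0 at run time and returns the raw bits of the result.
/// The exact pattern (sign bit, payload) may differ between platforms.
pub fn runtime_nan_bits() -> u64 {
    // black_box discourages the compiler from evaluating the division itself.
    let zero = black_box(0.0_f64);
    (zero / zero).to_bits()
}

/// Checks whether a bit pattern encodes some NaN, whatever its sign or payload.
pub fn is_nan_pattern(bits: u64) -> bool {
    f64::from_bits(bits).is_nan()
}
```

Comparing `runtime_nan_bits()` against `f64::NAN.to_bits()` may or may not succeed, so only `is_nan_pattern` is a portable check.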

Tom Phinney (Apr 29 2019 at 21:46, on Zulip):

@gnzlbg FP arithmetic is imprecise: optimizations can and often do change program results. You don't have to conditionally-invoke unsafe behavior to show this; all you need do is apply the associative and distributive laws of arithmetic to reorder the terms and/or factors in a computation, which is something that LLVM-like optimizers do. Often such reordering will lead to slightly different FP numeric results, which when compared with an "expected" result can lead to taking different branches in a program.
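The non-associativity mentioned here is easy to reproduce by hand. The sketch below uses two illustrative helper functions that differ only in grouping. Whether a compiler is permitted to turn one into the other is a separate question from whether the two produce the same value.

```rust
/// Evaluates (a + b) + c.
pub fn left_to_right(a: f64, b: f64, c: f64) -> f64 {
    (a + b) + c
}

/// Evaluates a + (b + c), the reassociated form.
pub fn right_to_left(a: f64, b: f64, c: f64) -> f64 {
    a + (b + c)
}
```

With `a = 1e20`, `b = -1e20`, `c = 1.0`, the first grouping gives `1.0` because the large terms cancel first, while the second gives `0.0` because `-1e20 + 1.0` rounds back to `-1e20`.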

gnzlbg (Apr 29 2019 at 21:47, on Zulip):

@Tom Phinney we don't do those optimizations, and llvm doesn’t do them by default either

gnzlbg (Apr 29 2019 at 21:48, on Zulip):

One needs to opt-into -ffast-math in llvm and clang because of that

gnzlbg (Apr 29 2019 at 21:48, on Zulip):

So AFAICT the x87 situation is an LLVM bug

gnzlbg (Apr 29 2019 at 21:49, on Zulip):

On IRC some devs mentioned they would accept patches for that target, even if they make things much slower for the target by default

gnzlbg (Apr 29 2019 at 21:51, on Zulip):

I don’t know whether that would really end up happening, but optimizations should not change your program results

gnzlbg (Apr 29 2019 at 21:51, on Zulip):

At least not without explicitly opting into unsound optimizations

rkruppe (Apr 29 2019 at 22:39, on Zulip):

So AFAICT the x87 situation is an LLVM bug

I think this is quite arguable. The behavior of LLVM's codegen on x87 is closer to what FLT_EVAL_METHOD in standard C blesses as an implementation choice (only that the extra precision is not stripped at source-level assignments and casts but at ill-defined other points) than to the decidedly more shaky -ffast-math. You could argue LLVM IR should have stricter semantics by default and make the current behavior opt-in but AFAICT it deliberately doesn't (it's important for many targets, including very modern ones, to have some leeway in IEEE conformance). An option for more determinism would be great, but I think we can only seriously contemplate this in 2019 because nobody really cares about non-SSE2 hardware any more, and for the very same reason it'll be hard to get people to care enough to even classify this as a bug that needs fixing instead of sweeping it under the rug as being covered by the lack of guarantees about IEEE conformance.

gnzlbg (Apr 30 2019 at 06:13, on Zulip):

Which other modern targets have this particular behavior?

gnzlbg (Apr 30 2019 at 06:59, on Zulip):

Most / all LLVM FP intrinsics do mention IEEE conformance in the LangRef (and the standard is exact for many operations, e.g., +-*...), and there are fast-math flags in place to enable this kind of behavior already AFAICT

rkruppe (Apr 30 2019 at 07:11, on Zulip):

Various embedded platforms and GPUs have settings like "flush denormals to zero". Even when those are optional settings, they are commonly enabled and compilers want to be able to Just Work in their presence (unlike e.g. access to floating point exceptions and changes to rounding mode, which need special accommodation). And then there's the fact that some operations may be lowered to libcalls that are, in practice, not as accurate as IEEE mandates, but LLVM still wants to be able to generate code for those targets.
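A small sketch of how "flush denormals to zero" shows up in a program's results. The function names are illustrative. Halving the smallest normal f64 gives a subnormal under IEEE 754 gradual underflow, while hardware running in a flush-to-zero mode would be expected to produce zero instead.

```rust
use std::hint::black_box;
use std::num::FpCategory;

/// Halves the smallest positive normal f64 at run time.
pub fn half_of_smallest_normal() -> f64 {
    black_box(f64::MIN_POSITIVE) / 2.0
}

/// Describes which of the two outcomes was observed.
pub fn describe(x: f64) -> &'static str {
    match x.classify() {
        FpCategory::Subnormal => "subnormal (gradual underflow)",
        FpCategory::Zero => "zero (flushed)",
        _ => "other",
    }
}
```

A program that branches on `half_of_smallest_normal() > 0.0` would therefore take different paths depending on the floating-point mode of the target.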

rkruppe (Apr 30 2019 at 07:13, on Zulip):

I just took another look at LangRef and indeed some intrinsics make reference to the corresponding libm function, but fadd/fsub/fmul/fdiv notably don't refer to IEEE 754!

Last update: Nov 19 2019 at 19:05 UTC