Stream: t-lang/wg-unsafe-code-guidelines

Topic: UB and non-determinism and I/O

RalfJ (Mar 29 2019 at 09:14, on Zulip):

@Jake Goulding

you understood me correctly. I was under the impression that Rust was taking the position of "if any code flow path could cause UB, it is always UB".

"if any code flow path could cause UB, it is always UB" is correct, but it does not mean what you (seem to) think it means.
it means that a program like let x = Box::new(0); if Box::into_raw(x) as usize % 512 == 0 { trigger_ub(); } has UB because the allocator is non-deterministic, and the compiler can decide for the allocator that it will pick a sufficiently aligned address here, and then take the then-branch.
however, "if any code flow path could cause UB, it is always UB" does not mean that let b = read_boolean_from_stdin(); if b { trigger_ub(); } always has UB! Here, there is no non-deterministic choice the compiler can affect; the choice is made by the outside world, it might be deterministic or whatever, and the compiler has to respect that. If that program is only ever run in an environment where b ends up false, there is no UB as the branch is never taken!

Formally speaking, there are two kinds of choices in a program:
Internal choice is when the semantics of the program say "this or that may happen and there is no way for the programmer or user to influence that". Examples include the memory allocator and the thread scheduler. The compiler is allowed to refine this choice, meaning it is allowed to take away certain possibilities here, like in my first example above where the compiler decided that the allocator will pick an address that is 512-aligned. (But then the entire execution has to be consistent with that refined choice!)
In contrast, external choice is when the program interacts with the outside world and receives inputs from there. The compiler cannot affect what inputs will come. So if a program has UB only in case of certain malformed inputs, then the compiler has to translate it in a way that the resulting binary behaves just like the original program did for any input that did not trigger UB. You can think of the program as representing an answer to every possible question from the outside, like in my 2nd example above "input false? answer: nothing happens, program terminates" and "input true? answer: UB". All the input-answer-input-answer traces without UB must be preserved by the compiler.

That got longer than I expected... I hope it makes some sense?

Last update: May 26 2020 at 11:35UTC