Stream: t-lang/wg-unsafe-code-guidelines

Topic: Triggering SROA in LLVM?


RalfJ (Jun 13 2019 at 17:45, on Zulip):

Does someone know a good way to trigger SROA in LLVM? I am trying to demonstrate concretely that in LLVM, padding can turn into undef at any time, but I don't know how to write that example.^^

RalfJ (Jun 13 2019 at 17:47, on Zulip):

ah I think I got it

nagisa (Jun 13 2019 at 18:04, on Zulip):

I don’t think expansion in arguments is SROA at all.

RalfJ (Jun 13 2019 at 18:06, on Zulip):

well what I wanted to see is LLVM copying fields separately instead of doing a big memcpy

RalfJ (Jun 13 2019 at 18:06, on Zulip):

I thought that's SROA?

rkruppe (Jun 13 2019 at 18:07, on Zulip):

Look at the debug LLVM IR, rustc already emits it as per-field copies.

nagisa (Jun 13 2019 at 18:11, on Zulip):

it was trivial to show that padding was undef before miri, but now that our constants are byte arrays, that is harder

nagisa (Jun 13 2019 at 18:11, on Zulip):

before, a plain static A = (0u32, 0u64); ended up generating something like {u32, u32, u64} = {0, undef, 0}

nagisa (Jun 13 2019 at 18:13, on Zulip):

@RalfJ how are you expecting to show that padding is undef with SROA?

RalfJ (Jun 13 2019 at 18:14, on Zulip):

Look at the debug LLVM IR, rustc already emits it as per-field copies.

d'oh.^^
Well, ultimately the argument was about padding in Rust, not LLVM, so good enough^^

rkruppe (Jun 13 2019 at 18:15, on Zulip):

@nagisa The trickier (and more relevant to https://github.com/crossbeam-rs/crossbeam/issues/315) aspect is: after you fully initialized all the size_of::<T>() bytes of a location, and then resume treating it as a T with padding, when (if ever) do the padding bytes get reset to undef?

RalfJ (Jun 13 2019 at 18:15, on Zulip):

RalfJ how are you expecting to show that padding is undef with SROA?

well like in the example above:

  %y = alloca { i32, i64 }, align 8
  %0 = bitcast { i32, i64 }* %y to i8*
  %1 = getelementptr inbounds { i32, i64 }, { i32, i64 }* %y, i64 0, i32 0
  store i32 %x.0, i32* %1, align 8
  %2 = getelementptr inbounds { i32, i64 }, { i32, i64 }* %y, i64 0, i32 1
  store i64 %x.1, i64* %2, align 8
  call void %f({ i32, i64 }* noalias nonnull readonly align 8 dereferenceable(16) %y)

clearly when f gets called the padding in %y is still undef because it has never been written to
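
(For concreteness, Rust source along these lines compiles to roughly that IR in release mode; this is a reconstruction of the kind of example being discussed, not the exact snippet, and the function and parameter names are made up.)

  fn demo(f: fn(&(u32, u64)), x: (u32, u64)) {
      // x arrives as two scalar arguments (%x.0 and %x.1); copying it into y
      // is lowered to two per-field stores into a fresh stack slot, so the
      // four padding bytes of y are never written before f observes them
      let y = x;
      f(&y);
  }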

RalfJ (Jun 13 2019 at 18:16, on Zulip):

nagisa The trickier (and more relevant to https://github.com/crossbeam-rs/crossbeam/issues/315) aspect is: after you fully initialized all the size_of::<T>() bytes of a location, and then resume treating it as a T with padding, when (if ever) do the padding bytes get reset to undef?

actually, having any undef in padding is enough for that -- with get_mut we don't get a guarantee that whatever gets written was ever fully initialized

RalfJ (Jun 13 2019 at 18:17, on Zulip):

I just realized that making it Copy if T: Copy would also break this because copying the Atomic<T> will reset the padding to undef^^

rkruppe (Jun 13 2019 at 18:18, on Zulip):

If that's enough then even initializers like let pair = (0u16, 0u32); should demonstrate undef.
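
(A sketch of that kind of demonstration -- illustrative, not the code behind the playground links below: build it in release mode and the two padding bytes of the pair come back as undef in the optimized LLVM IR of the return value.)

  pub fn pair_bytes() -> [u8; 8] {
      // (u16, u32) is 8 bytes with 2 padding bytes under the default repr;
      // only the 6 data bytes are ever written, so the transmuted result
      // carries undef in the padding positions
      let pair = (0u16, 0u32);
      unsafe { std::mem::transmute::<(u16, u32), [u8; 8]>(pair) }
  }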

nagisa (Jun 13 2019 at 18:18, on Zulip):

A trivial way to show that is https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=69d8bed54286c5eba34132b2ce36bab4 probably

nagisa (Jun 13 2019 at 18:18, on Zulip):

rather than trying to use SROA

RalfJ (Jun 13 2019 at 18:19, on Zulip):

@nagisa not sure what that gets us? sure unsafe code can do weird stuff^^

RalfJ (Jun 13 2019 at 18:19, on Zulip):

or do you mean because this can optimize to return undef? hm

nagisa (Jun 13 2019 at 18:19, on Zulip):

You see undef in optimised LLVM IR which is a nice way to show that "hey, padding bytes are indeed undef"

nagisa (Jun 13 2019 at 18:20, on Zulip):

and not, say, zeros.

RalfJ (Jun 13 2019 at 18:20, on Zulip):

yeah true

RalfJ (Jun 13 2019 at 18:20, on Zulip):

now extra points if we can do this with any incoming X, not just a locally initialized one.

rkruppe (Jun 13 2019 at 18:20, on Zulip):

I doubt LLVM has enough information about padding for that.

RalfJ (Jun 13 2019 at 18:20, on Zulip):

Well that was easy: https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=ae1e2eae63468d5a9eb80f3f140fb3b9

rkruppe (Jun 13 2019 at 18:21, on Zulip):

Oh, chaining it with the move-resets-padding

nagisa (Jun 13 2019 at 18:21, on Zulip):

you can just remove the local let x

nagisa (Jun 13 2019 at 18:21, on Zulip):

but if you pass in x as a reference argument, it is no longer undef.

nagisa (Jun 13 2019 at 18:21, on Zulip):

(see https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=91ade64c05ea5ca31f7ba3e970fefa92)

rkruppe (Jun 13 2019 at 18:22, on Zulip):

Yes, the key is that passing the argument by value already resets any padding (because those bytes aren't even passed)
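
(An illustrative sketch of that point, with made-up names: under the default Rust ABI a (u16, u32) argument is passed as two separate scalars, so the caller's padding bytes never cross the call boundary and the callee's in-memory copy has undef padding no matter what the caller stored there.)

  #[inline(never)]
  pub fn bytes_of(x: (u16, u32)) -> [u8; 8] {
      // the padding bytes were not passed, so they are undef here even if
      // the caller had carefully initialized them
      unsafe { std::mem::transmute(x) }
  }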

RalfJ (Jun 13 2019 at 18:24, on Zulip):

well, or they are passed but get reset

RalfJ (Jun 13 2019 at 18:24, on Zulip):

the detail is in the wording, doesn't really matter though

gnzlbg (Jun 18 2019 at 09:09, on Zulip):

Yes, the key is that passing the argument by value already resets any padding (because those bytes aren't even passed)

That matches what I would expect. We only memcpy fields on move/copy, so the answer to:

after you fully initialized all the size_of::<T>() bytes of a location, and then resume treating it as a T with padding, when (if ever) do the padding bytes get reset to undef?

is probably something like "on every copy/move, padding fields (might) be reset to undef". Since one cannot really reason about when copies/moves happen (this changes with the optimization level, and the compiler is allowed to do any of these "whenever"), it feels simpler to me to just tell people that even if you fully initialize all bytes of an allocation, the moment you start treating it as a T, the padding bytes are always undef. I don't know how one would implement AtomicCell under these constraints (can it use a [u8; 16] internally?).
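
(A sketch of the "fully initialize, then treat it as a T" scenario under that pessimistic reading -- illustrative only, Padded is a made-up type:)

  use std::mem::{size_of, MaybeUninit};

  #[derive(Clone, Copy)]
  struct Padded { a: u8, b: u32 }   // 3 padding bytes under the default repr

  fn roundtrip() -> Padded {
      let mut storage = MaybeUninit::<Padded>::uninit();
      // fully initialize all size_of::<Padded>() bytes, padding included
      unsafe { storage.as_mut_ptr().cast::<u8>().write_bytes(0, size_of::<Padded>()) };
      // the moment the bytes are read back as a Padded (a typed copy), the
      // padding bytes may be reset to undef again
      unsafe { storage.assume_init() }
  }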

nagisa (Jun 18 2019 at 11:22, on Zulip):

It cannot use [u8; 16] internally if you expect to be able to compare_xchg the whole struct – it would still compare padding bytes. Basically the only thing you can compare_xchg is contiguous data (data in which every byte has a non-padding purpose).

gnzlbg (Jun 18 2019 at 14:12, on Zulip):

@nagisa what I had in mind is that it would take T, transmute it to [u8; 16] which we should specify as a nop, freeze the [u8; 16] which should also be a nop, and then do a compare_xchg on the result of the freeze

RalfJ (Jun 18 2019 at 14:12, on Zulip):

I assume you mean "NOP on the hardware", because it's clearly not a NOP in Rust or else why would you even do it

nagisa (Jun 18 2019 at 14:13, on Zulip):

The [u8; 16] stored in the AtomicCell and your transmuted [u8; 16] will still have junk in their padding unless you explicitly keep it at some particular value.

gnzlbg (Jun 18 2019 at 14:13, on Zulip):

@nagisa the freeze should give the undef some particular indeterminate value

nagisa (Jun 18 2019 at 14:15, on Zulip):

It gives undef an arbitrary value for that particular undef. Since compare_exchange works on two distinct values, freeze does not help you there whatsoever – you’ll just be comparing two indeterminate values with a nonsensical outcome.

gnzlbg (Jun 18 2019 at 14:15, on Zulip):

@nagisa there was a proposed algorithm that worked around that

gnzlbg (Jun 18 2019 at 14:16, on Zulip):

@nagisa see: https://github.com/rust-lang/unsafe-code-guidelines/issues/71#issuecomment-460097424

RalfJ (Jun 18 2019 at 14:17, on Zulip):

@nagisa AtomicCell already runs this in a loop to handle that problem

RalfJ (Jun 18 2019 at 14:18, on Zulip):

but the main issue we still have is that there might be undef bytes in the memory

RalfJ (Jun 18 2019 at 14:18, on Zulip):

so we'd need a way to "freeze" them that does not introduce a read-write race...

nagisa (Jun 18 2019 at 14:18, on Zulip):

yeah, you cannot freeze memory and I was already writing that the linked algorithm is, therefore, nonsense.

gnzlbg (Jun 18 2019 at 14:19, on Zulip):

@RalfJ yes nop was a poor choice

RalfJ (Jun 18 2019 at 14:19, on Zulip):

well maybe you can freeze memory^^ also the loop is needed for other stuff (like PartialEq::eq and byte-wise equality not agreeing)

nagisa (Jun 18 2019 at 14:20, on Zulip):

like PartialEq::eq and byte-wise equality not agreeing

If we’re going there then byte-wise equality does not imply semantic equality either.

RalfJ (Jun 18 2019 at 14:21, on Zulip):

well that's what I said

nagisa (Jun 18 2019 at 14:24, on Zulip):

anyway, my strong opinion is that, with the LLVM semantics around compare_exchange and padding bytes being what they are, this is not currently salvageable with any "loop"

gnzlbg (Jun 18 2019 at 14:29, on Zulip):

maybe we need a different intrinsic

gnzlbg (Jun 18 2019 at 14:31, on Zulip):

we might need that anyways, e.g., if creating an invalid usize is instant UB

RalfJ (Jun 18 2019 at 14:32, on Zulip):

even with direct access to the abstract machine I don't know how to define a compare_exchange on data involving undef that makes sense and is actually implementable...

gnzlbg (Jun 18 2019 at 14:33, on Zulip):

not without huge drawbacks

RalfJ (Jun 18 2019 at 14:33, on Zulip):

I mean we could say "compare frozen versions of the data" (both src and what is in memory). but then (a) there's still no guarantee to converge and (b) this rules out replacing CAS by simple comparison if the compiler can prove that the code is sequential.

gnzlbg (Jun 18 2019 at 14:33, on Zulip):

a wild and bad idea would be to say that moves/copies do not touch padding

gnzlbg (Jun 18 2019 at 14:34, on Zulip):

so if you have a [u8; 16] and you zero it, and then you move a T on top of it, you know the padding is zero

gnzlbg (Jun 18 2019 at 14:34, on Zulip):

but this would mean that moving a struct isn't as simple as a memcpy over the whole struct

gnzlbg (Jun 18 2019 at 14:34, on Zulip):

which would be generally bad for perf

gnzlbg (Jun 18 2019 at 14:35, on Zulip):

Instead, a less bad idea might be to add a finer-grained intrinsic that only moves/copies non-padding bytes

gnzlbg (Jun 18 2019 at 14:41, on Zulip):

So that you can do:

let value: T = /* ... */;
let mut storage = 0_usize;
// proposed intrinsic: copy only the non-padding bytes of `value`, so the
// zeroed padding bytes of `storage` stay initialized
ptr::write_fields(&mut storage as *mut _ as *mut T, value);

You would still be working with storage: usize, but padding bytes wouldn't be undef.

RalfJ (Jun 18 2019 at 15:29, on Zulip):

how would that help AtomicCell though? It can still be moved like normal variables can...

gnzlbg (Jun 18 2019 at 15:34, on Zulip):

It would help the implementation of the AtomicCell::compare_exchange algorithm.

gnzlbg (Jun 18 2019 at 15:35, on Zulip):

Right now the problem is that some of the bytes in the usize that's used internally for the CAS are undef, so a compare_and_exchange of a usize with undef bytes does "whatever"

gnzlbg (Jun 18 2019 at 15:37, on Zulip):

AtomicCell<T> could use ptr::write_fields to ensure that the padding bytes of the T stored inside it are always zero (or some other value), and use ptr::write_fields on the AtomicCell::compare_and_exchange argument to set its padding bytes to the same value

gnzlbg (Jun 18 2019 at 15:38, on Zulip):

That is, the CAS would only fail if the non-padding bytes of T differ, which is better than "if any byte differs"
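
(A hypothetical sketch of that scheme: ptr::write_fields does not exist, it is the intrinsic proposed above, and the other names are illustrative rather than crossbeam's actual API; it also assumes size_of::<T>() <= size_of::<usize>().)

  use std::sync::atomic::{AtomicUsize, Ordering};

  fn normalize<T>(value: T) -> usize {
      let mut bits = 0_usize;   // every byte of the storage starts out zero
      // hypothetical intrinsic: copy only the non-padding bytes of `value`,
      // leaving the zeroed padding bytes of `bits` untouched
      unsafe { ptr::write_fields(&mut bits as *mut usize as *mut T, value) };
      bits
  }

  fn compare_exchange<T>(cell: &AtomicUsize, current: T, new: T) -> Result<usize, usize> {
      // both operands carry the same (zero) padding, so the CAS outcome
      // depends only on the non-padding bytes of T
      cell.compare_exchange(normalize(current), normalize(new),
                            Ordering::SeqCst, Ordering::SeqCst)
  }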
