Stream: t-lang/wg-unsafe-code-guidelines

Topic: Padding bytes UB


Hadrien Grasland (Oct 15 2019 at 15:11, on Zulip):

So, I could use a quick refresher on what the current UCG consensus is regarding observation of padding bytes.

Hadrien Grasland (Oct 15 2019 at 15:13, on Zulip):

I think I remember that reading from padding bytes was UB, but I'm not sure how such a contract can be considered tenable when it means that e.g. memcpy-ing a struct from one memory location to another is UB.

Hadrien Grasland (Oct 15 2019 at 15:15, on Zulip):

(unless memcpy has special superpowers that allow it to ignore the fact that observing padding bytes is UB, in which case I would be interested in figuring out how I can get similar superpowers in different code)

Hadrien Grasland (Oct 15 2019 at 15:16, on Zulip):

Which means that moving Rust objects around is UB, since our moves are basically memcpy, again unless the compiler has special superpowers to move objects with padding the safe way.

Hadrien Grasland (Oct 15 2019 at 15:20, on Zulip):

It seems to me that memcpy would somehow need to implement the elusive freeze semantics that can magically turn uninitialized bytes into non-deterministic valid bytes.

RalfJ (Oct 16 2019 at 17:16, on Zulip):

> So, I could use a quick refresher on what the current UCG consensus is regarding observation of padding bytes.

in my view, it is that there's no such thing as "padding bytes" in the memory of the rust abstract machine. so "observing padding bytes" is kind of an ill-typed question.
see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/value-domain.md#the-role-of-the-value-representation-in-the-operational-semantics for all the gory details.

RalfJ (Oct 16 2019 at 17:16, on Zulip):

but if we ignore some details, reading from padding is the same as reading uninitialized memory

RalfJ (Oct 16 2019 at 17:16, on Zulip):

and that's decidedly not UB. see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/value-domain.md#the-role-of-the-value-representation-in-the-operational-semantics for the details.

RalfJ (Oct 16 2019 at 17:17, on Zulip):

the reason that let b: bool = mem::uninitialized() is UB is not that we read uninit memory. it's that we constructed a bool that is not valid.

RalfJ (Oct 16 2019 at 17:17, on Zulip):

this requires thinking of Rust memory as storing an Option<u8> for each location (tracking initializedness)

RalfJ (Oct 16 2019 at 17:17, on Zulip):

and then the validity invariant for bool says that None is not valid
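A toy model of this view, for illustration only (it is not the UCG's actual formalization, and the names `AbstractByte` and `decode_bool` are hypothetical): each memory location holds an `Option<u8>`, with `None` meaning "uninitialized", and decoding a byte at type `bool` succeeds only for `Some(0)` or `Some(1)`. Merely holding `None` in memory is fine; what fails is interpreting it as a `bool`.

```rust
/// Toy model (an assumption for illustration): one abstract-machine byte,
/// where `None` means "uninitialized".
type AbstractByte = Option<u8>;

/// Decode one abstract byte at type `bool`.
/// Returns `None` when the byte violates bool's validity invariant,
/// which in this model is exactly where UB would arise.
fn decode_bool(byte: AbstractByte) -> Option<bool> {
    match byte {
        Some(0) => Some(false),
        Some(1) => Some(true),
        // Some(2..=255) and None (uninit) are both invalid for bool.
        _ => None,
    }
}

fn main() {
    // Memory that was never written, as with `mem::uninitialized()`.
    let memory: [AbstractByte; 1] = [None];
    // Holding the uninit byte is fine; interpreting it as bool is not.
    assert_eq!(decode_bool(memory[0]), None);
    assert_eq!(decode_bool(Some(1)), Some(true));
}
```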

RalfJ (Oct 16 2019 at 17:42, on Zulip):

however, all types (currently) rule out uninit bytes in their validity invariant. the only places where reading uninit bytes is not UB are MaybeUninit, and the case where the uninit byte is a padding byte of the type used for the access

RalfJ (Oct 16 2019 at 17:43, on Zulip):

I should have prefaced: the key mindset here is that memory is untyped (just a sequence of uninterpreted Option<u8> -- okay, actually something more complicated than that, but good enough for now), but operations are typed (when doing a load at type T, you take the sequence of raw bytes and interpret them according to the rules of type T -- and types have padding, but memory does not)
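Continuing the same toy `Option<u8>` model, a "typed load" can be sketched as a function that reads untyped bytes according to a type's layout. The struct `Pair`, the function `load_pair`, and the little-endian byte order are assumptions for illustration. Under `#[repr(C)]`, `Pair` has `a` at offset 0, one padding byte at offset 1, and `b` at offsets 2..4, so the load never inspects byte 1 and an uninitialized padding byte does not make it fail.

```rust
/// Toy model: `None` means "uninitialized".
type AbstractByte = Option<u8>;

/// Hypothetical example type. Under #[repr(C)]: `a` at offset 0,
/// byte 1 is padding, `b` at offsets 2..4 (size 4, align 2).
#[derive(Debug, PartialEq)]
#[repr(C)]
struct Pair {
    a: u8,
    b: u16,
}

/// Toy "typed load" at type Pair: interpret 4 untyped abstract bytes
/// according to Pair's layout. The padding byte (index 1) is never read,
/// so it may be uninitialized. Little-endian order is assumed for `b`.
fn load_pair(mem: &[AbstractByte; 4]) -> Option<Pair> {
    let a = mem[0]?; // field bytes must be initialized
    let b0 = mem[2]?;
    let b1 = mem[3]?;
    Some(Pair { a, b: u16::from_le_bytes([b0, b1]) })
}

fn main() {
    // Padding byte uninitialized: the load still succeeds.
    let mem = [Some(7), None, Some(0x34), Some(0x12)];
    assert_eq!(load_pair(&mem), Some(Pair { a: 7, b: 0x1234 }));
    // A field byte uninitialized: invalid at this type.
    let bad = [None, Some(0), Some(0), Some(0)];
    assert_eq!(load_pair(&bad), None);
}
```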

Hadrien Grasland (Oct 16 2019 at 18:17, on Zulip):

Then, to bring this closer to my underlying use case, how would you implement memcpy() in a UB-free way that still allows the implementation of Rust's move operation?

Hadrien Grasland (Oct 16 2019 at 18:17, on Zulip):

(Knowing that during a Rust move operation the memcpy operation may be used to copy the internal representation of a certain type T, which has uninitialized padding bytes)

Hadrien Grasland (Oct 16 2019 at 18:25, on Zulip):

I think what I'm trying to do is to implement the equivalent of the "typed copy" operation that you are describing in https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/value-domain.md#the-role-of-the-value-representation-in-the-operational-semantics .

Hadrien Grasland (Oct 16 2019 at 18:26, on Zulip):

I don't really want to preserve the value of the padding bytes. It just happens that I am not the Rust compiler, and therefore have no knowledge of where they are and cannot cautiously implement a custom memcpy operation that refrains from reading from them.

Hadrien Grasland (Oct 16 2019 at 18:27, on Zulip):

(even though I know what the type T which I am in the process of copying is, I don't know its layout)

RalfJ (Oct 16 2019 at 18:30, on Zulip):

> Then, to bring this closer to my underlying use case, how would you implement memcpy() in a UB-free way that still allows the implementation of Rust's move operation?

you mean implement inside Rust? By copying MaybeUninit<u8>, bytewise
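A minimal sketch of such a bytewise, untyped copy (the names `bytewise_copy` and `Padded` are hypothetical). Every byte is read and written at type `MaybeUninit<u8>`, which has no validity invariant, so copying the padding inside `T` is fine even when those bytes are uninitialized.

```rust
use std::mem::{size_of, MaybeUninit};

/// Untyped byte-by-byte copy of a `T`, going through `MaybeUninit<u8>`.
///
/// # Safety
/// `src` must be valid for reads and `dst` valid for writes of
/// `size_of::<T>()` bytes, and the two regions must not overlap.
unsafe fn bytewise_copy<T>(src: *const T, dst: *mut T) {
    let src = src as *const MaybeUninit<u8>;
    let dst = dst as *mut MaybeUninit<u8>;
    for i in 0..size_of::<T>() {
        // Each access is at type MaybeUninit<u8>: uninit bytes are allowed.
        unsafe { dst.add(i).write(src.add(i).read()) };
    }
}

/// Hypothetical type with 3 padding bytes after `a` (under repr(C)).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Padded {
    a: u8,
    b: u32,
}

fn main() {
    let src = Padded { a: 1, b: 2 };
    let mut dst = MaybeUninit::<Padded>::uninit();
    unsafe { bytewise_copy(&src as *const Padded, dst.as_mut_ptr()) };
    // All field bytes were copied, so `dst` now holds a valid Padded.
    let dst = unsafe { dst.assume_init() };
    assert_eq!(dst, src);
}
```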

RalfJ (Oct 16 2019 at 18:30, on Zulip):

but that's not the same as a Rust move

RalfJ (Oct 16 2019 at 18:31, on Zulip):

Rust's move is typed, memcpy is untyped

RalfJ (Oct 16 2019 at 18:31, on Zulip):

it is correct to use memcpy to implement Rust's move, but not "complete" -- as in, you make some undefined programs defined by doing that replacement.

RalfJ (Oct 16 2019 at 18:32, on Zulip):

Rust's move can be done in unsafe code using dest.write(src.read()), where dest: *mut T and src: *const T and T is the type at which you want to do the ("typed") copy
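A sketch of that typed copy wrapped in a function (the name `typed_copy` is hypothetical). `src.read()` produces a value of type `T`, in which padding plays no role, and `dest.write(..)` stores it without dropping whatever was at `dest`. Because the value is duplicated bitwise, a non-`Copy` source must not be used or dropped afterwards, just as with a real move.

```rust
use std::mem::MaybeUninit;

/// Typed copy of one `T` from `src` to `dest`.
///
/// # Safety
/// `src` must be valid for reads, aligned, and point to a valid `T`;
/// `dest` must be valid for writes and aligned.
unsafe fn typed_copy<T>(src: *const T, dest: *mut T) {
    unsafe { dest.write(src.read()) }
}

fn main() {
    let src = String::from("moved");
    let mut dest = MaybeUninit::<String>::uninit();
    unsafe { typed_copy(&src as *const String, dest.as_mut_ptr()) };
    // Like a real move, `src` must not be used or dropped again;
    // otherwise its heap buffer would be freed twice.
    std::mem::forget(src);
    let dest = unsafe { dest.assume_init() };
    assert_eq!(dest, "moved");
}
```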

Hadrien Grasland (Oct 16 2019 at 18:34, on Zulip):

Mmmm... the thing is, this is inside of Abomonation, and I need to send the bytes to a Write implementation. Which may do all kinds of strange and wonderful and IO-ish things with them. And then at the end someone somewhere will get an &[u8] and deserialize it into an &T.

Hadrien Grasland (Oct 16 2019 at 18:34, on Zulip):

So I cannot use MaybeUninit<u8>, because Write expects u8s.

Hadrien Grasland (Oct 16 2019 at 18:34, on Zulip):

I guess I'm stuck with UB here, until (if ever) Rust gets a freeze() operation.

Hadrien Grasland (Oct 16 2019 at 18:37, on Zulip):

Then I can transmute &T into &[MaybeUninit<u8>], freeze the bytes to get &[u8], and send that to my Write implementation.
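The first half of that plan can already be written soundly today; only the freeze step is missing. The sketch below (the names `as_uninit_bytes` and `Padded` are hypothetical) views any `&T` as `&[MaybeUninit<u8>]`. Turning those bytes into `&[u8]` for `io::Write` would need the hypothetical `freeze`, which Rust does not provide, so the sketch stops there.

```rust
use std::mem::{size_of, MaybeUninit};
use std::slice;

/// View any `T` as its raw bytes, padding included. `MaybeUninit<u8>`
/// is used as the element type because padding may be uninitialized.
fn as_uninit_bytes<T>(value: &T) -> &[MaybeUninit<u8>] {
    // SAFETY: `value` is a valid reference, so it points to size_of::<T>()
    // readable bytes; MaybeUninit<u8> has alignment 1 and no validity invariant.
    unsafe {
        slice::from_raw_parts(value as *const T as *const MaybeUninit<u8>, size_of::<T>())
    }
}

/// Hypothetical type: under repr(C), bytes 1..4 are padding.
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
}

fn main() {
    let v = Padded { a: 1, b: 2 };
    let bytes = as_uninit_bytes(&v);
    assert_eq!(bytes.len(), 8);
    // Byte 0 belongs to field `a`, so it is initialized and may be read as u8.
    assert_eq!(unsafe { bytes[0].assume_init() }, 1);
    // Bytes 1..4 are padding: calling assume_init() on them would be UB,
    // and handing them to io::Write as u8 would need the missing `freeze`.
}
```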

Hadrien Grasland (Oct 16 2019 at 19:17, on Zulip):

Posted a summary on the abomonation bugtracker, feel free to cross-check and correct it: https://github.com/TimelyDataflow/abomonation/issues/32

RalfJ (Oct 16 2019 at 19:33, on Zulip):

yeah, using [u8] to represent "any kind of data" is (currently) wrong

RalfJ (Oct 16 2019 at 19:34, on Zulip):

you're not alone in making that mistake though; libstd's Read trait also did it^^

RalfJ (Oct 16 2019 at 19:35, on Zulip):

IMO this is actually strong motivation to not make that immediate UB, but https://github.com/rust-lang/unsafe-code-guidelines/issues/71 kind of stalled (mostly needs a write-up)

Last update: Nov 19 2019 at 18:10 UTC