Stream: t-lang/wg-unsafe-code-guidelines

Topic: mem::zeroed guarantees that all bytes are zero


gnzlbg (Jul 31 2019 at 13:32, on Zulip):

I think the API docs of mem::zeroed are incorrect. They currently say:

Creates a value whose bytes are all zero.

gnzlbg (Jul 31 2019 at 13:33, on Zulip):

From which it follows that:

gnzlbg (Jul 31 2019 at 13:34, on Zulip):
let x: #[repr(C)] (u8, u16) = zeroed();
assert_eq!((&x as ... as *const u8).add(1).read(), 0); // always passes
gnzlbg (Jul 31 2019 at 13:35, on Zulip):

I don't think we can guarantee anywhere which values mem::zeroed() writes to padding bytes, and it probably should write anything to them

gnzlbg (Jul 31 2019 at 13:35, on Zulip):

definetely not be required / guarantee that it does

rkruppe (Jul 31 2019 at 14:12, on Zulip):

Yeah. Please file an issue.

RalfJ (Jul 31 2019 at 17:16, on Zulip):

I see. In my view what happens is that indeed there was an all-0 value, but then when the function returns that value, "typed copy" rules apply.

RalfJ (Jul 31 2019 at 17:16, on Zulip):

This expalains both why there is UB if 0 does not satisfy the validity invariant, and why padding is not preserved.

rkruppe (Jul 31 2019 at 17:43, on Zulip):

That is technically correct but we should definitely note explicitly what happens with padding, because it's subtle

RalfJ (Jul 31 2019 at 18:01, on Zulip):

indeed, the docs don't need to talk about "typed copies" or so.

RalfJ (Jul 31 2019 at 18:01, on Zulip):

I was thinking of the abstract machine spec and making sure it doesnt have to account separately for the padding here.

RalfJ (Jul 31 2019 at 18:01, on Zulip):

And I was also thinking of the abstract machine when writing those docs, which is why I missed the padding aspect. ;)

gnzlbg (Jul 31 2019 at 18:06, on Zulip):

@RalfJ @rkruppe so just to triple check. IIUC even with the current API docs, the assert here can fail right ?

union U {
    x: T,
    y: [u8; std::mem::size_of::<T>()]
}
let u = U { x: std::mem::zeroed() };
assert_eq!(u.y, [0_u8; 4]);
gnzlbg (Jul 31 2019 at 18:07, on Zulip):

(supposing that all zeroes is a valid representation for T, etc.

rkruppe (Jul 31 2019 at 18:08, on Zulip):

Yes. Arguably it might even be UB, if we say padding is always written as 0xUU

gnzlbg (Jul 31 2019 at 18:11, on Zulip):

Well I suppose the question is whether this code is UB or not then.

gnzlbg (Jul 31 2019 at 18:12, on Zulip):

I think that even with the current docs that say that it writes zero to all bytes of T, the behavior is undefined, because the value of the padding bytes can change to undef before the read of y

rkruppe (Jul 31 2019 at 18:14, on Zulip):

Oh wait, what do you mean by "even with the current API docs"?

gnzlbg (Jul 31 2019 at 18:27, on Zulip):

The current docs guarantee that all bytes of T are actually zeroed

rkruppe (Jul 31 2019 at 18:48, on Zulip):

I guess my confusion is whether you mean what a normal user would intuit or what follows from the "typed copy" perspective

RalfJ (Jul 31 2019 at 19:33, on Zulip):

I think that even with the current docs that say that it writes zero to all bytes of T, the behavior is undefined, because the value of the padding bytes can change to undef before the read of y

that depends on whether there was ever a type at type T

RalfJ (Jul 31 2019 at 19:33, on Zulip):

as @rkruppe argued on GH, there is no such thing as a "padding byte" per se. There is only a "padding byte for some type T".

rkruppe (Jul 31 2019 at 19:35, on Zulip):

evaluating zeroed() and putting the result in the union is uncontroversially a typed copy, right?

RalfJ (Jul 31 2019 at 19:38, on Zulip):

yes. but @gnzlbg was hypothesizing "with the current docs".

RalfJ (Jul 31 2019 at 19:38, on Zulip):

I guess that's a moot point because the current docs are not operational enough to answer this question.

gnzlbg (Aug 01 2019 at 07:29, on Zulip):

@RalfJ They might be barely enough. For example, is this a valid transformation of the program above?

union U {
    x: T,
    y: [u8; std::mem::size_of::<T>()]
}
let tmp: T = x: std::mem::zeroed();
let u = U { x: tmp };
assert_eq!(u.y, [0_u8; 4]);
gnzlbg (Aug 01 2019 at 07:29, on Zulip):

I think so, so even with the current docs, we do not guarantee that the assert will pass. The behavior of the assert is undefined.

RalfJ (Aug 01 2019 at 07:30, on Zulip):

@gnzlbg I think we both agree on what the semantics of zeroed are, don't we? and that the docs can be improved?

RalfJ (Aug 01 2019 at 07:30, on Zulip):

I see no point in trying to interpret more into these docs that I wrote than I had in my mind when writing them^^

RalfJ (Aug 01 2019 at 07:30, on Zulip):

"can and should be improved", that is

gnzlbg (Aug 01 2019 at 07:42, on Zulip):

@RalfJ shall I fill an issue ? or shall we try to come up with wording that fixes them and just modify them ?

gnzlbg (Aug 01 2019 at 07:42, on Zulip):

I think that saying that it zeroes the bytes "of the value representation of T" should be enough

RalfJ (Aug 01 2019 at 07:52, on Zulip):

except nobody will understand what that means

RalfJ (Aug 01 2019 at 07:52, on Zulip):

like, not even in the UCG glossary with all my PRs merged have we defined "the bytes of the value representation of T"

RalfJ (Aug 01 2019 at 07:54, on Zulip):

@gnzlbg alternative proposal: something along the lines of "transmutes zero-ed memory to T". and then we can go on explaining that, since padding bytes are not preserved when e.g. returned from a function, this means that some bytes of the returned datum can be 0.

RalfJ (Aug 01 2019 at 07:54, on Zulip):

yet another proposal: actually functions return values, not lists of bytes or so, so the "return value" doesn't even have padding. so we could try something along the lines of "returns the value represented by the all-0 bit pattern".

rkruppe (Aug 01 2019 at 08:01, on Zulip):

These two are very clever but I don't think there is any wording that avoids the need for an explicit note to the effect that padding is not preserved in this case either, and so if T has padding, the result will not be entirely zeroes. We don't don't need a precise official definition of padding (and when it is lost) or go into the differences between values and bytes lists to say that.

RalfJ (Aug 01 2019 at 08:08, on Zulip):

sure, I am all for explicitly stating the consequences of the definition. but at the same time I dont want this to sound like a weird magic special case, which it is not.

gnzlbg (Aug 01 2019 at 08:09, on Zulip):

The actual wording with such a note might even be enough?

gnzlbg (Aug 01 2019 at 08:09, on Zulip):

Otherwise, saying that "mem::zeroed<T> returns the value represented by the all-0 bit pattern" seems good as well.

RalfJ (Aug 01 2019 at 08:47, on Zulip):

I like "the value represented by the all-0 bit pattern" because it does nit involve a lot of weird jargon -- the terms there have a precise meaning but you can also read it and kind of understand it without knowing that precise meanings.
but we should definitely add a note saying that this means e.g. for (u8, u16) the padding byte does not have to be 0.

gnzlbg (Aug 01 2019 at 09:08, on Zulip):

Sounds good to me.

RalfJ (Aug 01 2019 at 09:20, on Zulip):

do you want to make a PR?

gnzlbg (Aug 01 2019 at 09:23, on Zulip):

i could at some point, but every time I need to make a PR, I need to branch, branches get out of sync, switching between branches takes days if LLVM needs to be recompiled, etc.

RalfJ (Aug 01 2019 at 09:25, on Zulip):

having multiple checkouts helps

RalfJ (Aug 01 2019 at 09:26, on Zulip):

so you can have multiple branches checked out and built (I use git worktree for that)

RalfJ (Aug 01 2019 at 09:26, on Zulip):

of course it also means when LLVM updates it needs to eventually be recompiled once for each worktree^^

RalfJ (Aug 01 2019 at 09:27, on Zulip):

but for this PR you dont even need LLVM

RalfJ (Aug 01 2019 at 09:27, on Zulip):

./x.py test --stage 0 src/libcore should be all you need

RalfJ (Aug 01 2019 at 09:27, on Zulip):

that takes just a few min even on a fresh checkout

gnzlbg (Aug 01 2019 at 09:58, on Zulip):

@RalfJ each check out takes ~30 Gb

gnzlbg (Aug 01 2019 at 09:58, on Zulip):

my laptop has ~256 Gb and is quite full, so I'm bottle necked at one checkout

RalfJ (Aug 01 2019 at 10:23, on Zulip):

ah. yeah Rust eats disk space like there's no tomorrow.

RalfJ (Aug 01 2019 at 10:23, on Zulip):

I recently freed 100GB by cleaning my compiler checkouts and a few projects.

gnzlbg (Aug 01 2019 at 11:57, on Zulip):

I need to start using the GCC compile farm again

centril (Aug 01 2019 at 21:11, on Zulip):

@gnzlbg you don't even need to do anything locally; just make a branch via the github UI and edit the text there...

Lokathor (Aug 08 2019 at 02:15, on Zulip):

yeah for just a docs change the webpage works great.

RalfJ (Aug 08 2019 at 08:15, on Zulip):

@Lokathor what were you respond to? The docs change has already landed. Which webpage?

Lokathor (Aug 08 2019 at 08:26, on Zulip):

Sorry, I hadn't noticed the date on the thread. I was agreeing with Centril that just editing docs through the github webpage is much easier than cloning a local copy of rustc

RalfJ (Aug 08 2019 at 08:29, on Zulip):

dunno, I am usually doing this in one of the 3 local copies I have and I prefer having a proper editor ;)

Last update: Nov 19 2019 at 17:35UTC