Stream: t-lang/wg-unsafe-code-guidelines

Topic: blog post about `MaybeUninit`


nikomatsakis (May 02 2019 at 20:13, on Zulip):

Hey @RalfJ! We were discussing https://github.com/rust-lang/rust/pull/60445 in the lang team meeting today, and we were thinking that it would be great if we had a good write-up about the deprecation of mem::uninitialized. Such a write-up would, I think, motivate the change, explain what is deprecated, and explain what people should do instead. We thought you would be the perfect person to write it. Interested? :)

(cc @centril )

nikomatsakis (May 02 2019 at 20:13, on Zulip):

Now that I write this, I am wondering if "blog post" is the right term

nikomatsakis (May 02 2019 at 20:13, on Zulip):

Maybe it should be posted as a github issue and then 'cross-posted' to internals, for example

nikomatsakis (May 02 2019 at 20:13, on Zulip):

(The idea is that the deprecation notice can also link to this)

nikomatsakis (May 02 2019 at 20:13, on Zulip):

That is, we want to (a) notify folks in advance and then (b) have something to link them to

RalfJ (May 05 2019 at 13:22, on Zulip):

Interested, yes. Does this come with a ticket for my nearest time machine? :P

RalfJ (May 05 2019 at 13:23, on Zulip):

@nikomatsakis @centril what did you have in mind in terms of timing and where this would end up being posted? When I look at my Rust folder I already feel like I made too many commitments, so I am a bit hesitant right now to add new ones^^. But also this is a topic I'd very much like to help "get right".

centril (May 05 2019 at 13:26, on Zulip):

Does this come with a ticket for my nearest time machine? :P

Sorry; we are all out of those. :P

what did you have in mind in terms of timing and where this would end up being posted?

Close to the 1.36 release would be good so we can link to it in blog.rust-lang.org
I was thinking you can post it in your own blog but maybe Niko thought something else.

nikomatsakis (May 06 2019 at 17:12, on Zulip):

If writing the post is too much work, @RalfJ, helping to edit it might be another option

RalfJ (May 16 2019 at 07:57, on Zulip):

I wonder what the post should say about MaybeUninit and structs...

Gankro (May 16 2019 at 18:43, on Zulip):

Is there anything to say about MaybeUninit other than: mem::uninitialized doesn't really work because types can have validity constraints, and uninit basically says "please assume any constraints are violated". So here's a type-level solution so that the compiler can understand what you're trying to do and not apply those assumptions?
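
A minimal sketch of the contrast summarized above, assuming `bool` as the type with a validity constraint. The helper name `make_flag` is hypothetical and only serves the illustration.

```rust
use std::mem::MaybeUninit;

// Hypothetical helper for illustration: build a `bool` through `MaybeUninit`.
fn make_flag() -> bool {
    // `MaybeUninit<bool>` is allowed to hold uninitialized bytes, so the
    // compiler assumes nothing about its contents yet.
    let mut slot = MaybeUninit::<bool>::uninit();

    // By contrast, `let flag: bool = unsafe { std::mem::uninitialized() };`
    // would claim that a valid `bool` already exists, which is not true.

    unsafe {
        // Write a real value through the raw pointer; the old contents are
        // neither read nor dropped.
        slot.as_mut_ptr().write(true);
        // SAFETY: `slot` was fully initialized by the write above.
        slot.assume_init()
    }
}
```

Only the final `assume_init` asserts that the contents are a valid `bool`, and by then they are.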

Gankro (May 16 2019 at 18:44, on Zulip):

I am kinda bored, so if that's right, I am willing to do the writeup

Gankro (May 16 2019 at 18:45, on Zulip):

(Might also mention that undef/poison is a disaster in LLVM, so avoiding that as much as possible is also For The Best)

Gankro (May 16 2019 at 18:45, on Zulip):

(Although does MaybeUninit still just lower to undef..?)

RalfJ (May 16 2019 at 19:04, on Zulip):

yeah that's basically it

RalfJ (May 16 2019 at 19:04, on Zulip):

it lowers to constructing a union by initializing a zero-sized field

RalfJ (May 16 2019 at 19:04, on Zulip):

so, it's effectively undef

Gankro (May 16 2019 at 19:23, on Zulip):

is it not undef in any particular way?

RalfJ (May 16 2019 at 20:07, on Zulip):

only in the way how we codegen (the uninit intrinsic is not involved)
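
The "union with a zero-sized field" idea can be sketched roughly as follows. `MyMaybeUninit` and `roundtrip` are hypothetical stand-ins written for illustration, not the standard library's actual source.

```rust
use std::mem::ManuallyDrop;

// Hypothetical stand-in for the standard library type, for illustration only.
union MyMaybeUninit<T> {
    // Zero-sized field: initializing it writes no bytes at all.
    uninit: (),
    // The payload; `ManuallyDrop` means it is never dropped automatically.
    value: ManuallyDrop<T>,
}

impl<T> MyMaybeUninit<T> {
    fn uninit() -> Self {
        // Constructing the union through the zero-sized field leaves the
        // bytes that back `value` uninitialized.
        MyMaybeUninit { uninit: () }
    }

    fn write(&mut self, val: T) {
        // Assigning to a `ManuallyDrop` union field is safe and does not
        // read or drop the previous contents.
        self.value = ManuallyDrop::new(val);
    }

    /// Caller must guarantee that `write` was called before.
    unsafe fn assume_init(self) -> T {
        // SAFETY: forwarded to the caller, see the doc comment.
        unsafe { ManuallyDrop::into_inner(self.value) }
    }
}

// Small usage helper: initialize, then read back.
fn roundtrip() -> u32 {
    let mut slot = MyMaybeUninit::<u32>::uninit();
    slot.write(7);
    // SAFETY: `slot` was initialized by `write` above.
    unsafe { slot.assume_init() }
}
```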

Gankro (May 16 2019 at 23:40, on Zulip):

Drafted the post up. Do Not Distribute: https://gankro.github.io/blah/initialize-me-maybe/

centril (May 17 2019 at 00:00, on Zulip):

// statically uninit, init it

I prefer to expand the initialisms here (+ consequence changes in the whole text)

Gankro (May 17 2019 at 00:03, on Zulip):

I didn't want the lines to get long, and also it gets super repetitive

centril (May 17 2019 at 00:03, on Zulip):

rust has the Option type (or any enum, really):

You sometimes capitalize Rust and sometimes not... pick one :P

Also, Option<T> is the type, Option isn't a type.

centril (May 17 2019 at 00:03, on Zulip):

and also very poorly specified.

s/very//g

centril (May 17 2019 at 00:09, on Zulip):

s/can't/cannot/g

Gankro (May 17 2019 at 00:10, on Zulip):

disagreed on "can't" and "Option"

centril (May 17 2019 at 00:11, on Zulip):

https://github.com/rust-lang/rfcs/blob/master/text/1574-more-api-documentation-conventions.md#referring-to-types

centril (May 17 2019 at 00:36, on Zulip):

What Is MaybeUninit?

I don't think you link to the type in the standard library anywhere in the post, would be good to do so

gnzlbg (May 17 2019 at 10:48, on Zulip):

@Gankro I don't have much time, but the general impression is that the post is too long

gnzlbg (May 17 2019 at 10:50, on Zulip):

We should just explain "What's uninitialized memory?" "Why is that useful for Rust? (optimizations)" "Why is that useful for users? (optimizations)" "Why is it dangerous?" (user optimization with bug gets ""misoptimized"") "Why is mem::uninitialized deprecated?" (makes it almost impossible to write code that doesn't get misoptimized), "What is MaybeUninit and how does it improve on mem::uninitialized failures?"

gnzlbg (May 17 2019 at 10:51, on Zulip):

I don't really think we have to cover the heap to explain any of that

gnzlbg (May 17 2019 at 10:52, on Zulip):

A more comprehensive blog post covering everything there is to know about uninitialized memory might want to do that (although that might belong in the nomicon), but the blog post that accompanies the release should be short and to the point, such that hopefully most people read it.

Gankro (May 17 2019 at 15:32, on Zulip):

I could definitely see removing the "working with safe uninit memory" section, but everything else seems relevant and fairly brief

RalfJ (May 17 2019 at 17:26, on Zulip):

So as a conservative model it's reasonable to just declare that it is Undefined Behaviour to read uninitialized memory. Full stop.

Uh, I am not sure if I agree. memcpy of uninitialized memory is generally considered okay. In fact, given that padding is uninitialized, this occurs in safe Rust.
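
A small sketch of the "copying is fine" point, assuming a byte array as the payload. The function name `copy_then_init` is made up for this example.

```rust
use std::mem::MaybeUninit;

// Hypothetical example: copy uninitialized bytes around, then initialize them.
fn copy_then_init() -> [u8; 4] {
    // Four uninitialized bytes. `MaybeUninit<u8>` is `Copy`, so the
    // array-repeat syntax is allowed here.
    let src: [MaybeUninit<u8>; 4] = [MaybeUninit::uninit(); 4];

    // A plain copy of uninitialized memory. No byte is ever inspected,
    // which is the same thing that happens when a struct with padding is
    // copied in safe Rust.
    let mut dst = src;

    // Overwrite every slot with a real value.
    for (i, slot) in dst.iter_mut().enumerate() {
        *slot = MaybeUninit::new(i as u8);
    }

    // SAFETY: all four elements were written in the loop above, and
    // `[MaybeUninit<u8>; 4]` has the same layout as `[u8; 4]`.
    unsafe { std::mem::transmute::<[MaybeUninit<u8>; 4], [u8; 4]>(dst) }
}
```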

RalfJ (May 17 2019 at 17:27, on Zulip):

Also, I feel the first section doesnt go far enough in saying how exotic uninit memory is -- namely, that it is unstable and can change when you look at it multiple times

RalfJ (May 17 2019 at 17:27, on Zulip):

so even x == x (for x: i32) can legitimately be made false by the compiler

RalfJ (May 17 2019 at 17:27, on Zulip):

so IMO one should really think of bits as having 3 possible states (0, 1, U). I feel that's easier to explain than the "magic substance"^^
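
The three-state view can be written down as a toy model. This is only a sketch of the mental model, not how rustc or LLVM represent memory; `Bit` and `bits_equal` are hypothetical names.

```rust
// Toy model: a bit of the abstract machine is 0, 1, or uninitialized (U).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Bit {
    Zero,
    One,
    Uninit,
}

// Comparison in the model. If either side is U there is no definite answer,
// which is why even `x == x` cannot be relied upon for uninitialized `x`.
fn bits_equal(a: Bit, b: Bit) -> Option<bool> {
    match (a, b) {
        (Bit::Uninit, _) | (_, Bit::Uninit) => None,
        _ => Some(a == b),
    }
}
```

In this model `bits_equal(Bit::Uninit, Bit::Uninit)` has no defined result, while two initialized bits compare as usual.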

RalfJ (May 17 2019 at 17:30, on Zulip):

Also I tend to agree about the length -- I think the "safe working with uninit memory" can be shortened, and I am not sure if a survey of all sources of uninit memory in Rust is the best approach here. My thinking was that the post would explain mem::uninit (that can come nicely after the safe section, basically as a way to "trick" the "overly strict" static checks described in the safe section), and how its wrong, and then how MaybeUninit saves the day

RalfJ (May 17 2019 at 17:31, on Zulip):

the fact that it is a union shouldnt matter

Gankro (May 17 2019 at 17:40, on Zulip):

I wasn't sure if the "can change value" thing was a consensus semantic (or if that was undef vs poison)

Gankro (May 17 2019 at 17:41, on Zulip):

but that's a good point on memcopying

Gankro (May 17 2019 at 17:41, on Zulip):

also idk, in my mind it's very interesting that it's just "yo use a union" and not like "ah we made this brilliant new thing that's magic"

RalfJ (May 17 2019 at 17:43, on Zulip):

I wasn't sure if the "can change value" thing was a consensus semantic (or if that was undef vs poison)

undef has it and poison makes it not observable

RalfJ (May 17 2019 at 17:43, on Zulip):

so yes I'd say it is pretty much consensus

RalfJ (May 17 2019 at 17:44, on Zulip):

I mean this is but one way to explain the three-valued-bit thing

RalfJ (May 17 2019 at 17:45, on Zulip):

the other is to just say "it is special and not like any initialized memory, hence -> bits have three states"

RalfJ (May 17 2019 at 17:46, on Zulip):

also idk, in my mind it's very interesting that it's just "yo use a union" and not like "ah we made this brilliant new thing that's magic"

it is interesting, but I think when using MaybeUninit one should generally treat it as an opaque abstraction, and hence it should be possible to explain it as such

RalfJ (May 17 2019 at 17:46, on Zulip):

also re: the post structure, that was just what I had in mind. maybe other structures work better. I guess I am curious why you chose yours :)

Gankro (May 17 2019 at 17:49, on Zulip):

I just wrote it from the perspective of "how do I explain this to someone who has never heard of any of this"

RalfJ (May 17 2019 at 17:51, on Zulip):

I guess also your goal was different -- my goal would have been to explain "just" mem::uninit vs MaybeUninit

Gankro (May 17 2019 at 17:51, on Zulip):

How do you feel about "So as a conservative model it's reasonable to just declare that if you do anything with uninitialized memory other than just copying it around, it is Undefined Behaviour. Full stop."

RalfJ (May 17 2019 at 17:51, on Zulip):

I agree factually :)

RalfJ (May 17 2019 at 17:52, on Zulip):

I still think this is a good opportunity to instill the idea in people that uninit is not "magic" and not "random bits" but just that bits can be more than 0 and 1. People need to free themselves from the bounds of the hardware that constrains their imagination in terms of what the abstract machine they are really programming looks like :D

Gankro (May 17 2019 at 17:53, on Zulip):

I'm not convinced that nature of uninitialized memory is as well-defined as you claim

RalfJ (May 17 2019 at 17:57, on Zulip):

this is our language, we can define it :D

RalfJ (May 17 2019 at 17:57, on Zulip):

and this is basically the poison model

RalfJ (May 17 2019 at 17:57, on Zulip):

which is the most reasonable model out there (and you linked to the paper showing that)

RalfJ (May 17 2019 at 17:58, on Zulip):

so while there may be changes in the fine print, I think it is more useful to teach a concrete model (even if it is preliminary) than to wobble around with nothing concrete to say or point at

rkruppe (May 17 2019 at 18:00, on Zulip):

I second this. "Bits can be more than 0 or 1" seems a fairly useful intuition to instill in people. There's a lot of decisions to be made about whether it's actually per-bit or more coarse, how it propagates exactly, etc. but the general idea seems like it can be stretched to cover practically any plausible semantics. It's only completely misleading for a model where uninitialized memory is really just a non-deterministic string of 1s and 0s but I see no way we'll actually end up with that.

RalfJ (May 17 2019 at 18:04, on Zulip):

also, this helps when eventually we have to teach people that pointers are more than an integer (because pointers got provenance). if they already gave up the idea that Rust's memory equals hardware memory, that will be an easier sell. ;)

RalfJ (May 17 2019 at 18:05, on Zulip):

shameless plug: https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html

RalfJ (May 17 2019 at 18:06, on Zulip):

There's a lot of decisions to be made about whether it's actually per-bit or more coarse, how it propagates exactly, etc

yeah, that's the part I meant by "fine print"

Gankro (May 17 2019 at 18:09, on Zulip):

The thing is, we're slaves to our compiler backends, so as long as uninit memory becomes llvm undef, we have to cope with that mess

RalfJ (May 17 2019 at 18:10, on Zulip):

compiling from this bitwise-poison model to undef is sound

RalfJ (May 17 2019 at 18:11, on Zulip):

(we have to make the right choices for how operators propagate poison, but it can be done)

RalfJ (May 17 2019 at 18:12, on Zulip):

basically, a bitstring like 0UU1 maps to the set of LLVM values such that the non-U parts of the bitstring are the same

RalfJ (May 17 2019 at 18:12, on Zulip):

so UUUUUUUU = undef (at i8)

RalfJ (May 17 2019 at 18:13, on Zulip):

and then we say that UUUUUUUU * 00000000 = UUUUUUUU, while LLVM says the result is 00000000, but that's okay because we are less defined (so in the worst case, a program that has UB in Rust will not have UB in LLVM)
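
The mapping from a partially uninitialized bitstring to a set of concrete values can be sketched like this. It is a toy model over 4-bit strings (so that `0UU1` fits), and `Bit` and `concretize` are hypothetical names, not compiler internals.

```rust
// Toy model: a bit is 0, 1, or uninitialized (U).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Bit {
    Zero,
    One,
    Uninit,
}

// All concrete 4-bit values that agree with the abstract bitstring on its
// non-U positions. `bits[0]` is the most significant bit.
fn concretize(bits: [Bit; 4]) -> Vec<u8> {
    (0u8..16)
        .filter(|&v| {
            bits.iter().enumerate().all(|(i, b)| {
                let concrete = (v >> (3 - i)) & 1;
                match b {
                    Bit::Zero => concrete == 0,
                    Bit::One => concrete == 1,
                    // A U bit places no constraint on the concrete value.
                    Bit::Uninit => true,
                }
            })
        })
        .collect()
}
```

Under this reading, `0UU1` stands for the values 1, 3, 5 and 7, and an all-U string stands for every value. That is why a backend answering 0 for "U times 0" is still acceptable: 0 is one of the values the all-U result allows.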

Gankro (May 17 2019 at 18:24, on Zulip):

ok I updated the section defining uninit memory, and also removed the safe use section (replaced with links to the nomicon and option): https://gankro.github.io/blah/initialize-me-maybe/

Gankro (May 17 2019 at 18:25, on Zulip):

Considering there's a "skip to the good parts" link, I think this is a pretty reasonable structure

Gankro (May 17 2019 at 18:25, on Zulip):

(most of the vertical height is code examples)

RalfJ (May 17 2019 at 18:33, on Zulip):

does your template support subsections? looks like the "three kinds of uninit" should be subsections of the "unsafe" section

RalfJ (May 17 2019 at 18:34, on Zulip):

not having clicked the "skip to the good parts" link, I thought these were all top-level, so I didnt really see anything stand out as I went to the "good parts"

RalfJ (May 17 2019 at 18:34, on Zulip):

also, seems odd to have "Finally, we come to the focus of this post." in the "skip this" part?

RalfJ (May 17 2019 at 18:35, on Zulip):

hm. your wording is dancing the edge to make it sound like an uninit i32 would be okay because it has no extra invariant...

RalfJ (May 17 2019 at 18:35, on Zulip):

and it may well be but we dont want to commit to that yet^^

RalfJ (May 17 2019 at 18:37, on Zulip):

also, if you want to add another example, "out pointers" are a nice one
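
An "out pointer" example might look roughly like this. `fill_value` is a hypothetical stand-in for a C-style API that writes its result through a pointer; it is not a real library function.

```rust
use std::mem::MaybeUninit;

// Hypothetical C-style API: writes the result through `out` and reports
// success. The caller must pass a pointer that is valid for writes.
unsafe fn fill_value(out: *mut u32) -> bool {
    // SAFETY: the caller promises `out` is valid for writes.
    unsafe { out.write(42) };
    true
}

fn read_value() -> Option<u32> {
    // Reserve space for the result without pretending it holds a valid `u32`.
    let mut slot = MaybeUninit::<u32>::uninit();

    // `as_mut_ptr` hands out a raw pointer, so no reference to
    // uninitialized data is ever created.
    let ok = unsafe { fill_value(slot.as_mut_ptr()) };

    if ok {
        // SAFETY: `fill_value` reported success, so it wrote to `slot`.
        Some(unsafe { slot.assume_init() })
    } else {
        None
    }
}
```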

Gankro (May 17 2019 at 19:16, on Zulip):

They are subsections in the markup, I just make h1/h2 look the same atm

Gankro (May 21 2019 at 14:10, on Zulip):

is this good to go?

Gankro (May 21 2019 at 14:12, on Zulip):

@nikomatsakis ^

RalfJ (May 21 2019 at 14:39, on Zulip):

I left some comments here that you didn't reply to, such as

seems odd to have "Finally, we come to the focus of this post." in the "skip this" part?

Gankro (May 21 2019 at 14:40, on Zulip):

@RalfJ made a tweak to "what went wrong" that should hopefully satisfy the concern you had with that: Screen-Shot-2019-05-21-at-10.39.27-AM.png

Gankro (May 21 2019 at 14:40, on Zulip):

(looking at skip this now...)

Gankro (May 21 2019 at 14:44, on Zulip):

ok yeah made it link to the previous subsection. It's short enough and easy to skip forward from if people feel familiar with the type.

RalfJ (May 21 2019 at 15:02, on Zulip):

what about?

your wording is dancing the edge to make it sound like an uninit i32 would be okay because it has no extra invariant...

Gankro (May 21 2019 at 15:08, on Zulip):

@RalfJ is that not addressed by the screenshot i just posted?

RalfJ (May 21 2019 at 15:15, on Zulip):

oh, sorry, missed that oops

Gankro (May 21 2019 at 15:50, on Zulip):

so should I post this..?

Gankro (May 21 2019 at 18:04, on Zulip):

squirms impatiently

Gankro (May 21 2019 at 18:13, on Zulip):

I'm gonna post this at 3pm EST (in 40 mins) unless someone objects, because people are already discussing the change

RalfJ (May 21 2019 at 18:14, on Zulip):

the release is happening in 6 weeks, why the urge?^^

RalfJ (May 21 2019 at 18:15, on Zulip):

people were discussing this for months

RalfJ (May 21 2019 at 18:15, on Zulip):

but also, you addressed all the comments I can remember, so I wont object ;)

Gankro (May 21 2019 at 18:16, on Zulip):

Because people using nightly will start seeing this, and they need a clear explanation

RalfJ (May 21 2019 at 18:16, on Zulip):

For the compiler people out there, mem::uninitialized simply lowers to llvm's undef.

might be worth relating this to the "uninitialized" state of the three-state boolean you mention earlier in the post?

RalfJ (May 21 2019 at 18:17, on Zulip):

they wont see a deprecation warning unless they specifically opt in via warn(deprecated_in_future) if that's what you mean

Gankro (May 21 2019 at 18:18, on Zulip):

oh! didn't know that was a thing

RalfJ (May 21 2019 at 18:19, on Zulip):

I wasn't 100% sure if I could claim that arr[i] = x doesn't create a reference,

this is a MIR primitive, so you can be sure. but the moment Deref or Index are involved you are hosed...

RalfJ (May 21 2019 at 18:20, on Zulip):

And to be absolutely clear, it's not obvious to the Unsafe Code Guidelines team that mem::uninitialized is usable even for always-valid types like u32.

again might be worth saying that this is because of the three-state boolean (so u32 isn't really always-valid, it's just always-valid for 0s and 1s)

RalfJ (May 21 2019 at 18:21, on Zulip):

other than that, looks great :)

rkruppe (May 21 2019 at 18:21, on Zulip):

this is a MIR primitive, so you can be sure. but the moment Deref or Index are involved you are hosed...

I don't think we want to guarantee such MIR details to users right now. More conservative to pretend it goes through the (actually existing) impl Index<usize> for [T].

Gankro (May 21 2019 at 18:21, on Zulip):

@RalfJ do both slices and arrays have the "builtin" impl?

RalfJ (May 21 2019 at 18:24, on Zulip):

@rkruppe fair

RalfJ (May 21 2019 at 18:24, on Zulip):

@Gankro looks like it: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c2a206fa78dcd76fdee9d5776877b7d1

Gankro (May 21 2019 at 18:25, on Zulip):

neat

Gankro (May 21 2019 at 18:25, on Zulip):

but yeah I agree it's too subtle to declare atm

RalfJ (May 21 2019 at 18:26, on Zulip):

it's sad that we lose the bounds check that way though

RalfJ (May 21 2019 at 18:27, on Zulip):

I mean, we totally will one day rely on (*x).field not going through any Deref

RalfJ (May 21 2019 at 18:27, on Zulip):

not sure why (*x)[i] should be different, then

rkruppe (May 21 2019 at 18:28, on Zulip):

Field access can't be overloaded, indexing can, so treating (*x)[i] differently depending on the type of x means there's some inconsistency

rkruppe (May 21 2019 at 18:29, on Zulip):

But to be clear, I could totally see us guaranteeing it, just... not now

Gankro (May 21 2019 at 18:31, on Zulip):

I would be scared to rely on it just because if I see x[i] i don't know if that's like an array or slice or vec, and if it's vec, is the indexing impl on vec or the slice and it's doing autoderef andddddd aaaaaaaaa

Gankro (May 21 2019 at 18:31, on Zulip):

but my good friend ptr.add(i).write(val) will never do me wrong
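
The `ptr.add(i).write(val)` pattern for element-wise initialization can be sketched as below. The function `squares` and the array length are arbitrary choices for the example.

```rust
use std::mem::MaybeUninit;

// Hypothetical example: fill an array element by element through raw
// pointers, without relying on how `arr[i] = x` is lowered.
fn squares() -> [u64; 8] {
    let mut arr = MaybeUninit::<[u64; 8]>::uninit();

    // Pointer to the first element. No reference to the uninitialized
    // array is created.
    let base = arr.as_mut_ptr() as *mut u64;

    for i in 0..8 {
        // SAFETY: `i < 8`, so `base.add(i)` stays inside the array. Note
        // that nothing checks this bound for us.
        unsafe { base.add(i).write((i * i) as u64) };
    }

    // SAFETY: all eight elements were written in the loop above.
    unsafe { arr.assume_init() }
}
```

The price, as noted in the thread, is that the bounds check is gone, so the loop range has to be kept in sync with the array length by hand.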

Gankro (May 21 2019 at 19:12, on Zulip):

posted: https://twitter.com/Gankro/status/1130914262631821312

RalfJ (May 21 2019 at 19:21, on Zulip):

:musical_notes:

nikomatsakis (May 28 2019 at 20:01, on Zulip):

@Gankro sorry, was traveling, but it seems like there was a lot of discussion here! thanks for working on that!

Gankro (May 28 2019 at 22:29, on Zulip):

if i have time I'm going to write a followup comparing the C++/Rust models here a bit more, because a lot of people got caught up on "well C++ can do this", because I didn't really drill into the buckwild things mem::uninitialized vaguely implies you can do.

Last update: Nov 19 2019 at 18:10UTC