Stream: t-lang/wg-unsafe-code-guidelines

Topic: Letting Read::read initialize memory


RalfJ (Feb 10 2019 at 18:17, on Zulip):

At the all-hands, @Amanieu d'Antras and @David Tolnay suggested the following approach to passing uninitialized memory to Read::read: We could add a function like fn freeze<T>(x: *mut T) to libcore, and the semantics of that function is to "freeze" the memory pointed to by x (and with the size given by T). The "freeze" operation is the same as in this paper, and its semantics basically is to turn the uninitialized parts of that memory into some non-deterministic fixed bit pattern. Since any fixed bit pattern is valid in a &mut [u8], an uninitialized array can be passed to Read::read after freezing.
The main problem here is actually implementing this function in LLVM, but their realization was that black_box, i.e. an empty volatile memory assembly block, should have this effect.

What do you think of this plan? I am particularly curious about @rkruppe's thoughts on the LLVM side of this. Also, @Amanieu d'Antras , @David Tolnay how did you plan to introduce this -- seems like it would need an RFC?

nagisa (Feb 10 2019 at 18:19, on Zulip):

As discussed on #rust-libs on IRC... 1) asm! does not exist on all platforms

nagisa (Feb 10 2019 at 18:19, on Zulip):

it would be possible to "freeze" by passing an "uninit" buffer into an extern function LLVM has no insight into

nagisa (Feb 10 2019 at 18:19, on Zulip):

LLVM side of things: https://reviews.llvm.org/D29011

nagisa (Feb 10 2019 at 18:20, on Zulip):

i.e. an instruction that does this has been long-time desired, but hasn’t landed yet despite the existing implementation.

nagisa (Feb 10 2019 at 18:21, on Zulip):

reasons are unclear to me.

RalfJ (Feb 10 2019 at 18:25, on Zulip):

the instruction is for values though, not for memory -- it would likely need to be run in a loop or so

rkruppe (Feb 10 2019 at 18:25, on Zulip):

What platforms does a nop asm!() not work on? wasm and BPF, I assume, anything else? On those platforms, throwing in an extern function is problematic: e.g., in wasm that will mean everyone using rustc-generated wasm will have to actually supply such a no-op function under the name we choose (unless we do very hacky hacks in the linking stage).

nagisa (Feb 10 2019 at 18:26, on Zulip):

@rkruppe wasm is the most problematic outstanding example of this, yes.

nagisa (Feb 10 2019 at 18:26, on Zulip):

I recall there being T3 platforms that do not have asmparser component in LLVM, but I don’t recall which platforms are these

RalfJ (Feb 10 2019 at 18:26, on Zulip):

the last I know, one issue the LLVM devs have with adapting freeze/poison is about whether poison works on a per-byte or per-value level. but it seems to me that freeze can be defined and is useful independent of such questions.

rkruppe (Feb 10 2019 at 18:27, on Zulip):

Yes, there are difficult open questions about how to proceed with undef/poison, but both an volatile memory inline asm or an unknown external function must be conservatively assumed to initialize the memory in some fashion, so it side steps all questions about how uninitialized memory would behave.

rkruppe (Feb 10 2019 at 18:28, on Zulip):

Of course, they both have the side effect of also blocking other optimizations, but for Read::read specifically that doesn't feel like it should be an issue.

nagisa (Feb 10 2019 at 18:28, on Zulip):

one option would be to call something like memset with 0 size with nobuiltin flag set (so that LLVM does not handle it as such)

rkruppe (Feb 10 2019 at 18:29, on Zulip):

I am very uncertain about whether nobuiltin will actually make it do what we want, and even if it does, we'll lose very desirable optimizations in the surrounding code.

RalfJ (Feb 10 2019 at 18:30, on Zulip):

Not more than we would with black_box, though?

nagisa (Feb 10 2019 at 18:30, on Zulip):

@rkruppe given that asm! is an “acceptable” solution and that it inhibits just about every optimisation out there...

RalfJ (Feb 10 2019 at 18:30, on Zulip):

they seemed to think that that is fine in this situation. we can always provide a better implementation of this later without changing the contract of this function.

nagisa (Feb 10 2019 at 18:30, on Zulip):

@RalfJ it is not appropriate to talk about this in terms of black_box, because that is a hint function and may be a no-op.

RalfJ (Feb 10 2019 at 18:30, on Zulip):

@nagisa fair, I meant "what black_box will get implemented as"

rkruppe (Feb 10 2019 at 18:30, on Zulip):

Sorry, to be clearer, I don't mean the immediately surrounding code, I mean other uses of memset in the rest of the function/module

nagisa (Feb 10 2019 at 18:31, on Zulip):

@rkruppe nobuiltin is a callsite attribute, not a function attribute.

rkruppe (Feb 10 2019 at 18:31, on Zulip):

Ohhh, per call site even? nice

nagisa (Feb 10 2019 at 18:31, on Zulip):

I think?

RalfJ (Feb 10 2019 at 18:32, on Zulip):

Us having such a language construct might also give those that want to add freeze to LLVM some support :D

rkruppe (Feb 10 2019 at 18:32, on Zulip):

But I am still concerned about to what extent nobuiltin will stop LLVM optimizations. e.g. it certainly stops loop idiom recognition, but I can imagine missing checks for it elsewhere

rkruppe (Feb 10 2019 at 18:32, on Zulip):

Could we instead just patch the affected backends to accept (only) inline asm with an empty asm string?

RalfJ (Feb 10 2019 at 18:33, on Zulip):

but then we break with system LLVM...

rkruppe (Feb 10 2019 at 18:34, on Zulip):

At least it'll be a compile time error rather than a miscompile :)
And it only affects a few (currently) rather nice targets.

nagisa (Feb 10 2019 at 18:36, on Zulip):

there is ASMParser in wasm target subdirectory tree.

nagisa (Feb 10 2019 at 18:47, on Zulip):

I’m actually very surprised about freeze only taking integer values as an argument

nagisa (Feb 10 2019 at 18:48, on Zulip):

not even aggregates.

rkruppe (Feb 10 2019 at 18:58, on Zulip):

FCAs should just go away imo

RalfJ (Feb 10 2019 at 19:56, on Zulip):

FCAs?

rkruppe (Feb 10 2019 at 20:56, on Zulip):

first-class aggregates, i.e., structs and arrays used as SSA values

Amanieu d'Antras (Feb 11 2019 at 17:19, on Zulip):

Alternatively you could write the address of the buffer to a global atomic and then issue a compiler fence. LLVM must assume that a signal handler could have read the global atomic and used it to fill the buffer.

gnzlbg (Feb 12 2019 at 18:31, on Zulip):

What do you think of this plan?

While freeze is ok, I'm still unconvinced about allowing uninitialized integers. The only argument seems to be "to avoid breaking code using mem::uninitialized", but AFAICT we want to deprecate mem::uninitialized anyways.

rkruppe (Feb 12 2019 at 20:33, on Zulip):

Deprecating it will not make any existing uses go away.

RalfJ (Feb 12 2019 at 20:33, on Zulip):

@rkruppe the plan is to make it freeze memory

rkruppe (Feb 12 2019 at 20:34, on Zulip):

I saw mention of that, but I was replying specifically to what @gnzlbg wrote here

RalfJ (Feb 12 2019 at 20:35, on Zulip):

that will make the issue with integers go away for sure, and actually in practice I think this will even make the issues with all the other types go away (if freeze is implemented as an assembly block) as LLVM has to assume they might have been initialized correctly.

RalfJ (Feb 12 2019 at 20:35, on Zulip):

yeah I think @gnzlbg said that in the context of the proposal to freeze

rkruppe (Feb 12 2019 at 20:38, on Zulip):

OK. Speaking of which: I'd like to actually get some experience with shipping ptr::freeze based code before comitting. Replacing mem::uninitialized with MaybeUninit currently has regressions and I would hate to bring similar regressions to existing unmodified code if we can avoid it

RalfJ (Feb 12 2019 at 21:06, on Zulip):

Replacing mem::uninitialized with MaybeUninit currently has regressions

It does? I thought we fixed them?

rkruppe (Feb 12 2019 at 21:07, on Zulip):

I was thinking of https://github.com/rust-lang/rust/issues/58201 but I misremembered, that happens with mem::uninitialized too

Jake Goulding (Feb 13 2019 at 17:16, on Zulip):

Where might I read more about freeze? I see this PR: https://github.com/rust-lang/rust/pull/58363

RalfJ (Feb 13 2019 at 17:23, on Zulip):

what kind of documentation are you looking for? there's https://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf, but I suppose you were not asking for a paper?

Jake Goulding (Feb 13 2019 at 17:39, on Zulip):

Well, I was rather hoping for something more Rust-specific

Jake Goulding (Feb 13 2019 at 17:40, on Zulip):

insert my typical moaning about Rust having more features than documentation

RalfJ (Feb 13 2019 at 18:22, on Zulip):

I'm afraid I don't know of anything

RalfJ (Feb 13 2019 at 18:23, on Zulip):

ideally the doccomment in that PR would help?

RalfJ (Feb 13 2019 at 18:23, on Zulip):

is there a concrete question you have? that might help expanding the docs

simulacrum (Feb 13 2019 at 19:27, on Zulip):

@Jake Goulding My understanding on a very broad level is that freeze makes uninitalized memory readable; normally, it's UB to read it, whereas freeze makes those bytes 'frozen' in that they're no longer going to be side

Jake Goulding (Feb 13 2019 at 19:43, on Zulip):

no longer going to be side

@simulacrum typo?

Jake Goulding (Feb 13 2019 at 19:47, on Zulip):

@RalfJ no real concrete questions. The way my brain works is to have a big catalog of "these are the available tools" and then mapping them to usecases (this is why I like SO — it exercises those connections). This often involves having a broad but shallow understanding of many things. I see an interesting thing like "freeze" (which unfortunately makes me think of Ruby's freeze) and I want to know how to file it away in the brain.

RalfJ (Feb 13 2019 at 19:49, on Zulip):

@simulacrum correction: it is UB to perform any operation on it (comparison, arithmetic, whatever), but a plain read (at least into a MaybeUninit) is not UB

RalfJ (Feb 13 2019 at 19:50, on Zulip):

it's not like Ruby's freeze at all, that much I can say^^

simulacrum (Feb 13 2019 at 22:06, on Zulip):

no idea what I meant by "side" there

Ah, interesting -- I was not aware that plain read would be permitted...

Jake Goulding (Feb 14 2019 at 00:32, on Zulip):

@RalfJ how can a comparison be UB when a read isn't? What is the important difference?

RalfJ (Feb 14 2019 at 08:08, on Zulip):

@Jake Goulding anything that actually inspects the bits is UB, basically

RalfJ (Feb 14 2019 at 08:09, on Zulip):

reading does not inspect the bits

RalfJ (Feb 14 2019 at 08:10, on Zulip):

(LLVM is slightly more permissive: there, comparing uninit with something isn't UB, but it returns uninit. So anything that uninit even indirectly flows into gets "poisened" by uninit. If you then ever do a conditional branch on uninit, that's UB. This is a good choice for an optimizing IR, but for a surface language like Rust I feel it is simpler to make UB happen earlier.)

Jake Goulding (Feb 14 2019 at 15:00, on Zulip):

reading does not inspect the bits

I do not follow this sentence. Are we using non-English definitions for "read"?

Jake Goulding (Feb 14 2019 at 15:00, on Zulip):

How could reading data do anything but look at the bits... that it... reads?

rkruppe (Feb 14 2019 at 15:01, on Zulip):

"read" in the sense of "read memory" or "load from memory", so it copies them but does not care what the bits are

RalfJ (Feb 14 2019 at 15:03, on Zulip):

@Jake Goulding hm yeah words are hard sometimes around this. basically, imagine every byte in memory is of the following type

enum Byte { Uninit, Init(u8) }
RalfJ (Feb 14 2019 at 15:03, on Zulip):

then a "read" operation will just copy data at type Byte, no problem

RalfJ (Feb 14 2019 at 15:04, on Zulip):

but any operation that needs to actually do a match is UB if the byte is currently Uninit

RalfJ (Feb 14 2019 at 15:04, on Zulip):

does that make sense?

Jake Goulding (Feb 14 2019 at 15:05, on Zulip):

Are these terms set in stone already? If we could magically come up with something else, could those be used?

Jake Goulding (Feb 14 2019 at 15:10, on Zulip):

Is it accurate to say "if the CPU uses the data to make a decision, it's UB"?

Jake Goulding (Feb 14 2019 at 15:10, on Zulip):

I guess that's pretty close to "if you then ever do a conditional branch on ..."

rkruppe (Feb 14 2019 at 15:11, on Zulip):

The CPU doesn't really enter the picture, this is all formal language semantics. The CPU doesn't have a concept of uninitalized memory to begin with.

rkruppe (Feb 14 2019 at 15:12, on Zulip):

And in the formal semantics, it's most likely that even arithmetic and bitwise operations and the like are UB on uninit data.

RalfJ (Feb 14 2019 at 15:15, on Zulip):

Are these terms set in stone already? If we could magically come up with something else, could those be used?

do you mean the words we are using or the actual rules?

RalfJ (Feb 14 2019 at 15:16, on Zulip):

LLVM has a set of rules that we're unlikely to change (they call it poison, I call it Uninit, it's the same thing)

RalfJ (Feb 14 2019 at 15:16, on Zulip):

and the only wiggle-room (EDIT: for Rust) I know is whether Uninit + x is UB, or returns Uninit.

RalfJ (Feb 14 2019 at 15:16, on Zulip):

And in the formal semantics, it's most likely that even arithmetic and bitwise operations and the like are UB on uninit data.

yeah, that's certainly how LLVM handles it

RalfJ (Feb 14 2019 at 15:17, on Zulip):

that is required to enable optimizations that turn logical into arithmetic operations, I think

rkruppe (Feb 14 2019 at 15:18, on Zulip):

I'm really sure poison + 1 is poison, not UB. select may be different but that's basically a kind of branch.

RalfJ (Feb 14 2019 at 15:18, on Zulip):

yeah LLVM delays UB as much as possible, sorry if what I said came across differently

RalfJ (Feb 14 2019 at 15:18, on Zulip):

they need that to be able to reorder stuff

RalfJ (Feb 14 2019 at 15:19, on Zulip):

and hoist it out of loops, in particular

RalfJ (Feb 14 2019 at 15:19, on Zulip):

but for a surface language like Rust, it might be easier to just say UB earlier

rkruppe (Feb 14 2019 at 15:20, on Zulip):

Yes, I agree with that. What you said earlier seems to directly contradict that.

RalfJ (Feb 14 2019 at 15:24, on Zulip):

then it was probably a statement about Rust that I made in a way that it sounds like I am talking about LLVM?

rkruppe (Feb 14 2019 at 15:30, on Zulip):

And in the formal semantics, it's most likely that even arithmetic and bitwise operations and the like are UB on uninit data.

yeah, that's certainly how LLVM handles it

^ this is what threw me off, and there you mention LLVM explicitly ^^;

Jake Goulding (Feb 14 2019 at 15:31, on Zulip):

@rkruppe s/CPU/the program/ then?

rkruppe (Feb 14 2019 at 15:32, on Zulip):

Yeah that solves that objection. But also note the subtleties under "make a decision"; it may not be just branching and the like but even just operating on it with primitive operations such as +.

Jake Goulding (Feb 14 2019 at 15:41, on Zulip):

"used for computation"?

rkruppe (Feb 14 2019 at 15:41, on Zulip):

i guess?

RalfJ (Feb 14 2019 at 15:51, on Zulip):

^ this is what threw me off, and there you mention LLVM explicitly ^^;

hm, no idea what I was thinking. I probably misread. sorry.

RalfJ (Feb 14 2019 at 15:53, on Zulip):

"used for computation"?

I usually say "anything but a simple copy". shrug

Last update: Nov 19 2019 at 17:35UTC