Stream: t-compiler/major changes

Topic: Add placement by return (GCE, unsized retu compiler-team#330


triagebot (Jul 05 2020 at 19:24, on Zulip):

A new proposal has been announced: #330. It will be
announced at the next meeting to try and draw attention to it,
but usually MCPs are not discussed during triage meetings. If
you think this would benefit from discussion amongst the
team, consider proposing a design meeting.

Jonas Schievink (Jul 05 2020 at 19:34, on Zulip):

That looks like a language MCP in the wrong repo

Hanna Kruppe (Jul 05 2020 at 19:47, on Zulip):

It actually seems like two MCPs, one of which is a lang feature (support for unsized return values) and the other could be appropriate for T-compiler in principle but quite under-specified : placement of sized types is claimed to be one of the consequences, but I don't see anything addressing the core implementation issue of how we can "guarantee" to avoid large temporaries in MIR/LLVM IR/machine code.

Gary Guo (Jul 05 2020 at 20:26, on Zulip):

I believe we'd better implement copy elision for sized places before we even experiment unsized return types. Copy elision is minor while the proposed solution for unsized return types seems a major change.

Hanna Kruppe (Jul 05 2020 at 20:40, on Zulip):

I would not call copy elision "minor"; properly implementing it (and understanding the scope and limitations) seems like a full compiler MCP all on its own. Which is why I'm honestly a bit disappointed by this MCP, it glosses over that (important) topic.

Gary Guo (Jul 05 2020 at 20:54, on Zulip):

It is certainly not minor for the compiler implementation but it is waaaay less major than the unsized return part!

Christopher Durham (Jul 05 2020 at 21:28, on Zulip):

The linked RFC PR contains a lot more information on the proposed mechanism, which I suspect is (part of) why the MCP is so light on detail.

Hanna Kruppe (Jul 05 2020 at 22:31, on Zulip):

I've only skimmed it now (I read it more thoroughly in the past) but I believe: no, the kind of details I would expect from a compiler MCP working towards implementing placement are entirely absent. There is a great deal of detail about the proposed language feature (which ought to be a lang team MCP as noted earlier), but the challenges of generating code that adheres to the "guaranteed copy elision" wish list are not even mentioned, let alone addressed.

Christopher Durham (Jul 06 2020 at 03:15, on Zulip):

(no real argument here. Then again, I'm not exactly certain how much detail is expected from the MCP and how much is left to be discussed here; I've mostly stuck to the libs part of things.)

Olivier FAURE (Jul 06 2020 at 07:37, on Zulip):

Regarding the proposal being in the wrong repo, I followed the process outlined in the MCP document, which was explicitly advertised as the new entry point of the RFC "pipeline":

If you would like to make a major change to the compiler, the process is as follows:

If this process isn't meant to be use of language change proposal, that needs to be explicitly documented in the MCP process document. (and communicated internally, because as it is a lang team member explicitly directed me towards the MCP process)

kennytm (Jul 06 2020 at 07:41, on Zulip):

the lang team has its own MCP process https://github.com/rust-lang/rfcs/pull/2936

kennytm (Jul 06 2020 at 07:42, on Zulip):

so we are possibly a victim of https://github.com/rust-lang/rfcs/pull/2936#issuecomment-636440748 :upside_down:

Olivier FAURE (Jul 06 2020 at 07:43, on Zulip):

Right. I'm realizing now I was looking at merged RFCs, whereas the lang-team MCP process is still in a PR RFC. My bad.

(though the compiler-team MCP should probably include a "by the way, the lang-team MCP is over there, this is different" disclaimer)

Olivier FAURE (Jul 06 2020 at 07:46, on Zulip):

Ugh. I guess the simplest solution would be to close this issue, at least temporarily, and open another issue in the lang-team repo. Like people said, there's stuff in here that can be done purely implementation-side, but it's probably better to hash it out on the lang side first.

Olivier FAURE (Jul 06 2020 at 07:49, on Zulip):

Hanna Kruppe [said](https://rust-lang.zulipchat.com/#narrow/stream/233931-t-compiler.2Fmajor-changes/topic/Add.20placement.20by.20return.20(GCE.2C.20unsized.20retu.20compiler-team.23330/near/202935340):

placement of sized types is claimed to be one of the consequences, but I don't see anything addressing the core implementation issue of how we can "guarantee" to avoid large temporaries in MIR/LLVM IR/machine code.

Understood. Full disclosure: I'm completely unaware of these issues. Where do you suggest I start to familiarize myself with them? Is there a body of documentation, or a place in the code that's particularly relevant?

Hanna Kruppe (Jul 06 2020 at 10:07, on Zulip):

I don't have a good reference, these things are unfortunately spread over many discussions in many different places. Briefly, off the top of my head: right now, the language semantics of Rust and the way those semantics are represented in MIR (and beyond) evaluate essentially all expressions into a temporary first, before moving the result into the eventual destination. The naive way to get placement without temporaries would be to change that and say something like: every expression evaluation targets a particular destination place, and we set up and maintain those destinations such that x = ...; or return ... never need a temporary. But that doesn't work in general, because the expression evaluation can require reading the same place that ought to be the destination (e.g., consider pair = (pair.1, pair.0);). These situations are particularly tricky to identify when pointers are involved (e.g. consider let p = &pair; pair = (p.1, p.0);). This suggests the most practical way to eliminate temporaries is an optimization pass on MIR, where the prerequisite analyses are much more tractable than at the source level. But such an optimization will still need to ensure the destination place is not read during the RHS evaluation, which requires powerful alias analysis and, in particular, a settled aliasing semantics for unsafe code and raw pointers (like Stacked Borrows). But once we have that, it's still an open question how reliable and efficient such an optimization can be made, and whether it will correspond to any sensible source-level rules that we could tell users to follow if they want "guaranteed" move elision.

Hanna Kruppe (Jul 06 2020 at 10:12, on Zulip):

Note: I used assignments to local variables as examples for simplicity, but essentially the same problems apply to evaluating expressions into a function's return place (which is essentially a local, even if return ... looks different at the source level) and to evaluating into a place given by a raw pointer (which is what's needed for Box/Vec/etc. placement). For example, consider x = foo(&x as *const _) where foo reads from its pointer argument while constructing the return value, or likewise *ptr = foo(ptr);.

Gary Guo (Jul 06 2020 at 16:31, on Zulip):

Hanna Kruppe [said](https://rust-lang.zulipchat.com/#narrow/stream/233931-t-compiler.2Fmajor-changes/topic/Add.20placement.20by.20return.20(GCE.2C.20unsized.20retu.20compiler-team.23330/near/202971867):

But that doesn't work in general, because the expression evaluation can require reading the same place that ought to be the destination (e.g., consider pair = (pair.1, pair.0);).

I actually thought about this before. We don't need to worry about this if the destination place is uninitialised, which is true for let x = ... or return ..., so the overall idea will work.

rpjohnst (Jul 06 2020 at 16:31, on Zulip):

Probably relevant that C++'s guaranteed copy/move elision does not handle those cases either. What it does handle is a) returning a prvalue and b) initializing an object from a prvalue (excluding "potentially overlapping subobjects"). This excludes x = <anything that might read from x> for being an assignment, not an initialization.

rpjohnst (Jul 06 2020 at 16:33, on Zulip):

And notably, even if you have SomeLargeType foo(...) which properly does return some_rvalue, you can still get copies/moves if the call site for foo doesn't also follow the rules to enable the elision. So auto x = foo(...) will construct SomeLargeType directly into x, but existing_var = foo(...) will not, regardless of whether foo might read from existing_var.

Gary Guo (Jul 06 2020 at 16:41, on Zulip):

One caveat might be if a variable can be ininitialised or not depending on control path though. E.g.

let x= ...;
if ... {
  x = ...;
}
drop(x);
x = large struct; // Do we need to do analysis so we know x is uninitialised now, or we just say this isn't guaranteed?
Gary Guo (Jul 06 2020 at 16:41, on Zulip):

IMO we probably only need to guarantee about variables that are never initialised, so we don't guarantee the elision in that case

Jonas Schievink (Jul 06 2020 at 16:43, on Zulip):

Just to make sure: You're aware of the current work regarding NRVO and destination propagation, right?

Gary Guo (Jul 06 2020 at 16:48, on Zulip):

I am aware of those optimisations but haven't go through PRs in great detail. I might need to take a deeper look. Will destination propagation be "always on" even in debug builds though?

Jonas Schievink (Jul 06 2020 at 16:50, on Zulip):

Depends on whether people manage to optimize it further once it lands. In the current state it won't run at all unless you specifically request it with -Zmir-opt-level=2.

Gary Guo (Jul 06 2020 at 16:51, on Zulip):

Constructing directly into place would require none of the optimisations to be turned on for debug builds, so it still might be something to consider

Gary Guo (Jul 06 2020 at 16:53, on Zulip):

Destination propagation covers more than the guaranteed elision mentioned in this RFC, so it might not be ideal to force it to be turned on in opt-level=0

Hanna Kruppe (Jul 06 2020 at 17:38, on Zulip):

Gary Guo [said](https://rust-lang.zulipchat.com/#narrow/stream/233931-t-compiler.2Fmajor-changes/topic/Add.20placement.20by.20return.20(GCE.2C.20unsized.20retu.20compiler-team.23330/near/203011554):

I actually thought about this before. We don't need to worry about this if the destination place is uninitialised, which is true for let x = ... or return ..., so the overall idea will work.

Yes to let x = ...; (trivially). For returns, it's more subtle, because there's two places where temporaries might pop up: in the evaluation of the ... within the function, and in the evaluation of the function call in the caller. Both are equally problematic, and if we want to elide all the temporaries for something likemut_var = foo(...); in any circumstances, then "the destination place is uninitialize" isn't true in general. (But otherwise, we can leave the responsibility of ensuring it doesn't alias anything the function might read to the caller.)

Hanna Kruppe (Jul 06 2020 at 17:45, on Zulip):

However, even with let x = ...; there is a wrinkle for which we need tricky Unsafe-Code-Guidelines questions to be settled first: if you have something like loop { let mut x = foo(); escape(&raw mut x); }, is it really true that the storage space of x can't be accessed by foo()? For example, escape() might store the pointer somewhere and foo() retrieves that pointer to use x as scratch space. Intuitively it seems like we ought to rule this out, but what should the precise rules be to make such programs UB?

Hanna Kruppe (Jul 06 2020 at 17:48, on Zulip):

(See https://github.com/rust-lang/rust/issues/42371)

Josh Triplett (Jul 06 2020 at 18:04, on Zulip):

@Hanna Kruppe If foo is accessing a pointer to an object that's no longer live and past its lifetime (as it would have to for one iteration of that loop to alias another), that's UB by definition, right?

Hanna Kruppe (Jul 06 2020 at 18:16, on Zulip):

We don't really have a precise definition one way or another of what those terms mean, so I can't agree it's UB "by definition" :) In particular, the Alive/LLVM semantics described in https://github.com/rust-lang/rust/issues/42371#issuecomment-647103740 would make it potentially well-defined, depending on the precise placement of the StorageLive MIR statement relative to the call to foo(). While the currently implemented semantics in miri are stricter and (AFAICT) rule out the cross-iteration reuse, it makes some other optimizations (hoisting address calculations out of the loop body) illegal, so it's not completely obvious which variant we should ultimately adopt.

Gary Guo (Jul 06 2020 at 18:42, on Zulip):

Without considering optimisations, just from reading the program semantically, it ought to be a UB because the x in the next iteration is not the same x. Some optimisations might make it well defined when lowering to LLVM but it shouldn't be relied on and certainly should be a UB, regardless we exploit it or not.

Hanna Kruppe (Jul 06 2020 at 18:52, on Zulip):

I'm not saying "it's UB but some optimization might remove the UB down the line". My point is that we do not even have an agreed upon definition of some relevant aspects of Rust semantics, and there are plausible definitions that might make it well-defined. Specifically, it matters whether each iteration creates a new allocation or whether the same allocation is repeatedly de-initialized and re-initialized, and if it's the latter, it matters when the reinitialization happens within the loop body. There are several plausible answers and Rust has not yet committed to any one of them.

comex (Jul 06 2020 at 19:25, on Zulip):

it makes some other optimizations (hoisting address calculations out of the loop body) illegal
Not necessarily. The spec can say "each iteration is a new allocation" while the implementation refines it to "the iterations use the same allocation but accessing it while de-initialized is UB". I think the latter is effectively the semantics you get if you use llvm.lifetime.start/end.

comex (Jul 06 2020 at 19:31, on Zulip):

(Or the spec can say the latter, but both choices allow placement by return.)

Hanna Kruppe (Jul 06 2020 at 20:11, on Zulip):

Yes, after going to the more permissive semantics you can do the optimization that isn't compatible with the stricter semantics, as is often the case. But you can't do it before (including manually in surface Rust). Does that matter, and if so, does it matter more than making the whole placement affair simpler? I don't know, it's not obvious to me one way or another.

comex (Jul 06 2020 at 20:24, on Zulip):

Well, are there any reasons to go beyond "the iterations use the same allocation but accessing it while de-initialized is UB" to actually allowing it to be accessed?

comex (Jul 06 2020 at 20:26, on Zulip):

It's not just a question of placement; it also affects stack coloring (allowing different variables with disjoint lifetimes to reuse the same stack space) as well as non-guaranteed copy optimizations.

Hanna Kruppe (Jul 06 2020 at 20:35, on Zulip):

"the iterations use the same allocation but accessing it while de-initialized is UB" is the most permissive option of the two that were brought up here, unless I complete misunderstand what "accessing it while de-initialized" is supposed to mean (I believe it's the Alive2 semantics). While "each iteration gets a separate allocation" rules out any use of a pointer obtained in a past iteration, the more permissive semantics allows such accesses if they happen after the current iteration made that allocation live again. And if we want let x = ...; to evaluate the RHS directly into x, then x needs to be made live before (at least if StorageLive(x) makes x uninit, which I believe is the current consensus).

Hanna Kruppe (Jul 06 2020 at 20:40, on Zulip):

More precisely in terms of MIR: let mut p = &raw mut <something>; loop { let mut x = ptr::replace(p, 1); p = &raw mut x; } would have something like this in the loop body:

StorageLive(x)
_1 = call ptr::replace(_2, const 1)
_2 = &raw mut _1;
StorageDead(x)

Note how the access to *p is contained in the StorageLive-StorageDead pair, so you can't say it's UB on those grounds, you need to involve pointer provenance. Or perhaps stacked borrows suffices? I suppose the write to x would remove the permission for the previously created raw pointer to access the place. Hmmm.

Gary Guo (Jul 06 2020 at 20:41, on Zulip):

Just a note, "the iterations use the same allocation but accessing it while de-initialized is UB" would restrict optimisations that can be done after loop unrolling.

Hanna Kruppe (Jul 06 2020 at 20:41, on Zulip):

How so?

Gary Guo (Jul 06 2020 at 20:43, on Zulip):

Unrolled loops need to re-use the same allocation for x basically, which would limit some parallelization opportunities if we can do separate allocations

Gary Guo (Jul 06 2020 at 20:49, on Zulip):

Also, this code is disallowed by borrow checker, so it would be confusing it the semantic is allowed via raw pointer:

fn f() {
    let mut x = &1;
    loop {
        let y = 2;
        println!("{}", x);
        x = &y;
    }
}
Hanna Kruppe (Jul 06 2020 at 20:51, on Zulip):

Gary Guo [said](https://rust-lang.zulipchat.com/#narrow/stream/233931-t-compiler.2Fmajor-changes/topic/Add.20placement.20by.20return.20(GCE.2C.20unsized.20retu.20compiler-team.23330/near/203038493):

Unrolled loops need to re-use the same allocation for x basically, which would limit some parallelization opportunities if we can do separate allocations

I don't really follow. Or rather, I don't see how this is a limit you'll ever run into in practice. If you understand the accesses to the storage well enough to justify doing multiple in parallel or interleaved, then you should have enough knowledge to justify introducing (temporary) new storage where needed.

Hanna Kruppe (Jul 06 2020 at 20:52, on Zulip):

Gary Guo [said](https://rust-lang.zulipchat.com/#narrow/stream/233931-t-compiler.2Fmajor-changes/topic/Add.20placement.20by.20return.20(GCE.2C.20unsized.20retu.20compiler-team.23330/near/203039087):

Also, this code is disallowed by borrow checker, so it would be confusing it the semantic is allowed via raw pointer:

But raw pointers do allow much more than the borrow checker allows, in myriad different ways. That's the entire point of unsafe code!

Gary Guo (Jul 06 2020 at 20:56, on Zulip):

Whether raw pointers are used shouldn't affect the lifetime of a value though! Whether another value happens to be allocated at the exact same location shouldn't matter. Imagine that we have an allocator that always reuse the last freed allocation if sizes are equal, it still does not justify use-after-free.

Hanna Kruppe (Jul 06 2020 at 21:09, on Zulip):

The crux of the matter is whether this situation qualifies as "use after free", or rather, whether it ought to be UB (because the Rust project has not yet decided this and will have to make a decision). More precisely, if we want it to be UB, we will need a precise formal rule that explains why those offending programs -- and all others "like it" -- are UB. This rule set should also have other desirable traits (doesn't rule out things we'd like to allow, isn't an undue burden for programmers to adhere to, is efficiently checkable in miri, etc.), and there are myriad trade-offs involved. You can't just claim "this is like use after free, hence bad, hence we must make it UB somehow". For example, it is quite useful to be able to write to elements of a vector past the capacity, so we don't try to make that UB, even though it's very much like a "use after free" if you use the Vec as a kind of object pool. In the same vein, but far less clear-cut, there are some advantages to the "same allocation, just repeatedly overwritten with uninit" perspective on Storage{Live,Dead} in loops. We should weigh those advantages against the advantages of other choices for semantics.

comex (Jul 06 2020 at 21:46, on Zulip):

@Hanna Kruppe Oh, I thought you were proposing that the allocation be able to be used throughout the entire function. I think I understand the issue now.

comex (Jul 06 2020 at 21:46, on Zulip):

My mistake.

comex (Jul 06 2020 at 21:51, on Zulip):

This is a weird situation. I wish we could do return value optimization for reassigning an existing variable, but we can't in general because x = foo(&mut x); is allowed.

comex (Jul 06 2020 at 21:56, on Zulip):

So we can't have a general rule that foo = expr borrows foo throughout the entire statement. I was going to say, the question is whether let foo = expr borrows foo throughout the entire statement, from a Stacked Borrows perspective… but even that is weaker than ideal.

comex (Jul 06 2020 at 21:57, on Zulip):

Actually, no, that's strong enough, as long as it's borrowed starting at the point that it's StorageLive. And accessing it before then is UB.

comex (Jul 06 2020 at 22:03, on Zulip):

@RalfJ any advice?

Gary Guo (Jul 07 2020 at 01:05, on Zulip):

@Hanna Kruppe Something I really don't understand about the discussion, why would variables declared within a loop be treated differently? How about

loop {{ let mut x == ...; p = &raw mut x; }}

We've introduced a new scope, does your argument still apply?

Olivier FAURE (Jul 07 2020 at 08:54, on Zulip):

Hanna Kruppe

But that doesn't work in general, because the expression evaluation can require reading the same place that ought to be the destination (e.g., consider pair = (pair.1, pair.0);). These situations are particularly tricky to identify when pointers are involved (e.g. consider pair = (p.1, p.0); -- same problem as in the previous example iff p points to pair).

This is only tangentially relevant to the broader GCE/NRVO discussion, but I'll note that the "minimum GCE" described in the RFC doesn't need to handle that case; since it only requires GCE to apply for return chains, with unsafe code (eg the implementation of Box::new_with at the end) that has to respect invariants about scratch space not aliasing with args. Technically the proposal doesn't even require GCE to elide the final copy in the x = ... case; in that case the return chain could be collapsed to a single emplacement into a temporary, followed by a move, with no risk of aliasing.

Last update: May 07 2021 at 06:00UTC