Stream: t-compiler/major changes

Topic: Virtual locals scheme compiler-team#291


triagebot (May 08 2020 at 17:52, on Zulip):

A new proposal has been announced #291. It will be brought up at the next meeting.

Jonas Schievink (May 08 2020 at 18:05, on Zulip):

The "this HackMD" link links to the issue

Santiago Pastorino (May 08 2020 at 18:21, on Zulip):

fixed

triagebot (May 12 2020 at 07:01, on Zulip):

@T-compiler: Proposal #291 has been seconded, and will be approved in 10 days if no objections are raised.

nikomatsakis (May 12 2020 at 09:35, on Zulip):

I'm curious to get @RalfJ's opinion on this scheme. I can't remember how much they've talked about it.

nikomatsakis (May 12 2020 at 09:35, on Zulip):

It's sort of the first 'major' change to MIR since it's initial design

RalfJ (May 12 2020 at 19:58, on Zulip):

so where do I find the details of the proposal? the issue linked here seems to only scratch the surface?

RalfJ (May 12 2020 at 19:59, on Zulip):

in particular, what exactly wouldRvalue (Value?) and Place look like after this change?

RalfJ (May 12 2020 at 19:59, on Zulip):

oh its that hackmd... the text sounds like that would describe the grand plan and not just this one change

RalfJ (May 12 2020 at 20:02, on Zulip):

hm okay that hackmd unfortunately also doesnt show what exactly the enums would look like... but it contains more details so that's nice :D

RalfJ (May 12 2020 at 20:03, on Zulip):

this sounds almost like some kind "first-class places" to me, which seems nice :D though it only goes part of the way, I am still trying to figure out what exactly the limitations are

RalfJ (May 12 2020 at 20:05, on Zulip):

but I am not sure I agree with this:

miri: trivial, we just treat the special locals as real locals of pointer to actual type instead of the actual type. Each StatementKind::Project just behaves like StatementKind::Assign with an Rvalue::Ref.

so first of all, I presume this should be Rvalue::AddressOf, because we certainly dont want reborrowing. (so, no Retag emitted for these.)
but secondly... a place is not isomorphic to a pointer. maybe it should be, I am not sure, but in Miri places also track the expected alignment, unlike pointers where the expected alignment is given by the type. and indeed I think the proposed scheme would break field access to packed structs.

RalfJ (May 12 2020 at 20:06, on Zulip):

if this is truly "first-class places", then miri should properly support "place variables" by storing a Place (PlaceTy? but seems odd to keep the layout) inside them.

RalfJ (May 12 2020 at 20:10, on Zulip):

which brings me to my main question:

Thus, we propose to add virtual locals that don’t actually have any own memory, but which are temporary variables for storing places.

so can I imagine this like, there are now two kinds of locals, I'll call them "value locals" (that we already have, they store values) and the new "place locals" which store places? Can the same variable be used in both StatementKind::Project and StatementKind::Assign? (I would think not.)
Also Project is not a great name IMO, the projection is happening on the RHS, this is really just a normal assignment of a place to a place local... as opposed to a value local which gets assigned a value (hence the RHS of an Assign is Value, but the RHS of... AssignPlace?... is Place I presume, but the hachmd doesnt spell out the types)

nikomatsakis (May 12 2020 at 21:17, on Zulip):

Good questions. I agree we should extend the proposal with more details. Part of the reason I know this is true is that I don't know which answers I think are best just yet. :)

RalfJ (May 13 2020 at 08:42, on Zulip):

is there some place I could write something more coherent than this stream-of-thoughts? the issue you linked says it is not for discussion... so what is for discussion?^^

RalfJ (May 13 2020 at 08:43, on Zulip):

also FWIW I was rather confused by the first half of the linked issue as it seems to have nothing to do with the actual proposal, it's all meta stuff. I made a best guess at where the meta stuff ends without wanting to read it all in detail, I hope I got that right...

RalfJ (May 13 2020 at 08:43, on Zulip):

from the perspective of someone asked for feedback on the proposal, more than 2/3 of that post is noise that has nothing to do with the actual proposal...

RalfJ (May 13 2020 at 08:45, on Zulip):

This is work towards the eventual goal of removing entirely PlaceElem::Index. You can see the full proposal in this HackMD.

now I still dont know if the MCP is for the "eventual goal" or the "full proposal" or something else entirely.

RalfJ (May 13 2020 at 08:47, on Zulip):

(I understand this MCP stuff is new so consider this feedback on the parts of the process that maybe could use some polishing ;) )

Hanna Kruppe (May 13 2020 at 09:28, on Zulip):

On a different note, I have some concerns about the potential impact of this change on memory usage and compile time, both in general and in particular for codegen (both LLVM and Cranelift).

IIUC, this change would increase the number of statements in most MIR bodies significantly. In return it would shrink Place, but this won't shrink Statement (since StatementKind currently boxes all places), and the PlaceElems won't go away -- on the contrary, they are no longer interned, and each of them turns into a full Statement that has additional overhead. So I expect a significant increase in memory consumption. More statements also mean more work for essentially all MIR passes, even those who just ignore those statements.

And for codegen, the HackMD outlines two options for how places might get handled after this change:

None of this is necessarily a blocking concern, but IMO it should be considered carefully before moving forward. For example, I could imagine that it might be a good trade-off to keep the List<PlaceElem> representation for "static" projections and just split out the problematic kinds (deref, variable index).

bjorn3 (May 13 2020 at 09:30, on Zulip):

Yeah, especially for unsized places this would be a regression in cg_clif, as it always stores fat pointers on the stack, rather than in two registers.

ecstatic-morse (May 13 2020 at 18:06, on Zulip):

@Santiago Pastorino Could you say a bit more about what this change is designed to achieve?

I don't think that #71265 is enough to justify what would be a fairly major change to the MIR. ProjectionElem::Index is pretty low on the list of potential footguns in the MIR, since it is handled automatically for anyone using the Visitor API. Personally, I'm more concerned about Deref projections, since unlike all other projections they refer to memory outside the base local of the Place. It seems quite easy for the various MIR transformations to overlook this, and there's no simple solution like "just use Visitor::visit_local".

That said, I think we should avoid making such a major change unless there is something concrete to be gained, like enabling new optimizations or fixing soundness holes. The churn is not worth it otherwise.

Santiago Pastorino (May 13 2020 at 18:13, on Zulip):

@ecstatic-morse I totally understand the concerns stated here but I think @oli, @nikomatsakis and probably @eddyb have way better understanding than myself to properly justify this

nikomatsakis (May 13 2020 at 19:08, on Zulip):

@ecstatic-morse this scheme can handle both indices and deref projections, that's the main appeal

nikomatsakis (May 13 2020 at 19:08, on Zulip):

basically it allows us to build up arbitrarily complex places

nikomatsakis (May 13 2020 at 19:08, on Zulip):

with only "gep-like" places permitted within a single statement

nikomatsakis (May 13 2020 at 19:08, on Zulip):

the entire reason to do this is to make it easier to write optimizations

nikomatsakis (May 13 2020 at 19:08, on Zulip):

it makes the borrow checker harder :)

nikomatsakis (May 13 2020 at 19:08, on Zulip):

I think there is a valid alternative btw

nikomatsakis (May 13 2020 at 19:09, on Zulip):

in which we keep places just how they are today but we "lower" them to remove derefs and stuff and introduce temporaries

ecstatic-morse (May 13 2020 at 19:10, on Zulip):

That last suggestion seems much more manageable.

nikomatsakis (May 13 2020 at 19:10, on Zulip):

i.e., if we had

let x = foo((*tmp[n]).f.g)

it might get converted to

let tmp = &raw *tmp[n]
let x = foo((*tmp).f.g)

or whatever, idk

nikomatsakis (May 13 2020 at 19:11, on Zulip):

this further digs in on the idea of two phases in the MIR

nikomatsakis (May 13 2020 at 19:11, on Zulip):

the "analyzable" phase and the "optimization" phase

nikomatsakis (May 13 2020 at 19:11, on Zulip):

I was concerned about possible interactions with stacked borrows

nikomatsakis (May 13 2020 at 19:11, on Zulip):

i.e., creating raw pointers has a certain amount of "side effects"

nikomatsakis (May 13 2020 at 19:12, on Zulip):

makes things legal that wouldn't have been legal if something had never had a raw pointer created

nikomatsakis (May 13 2020 at 19:12, on Zulip):

so maybe the answer is that we lower not to &raw but to safe borrows

nikomatsakis (May 13 2020 at 19:12, on Zulip):

which I think for stacked borrows is fine

nikomatsakis (May 13 2020 at 19:12, on Zulip):

but which creates problems for borrow check

ecstatic-morse (May 13 2020 at 19:18, on Zulip):

Okay, I'm less worried about churn now. Retrofitting the borrow checker seemed pretty daunting to me. I guess I still share some concerns with Hanna Kruppe and bjorn3 about the additional intermediate values making optimizations/codegen run slower.

ecstatic-morse (May 13 2020 at 19:30, on Zulip):

Would introducing a SROA pass somewhere in the optimization pipeline solve some of the same problems as this proposal? That might be even more difficult, however.

ecstatic-morse (May 13 2020 at 19:31, on Zulip):

I can't speak to the concerns around stacked borrows unfortunately.

oli (May 14 2020 at 07:05, on Zulip):

nikomatsakis said:

but which creates problems for borrow check

I thought this transformation was supposed to happen after analyses, so after borrow checking. In that case this wouldn't be a problem, right?

Hanna Kruppe (May 14 2020 at 08:20, on Zulip):

That's not what the MCP in its current form proposes. It's a conceivable alternative, but not what that hackmd details.

oli (May 14 2020 at 09:33, on Zulip):

I was talking about the proposed altenrative by niko/ecstatic-morse, which is not to do the virtual locals thing but just "normalize" all projections to make deref projections only ever be the last projection in a chain, and never have multiple deref projections in a projection list

nikomatsakis (May 14 2020 at 14:08, on Zulip):

@oli if we apply the transformation as a lowering step (as @ecstatic-morse and I were discussing) then indeed borrow check is not impacted

nikomatsakis (May 14 2020 at 14:08, on Zulip):

that's kind of the point :)

nikomatsakis (May 14 2020 at 14:08, on Zulip):

we can either do this during MIR construction, in which case we have to account for borrow check, or as a later lowering step, in which case we don't

nikomatsakis (May 14 2020 at 14:08, on Zulip):

I think I lean towards the latter option but I think the concern was compilation times

oli (May 14 2020 at 14:17, on Zulip):

it's easy to benchmark such a transformation or even have it behind a flag at first, so I like that option

nikomatsakis (May 14 2020 at 14:37, on Zulip):

I think we should close this proposal

nikomatsakis (May 14 2020 at 14:37, on Zulip):

and write up a fresh one with the new approach

nikomatsakis (May 14 2020 at 14:39, on Zulip):

I can try to start that

oli (May 14 2020 at 14:40, on Zulip):

yes, that seems the best approach

nikomatsakis (May 14 2020 at 14:50, on Zulip):

Score one for MCP :)

nikomatsakis (May 14 2020 at 14:50, on Zulip):

(This seems to have been a productive design conversation)

Santiago Pastorino (May 14 2020 at 15:05, on Zulip):

:clap: :clap: :clap:

oli (May 14 2020 at 15:38, on Zulip):

agreed! I like this structured way. Instead of just bumping ideas between everyone with no clear picture, this kinda forces us to have the same mental picture

nikomatsakis (May 14 2020 at 17:41, on Zulip):

hackmd where I am writing up a brief MCP details: https://hackmd.io/ra2JEG69R1qAeWFEGK_T-w

nikomatsakis (May 14 2020 at 17:43, on Zulip):

Some complexities:

If you want to rewrite something like &<place>, it's easy enough to add intermediate borrows

nikomatsakis (May 14 2020 at 17:43, on Zulip):

But if you are doing a move, and that move is passing through a Box, what do you us for the intermediate pointer type?

nikomatsakis (May 14 2020 at 17:44, on Zulip):

Or should we simply recommend writing some_op(P) (where P is a complex place) to

let tmp = move P;
some_op(tmp)

i.e., we limit complex places to the simple case of a move place operation?

nikomatsakis (May 14 2020 at 17:44, on Zulip):

ps, @RalfJ, regarding the naming of Rvalue, it felt very unnatural to write Value (or "value expression") in the above sentence... operation felt much more natural :)

nikomatsakis (May 14 2020 at 17:45, on Zulip):

@oli, @ecstatic-morse thoughts?

nikomatsakis (May 14 2020 at 17:45, on Zulip):

Similarly, is it worth simplifying borrows? Or should we just say that the only place a "complex" place can appear is in borrows/moves?

nikomatsakis (May 14 2020 at 17:47, on Zulip):

(I suppose that we could even do that during construction ... in fact, we probably almost do...)

nikomatsakis (May 14 2020 at 17:47, on Zulip):

anyway, I'm going to stop typing now but feel free to edit the hackmd here

RalfJ (May 14 2020 at 17:53, on Zulip):

nikomatsakis said:

ps, RalfJ, regarding the naming of Rvalue, it felt very unnatural to write Value (or "value expression") in the above sentence... operation felt much more natural :)

"move place" expression" feels natural though, doesn't it?

ecstatic-morse (May 14 2020 at 17:53, on Zulip):

So one possible downside that should be discussed is that if you have a type like [BigStruct; 20] and want to get a field of a single struct in the array, the current scheme lets you do:

field = (arr[idx]).x

But if you separate out index projections, you have to do one of

tmp = arr[idx]  // Creates a large temporary on the stack
field = tmp.x

or

tptr = &arr[idx] // Bad for our simple form of alias analysis
fptr = &tptr.x
field = *fptr

How would we address this issue? Or are we not worried about it? Also cc @Hanna Kruppe and @bjorn3, who I'm sure are better able to explain these kinds of concerns than I am.

RalfJ (May 14 2020 at 17:53, on Zulip):

(FWIW I wasnt the one proposing "value" for this, precisely because it is getting awkward here. I got outvoted though... including by @nikomatsakis I believe :D )

RalfJ (May 14 2020 at 17:54, on Zulip):

a "post borrowck lowering" proposal has the problem that the invariants enforced by the lowering are not encoded in the types

nikomatsakis (May 14 2020 at 18:31, on Zulip):

RalfJ said:

(FWIW I wasnt the one proposing "value" for this, precisely because it is getting awkward here. I got outvoted though... including by nikomatsakis I believe :D )

wait what. I wanted "expression". oh I can't remember.

nikomatsakis (May 14 2020 at 18:31, on Zulip):

I thought eddyb wanted "op"?

nikomatsakis (May 14 2020 at 18:31, on Zulip):

anyway

nikomatsakis (May 14 2020 at 18:32, on Zulip):

RalfJ said:

a "post borrowck lowering" proposal has the problem that the invariants enforced by the lowering are not encoded in the types

say more?

nikomatsakis (May 14 2020 at 18:32, on Zulip):

oh, you mean the MIR types?

nikomatsakis (May 14 2020 at 18:32, on Zulip):

this is true, but I don't think it matters that much. that's just life

nikomatsakis (May 14 2020 at 18:32, on Zulip):

i.e., we also split critical edges and so forth and that's not encoded

nikomatsakis (May 14 2020 at 18:33, on Zulip):

but I DO think this is going further down the road of saying "there are two IRs packed into one here" -- or at least there is an important subset

nikomatsakis (May 14 2020 at 18:33, on Zulip):

I guess it's a drag that we have to assert for invalid enum variants, that is true

nikomatsakis (May 14 2020 at 18:36, on Zulip):

@ecstatic-morse clearly moving onto the stack is both incorrect and ineffecient.

I think there are two options for arr[idx].x, either you do

tmp = move arr[idx].x // complex places only allowed in this kind of op, but they can be arbitrarily complex

or you do some variant of

tmp1 = &arr[idx]
tmp2 = &tmp1.x
data = *tmp2

What did you mean by "bad for our simple form of alias analysis?" My main worry there is that moves out of references aren't allowed. Though maybe the & has semantic meaning for stacked borrows that may be problematic for later moves? (I think we decided it is ok for stacked borrows, but I'd want to check)

ecstatic-morse (May 14 2020 at 18:41, on Zulip):

Right, I suppose I was only considering Copy types.

In the generator transform for example. As soon as something be comes borrowed, we have to assume that it requires storage across yield points. If we lower to the second form, arr would now require storage.

ecstatic-morse (May 14 2020 at 18:46, on Zulip):

To be more concrete:

let arr = [BigStruct::new(); 42];
let _ = arr[10].field; // After lowering, creates a reference to `arr`
yield;
ecstatic-morse (May 14 2020 at 18:48, on Zulip):

I suppose this isn't a problem if we put the StorageDead for arr in the right place, above the yield point.

ecstatic-morse (May 14 2020 at 18:49, on Zulip):

(if we use arr below the yield point, we have to store it regardless of whether it is borrowed, which I didn't realize at first)

ecstatic-morse (May 14 2020 at 18:51, on Zulip):

In any case, I'm mildly skeptical of a change that relies on creating more short-lived references or pointers, although I think this could be mitigated with the addition of an InvalidateBorrows statement.

nikomatsakis (May 14 2020 at 18:54, on Zulip):

ecstatic-morse said:

In the generator transform for example. As soon as something be comes borrowed, we have to assume that it requires storage across yield points. If we lower to the second form, arr would now require storage.

I see. We would have to make that smarter, yes.

nikomatsakis (May 14 2020 at 18:54, on Zulip):

I mean basically the whole point of this change is to create intermediate pointers.

nikomatsakis (May 14 2020 at 18:54, on Zulip):

On the premise that today's places are too complex.

nikomatsakis (May 14 2020 at 18:54, on Zulip):

That said, the variant I listed of "limiting complex places to a single form of statement" has none of these problems

nikomatsakis (May 14 2020 at 18:54, on Zulip):

but maybe it doesn't actually solve any problems either

ecstatic-morse (May 14 2020 at 19:05, on Zulip):

That's true, but as you allude to, MIR optimizations will still have to handle complex places even if they are limited to a single statement kind.

ecstatic-morse (May 14 2020 at 19:09, on Zulip):

I feel like I'm being too pessimistic here. I certainly see the upsides, and would very much like to not think about derefs and variable array indices when writing MIR optimizations. I just don't have a good idea of whether the upsides outweigh the (potential) downsides enough to justify what I think would be a rather large refactoring, even if we don't have to touch the borrow checker. Maybe I'm just being irrationally averse to change though? I'd like to hear what others think about this.

nikomatsakis (May 14 2020 at 21:41, on Zulip):

I'm just as happy to leave it as is :P I guess we would need to discuss with @eddyb what the full benefits were that they wanted to get

Jonas Schievink (May 14 2020 at 21:50, on Zulip):

Now that the visitor has been fixed to take index projections into account, and most of the special cases were removed, this might be a too invasive and time-consuming change to be worth it in the end.

RalfJ (May 16 2020 at 08:59, on Zulip):

nikomatsakis said:

On the premise that today's places are too complex.

(place expressions :rofl: ... I'm only partially joking though, the thing place expressions evaluate to -- what I think when I hear "place" -- isn#t affected by this change)

RalfJ (May 16 2020 at 11:48, on Zulip):

on the other hand, doing such a change might have prevented what I think is a soundness bug... our visit_place method does not actually visit every place any more since some are just implicitly encoded as prefixes of the chain of projections!

RalfJ (May 16 2020 at 12:34, on Zulip):

looks like visit_projection_elem can be used to visit every sub-place, but it doesnt give you a Place so that's rather inconvenient and also it only exists for read-only visitors?
Also, ProjectionElem::Field is visited with visit_ty only on read-only visitors? That's odd.

RalfJ (May 16 2020 at 12:34, on Zulip):

are these process_* methods in the visitor meant to be overwritten or not?

RalfJ (May 16 2020 at 12:57, on Zulip):

oh the check is actually correct I think, it's just not necessary to have it in the loop

RalfJ (May 16 2020 at 12:58, on Zulip):

but actually visiting each "sub-place" seems hard now?

nikomatsakis (May 18 2020 at 15:29, on Zulip):

(Re: place expressions, indeed the distinction can be confusing, and the "expression" suffix is kind of long and clumsy, one of the reasons I wanted to use the term "path" to mean "place expression", so we could be precise...but let's not rehash it, though I'll note that also in your comments you mostly wrote "place" and not "place expression")

nikomatsakis (May 18 2020 at 15:29, on Zulip):

As for the visitor setup, I'm not sure how it ended up, but I do remember it was a bit difficult to figure out the right setup given interning

Last update: May 07 2021 at 07:00UTC