Stream: t-compiler

Topic: element-wise aggregate initialization


oli (Oct 27 2018 at 17:32, on Zulip):

So I was thinking about doing the optimization described in https://github.com/rust-lang/rust/blob/master/src/librustc_mir/build/expr/as_rvalue.rs#L182
My current plan is to basically replace

_1 = ...;
_2 = ...;
_3 = Aggregate { x: _1, y: _2 };

with

_3.x = ...;
_3.y = ...;
SetDiscriminant(_3, 0);

@nikomatsakis you wrote the original FIXME comment and @eddyb suggested to ask you about potential pitfalls and things to look out for

nagisa (Oct 27 2018 at 17:34, on Zulip):

We at some point had an optimisation pass called dis/deaggregation, that was left not entirely finished by an intern 2 years ago?

nagisa (Oct 27 2018 at 17:35, on Zulip):

maybe 3 now.

rkruppe (Oct 27 2018 at 17:53, on Zulip):

IIRC eddyb tried to finish it up more recently but there were serious perf regressions and the PR was never pushed over the finish line

rkruppe (Oct 27 2018 at 17:54, on Zulip):

ah no, the PR I was thinking of was merged but didn't eliminates the aggregate rvalues entirely because of perf

rkruppe (Oct 27 2018 at 17:54, on Zulip):

https://github.com/rust-lang/rust/pull/48052

nagisa (Oct 27 2018 at 17:55, on Zulip):

I have a feeling that deagregatting everything will always have the perf impact one way or the other

nagisa (Oct 27 2018 at 17:55, on Zulip):

and the earlier it is done, the worse off the impact will end up being

nagisa (Oct 27 2018 at 17:56, on Zulip):

(because e.g. drop elaboration will have to track many more locals)

rkruppe (Oct 27 2018 at 18:01, on Zulip):

i'm not sure why there would be many more locals? more statements, yes, and that would slow down e.g. data flow analyses, but whether you compute a temporary and then assign it to a projection or whether you put the temporary into an aggregate rvalue to assign the whole struct shouldn't affect the number of locals

rkruppe (Oct 27 2018 at 18:02, on Zulip):

note that this is not quite the same as SROA, we don't need to actually split let t: (T, U); into let (t0, t1): (T, U);

nagisa (Oct 27 2018 at 18:03, on Zulip):

oh, right… I for some reason recalled disagregattor doing SROA

rkruppe (Oct 27 2018 at 18:06, on Zulip):

come to think of it, i'm not sure any more whether more statements would actually have any quadratic behavior. i guess it might depending on how we insert and remove statements in the middle of blocks? but most data flow is linear in the statements. which still doesn't make more statements free (not to mention memory usage) but outside of the "large constant vector that can be created with a single statement today" case it might not be so bad

oli (Oct 27 2018 at 18:08, on Zulip):

The number of statements would not change with this optimization

oli (Oct 27 2018 at 18:08, on Zulip):

there would simply be fewer locals

rkruppe (Oct 27 2018 at 18:09, on Zulip):

it would if the Aggregate contains constant operands, which currently aren't separate statements

rkruppe (Oct 27 2018 at 18:10, on Zulip):

e.g. let x = [1, 2, 3, 4, 5]; is ca. one statement, with deaggregation it's ca. five

oli (Oct 27 2018 at 18:10, on Zulip):

right

nagisa (Oct 27 2018 at 18:11, on Zulip):

@rkruppe at least the drop elaboration does not care about what initializes an aggregate. So if yoou have an aggregate that has a billion elements, it won’t have to bother and look at all billion statements that assign the aggregate field by field. IIRC, anyway.

oli (Oct 27 2018 at 18:11, on Zulip):

but these cases actually have another possibility for optimization: Just generate a single constant instead of ever producing a mir aggregate of constants

rkruppe (Oct 27 2018 at 18:11, on Zulip):

@Oli yes i was just typing out something to that effect :)

nagisa (Oct 27 2018 at 18:12, on Zulip):

It wold be pretty cool to have s/Rvalue::Aggregate/Rvalue::ByteInitializer/

oli (Oct 27 2018 at 18:12, on Zulip):

there are some problems when lifetimes are involved, but I think they are resolveable

rkruppe (Oct 27 2018 at 18:12, on Zulip):

though [1, 2, 3, 4, 5, ...., runtime_variable, ...] needs either trickery or just remains inefficient (it seems more niche)

rkruppe (Oct 27 2018 at 18:13, on Zulip):

the trickery being, for example, leaving some elements of the constant undef and overwriting them in a separate statement after the ByteInitializer

oli (Oct 27 2018 at 18:13, on Zulip):

right

oli (Oct 27 2018 at 20:56, on Zulip):

ok... I found a slightly unfun situation: x[i] = (x[j].0, 42) can't be converted to x[i].0 = x[j].0; x[i].1 = 42; because for index ops going through IndexMut that would evaluate index_mut twice

we also can't do let a = &mut x[i]; a.0 = x[j].0; a.1 = 42; because then borrowck throws a fit

oli (Oct 27 2018 at 21:03, on Zulip):

Ah, I'm in the wrong code anyway. It'll work itself out automatically if I get x = [(4, 5), (1, 2)]; to be x[0].0 = 4; x[0].1 = 5; x[1].0 = 1; x[1].1 = 2;

oli (Oct 27 2018 at 21:22, on Zulip):

So I think I know how to do this: partially evaluate aggregate initializers so that x[j].0 becomes _X if it's not just a projection, then partially evaluate the lhs of the original assignment so we get let a = &mut X if anything other than projections are involved. Then generate all the assignments.

This translates to: Evaluate the rhs to a tree of aggregate initializers of Operands. Evaluate the lhs to a Place. Visit the aggregate initializer tree recursively, building up the Place to the concrete value as you go in, and popping projections as you go out.

oli (Oct 28 2018 at 01:39, on Zulip):

The previous sentence is not right. It should be Rvalue not Operand

oli (Oct 30 2018 at 10:04, on Zulip):

Ok... I ran into a few problems with this. Most notably I'm now hitting the fact that the 2018 edition forbids let mut s; s.x = Val;

pnkfelix (Oct 30 2018 at 12:08, on Zulip):

is this a problem? I can point you at the lang team discussion of the matter

pnkfelix (Oct 30 2018 at 12:08, on Zulip):

oh I see, you're doing it as a MIR transformation that runs before rustc_mir::borrow_check?

pnkfelix (Oct 30 2018 at 12:09, on Zulip):

We can probably easily add a feature flag to let you write that code again. There's an RFC being drafted to describe the scenarios where we would start allowing it again.

oli (Oct 30 2018 at 12:12, on Zulip):

I just found the RFC draft independently :D https://github.com/Centril/rfcs/pull/16

oli (Oct 30 2018 at 12:12, on Zulip):

Well... a feature flag does make sense. I should probably hide my changes behind one for sanity's sake

oli (Oct 30 2018 at 12:13, on Zulip):

I'm not even doing a MIR transformation, but changing the way MIR is generated

oli (Oct 30 2018 at 12:13, on Zulip):

deaggregation as a MIR pass is too expensive as proven by our existing deaggregation MIR pass

nikomatsakis (Oct 30 2018 at 14:55, on Zulip):

@Oli I do not think you should change how MIR is generated, but rather I would expect to transform this after the fact as an optimization.

nikomatsakis (Oct 30 2018 at 14:55, on Zulip):

Otherwise, we're going to need some way to signal that the aggregate is "complete", no? That was the reason we kept it as is until now -- i.e., when do we know that the destructor should run.

oli (Oct 30 2018 at 14:55, on Zulip):

@nikomatsakis eddyb tried that and we do have a disaggregator

nikomatsakis (Oct 30 2018 at 14:56, on Zulip):

What is the motivation here?

oli (Oct 30 2018 at 14:56, on Zulip):

but it's very inefficient to do so after the fact

oli (Oct 30 2018 at 14:56, on Zulip):

my motivation is const eval performance, so disaggregator would be fine

nikomatsakis (Oct 30 2018 at 14:56, on Zulip):

It seems like we're accumulating a few places where we'd like to update MIR

oli (Oct 30 2018 at 14:56, on Zulip):

but now my motivation has become getting rid of Rvalue::Aggregate

nikomatsakis (Oct 30 2018 at 14:56, on Zulip):

basically where there is tension between the needs of the borrow check (which wants something closer to surface level syntax) and optimizations (which maybe doesn't)

nikomatsakis (Oct 30 2018 at 14:57, on Zulip):

I feel pretty wary of making this change at this time

nikomatsakis (Oct 30 2018 at 14:57, on Zulip):

particularly without some sort of write-up

oli (Oct 30 2018 at 14:57, on Zulip):

Otherwise, we're going to need some way to signal that the aggregate is "complete", no? That was the reason we kept it as is until now -- i.e., when do we know that the destructor should run.

I am currently using SetDiscriminant for that, which is what the disaggregator MIR pass also does

nikomatsakis (Oct 30 2018 at 14:57, on Zulip):

i.e., we are in the midst of trying to ship a new borrow checker based on the MIR :)

oli (Oct 30 2018 at 14:57, on Zulip):

I'm fine with delaying this, but I also do have an impl behind a feature gate ;)

nikomatsakis (Oct 30 2018 at 14:58, on Zulip):

well, one that doesn't work :)

nikomatsakis (Oct 30 2018 at 14:58, on Zulip):

i.e., it breaks in Rust 2018

oli (Oct 30 2018 at 14:58, on Zulip):

everywhere where mir borrowck bug!ed out, I assert that the feature gate is active and then do a different change

nikomatsakis (Oct 30 2018 at 14:58, on Zulip):

because it's not been integrated properly with the mir borrowck

oli (Oct 30 2018 at 14:58, on Zulip):

oh...

oli (Oct 30 2018 at 14:58, on Zulip):

that explains why my tests pass :D

nikomatsakis (Oct 30 2018 at 14:58, on Zulip):

(at least if I understood you properly)

oli (Oct 30 2018 at 14:59, on Zulip):

you did, and you're right

nikomatsakis (Oct 30 2018 at 14:59, on Zulip):

(that said, I'm not opposed to the overall plan)

oli (Oct 30 2018 at 14:59, on Zulip):

2018 edition fails hard on this

oli (Oct 30 2018 at 14:59, on Zulip):

are you fine with me doing this behind a feature gate?

nikomatsakis (Oct 30 2018 at 14:59, on Zulip):

hmm

nikomatsakis (Oct 30 2018 at 14:59, on Zulip):

no, I am :)

nikomatsakis (Oct 30 2018 at 14:59, on Zulip):

but I am not opposed to you doing it "properly" :P

nikomatsakis (Oct 30 2018 at 14:59, on Zulip):

tbh I'm not sure that I'm opposed to anything

oli (Oct 30 2018 at 14:59, on Zulip):

huh?

oli (Oct 30 2018 at 15:00, on Zulip):

"properly"?

nikomatsakis (Oct 30 2018 at 15:00, on Zulip):

I don't understand why a feature gate makes sense

nikomatsakis (Oct 30 2018 at 15:00, on Zulip):

this is not something end-users of Rust should be able to observe

nikomatsakis (Oct 30 2018 at 15:00, on Zulip):

it sounds like you want a feature gate because you want to land a PR that doesn't fully work, I guess? That might be ok

nikomatsakis (Oct 30 2018 at 15:00, on Zulip):

but I'd prefer to do that with a broader plan in place for how to finish the work

oli (Oct 30 2018 at 15:01, on Zulip):

the issue is that we need to allow partially initializing structs, which is forbidden in 2018

nikomatsakis (Oct 30 2018 at 15:01, on Zulip):

(it seems like a "revised MIR" might be a good Rust All Hands discussion topic)

nikomatsakis (Oct 30 2018 at 15:01, on Zulip):

yes, though the intention was to eventually support it

oli (Oct 30 2018 at 15:01, on Zulip):

jup, so I'd bind these two things together

nikomatsakis (Oct 30 2018 at 15:01, on Zulip):

I see, and the feature gate would be for that

nikomatsakis (Oct 30 2018 at 15:02, on Zulip):

I believe @centril is working on an RFC on that topic

nikomatsakis (Oct 30 2018 at 15:02, on Zulip):

that said, I think the two things are sort of independent

oli (Oct 30 2018 at 15:02, on Zulip):

they are, (linked above)

nikomatsakis (Oct 30 2018 at 15:02, on Zulip):

or should be

oli (Oct 30 2018 at 15:02, on Zulip):

they are, but without the latter I can't do the former

oli (Oct 30 2018 at 15:02, on Zulip):

I'd need a lot of weird workarounds

nikomatsakis (Oct 30 2018 at 15:02, on Zulip):

how do you plan to deal with things like let x = Foo { .. } where Foo has a Drop and hence is not permitted to be partially initialized?

oli (Oct 30 2018 at 15:03, on Zulip):

hmm... good point...

oli (Oct 30 2018 at 15:03, on Zulip):

I skipped drop types so far

oli (Oct 30 2018 at 15:03, on Zulip):

ok, so these two things are independent

oli (Oct 30 2018 at 15:04, on Zulip):

basically if I make drop type element wise intialization work correctly and separately from user element wise initialization, everything should be fine

nikomatsakis (Oct 30 2018 at 15:05, on Zulip):

(one step back: you mentioned that the "deaggregator" was "good enough" to solve some const eval perf problems?)

oli (Oct 30 2018 at 15:06, on Zulip):

This necessitates a form of "initialization" similar to mem::uninitialized, but only available to MIR, and only if the expression it's coming from an adt/tuple/array init-expression

oli (Oct 30 2018 at 15:06, on Zulip):

right

oli (Oct 30 2018 at 15:07, on Zulip):

const evaluating large arrays is problematic, because we get a local for each element if the element type is complex, and then copy all those locals in one expression into the destination array local

nikomatsakis (Oct 30 2018 at 15:07, on Zulip):

yes; same problem of course arises in our codegen

oli (Oct 30 2018 at 15:07, on Zulip):

if we just wrote the elements into the array each by itself we'd be saving ourselves quite a few assignments and conversions

oli (Oct 30 2018 at 15:08, on Zulip):

(const eval conversions between immediates like fat pointers or ints and their memory representations is expensive)

oli (Oct 30 2018 at 15:09, on Zulip):

current state of my experiments: https://gist.github.com/oli-obk/41cc65f0808b63d714ef2d74578f9f6c

nikomatsakis (Oct 30 2018 at 15:15, on Zulip):

OK -- I have to run -- I think the TL;DR is that I definitely want to see improvements here, I just want to have a plan for how it should work in all parts of the compiler. In my ideal world, I think, we'd document this by a PR to rustc-guide that describes the new design (and of course rustc-guide would have enough material that this makes sense). I'm not sure how much docs we have for MIR in the rustc-guide, not enough I imagine to really have it be a "sufficiently precise spec" for this purpose.

nikomatsakis (Oct 30 2018 at 15:16, on Zulip):

I think maybe it'd be nice if -- perhaps via SetDiscriminant? -- we always generated the same sort of MIR for Foo { .. }

nikomatsakis (Oct 30 2018 at 15:16, on Zulip):

and the borrow check knew the difference between Drop etc

nikomatsakis (Oct 30 2018 at 15:16, on Zulip):

but maybe that's not worth it, and we should keep the Aggregate but just not always use it

nikomatsakis (Oct 30 2018 at 15:16, on Zulip):

and then lower the remaining uses away

nikomatsakis (Oct 30 2018 at 15:16, on Zulip):

I can't really decide if that would actually be nice

oli (Oct 30 2018 at 15:21, on Zulip):

I'll write something up. I guess first I'll have to do a writeup for HAIR though. The crickets are very loud in https://rust-lang-nursery.github.io/rustc-guide/mir/construction.html

nikomatsakis (Oct 30 2018 at 15:22, on Zulip):

Yes, HAIR specifically has been on my list to try and start documenting

nikomatsakis (Oct 30 2018 at 15:22, on Zulip):

I am thinking more and more about this idea of a rustc-guide WG

nikomatsakis (Oct 30 2018 at 15:23, on Zulip):

particularly if we are able to use the rustc-guide as a way to talk about design-level discussion? maybe that's a nutty idea

nikomatsakis (Oct 30 2018 at 15:23, on Zulip):

I am trying to pursue that path with the trait work: document the intended design,

nikomatsakis (Oct 30 2018 at 15:23, on Zulip):

implement,

nikomatsakis (Oct 30 2018 at 15:23, on Zulip):

and then adjust the docs

nikomatsakis (Oct 30 2018 at 15:23, on Zulip):

I won't lie though, it's hard

nikomatsakis (Oct 30 2018 at 15:23, on Zulip):

takes time :)

oli (Oct 30 2018 at 15:24, on Zulip):

I'll start documenting how mir building works, since it's fresh on my mind

nagisa (Oct 30 2018 at 16:07, on Zulip):

@Oli I do not believe disaggregation being run as a pass as the core cause of the slowness for the disaggregator pass. Rather I think it is all on us not having an efficient way to update MIR.

rkruppe (Oct 30 2018 at 16:10, on Zulip):

I do however think getting rid of Rvalue::Aggregate entirely could be a nice simplification of the IR, if it doesn't force significant extra complexity on borrowck/drop elaboration/etc.

oli (Oct 30 2018 at 16:10, on Zulip):

even so, the changes required for doing this during MIR generation are minimal

oli (Oct 30 2018 at 16:11, on Zulip):

we do need the changes to all the other passes anyway when/if we get element-wise initialization in user space

rkruppe (Oct 30 2018 at 16:11, on Zulip):

good point

oli (Oct 30 2018 at 16:12, on Zulip):

I am mainly concerned about [1, 2, 3] as an Rvalue::Aggregate of constants becoming 3 statements

oli (Oct 30 2018 at 16:12, on Zulip):

but I believe we can address this by generating the appropriate constant on the fly (if we consider that to be a problem at all)

nikomatsakis (Oct 30 2018 at 17:12, on Zulip):

@Oli I do not believe disaggregation being run as a pass as the core cause of the slowness for the disaggregator pass. Rather I think it is all on us not having an efficient way to update MIR.

I just added this to my list of all hands topics -- see the MIR 2.0 bullet point :)

Last update: Nov 16 2019 at 01:20UTC