Stream: t-compiler/wg-incr-comp

Topic: cgu partitioning questions


view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:32):

Hey @Xavier Denis :wave:

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:32):

:wave:

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:32):

this is mostly a reiteration of the questions I was raising in https://rust-lang.zulipchat.com/#narrow/stream/189540-t-compiler.2Fwg-mir-opt/topic/const.20prop.20into.20operands and with @Wesley Wiser in DMs. But I'm curious how propagating constants into operands would cause partitioning changes that lead to a 12% increase in build times

view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:33):

Yeah, it's a really interesting question!

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:33):

and more specifically how can we debug this? It'd be interesting to see how the groups are generated

view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:33):

I'll try to explain what I've seen happen and why I think it's happening

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:33):

because I fail to see how operand propagation would make a println! affect partitioning

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:34):

the two seem... orthogonal

view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:37):

So, to give some back story, there are kind of two main factors to the partitioning system currently: what goes in a CGU and how big is the CGU. Initially what goes in the CGU is based on the Rust module. IIRC, for each module in the crate we generate two CGUs: one for code which is monomorphic to start with and one for monomorphic instantiations of polymorphic functions (ie, generic code). As for how big the CGU is, this is just the sum of the cost estimates for each item in the CGU. I believe currently, this just boils down to a "count of MIR statements in the function".

view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:38):

However, the actual number of CGUs we're limited to is fixed. I believe you can control this via a -C flag but the default is like 128 or something. So when we have more CGUs than we're supposed to, we order all the CGUs by their size and start merging the smallest ones together until we're under the limit.

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:44):

it'd be interesting to see if that mir-opt change causing a regression was just 'bad luck', i.e. the CGU in question was right at the limit and adding a println! just pushed things over the edge. but then these benchmarks seem quite limited and don't seem to capture interesting properties when we're making changes to the code being generated!

view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:48):

That's the part that's causing some of our MIR-opt issues, I believe. Let me show an example:

Suppose we have a crate that ends up with CGUs like this:

CGU 1: 10,000
CGU 2: 8,000
CGU 3: 500
CGU 4: 5,000
CGU 5: 300
CGU 6: 400

Now suppose we have a limit of 5 CGUs; we'd get a merged CGU result of:

CGU 1: 10,000
CGU 2: 8,000
CGU 3: 500
CGU 4: 5,000
CGU 5+6: 700

And we have some function that is eligible for the new MIR-opt we're adding. Currently, this function might be getting included in CGU 3. But now since we've optimized that function (maybe a lot) it has a smaller cost, which means its initial CGU partition does as well:

CGU 1: 10,000
CGU 2: 8,000
CGU 3: 399
CGU 4: 5,000
CGU 5: 300
CGU 6: 400

Well now, since we order by size and merge the smallest, CGU 3 and CGU 5 are going to be merged:

CGU 1: 10,000
CGU 2: 8,000
CGU 3+5: 699
CGU 4: 5,000
CGU 6: 400

So the end-result CGU that our function ended up in actually grew from 500 to 699, which means it will probably take LLVM longer to compile. Even though our function decreased in size, the module it was a part of changed and got bigger!

view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:50):

But notice too that the total amount of work went down: 10,000 + 8,000 + 5,000 + 500 + 400 + 300 => 24,200, whereas in the new version it's only 24,099.

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:50):

yea but to be clear: this is already the case with no opts

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:51):

we could engineer a testcase that looks exactly like the optimized one you gave

view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:51):

Yes, for sure!

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:51):

so that adding a println! also pushes it over the edge

view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:51):

Yep!

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:51):

I guess more broadly: how valuable are these tests when looking at changes which change the code being generated

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:51):

/ should we be benchmarking something else?

view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:51):

But perf.rlo isn't measuring the cost of adding the println!() per se, it's measuring how expensive that change is from one compiler to another.

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:51):

maybe.. a sequence of changes?

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:52):

Wesley Wiser said:

But perf.rlo isn't measuring the cost of adding the println!() per se, it's measuring how expensive that change is from one compiler to another.

yes agreed, but in this case the compilers aren't generating the same mir!

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:52):

those tests make sense to me if we're generating the same mir code and just measuring how fast we can generate it

view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:53):

Yeah, there's a bit of discussion about that idea here https://rust-lang.zulipchat.com/#narrow/stream/241847-t-compiler.2Fwg-incr-comp/topic/brainstorming.3A.20mission.20statement.2C.20goals.2C.20design.20constraints/near/203851817 and probably elsewhere as well

view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:55):

In some sense, it's true that perf.rlo isn't "pure" or whatever because what it's measuring changes but on the other hand, that's also kind of the point. If we're making changes that are theoretically faster but in practice slower, those are bad changes.

view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:55):

What we're seeing right now is just the benchmarks being flawed.

view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:56):

Which is why we are willing to land changes that look bad in the benchmarks if there's solid evidence to show they aren't bad in the real world.

view this post on Zulip Xavier Denis (Jul 20 2020 at 14:58):

Wesley Wiser said:

If we're making changes that are theoretically faster but in practice slower, those are bad changes.

Agreed, but with changes like this how do we know if it's just 'bad luck' wrt getting unlucky partitioning changes or reflective of a real slowdown

view this post on Zulip Wesley Wiser (Jul 20 2020 at 14:59):

Xavier Denis said:

we could engineer a testcase that looks exactly like the optimized one you gave

This is another reason why we've landed a number of those changes which show "regressions". If the regressions are only appearing because of where the test suite happens to insert the println!() then we're often ok with landing it.

view this post on Zulip Xavier Denis (Jul 20 2020 at 15:00):

semi-related: would it make sense for incr compilation to not change CGUs?

view this post on Zulip Wesley Wiser (Jul 20 2020 at 15:02):

Xavier Denis said:

Wesley Wiser said:

If we're making changes that are theoretically faster but in practice slower, those are bad changes.

Agreed, but with changes like this how do we know if it's just 'bad luck' wrt getting unlucky partitioning changes or reflective of a real slowdown

Great question! Here are some of the things I look for to tell if it's the CGU issue:

view this post on Zulip Wesley Wiser (Jul 20 2020 at 15:04):

Xavier Denis said:

semi-related: would it make sense for incr compilation to not change CGUs?

I'm not sure I've thought about it enough to have a good opinion but I'm skeptical in general of making the perf test environment too different from what users would normally run themselves.

view this post on Zulip Xavier Denis (Jul 20 2020 at 15:05):

oh I mean in general not changing CGUs during incremental compilation :P

view this post on Zulip Xavier Denis (Jul 20 2020 at 15:06):

or basically say 'only change them if something has majorly changed'

view this post on Zulip Xavier Denis (Jul 20 2020 at 15:06):

basically make incremental compilation say 'if CGUs are less than 50% out of balance don't bother changing them'

view this post on Zulip Wesley Wiser (Jul 20 2020 at 15:07):

Oh gotcha

view this post on Zulip Xavier Denis (Jul 20 2020 at 15:08):

basically under the hypothesis that, when compiling incrementally, changing CGUs is not useful and it's better to deal with a single slightly larger or lighter unit than to rebalance
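The hysteresis being proposed here could be sketched roughly like this. All the names are hypothetical; nothing like this exists in rustc, and the 50% threshold is just the number floated above:

```rust
// Hypothetical hysteresis for incremental builds: keep the previous
// session's CGU partitioning unless some CGU's size estimate has
// drifted more than 50% since last time.
fn should_repartition(old_sizes: &[usize], new_sizes: &[usize]) -> bool {
    old_sizes.iter().zip(new_sizes).any(|(&old, &new)| {
        let drift = (new as f64 - old as f64).abs() / old as f64;
        drift > 0.5
    })
}

fn main() {
    // A small MIR-opt win (500 -> 399) is under the threshold, so the
    // old CGUs (and their incremental caches) would be reused as-is.
    println!("{}", should_repartition(&[10_000, 500, 400], &[10_000, 399, 400])); // false
    // A CGU more than doubling would trigger a rebalance.
    println!("{}", should_repartition(&[10_000, 500, 400], &[10_000, 1_200, 400])); // true
}
```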

view this post on Zulip Wesley Wiser (Jul 20 2020 at 15:08):

IIRC @pnkfelix had a similar idea around here https://rust-lang.zulipchat.com/#narrow/stream/241847-t-compiler.2Fwg-incr-comp/topic/brainstorming.3A.20mission.20statement.2C.20goals.2C.20design.20constraints/near/203736391 (you might have to scroll a bit further back to see all of the relevant info).

view this post on Zulip Xavier Denis (Jul 20 2020 at 15:09):

(btw I checked the operand opt, and the worst test case, +12% for an incr rebuild, has 5 more calls to LLVM_module_codegen)

view this post on Zulip Wesley Wiser (Jul 20 2020 at 15:09):

Yeah seems likely it's the CGU issue then

view this post on Zulip Wesley Wiser (Jul 20 2020 at 15:09):

I've got to run but if you have more thoughts, feel free to leave them here and I'll try to reply when I'm back later

view this post on Zulip Xavier Denis (Jul 20 2020 at 15:11):

yea no worries! I just want to better understand how we rule out opts from normal compilation

view this post on Zulip Xavier Denis (Jul 20 2020 at 15:12):

and it's sad that we could lose an opt like this, which should unblock many other opts

view this post on Zulip Félix Fischer (Jul 20 2020 at 19:18):

@Wesley Wiser two questions:

Since merging CGUs is what makes the times change in strange ways for incremental comp, would it make sense to not merge CGUs at all during incremental compilation?

Ideally one would want the time needed for recompilation after a small change to be constant. For it to be independent of the rest of the crate, and only depend on the size of the change. Could we generate enough CGUs that we get this level of fine-grained behavior on incremental compilation?

view this post on Zulip Félix Fischer (Jul 20 2020 at 19:19):

Oh, btw, in this context I mean "incremental compilation for debug builds". Release builds are another story entirely :P

view this post on Zulip Wesley Wiser (Jul 21 2020 at 01:12):

Félix Fischer said:

Since merging CGUs is what makes the times change in strange ways for incremental comp, would it make sense to not merge CGUs at all during incremental compilation?

That seems like a good experiment to try! I think pnkfelix even suggested splitting large CGUs, which also makes a lot of sense. One other thing we've talked about would be scaling the number of CGUs targeted by the size of the crate.

Félix Fischer said:

Ideally one would want the time needed for recompilation after a small change to be constant. For it to be independent of the rest of the crate, and only depend on the size of the change. Could we generate enough CGUs that we get this level of fine-grained behavior on incremental compilation?

Hopefully yes! We will probably have to change our visibility and linkage code to accommodate having code spread across many CGUs and not clustered together.
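The "scale the number of CGUs by crate size" idea mentioned above could be sketched like this. This is a hypothetical function, not an existing rustc option, and the numbers are made up:

```rust
// Hypothetical sketch: instead of a fixed CGU limit, aim for roughly
// constant-sized CGUs, so small crates get few CGUs and large crates
// get many. None of these names exist in rustc.
fn target_cgu_count(total_cost: usize, cost_per_cgu: usize, min: usize, max: usize) -> usize {
    (total_cost / cost_per_cgu).clamp(min, max)
}

fn main() {
    // A small crate stays at the minimum; a big one scales up.
    println!("{}", target_cgu_count(3_000, 1_000, 16, 256));   // 16
    println!("{}", target_cgu_count(500_000, 1_000, 16, 256)); // 256
}
```

With per-CGU target sizes, an edit to one function would tend to touch a constant number of similarly sized CGUs, which is the constant-time rebuild property asked about above.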


Last updated: Oct 21 2021 at 21:20 UTC