Stream: t-compiler/meetings

Topic: [proposal] CGU Partitioning compiler-team#281


pnkfelix (May 08 2020 at 15:42, on Zulip):

creating topic to discuss compiler-team#281

pnkfelix (May 08 2020 at 15:44, on Zulip):

cc @Wesley Wiser and @oli

pnkfelix (May 08 2020 at 15:44, on Zulip):

I wanted to follow up regarding something with CGU's

Wesley Wiser (May 08 2020 at 15:44, on Zulip):

Hey :wave:

pnkfelix (May 08 2020 at 15:44, on Zulip):

(and maybe this question would be better off being posed within a stream specific to mir-opt or something)

pnkfelix (May 08 2020 at 15:45, on Zulip):

but I wanted to check: The CGU partitioning is used, from what I understand, for two purposes: For incremental compilation, and for parallel code-generation (at LLVM level)

pnkfelix (May 08 2020 at 15:45, on Zulip):

do we use the same algorithm for both?

Wesley Wiser (May 08 2020 at 15:45, on Zulip):

Yeah, there's only one algorithm currently.

Wesley Wiser (May 08 2020 at 15:45, on Zulip):

Which, IMO, is probably something we should change.

pnkfelix (May 08 2020 at 15:46, on Zulip):

(I guess I cannot think of any reason why you wouldn't use the same algorithm for both, since in the end it's about trying to identify a partitioning that minimizes dependencies between pairs of CGUs, right?)

pnkfelix (May 08 2020 at 15:46, on Zulip):

or maybe there is something I'm missing?

Jonas Schievink (May 08 2020 at 15:46, on Zulip):

non-incremental mode will merge them together though AFAIK

Wesley Wiser (May 08 2020 at 15:46, on Zulip):

I think it depends a lot on the compilation profile

pnkfelix (May 08 2020 at 15:46, on Zulip):

Jonas Schievink said:

non-incremental mode will merge them together though AFAIK

I don't understand what this meant

Wesley Wiser (May 08 2020 at 15:46, on Zulip):

For release builds, we care a lot about putting related code together in the same cgu so LLVM can optimize and inline it.

Wesley Wiser (May 08 2020 at 15:47, on Zulip):

But, hypothetically, for debug incremental builds, we might actually want to make cgus as small as possible.

Wesley Wiser (May 08 2020 at 15:47, on Zulip):

Because we don't care about optimizations

Wesley Wiser (May 08 2020 at 15:48, on Zulip):

Jonas Schievink said:

non-incremental mode will merge them together though AFAIK

That wasn't my understanding but I'm not an expert on this code.

Wesley Wiser (May 08 2020 at 15:48, on Zulip):

(It's here FYI https://github.com/rust-lang/rust/blob/a51e004e1bf7f9bba151dd9104a217c1ace6a0a2/src/librustc_mir/monomorphize/partitioning.rs#L454)

Jonas Schievink (May 08 2020 at 15:49, on Zulip):

Oh, it will respect codegen-units now I think

Wesley Wiser (May 08 2020 at 15:49, on Zulip):

Wesley Wiser said:

But, hypothetically, for debug incremental builds, we might actually want to make cgus as small as possible.

I'm playing around with a variant of this idea locally with interesting results.

Wesley Wiser (May 08 2020 at 15:49, on Zulip):

Yeah, I believe that's correct.

Jonas Schievink (May 08 2020 at 15:50, on Zulip):

Incremental will by default merge CGUs together until it's down to 256, non-incremental defaults to 16
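
The merging step described above can be illustrated with a minimal sketch. This is a toy model, not the rustc implementation: the `Cgu` struct, the `merge_until` function, and the "fold the smallest unit into the second smallest" policy are assumptions made for illustration, and the target count (256 or 16 in the message above) is passed in as a plain parameter.

```rust
use std::cmp::Reverse;

/// Hypothetical stand-in for a codegen unit: a name plus a size estimate.
#[derive(Debug, Clone)]
struct Cgu {
    name: String,
    size_estimate: usize,
}

/// Repeatedly fold the smallest unit into the second smallest one
/// until at most `target` units remain.
fn merge_until(mut cgus: Vec<Cgu>, target: usize) -> Vec<Cgu> {
    assert!(target >= 1, "need at least one codegen unit");
    while cgus.len() > target {
        // Sort largest first, so the smallest units sit at the back.
        cgus.sort_by_key(|c| Reverse(c.size_estimate));
        let smallest = cgus.pop().unwrap();
        let second_smallest = cgus.last_mut().unwrap();
        second_smallest.size_estimate += smallest.size_estimate;
        second_smallest.name = format!("{}+{}", second_smallest.name, smallest.name);
    }
    cgus
}

fn main() {
    let cgus: Vec<Cgu> = [50, 10, 40, 5, 20]
        .iter()
        .enumerate()
        .map(|(i, &size)| Cgu { name: format!("cgu{}", i), size_estimate: size })
        .collect();

    // A tiny target keeps the example readable; the defaults quoted
    // in the thread are 256 (incremental) and 16 (non-incremental).
    for cgu in merge_until(cgus, 3) {
        println!("{}: {}", cgu.name, cgu.size_estimate);
    }
}
```

With a policy like this, a lower target produces fewer, larger units, which is the trade-off between parallelism and optimization scope discussed earlier in the thread.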

Wesley Wiser (May 08 2020 at 15:51, on Zulip):

Interesting... I hadn't seen that code yet. That's very helpful

mark-i-m (May 08 2020 at 17:10, on Zulip):

btw, the module-level comments in that file are helpful as background info

andjo403 (May 11 2020 at 19:40, on Zulip):

is it possible to split up the CGUs more? As it is now, all instances of generic code go in the same CGU. E.g. I made a main function that creates 24 different hashmaps, either with the same type or with unique types, and the largest CGU takes 569ms to compile when the same type is used and 5234ms when unique types are used.

andjo403 (May 11 2020 at 19:47, on Zulip):

and yes, there is more to compile, so I expect more compile time, but I think it could be distributed between the CGUs better. It is strange to have one CGU that takes 5s to compile while all the others take less than 500ms
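
A reduced sketch of the kind of experiment described above might look like the following. The original test program is not shown in the thread, so everything here is an assumption: the macro `unique_map`, the key types `K0` to `K3`, and the use of four maps instead of 24. It only illustrates how unique key types force a separate instantiation of the generic `HashMap` code per type, whereas reusing one type shares a single instantiation.

```rust
use std::collections::HashMap;

// Each invocation defines a distinct key type in its own block, so every
// map below has a different concrete type and needs its own instantiation
// of the generic HashMap code.
macro_rules! unique_map {
    ($name:ident) => {{
        #[derive(Hash, PartialEq, Eq)]
        struct $name(u32);
        let mut map = HashMap::new();
        map.insert($name(1), 1u32);
        map.len()
    }};
}

/// Four maps, each with its own key type.
fn build_unique() -> usize {
    unique_map!(K0) + unique_map!(K1) + unique_map!(K2) + unique_map!(K3)
}

/// Four maps that all share the type HashMap<u32, u32>.
fn build_same() -> usize {
    let mut total = 0;
    for _ in 0..4 {
        let mut map: HashMap<u32, u32> = HashMap::new();
        map.insert(1, 1);
        total += map.len();
    }
    total
}

fn main() {
    println!("unique types: {}", build_unique());
    println!("same type: {}", build_same());
}
```

How the per-CGU timings quoted above were measured is not stated in the thread, so this sketch covers only the source side of the comparison.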

andjo403 (May 11 2020 at 20:29, on Zulip):

and the other thing that I think affects compile time the most is figuring out how to account for the calls to inline marked functions when merging CGUs.
as it is now, the inline marked functions called in a CGU are added to the CGU after merging is done, and if one CGU is calling a lot of inline marked functions, that CGU's size can increase a lot.
it can also be a lot of extra work to compile the same inline marked function in multiple CGUs; maybe it would have been possible to merge some CGUs so that the inline marked function was only compiled once.
when I was looking into this in https://github.com/rust-lang/rust/pull/65281 I tried to do the merging after the inline marked functions were added, but was not able to figure out how to account for the same function being inlined in multiple CGUs
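
The accounting problem described above can be made concrete with a toy model. This is only a sketch under stated assumptions, not the approach taken in the linked PR or in rustc: the `Cgu` struct, the function names in the size table, and all sizes are hypothetical. Each unit carries the size of its own root items plus the set of inline marked functions it calls; a merged unit needs only one copy of each shared callee.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical model of a CGU: its own items plus the inline marked
/// functions that get copied into it.
struct Cgu {
    root_size: usize,
    inline_callees: BTreeSet<&'static str>,
}

/// Size once every inline marked callee has been copied into the unit.
fn size_with_inlines(cgu: &Cgu, inline_sizes: &BTreeMap<&'static str, usize>) -> usize {
    let inlined: usize = cgu.inline_callees.iter().map(|f| inline_sizes[f]).sum();
    cgu.root_size + inlined
}

/// Merging takes the union of the callee sets, so a function called
/// from both units is counted (and compiled) only once.
fn merge(a: &Cgu, b: &Cgu) -> Cgu {
    Cgu {
        root_size: a.root_size + b.root_size,
        inline_callees: a.inline_callees.union(&b.inline_callees).copied().collect(),
    }
}

fn main() {
    // Made-up sizes for three inline marked functions.
    let inline_sizes: BTreeMap<&'static str, usize> =
        [("copy", 30), ("hash", 50), ("fmt", 40)].into_iter().collect();

    let a = Cgu { root_size: 100, inline_callees: ["copy", "hash"].into_iter().collect() };
    let b = Cgu { root_size: 80, inline_callees: ["hash", "fmt"].into_iter().collect() };

    let separate = size_with_inlines(&a, &inline_sizes) + size_with_inlines(&b, &inline_sizes);
    let merged = size_with_inlines(&merge(&a, &b), &inline_sizes);
    println!("separate: {}, merged: {}", separate, merged);
}
```

In this model a size estimate based on `root_size` alone misses both effects mentioned in the message: the growth caused by copied callees, and the saving available when two units that share callees are merged.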

Wesley Wiser (May 11 2020 at 20:44, on Zulip):

Thanks @andjo403, that's helpful!

andjo403 (May 11 2020 at 21:11, on Zulip):

I think that a lot can be done in this area, like better size estimates and splitting up huge CGUs, but as with the mir-opts it feels like it all falls back on the inline marked function handling; as long as that does not work there will be regressions somewhere

mark-i-m (May 12 2020 at 01:57, on Zulip):

one other question: how much can LTO make up for lost optimizations due to more cgus? or is LTO slower than compiling with fewer cgus?

Wesley Wiser (May 12 2020 at 14:13, on Zulip):

That's a question definitely worth exploring for release workloads.

Wesley Wiser (May 12 2020 at 14:13, on Zulip):

@pnkfelix Did your question get answered in the above discussion?

pnkfelix (May 12 2020 at 14:14, on Zulip):

I think it did; I'd have to review to be sure, and I'm looking at something else at the moment

andjo403 (May 15 2020 at 20:41, on Zulip):

something that I started to think about today was that maybe we can have a perf run that only uses one CGU (-Ccodegen-units=1), to be able to see whether the optimisations have perf regressions due to problems with the partitioning. So if all the one-CGU perf results are good, the problem is the partitioning, at least until there is a solution to the partitioning problem.
But maybe the perf runs take too much time with that.

mark-i-m (May 15 2020 at 22:55, on Zulip):

It would be kind of interesting to also look at the runtime performance of workloads compiled with a huge number of CGUs... Then, we would have data about what the tradeoff of compile time vs runtime is

Félix Fischer (May 16 2020 at 08:16, on Zulip):

I think both are really good ideas! +1 to both by me :3

Félix Fischer (May 16 2020 at 08:19, on Zulip):

Although if @andjo403's idea is feasible (like, maybe the compile time with just one CGU is too large?), I think it will help us a lot in having a sense of what's the upper bound of speedup that the mir-opts are able to give at any time.

Félix Fischer (May 16 2020 at 08:21, on Zulip):

Because it gives you an idea of how much work we are able to take off of LLVM's shoulders if the restriction of having to partition the IR in CGUs is lifted

Last update: Nov 25 2020 at 02:45UTC