Stream: t-compiler/meetings

Topic: [design meeting] codegen unit partitioning compiler-team#281


nikomatsakis (May 22 2020 at 13:52, on Zulip):

Hey @T-compiler/meeting -- design meeting in this topic in ~10 minutes.

Topic: Code Generation Unit Partitioning Meeting compiler-team#281

nikomatsakis (May 22 2020 at 13:54, on Zulip):

Gist with details

Félix Fischer (May 22 2020 at 13:58, on Zulip):

Hi Niko :)
I'd like to participate because this topic is very important to me (specially because I'm now working on mir-opts). How do the meetings work? Do we have like a Hangouts Meet / Jitsi / that kind of thing?

Wesley Wiser (May 22 2020 at 13:59, on Zulip):

They happen here in this Zulip channel over chat :slight_smile:

nikomatsakis (May 22 2020 at 13:59, on Zulip):

@Félix Fischer you're in the right place!

Félix Fischer (May 22 2020 at 14:00, on Zulip):

Yay! Okay, I'm ready then :slight_smile:

andjo403 (May 22 2020 at 14:00, on Zulip):

I will also hang around :D

nikomatsakis (May 22 2020 at 14:02, on Zulip):

Hello @T-compiler/meeting! codegen unit partitioning discussion starting now. Please :wave: to show you're here. To give folks time to gather, let's have a few minutes for

Announcements

nikomatsakis (May 22 2020 at 14:03, on Zulip):
Wesley Wiser (May 22 2020 at 14:03, on Zulip):

Um, sure :slight_smile:

Wesley Wiser (May 22 2020 at 14:03, on Zulip):

I've never run one of these before so I'm a bit lost. I guess it would be good to start by re-iterating the problem?

pnkfelix (May 22 2020 at 14:04, on Zulip):

:thumbs_up:

Wesley Wiser (May 22 2020 at 14:04, on Zulip):

So the backend part of the compiler has parallel codegen.

simulacrum (May 22 2020 at 14:04, on Zulip):
pnkfelix (May 22 2020 at 14:04, on Zulip):

@Wesley Wiser oh lets wait until announcements are done, I guess

Wesley Wiser (May 22 2020 at 14:05, on Zulip):

Sorry!

Santiago Pastorino (May 22 2020 at 14:05, on Zulip):

wanted to mention, as I've stated in this topic that both current P-critical issues are solved with the same PR. Niko "is" investigating (Niko?)

Santiago Pastorino (May 22 2020 at 14:05, on Zulip):

@nikomatsakis I'm glad to give you some interesting tasks :slight_smile:

pnkfelix (May 22 2020 at 14:05, on Zulip):

(its not your fault, @Wesley Wiser ! we don't have a good way to signaling "time for announcements is over.)

simulacrum (May 22 2020 at 14:06, on Zulip):
pnkfelix (May 22 2020 at 14:07, on Zulip):

Oooh!

Félix Fischer (May 22 2020 at 14:07, on Zulip):

(currently on the radar: cache misses, artifact sizes)

That is super cool :heart:

pnkfelix (May 22 2020 at 14:07, on Zulip):

@simulacrum can you link the discussion of shifting to zulip?

simulacrum (May 22 2020 at 14:08, on Zulip):

https://github.com/rust-lang/release-team/issues/6 and https://github.com/rust-lang/infra-team/issues/36

pnkfelix (May 22 2020 at 14:08, on Zulip):

okay. I think that's enough time for announcements. Take it away, @Wesley Wiser !

Wesley Wiser (May 22 2020 at 14:08, on Zulip):

So the backend part of the compiler has parallel codegen. To facilitate this as well as our incremental compilation, we split a crate into LLVM modules which we call "code generation units" or CGUs. Each CGU can then be run through LLVM in parallel or only recompiled if it has changed.

The process of deciding what functions go in what CGU is called CGU partitioning. This is a pretty critical algorithm because it impacts two things:

  1. Without LTO, LLVM will only perform inter-procedural optimizations like inlining between functions in the same CGU (since, if they are in another CGU, it can't see the function body).

  2. In incremental mode, if a function body changes, we have to re-codegen the entire CGU it belongs to.

nikomatsakis (May 22 2020 at 14:09, on Zulip):

(And those two things are interconnected)

nikomatsakis (May 22 2020 at 14:10, on Zulip):

(i.e., partly why we have to recodegen everything, is because we don't know what LLVM inlined into what etc)

davidtwco (May 22 2020 at 14:10, on Zulip):

I think we also suspected that partitioning could have been affecting the gains from polymorphisation (testing with a single codegen unit is something I plan to do).

Wesley Wiser (May 22 2020 at 14:10, on Zulip):

So CGU partitioning is critical to both runtime performance of Rust code because it enables or hinders LLVM optimizations and it is also critical to compiler performance since incremental works best when CGUs are smaller and there is less work to do.

Wesley Wiser (May 22 2020 at 14:11, on Zulip):

Yeah, that's a great point.

Wesley Wiser (May 22 2020 at 14:11, on Zulip):

Which is a good segway into the problem which is that there are (probably) some deficiencies in the current partitioning algorithm.

Wesley Wiser (May 22 2020 at 14:11, on Zulip):

So, for reference, the current algorithm is defined in https://github.com/rust-lang/rust/blob/e91aebc1a3835b9b420da0c021e211175a724b8d/src/librustc_mir/monomorphize/partitioning.rs

Wesley Wiser (May 22 2020 at 14:12, on Zulip):

It's actually very well commented so kudos to mw and others!

pnkfelix (May 22 2020 at 14:12, on Zulip):

@Wesley Wiser out of curiosity, do you or others know the partitioning algorithm LLVM uses?

pnkfelix (May 22 2020 at 14:12, on Zulip):

or sorry, that clang uses, I mean

Félix Fischer (May 22 2020 at 14:13, on Zulip):

pnkfelix said:

or sorry, that clang uses, I mean

Thank you for the clarification, I was just gonna ask you what you meant by llvm!

Wesley Wiser (May 22 2020 at 14:13, on Zulip):

No, I don't but I suspect it's very closely mapped to the translation unit concept C and C++ have.

pnkfelix (May 22 2020 at 14:13, on Zulip):

yeah I was just in the midst of trying to phrase a comment hypothesizing that

Félix Fischer (May 22 2020 at 14:13, on Zulip):

I'm not aware of what Clang uses

Wesley Wiser (May 22 2020 at 14:14, on Zulip):

I'm not a C\C++ dev but my impression is that there's a greater expectation on C\C++ developers to manage compile times by splitting or combining code themselves to manage compile time and performance.

nikomatsakis (May 22 2020 at 14:14, on Zulip):

@pnkfelix they use "one .c file is one partition"

davidtwco (May 22 2020 at 14:14, on Zulip):

Wesley Wiser said:

So, for reference, the current algorithm is defined in https://github.com/rust-lang/rust/blob/e91aebc1a3835b9b420da0c021e211175a724b8d/src/librustc_mir/monomorphize/partitioning.rs

One detail that wasn't immediately obvious to me - the doc comment explains that the heuristic is that, for each source-level module, there are two codegen units - one for non-generic code, and one for generic code. It didn't explain how you get to the requested N codegen units from that, and I believe from a quick read that it merges the smallest two codegen units repeatedly until it reaches N.

Wesley Wiser (May 22 2020 at 14:14, on Zulip):

Yes, that's correct.

nikomatsakis (May 22 2020 at 14:14, on Zulip):

or .cpp of course

nikomatsakis (May 22 2020 at 14:14, on Zulip):

There is one other thing worth mentioning though, @Wesley Wiser

nikomatsakis (May 22 2020 at 14:15, on Zulip):

in principle, ThinLTO was supposed to help ammeliorate this trade-off on the LLVM side

Wesley Wiser (May 22 2020 at 14:15, on Zulip):

Thanks @nikomatsakis!

nikomatsakis (May 22 2020 at 14:15, on Zulip):

it being basically a scheme that lets you do inlining quite late, so that we still get (most of) the benefits of inlining even if we slice into very fine codegen-units

davidtwco (May 22 2020 at 14:16, on Zulip):

(from polymorphisation's perspective: my assumption is that if we mostly polymorphise closures right now - which are necessarily contained in generic functions - then the result is that polymorphised closures are initially in different codegen units from the functions that call them; of course, after merging, who knows how it ends up)

nikomatsakis (May 22 2020 at 14:16, on Zulip):

(and in principle I think it supports incremental recompilation, too, but I'm not sure if we're doing that)

Wesley Wiser (May 22 2020 at 14:16, on Zulip):

I'm definitely not an expert on this part of the compiler so I'm mostly just talking about what I've learned from tinkering with the partitioning algorithm.

pnkfelix (May 22 2020 at 14:17, on Zulip):

so, the strategy of merging smallest two repeatedly until N is hit

pnkfelix (May 22 2020 at 14:17, on Zulip):

I can understand how that is optimizing the load balancing

pnkfelix (May 22 2020 at 14:17, on Zulip):

in terms of compilation work

Félix Fischer (May 22 2020 at 14:17, on Zulip):

nikomatsakis said:

(and in principle I think it supports incremental recompilation, too, but I'm not sure if we're doing that)

It appears to support it, at least from the Clang side: https://clang.llvm.org/docs/ThinLTO.html

pnkfelix (May 22 2020 at 14:17, on Zulip):

but I would have thought one would be better off with something that is aware of the, I dunno, "edges" between the cgu's ?

Wesley Wiser (May 22 2020 at 14:17, on Zulip):

One thing I was hoping to get out of this meeting would be to identify who has an understanding of this part of the compiler.

pnkfelix (May 22 2020 at 14:17, on Zulip):

in terms of interdependencies

pnkfelix (May 22 2020 at 14:17, on Zulip):

(i.e. referencing relationships?)

Wesley Wiser (May 22 2020 at 14:18, on Zulip):

So, @pnkfelix I think what you're getting is is something I haven't mentioned yet

Wesley Wiser (May 22 2020 at 14:18, on Zulip):

Which is that, within a CGU, we include copies of all the functions callable from the "roots" of the CGU

davidtwco (May 22 2020 at 14:18, on Zulip):

Wesley Wiser said:

One thing I was hoping to get out of this meeting would be to identify who has an understanding of this part of the compiler.

I've looked at this a little bit anticipating that it could be affecting polymorphisation, and I'm happy to help out and dive deeper (exams have just finished, so just starting to get polymorphisation back in cache).

Wesley Wiser (May 22 2020 at 14:19, on Zulip):

This is mentioned here https://github.com/rust-lang/rust/blob/e91aebc1a3835b9b420da0c021e211175a724b8d/src/librustc_mir/monomorphize/partitioning.rs#L73

Wesley Wiser (May 22 2020 at 14:19, on Zulip):

and is implemented here https://github.com/rust-lang/rust/blob/e91aebc1a3835b9b420da0c021e211175a724b8d/src/librustc_mir/monomorphize/partitioning.rs#L555

pnkfelix (May 22 2020 at 14:19, on Zulip):

Wesley Wiser said:

Which is that, within a CGU, we include copies of all the functions callable from the "roots" of the CGU

oh, right. I think I forgot that detail

pnkfelix (May 22 2020 at 14:20, on Zulip):

even so, the merging process would still benefit from being aware of this, right?

Wesley Wiser (May 22 2020 at 14:20, on Zulip):

Perhaps yeah.

pnkfelix (May 22 2020 at 14:20, on Zulip):

because those copies are creating extra work that would be eliminated by smarter merging

pnkfelix (May 22 2020 at 14:20, on Zulip):

(at least, I hope it would be eliminated.)

Wesley Wiser (May 22 2020 at 14:20, on Zulip):

I would think merging CGUs that share large overlaps of the same functions would result in less duplicate code

nikomatsakis (May 22 2020 at 14:20, on Zulip):

Wait, we include copies of all the callable functions?

nikomatsakis (May 22 2020 at 14:20, on Zulip):

Or those that are marked #[inline] or some other heuristic

pnkfelix (May 22 2020 at 14:20, on Zulip):

ones from the same module that comment says

pnkfelix (May 22 2020 at 14:21, on Zulip):

even if not marked inline

bjorn3 (May 22 2020 at 14:21, on Zulip):

And if marked #[inline(never)]?

nikomatsakis (May 22 2020 at 14:21, on Zulip):

OK.

pnkfelix (May 22 2020 at 14:21, on Zulip):

wait, hold on, that's just a consequence of the cgu choice

pnkfelix (May 22 2020 at 14:21, on Zulip):

/me thinks more

Félix Fischer (May 22 2020 at 14:21, on Zulip):

Wesley Wiser said:

I would think merging CGUs that share large overlaps of the same functions would result in less duplicate code

Maybe we could compute a "distance" between two CGU that was more or less representative of this overlap, and use that as a heuristic for picking CGU pairs for the purposes of merging?

pnkfelix (May 22 2020 at 14:22, on Zulip):

@Wesley Wiser can you confirm wheter it makes copies from other modules even if they aren't marked #[inline] ?

Wesley Wiser (May 22 2020 at 14:22, on Zulip):

Hmm....

pnkfelix (May 22 2020 at 14:22, on Zulip):

actually, its seems easy to conclude that either way, there's potentially stuff to investigate here

Wesley Wiser (May 22 2020 at 14:22, on Zulip):

I've definitely seen code from other CGUs get included but I can't recall if it's due to #[inline] or not.

davidtwco (May 22 2020 at 14:22, on Zulip):

I think the inlining-specific movement of mono items between the codegen units only considers those marked with #[inline].

davidtwco (May 22 2020 at 14:23, on Zulip):

(based on the top comment)

pnkfelix (May 22 2020 at 14:23, on Zulip):

regardless of how/whether duplication is done, a smarter merge strategy might be good

Wesley Wiser (May 22 2020 at 14:23, on Zulip):

I think it must only be #[inline] items otherwise the CGUs would all be much bigger.

davidtwco (May 22 2020 at 14:23, on Zulip):
//! - The partitioning algorithm has to know which functions are likely to get
//!   inlined, so it can distribute function instantiations accordingly. Since
//!   there is no way of knowing for sure which functions LLVM will decide to
//!   inline in the end, we apply a heuristic here: Only functions marked with
//!   `#[inline]` are considered for inlining by the partitioner. The current
//!   implementation will not try to determine if a function is likely to be
//!   inlined by looking at the functions definition.
Wesley Wiser (May 22 2020 at 14:24, on Zulip):

So, we've talked a lot about the specifics of the current algorithm. Are there other questions people have that we should address about it now or should we move on?

Félix Fischer (May 22 2020 at 14:24, on Zulip):

Thank you @davidtwco, I was copying the same snippet :3

Wesley Wiser (May 22 2020 at 14:24, on Zulip):

(Just trying to watch the clock)

Félix Fischer (May 22 2020 at 14:24, on Zulip):

I think it's worth mentioning that we're modeling the current status as a local maxima

Félix Fischer (May 22 2020 at 14:25, on Zulip):

Like, it's more or less optimized for how we generate MIR today

Félix Fischer (May 22 2020 at 14:25, on Zulip):

BUT it's just a local optimum, and not what we really want. We seem to get regressions with every other MIR opt :P

Félix Fischer (May 22 2020 at 14:26, on Zulip):

Does that make sense?

Wesley Wiser (May 22 2020 at 14:26, on Zulip):

Yeah, so the theory @oli and I have is that the algorithm got tweaked per how rustc works "today" and the MIR we generate gets tweaked to make the partitioning logic happy and so on.

pnkfelix (May 22 2020 at 14:26, on Zulip):

As in, we may only see benefits from changing this strategy if we do it in tandem with other changes to e..g. MIR code gen?

Wesley Wiser (May 22 2020 at 14:26, on Zulip):

Potentially yeah

Wesley Wiser (May 22 2020 at 14:26, on Zulip):

Or said another way

Wesley Wiser (May 22 2020 at 14:27, on Zulip):

We may only see benefits from further MIR optimizations if we tweak the partitioning logic.

pnkfelix (May 22 2020 at 14:27, on Zulip):

ah, okay, I was wondering if I was supposed to read @Félix Fischer 's comment with that (bidirectional) interpretation

nikomatsakis (May 22 2020 at 14:27, on Zulip):

is the concern that MIR optimizations are limited by what they see from other CGUs?

nikomatsakis (May 22 2020 at 14:28, on Zulip):

or is that we think LLVM is not able to do the inlining that would be imp't

Wesley Wiser (May 22 2020 at 14:28, on Zulip):

I don't think so but perhaps I'm misinterpreting your question @nikomatsakis

nikomatsakis (May 22 2020 at 14:28, on Zulip):

I think my question is confused

nikomatsakis (May 22 2020 at 14:28, on Zulip):

all of our opts happen before cgu partionining

Wesley Wiser (May 22 2020 at 14:28, on Zulip):

MIR optimizations generally don't care about CGU partitioning since it happens later in the compiler.

nikomatsakis (May 22 2020 at 14:28, on Zulip):

so I guess I don't quite understand how it interacts with the mir opts I guess

Wesley Wiser (May 22 2020 at 14:28, on Zulip):

Yeah, exactly.

Wesley Wiser (May 22 2020 at 14:28, on Zulip):

Oh

Wesley Wiser (May 22 2020 at 14:28, on Zulip):

So

Wesley Wiser (May 22 2020 at 14:29, on Zulip):

What we think is happening, is that as we optimize things, that changes the "estimated size" of a MIR function.

Wesley Wiser (May 22 2020 at 14:29, on Zulip):

That estimated size is critical to the CGU partitioning algorithm because it uses it to estimate how big CGUs are

Wesley Wiser (May 22 2020 at 14:29, on Zulip):

Which controls which ones get merged (or not)

davidtwco (May 22 2020 at 14:30, on Zulip):

And because of that, there's been a sort of "selective pressure" influencing which MIR optimisations we've adopted and which we haven't?

simulacrum (May 22 2020 at 14:30, on Zulip):

Have we tried pre-estimating the sizes and using those instead of post-opt sizes? Obviously they're "wrong" but could give some idea of whether this theory is right, correct?

Wesley Wiser (May 22 2020 at 14:30, on Zulip):

So as MIR optimizations play with the amount of MIR statements in a Body, it changes which CGU those are potentially included in.

pnkfelix (May 22 2020 at 14:30, on Zulip):

so there's a tension here that I want to try to make explicit

Wesley Wiser (May 22 2020 at 14:30, on Zulip):

Yes, exactly @davidtwco

pnkfelix (May 22 2020 at 14:30, on Zulip):

you're pointing out that regressions occur when you try to add some mir-opt

pnkfelix (May 22 2020 at 14:31, on Zulip):

and those regressions are being caused by issues with the partitioning (or cgu merging scheme, if you prefer)

Wesley Wiser (May 22 2020 at 14:31, on Zulip):

Yeah, specifically regressions from LLVM where the number of CGUs codegenned changes

pnkfelix (May 22 2020 at 14:31, on Zulip):

but the regressions here

pnkfelix (May 22 2020 at 14:31, on Zulip):

they are in the quality of the generated object code

pnkfelix (May 22 2020 at 14:31, on Zulip):

not in the compile time, right?

pnkfelix (May 22 2020 at 14:31, on Zulip):

or are you talking about regressions in compile time

Wesley Wiser (May 22 2020 at 14:31, on Zulip):

I'm not sure what you mean by "here"

pnkfelix (May 22 2020 at 14:31, on Zulip):

you say "things regressed"

pnkfelix (May 22 2020 at 14:31, on Zulip):

I want to confirm that what kinds of regressions we're talking about

Wesley Wiser (May 22 2020 at 14:31, on Zulip):

Yeah, by regressions I mean compile time regressions, not code quality regressions.

pnkfelix (May 22 2020 at 14:32, on Zulip):

oh, you do mean compile time regressions. Okay

simulacrum (May 22 2020 at 14:32, on Zulip):

My understanding is that the changes to merging/partitioning mean that we are regressing on incremental compiles, because where before we skipped codegen we no longer can (because we merged different things)

nikomatsakis (May 22 2020 at 14:32, on Zulip):

@Wesley Wiser I'm curious -- have you tested what happens with one partition? Also, in release builds, do we partition by default? I forget.

Félix Fischer (May 22 2020 at 14:32, on Zulip):

Oh, thank you for bringing that up (@pnkfelix). Yas. As Wesley says, it's compile-time regressions

davidtwco (May 22 2020 at 14:32, on Zulip):

The optimisations that we've wanted to adopt and haven't yet - which motivated these discussions - have we tested those with codegen-units=1? Is my logic right that-

...what @nikomatsakis said I guess.

nikomatsakis (May 22 2020 at 14:32, on Zulip):

Or maybe my question is answered by the point that this is compilation time

Wesley Wiser (May 22 2020 at 14:32, on Zulip):

Yeah, we still partition in release mode

Wesley Wiser (May 22 2020 at 14:32, on Zulip):

Because we want parallel codegen

nikomatsakis (May 22 2020 at 14:32, on Zulip):

Right, I remember we debated about it for some time

simulacrum (May 22 2020 at 14:32, on Zulip):

16 partitions is the default w/o incremental, I think with incremental we have some more complicated scheme but I'm not sure

pnkfelix (May 22 2020 at 14:32, on Zulip):

can you confirm @simulacrum 's statement, @Wesley Wiser ?

pnkfelix (May 22 2020 at 14:33, on Zulip):

are the regressions largely affecting incremental alone?

Wesley Wiser (May 22 2020 at 14:33, on Zulip):

Which has the side effect that people that really care about runtime performance or code size set the CGU count to 1.

Wesley Wiser (May 22 2020 at 14:33, on Zulip):

Let me find an example...

pnkfelix (May 22 2020 at 14:34, on Zulip):

(because I can definitely believe that mir-opts => code size changes => totally different partitioning => completely different performance profile for incremental.)

simulacrum (May 22 2020 at 14:34, on Zulip):

they'll affect non-incremental too, to be clear, just to a lesser extent because it's a matter of how the work is distributed (though you can get 200% work, AIUI, or more, since we duplicate functions).

Félix Fischer (May 22 2020 at 14:34, on Zulip):

pnkfelix said:

are the regressions largely affecting incremental alone?

If I recall correctly, that was the case for one of my PRs. I think Wesley's looking for that one or one that's similar :3

Wesley Wiser (May 22 2020 at 14:34, on Zulip):

Yeah, I would say in general this is mostly an issue for incremental tests.

simulacrum (May 22 2020 at 14:35, on Zulip):

but incremental is where we see "compile times got 50% worse" because, well, we started doing 2x work in llvm

simulacrum (May 22 2020 at 14:35, on Zulip):

one could argue that this is also a symptom of our tests

Wesley Wiser (May 22 2020 at 14:35, on Zulip):

I think the most extreme example I've seen is @Matthew Jasper's PR which reworked some drop logic

Wesley Wiser (May 22 2020 at 14:35, on Zulip):

https://github.com/rust-lang/rust/pull/71840

nikomatsakis (May 22 2020 at 14:35, on Zulip):

(Do we duplicate #[inline] functions in debug builds also, or only release?)

simulacrum (May 22 2020 at 14:35, on Zulip):

I suspect that it matters a lot where you put the println! or whatever; we don't have great coverage

Wesley Wiser (May 22 2020 at 14:35, on Zulip):

https://perf.rust-lang.org/compare.html?start=e5f35df2c6944b843b08369c4b2ff3bdb0beb2d2&end=d4d72c3f314944d906c0b5214cc5062315df842b

pnkfelix (May 22 2020 at 14:36, on Zulip):

okay. so then that leads me to wonder, shifting to a different strategy that is not based on estimated cgu code sizes, but rather based on other factors that are inherent in the structucture

Wesley Wiser (May 22 2020 at 14:36, on Zulip):

So

pnkfelix (May 22 2020 at 14:36, on Zulip):

might mean that we have less noisy results when we try to evaluate impact of changes

Wesley Wiser (May 22 2020 at 14:36, on Zulip):

That's one of the observations I had: We use the same CGU partitioning algorithm for all build types.

davidtwco (May 22 2020 at 14:36, on Zulip):

nikomatsakis said:

(Do we duplicate #[inline] functions in debug builds also, or only release?)

(my understanding, from the doc comment, is that we always do - the algorithm itself doesn't appear to change, just the number of codegen-units?)

simulacrum (May 22 2020 at 14:37, on Zulip):

to my knowledge debug and release builds both have 16 codegen units by default

simulacrum (May 22 2020 at 14:37, on Zulip):

incremental may change that

Wesley Wiser (May 22 2020 at 14:37, on Zulip):

When I suspect there are perhaps better strategies for various types of builds.

Wesley Wiser (May 22 2020 at 14:37, on Zulip):

debug-incremental vs release-no-incremental for example.

pnkfelix (May 22 2020 at 14:37, on Zulip):

Am i right that the strategy of merging two smallest cgu's is essentially prioritizing load balancing over all other metrics?

Félix Fischer (May 22 2020 at 14:37, on Zulip):

pnkfelix said:

[shifting to a different strategy that is not based on estimated cgu code sizes, but rather based on other factors that are inherent in the structucture] might mean that we have less noisy results when we try to evaluate impact of changes

That sounds like a good starting principle, I like that :)

Wesley Wiser (May 22 2020 at 14:37, on Zulip):

Kind of yeah

Wesley Wiser (May 22 2020 at 14:38, on Zulip):

It's a bit weird IMO that it doesn't split huge CGUs

Wesley Wiser (May 22 2020 at 14:38, on Zulip):

Like you can wind up with CGU0: 124, CGU1: 431, CGU2: 123456

simulacrum (May 22 2020 at 14:38, on Zulip):

even with the current strategy we commonly end up with one huge cgu -- I know historically this has been the case for a servo crate and rustc_middle and that means you're waiting for it to compile forever

Wesley Wiser (May 22 2020 at 14:38, on Zulip):

and then obviously a change that requires regenerating CGU2 takes forever

pnkfelix (May 22 2020 at 14:39, on Zulip):

so that's an interesting point too then: We probably should be considering splitting cgu's up

pnkfelix (May 22 2020 at 14:39, on Zulip):

as well as revising the merge strategy, and maybe even revising how we select the initial set of cgu's.

Wesley Wiser (May 22 2020 at 14:40, on Zulip):

So one idea I've been experimenting with is that in debug-incremental, we should try to partition everything into it's own CGU as much as possible.

Wesley Wiser (May 22 2020 at 14:40, on Zulip):

On the theory that allows incremental to do the least work

pnkfelix (May 22 2020 at 14:40, on Zulip):

(I've been sitting here musing if we should add a -Z flag that lets you say "use the one file per cgu for the initial set of cgus", to get something more like what clang does)

Félix Fischer (May 22 2020 at 14:40, on Zulip):

Wesley Wiser said:

So one idea I've been experimenting with is that in debug-incremental, we should try to partition everything into it's own CGU as much as possible.

That makes a lot of sense. Specially because in debug we don't care as much about optimizations

Félix Fischer (May 22 2020 at 14:40, on Zulip):

...right?

pnkfelix (May 22 2020 at 14:40, on Zulip):

(and another -Z flags that lets you say "start with each function in its own cgu")

Wesley Wiser (May 22 2020 at 14:41, on Zulip):

I have an extremely hacky version of this https://github.com/rust-lang/rust/pull/71349

davidtwco (May 22 2020 at 14:41, on Zulip):

Do we know why the original algorithm decided to split source-level modules into generic and non-generic codegen units?

pnkfelix (May 22 2020 at 14:41, on Zulip):

I assume "each function in its own cgu" is basically what @Wesley Wiser is suggesting as maximizing the initial partition, right?

Wesley Wiser (May 22 2020 at 14:41, on Zulip):

With mixed but interesting results https://perf.rust-lang.org/compare.html?start=0aa6751c19d3ba80df5b0b02c00bf44e13c97e80&end=35fdb7107589ab8f0cce89e7a5790f12c40ca662

Wesley Wiser (May 22 2020 at 14:41, on Zulip):

@pnkfelix Yeah

simulacrum (May 22 2020 at 14:41, on Zulip):

@davidtwco I think the Rust algorithm for "what can get inlined" was decided long ago and since then std and similar have been carefully annotated respecting it, e.g. you'll find very few #[inline] tagged functions in std if they're generic because of this

Wesley Wiser (May 22 2020 at 14:42, on Zulip):

@davidtwco it's talked a bit about here https://github.com/rust-lang/rust/blob/e91aebc1a3835b9b420da0c021e211175a724b8d/src/librustc_mir/monomorphize/partitioning.rs#L38

simulacrum (May 22 2020 at 14:42, on Zulip):

(since #[inline] today plays two roles: makes something eligible for inlining but also bumps priorities to LLVM)

simulacrum (May 22 2020 at 14:43, on Zulip):

@Wesley Wiser, maybe good to discuss what regressions we're willing to accept? I doubt we can find something that's better (or not worse at least) in 100% of cases while making improvements

simulacrum (May 22 2020 at 14:43, on Zulip):

(we have ~15 minutes left)

Wesley Wiser (May 22 2020 at 14:43, on Zulip):

Thanks @simulacrum

Wesley Wiser (May 22 2020 at 14:43, on Zulip):

Let me copy some things in from the gist

Wesley Wiser (May 22 2020 at 14:44, on Zulip):
Key design questions

    Is there currently an expert on the compiler team we should be talking to?

    Does anyone have ideas for improving the current algorithm?

    Does anyone have ideas for a different algorithm that would perform better?

    Is regressing some types of compilation performance more acceptable than others? For example, how much do we care about fat LTO + release vs regular release vs debug?

    What should the performance criteria be for merging changes?
        "No regressions": the default and the "safest" answer but makes it very challenging to land any improvements.
        "big enough to offset small losses"
        "bias toward improvements on large crates"
        others?

Ideally we would identify several ideas to try with the current algorithm and one or two different algorithms that could be implemented and tested in parallel.
nikomatsakis (May 22 2020 at 14:44, on Zulip):

One thing I will say

Félix Fischer (May 22 2020 at 14:44, on Zulip):

simulacrum said:

Wesley Wiser, maybe good to discuss what regressions we're willing to accept? I doubt we can find something that's better (or not worse at least) in 100% of cases while making improvements

And also if our theory of local maxima holds, we'll probably be seeing overall regressions from small changes to the algorithm.

nikomatsakis (May 22 2020 at 14:44, on Zulip):

I don't think there's a real expert right now, but we should form some :)

pnkfelix (May 22 2020 at 14:45, on Zulip):

Maybe this would blend nicely with my hope to make a WG-incr-comp

pnkfelix (May 22 2020 at 14:45, on Zulip):

of course codegen units are a broader topic than incr comp

pnkfelix (May 22 2020 at 14:45, on Zulip):

but they are clearly closely coupled

Wesley Wiser (May 22 2020 at 14:46, on Zulip):

There's definitely a strong relationship there

Félix Fischer (May 22 2020 at 14:46, on Zulip):

Yas. And I think CGU's most relevant target are incremental compilation scenarios

Wesley Wiser (May 22 2020 at 14:47, on Zulip):

So I think we've talked a bit about the first question and it sounds like we don't have an expert but we want to have some.

Wesley Wiser (May 22 2020 at 14:47, on Zulip):

It sounds like @nikomatsakis and maybe @pnkfelix probably understand a bit about the overall backend and how it interacts with LLVM

Wesley Wiser (May 22 2020 at 14:47, on Zulip):

?

nikomatsakis (May 22 2020 at 14:48, on Zulip):

I have a decent 'big picture' view

Wesley Wiser (May 22 2020 at 14:48, on Zulip):

So perhaps they would be good to talk to about "big picture" questions?

Wesley Wiser (May 22 2020 at 14:48, on Zulip):

Oh

nikomatsakis (May 22 2020 at 14:48, on Zulip):

and I talked a lot to @mw when he was doing the design work

pnkfelix (May 22 2020 at 14:48, on Zulip):

I think I do. I'm embarrassed that I didn't really grok the merging strategy until now

nikomatsakis (May 22 2020 at 14:48, on Zulip):

but it's a bit out of cache and I know there were tweaks in the interim

Wesley Wiser (May 22 2020 at 14:49, on Zulip):

I think I'm coming up to speed on the actual algorithm but I lack a lot of background knowledge about LLVM

Wesley Wiser (May 22 2020 at 14:49, on Zulip):

So having somebody to bounce those kinds of questions off of would be helpful.

nikomatsakis (May 22 2020 at 14:49, on Zulip):

so when it comes to the specifics of the partitioning strategy I suspect you all grok it more

pnkfelix (May 22 2020 at 14:49, on Zulip):

all of my LLVM knowledge has been learned on demand in order to resolve those ThinLTO+incremental linking bugs

Wesley Wiser (May 22 2020 at 14:49, on Zulip):

Maybe we really should form that working group...

Wesley Wiser (May 22 2020 at 14:50, on Zulip):

Ok, anyway

pnkfelix (May 22 2020 at 14:50, on Zulip):

I think I'll draft a proposal (RFC, whatever the protocol is) today for a WG

Wesley Wiser (May 22 2020 at 14:50, on Zulip):

Does anyone have ideas for improving the current algorithm?

Wesley Wiser (May 22 2020 at 14:50, on Zulip):

So I've mentioned an idea I had but I don't want to monopolize the conversation

pnkfelix (May 22 2020 at 14:51, on Zulip):

i have ideas, yes. But I also think there must be tons of prior work in this area?

Wesley Wiser (May 22 2020 at 14:51, on Zulip):

That's a good point.

Wesley Wiser (May 22 2020 at 14:52, on Zulip):

Is there standard terminology for this? I thought our codegen unit partitioning terminology was rustc specific.

Wesley Wiser (May 22 2020 at 14:52, on Zulip):

Perhaps I should do some more research

pnkfelix (May 22 2020 at 14:52, on Zulip):

In terms of the literature, I'm betting "compilation unit" is popular

Wesley Wiser (May 22 2020 at 14:53, on Zulip):

That's very helpful

pnkfelix (May 22 2020 at 14:53, on Zulip):

but then maybe I'm wrong, in terms of whether a bunch of people have researched partitioning schemes there

pnkfelix (May 22 2020 at 14:53, on Zulip):

(versus just assuming "use whatever the user gave to you")

Wesley Wiser (May 22 2020 at 14:53, on Zulip):

So maybe that's worth talking about a bit?

Wesley Wiser (May 22 2020 at 14:53, on Zulip):

I guess I was implicitly assuming this was something that rustc should do

pnkfelix (May 22 2020 at 14:53, on Zulip):

/me is Asking Jeeves

pnkfelix (May 22 2020 at 14:54, on Zulip):

I think it could be very useful to at least try out a per-file scheme

pnkfelix (May 22 2020 at 14:54, on Zulip):

at the very least, such a scheme should mean that changing a span will only pollute the other functions in your initial cgu

Wesley Wiser (May 22 2020 at 14:54, on Zulip):

Would making the user have a more direct responsibility in this regard be desired/acceptable?

pnkfelix (May 22 2020 at 14:54, on Zulip):

this "merging minimum size cgus" thing means that changes to spans can pollute pretty much anywhere

davidtwco (May 22 2020 at 14:54, on Zulip):

Isn't that what the current scheme approximates by being per-module? (of course, it then gets split into generic/non-generic and then merged again)

pnkfelix (May 22 2020 at 14:54, on Zulip):

(right?)

pnkfelix (May 22 2020 at 14:55, on Zulip):

davidtwco said:

Isn't that what the current scheme approximates by being per-module? (of course, it then gets split into generic/non-generic and then merged again)

Well I was figuring that one file may have many modules

Wesley Wiser (May 22 2020 at 14:55, on Zulip):

Yeah, that's my understanding

pnkfelix (May 22 2020 at 14:55, on Zulip):

but you're right, @davidtwco ,that current best practices for rust may imply one module per file anyway

nikomatsakis (May 22 2020 at 14:55, on Zulip):

@pnkfelix I don't think per-file and per-module are appreciably different

nikomatsakis (May 22 2020 at 14:55, on Zulip):

and definitely when we settled on per-module the intent was to be "more or less the same" as per-file

nikomatsakis (May 22 2020 at 14:55, on Zulip):

that said, there are the twists like "generic vs non-generic", and I don't remember that well the motivation there

davidtwco (May 22 2020 at 14:56, on Zulip):

yeah, they definitely aren't the same thing, but my gut feeling is that the difference would be negligible.

Wesley Wiser (May 22 2020 at 14:56, on Zulip):

So

Wesley Wiser (May 22 2020 at 14:56, on Zulip):

4 minutes left

Wesley Wiser (May 22 2020 at 14:56, on Zulip):

I guess one last thing, if we have time, might be to talk about what @simulacrum mentioned earlier

Wesley Wiser (May 22 2020 at 14:56, on Zulip):

What kind of performance regressions are acceptable?

nikomatsakis (May 22 2020 at 14:56, on Zulip):

basically when we concocted the current scheme initially, the idea was

but it seems like it evolved a reasonable amount away from "simple and predictable" in the meantime

Wesley Wiser (May 22 2020 at 14:57, on Zulip):

Or what are the performance targets?

nikomatsakis (May 22 2020 at 14:57, on Zulip):

Yeah, that's awlays a really hard question toa nswer

Wesley Wiser (May 22 2020 at 14:57, on Zulip):
    What should the performance criteria be for merging changes?
        "No regressions": the default and the "safest" answer but makes it very challenging to land any improvements.
        "big enough to offset small losses"
        "bias toward improvements on large crates"
        others?
nikomatsakis (May 22 2020 at 14:57, on Zulip):

one thing I do recall

nikomatsakis (May 22 2020 at 14:57, on Zulip):

when we talked about this for e.g. NLL

Félix Fischer (May 22 2020 at 14:57, on Zulip):

(Fuck I lost connection. Okay I'm on my phone now. Will read the backlog)

nikomatsakis (May 22 2020 at 14:57, on Zulip):

we came up with some thresholds that were based on human perception

nikomatsakis (May 22 2020 at 14:57, on Zulip):

I'll have to go look back, but it came down to something like .. <10% people can't really tell

nikomatsakis (May 22 2020 at 14:57, on Zulip):

after that, there are gradations of how much they notice

nikomatsakis (May 22 2020 at 14:58, on Zulip):

so I think on this basis we tried to hold a strict line of 10%

pnkfelix (May 22 2020 at 14:58, on Zulip):

Wesley Wiser said:

Or what are the performance targets?

it would be nice, on this note, if we somehow indicated on perf.rlo how "important" each benchmark was, or some way to track back to who even cared about that case

nikomatsakis (May 22 2020 at 14:58, on Zulip):

with the general rule of like "but those should be rare"

nikomatsakis (May 22 2020 at 14:58, on Zulip):

I can go back and find the actual citations and so forth

nikomatsakis (May 22 2020 at 14:58, on Zulip):

but yes I agree with felix that "importance" is not uniform

nikomatsakis (May 22 2020 at 14:58, on Zulip):

I would say that "integration tests" are by far the most imp't

pnkfelix (May 22 2020 at 14:59, on Zulip):

as in things like serde?

nikomatsakis (May 22 2020 at 14:59, on Zulip):

yes, real-world crates and things

nikomatsakis (May 22 2020 at 14:59, on Zulip):

but some of the other tests were extracted based on patterns we saw in real life so it's a bit tricky :)

pnkfelix (May 22 2020 at 14:59, on Zulip):

right. stress tests are like microbenchmarks

davidtwco (May 22 2020 at 14:59, on Zulip):

Wesley Wiser said:

Or what are the performance targets?

It definitely depends on the goal - we want to maintain runtime performance for release builds but compile-time perf could probably slip there; but we want to maintain compile-times for debug builds but runtime perf could probably slip there.

pnkfelix (May 22 2020 at 14:59, on Zulip):

they can tell you interesting things, but you shoudn't necessarily make decisions based on changes to microbenchmarks/stress tests

Félix Fischer (May 22 2020 at 14:59, on Zulip):

I 300% agree with david there

Wesley Wiser (May 22 2020 at 15:00, on Zulip):

Yeah, that's why the idea of using different algorithms for different compilation modes seems so interesting to me.

Félix Fischer (May 22 2020 at 15:00, on Zulip):

I think it's ultimately kind of essential to do so

Wesley Wiser (May 22 2020 at 15:00, on Zulip):

Ok. We're at 11.

Wesley Wiser (May 22 2020 at 15:00, on Zulip):

I guess that's it?

davidtwco (May 22 2020 at 15:00, on Zulip):

One benefit of having the same algorithm is that when you optimise for it, you optimise for both debug and release.

pnkfelix (May 22 2020 at 15:00, on Zulip):

this was really great, @Wesley Wiser

Félix Fischer (May 22 2020 at 15:00, on Zulip):

Like... there are ridiculous optimizations that are cool, bu only make sense from compile time perspective in release mode

Wesley Wiser (May 22 2020 at 15:01, on Zulip):

Thanks, it felt very hap-hazard to me but that's probably just because I was "under the gun" to run the meeting. :slight_smile:

Wesley Wiser (May 22 2020 at 15:02, on Zulip):

If anyone has questions or stuff they want to talk about, I'm happy to chat and we can spin up a topic in #t-compiler or something

Félix Fischer (May 22 2020 at 15:02, on Zulip):

Thanks for everyone who was here. This conversation felt really insightful to me :)

Wesley Wiser (May 22 2020 at 15:02, on Zulip):

Have a great day/evening #t-compiler/meetings :wave:

Last update: Nov 25 2020 at 02:30UTC