Stream: t-compiler/wg-incr-comp

Topic: cgu partitioning brain dump


view this post on Zulip Wesley Wiser (Aug 06 2020 at 13:20):

Hey @Félix Fischer! :wave:

view this post on Zulip Wesley Wiser (Aug 06 2020 at 13:23):

So I guess the thing to start with is this PR https://github.com/rust-lang/rust/pull/74275, which is a relatively straightforward refactoring of the partitioning module.

view this post on Zulip Wesley Wiser (Aug 06 2020 at 13:24):

My main goal here was to make it easier for us to add new partitioning schemes for testing purposes. I'm not sure if we actually want different schemes in the future, but making changes to the existing one just to test things out can be tricky: you can break the bootstrap and get much more complex errors than when you're testing simpler crates.

view this post on Zulip Wesley Wiser (Aug 06 2020 at 13:29):

Locally, I have a new scheme that is quite a bit simpler than the current one. Instead of partitioning the code initially by module/crate/etc. (see characteristic_def_id_of_type), it first looks at the InliningMap to see what monomorphic instantiations each item will bring in. It then uses the cost of the item and all of its dependencies to create CGUs which are (hopefully) close in size. After that, we do the standard merge process where we take the two smallest CGUs and merge them together until we have the correct number of CGUs.
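
As a rough sketch of that merge step (simplified, made-up types rather than the actual rustc ones):

```rust
// Minimal sketch of "merge the two smallest CGUs until we hit the target
// count". `Cgu` and `size_estimate` are stand-ins, not rustc types; assumes
// `target_count >= 1`.
struct Cgu {
    name: String,
    size_estimate: usize,
}

fn merge_smallest(mut cgus: Vec<Cgu>, target_count: usize) -> Vec<Cgu> {
    while cgus.len() > target_count {
        // Largest first, so the two smallest CGUs sit at the end.
        cgus.sort_by_key(|cgu| std::cmp::Reverse(cgu.size_estimate));
        let smallest = cgus.pop().unwrap();
        let second_smallest = cgus.last_mut().unwrap();
        // "Merging" here just combines sizes and names; the real code also
        // moves the mono items across.
        second_smallest.size_estimate += smallest.size_estimate;
        second_smallest.name.push_str("--");
        second_smallest.name.push_str(&smallest.name);
    }
    cgus
}
```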

view this post on Zulip Wesley Wiser (Aug 06 2020 at 13:30):

I was hoping to see an improvement over the current algorithm for debug builds, but unfortunately it's slower. I've mostly been testing by building a copy of the regex crate locally and I think it's currently about 15 - 20% slower than the regular scheme.

view this post on Zulip Wesley Wiser (Aug 06 2020 at 13:31):

The code is quite a bit simpler IMO so this may just mean we need to do some additional tuning to close the gap.

view this post on Zulip Wesley Wiser (Aug 06 2020 at 13:32):

So, some ideas I have that might be worth exploring:

view this post on Zulip Wesley Wiser (Aug 06 2020 at 13:47):

So that's some ideas

view this post on Zulip Wesley Wiser (Aug 06 2020 at 13:47):

Um

view this post on Zulip Wesley Wiser (Aug 06 2020 at 13:48):

I guess there are still some things I'm not sure about which would be worth understanding more thoroughly, and (ideally) we could feed that back into the rustc dev guide.

view this post on Zulip Wesley Wiser (Aug 06 2020 at 13:48):

I have general ideas about what the partitioning (minus merging) will do but it would be great to have some basic examples that show exactly what's going on.

view this post on Zulip Wesley Wiser (Aug 06 2020 at 13:55):

For example, I'd love to have documented what the behavior is when you have:

Crate a                        | Crate b
Non-generic function           | Uses once in a module
Non-generic function           | Uses in two different modules
Non-generic #[inline] function | Uses once in a module
Non-generic #[inline] function | Uses in two different modules
Generic function               | Uses once in a module
Generic function               | Uses in two different modules
Generic #[inline] function     | Uses once in a module
Generic #[inline] function     | Uses in two different modules
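
To pin down one of those rows, here is what the "generic #[inline] function, used from two different modules" case could look like as a pair of hypothetical crates `a` and `b` (the question to document being which CGUs the instantiation of `a::generic::<u32>` lands in):

```rust
// a/src/lib.rs -- the defining crate
#[inline]
pub fn generic<T: std::fmt::Debug>(value: T) {
    println!("{:?}", value);
}
```

```rust
// b/src/lib.rs -- the using crate (depends on `a`)
pub mod first {
    pub fn call() {
        a::generic(1u32);
    }
}

pub mod second {
    pub fn call() {
        // Same instantiation, `a::generic::<u32>`, used from a second module.
        a::generic(1u32);
    }
}
```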

view this post on Zulip Wesley Wiser (Aug 06 2020 at 14:01):

Other random things I'm just now thinking of:

view this post on Zulip bjorn3 (Aug 06 2020 at 14:03):

Wesley Wiser said:

I think the main difference between cg_clif and cg_llvm is that cg_llvm seems to force stack usage a bit more and then expects LLVM's mem2reg pass to clean up after it. cg_clif tries hard to prevent the stack usage during IR generation because it doesn't have a mem2reg-like pass. This doesn't affect IR size that much, I think, and the current size estimator (https://github.com/rust-lang/rust/blob/6e87bacd37539b7e7cd75152dffd225047fa983a/src/librustc_ty/ty.rs#L306) doesn't account for this in any way.
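
For intuition only, a toy stand-in for a "count the statements" size estimate (this is NOT the rustc function linked above, just the shape of such a heuristic): any count like this weighs stack loads/stores that a later pass such as mem2reg will delete exactly the same as instructions that survive.

```rust
// Toy cost model: estimated size = statements per basic block, plus one per
// block for its terminator. Purely illustrative types.
struct BasicBlock {
    statements: Vec<String>,
}

struct Body {
    basic_blocks: Vec<BasicBlock>,
}

fn size_estimate(body: &Body) -> usize {
    body.basic_blocks
        .iter()
        .map(|bb| bb.statements.len() + 1) // +1 for the block terminator
        .sum()
}
```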

view this post on Zulip Wesley Wiser (Aug 06 2020 at 14:06):

Thanks @bjorn3, that's good info!

view this post on Zulip Félix Fischer (Aug 06 2020 at 15:38):

I would like to know what kinds of tradeoffs there are between <size of CGUs> and <incremental compile times>

view this post on Zulip Félix Fischer (Aug 06 2020 at 15:40):

Like, I wonder what would happen if, in debug mode for example, we lifted the restriction on the number of CGUs

view this post on Zulip Félix Fischer (Aug 06 2020 at 15:41):

And just decided to skip the part where we "merge them based on the total number of CGUs"

view this post on Zulip Félix Fischer (Aug 06 2020 at 15:42):

I think this could be useful for two reasons:

view this post on Zulip pnkfelix (Aug 06 2020 at 15:43):

Wesley Wiser said:

By the way, @Wesley Wiser, do we currently gather any data on how (in)accurate our code size estimates are overall? For example, is there a debug flag one could use to get a table with a row for each function (or maybe each CGU?) where the columns are 1. estimated code size, 2. actual code size in LLVM IR instructions, and 3. actual code size in machine instructions and/or bytes?
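
No such debug flag is mentioned anywhere above; purely as a sketch of the data pnkfelix is describing, the report could be as simple as one row per function (or per CGU):

```rust
// Hypothetical shape of the size-estimate accuracy report; none of these
// names exist in rustc today.
struct SizeEstimateRow {
    name: String,
    estimated_size: usize,       // rustc's cost estimate
    llvm_ir_instructions: usize, // actual size of the generated LLVM IR
    machine_code_bytes: usize,   // actual size in the final object file
}

fn print_report(rows: &[SizeEstimateRow]) {
    println!("{:<40} {:>10} {:>10} {:>12}", "item", "estimate", "llvm-ir", "mc-bytes");
    for row in rows {
        println!(
            "{:<40} {:>10} {:>10} {:>12}",
            row.name, row.estimated_size, row.llvm_ir_instructions, row.machine_code_bytes
        );
    }
}
```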

view this post on Zulip Wesley Wiser (Aug 06 2020 at 15:45):

I don't think we do, but we certainly should!

view this post on Zulip Félix Fischer (Aug 06 2020 at 15:45):

(Cont.)

view this post on Zulip Félix Fischer (Aug 06 2020 at 15:51):

@Wesley Wiser does that seem reasonable? I don't know what other tradeoffs there are between merging and not merging CGUs, but maybe in debug builds we could do without the merging step and make incr-comp both more fine-grained (i.e. more efficient) and more predictable

view this post on Zulip Wesley Wiser (Aug 06 2020 at 16:00):

It does seem reasonable overall (or, at least, for some experiments). A few thoughts:

Smaller CGUs do give increased opportunity for incremental recompilation to be fast (since there is less extra code being recompiled).

I don't think there's any fundamental reason why we have to merge CGUs; I think it's mostly just an improvement over the original partitioning we have now.

At least with the current scheme, every CGU has to contain all the mono items it needs to call (or they have to be available upstream). Having more CGUs will currently (I believe) increase the amount of mono item duplication.
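
To see that duplication directly, a small experiment (the -Zprint-mono-items flag is real; the exact output format is from memory, so treat the details as approximate):

```rust
// Compile on nightly with something like:
//     rustc --edition 2018 --crate-type lib -Zprint-mono-items=lazy lib.rs
// The output lists every mono item and, if memory serves, the CGU(s) it was
// assigned to, so you can see whether `generic::<u8>` ends up in a single
// CGU or is duplicated into both of the CGUs that use it.
pub mod first {
    pub fn f() -> usize {
        crate::generic(1u8)
    }
}

pub mod second {
    pub fn g() -> usize {
        // Same instantiation as in `first`, reached from another module.
        crate::generic(1u8)
    }
}

pub fn generic<T>(_x: T) -> usize {
    std::mem::size_of::<T>()
}
```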

view this post on Zulip Félix Fischer (Aug 06 2020 at 16:04):

Interesting

view this post on Zulip Félix Fischer (Aug 06 2020 at 16:05):

I think it's fine to have mono item duplication in cases where it helps compilation times

view this post on Zulip Félix Fischer (Aug 06 2020 at 16:06):

I would still merge those CGUs if it turns out that merging is better for their compile times

view this post on Zulip Félix Fischer (Aug 06 2020 at 16:08):

I think that in the general case, merging makes sense when you want a certain level of optimization across big swaths of code (because LLVM can optimize as much as it wants inside one CGU). For incr-comp, this would mean that release mode can be tuned to have more or fewer CGUs, and therefore fewer or more optimization opportunities.
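
A small illustration of why CGU boundaries matter for optimization (hypothetical code; whether the two items really end up in different CGUs depends on the module-based partitioning described earlier):

```rust
// If `util::helper` and `hot_loop` land in different codegen units, LLVM
// optimizes them as separate modules and cannot inline `helper` into
// `hot_loop` (absent #[inline] or LTO). Merging their CGUs, or building
// with codegen-units=1, restores that opportunity.
mod util {
    pub fn helper(x: u64) -> u64 {
        x.wrapping_mul(2654435761).rotate_left(13)
    }
}

pub fn hot_loop(data: &[u64]) -> u64 {
    data.iter().copied().map(util::helper).fold(0, u64::wrapping_add)
}
```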

view this post on Zulip Félix Fischer (Aug 06 2020 at 16:09):

Because in release mode we could want to have more LLVM optimizations available

view this post on Zulip Félix Fischer (Aug 06 2020 at 16:10):

(Even at the cost of compile times)

view this post on Zulip Félix Fischer (Aug 06 2020 at 16:10):

(Depending on how performant we want incremental release binaries to be)

view this post on Zulip Félix Fischer (Aug 06 2020 at 16:11):

For debug builds I would disable any merging that does not help with compile times, however :3

view this post on Zulip Wesley Wiser (Aug 07 2020 at 18:13):

Yeah, there's a lot of complexity here, so I think we certainly want our changes to be well motivated both from a theoretical perspective and by concrete performance benchmarks.

view this post on Zulip Félix Fischer (Aug 07 2020 at 23:21):

Indeed

view this post on Zulip Félix Fischer (Aug 07 2020 at 23:21):

Mmm, now I want to make more performance tests

view this post on Zulip Félix Fischer (Aug 07 2020 at 23:22):

But I don't know how much I should, since I can't focus on both this and helping Nick with the benchmark suite

view this post on Zulip Félix Fischer (Aug 07 2020 at 23:22):

(Especially after uni is done processing my thesis application)

view this post on Zulip Félix Fischer (Aug 07 2020 at 23:23):

(I think I'll have space for just one of the two then)

view this post on Zulip Wesley Wiser (Aug 07 2020 at 23:43):

That's fine! Is there anything in particular you'd like to work on next so I don't start on that? If you're busy with other stuff right now, that's totally fine too :smile:

view this post on Zulip Félix Fischer (Aug 08 2020 at 01:00):

I think I'm currently too busy with other stuff to start programming right away, but maybe I could start reading the relevant bits of the CGU partitioning code

view this post on Zulip Félix Fischer (Aug 08 2020 at 01:00):

How big is that code?

view this post on Zulip Félix Fischer (Aug 10 2020 at 15:01):

Also which files should one read? @Wesley Wiser :)

view this post on Zulip Wesley Wiser (Aug 10 2020 at 17:03):

@Félix Fischer https://github.com/rust-lang/rust/blob/master/src/librustc_mir/monomorphize/partitioning.rs

The doc comment is 93 lines but the total file is about 1000 lines.

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:04):

Is everything in that file?

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:04):

:3

view this post on Zulip Wesley Wiser (Aug 10 2020 at 17:05):

Pretty much everything related to CGU partitioning, yeah

view this post on Zulip davidtwco (Aug 10 2020 at 17:11):

A thought I’ve just had - can we leverage knowledge about the circumstances in which compiler shims are generated to put those mono items in compilation units that make more sense? (e.g. maybe drop shims don’t change much and aren’t often removed or replaced, so should be in their own codegen unit; but function pointer shims on the other hand are often changed alongside the mono item which spawned them?)

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:19):

What is a shim in this context?

view this post on Zulip davidtwco (Aug 10 2020 at 17:26):

rustc generates MIR bodies in some contexts, for example: implementations of drop_in_place<T>, or of Fn* trait functions for a given function pointer.
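
A hedged illustration of where these shims come from (ordinary user code; the shim names in the comments are the compiler-internal instances this is expected to pull in, not anything visible in the source):

```rust
fn call_twice(f: impl Fn(u32) -> u32, x: u32) -> u32 {
    f(f(x))
}

fn double(x: u32) -> u32 {
    x * 2
}

fn main() {
    // Passing a plain function pointer where an `impl Fn` is expected makes
    // the compiler synthesize a small MIR shim implementing the Fn* traits
    // for `fn(u32) -> u32`.
    let f: fn(u32) -> u32 = double;
    println!("{}", call_twice(f, 3));

    // Dropping the Box pulls in the compiler-generated
    // `core::ptr::drop_in_place::<String>` shim.
    let boxed = Box::new(String::from("shim demo"));
    drop(boxed);
}
```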

view this post on Zulip davidtwco (Aug 10 2020 at 17:27):

We know the context in which these shims get created, and that information might be leveraged by the partitioning.

view this post on Zulip lcnr (Aug 10 2020 at 17:45):

would moving all the drop_in_place shims into one CGU result in recompiles of the whole CGU if we use a new instance of a generic type?

view this post on Zulip lcnr (Aug 10 2020 at 17:46):

without polymorphization we need separate drop_in_place instantiations for Vec<u32> and Vec<u16>, don't we?
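
For example (plain Rust; the comments describe the drop glue this is expected to instantiate): each concretely-dropped type gets its own drop_in_place instantiation when polymorphization is off.

```rust
fn main() {
    let a: Vec<u32> = vec![1, 2, 3];
    let b: Vec<u16> = vec![4, 5, 6];
    drop(a); // instantiates core::ptr::drop_in_place::<Vec<u32>>
    drop(b); // instantiates core::ptr::drop_in_place::<Vec<u16>>
}
```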

view this post on Zulip davidtwco (Aug 10 2020 at 17:52):

Yeah - my examples were very much artificial, I haven’t thought much about what useful partitioning we could do if we leveraged this information, just that we could leverage that information.

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:52):

@lcnr that feels accurate to me

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:53):

Ahh, okay, I understand the context now, @davidtwco :3

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:54):

I think it is a good idea to keep in the list

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:54):

Do we have a list?

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:55):

(Like a hackmd)

view this post on Zulip davidtwco (Aug 10 2020 at 17:55):

It’s possible though that because those shims are constructed from just a type that much of the compiler wouldn’t need to be invoked - so even if they were recompiled more often, it might be really quick every time. Again, just speculation, but :shrug:

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:55):

Yeah we need more measurement to really know without fiddling with the partitioning and seeing what happens

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:56):

But I think it 100% makes sense

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:56):

And also it would seem to me that it's closely related to polymorphization

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:56):

As @lcnr said

view this post on Zulip lcnr (Aug 10 2020 at 17:57):

are we trying to polymorphize shims rn? :thinking: from what I can tell we aren't doing so rn...

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:57):

Like maybe with PM enabled, most of this optimization would not be needed, I think

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:58):

@lcnr is polymorphization already in-tree?

view this post on Zulip lcnr (Aug 10 2020 at 17:58):

ye, but off by default

view this post on Zulip lcnr (Aug 10 2020 at 17:58):

-Zpolymorphize=on

view this post on Zulip davidtwco (Aug 10 2020 at 17:58):

@lcnr we do, they’re not treated separately but we’ve made changes in the past to shim building (e.g. #75346) that enables them to be polymorphic.

view this post on Zulip Félix Fischer (Aug 10 2020 at 17:58):

I seem to remember that last week it was not enabled... ahh, you're right. In-tree but off by default, yas

view this post on Zulip Félix Fischer (Aug 10 2020 at 18:00):

@davidtwco nice! :D

view this post on Zulip lcnr (Aug 10 2020 at 18:00):

ahh, the reason why drop_in_place isn't polymorphized is then because it always uses a pointer to T?

view this post on Zulip davidtwco (Aug 10 2020 at 18:00):

I assume so, I’ve never dug into the MIR for it.

view this post on Zulip Félix Fischer (Aug 10 2020 at 18:02):

So assuming all shims are eventually captured by polymorphization, this optimization would mainly be useful when compiling in release mode, right? Since (I'm assuming) release would turn off PM in order to get the most runtime performance available

view this post on Zulip lcnr (Aug 10 2020 at 18:04):

PM does not have a negative runtime impact

view this post on Zulip lcnr (Aug 10 2020 at 18:05):

we actually get some benefits because the binary is smaller (depending on how clever LLVM is, I guess :sweat_smile: )
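
To make the "fewer copies" point concrete, a hedged sketch of the kind of function the analysis targets (not tied to the actual -Zpolymorphize implementation):

```rust
// `unused` never touches its type parameter, so every instantiation would
// compile to identical machine code and can, in principle, share one copy.
// `used` genuinely depends on T, so its instantiations must stay separate.
fn unused<T>() -> u32 {
    42
}

fn used<T>() -> usize {
    std::mem::size_of::<T>()
}

fn main() {
    // Without polymorphization: three mono items for `unused`; with it,
    // they can collapse into one.
    let _ = (unused::<u8>(), unused::<u16>(), unused::<String>());
    let _ = (used::<u8>(), used::<u16>());
}
```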

view this post on Zulip lcnr (Aug 10 2020 at 18:05):

We do not virtualize function calls with polymorphization

view this post on Zulip davidtwco (Aug 10 2020 at 18:06):

Polymorphization just generates fewer copies of functions; it shouldn't impact runtime performance. Polymorphization is run on all shims, but that's not to say that they will be polymorphized - it depends on how they use their generic parameters (if they have any).

If you have more questions, we can chat in #t-compiler/wg-polymorphization to keep this chat on-topic.

view this post on Zulip Félix Fischer (Aug 10 2020 at 18:07):

We do not virtualize function calls with polymorphization

Holy cow. I do not understand how you pulled it off, then, David. I will surely go ask! Thanks for the time :heart:

view this post on Zulip Félix Fischer (Aug 10 2020 at 18:32):

Okay now I understand. We'd never disable polymorphization for runtime performance reasons :D

view this post on Zulip Félix Fischer (Aug 10 2020 at 18:33):

And at the same time, there are shims that will definitely not be polymorphized

view this post on Zulip davidtwco (Aug 10 2020 at 18:51):

@lcnr I just realised - I don’t think we do actually polymorphize shims. I’m misremembering because we did have to make them capable of dealing with generic params (if the instance that spawned them was polymorphized). Instance::polymorphize has a check for InstanceDef::Item (ought to look into if that is necessary). That’s the last I’ll say about polymorphization here though to avoid going off topic.

view this post on Zulip Josh Triplett (Aug 17 2020 at 06:43):

@davidtwco There is one way in which polymorphization would impact runtime performance: it reduces code size and thus instruction cache footprint.

view this post on Zulip Aman Arora (Aug 31 2020 at 20:15):

Wesley Wiser said:

For example, I'd love to have documented what the behavior is when you have:

Crate a                        | Crate b
Non-generic function           | Uses once in a module
Non-generic function           | Uses in two different modules
Non-generic #[inline] function | Uses once in a module
Non-generic #[inline] function | Uses in two different modules
Generic function               | Uses once in a module
Generic function               | Uses in two different modules
Generic #[inline] function     | Uses once in a module
Generic #[inline] function     | Uses in two different modules

Sorry I got occupied last week. @Wesley Wiser can you comment on how I can verify the behavior for these?

view this post on Zulip Santiago Pastorino (Aug 31 2020 at 20:19):

would be great to keep filling rustc-dev-guide with more info ;)

view this post on Zulip Aman Arora (Sep 03 2020 at 08:27):

Which part of the dev guide should I add the results of this testing to? I can't seem to find a section dedicated to incremental compilation (there is a section under the query system), and the only detail I could find about partitioning was a comment on this page

view this post on Zulip Aman Arora (Sep 03 2020 at 08:28):

As for the actual results, I added more details on the issue itself: https://github.com/rust-lang/wg-incr-comp/issues/2

view this post on Zulip Wesley Wiser (Sep 03 2020 at 15:05):

Apologies to everyone trying to get a hold of me the last few days. I've been feeling a bit sick and haven't spent much time on Rust stuff. I'm feeling better now so hopefully I can dig out of the pile of Zulip and GitHub notifications over the next day or two! :smile:

view this post on Zulip Wesley Wiser (Sep 03 2020 at 15:05):

@Aman Arora Sorry, I'll take a look at what you wrote on GitHub later today.

view this post on Zulip Aman Arora (Sep 08 2020 at 20:06):

rustc-dev-guide merged https://github.com/rust-lang/rustc-dev-guide/pull/847. https://github.com/rust-lang/wg-incr-comp/issues/2 can be closed.

view this post on Zulip Aman Arora (Sep 08 2020 at 20:06):

The information can be found here: https://rustc-dev-guide.rust-lang.org/backend/monomorph.html?highlight=mono#codegen-unit-cgu-partitioning

view this post on Zulip Wesley Wiser (Sep 09 2020 at 00:00):

Great work! :tada:

