Stream: t-compiler/wg-llvm

Topic: new pass manager


cuviper (Sep 10 2019 at 17:35, on Zulip):

Is anyone already working on the new pass manager? https://github.com/rust-lang/rust/issues/64289
If not, I would be interested to try this myself

Nikita Popov (Sep 10 2019 at 21:47, on Zulip):

Nope, not aware of anyone working on newpm yet. Feel free :)

Nikita Popov (Oct 02 2019 at 21:12, on Zulip):

Did you have any luck with newpm?

cuviper (Oct 03 2019 at 16:56, on Zulip):

I started poking at it a little, mostly to learn how the new bits work, nothing to run yet

cuviper (Oct 03 2019 at 16:56, on Zulip):

unfortunately I got sidetracked with other work

mw (Oct 07 2019 at 13:43, on Zulip):

Looking forward to seeing this in action. Apparently the new pass manager can improve optimization quality: https://groups.google.com/d/msg/llvm-dev/CZmUJC4gMjQ/004NTMEYBAAJ

cuviper (Oct 28 2019 at 17:48, on Zulip):

FYI this is still on my mind, but personal life has been intense lately

Vadim Petrochenkov (Jan 08 2020 at 11:09, on Zulip):

Is there any summary of how the new pass manager is different from the old one and why?
I know the new pass manager was a multi-year effort, but I used only the most basic functionality in the old pass manager myself, so I find it hard to imagine why it would be such a big deal.

Nikita Popov (Jan 08 2020 at 13:07, on Zulip):

I'm not really familiar either, but I believe the main difference and advantage is in how it manages analysis passes.

Nikita Popov (Jan 08 2020 at 13:08, on Zulip):

I believe those get computed more lazily now and also get invalidated more lazily (e.g. analyses are kept if a transform pass doesn't make any changes, even if it does not explicitly preserve that specific analysis).

Nikita Popov (Jan 08 2020 at 13:09, on Zulip):

And apparently some kinds of analysis couldn't be accessed in some types of transformation passes before ... or something.
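
(As a conceptual sketch of the behaviour described above, not the actual LLVM C++ API: analysis results are computed on first request, cached per function, and only thrown away when a transform reports that it changed something and did not explicitly preserve them. All names here are illustrative.)

    use std::collections::HashMap;

    type FuncId = usize;
    struct LoopInfo; // stand-in for some analysis result

    #[derive(Default)]
    struct AnalysisCache {
        loops: HashMap<FuncId, LoopInfo>,
    }

    impl AnalysisCache {
        // Computed lazily on first request, then reused by later passes.
        fn loop_info(&mut self, f: FuncId) -> &LoopInfo {
            self.loops.entry(f).or_insert_with(|| compute_loop_info(f))
        }

        // Only invalidate when the pass actually changed the function and
        // did not promise to keep this analysis valid.
        fn after_transform(&mut self, f: FuncId, changed: bool, preserves_loops: bool) {
            if changed && !preserves_loops {
                self.loops.remove(&f);
            }
        }
    }

    fn compute_loop_info(_f: FuncId) -> LoopInfo {
        LoopInfo
    }

    fn main() {
        let mut cache = AnalysisCache::default();
        let _ = cache.loop_info(0);             // computed once, then cached
        cache.after_transform(0, false, false); // no changes: cache kept
        cache.after_transform(0, true, false);  // changed: cache dropped
    }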

Wesley Wiser (Feb 24 2020 at 15:36, on Zulip):

I'm working on some MIR optimizations and I'm seeing a massive regression in compile times when my optimization is enabled, due to LLVM (on one particular test case). I'm still trying to debug this, but I've noticed that if I enable the new pass manager via -Znew-llvm-pass-manager, that seems to resolve the issue.

Is there a timeline for using the new pass manager by default or are we blocked on something? Is there anything I can do to help get that flipped on by default?

andjo403 (Feb 24 2020 at 22:26, on Zulip):

one reason to not make the switch is that not even clang is using the new pass manager yet

Wesley Wiser (Feb 24 2020 at 22:30, on Zulip):

Ah, I see. I was just reading in LLVM weekly that passes are still being ported to it. Perhaps "experimental pass manager" is a better term :slight_smile:

nagisa (Feb 24 2020 at 22:44, on Zulip):

what optimizations are those?

nagisa (Feb 24 2020 at 22:45, on Zulip):

MIR optimisations should first and foremost be for reducing compile times; optimising a generic IR will not produce super high quality code anyway, since LLVM will do that.

Wesley Wiser (Feb 24 2020 at 23:54, on Zulip):

The MIR inliner. It's a win for most of the rustc-perf benchmarks but deeply-nested shows an enormous slowdown when it's enabled.

Wesley Wiser (Feb 24 2020 at 23:56, on Zulip):

I think the core issue is that some of the iterator methods are currently just slightly too costly for LLVM to inline. After turning on the MIR inliner, they just squeeze under the cost threshold, which triggers an exponential blowup in the number of locals.

Wesley Wiser (Feb 24 2020 at 23:56, on Zulip):

I have a number of local tweaks that I think fix that at the MIR level but I'm still seeing LLVM take massive amounts of compilation time.

Wesley Wiser (Feb 24 2020 at 23:58, on Zulip):

I've tried following the guide here https://rust-lang.github.io/rustc-guide/codegen/debugging.html, but unfortunately if I set -Ccodegen-units=1 the slowdown goes away, and if I use opt on the individual files generated by --emit llvm-ir, there's no slowdown.

Wesley Wiser (Feb 24 2020 at 23:59, on Zulip):

So I'm not really sure where to go next. I've been starting to poke at the codegen partition code; maybe there's something that can be tweaked there? That's just speculation on my part though.

Wesley Wiser (Feb 25 2020 at 00:03, on Zulip):

I happened to notice that the new pass manager also resolves the issue but it looks like that's currently a dead end.

nagisa (Feb 25 2020 at 02:05, on Zulip):

Wesley Wiser said:

I happened to notice that the new pass manager also resolves the issue but it looks like that's currently a dead end.

Very likely because the "new pass manager" is not fully featured yet?

nagisa (Feb 25 2020 at 02:05, on Zulip):

Consider tweaking inlining thresholds in Rust?

Zoxc (Mar 01 2020 at 07:43, on Zulip):

@Wesley Wiser Have you tried extracting the LLVM IR for the codegen unit which slows down? Maybe you could report a performance problem upstream to LLVM?

Wesley Wiser (Mar 09 2020 at 21:03, on Zulip):

@Zoxc Sorry, just getting back to this. I do have the LLVM IR for the codegen unit which is slow. I'm not really sure how to get opt to do exactly what rustc is doing though so I don't have a clean repro.

I suspect a lot of this is self-inflicted and possibly not a bug in LLVM since we've slapped #[inline] on pretty much everything in std::iter

Wesley Wiser (Mar 09 2020 at 21:04, on Zulip):

If anyone has advice on invoking opt with the same set of optimization passes rustc uses, that would be very helpful! :)

Nikita Popov (Mar 10 2020 at 08:32, on Zulip):

I'm assuming you're using opt -O3 currently? You can try -Zprint-llvm-passes and pass the output of that to opt.

Nikita Popov (Mar 10 2020 at 08:36, on Zulip):

Regarding the original question, we should give the new pass manager another try after updating to LLVM 10. There are still a couple of issues I'm aware of, though, so I'm not sure whether we'll actually make the switch right away.

Zoxc (Mar 11 2020 at 11:39, on Zulip):

You want the IR before any LLVM passes run; I'm not sure if we have an easy way to output that.

Wesley Wiser (Mar 11 2020 at 13:14, on Zulip):

That's the -Cno-prepopulate-passes flag right? https://rustc-dev-guide.rust-lang.org/codegen/debugging.html#compiler-options-to-know-and-love

Zoxc (Mar 12 2020 at 14:22, on Zulip):

In theory, not sure it's accurate in practice =P

nagisa (Mar 12 2020 at 14:33, on Zulip):

relatively accurate. AFAIR we still just output an error if we would run any passes

Zoxc (Mar 12 2020 at 15:09, on Zulip):

@Wesley Wiser Do we have a limit for how large functions can be before we stop inlining?

Wesley Wiser (Mar 12 2020 at 15:17, on Zulip):

@Zoxc not currently no

Wesley Wiser (Mar 12 2020 at 15:41, on Zulip):

Let me brain dump what I've learned so far in case it helps spark an idea from anyone else:

The problematic test case is deeply-nested.

With MIR inlining enabled, we hand LLVM a couple thousand lines of IR for foo(), which it chews through very quickly (this isn't the compilation time issue). The problem is that because we box the value as Box<dyn Iterator<Item = ()>>, the collector also has to codegen all of the trait methods. It's this code that LLVM is choking on, because it spends a huge amount of time processing the Iterator::next() body after inlining all of the Chain adapters.

I suspect the reason LLVM inlining is triggering so aggressively here is that something in std::iter is currently just over the threshold to inline, so it doesn't get inlined, but with MIR inlining enabled it may now be just small enough to inline. I have not confirmed this though.
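
(Roughly the shape involved, as described above; this is not the exact deeply-nested benchmark source. Each .chain(empty()) wraps another Chain<..., Empty<()>> layer around the iterator, and boxing it as a trait object forces codegen of Iterator::next for the whole nested type.)

    use std::iter::empty;

    fn foo() -> Box<dyn Iterator<Item = ()>> {
        Box::new(
            empty()
                .chain(empty())
                .chain(empty())
                .chain(empty())
                .chain(empty()), // the real benchmark nests this much deeper
        )
    }

    fn main() {
        assert_eq!(foo().count(), 0);
    }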

Zoxc (Mar 12 2020 at 16:14, on Zulip):

You could try adding a max function size and stopping inlining into such functions. It might help with the regression, since you don't seem too sure about the cause.

We could also just let deeply-nested regress, since it's only a synthetic benchmark. It would be interesting to see whether adding or removing .chain(empty())s changes the regression.
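
(A minimal sketch of what such a cap could look like; the types and the threshold here are illustrative, not rustc's actual MIR inliner code.)

    // Illustrative stand-ins for MIR structures.
    struct BasicBlockData {
        statements: Vec<()>,
    }

    struct Body {
        basic_blocks: Vec<BasicBlockData>,
    }

    // Made-up threshold: refuse to inline anything more into a caller once
    // it has grown past this many MIR statements.
    const MAX_CALLER_STATEMENTS: usize = 1_000;

    fn may_inline_into(caller: &Body) -> bool {
        let size: usize = caller
            .basic_blocks
            .iter()
            .map(|bb| bb.statements.len())
            .sum();
        size < MAX_CALLER_STATEMENTS
    }

    fn main() {
        let small = Body {
            basic_blocks: vec![BasicBlockData { statements: vec![(); 10] }],
        };
        assert!(may_inline_into(&small));
    }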

Last update: Jul 03 2020 at 16:50UTC