Stream: t-compiler/wg-mir-opt

Topic: MIR inliner can finally complete a bootstrap


Wesley Wiser (Jan 14 2020 at 13:31, on Zulip):

With a simple patch to turn the MIR inliner on, ./x.py build now completes successfully! There's 4 failing UI tests which seem to be more or less expected but the rest of the tests seem to pass.

Wesley Wiser (Jan 14 2020 at 13:33, on Zulip):

Failures:

---- [ui] ui/async-await/async-closure.rs stdout ----

error: test compilation failed although it shouldn't!
status: exit code: 101
command: "/home/wesley/code/rust/rust/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/home/wesley/code/rust/rust/src/test/ui/async-await/async-closure.rs" "-Zthreads=1" "--target=x86_64-unknown-linux-gnu" "--error-format" "json" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-C" "prefer-dynamic" "-o" "/home/wesley/code/rust/rust/build/x86_64-unknown-linux-gnu/test/ui/async-await/async-closure/a" "-Crpath" "-Cdebuginfo=0" "-Zunstable-options" "-Lnative=/home/wesley/code/rust/rust/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--edition=2018" "-L" "/home/wesley/code/rust/rust/build/x86_64-unknown-linux-gnu/test/ui/async-await/async-closure/auxiliary"
stdout:
------------------------------------------

------------------------------------------
stderr:
------------------------------------------
error: internal compiler error: src/librustc/traits/codegen/mod.rs:57: Encountered error `OutputTypeParameterMismatch(Binder(<[closure@/home/wesley/code/rust/rust/src/test/ui/async-await/async-closure.rs:50:5: 53:7] as std::ops::Fn<(u8,)>>), Binder(<[closure@/home/wesley/code/rust/rust/src/test/ui/async-await/async-closure.rs:50:5: 53:7] as std::ops::Fn<(u8,)>>), Sorts(ExpectedFound { expected: impl std::future::Future, found: std::future::GenFuture<[static generator@/home/wesley/code/rust/rust/src/test/ui/async-await/async-closure.rs:50:31: 53:6 x:u8 {WakeOnceThenComplete, WakeOnceThenComplete, ()}]> }))` selecting `Binder(<[closure@/home/wesley/code/rust/rust/src/test/ui/async-await/async-closure.rs:50:5: 53:7] as std::ops::Fn<(u8,)>>)` during codegen

thread 'rustc' panicked at 'Box<Any>', src/librustc_errors/lib.rs:872:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.42.0-dev running on x86_64-unknown-linux-gnu

note: compiler flags: -Z threads=1 -Z ui-testing -Z deduplicate-diagnostics=no -Z unstable-options -C prefer-dynamic -C rpath -C debuginfo=0

error: aborting due to previous error


------------------------------------------
Wesley Wiser (Jan 14 2020 at 13:33, on Zulip):
---- [ui] ui/type_length_limit.rs stdout ----

error: ui test compiled successfully!
status: exit code: 0
command: "/home/wesley/code/rust/rust/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/home/wesley/code/rust/rust/src/test/ui/type_length_limit.rs" "-Zthreads=1" "--target=x86_64-unknown-linux-gnu" "--error-format" "json" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-C" "prefer-dynamic" "--out-dir" "/home/wesley/code/rust/rust/build/x86_64-unknown-linux-gnu/test/ui/type_length_limit" "-Crpath" "-Cdebuginfo=0" "-Zunstable-options" "-Lnative=/home/wesley/code/rust/rust/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/home/wesley/code/rust/rust/build/x86_64-unknown-linux-gnu/test/ui/type_length_limit/auxiliary" "-A" "unused"
stdout:
------------------------------------------

------------------------------------------
stderr:
------------------------------------------

------------------------------------------


---- [ui] ui/backtrace-debuginfo.rs stdout ----

error: test run failed!
status: exit code: 101
command: "/home/wesley/code/rust/rust/build/x86_64-unknown-linux-gnu/test/ui/backtrace-debuginfo/a"
stdout:
------------------------------------------
---------------------------------------
trace does not match position list
still need to find ["backtrace-debuginfo.rs:189", "backtrace-debuginfo.rs:126", "backtrace-debuginfo.rs:109"]

--- stdout
backtrace-debuginfo.rs:109
backtrace-debuginfo.rs:126
backtrace-debuginfo.rs:189

--- stderr
test case 6
thread 'main' panicked at 'explicit panic', /home/wesley/code/rust/rust/src/test/ui/backtrace-debuginfo.rs:107:9
stack backtrace:
   0:     0x7fb2ec036603 - backtrace::backtrace::trace_unsynchronized::h92d5dd857c57941c
                               at /home/wesley/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1:     0x7fb2ec036603 - std::sys_common::backtrace::_print_fmt::h415534bd2e78d38a
                               at src/libstd/sys_common/backtrace.rs:77
   2:     0x7fb2ec036603 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h25a4adad3f103ff1
                               at src/libstd/sys_common/backtrace.rs:59
   3:     0x7fb2ec0b185d - core::fmt::write::hee5e361c086e3d9f
                               at src/libcore/fmt/mod.rs:1057
   4:     0x7fb2ec071807 - std::io::Write::write_fmt::hf03f4a9da07cb0fb
                               at src/libstd/io/mod.rs:1426
   5:     0x7fb2ec037d46 - std::sys_common::backtrace::_print::hb0f7a784c46e475f
                               at src/libstd/sys_common/backtrace.rs:62
   6:     0x7fb2ec037d46 - std::sys_common::backtrace::print::ha4ea7a4bc3181ec8
                               at src/libstd/sys_common/backtrace.rs:49
   7:     0x7fb2ec037d46 - std::panicking::default_hook::{{closure}}::ha321f4d5aa59349d
                               at src/libstd/panicking.rs:195
   8:     0x7fb2ec037967 - std::panicking::default_hook::hacbad13ea406746e
                               at src/libstd/panicking.rs:215
   9:     0x7fb2ec0388ec - std::panicking::rust_panic_with_hook::h799d29f0de040960
                               at src/libstd/panicking.rs:463
  10:     0x558504f8fee7 - std::panicking::begin_panic::habeca5b7aae9ee28
                               at /home/wesley/code/rust/rust/src/libstd/panicking.rs:390
  11:     0x558504f9fbf4 - backtrace_debuginfo::inner_inlined::h77ab595dd670898f
                               at /home/wesley/code/rust/rust/src/test/ui/backtrace-debuginfo.rs:107
  12:     0x558504f9fbf4 - backtrace_debuginfo::outer::h681fb285db800a84
                               at /home/wesley/code/rust/rust/src/test/ui/backtrace-debuginfo.rs:126
  13:     0x558504fa15a6 - backtrace_debuginfo::main::h068f900789b8ef4b
                               at /home/wesley/code/rust/rust/src/test/ui/backtrace-debuginfo.rs:189
  14:     0x558504f8cd52 - std::rt::lang_start::{{closure}}::hbaee571eae29b588
                               at /home/wesley/code/rust/rust/src/libstd/rt.rs:67
  15:     0x7fb2ec038189 - std::rt::lang_start_internal::{{closure}}::h3d4c30d894c4d541
                               at src/libstd/rt.rs:52
  16:     0x7fb2ec038189 - std::panicking::try::do_call::hd027c00fedeee9d1
                               at src/libstd/panicking.rs:296
  17:     0x7fb2ec07e144 - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:79
  18:     0x7fb2ec037fae - std::panicking::try::h8ba6ae2d7ad0284a
                               at src/libstd/panicking.rs:272
  19:     0x7fb2ec038e87 - std::panic::catch_unwind::h1ca20130e7192581
                               at src/libstd/panic.rs:394
  20:     0x7fb2ec038e87 - std::rt::lang_start_internal::h0e3d20d1e9a2c63c
                               at src/libstd/rt.rs:51
  21:     0x558504f8cd23 - std::rt::lang_start::h58d40a97e857c26c
                               at /home/wesley/code/rust/rust/src/libstd/rt.rs:67
  22:     0x558504fa191b - main
  23:     0x7fb2ebdfde0b - __libc_start_main
                               at ../csu/libc-start.c:308
  24:     0x558504f8609a - _start
  25:                0x0 - <unknown>

---------------------------------------
trace does not match position list
still need to find ["backtrace-debuginfo.rs:189", "backtrace-debuginfo.rs:126", "backtrace-debuginfo.rs:120"]

--- stdout
backtrace-debuginfo.rs:120
backtrace-debuginfo.rs:126
backtrace-debuginfo.rs:189

--- stderr
test case 9
thread 'main' panicked at 'explicit panic', /home/wesley/code/rust/rust/src/test/ui/backtrace-debuginfo.rs:107:9
stack backtrace:
   0:     0x7fddddde1603 - backtrace::backtrace::trace_unsynchronized::h92d5dd857c57941c
                               at /home/wesley/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1:     0x7fddddde1603 - std::sys_common::backtrace::_print_fmt::h415534bd2e78d38a
                               at src/libstd/sys_common/backtrace.rs:77
   2:     0x7fddddde1603 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h25a4adad3f103ff1
                               at src/libstd/sys_common/backtrace.rs:59
   3:     0x7fdddde5c85d - core::fmt::write::hee5e361c086e3d9f
                               at src/libcore/fmt/mod.rs:1057
   4:     0x7fdddde1c807 - std::io::Write::write_fmt::hf03f4a9da07cb0fb
                               at src/libstd/io/mod.rs:1426
   5:     0x7fddddde2d46 - std::sys_common::backtrace::_print::hb0f7a784c46e475f
                               at src/libstd/sys_common/backtrace.rs:62
   6:     0x7fddddde2d46 - std::sys_common::backtrace::print::ha4ea7a4bc3181ec8
                               at src/libstd/sys_common/backtrace.rs:49
   7:     0x7fddddde2d46 - std::panicking::default_hook::{{closure}}::ha321f4d5aa59349d
                               at src/libstd/panicking.rs:195
   8:     0x7fddddde2967 - std::panicking::default_hook::hacbad13ea406746e
                               at src/libstd/panicking.rs:215
   9:     0x7fddddde38ec - std::panicking::rust_panic_with_hook::h799d29f0de040960
                               at src/libstd/panicking.rs:463
  10:     0x56186e3d5ee7 - std::panicking::begin_panic::habeca5b7aae9ee28
                               at /home/wesley/code/rust/rust/src/libstd/panicking.rs:390
  11:     0x56186e3e5ea7 - backtrace_debuginfo::inner_inlined::h77ab595dd670898f
                               at /home/wesley/code/rust/rust/src/test/ui/backtrace-debuginfo.rs:107
  12:     0x56186e3e5ea7 - backtrace_debuginfo::outer::h681fb285db800a84
oli (Jan 14 2020 at 13:36, on Zulip):

We should also do a crater run with the optimization enabled

Wesley Wiser (Jan 15 2020 at 10:34, on Zulip):

Update: The perf run completed and most cases show a 2-5% improvement. However, there are some major regressions, most notably deeply-nested-opt which regressed 901,432.7%! I posted an analysis on the PR which I'll copy here:

Looking at the (major) regressions, I'm seeing the following patterns:

oli (Jan 15 2020 at 11:21, on Zulip):

One way to figure out what is causing these to regress may be to add more fine grained profiling for self-profile to pick up. E.g. what I did for const eval was to have the main query also create a profile section that had the name of the constant being evaluated in it. Most of the time these will just disappear in the noise and never show up, but if one of them dominates, that shows up

oli (Jan 15 2020 at 11:23, on Zulip):

so for the normalize_ty_after_erasing_regions you could add a profile call let _scope = tcx.prof.generic_activity(&format!("inlining {}", instance)); before https://github.com/rust-lang/rust/blob/632387f38dfbac0f2b8b8900c840fff7f1fb888e/src/librustc_mir/transform/inline.rs#L125-L133 and thus see inlining which functions takes time

oli (Jan 15 2020 at 11:24, on Zulip):

I mean technically this may be worthy of a query, because if a function is inlined multiple times with the same generic parameters we may want to have a cached version of the monomorphic version available

Wesley Wiser (Jan 15 2020 at 11:25, on Zulip):

That's a good idea! mw actually landed support recently for recording query keys for every event so we might already have all the data we need

oli (Jan 15 2020 at 11:25, on Zulip):

oh heh

oli (Jan 15 2020 at 11:25, on Zulip):

wonderful

oli (Jan 15 2020 at 11:25, on Zulip):

That's obviously a better way than my hacky format stuff

Wesley Wiser (Jan 15 2020 at 11:26, on Zulip):

And we just got the tooling support to read them in measureme this morning https://github.com/rust-lang/measureme/pull/108

Wesley Wiser (Jan 15 2020 at 11:27, on Zulip):

For the LLVM issue, I tried enabling -Z print-llvm-passes and I saw this:

Target Library Information
  FunctionPass Manager
    Dominator Tree Construction
    Natural Loop Information
    Branch Probability Analysis
    Block Frequency Analysis

and then a very long hang. So I attached gdb and got this backtrace:

#0  0x00007f98dd394337 in llvm::DenseMapBase<llvm::DenseMap<llvm::BasicBlock*, std::unique_ptr<llvm::DomTreeNodeBase<llvm::BasicBlock>, std::default_delete<llvm::DomTreeNodeBase<llvm::BasicBlock> > >, llvm::DenseMapInfo<llvm::BasicBlock*>, llvm::detail::DenseMapPair<llvm::BasicBlock*, std::unique_ptr<llvm::DomTreeNodeBase<llvm::BasicBlock>, std::default_delete<llvm::DomTreeNodeBase<llvm::BasicBlock> > > > >, llvm::BasicBlock*, std::unique_ptr<llvm::DomTreeNodeBase<llvm::BasicBlock>, std::default_delete<llvm::DomTreeNodeBase<llvm::BasicBlock> > >, llvm::DenseMapInfo<llvm::BasicBlock*>, llvm::detail::DenseMapPair<llvm::BasicBlock*, std::unique_ptr<llvm::DomTreeNodeBase<llvm::BasicBlock>, std::default_delete<llvm::DomTreeNodeBase<llvm::BasicBlock> > > > >::find(llvm::BasicBlock const*) const () from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-9d36f6e088352849.so
#1  0x00007f98de816c22 in llvm::DominatorTreeBase<llvm::BasicBlock, false>::getNode(llvm::BasicBlock const*) const ()
   from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-9d36f6e088352849.so
#2  0x00007f98de817036 in llvm::DominatorTreeBase<llvm::BasicBlock, false>::dominates(llvm::BasicBlock const*, llvm::BasicBlock const*) const ()
   from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-9d36f6e088352849.so
#3  0x00007f98ddf0c500 in llvm::GVN::findLeader(llvm::BasicBlock const*, unsigned int) () from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-9d36f6e088352849.so
#4  0x00007f98ddf18a93 in llvm::GVN::processInstruction(llvm::Instruction*) () from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-9d36f6e088352849.so
#5  0x00007f98ddf18cb9 in llvm::GVN::processBlock(llvm::BasicBlock*) () from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-9d36f6e088352849.so
#6  0x00007f98ddf18f90 in llvm::GVN::iterateOnFunction(llvm::Function&) () from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-9d36f6e088352849.so
#7  0x00007f98ddf19176 in llvm::GVN::runImpl(llvm::Function&, llvm::AssumptionCache&, llvm::DominatorTree&, llvm::TargetLibraryInfo const&, llvm::AAResults&, llvm::MemoryDependenceResults*, llvm::LoopInfo*, llvm::OptimizationRemarkEmitter*) () from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-9d36f6e088352849.so
#8  0x00007f98ddf19ee1 in llvm::gvn::GVNLegacyPass::runOnFunction(llvm::Function&) () from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-9d36f6e088352849.so
#9  0x00007f98de867518 in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-9d36f6e088352849.so
#10 0x00007f98de40e93b in (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) () from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-9d36f6e088352849.so
#11 0x00007f98de8668c8 in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-9d36f6e088352849.so
--Type <RET> for more, q to quit, c to continue without paging--
#12 0x00007f98de7e2679 in LLVMRunPassManager () from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-9d36f6e088352849.so
#13 0x00007f98dc80bab8 in rustc_codegen_llvm::back::write::optimize (cgcx=0x7f98d8238150, diag_handler=<optimized out>, module=<optimized out>, config=<optimized out>)
    at src/librustc_codegen_llvm/back/write.rs:444
#14 <rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::write::WriteBackendMethods>::optimize (cgcx=<optimized out>, diag_handler=0x7f98d8237e50, module=<optimized out>,
    config=0x7f98d4510f70) at src/librustc_codegen_llvm/lib.rs:160
#15 0x00007f98dc6deff8 in rustc_codegen_ssa::back::write::execute_optimize_work_item (cgcx=0x7f98d8238150, module_config=0x7f98d4510f70, module=...)
    at /home/wesley/code/rust/rust/src/librustc_codegen_ssa/back/write.rs:740
#16 rustc_codegen_ssa::back::write::execute_work_item (cgcx=0x7f98d8238150, work_item=...) at /home/wesley/code/rust/rust/src/librustc_codegen_ssa/back/write.rs:717
Wesley Wiser (Jan 15 2020 at 11:28, on Zulip):

I thought originally this might have something to do with how many blocks were in the function being optimized but after running the inliner, we actually have fewer blocks than when we started.

Wesley Wiser (Jan 15 2020 at 11:29, on Zulip):

GVN looks at locals I believe and we do create a lot more locals for the deeply-nested test after inlining.

oli (Jan 15 2020 at 11:30, on Zulip):

so... this is some exponential algorithm on the number of locals?

Wesley Wiser (Jan 15 2020 at 11:30, on Zulip):

I guess?

Wesley Wiser (Jan 15 2020 at 11:30, on Zulip):

I know nothing of LLVM internals

oli (Jan 15 2020 at 11:30, on Zulip):

"GVNLegacyPass" sounds dubious

Wesley Wiser (Jan 15 2020 at 11:30, on Zulip):

Yeah I thought so too

Wesley Wiser (Jan 15 2020 at 11:31, on Zulip):

But it looks like "New GVN" isn't ready yet

oli (Jan 15 2020 at 11:31, on Zulip):

Hmm... the docs aren't helpful :D http://llvm.org/doxygen/classllvm_1_1gvn_1_1GVNLegacyPass.html

Wesley Wiser (Jan 15 2020 at 11:31, on Zulip):

Yeah lol

Wesley Wiser (Jan 15 2020 at 11:32, on Zulip):

Here's the tracking bug for New GVN https://bugs.llvm.org/show_bug.cgi?id=30995

oli (Jan 15 2020 at 11:32, on Zulip):

// This pass performs global value numbering to eliminate fully redundant
// instructions. It also performs simple dead load elimination.
//
// Note that this pass does the value numbering itself; it does not use the
// ValueNumbering analysis passes.

Wesley Wiser (Jan 15 2020 at 11:36, on Zulip):

Something kind of interesting about the deeply-nested test case: before inlining, we have 34 basic blocks and 35 locals, after inlining we have only 16 basic blocks but we now have 53 locals.

oli (Jan 15 2020 at 11:37, on Zulip):

that doesn't sound like something that should end up exploding llvm

oli (Jan 15 2020 at 11:37, on Zulip):

do you have a mir dump available that I could look at?

Wesley Wiser (Jan 15 2020 at 11:37, on Zulip):

Sure

Wesley Wiser (Jan 15 2020 at 11:38, on Zulip):

https://gist.github.com/wesleywiser/1a01122c920a8771a64a59847cfdf272

Wesley Wiser (Jan 15 2020 at 11:39, on Zulip):

There's also the weird extra scope { }s that the inliner produces

oli (Jan 15 2020 at 11:44, on Zulip):

We could just start optimizing out random things until llvm morale improves

oli (Jan 15 2020 at 11:46, on Zulip):

e.g. assignments of zst locals are pretty much useless. We could just eliminate them, even generic uses of PhantomData could just be killed

Wesley Wiser (Jan 15 2020 at 11:48, on Zulip):

Hmm

Wesley Wiser (Jan 15 2020 at 11:48, on Zulip):

Let me dump the rest of the method after the other optimization passes. I would think ConstProp and SimplifyLocals would already do that.

oli (Jan 15 2020 at 11:48, on Zulip):

also... why did Iterator::chain not get inlined

oli (Jan 15 2020 at 11:50, on Zulip):

I guess Chain::new(self, other.into_iter()) is either too complex to inline or it bailed out due to other reasons

Wesley Wiser (Jan 15 2020 at 11:50, on Zulip):

[DEBUG rustc_mir::transform::inline] NOT inlining CallSite { callee: DefId(2:4728 ~ core[e46f]::iter[0]::traits[0]::iterator[0]::Iterator[0]::chain[0]), substs: [std::iter::Chain<std::iter::Chain<std::iter::Chain<std::iter::Chain<std::iter::Chain<std::iter::Chain<std::iter::Chain<std::iter::Chain<std::iter::Chain<std::iter::Chain<std::iter::Chain<std::iter::Chain<std::iter::Chain<std::iter::Chain<std::iter::Chain<std::iter::Empty<()>, std::iter::Empty<()>>, std::iter::Empty<()>>, std::iter::Empty<()>>, std::iter::Empty<()>>, std::iter::Empty<()>>, std::iter::Empty<()>>, std::iter::Empty<()>>, std::iter::Empty<()>>, std::iter::Empty<()>>, std::iter::Empty<()>>, std::iter::Empty<()>>, std::iter::Empty<()>>, std::iter::Empty<()>>, std::iter::Empty<()>>, std::iter::Empty<()>>, std::iter::Empty<()>], bb: bb32, location: SourceInfo { span: test.rs:4:14: 20:24, scope: scope[0] } } [cost=106 > threshold=100]

oli (Jan 15 2020 at 11:57, on Zulip):

hmm...

Wesley Wiser (Jan 15 2020 at 11:57, on Zulip):

Oh const prop doesn't handle it because it's an assignment on a projection of a local https://github.com/rust-lang/rust/blob/632387f38dfbac0f2b8b8900c840fff7f1fb888e/src/librustc_mir/transform/const_prop.rs#L828

oli (Jan 15 2020 at 11:58, on Zulip):

oh heh...

oli (Jan 15 2020 at 11:59, on Zulip):

so... the inlining cost is computed on the polymorphic MIR of a function, and all locals with unknown type size (so all generic locals) have cost 10

oli (Jan 15 2020 at 11:59, on Zulip):

also function calls have cost 25

oli (Jan 15 2020 at 11:59, on Zulip):

so the two function calls + a few locals would quickly build up to 106

Wesley Wiser (Jan 15 2020 at 12:00, on Zulip):

imo the cost model was just a place holder since the pass was never really enabled

Wesley Wiser (Jan 15 2020 at 12:00, on Zulip):

but I'm not the original author so I can't say for sure

oli (Jan 15 2020 at 12:00, on Zulip):

yea

Wesley Wiser (Jan 15 2020 at 12:00, on Zulip):

I don't think there's ever been any tweaks made to it

oli (Jan 15 2020 at 12:12, on Zulip):

Well, the costs themselves are probably taken from llvm: https://llvm.org/doxygen/namespacellvm_1_1InlineConstants.html

oli (Jan 15 2020 at 12:15, on Zulip):

but llvm has no hard threshold

oli (Jan 15 2020 at 12:17, on Zulip):

llvm has fun things like https://github.com/llvm/llvm-project/blob/064087581ab98cca7254b4d0f12ecbed13da2692/llvm/lib/Analysis/InlineCost.cpp#L1250 though

Wesley Wiser (Jan 15 2020 at 12:19, on Zulip):

That seems likely

oli (Jan 15 2020 at 12:20, on Zulip):

ah

oli (Jan 15 2020 at 12:20, on Zulip):

https://github.com/llvm/llvm-project/blob/064087581ab98cca7254b4d0f12ecbed13da2692/llvm/lib/Analysis/InlineCost.cpp#L49

oli (Jan 15 2020 at 12:20, on Zulip):

so the default threshold is 225 in llvm

oli (Jan 15 2020 at 12:21, on Zulip):

and much higher values for hot calls

oli (Jan 15 2020 at 12:21, on Zulip):

and much lower values for cold calls

oli (Jan 15 2020 at 12:21, on Zulip):

so uh :D

Wesley Wiser (Jan 15 2020 at 12:22, on Zulip):

We should play with the inlining costs?

oli (Jan 15 2020 at 12:22, on Zulip):

we may be confusing llvm by having a higher threshold for cold calls than they do and a lower one for hot calls

Wesley Wiser (Jan 15 2020 at 12:22, on Zulip):

Hm

oli (Jan 15 2020 at 12:22, on Zulip):

well, idk if we can solve this without actually putting a lot of mindpower into it

Wesley Wiser (Jan 15 2020 at 12:23, on Zulip):

What do you mean by confusing llvm?

Wesley Wiser (Jan 15 2020 at 12:23, on Zulip):

We don't pass any of this data to llvm from the inliner

oli (Jan 15 2020 at 12:23, on Zulip):

right, what I mean is we now inline cold calls that llvm didn't see inlined before

Wesley Wiser (Jan 15 2020 at 12:23, on Zulip):

So the only thing that should be happening is that we don't inline enough

Wesley Wiser (Jan 15 2020 at 12:24, on Zulip):

ah

oli (Jan 15 2020 at 12:24, on Zulip):

well we do https://github.com/rust-lang/rust/blob/632387f38dfbac0f2b8b8900c840fff7f1fb888e/src/librustc_mir/transform/inline.rs#L257

oli (Jan 15 2020 at 12:24, on Zulip):

which is pretty close for the current threshold, so

oli (Jan 15 2020 at 12:24, on Zulip):

that's probably not the cause either

oli (Jan 15 2020 at 12:24, on Zulip):

idk, how to debug this

oli (Jan 15 2020 at 12:25, on Zulip):

just fiddling with the inlining cost seems fragile

Wesley Wiser (Jan 15 2020 at 12:25, on Zulip):

Should we perhaps not inline #[cold] functions at all?

oli (Jan 15 2020 at 12:25, on Zulip):

if llvm does it, there's no reason for us not to do it

oli (Jan 15 2020 at 12:27, on Zulip):

oh

oli (Jan 15 2020 at 12:27, on Zulip):

one thing that may be relevant

oli (Jan 15 2020 at 12:28, on Zulip):

can you check the llvm IR vs the llvm IR on nightly?

oli (Jan 15 2020 at 12:28, on Zulip):

maybe llvm now optimizes better

Wesley Wiser (Jan 15 2020 at 12:28, on Zulip):

After the LLVM optimizations run or before?

oli (Jan 15 2020 at 12:28, on Zulip):

after

oli (Jan 15 2020 at 12:29, on Zulip):

like I'm wondering if llvm optimizes away the entire test now

oli (Jan 15 2020 at 12:30, on Zulip):

if that entire functions ends up with a Box::new(()) and nothing else, then I'd say the compile-time perf hit is kinda warranted

Wesley Wiser (Jan 15 2020 at 12:32, on Zulip):

So I have some random changes I'm playing with an something in them fixed it?

Wesley Wiser (Jan 15 2020 at 12:32, on Zulip):

Stashing for now and I'll track down what fixed it later

oli (Jan 15 2020 at 12:33, on Zulip):

wat

Wesley Wiser (Jan 15 2020 at 12:33, on Zulip):

It must have been this

diff --cc src/librustc_mir/transform/inline.rs
index f4e47b01cb3,f4e47b01cb3..2971347ec12
--- a/src/librustc_mir/transform/inline.rs
+++ b/src/librustc_mir/transform/inline.rs
@@@ -40,6 -40,6 +40,13 @@@ struct CallSite<'tcx>
  impl<'tcx> MirPass<'tcx> for Inline {
      fn run_pass(&self, tcx: TyCtxt<'tcx>, source: MirSource<'tcx>, body: &mut BodyAndCache<'tcx>) {
          if tcx.sess.opts.debugging_opts.mir_opt_level >= 1 {
++            // Don't run the `Inliner` on bodies that are very large because LLVM does not handle
++            // them very well and increasing the number of blocks can drastically increase compile
++            // time.
++            if body.basic_blocks().len() > 100 {
++                return;
++            }
++
              Inliner { tcx, source }.run_pass(body);
          }
      }
oli (Jan 15 2020 at 12:34, on Zulip):

huh, but I thought bodies with that many blocks would have been eliminated by https://github.com/rust-lang/rust/blob/632387f38dfbac0f2b8b8900c840fff7f1fb888e/src/librustc_mir/transform/inline.rs#L335 increasing the cost by 5 per block

Wesley Wiser (Jan 15 2020 at 12:35, on Zulip):

Yeah

Wesley Wiser (Jan 15 2020 at 12:35, on Zulip):

Eh, hang on I've undone my change and it's still compiling fast

Wesley Wiser (Jan 15 2020 at 12:35, on Zulip):

I'm doing something wrong

oli (Jan 15 2020 at 12:36, on Zulip):

already commited some changes?

Wesley Wiser (Jan 15 2020 at 12:36, on Zulip):

No, I'm on the same commit as my branch

Wesley Wiser (Jan 15 2020 at 12:36, on Zulip):

Does build -i not play nice with git stash?

oli (Jan 15 2020 at 12:37, on Zulip):

idk

oli (Jan 15 2020 at 12:37, on Zulip):

I have incremental = true in config.toml

Wesley Wiser (Jan 15 2020 at 12:38, on Zulip):

I'm cleaning and rebuilding

Wesley Wiser (Jan 15 2020 at 13:23, on Zulip):

wut

Wesley Wiser (Jan 15 2020 at 13:24, on Zulip):

If I compile rustc with ./x.py build --stage 1 -i src/libstd it is fast but if I compile without the -i, it is very slow.

oli (Jan 15 2020 at 15:12, on Zulip):

wtf

oli (Jan 15 2020 at 15:13, on Zulip):

oh

oli (Jan 15 2020 at 15:14, on Zulip):

maybe it's because of the hack in inlining that prevents cycles

Wesley Wiser (Jan 15 2020 at 15:20, on Zulip):

Ah, that's a good hypothesis!

Zoxc (Jan 15 2020 at 17:11, on Zulip):

I presume we don't handle debug information when inlining yet?

Wesley Wiser (Jan 15 2020 at 17:11, on Zulip):

No, that's what's breaking the ui/backtrace-debuginfo.rs test

Wesley Wiser (Jan 30 2020 at 03:48, on Zulip):

can you check the llvm IR vs the llvm IR on nightly?

The IR is exactly the same.

What's even more interesting is that if I mess with -C codegen-units, the compile time regression seems to go away. With codegen-units < 3, the regression goes away. With codegen-units >= 3, the regression appears.

Passing -Z time-llvm-passes shows this:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 16.3174 seconds (16.2279 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  12.9047 ( 79.7%)   0.0039 (  3.2%)  12.9086 ( 79.1%)  12.9085 ( 79.5%)  Global Value Numbering #3
   1.1998 (  7.4%)   0.0000 (  0.0%)   1.1998 (  7.4%)   1.1998 (  7.4%)  Dead Store Elimination #3
   0.2420 (  1.5%)   0.0038 (  3.1%)   0.2458 (  1.5%)   0.2458 (  1.5%)  Combine redundant instructions #12
   0.2086 (  1.3%)   0.0148 ( 12.1%)   0.2234 (  1.4%)   0.2233 (  1.4%)  Function Integration/Inlining #3
   0.1406 (  0.9%)   0.0039 (  3.2%)   0.1445 (  0.9%)   0.1444 (  0.9%)  Memory SSA #3
....
Wesley Wiser (Jan 30 2020 at 14:11, on Zulip):

The IR is exactly the same.

To be clear: The IR for the test program is the same. I have no idea what the std library IR looks like.

Wesley Wiser (Jan 30 2020 at 14:12, on Zulip):

I'm trying to add support for enabling the LLVM time-trace feature which hopefully will give us more details. My C++ code is seg faulting though :laughter_tears:

Zoxc (Jan 30 2020 at 18:35, on Zulip):

That time-trace looks very Clang-ish. Does it have much LLVM stuff?

Wesley Wiser (Jan 30 2020 at 18:36, on Zulip):

I think so? https://reviews.llvm.org/rL357340

Zoxc (Jan 30 2020 at 18:38, on Zulip):

Aw, looks like it isn't a callback based interface. Having LLVM stuff in -Zself-profile would be neat.

Wesley Wiser (Jan 30 2020 at 18:45, on Zulip):

Yeah

Wesley Wiser (Jan 30 2020 at 18:46, on Zulip):

I'm not sure how useful it will be yet

Wesley Wiser (Jan 30 2020 at 18:46, on Zulip):

If it actually helps, we may want to see if we can change it to be callback based as you say.

Zoxc (Jan 30 2020 at 18:49, on Zulip):

I guess you could also just parse the JSON it produces too and generate events from that? But having a timeTraceProfilerInitializeWithCallbacksfunction might be easier.

Zoxc (Jan 30 2020 at 18:52, on Zulip):

Also looks like it's designed for a single thread only

Zoxc (Jan 30 2020 at 18:53, on Zulip):

So use -j1 when testing it =P

Wesley Wiser (Jan 30 2020 at 18:53, on Zulip):

Ah

Wesley Wiser (Jan 30 2020 at 18:54, on Zulip):

Well that's why I'm getting a seg fault then

Wesley Wiser (Jan 30 2020 at 18:54, on Zulip):

and the issue doesn't repro under -C codegen-units=1

andjo403 (Jan 30 2020 at 22:06, on Zulip):

you can try to use the https://github.com/rust-lang/rust/pull/68406 and then compile with RUSTFLAGS="-Znew-llvm-pass-manager=y -Zself-profile -Zself-profile-events=llvm

andjo403 (Jan 30 2020 at 22:06, on Zulip):

is needs the new passmanager in llvm but maybe can give some info anyway

Wesley Wiser (Jan 30 2020 at 22:07, on Zulip):

Oh, that's helpful

Wesley Wiser (Jan 30 2020 at 22:07, on Zulip):

Thanks!

andjo403 (Jan 30 2020 at 22:10, on Zulip):

and if you do not want the new passmanager I have a supper old hack for it in https://github.com/andjo403/rust/tree/llvmProfiler but that needs to recompile llvm as some changes was needed there

Wesley Wiser (Jan 30 2020 at 22:12, on Zulip):

I see there is a NewGVN pass in LLVM but I have no idea if it's any better than the current one

andjo403 (Jan 30 2020 at 22:24, on Zulip):

about changing the codegen-units resulting in compile time regressions I found that the partitioning is not that good tried to fix part of it in https://github.com/rust-lang/rust/pull/65281 and some more info in https://github.com/rust-lang/rust/issues/64913

Wesley Wiser (Jan 30 2020 at 22:34, on Zulip):

Do you think there's blocking or something happening between cgu partitions?

andjo403 (Jan 30 2020 at 22:38, on Zulip):

no I think that you only get unlucky with the partitioning the smallest change in the estimated size can change alot of what is in the same cgu and due to that is it possible that one cgu gets alot larger and due to this affects the compile time

andjo403 (Jan 30 2020 at 22:41, on Zulip):

had crates that got up to 20% faster to compile by only changing the partitioning algorithm

Wesley Wiser (Jan 31 2020 at 13:21, on Zulip):

It took longer than I hoped but I have a self contained repro:

struct Empty(std::marker::PhantomData<u8>);

struct Chain<A, B> {
    a: A,
    b: B,
    which: bool,
}

fn main() {
    let _x: Box<Hint> = Box::new(empty()
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty()));
}

fn empty() -> Empty {
    Empty(std::marker::PhantomData)
}

impl Hint for Empty {
    fn hint(&mut self) -> (usize, Option<usize>) {
        (0usize, Some(0usize))
    }
}

impl<A: Hint, B: Hint> Hint for Chain<A, B> {
    //#[inline(always)]
    fn hint(&mut self) -> (usize, Option<usize>) {
        match self.which {
            true => {
                self.which = true;
                self.a.hint()
            },
            false => {
                let (a_lower, a_upper) = self.a.hint();
                let (b_lower, b_upper) = self.b.hint();

                let lower = a_lower.saturating_add(b_lower);

                let upper = match (a_upper, b_upper) {
                    (Some(x), Some(y)) => x.checked_add(y),
                    _ => None,
                };

                (lower, upper)
            }
        }
    }
}

trait Hint {
    fn hint(&mut self) -> (usize, Option<usize>);

    fn chain<B: Hint>(self, other: B) -> Chain<Self, B> where Self: Sized {
        Chain { a: self, b: other, which: false }
    }
}
Wesley Wiser (Jan 31 2020 at 13:22, on Zulip):

If you uncomment the //#[inline(always)] line, it compiles very slowly but if you comment it out, it compiles very quickly.

Wesley Wiser (Jan 31 2020 at 13:22, on Zulip):

The issue still repros if you use the -Z no-parallel-llvm flag so I was able to capture an LLVM time trace

Wesley Wiser (Jan 31 2020 at 13:31, on Zulip):

https://www.speedscope.app/#profileURL=https%3A%2F%2Fgist.githubusercontent.com%2Fwesleywiser%2F3daf5710dfeaba66ae4f4d001cb36761%2Fraw%2Fd4d8141493f496e93574e7bd74335fdbd96f2b17%2Fllvm.json&title=llvm

Wesley Wiser (Jan 31 2020 at 13:32, on Zulip):

Which shows that most of the time taken is in the Chain::hint() function running GVN.

andjo403 (Jan 31 2020 at 15:21, on Zulip):

Have you compared the LLVM time trace with and without the PR so that the time is not moved from one cgu to another

Wesley Wiser (Jan 31 2020 at 15:32, on Zulip):

Do you mean PR 68406? I haven't had a chance yet. I'm hoping to get to it tonight.

andjo403 (Jan 31 2020 at 15:47, on Zulip):

Hmm think that I was not mening PR test with the different codegen-units and check that the time is not moved from one cgu to another

Wesley Wiser (Jan 31 2020 at 15:54, on Zulip):

Oh gotcha. I've tried with cgu={1,2,3,4,5,8} and cgu={1,2} does not experience the issue. cgu >= 3 does.

Wesley Wiser (Jan 31 2020 at 15:54, on Zulip):

Is there a way I can see what ended up in the cgu?

bjorn3 (Jan 31 2020 at 16:29, on Zulip):

@Wesley Wiser You could add println!() in the right place in the partitioning code, or you could use -Cno-prepopulate-passes to disable all llvm optimizations (including inlining), while keeping all rust optimizations enabled and -Csave-temps to keep all object files. You can then just look at the defined symbols in each object file.

Wesley Wiser (Jan 31 2020 at 16:31, on Zulip):

Brilliant, thanks!

Wesley Wiser (Feb 01 2020 at 19:38, on Zulip):

@andjo403 It looks pretty much the same as what I was able to get out of the time-trace thing. Having it integrated with the self-profiler is awesome though! Screenshot-from-2020-02-01-14-37-02.png

andjo403 (Feb 01 2020 at 19:40, on Zulip):

that was the plan hope it can help some one and it also do not force you to use -Z no-parallel-llvm

Wesley Wiser (Feb 01 2020 at 19:41, on Zulip):

Yeah, it works great!

Wesley Wiser (Feb 01 2020 at 19:41, on Zulip):

Having de-mangled function names is also a plus

andjo403 (Feb 01 2020 at 19:43, on Zulip):

yes it is nice

andjo403 (Feb 01 2020 at 19:44, on Zulip):

was you able to see some difference in the logs when it was slow vs fast?

Wesley Wiser (Feb 01 2020 at 19:48, on Zulip):

No, it looks pretty much the same

Wesley Wiser (Feb 01 2020 at 19:48, on Zulip):

I'm playing the the -Csave-temps flag and the llvm dissembler to see if there's anything interesting in the cgu

andjo403 (Feb 01 2020 at 19:51, on Zulip):

is it even more than one cgu created for that small program? can only see one in the speedscope log

Wesley Wiser (Feb 01 2020 at 19:53, on Zulip):

I believe so because I have stuff like test2.test2.hash-cgu-0.* and test2.test2.hash-cgu-1.* in the working directory

Wesley Wiser (Feb 01 2020 at 21:29, on Zulip):

So the only difference between the slow and fast test case I came up with at the IR level is that we pass the inline hint along to llvm

Wesley Wiser (Feb 01 2020 at 21:29, on Zulip):

However, when LLVM does the inlining, size_hint() balloons to have 327,680 locals

Wesley Wiser (Feb 01 2020 at 21:32, on Zulip):

Actually, I'm wrong. Those are only the trivially assigned variables.

Wesley Wiser (Feb 01 2020 at 21:37, on Zulip):

The function actually has 4,915,132 locals

Wesley Wiser (Feb 01 2020 at 21:37, on Zulip):

So it's not surprising GVN is falling over

andjo403 (Feb 01 2020 at 21:53, on Zulip):

but how is codegen-units affecting is the inline hint is passed to llvm or not? or is that an other fault?

andjo403 (Feb 01 2020 at 21:54, on Zulip):

if I want to reproduce this do I need some special PR or is master enough?

Wesley Wiser (Feb 01 2020 at 21:55, on Zulip):

I'm not sure yet

Wesley Wiser (Feb 01 2020 at 21:55, on Zulip):

Oh

Wesley Wiser (Feb 01 2020 at 21:56, on Zulip):

Ok

Wesley Wiser (Feb 01 2020 at 21:56, on Zulip):

When I rebased I lost my in progress changes the enabled the inliner

Wesley Wiser (Feb 01 2020 at 21:56, on Zulip):

Then -C codegen-units thing doesn't seem to make a difference once we've enabled the MIR inliner

Wesley Wiser (Feb 01 2020 at 21:59, on Zulip):

To recap:

  1. Cherry-pick this patch https://github.com/rust-lang/rust/pull/68213/commits/a3db6204e6d3d1d7fc994de4fb76a7f1d23a7e4f
  2. Build with ./x.py build --stage 1 src/libstd NOTE: if you build with -i you can't repro the issue
  3. Try compiling the test example I posted above. You'll need to uncomment the //#[inline(always)] line with rustc +stage1 -O repro.rs
andjo403 (Feb 01 2020 at 22:01, on Zulip):

ok will try to do that
by the way have rebased the llvm profiling PR and fixed the comments

Wesley Wiser (Feb 01 2020 at 22:02, on Zulip):

Awesome! I took a look real quick and it looks good to me but since mw left the feedback, I'd like him to sign off.

Wesley Wiser (Feb 01 2020 at 22:03, on Zulip):

Can we rebase it without the changes from #68170 or is it dependent on those to compile?

Wesley Wiser (Feb 01 2020 at 22:03, on Zulip):

If not that's fine but we'll have to wait for that PR to merge before we can merge this one

Wesley Wiser (Feb 01 2020 at 22:05, on Zulip):

I wonder if this (compile-time regression) is maybe expected? We've tweaked the std library's inline annotations for years without MIR inlining. Now that we're enabling that, there might be functions that have become just small enough to trigger LLVM to inline that were before too large.

andjo403 (Feb 01 2020 at 22:05, on Zulip):

yes it is still based on 68170 so that needs to be merged first

Wesley Wiser (Feb 01 2020 at 22:05, on Zulip):

Maybe we just need to undo a few key #[inline] hints?

Wesley Wiser (Feb 01 2020 at 22:06, on Zulip):

Ah, ok. That's fine :)

Wesley Wiser (Feb 01 2020 at 22:06, on Zulip):

Gotta run :wave:

andjo403 (Feb 01 2020 at 22:07, on Zulip):

ok will maby look at it tomorrow kind of late here also

andjo403 (Feb 02 2020 at 11:49, on Zulip):

I get the same behavior with and without cherry pick and almost the same times with different codegen-units even tried with stage2 and the same thing

Last update: Sep 18 2020 at 20:45UTC