Basically we have this very specific optimization for const propping into the operands in function arguments: https://github.com/rust-lang/rust/blob/0701419e96d94e5493c7ebfcecb66511ab0aa778/src/librustc_mir/transform/const_prop.rs#L1102
I think we can remove most of that and add a visit_operand
method to the visitor that does the Operand::Copy
or Operand::Move
to Operand::Const
conversion if the local is a known constant
Thanks :thumbs_up:
Awesome! I was going to start this after doing Boolean identities but the more the merrier :)
The following code would appear to be incomplete, because
the functionOperand::place()
returnsNone
if the
Operand
is of the variantOperand::Constant
.
what should be done for Operand::Constant
here, afaict we go from Operand::Copy|Move
to Operand::Constant
so I don't see what we would do for constants if they appeared here
don't do anything for Operand::Const
it's already constant ^^ nothing to propagate here
yeah, but I don't understand that comment rn
oh there's a comment? :grinning:
I want to replace this with for opr in args { self.visit_operand(opr, location) }
oooh
I think we should just have walk_terminator
directly at the beginning of visit_terminator
then you don't need any replacement code for the thing you linked
don't we already use super_terminator
here?
do all mir-opt-tests run with mir-opt-level =2
by default?
this felt slightly too easy so I might have missed something here: #74507
O_o
I think you can remove the TerminatorKind::SwitchInt
arm, too
and the else
arm in Assert
:thumbs_up:
As a casual observer, I’d love for these things to be documented as expected to improve compile performance (eg reduced LLVM IR) improve runtime performance (eg reduced code in the final executable) or other.
But as a casual observer, I don’t want to make any demands 😇
it doesn't look like this improves the compile time https://github.com/rust-lang/rust/pull/74507#issuecomment-660685419
@oli do we want to restrict this to mir-opt-level = 2
?
I think we do
This might be related to CGU partitioning, which has been a problem in the past (see: the propagation step into function arguments)
@Wesley Wiser do you think that's the case here as well?
That this regresses because of the CGU partitioning scheme?
I think per https://github.com/rust-lang/compiler-team/issues/319 it needs to go under mir-opt-level=3
Oh, true!
:)
would it be worth investigating the regressions further?
because doing this opt would unlock a lot of further opportunities for optimization
looking at the worst test case it seems to spend 15s more in LLVM_lto_optimize
, 5 more in LLVM_thin_lto_import
and 8 more in LLVM_module_codegen_emit_obj
, could it be that this transformation is too aggressive and blowing up the size of code that needs to be generated?
there's been a design meeting on this CGU partitioning issue already, see this topic (I don't remember what the plan is, maybe the incremental compilation wg will look at it)
@Xavier Denis regarding the code generation: we seem to currently be at a local optimum, i.e. the code generation unit partitioning scheme for incremental compilation seems to be tuned in favor of how the compiler behaves today, and its performance gets easily broken by changes in how we generate code (e.g. with better MIR opts)
@Wesley Wiser was looking into it. If I remember correctly, they did a refactor of the current scheme to prepare it for future changes.
Yeah, I think this is something the wg-incr-comp group wants to work on
But, of course, anyone is welcome to dig in and look for solutions :slight_smile:
They were also experimenting with generating a lot more CGUs, so that incremental compilation didn't have to re-do as much work as it is doing today for trivial changes like adding a println!
Because big CGUs help for runtime performance, which we don't really care about in debug incremental compilation
I would bet that at least half of the current "implemented but disabled" mir-opts regress just from the CGU partitioning scheme alone x3
is there any way to debug CGU partitioning? ie: is it possible to see how the units are determined before & after a change?
No idea. That sounds like it could be super useful to debug the quality of the scheme, tho!
@Wesley Wiser please tell me if I'm pinging you too much, but I think you're like the only one in the team who's remotely familiar with the CGU partitioning step?
It's fine :)
I've been using RUSTC_LOG=rustc_mir::monomorphize::partitioning {rustc or cargo invocation here}
I believe there's also a -Z
flag which dumps some data on this
yes -Z print-mono-items=lazy
(or say, "eager") :)
(which I'm slowly working on, to try and make slightly more helpful for different use cases, like at least 1. seeing the gains of polymorphization AKA "david's use case", 2. seeing the advantages/drawbacks of our cgu partitioning scheme and process and diff that AKA "wesley/wg-incr's", 3. having more visibility into the amount/cost of monomorphizations and the likes AKA "niko/matklad's/everyone with long compile times")
it'd be awesome to print the size estimates of different items so that we could compare the before and after of a change like this and see what actually changed
restricted this to mir-opt-level=3. As we now have the same locations for all operand propagations, this also disables it for SwitchInt
and Assert
, which isn't great. Not sure how to deal with that without just closing this PR :sweat_smile:
breaks an incremental test
yea, see also my comment on the PR :wink:
let's not regress things while blocking only the advanced things under mir-opt-level 3
i still want to know why this happens though, we probably remove the changed span due to the opt
which seems interesting
oh, you mean "breaks a test" as in turning off propagation makes it mark less stuff as dirty?
more stuff is dirty
if the optimization is not run at all
right, that makes sense
how can I emit the final mir in a test?
I want to see what exactly changes here
if we optimize away an if false
in both cases, the contents of the branch don't matter
uh.... something something -Z emir=mir
?
using// EMIT_MIR rustc.try_identity.SimplifyLocals.after.mir
I alway use -Z dump-mir=all
and be done with it XD
Has anyone checked if the compile time regression went away with LLVM 11?
I don't think so. Feel free to open a PR trying out setting the mir-opt-level to 1 and ping me, I'll start a perf run
Thanks! Hope I did it correctly
The results are in, and at least there's no 24% regression anywhere, but I don't know if it counts as good enough or not.