Cargo support is now in nightly! I'm drafting up an internals post and will post it here soon
@Alex Crichton rustup update only gets me 2019-05-16
oh I should probably say that
I just guessed
so the 2019-05-16 nightly has pipelining?
rustc 1.36.0-nightly (7d5aa4332 2019-05-16) that has pipelining
I think it's rustup update nightly-2019-05-17
@nikomatsakis with lark that just made pipelined compilation a sealed and done deal for me
those numbers are "this is what rustc devs will see once we switch everything to rlibs"
@Alex Crichton I was surprised
I'm collecting all the results in a spreadsheet here - https://docs.google.com/spreadsheets/d/1CU7o3IocPtNAUevvrPsTI77z_AexsRloUgiusiXJWtg/edit?usp=sharing
from the internals post
if you want you can make the google spreadsheet coloring be gradient which .. could be helpful? (i.e., from light green for low % wins to darker for high % wins)
nice, that does look much better
So an interesting result I'm seeing: this feels like compelling data that we want to do the next step of pipelining
which is that cargo should be able to pipeline compilations that end in the linker
(e.g. compiling a binary or a proc-macro)
projects can always be reorganized to have a tiny crate which is the binary which basically just serves the purpose of calling the linker
but it's a bummer to have to do that manually
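(For reference, the manual split being described usually ends up looking something like this; the crate names here are made up for illustration. The binary crate's src/main.rs is a one-liner calling into the library, so only this tiny crate has to wait for all dependency rlibs before linking:)

```toml
# Cargo.toml of the thin binary crate (hypothetical names);
# the real code lives in ../mytool-lib, and src/main.rs is just
# `fn main() { mytool_lib::run() }`.
[package]
name = "mytool"
version = "0.1.0"
edition = "2018"

[dependencies]
mytool-lib = { path = "../mytool-lib" }
```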
although this is also the same w/ sccache where rlibs are just inherently easier to work with than linked artifacts
fwiw I think there's already a general desire/feeling in the community to do the tiny binary split and big library backing
so that might be somewhat low priority -- I wonder how difficult it would be to get stats on that
that's true yeah, a bunch of stuff is already architected this way but I know of big stuff which isn't at least
e.g. unit tests can take awhile to compile, cargo's binary is relatively big, sccache's is huge, etc
same w/ serde_derive
yeah, I think proc macros would probably benefit quite a bit from that layout
since generally they're written inline; that's somewhat unlike the binary/library split, since proc macro internals aren't really public, unlike lots of binary internals
@Alex Crichton you typoed the environment variable in one place in your post and it's causing chaos lol
So continuing to collate data, it looks like pipelining across the board is not a regression on any front, it averages a 10% reduction in build time, and we're seeing up to twice as fast builds
I've opened a dedicated tracking issue now for stabilizing pipelined compilation given how compelling the data is
@Alex Crichton Thanks for the measurements and internals post. Initial results are promising, though the improvements are not as high/universal as I had hoped.
@Alex Crichton But I can't manage to get the 2019-05-17 Cargo to test.
rustup update nightly gives me 2019-05-15, as does rustup update nightly-2019-05-17. What am I missing?
my rustc is 2019-05-19 after rustup update nightly, but the cargo is older
@nnethercote Were all the non-results on small numbers of cores? I think it's reasonable to expect that Cargo + codegen-units is pretty good at keeping a small number of cores busy. The main wins will be when there's currently a lot of idle cores.
I don't know. I have a 14-physical-core machine, I plan to test the full rustc-perf benchmark (~30 programs) once I can
@nnethercote if you rustup update nightly it should be good enough
the commit/date reported by -V is the commit date, not the build date
@nnethercote I agree it's not quite as good as I hoped, but still well within the range of "worth the stabilization effort"
I think there's actually even more to be gained by going a step further and pipelining up to the linker
but we'd want to gather more data first
it's also worth keeping in mind what I mentioned in the first post on the thread that incremental builds are likely to see a much bigger benefit than whole crate builds
but everyone's only been measuring whole crate builds
which makes the most sense of course because it's the easiest thing to do
@Alex Crichton oh, ok, thanks for the clarification. I definitely agree it's worth stabilizing, and there may be room for more improvement, e.g. by generating metadata in parallel with type-checking/borrow-checking.
@nnethercote I'd sort of love to get to a world where Cargo can spawn literally all rustc instances for a crate graph immediately, and then rustc just queries cargo every now and then for "wake me up when this is ready" or "this is ready" and Cargo orchestrates the notifications
like I could imagine rustc being a black box to Cargo and it just says "needs Vec<String>" or "produced String" and Cargo just wakes things up as necessary assuming all the strings are unique and all
that way we could at least parse the whole crate graph in parallel
would of course be much more difficult to implement :)
I guess I'm also thinking that like if a crate requires a procedural macro it actually can progress largely through the whole resolution phase of the compiler until it finally needs the macro
I dunno, these may all be small wins
@Alex Crichton All interesting ideas! Anyway, I should be pleased with the current results, given that they are quite good for what is the absolute simplest implementation :) I will do the rustc-perf measurements over the next couple of days, including incremental. I'll do Firefox as well.
@Alex Crichton I've been thinking about a model where a graph orchestration engine can start a graph of compilers and then start feeding them incremental input at the keystroke level (ie, rather than linking cargo into rls, make it cheap enough for rls to invoke "real" builds). That would require lots of fine-grained control over what outputs you want from each compilation step, which perhaps changes dynamically.
@Jeremy Fitzhardinge that does indeed sound like the dream :)
jturner was talking about that at the last all-hands
certainly an ambitious goal :)
I'm very interested in continuing the conversation about how Buck and Cargo can work together. My project for the back half of the year will be trying to auto-generate Buck build rules from Cargo.toml for crates.io, and there are a number of improvements to Cargo's model I'd like to discuss in that context.
@Alex Crichton wait, what is this pipelining up to if not the linker?
@Gankro If I understood you correctly: we currently split the pipeline after metadata (i.e., the same output as you get from cargo check) is done but before LLVM and the linker have run
so this is proposing splitting the llvm+link step into two parts? What could we possibly execute only once llvm was done, but before linking is done?
er, not sure what you mean -- the stable/no flags cargo has no pipelining
(maybe I wasn't clear)
i.e., we always wait for LLVM and linking to finish if we're running it (debug/release builds)
alex said "I think there's actually even more to be gained by going a step further and pipelining up to the linker"
I believe that might be referring to us not pipelining the final binary -- which can often be quite large
but not sure
I'm also .. not entirely sure what that would mean
oh right, the diagram does say we wait for all our libs to be fully codegen'd before even starting to compile the binary, which is odd yeah
but I guess in theory we could run LLVM before dependency LLVMs finish?
@Gankro yeah so the method of pipelining is pretty coarse and selective now, only when an rlib depends on another rlib can we pipeline those two compilations
we, for example, can't pipeline an executable depending on a bunch of other rlibs
(or any linked artifact for that matter)
similarly we can't pipeline anything depending on a build script or a procedural macro
so the general idea is just enabling more parallelism by having better synchronization between rustc/cargo
Why do the compiler invocations in https://github.com/rust-lang/compiler-team/blob/master/working-groups/pipelining/NOTES.md#step-2-work-with-only-metadata-as-input require rustc libA.rs --emit metadata,link --crate-type lib instead of rustc libA.rs --emit metadata --crate-type lib? I can see the libA.rmeta produced is different sometimes, but that's surprising.
Fiddling with it locally, I ran into an ICE w/ the latter:
```
>> rustc a.rs --emit=metadata --crate-type=lib
>> rustc b.rs --emit=link --crate-type=rlib --extern=a=liba.rmeta
error: internal compiler error: src/librustc_mir/monomorphize/collector.rs:775: Cannot create local mono-item for DefId(13:12 ~ a::number)
thread 'rustc' panicked at 'Box<Any>', src/librustc_errors/lib.rs:637:9
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: aborting due to previous error
note: the compiler unexpectedly panicked. this is a bug.
note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
note: rustc 1.36.0-nightly (37ff5d388 2019-05-22) running on x86_64-unknown-linux-gnu
note: compiler flags: --crate-type rlib
```
I was looking to sketch the invoke-rustc-twice approach in Bazel, but ran into that immediately
@Marco currently in rustc the metadata is different if you emit just metadata vs if you emit metadata + a lib
we also want to use one rustc process to emit both
( so we don't need one rustc process for metadata and one for the lib)
Is it intentionally different? Does it make sense that it's non-deterministically different?
I wanted to see if the redundant metadata generation work in doing 2 rustc invocations from bazel is still a benefit, since that is substantially simpler to implement.
@Alex Crichton I did some measurements: https://internals.rust-lang.org/t/evaluating-pipelined-rustc-compilation/10199/62?u=nnethercote
Wow @nnethercote that's quite comprehensive! Thanks for gathering all that!
FWIW rustc sees no benefit b/c it has no pipelining opportunities
and I think in general that's why pipelining isn't as great as we expected: there are just fewer pipelining opportunities than we originally thought
@Marco I also want to prototype pipelining with separate invocations, so it would be useful to be able to do --emit metadata which generates the same .rmetas as --emit metadata,rlib. I wonder if -Zalways-encode-mir makes up the difference?
@Jeremy Fitzhardinge I don't think -Zalways-encode-mir will be enough (you can read more about that at https://github.com/rust-lang/rust/issues/58465#issuecomment-479032740), unless it has been changed recently. I also would expect running the compiler twice to be substantially slower, since the second invocation would need to repeat all the work of the first (unless you intend to do incremental everywhere?). On average, the codegen portion only covers about 30% of compile time, so it would be repeating the first 70% for every crate.
The thread doesn't say whether -Zalways-encode-mir suffices or not. Running the compiler twice should still enable a speedup if there's unused parallelism (and it is much easier to implement/experiment with in Bazel this way), but it's not strictly better like the alternative.
Has changing compilation unit to module instead of crate come up in this vein? In the abstract (ie. in ignorance) it seems like a similar way to make more pipelining possible.
@Alex Crichton why does rustc have no pipelining opportunities? librustc takes so long that I was hoping to get some speedup there.
The time is spent in LLVM, which is parallel already. Might help with incremental when few LLVM modules change
@nnethercote rustc is entirely dylibs right now, which means that there are no pipelining opportunities
https://github.com/rust-lang/rust/pull/59800 is the solution for that
I see, thanks for the info
@Alex Crichton (asking this again:) Has changing compilation unit to module instead of crate come up in this vein? In the abstract (ie. in ignorance) it seems like a similar way to make more pipelining possible.
@Marco oh oops sorry must have missed this earlier! Currently we haven't considered that, mostly because it doesn't really fit cleanly into the compilation model today and would be a pretty significant undertaking
the current implementation of pipelining was actually very low effort for what is hoped to be quite a high reward (relative)
but you're right in that it's by no means the end-all-be-all of pipelining compilations, and there's a lot of theoretical possibilities for how we can improve things even more
it's mostly just a question of balancing that with the amount of effort needed to implement
Got it. Is there anywhere that has listed out some of those possibilities?
@Marco not currently, but we should probably make one!