Stream: t-compiler/wg-pipelining

Topic: internals post


Alex Crichton (May 17 2019 at 16:08, on Zulip):

Cargo support is now in nightly! I'm drafting up an internals post and will post it here soon

Alex Crichton (May 17 2019 at 16:09, on Zulip):

Ok, posted!

Gankro (May 17 2019 at 16:24, on Zulip):

@Alex Crichton rustup update only gets me 2019-05-16

Alex Crichton (May 17 2019 at 16:25, on Zulip):

oh I should probably say that

Alex Crichton (May 17 2019 at 16:25, on Zulip):

I just guessed

Gankro (May 17 2019 at 16:25, on Zulip):

? so 16 has pipelining?

Alex Crichton (May 17 2019 at 16:28, on Zulip):

rustc 1.36.0-nightly (7d5aa4332 2019-05-16) that has pipelining

Alex Crichton (May 17 2019 at 16:28, on Zulip):

I think it's rustup update nightly-2019-05-17

Alex Crichton (May 17 2019 at 16:47, on Zulip):

@nikomatsakis with lark that just made pipelined compilation a sealed and done deal for me

Alex Crichton (May 17 2019 at 16:47, on Zulip):

those numbers are "this is what rustc devs will see once we switch everything to rlibs"

nikomatsakis (May 17 2019 at 16:47, on Zulip):

@Alex Crichton I was surprised

Alex Crichton (May 17 2019 at 20:19, on Zulip):

I'm collecting all the results in a spreadsheet here - https://docs.google.com/spreadsheets/d/1CU7o3IocPtNAUevvrPsTI77z_AexsRloUgiusiXJWtg/edit?usp=sharing

Alex Crichton (May 17 2019 at 20:19, on Zulip):

from the internals post

simulacrum (May 17 2019 at 20:21, on Zulip):

if you want you can make the google spreadsheet coloring be gradient which .. could be helpful? (i.e., from light green for low % wins to darker for high % wins)

Alex Crichton (May 17 2019 at 20:22, on Zulip):

oh nice

Alex Crichton (May 17 2019 at 20:22, on Zulip):

/me pokes

Alex Crichton (May 17 2019 at 20:27, on Zulip):

nice, that does look much better

Alex Crichton (May 17 2019 at 21:03, on Zulip):

So an interesting result I'm seeing is that this feels like some compelling data that we want to do the next step of pipelining

Alex Crichton (May 17 2019 at 21:03, on Zulip):

which is that cargo should be able to pipeline compilations that end in the linker

Alex Crichton (May 17 2019 at 21:04, on Zulip):

(e.g. compiling a binary or a proc-macro)

Alex Crichton (May 17 2019 at 21:04, on Zulip):

projects can always be reorganized to have a tiny crate which is the binary which basically just serves the purpose of calling the linker

Alex Crichton (May 17 2019 at 21:04, on Zulip):

but it's a bummer to have to do that manually

Alex Crichton (May 17 2019 at 21:04, on Zulip):

although this is also the same w/ sccache where rlibs are just inherently easier to work with than linked artifacts

simulacrum (May 17 2019 at 21:10, on Zulip):

fwiw I think there's already a general desire/feeling in the community to do the tiny binary split and big library backing

simulacrum (May 17 2019 at 21:11, on Zulip):

so that might be somewhat low priority -- I wonder how difficult it would be to get stats on that

simulacrum (May 17 2019 at 21:11, on Zulip):

maybe lines of code in main.rs after expansion (i.e., including modules) and lib.rs or something along those lines

Alex Crichton (May 17 2019 at 21:32, on Zulip):

that's true yeah, a bunch of stuff is already architected this way but I know of big stuff which isn't at least

Alex Crichton (May 17 2019 at 21:32, on Zulip):

e.g. unit tests can take a while to compile, cargo's binary is relatively big, sccache's is huge, etc

Alex Crichton (May 17 2019 at 21:33, on Zulip):

same w/ serde_derive

simulacrum (May 17 2019 at 21:39, on Zulip):

yeah, I think proc macros would probably benefit quite a bit from that layout

simulacrum (May 17 2019 at 21:40, on Zulip):

since generally they're written inline, somewhat unlike binary/library split since proc macro internals aren't really public, unlike lots of binary internals

Gankro (May 18 2019 at 19:42, on Zulip):

@Alex Crichton you typoed the environment variable in one place in your post and it's causing chaos lol

Alex Crichton (May 18 2019 at 20:03, on Zulip):

oopsie

Alex Crichton (May 20 2019 at 14:49, on Zulip):

So continuing to collate data, it looks like pipelining across the board is not a regression on any front, it averages a 10% reduction in build time, and we're seeing up to twice as fast builds

Alex Crichton (May 20 2019 at 22:09, on Zulip):

I've opened a dedicated tracking issue now for stabilizing pipelined compilation given how compelling the data is

nnethercote (May 20 2019 at 22:41, on Zulip):

@Alex Crichton Thanks for the measurements and internals post. Initial results are promising, though the improvements are not as high/universal as I had hoped.

nnethercote (May 20 2019 at 22:42, on Zulip):

@Alex Crichton But I can't manage to get the 2019-05-17 Cargo to test. rustup update nightly gives me 2019-05-15, as does rustup update nightly-2019-05-17. What am I missing?

nnethercote (May 20 2019 at 22:43, on Zulip):

my rustc is 2019-05-19 after rustup update nightly, but the cargo is older

Jeremy Fitzhardinge (May 20 2019 at 22:45, on Zulip):

@nnethercote Were all the non-results on small numbers of cores? I think it's reasonable to expect that Cargo + codegen-units is pretty good at keeping a small number of cores busy. The main wins will be when there are currently a lot of idle cores.

nnethercote (May 20 2019 at 22:45, on Zulip):

I don't know. I have a 14-physical-core machine, I plan to test the full rustc-perf benchmark (~30 programs) once I can

Alex Crichton (May 20 2019 at 23:03, on Zulip):

@nnethercote if you rustup update nightly it should be good enough

Alex Crichton (May 20 2019 at 23:03, on Zulip):

the commit/date reported by -V is the commit date not the build date

Alex Crichton (May 20 2019 at 23:04, on Zulip):

@nnethercote I agree it's not quite as good as I hoped, but still well within the range of "worth the stabilization effort"

Alex Crichton (May 20 2019 at 23:04, on Zulip):

I think there's actually even more to be gained by going a step further and pipelining up to the linker

Alex Crichton (May 20 2019 at 23:04, on Zulip):

but we'd want to gather more data first

Alex Crichton (May 20 2019 at 23:05, on Zulip):

it's also worth keeping in mind what I mentioned in the first post on the thread that incremental builds are likely to see a much bigger benefit than whole crate builds

Alex Crichton (May 20 2019 at 23:05, on Zulip):

but everyone's only been measuring whole crate builds

Alex Crichton (May 20 2019 at 23:05, on Zulip):

which makes the most sense of course because it's the easiest thing to do

nnethercote (May 20 2019 at 23:05, on Zulip):

@Alex Crichton oh, ok, thanks for the clarification. I definitely agree it's worth stabilizing, and there may be room for more improvement, e.g. by generating metadata in parallel with type-checking/borrow-checking.

Alex Crichton (May 20 2019 at 23:08, on Zulip):

@nnethercote I'd sort of love to get to a world where Cargo can spawn literally all rustc instances for a crate graph immediately, and then rustc just queries cargo every now and then for "wake me up when this is ready" or "this is ready" and Cargo orchestrates the notifications

Alex Crichton (May 20 2019 at 23:09, on Zulip):

like I could imagine rustc being a black box to Cargo and it just says "needs Vec<String>" or "produced String" and Cargo just wakes things up as necessary assuming all the strings are unique and all

Alex Crichton (May 20 2019 at 23:09, on Zulip):

that way we could at least parse the whole crate graph in parallel

Alex Crichton (May 20 2019 at 23:09, on Zulip):

would of course be much more difficult to implement :)
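
The "needs Vec<String>" / "produced String" idea above could be sketched roughly as follows. This is only an illustration of the wake-up bookkeeping, not how Cargo or rustc work today: the `Coordinator` type, its methods, and the artifact names are all hypothetical, and a real version would need actual inter-process communication.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical coordinator (not a real Cargo type): remembers which
/// uniquely named artifacts exist and which jobs are parked waiting.
#[derive(Default)]
struct Coordinator {
    /// Artifact names that some job has already reported as produced.
    ready: HashSet<String>,
    /// Parked job name -> artifact names it is still missing.
    waiting: HashMap<String, HashSet<String>>,
}

impl Coordinator {
    /// A job says "needs Vec<String>". Returns true if it can continue
    /// right away, false if it has been parked until everything arrives.
    fn needs(&mut self, job: &str, artifacts: Vec<String>) -> bool {
        let missing: HashSet<String> = artifacts
            .into_iter()
            .filter(|a| !self.ready.contains(a))
            .collect();
        if missing.is_empty() {
            return true;
        }
        self.waiting.insert(job.to_string(), missing);
        false
    }

    /// A job says "produced String". Returns the parked jobs that can now
    /// be woken, sorted by name so the result is deterministic.
    fn produced(&mut self, artifact: &str) -> Vec<String> {
        self.ready.insert(artifact.to_string());
        let mut woken = Vec::new();
        for (job, missing) in self.waiting.iter_mut() {
            missing.remove(artifact);
            if missing.is_empty() {
                woken.push(job.clone());
            }
        }
        for job in &woken {
            self.waiting.remove(job);
        }
        woken.sort();
        woken
    }
}
```

With this shape, every job could be spawned up front: a job that calls `needs` before its dependency has reported `produced` is simply parked, and is handed back as soon as the last missing name shows up.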

Alex Crichton (May 20 2019 at 23:10, on Zulip):

I guess I'm also thinking that like if a crate requires a procedural macro it actually can progress largely through the whole resolution phase of the compiler until it finally needs the macro

Alex Crichton (May 20 2019 at 23:10, on Zulip):

I dunno, these may all be small wins

nnethercote (May 20 2019 at 23:16, on Zulip):

@Alex Crichton All interesting ideas! Anyway, I should be pleased with the current results, given that they are quite good for what is the absolute simplest implementation :) I will do the rustc-perf measurements over the next couple of days, including incremental. I'll do Firefox as well.

Jeremy Fitzhardinge (May 20 2019 at 23:25, on Zulip):

@Alex Crichton I've been thinking about a model where a graph orchestration engine can start a graph of compilers and then start feeding them incremental input at the keystroke level (ie, rather than linking cargo into rls, make it cheap enough for rls to invoke "real" builds). That would require lots of fine-grained control over what outputs you want from each compilation step, which perhaps changes dynamically.

Alex Crichton (May 20 2019 at 23:26, on Zulip):

@Jeremy Fitzhardinge that does indeed sound like the dream :)

Alex Crichton (May 20 2019 at 23:26, on Zulip):

jturner was talking about that at the last all-hands

Alex Crichton (May 20 2019 at 23:26, on Zulip):

certainly an ambitious goal :)

Jeremy Fitzhardinge (May 20 2019 at 23:30, on Zulip):

I'm very interested in continuing the conversation about how Buck and Cargo can work together. My project for the back half of the year will be trying to auto-generate buck build rules from Cargo.toml for crates.io, and there are a number of improvements to Cargo's model I'd like to discuss in that context.

Gankro (May 21 2019 at 03:08, on Zulip):

@Alex Crichton wait, what is this pipelining up to if not the linker?

simulacrum (May 21 2019 at 03:11, on Zulip):

@Gankro If I understood you correctly, then we currently split the pipeline after metadata (i.e., the same output as you get from cargo check) is done but before LLVM and the linker have run
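
To make the split point concrete, here is a toy model of a linear chain of rlibs, each depending on the previous one. It is only a sketch under stated assumptions: the `Crate` struct and the timings are made up, there are enough idle cores, and a dependent may start as soon as its dependency's metadata exists.

```rust
/// Hypothetical per-crate timings in arbitrary units (illustrative only).
struct Crate {
    /// Time until metadata is available (roughly the cargo check part).
    frontend: u64,
    /// Remaining time for LLVM codegen and writing the rlib.
    codegen: u64,
}

/// No pipelining: each crate waits for the previous one to finish fully.
fn serial_time(chain: &[Crate]) -> u64 {
    chain.iter().map(|c| c.frontend + c.codegen).sum()
}

/// Pipelining: each crate starts once the previous crate's metadata is
/// ready, so codegen of one crate overlaps with its dependents.
fn pipelined_time(chain: &[Crate]) -> u64 {
    let mut start: u64 = 0;
    let mut finish: u64 = 0;
    for c in chain {
        let metadata_ready = start + c.frontend;
        finish = finish.max(metadata_ready + c.codegen);
        // The next crate in the chain may begin at this point.
        start = metadata_ready;
    }
    finish
}
```

For two crates with `frontend: 7, codegen: 3` each, the serial model gives 20 units and the pipelined model gives 17, because the first crate's codegen overlaps with the second crate's frontend.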

Gankro (May 21 2019 at 03:12, on Zulip):

so this is proposing splitting the llvm+link step into two parts? What could we possibly execute only once llvm was done, but before linking is done?

simulacrum (May 21 2019 at 03:14, on Zulip):

er, not sure what you mean -- the stable/no flags cargo has no pipelining

simulacrum (May 21 2019 at 03:15, on Zulip):

(maybe I wasn't clear)

simulacrum (May 21 2019 at 03:15, on Zulip):

i.e., we always wait for LLVM and linking to finish if we're running it (debug/release builds)

Gankro (May 21 2019 at 03:15, on Zulip):

alex said "I think there's actually even more to be gained by going a step further and pipelining up to the linker"

simulacrum (May 21 2019 at 03:16, on Zulip):

I believe that might be referring to us not pipelining the final binary -- which can often be quite large

simulacrum (May 21 2019 at 03:16, on Zulip):

but not sure

simulacrum (May 21 2019 at 03:16, on Zulip):

I'm also .. not entirely sure what that would mean

Gankro (May 21 2019 at 03:17, on Zulip):

oh right, the diagram does say we wait for all our libs to be fully codegen'd before even starting to compile the binary, which is odd yeah

simulacrum (May 21 2019 at 03:17, on Zulip):

but I guess in theory we could run LLVM before dependency LLVMs finish?

Alex Crichton (May 21 2019 at 13:51, on Zulip):

@Gankro yeah so the method of pipelining is pretty coarse and selective now, only when an rlib depends on another rlib can we pipeline those two compilations

Alex Crichton (May 21 2019 at 13:51, on Zulip):

we, for example, can't pipeline an executable depending on a bunch of other rlibs

Alex Crichton (May 21 2019 at 13:51, on Zulip):

(or any linked artifact for that matter)

Alex Crichton (May 21 2019 at 13:51, on Zulip):

similarly we can't pipeline anything depending on a build script or a procedural macro

Alex Crichton (May 21 2019 at 13:51, on Zulip):

so the general idea is just enabling more parallelism by having better synchronization between rustc/cargo
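
The eligibility rule described in the messages above can be written down as a tiny predicate. This is a sketch of the rule as stated in this thread, not Cargo's actual code; the `Kind` enum and `can_pipeline` are hypothetical names.

```rust
/// Hypothetical artifact kinds (illustrative, not Cargo's internal types).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Rlib,
    Dylib,
    Executable,
    ProcMacro,
    BuildScript,
}

/// Per the thread: two compilations can currently be pipelined only when
/// an rlib depends on another rlib. Linked artifacts as dependents, and
/// build scripts or proc macros as dependencies, all rule it out.
fn can_pipeline(dependency: Kind, dependent: Kind) -> bool {
    dependency == Kind::Rlib && dependent == Kind::Rlib
}
```

Under this rule `can_pipeline(Kind::Rlib, Kind::Executable)` is false, which is the "final binary is not pipelined" case discussed above.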

Gankro (May 21 2019 at 13:53, on Zulip):

cool

Marco (May 23 2019 at 06:54, on Zulip):

Why do the compiler invocations in https://github.com/rust-lang/compiler-team/blob/master/working-groups/pipelining/NOTES.md#step-2-work-with-only-metadata-as-input require rustc libA.rs --emit metadata,link --crate-type lib instead of rustc libA.rs --emit metadata --crate-type lib?
I can see that the libA.rmeta produced is sometimes different, but that's surprising.

Fiddling with it locally and run into an ICE w/ the latter:

>> rustc a.rs --emit=metadata --crate-type=lib
>> rustc b.rs --emit=link --crate-type=rlib --extern=a=liba.rmeta
error: internal compiler error: src/librustc_mir/monomorphize/collector.rs:775: Cannot create local mono-item for DefId(13:12 ~ a[8787]::number[0])

thread 'rustc' panicked at 'Box<Any>', src/librustc_errors/lib.rs:637:9
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: aborting due to previous error


note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.36.0-nightly (37ff5d388 2019-05-22) running on x86_64-unknown-linux-gnu

note: compiler flags: --crate-type rlib

Marco (May 23 2019 at 06:56, on Zulip):

I was looking to sketch the invoke-rustc-twice approach in Bazel, but ran into that immediately

Alex Crichton (May 23 2019 at 16:47, on Zulip):

@Marco currently in rustc the metadata is different if you emit just metadata vs if you emit metadata + a lib

Alex Crichton (May 23 2019 at 16:47, on Zulip):

we also want to use one rustc process to emit both

Alex Crichton (May 23 2019 at 16:47, on Zulip):

( so we don't need one rustc process for metadata and one for the lib)

Marco (May 23 2019 at 17:36, on Zulip):

Is it intentionally different? Does it make sense that it's non-deterministically different?

I wanted to see if the redundant metadata generation work in doing 2 rustc invocations from bazel is still a benefit, since that is substantially simpler to implement.

nnethercote (May 24 2019 at 04:14, on Zulip):

@Alex Crichton I did some measurements: https://internals.rust-lang.org/t/evaluating-pipelined-rustc-compilation/10199/62?u=nnethercote

Alex Crichton (May 24 2019 at 14:17, on Zulip):

Wow @nnethercote that's quite comprehensive! Thanks for gathering all that!

Alex Crichton (May 24 2019 at 14:17, on Zulip):

FWIW rustc sees no benefit b/c it has no pipelining opportunities

Alex Crichton (May 24 2019 at 14:18, on Zulip):

and I think in general the reason pipelining isn't as great as we expected is that there are just fewer pipelining opportunities than we originally thought

Jeremy Fitzhardinge (May 24 2019 at 21:07, on Zulip):

@Marco I also want to prototype pipelining with separate invocations, so it would be useful to be able to do --emit metadata which generates the same .rmetas as --emit metadata,rlib. I wonder if -Zalways-encode-mir makes up the difference?

Eric Huss (May 24 2019 at 21:37, on Zulip):

@Jeremy Fitzhardinge I don't think -Zalways-encode-mir will be enough (you can read more about that at https://github.com/rust-lang/rust/issues/58465#issuecomment-479032740), unless it has been changed recently. I also would expect running the compiler twice would be substantially slower, since the second invocation would need to repeat all the work of the first (unless you intend to do incremental everywhere?). On average, the codegen portion only covers about 30% of compile time, so it would be repeating the first 70% for every crate.
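
The cost of the two-invocation approach can be estimated with a little arithmetic. This is a back-of-the-envelope sketch only: it takes the rough "codegen is about 30%" figure quoted above as given and assumes a metadata-only run costs about as much as the non-codegen part of a full build. The function names are hypothetical.

```rust
/// Assumed share of a crate's compile time spent before codegen, in
/// percent (from the rough 30% codegen figure quoted in the thread).
const FRONTEND_PERCENT: u64 = 70;

/// CPU time when a single rustc process emits both metadata and the rlib.
fn single_invocation_cpu(total_ms: u64) -> u64 {
    total_ms
}

/// CPU time when a metadata-only rustc runs first and a second rustc
/// then repeats the frontend before doing codegen.
fn two_invocation_cpu(total_ms: u64) -> u64 {
    let frontend_ms = total_ms * FRONTEND_PERCENT / 100;
    frontend_ms + total_ms
}
```

For a crate that takes 10,000 ms in one invocation, this model gives 17,000 ms of CPU time across two invocations. Whether wall-clock time still improves then depends on how many cores would otherwise sit idle.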

Marco (May 24 2019 at 23:37, on Zulip):

The thread doesn't say whether -Zalways-encode-mir suffices or not. Running the compiler twice should still enable a speedup if there's unused parallelism (and it is much easier to implement/experiment with in Bazel this way), but it's not strictly better like the alternative.

Has changing compilation unit to module instead of crate come up in this vein? In the abstract (ie. in ignorance) it seems like a similar way to make more pipelining possible.

nnethercote (May 26 2019 at 22:35, on Zulip):

@Alex Crichton why does rustc have no pipelining opportunities? librustc takes so long that I was hoping to get some speedup there.

Zoxc (May 27 2019 at 00:42, on Zulip):

The time is spent in LLVM, which is parallel already. Might help with incremental when few LLVM modules change

Alex Crichton (May 28 2019 at 12:31, on Zulip):

@nnethercote rustc is entirely dylibs right now, which means that there are no pipelining opportunities

Alex Crichton (May 28 2019 at 12:31, on Zulip):

https://github.com/rust-lang/rust/pull/59800 is the solution for that

nnethercote (May 28 2019 at 22:39, on Zulip):

I see, thanks for the info

Marco (May 30 2019 at 18:13, on Zulip):

@Alex Crichton (asking this again:) Has changing compilation unit to module instead of crate come up in this vein? In the abstract (ie. in ignorance) it seems like a similar way to make more pipelining possible.

Alex Crichton (May 30 2019 at 20:59, on Zulip):

@Marco oh oops sorry must have missed this earlier! Currently we haven't considered that, mostly because it doesn't really fit cleanly into the compilation model today and would be a pretty significant undertaking

Alex Crichton (May 30 2019 at 20:59, on Zulip):

the current implementation of pipelining was actually very low effort for what is hoped to be quite a high reward (relative)

Alex Crichton (May 30 2019 at 20:59, on Zulip):

but you're right in that it's by no means the end-all-be-all of pipelining compilations, and there's a lot of theoretical possibilities for how we can improve things even more

Alex Crichton (May 30 2019 at 20:59, on Zulip):

it's mostly just a question of balancing that with the amount of effort needed to implement

Marco (May 30 2019 at 22:13, on Zulip):

Got it. Is there anywhere that has listed out some of those possibilities?

Alex Crichton (May 31 2019 at 14:02, on Zulip):

@Marco not currently, but we should probably make one!

Last update: Nov 15 2019 at 11:05UTC