Stream: t-compiler/wg-parallel-rustc

Topic: truly parallel codegen


Alex Crichton (Oct 02 2019 at 18:15, on Zulip):

I think we touched on this briefly during the planning meeting last time, but parallel rustc should let us enable truly parallel codegen for all CGUs I think

Alex Crichton (Oct 02 2019 at 18:15, on Zulip):

where right now we codegen every CGU sequentially and then fork it off into the background to actually get llvm codegen'd

Alex Crichton (Oct 02 2019 at 18:15, on Zulip):

or well I should say truly parallel translation to LLVM IR, not codegen

simulacrum (Oct 02 2019 at 18:15, on Zulip):

My impression based on the comments in the codegen crates was that LLVM itself doesn't let us do that? Was that incorrect?

Alex Crichton (Oct 02 2019 at 18:16, on Zulip):

nah we can invoke LLVM in parallel as much as we like

simulacrum (Oct 02 2019 at 18:16, on Zulip):

Do we just not do it today because the compiler itself can't codegen in parallel since queries etc are single threaded?

Alex Crichton (Oct 02 2019 at 18:16, on Zulip):

we do that today for actual codegen/optimization

Alex Crichton (Oct 02 2019 at 18:16, on Zulip):

the only reason we don't translate to LLVM in parallel is that rustc's data structures don't support it

Alex Crichton (Oct 02 2019 at 18:16, on Zulip):

but this could be a pretty huge win for a parallel compiler

simulacrum (Oct 02 2019 at 18:16, on Zulip):

yeah, I somehow thought that LLVM did not support parallel 'IR creation'

Alex Crichton (Oct 02 2019 at 18:16, on Zulip):

because translation to IR is such a huge time chunk in both debug and release builds

simulacrum (Oct 02 2019 at 18:16, on Zulip):

but yeah that sounds like a big win compared to today, maybe even the "ticket" so to speak

Alex Crichton (Oct 02 2019 at 18:17, on Zulip):

it would also, I think, radically simplify the backend

Alex Crichton (Oct 02 2019 at 18:17, on Zulip):

right now there's all this crazy jobserver stuff and threads flying around, but truly being parallel here would make things quite a bit easier

Alex Crichton (Oct 02 2019 at 18:17, on Zulip):

it is also a major departure from what we do today

Alex Crichton (Oct 02 2019 at 18:17, on Zulip):

so we wouldn't be able to capitalize on it until parallel compilation lands

Alex Crichton (Oct 02 2019 at 18:18, on Zulip):

but I think this is something we'll want to keep in mind, is that 100% parallel translation to LLVM IR is a win we have not actually unlocked yet

Alex Crichton (Oct 02 2019 at 18:18, on Zulip):

even if parallel rustc is built

simulacrum (Oct 02 2019 at 18:18, on Zulip):

indeed, yes -- so you think it's not viable to sort of have both at the same time? i.e., how we do with queries elsewhere?

Alex Crichton (Oct 02 2019 at 18:20, on Zulip):

well it's not really both at the same time

Alex Crichton (Oct 02 2019 at 18:20, on Zulip):

it's more of that rustc will at some point determine "here's N codegen units"

Alex Crichton (Oct 02 2019 at 18:21, on Zulip):

today we have this complicated setup where one thread translates all N units to LLVM IR and then forks off work to background threads for codegen/optimization

Alex Crichton (Oct 02 2019 at 18:21, on Zulip):

whereas with a truly parallel rustc we can simply, like with rayon, have a "parallel for loop"

Alex Crichton (Oct 02 2019 at 18:21, on Zulip):

where we just translate to llvm ir, immediately optimize/codegen, and then move on

Alex Crichton (Oct 02 2019 at 18:21, on Zulip):

writing the synchronization point for LTO and/or ThinLTO would also be almost trivial

Alex Crichton (Oct 02 2019 at 18:21, on Zulip):

since you'd just wait for all codegen to be done, link everything, and then spin up another parallel for loop

Alex Crichton (Oct 02 2019 at 18:22, on Zulip):

there's a lot of complication today due to the inherent architecture

Alex Crichton (Oct 02 2019 at 18:22, on Zulip):

where you can't share rustc data structures across threads

Alex Crichton (Oct 02 2019 at 18:22, on Zulip):

but if we lift that restriction then all of a sudden we can write most of the translation backend in a much more natural way

simulacrum (Oct 02 2019 at 18:23, on Zulip):

Do we not already have that loop today though that could be "optionally parallel"?

simulacrum (Oct 02 2019 at 18:23, on Zulip):

Similar to how the rest of the compiler is written

Alex Crichton (Oct 02 2019 at 19:47, on Zulip):

hey sorry went away into a meeting

Alex Crichton (Oct 02 2019 at 19:47, on Zulip):

we don't have a straightforward loop for codegen, no

Alex Crichton (Oct 02 2019 at 19:48, on Zulip):

we have this -- https://github.com/rust-lang/rust/blob/f2023ac599c38a59f86552089e6791c5a73412d3/src/librustc_codegen_ssa/back/write.rs#L1059-L1193

Alex Crichton (Oct 02 2019 at 19:48, on Zulip):

it tries to balance work between the main thread and other threads

Alex Crichton (Oct 02 2019 at 19:48, on Zulip):

it gets significantly more complicated when most work is parallel but some work has to stick to one thread

simulacrum (Oct 02 2019 at 20:26, on Zulip):

yeah, that makes sense -- bit unfortunate that we don't have executors out there that already enable this, via two separate buckets of both Send tasks and non-Send tasks

simulacrum (Oct 02 2019 at 20:27, on Zulip):

though we probably know enough extra information that it wouldn't be ideal to just use an off the shelf solution even if it existed

Santiago Pastorino (Oct 02 2019 at 20:58, on Zulip):

but I think this is something we'll want to keep in mind, is that 100% parallel translation to LLVM IR is a win we have not actually unlocked yet

interesting!

nikomatsakis (Oct 07 2019 at 17:56, on Zulip):

@Alex Crichton I think this is definitely a goal -- the one major thing we have to be careful about I think is that we don't want to have too many LLVM modules created all at once, just to keep memory usage down.

nikomatsakis (Oct 07 2019 at 17:56, on Zulip):

but I think that the existing parallel compilation stuff totally supports this already

nikomatsakis (Oct 07 2019 at 17:56, on Zulip):

or am I missing something?

Alex Crichton (Oct 07 2019 at 22:12, on Zulip):

@nikomatsakis right yeah, but I think the jobserver plus rearchitecting things would basically fix that

Alex Crichton (Oct 07 2019 at 22:13, on Zulip):

today we have one thread, which if it runs as fast as possible, generates all llvm modules

Alex Crichton (Oct 07 2019 at 22:13, on Zulip):

and then those llvm modules sit idle in memory while we wait for parallelism to come optimize/codegen them

Alex Crichton (Oct 07 2019 at 22:13, on Zulip):

if we instead used one token at a time to take an entire llvm module through to completion

Alex Crichton (Oct 07 2019 at 22:13, on Zulip):

for example you translate to llvm ir, then immediately optimize, then immediately codegen

Alex Crichton (Oct 07 2019 at 22:13, on Zulip):

that would serve the same purpose as the solution we have now of trading off who does what

Alex Crichton (Oct 07 2019 at 22:14, on Zulip):

this, I think, is definitely an area of complexity that would simply "go away" if we switch to truly parallel codegen governed by a jobserver

Alex Crichton (Oct 07 2019 at 22:15, on Zulip):

currently we have to rate limit the main thread to stop codegen'ing and switch to optimizing to avoid producing too many llvm modules sitting in memory

Alex Crichton (Oct 07 2019 at 22:15, on Zulip):

but with truly parallel codegen no rate limiting is needed since we take an llvm module through to completion each time

Last update: Nov 17 2019 at 06:55UTC