Stream: t-compiler/wg-parallel-rustc

Topic: slowdown compiling Cargo


Alex Crichton (Sep 24 2019 at 20:59, on Zulip):

I was wondering, is this an alright place to discuss the perf impact of parallel rustc nowadays? Or is an issue a better place? I just built a parallel rustc locally and compared it with the previous nightly; I was curious to test out -Z timings and see what cpu usage looked like during the build, but it was unfortunately slower: the parallel rustc took 74s for a full release build (nothing cached) compared to 62s on nightly

simulacrum (Sep 24 2019 at 21:06, on Zulip):

This seems like a perfect place.

That sounds somewhat expected, though -- I suspect we're pretty poorly tuned and overeager at grabbing jobserver tokens

simulacrum (Sep 24 2019 at 21:06, on Zulip):

You might see more success with e.g. RUSTFLAGS="-Zthreads=2"

Alex Crichton (Sep 24 2019 at 21:07, on Zulip):

Ok cool, so yeah, taking a look at things: I'm watching cpu usage on nightly and it's "all green", which I think means it's all in userspace

Alex Crichton (Sep 24 2019 at 21:08, on Zulip):

w/ parallel rustc it's "mostly red" which I think means most of the time is spent in the kernel

Alex Crichton (Sep 24 2019 at 21:08, on Zulip):

a quick perf shows a huge amount of time in the kernel

Alex Crichton (Sep 24 2019 at 21:08, on Zulip):

trying to track down what's where

Alex Crichton (Sep 24 2019 at 21:08, on Zulip):

I think the jobserver management may also be wrong?

Alex Crichton (Sep 24 2019 at 21:08, on Zulip):

when I ran a cargo build I got "45 (jobs=28 ncpu=28)"

Alex Crichton (Sep 24 2019 at 21:09, on Zulip):

er, that means that there were at most 45 rustc instances running in parallel

Alex Crichton (Sep 24 2019 at 21:09, on Zulip):

but the default, -j28, should have made it such that no more than 28 rustc instances were running

Alex Crichton (Sep 24 2019 at 21:09, on Zulip):

nightly does indeed not exceed 28

Alex Crichton (Sep 24 2019 at 21:10, on Zulip):

yeah, a huge amount of time is spent acquiring/releasing tokens

Alex Crichton (Sep 24 2019 at 21:10, on Zulip):

21.09% rustc_rayon_core::sleep::Sleep::sleep (from perf)

simulacrum (Sep 24 2019 at 21:11, on Zulip):

I am not really surprised -- I forget what our current management strategy is, but it might be something like "let's release/reacquire tokens whenever a rayon thread goes idle", which presumably happens quite often
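
For context, a minimal sketch of how a process typically acquires and releases Cargo jobserver tokens through the jobserver crate; the release-on-idle behaviour speculated about above is only illustrated here, not confirmed:

```rust
// Illustrative only: the usual acquire/release pattern with the `jobserver`
// crate. The exact strategy the parallel rustc used at this time is not
// established in this thread.
use jobserver::Client;

fn main() -> std::io::Result<()> {
    // Connect to the jobserver Cargo set up for us (unsafe because it relies
    // on inherited file descriptors still being valid), or make a local one.
    let client = match unsafe { Client::from_env() } {
        Some(c) => c,
        None => Client::new(4)?,
    };

    // `acquire` blocks until a token is available; the token goes back to the
    // shared pool when it is dropped. Doing this round-trip every time a
    // worker thread goes idle would mean many trips through the kernel on the
    // jobserver pipe, which would match the perf numbers above.
    let token = client.acquire()?;
    do_parallel_work();
    drop(token);

    Ok(())
}

fn do_parallel_work() {
    // placeholder for actual compilation work
}
```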

Alex Crichton (Sep 24 2019 at 21:23, on Zulip):

is this something I should open an issue for?

simulacrum (Sep 24 2019 at 21:27, on Zulip):

I think at this point I would say no

simulacrum (Sep 24 2019 at 21:28, on Zulip):

we're not really at the point where performance is a (sufficient) concern

simulacrum (Sep 24 2019 at 21:28, on Zulip):

the current jobserver management strategy is unknown (I just pulled one out of thin air)

simulacrum (Sep 24 2019 at 21:28, on Zulip):

I think Zoxc might know it, but I'm not even sure about that

simulacrum (Sep 24 2019 at 21:29, on Zulip):

@Alex Crichton It might be good to open an issue about the jobserver management strategy though -- I don't know how we should do that, and discussing on an issue seems good

simulacrum (Sep 24 2019 at 21:29, on Zulip):

in particular I think we're going to need a more intelligent server than the current model allows, in order to evenly distribute tokens between rustc instances

Alex Crichton (Sep 24 2019 at 21:29, on Zulip):

ok, I'll open an issue

simulacrum (Sep 24 2019 at 21:30, on Zulip):

e.g. if we have something like 8 cores we probably want 4 rustcs each with 2 threads (approximately) rather than 1 rustc with 8 threads, I'd imagine
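
A hypothetical helper (not from rustc or Cargo) that captures this even split: divide a fixed token budget across the active rustc instances, never starving any of them completely:

```rust
// Hypothetical sketch of the "evenly distribute tokens" idea discussed above;
// the function name and policy are illustrative, not rustc's actual behaviour.
fn threads_per_rustc(total_tokens: usize, active_rustcs: usize) -> usize {
    if active_rustcs == 0 {
        return total_tokens;
    }
    // Each instance keeps at least one thread; a real scheduler would also
    // redistribute any leftover tokens rather than dropping them.
    std::cmp::max(1, total_tokens / active_rustcs)
}

fn main() {
    assert_eq!(threads_per_rustc(8, 4), 2);  // 8 cores, 4 rustcs -> 2 threads each
    assert_eq!(threads_per_rustc(8, 1), 8);  // a lone rustc can use the whole machine
    assert_eq!(threads_per_rustc(8, 16), 1); // never hand out zero threads
}
```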

Alex Crichton (Sep 24 2019 at 21:34, on Zulip):

https://github.com/rust-lang/rust/issues/64750

Alex Crichton (Sep 24 2019 at 21:38, on Zulip):

so additionally, after focusing the call graph a bit more

Alex Crichton (Sep 24 2019 at 21:38, on Zulip):

80% of rustc's time is spent in __GI_clone

Alex Crichton (Sep 24 2019 at 21:38, on Zulip):

which I think is spawning threads

Alex Crichton (Sep 24 2019 at 21:38, on Zulip):

does rustc spawn threads on demand or immediately?

simulacrum (Sep 24 2019 at 21:39, on Zulip):

hm, I thought it was immediately, but I think we leave it up to rayon -- maybe rayon is spinning up threads if they're idle?

simulacrum (Sep 24 2019 at 21:39, on Zulip):

@cuviper might know

cuviper (Sep 24 2019 at 21:41, on Zulip):

rayon creates threads for the global pool on first use

Alex Crichton (Sep 24 2019 at 21:41, on Zulip):

(image attachment: Capture.PNG)

this is a small snippet of the perf sorted by self-time

cuviper (Sep 24 2019 at 21:41, on Zulip):

or if you create a manual ThreadPool, it's new threads for each time you do so

Alex Crichton (Sep 24 2019 at 21:41, on Zulip):

hm ok, I'll open an issue for that

simulacrum (Sep 24 2019 at 21:42, on Zulip):

@cuviper so to be clear we would expect num_cpus threads per ThreadPool creation, approximately, right?

simulacrum (Sep 24 2019 at 21:42, on Zulip):

@Alex Crichton I wonder if this is the jobserver threads -- IIRC, there's a thread spawn in that crate?

cuviper (Sep 24 2019 at 21:42, on Zulip):

the number is tunable, but it defaults to number of cpus, yes
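
A small demonstration of this behaviour, assuming current rayon: each manually built ThreadPool spawns its own worker threads when it is built, defaulting to one per logical CPU unless num_threads overrides it:

```rust
use rayon::ThreadPoolBuilder;

fn main() -> Result<(), rayon::ThreadPoolBuildError> {
    // Explicit size: this pool spawns exactly 2 worker threads up front.
    let small = ThreadPoolBuilder::new().num_threads(2).build()?;
    small.install(|| println!("small pool: {} threads", rayon::current_num_threads()));

    // Default size: roughly one thread per logical CPU, spawned when the pool
    // is built -- so many short-lived rustc processes each pay the full
    // thread-spawn cost, even for crates that compile in ~100ms.
    let default_sized = ThreadPoolBuilder::new().build()?;
    default_sized.install(|| println!("default pool: {} threads", rayon::current_num_threads()));

    Ok(())
}
```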

Alex Crichton (Sep 24 2019 at 21:43, on Zulip):

@simulacrum yes there's one thread in the jobserver crate

Alex Crichton (Sep 24 2019 at 21:43, on Zulip):

but I suspect the 28 threads spawned by each rustc are dwarfing that

Alex Crichton (Sep 24 2019 at 21:43, on Zulip):

this is an aggregate of all rustc processes near the start of the build

Alex Crichton (Sep 24 2019 at 21:43, on Zulip):

(that profile I linked above)

cuviper (Sep 24 2019 at 21:43, on Zulip):

the pool is created here:
https://github.com/rust-lang/rust/blob/66bf391c3aabfc77f5f7139fc9e6944f995d574e/src/librustc_interface/util.rs#L209-L213

Alex Crichton (Sep 24 2019 at 21:43, on Zulip):

and so if you spawn 28 rustcs that each spawn 28 threads

Alex Crichton (Sep 24 2019 at 21:43, on Zulip):

that's a lot of threads to spawn very quickly

simulacrum (Sep 24 2019 at 21:43, on Zulip):

ah -- so we're spawning like 800 threads :)

Alex Crichton (Sep 24 2019 at 21:44, on Zulip):

the crates at the beginning of the crate graph are also very quick to compile, on the order of a hundred or so ms

Alex Crichton (Sep 24 2019 at 21:44, on Zulip):

so we're thrashing thread creation quite a lot

simulacrum (Sep 24 2019 at 21:44, on Zulip):

I wonder if we could get rayon to sort of "slow start" thread spawning

simulacrum (Sep 24 2019 at 21:45, on Zulip):

e.g. create the pool with size 1 and then, if it detects work after 1 second, grow it to num_cpus
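
Rayon's pools are fixed-size, so this is only a sketch of that slow-start idea using plain std threads: start one worker immediately and add the rest only if work is still queued after a grace period. All names here are made up for illustration:

```rust
// Sketch only -- not rayon and not rustc: start with one worker, then grow
// to roughly one worker per CPU if the queue has not drained after 1 second.
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    let pending_work = Arc::new(AtomicUsize::new(100)); // pretend queue depth

    // Start with a single worker immediately; short-lived crates never pay
    // for any extra thread spawns.
    spawn_worker(Arc::clone(&pending_work));

    // After a grace period, grow only if there is still work left.
    thread::sleep(Duration::from_secs(1));
    if pending_work.load(Ordering::Relaxed) > 0 {
        let extra = thread::available_parallelism().map(|n| n.get()).unwrap_or(1) - 1;
        for _ in 0..extra {
            spawn_worker(Arc::clone(&pending_work));
        }
    }
    // (A real implementation would join these detached workers; omitted here.)
}

fn spawn_worker(queue: Arc<AtomicUsize>) {
    thread::spawn(move || {
        // Keep taking units of work until the counter reaches zero.
        while queue
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |n| n.checked_sub(1))
            .is_ok()
        {
            thread::sleep(Duration::from_millis(10)); // simulate one unit of work
        }
    });
}
```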

cuviper (Sep 24 2019 at 21:45, on Zulip):

we've talked about dynamic threads before, but that's ... challenging

cuviper (Sep 24 2019 at 21:46, on Zulip):

I guess you're not talking fully dynamic though, just lazy

simulacrum (Sep 24 2019 at 21:46, on Zulip):

right, yeah

Alex Crichton (Sep 24 2019 at 21:46, on Zulip):

https://github.com/rust-lang/rust/issues/64752

Zoxc (Sep 25 2019 at 13:14, on Zulip):

28 threads is also probably not the ideal number to use for rustc due to contention, etc.

cuviper (Sep 25 2019 at 17:26, on Zulip):

maybe rustc needs a heuristic max on its threads?

cuviper (Sep 25 2019 at 17:26, on Zulip):

just like codegen-units uses 16 regardless
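
A hedged sketch of that heuristic, assuming the 16 is borrowed from the codegen-units default rather than measured: cap the default thread count so very wide machines do not immediately run into contention:

```rust
// The ceiling of 16 is an assumption by analogy with codegen-units, not a
// measured optimum, and this helper is illustrative rather than rustc's code.
const MAX_DEFAULT_THREADS: usize = 16;

fn default_thread_count() -> usize {
    let cpus = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    cpus.min(MAX_DEFAULT_THREADS)
}

fn main() {
    println!("would default to {} threads", default_thread_count());
}
```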

nagisa (Sep 26 2019 at 02:16, on Zulip):

We should also have a minimum on the number of job tokens that a rustc holds

nagisa (Sep 26 2019 at 02:16, on Zulip):

like a rustc should hold at least 1 token always.

nagisa (Sep 26 2019 at 02:17, on Zulip):

it avoids spawning 56 rustcs on an 8-core system.

nikomatsakis (Sep 26 2019 at 17:46, on Zulip):

btw I remain fairly unconvinced that rayon is a good fit for us, in general

nikomatsakis (Sep 26 2019 at 17:46, on Zulip):

much as I love rayon of course :)

nikomatsakis (Sep 26 2019 at 17:46, on Zulip):

I think it might eventually be a good fit, but I sort of suspect we might also do better with some simpler setup for the time being

simulacrum (Sep 26 2019 at 19:02, on Zulip):

I think we might get away with managing the thread pool ourselves but otherwise using rayon, though I don't know how possible/realistic that is

Last update: Nov 17 2019 at 08:05UTC