I posted some initial benchmarks at https://internals.rust-lang.org/t/plan-to-test-parallel-rustc/11487/15 and https://internals.rust-lang.org/t/plan-to-test-parallel-rustc/11487/16 , and they look good!
The difference in output sizes is surprising. Do these also happen with
Also you didn't mention how many cores your CPU has. You may want to specify a higher thread count to rustc (it defaults to a maximum of 4).
So simon's question on internals about "what's still serial" led me to investigate a little, and with -Z self-profile it's actually really apparent what's still serial (yay!)
I compiled Cargo itself with -Z self-profile with our parallel nightly, and I'm actually pretty surprised by the results
this is the overall picture: Screen-Shot-2019-12-18-at-12.51.36-PM.png
type checking is surprisingly not really parallel at all Screen-Shot-2019-12-18-at-12.51.59-PM.png
and partitioning is only really barely parallel -- Screen-Shot-2019-12-18-at-12.52.35-PM.png
is a lot of this work sort of just not inherently that parallel? or are things not getting stolen?
this was just a single crate build via cargo +nightly-2019-12-18 rustc --lib -- -Z self-profile after a full build, so it was only building the one library crate
I'm not sure I know how to interpret those graphs
what do "parallel things" look like?
on the left there are thread ids
and I think we're only parallel if we have multiple thread ids with simultaneous bars
Yeah, that's correct. As a concrete example, if you look in the first image "Screen-Shot-2019-12-18-at-12.51.36-PM.png", there are two purple bars on Thread 3 (see the left-hand gutter) around 1350 - 1380ms (see the top bar for the timeline), and a purple bar and a green bar at the same time on Thread 4. The two overlapping purple bars and the overlapping purple and green bars mean threads 3 & 4 were doing work at the same time.
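To make the "simultaneous bars on different thread ids" rule concrete, here is a minimal sketch that takes a list of timeline bars and reports how many milliseconds had two or more threads busy at once. The `Span` type, its field names, and `parallel_ms` are made up for illustration; this is not how the self-profile tooling represents or analyzes its events.

```rust
use std::collections::BTreeMap;

/// One bar in the timeline: work done on one thread between two timestamps.
/// (Hypothetical type for illustration, not a self-profile data structure.)
#[derive(Debug, Clone, Copy)]
struct Span {
    thread_id: u32,
    start_ms: u64,
    end_ms: u64,
}

/// Total time in milliseconds during which at least two distinct threads
/// had a bar on the timeline.
fn parallel_ms(spans: &[Span]) -> u64 {
    // Group bars by thread so that nested or adjacent bars on the same
    // thread are not mistaken for parallelism.
    let mut per_thread: BTreeMap<u32, Vec<(u64, u64)>> = BTreeMap::new();
    for s in spans {
        if s.end_ms > s.start_ms {
            per_thread
                .entry(s.thread_id)
                .or_default()
                .push((s.start_ms, s.end_ms));
        }
    }

    // Merge each thread's bars into disjoint busy intervals and turn them
    // into sweep events: +1 when a thread becomes busy, -1 when it goes idle.
    let mut events: Vec<(u64, i32)> = Vec::new();
    for intervals in per_thread.values_mut() {
        intervals.sort();
        let mut current = intervals[0];
        for &(start, end) in &intervals[1..] {
            if start <= current.1 {
                current.1 = current.1.max(end);
            } else {
                events.push((current.0, 1));
                events.push((current.1, -1));
                current = (start, end);
            }
        }
        events.push((current.0, 1));
        events.push((current.1, -1));
    }

    // Sorting puts (t, -1) before (t, 1), so a thread that stops exactly
    // when another starts does not count as overlap.
    events.sort();

    let mut busy_threads = 0i32;
    let mut last_time = 0u64;
    let mut total = 0u64;
    for (time, delta) in events {
        if busy_threads >= 2 {
            total += time - last_time;
        }
        busy_threads += delta;
        last_time = time;
    }
    total
}
```

With invented numbers loosely matching the screenshot (Thread 3 busy from 1350 to 1380, Thread 4 busy from 1355 to 1378), `parallel_ms` returns 23. A phase that runs on a single thread id returns 0 no matter how many bars it has.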
@Alex Crichton Only wf checking and item-bodies checking are parallel within type_check_crate.
@Wesley Wiser Why doesn't
-Z self-profile include
@Zoxc No one's asked for it before :slight_smile:
I'm sure I mentioned it on some PR, since there's locations where there's both a self-profile timer and
time call, which seems a bit redundant.
@Josh Triplett Maybe you could do some benchmarks while bumping the shard count (you can adjust it in sharded.rs). It's only been tuned for my 8-core R7 1700 =P
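For anyone unfamiliar with what a "shard count" controls, here is a minimal sketch of the general technique: a map split across several independently locked sub-maps, with the shard picked from the key's hash. The names `SHARD_BITS`, `SHARDS`, and `ShardedMap` are assumptions for this example, and the actual code in sharded.rs may be structured differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Hypothetical tuning knob: the map is split into 2^SHARD_BITS shards.
/// More shards means less lock contention between threads, at the cost
/// of more memory and more locks to manage.
const SHARD_BITS: usize = 5;
const SHARDS: usize = 1 << SHARD_BITS;

/// A map whose entries are spread over independently locked sub-maps.
/// (Illustrative only, not the compiler's implementation.)
struct ShardedMap<K, V> {
    shards: Vec<Mutex<HashMap<K, V>>>,
}

impl<K: Hash + Eq, V: Clone> ShardedMap<K, V> {
    fn new() -> Self {
        ShardedMap {
            shards: (0..SHARDS).map(|_| Mutex::new(HashMap::new())).collect(),
        }
    }

    /// Pick a shard from the low bits of the key's hash.
    fn shard_index(key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) & (SHARDS - 1)
    }

    /// Locks only the shard that owns `key`, so threads touching other
    /// shards are not blocked.
    fn insert(&self, key: K, value: V) -> Option<V> {
        let idx = Self::shard_index(&key);
        self.shards[idx].lock().unwrap().insert(key, value)
    }

    fn get(&self, key: &K) -> Option<V> {
        let idx = Self::shard_index(key);
        self.shards[idx].lock().unwrap().get(key).cloned()
    }
}
```

In a sketch like this, benchmarking with a different shard count just means changing `SHARD_BITS` and rebuilding. Whether a higher value helps depends on how many threads actually contend for the same shard, which is why a value tuned on an 8-core machine may not be the best fit on a larger one.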