@Alex Crichton has kindly offered to look into providing CI-generated parallel rustc binaries.
the plan is to make CI build binaries for Linux, macOS, and Windows, and have instructions for downloading them in a convenient way
as a follow-up step, it would be great if we could use those binaries to set up some external regression testing that compiles a number of projects and checks whether the output is the same as the non-parallel compiler's.
(doing this as part of regular CI does not seem to be feasible at the moment)
cc @Zoxc @nikomatsakis
@mw is there a general tracking issue for parallel queries?
I'm gonna open a tracking issue on rust-lang/rust for this
@Alex Crichton There's a tracking issue for parallel queries in #48685
Thanks! I've posted a first pass at https://github.com/rust-lang/rust/pull/59417, and we'll see what comes out of that to keep testing
Ok so I had an idea with @nikomatsakis at Rust LATAM which was "what if we just enabled parallel on by default right now?". I was thinking we might not necessarily turn on the actual multithreaded part, but what if we started shipping parallel-capable compilers right now but continued to default the number of threads to 1 while we're working out all the bugs?
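A minimal sketch of that idea (hypothetical names and structure, not rustc's actual query machinery): the parallel-capable code paths, with their synchronization, are always compiled in, but the thread count defaults to 1 until it's explicitly raised.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical sketch: a "parallel-capable" work loop whose thread count
// defaults to 1. The locking machinery is always present and paid for on
// every job, but no actual parallelism happens unless opted into.
fn run_jobs(jobs: Vec<u32>, threads: usize) -> u32 {
    let queue = Arc::new(Mutex::new(jobs));
    let total = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let total = Arc::clone(&total);
            thread::spawn(move || loop {
                // Even with threads == 1, every job goes through the locks.
                let job = match queue.lock().unwrap().pop() {
                    Some(j) => j,
                    None => break,
                };
                *total.lock().unwrap() += job;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}

fn main() {
    // Default to a single thread while bugs are worked out.
    let threads = std::env::var("THREADS")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or(1);
    println!("{}", run_jobs(vec![1, 2, 3, 4], threads));
}
```

The single-threaded and multi-threaded configurations produce the same result; the only difference is the synchronization overhead, which is exactly the cost being measured here.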
I ran a @bors try build on https://github.com/rust-lang/rust/pull/59417#issuecomment-479013998 and it turns out the slowdown, while there, is quite small at 2-3% across the board.
That's an impressively small slowdown, and it makes me think we'd be well positioned to enable parallel compilation right now (ok, maybe after the next release on April 11) for testing
that way we won't need any crazy support for try builds, and we can just have normal nightly testing like we have for everything else
@Zoxc and @mw, thoughts?
Works for me; it means an extra sip of coffee while the compiler is running. :wink:
Ah, this was brought up in the PR, but the 2-3% number is actually incorrect: that's instruction count, whereas the more interesting metric for this PR is wall time, and wall time mostly regresses by around 10%, which I think is probably too bad to land in nightly rustc right now
so less actionable than I thought :(
Although honestly it looks like the major regressions are all in tiny crates, where an extra ~1s of compile time shows up as a large-ish percentage
as opposed to larger crates, which also only regress by a few seconds
so perhaps not that bad after all...
Do you know what the major cause of the 10% regression is? I mean, is it disk access, too much swapping, etc.? If there were an extra flag in rustc/cargo to enable self-profiling and ship the profiling data back to a central location for further analysis, that could be useful. (Apple did something like this when they were switching from the old OS 9/Carbon APIs to the OS X APIs; in their case they were just gathering information about the APIs developers were using, but I can see a simple way of shipping profiling data back for analysis being useful here, to see which areas need attention 'in the wild'.)
are there wins too?
@Cem Karan I'm not sure, but I've pinged @Zoxc who may know more
@mw I pinned the number of threads to 1 because it sounds like we're not ready for true parallelism yet
so this was a test of "can we take a small hit to build in parallel mode always and just not use it"
so there were no improvements, only regressions
(as expected though)
hm, results from here seem to be mixed: https://github.com/rust-lang/rust/pull/59530
@mw oh that's a separate PR with a different strategy (turning on parallel rustc and defaulting threads to num_cpus)
you're looking at wall time as well, right?
it's still got some regressions but overall looks quite good
Yeah, I was wondering if that was an option: making parallel the default and also letting it take advantage of the parallelism.
but some of the regressions look quite severe. might be a bad interaction with the jobserver?
@Zoxc Do you know any more about these regressions? Just curious as to what the cause is...
parking_lot::Mutex is the main cause. Atomic ops are slower
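For context on why that costs something even single-threaded, here's a minimal illustration (std-only, not rustc's or parking_lot's actual data structures): shared state that used to be a plain variable becomes an atomic, and every access then pays for a real read-modify-write memory operation.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative only: a plain counter vs. an atomic counter. In the parallel
// compiler, state that used to be unsynchronized becomes atomics or locks,
// and the stronger operations cost something even with a single thread.
fn count_plain(n: u64) -> u64 {
    let mut c = 0u64;
    for _ in 0..n {
        c += 1; // ordinary add; the compiler can keep this in a register
    }
    c
}

fn count_atomic(n: u64) -> u64 {
    let c = AtomicU64::new(0);
    for _ in 0..n {
        // An atomic read-modify-write on every iteration, even though
        // no other thread is contending for it.
        c.fetch_add(1, Ordering::SeqCst);
    }
    c.load(Ordering::SeqCst)
}

fn main() {
    assert_eq!(count_plain(1_000), count_atomic(1_000));
    println!("both count to 1000");
}
```

Both functions compute the same thing; the difference is purely the cost of the synchronization primitive, which is the shape of the regression being discussed.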
Got it, thanks.
I feel like those percentages... hmm. They are borderline. It may be worth it. It probably depends a bit on how far off we think the actual parallel execution is.