Hey @Zoxc -- I was wondering what the status was of the optimizations and PRs you had pending here. I was thinking about how to get forward motion on the plan we had laid out, and thinking about
(a) trying to advertise the plan on my blog
and (b) trying to recruit someone (or multiple someones) to help keep it organized. So, specifically, to convert the plan into work items, to track their status, write announcements, collate responses, and perhaps do some "around the edges" tasks like writing up the shell script.
The PRs have landed, but they didn't impact the parallel compiler overhead much.
I have created https://github.com/rust-lang/rust/pull/60035 in the meantime, which gets rid of the lock when marking dep nodes green. It seems to make the parallel compiler faster for clean incremental cases, though I haven't done extensive testing yet.
OK. What are the most impressive measurements you've seen @Zoxc? Can you point me at them again? i.e., if you wanted to sell this work and show a 222x speedup, what would you point at =)
I'd have to fake some benchmarks for a 222x speedup. Or import 1000 core CPUs from the future.
I think this is the latest perf run: https://perf.rust-lang.org/compare.html?start=3750348daff89741e3153e0e120aa70a45ff5b68&end=29efa193a1a6b5480ece517b29cc2a2ad035aa25&stat=wall-time
It uses 8 threads though (4 might be faster).
And I'd find a CPU with more cores to bench with if you want greater speedups =P
Would you agree that you have the "engineering effort largely in hand" @Zoxc, or do you think that it'd be helpful to have some more folks involved with those aspects specifically?
Draft of possible post -- just have to decide where to post it. Maybe internals.
cc @mw @Alex Crichton :point_up:
@nikomatsakis I'd say yes since I can't think of any easy tasks for people to help with =P
OK, that was my expectation.
It seems like there's still a small-ish but non-trivial amount of work involved in just getting the various bits of plumbing set up -- e.g., environment variables or what have you.
@nikomatsakis We should merge this soundness fix though https://github.com/rust-lang/rustc-rayon/pull/2
"Zoxc is presently experimenting with ways to reduce that overhead." I'm not sure this is entirely accurate =P
Heh. As in, you're not experimenting -- or just don't really have any ideas?
Do you feel the overhead is "low enough"?
I don't have many ideas (which would improve the overhead when using 1 thread) and I'm not currently working on any of them.
I have ideas which would increase the overhead by speeding up other parts of the compiler which don't use atomic operations =P
@nikomatsakis that post looks good to me
It would be good to get an idea of what the overhead with 1 thread is at the moment. Do we have somewhat recent numbers there?
No, but maybe I'll update the post to talk about trying to track those numbers :)
@mw Here is the overhead with 1 thread (from 6 days ago) https://perf.rust-lang.org/compare.html?start=22fa4bb0ebdfe9fcd7962f1fa6e758c036c878e6&end=e09f51168143d6b8a4da083242dbec77ef81ee2c&stat=wall-time
Hi all, I'm lwshang, and I recently joined the rust-lang org on Zulip. As suggested by @nikomatsakis , I've come here to help organize the progress and "push it over the finish line". I read the posts, caught up on all the conversation that has happened here since April 10th, and looked through the mentioned PRs. I believe I have a good understanding of what's going on and what we want to achieve in the near future. Let me summarize my takeaways, and please correct me if I've made any mistakes.
1. In short, we want to make parallel rustc the default behavior on July 4 or so. Before that, we need to go through two test phases and collect perf information during the process.
2. We have criteria for each phase. As long as the schedule moves forward and the corresponding criterion is met, we should be able to move on to the next stage.
Currently, almost all the testing of parallel rustc has been done by @Zoxc , using the rustc-perf infrastructure. To push to the next stage, the opt-in phase, we must meet the criterion for it. The criterion was not stated explicitly in the post, but I can infer a necessary condition from the threshold for the actual experimental phase: we should guarantee that the comparison from rustc-perf shows no "major regression".
The latest such comparison provided by Zoxc still shows some tests that don't pass the threshold. So it seems we still need to improve the technique itself for now. Or, if there have been improvements during the past two weeks, then of course we can move on to the opt-in experimental phase.
Whether or not the technique is ready, we should be able to prepare the tools/scripts that we want users to use during the phases. This is something we can start to address now, though it will of course require some more discussion here to do it correctly and efficiently.
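To make the "no major regression" check a bit more concrete, here is a minimal sketch of how wall-time numbers from a comparison could be checked against a threshold. Everything in it is an assumption for illustration: the `Measurement` type, the function names, the benchmark names, the timings, and the 5% threshold are all made up and are not taken from the plan or from rustc-perf.

```rust
/// One benchmark's wall-time in seconds for the two compilers being compared.
/// (Hypothetical type, not part of rustc-perf.)
struct Measurement {
    name: &'static str,
    baseline_secs: f64,
    parallel_secs: f64,
}

/// Relative change in percent; a positive value means the parallel build is slower.
fn percent_change(m: &Measurement) -> f64 {
    (m.parallel_secs - m.baseline_secs) / m.baseline_secs * 100.0
}

/// Names of the benchmarks whose slowdown exceeds `threshold_percent`.
fn major_regressions(results: &[Measurement], threshold_percent: f64) -> Vec<&'static str> {
    results
        .iter()
        .filter(|m| percent_change(m) > threshold_percent)
        .map(|m| m.name)
        .collect()
}

fn main() {
    // Made-up numbers, only to show the shape of the check.
    let results = [
        Measurement { name: "bench-a", baseline_secs: 10.0, parallel_secs: 10.2 },
        Measurement { name: "bench-b", baseline_secs: 4.0, parallel_secs: 4.6 },
    ];

    // Assumed threshold of 5%; the real one would come from the plan.
    let flagged = major_regressions(&results, 5.0);
    if flagged.is_empty() {
        println!("no major regressions");
    } else {
        println!("over threshold: {:?}", flagged);
    }
}
```

Something along these lines could run over each comparison, so that "ready for the next phase" becomes a yes/no answer instead of a judgment call on the perf page.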
Hi @Zoxc , have there been any improvements recently that reduce the overhead? Could you produce the latest comparison between parallel rustc with one thread and the regular non-parallel rustc?
There haven't been any significant improvements lately. https://github.com/rust-lang/rust/pull/60035 is looking good for incremental cases though
It seems that we all agree on the "threshold" for the experiment phases. Are we now confident enough that, for those comparison items at least, the parallel rustc is "ready" to be rolled out to the public?
Sorry I've not been able to follow up here. I did however propose "the plan" here as a possible meeting topic, see compiler-team#82