I am likely not going to be able to make it today, but am unopposed if y'all want to meet.
I can't really make this time most weeks, but
I definitely feel we need to agree on a plan and immediate steps
It still feels to me that we're moving in a few directions
As a related aside, one thing that @Santiago Pastorino and I were talking about was the idea of doing a "compiler team lecture series" video (we haven't done one of those in a while) about how the current parallel support works at a high level. @Santiago Pastorino felt it would be useful to them. I felt like I could do some of that, but I don't think I'd be the best choice, certainly not for all of it; maybe a @simulacrum / @Alex Crichton / @Zoxc joint effort to describe the current state?
i.e., somebody who runs it but maybe tag-teams with @Alex Crichton to explain the jobserver or something :P
But the other big issue I think is more about the overall philosophy and design we are shooting for. I raised this a bit in #t-compiler/wg-parallel-rustc > rustc-rayon extension, for example -- I think that in all the "compiler team" meetings I've been at, we've had a philosophy that we want to move to "vanilla rayon", and generally shoot for a simple setup overall to start, but I'm not sure @Zoxc what your take is on that. My feeling is you don't agree and it seems like we should hammer this out. =) I can imagine for example that maybe a simpler setup is just not going to get the perf we need, and I think e.g. the discussion about panic handling sounded relevant. I guess most of all I'd like to get some picture of the high-level setup we are shooting for in order to ship, so that we can place PRs in that context.
I owe Santiago a doodle poll for that as well as a bisect pr. I also have found that this time is not great for me.
I guess we can try to find a time to do such a session -- to be honest, I think I can explain the jobserver myself. Everything else I don't have a good handle on.
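For context, the jobserver being discussed is GNU make's token protocol, which rustc consumes via the `jobserver` crate. A minimal std-only sketch of the core idea (the real protocol shares a pipe of tokens across *processes*; this in-process channel is only an illustration):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// A fixed pool of tokens caps how many jobs run at once, even when many
// tasks are ready. `run_jobs` is a made-up name for this sketch.
fn run_jobs(jobs: usize, n_tasks: i32) -> i32 {
    let (tx, rx) = mpsc::channel::<()>();
    for _ in 0..jobs {
        tx.send(()).unwrap(); // pre-fill the pool with `jobs` tokens
    }
    let rx = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..n_tasks)
        .map(|id| {
            let (tx, rx) = (tx.clone(), rx.clone());
            thread::spawn(move || {
                // Acquire a token; blocks until another task releases one.
                rx.lock().unwrap().recv().unwrap();
                let result = id * 10; // stand-in for actual compilation work
                tx.send(()).unwrap(); // release the token
                result
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    // With 2 tokens, at most 2 of the 4 tasks run concurrently.
    assert_eq!(run_jobs(2, 4), 0 + 10 + 20 + 30);
    println!("all tasks completed under a 2-token cap");
}
```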
Er, bisect rustc explainer
Yes, I agree some consensus and hammering out of the overall design is necessary.
yeah, I've mentioned the idea to Mark too
also @simulacrum, if you don't have a lot of time, forget about the bisect thing -- I can just sit down, prepare something, and explain it to the rest of the people who need it
let me open a separate thread for that
I'm on the subway but I was thinking about this and I wanted to say -- I have the feeling @Zoxc that you do have something of a plan in your mind, or at least an idea of constraints that we're not all fully aware of. Maybe a good place to start would be that we can talk about the path that you have in mind (without trying to forge a consensus or pick anything yet), just so that we all understand each other more deeply?
That does seem a bit abstract =P
oof sorry on west coast where this time is a bit early for me, but catching up on stuff now. Always happy to chat about jobserver though! (and agreed on sync'ing with directions)
At some point, @simulacrum and I were tinkering with a hackmd trying to write out a proposal with core goals and some questions/answers
@Zoxc to make it concrete, maybe we can just try to lob specific questions your way
to start I would ask
- What do you see as the next steps to improve parallel performance, and/or what blockers are on your mind?
Sending `TyCtxt` to other threads would be very useful (to encode incremental state in the background, and for prefetching queries, like parsing a file when we see its `mod` declaration).
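The prefetching idea above can be sketched with plain std threads: kick off an expensive computation as soon as we know we'll need it, and join later. The names here (`parse_module`, the "parsing" itself) are illustrative stand-ins, not rustc APIs:

```rust
use std::thread;

// Hypothetical stand-in for an expensive query: "parse" a source string by
// counting its `;`-separated items.
fn parse_module(src: &str) -> usize {
    src.split(';').filter(|s| !s.trim().is_empty()).count()
}

fn main() {
    let src = String::from("fn a() {}; fn b() {}; fn c() {}");

    // Prefetch: start the query on a background thread the moment we know
    // we will need its result, instead of waiting until first use.
    let handle = thread::spawn(move || parse_module(&src));

    // ... the main thread continues with other work here ...

    let item_count = handle.join().unwrap();
    assert_eq!(item_count, 3);
    println!("prefetched module has {} items", item_count);
}
```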
In terms of avoiding regressions, we might need a flag that turns parallelism/atomics on (possibly on demand). We could turn off parallelism when building dependencies, for example, which is more likely to saturate all cores anyway.
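A minimal sketch of such a runtime switch, assuming a hypothetical `parallel_enabled` flag (in rustc this would be driven by a compiler flag or detected load, not a hard-coded bool):

```rust
use std::thread;

// Choose between a zero-overhead sequential path and a threaded path at
// runtime. `sum_squares` is an illustrative workload, not rustc code.
fn sum_squares(data: &[u64], parallel_enabled: bool) -> u64 {
    if !parallel_enabled || data.len() < 2 {
        // Sequential path: no thread or synchronization overhead at all.
        return data.iter().map(|&x| x * x).sum();
    }
    // Parallel path: split the slice and sum one half on another thread.
    let (lo, hi) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let left = s.spawn(|| lo.iter().map(|&x| x * x).sum::<u64>());
        let right = hi.iter().map(|&x| x * x).sum::<u64>();
        left.join().unwrap() + right
    })
}

fn main() {
    let data: Vec<u64> = (1..=10).collect();
    // Both paths must agree; only the cost profile differs.
    assert_eq!(sum_squares(&data, false), sum_squares(&data, true));
    println!("sum of squares: {}", sum_squares(&data, true));
}
```

The point of the flag is that the sequential path pays nothing for the parallel machinery, which is exactly the "avoid regressions" concern.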
- What criteria were used to decide where we insert parallelism like `par_iter`/`parallel!`?
Inserting it must improve performance with multiple threads (so there must be free cores at that point), it should not affect single-thread performance much, and it must also be efficient, so CPU cores are not wasted on a single rustc process when they could be doing something else.
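Those criteria amount to a guard before taking a parallel path: enough work to amortize the overhead, and more than one core actually available. A std-only sketch (the threshold is a made-up illustrative number, and `should_parallelize` is a hypothetical helper, not a rustc API):

```rust
use std::thread;

// Illustrative cutoff below which parallel overhead is assumed to dominate.
const MIN_ITEMS_FOR_PARALLEL: usize = 1000;

// Take the parallel path only when it can plausibly pay off.
fn should_parallelize(work_items: usize) -> bool {
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    cores > 1 && work_items >= MIN_ITEMS_FOR_PARALLEL
}

fn main() {
    // Tiny workloads always stay sequential, regardless of core count.
    assert!(!should_parallelize(10));
    let plan = if should_parallelize(50_000) { "parallel" } else { "sequential" };
    println!("50000 items -> {}", plan);
}
```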
How were the locations for `par_iter`/`parallel!`, as they exist today, found? Those seem like good criteria for when *not* to insert it, in some sense (or what to look at in detail when doing so), but how were these specific spots identified? Just by looking through the whole codebase?
`-Z time-passes` mostly, and my later PRs by
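For readers unfamiliar with it, `-Z time-passes` prints per-pass wall-clock times, which is what makes slow, parallelizable passes stand out. A toy std-only illustration of that kind of measurement (pass names and workloads here are made up, not rustc's):

```rust
use std::time::Instant;

// Run a closure and report how long it took, roughly in the spirit of the
// per-pass lines that `-Z time-passes` emits. `time_pass` is a sketch, not
// the compiler's implementation.
fn time_pass<T>(name: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = f();
    eprintln!("time: {:>10.3?}  {}", start.elapsed(), name);
    result
}

fn main() {
    let parsed = time_pass("parsing", || (0..1_000u64).sum::<u64>());
    let checked = time_pass("type checking", || parsed * 2);
    assert_eq!(checked, 999_000);
}
```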
So there is no guiding principle beyond simple performance, if I follow you? i.e., we aren't trying to parallelize because of X where X is not performance?
(I'm struggling to come up with an example, but e.g., I could imagine some sort of reason like 'this is a loop and it must be parallel, even if we have no data to suggest so')