Stream: t-compiler/wg-parallel-rustc

Topic: meeting 2019-12-09


simulacrum (Dec 09 2019 at 14:25, on Zulip):

I don't think I have any updates

simulacrum (Dec 09 2019 at 14:26, on Zulip):

I have not spent time looking into making Cargo act as the jobserver for rustc (vs. the pipe, etc.)

simulacrum (Dec 09 2019 at 14:26, on Zulip):

@WG-parallel-rustc do we think it's worth meeting today, given this?

simulacrum (Dec 09 2019 at 14:26, on Zulip):

(not sure if others have things to talk about)

Santiago Pastorino (Dec 09 2019 at 14:35, on Zulip):

not from my side, last week I was doing some MIR stuff

simulacrum (Dec 09 2019 at 16:46, on Zulip):

okay, let's tentatively cancel, but if someone feels that we should meet, then please say so in the next hour

Alex Crichton (Dec 09 2019 at 16:56, on Zulip):

Cancelling sounds ok with me

nikomatsakis (Dec 09 2019 at 20:09, on Zulip):

Hey all, sorry I'm slow

nikomatsakis (Dec 09 2019 at 20:10, on Zulip):

but i'm also ok :)

nikomatsakis (Dec 09 2019 at 20:12, on Zulip):

that said, I see there's been a lot of activity, @simulacrum is it possible to summarize what's up?

nikomatsakis (Dec 09 2019 at 20:12, on Zulip):

I saw some discussion of possible bugs in the new rustc scheduler, or at least the jobserver integration with it?

simulacrum (Dec 09 2019 at 20:12, on Zulip):

Yeah, I can provide some summary

simulacrum (Dec 09 2019 at 20:15, on Zulip):

We've loosely concluded that the current jobserver, at least on linux, is showing fairly high contention inside the kernel (for read/write calls on the pipe), which we believe is due to ~all rustc threads waiting for a jobserver token getting woken up on every new token (vs. just one). Using a POSIX IPC semaphore here seems to resolve things, but is not viable due to being incompatible with make (and cmake, etc.).

We also discovered that at least the new rayon branch is incorrectly releasing the implicit (and only) token held by a rustc process when it goes to sleep due to lack of work; we have a tentative fix planned that'll be within rustc itself essentially avoiding the bug by just not ever releasing the implicit token.

simulacrum (Dec 09 2019 at 20:17, on Zulip):

For the contention, the current plan is that rustc will no longer connect directly to the jobserver, but will instead call into Cargo (via some sort of pipe, possibly, or some other mechanism -- this is a bit unclear right now), and Cargo will issue the blocking reads/writes on the jobserver pipe. This should give us flexibility in terms of which rustcs get tokens and is more generally nice for getting control over scheduling tokens across the rustcs we spawn. I'm planning to do that work myself, but I've been semi-avoiding it so far since it sounds like a relatively large task (and I need to spend some design time writing up a spec before digging in, I think).

simulacrum (Dec 09 2019 at 20:18, on Zulip):

@Santiago Pastorino is planning on doing the rayon bug mitigation, I think.

nikomatsakis (Dec 09 2019 at 20:18, on Zulip):

Hmm

nikomatsakis (Dec 09 2019 at 20:18, on Zulip):

OK

nikomatsakis (Dec 09 2019 at 20:18, on Zulip):

It seems like this might also impact folks attempting to run rustc outside of cargo

nikomatsakis (Dec 09 2019 at 20:18, on Zulip):

Though I guess we've never really "approved" of them using the jobserver, right?

simulacrum (Dec 09 2019 at 20:18, on Zulip):

Ah, so, in general the thought is that we can always fallback on the old mode

simulacrum (Dec 09 2019 at 20:19, on Zulip):

that's not too hard

nikomatsakis (Dec 09 2019 at 20:19, on Zulip):

(I remember at some point us -- or maybe just me -- thinking that it'd be nice if they could integrate into "standard" workflows)

simulacrum (Dec 09 2019 at 20:19, on Zulip):

it's just that the old mode might basically limit you to, say, 2-4 threads at most per rustc

nikomatsakis (Dec 09 2019 at 20:19, on Zulip):

the main motivation for interjecting cargo

nikomatsakis (Dec 09 2019 at 20:19, on Zulip):

is..well, what is it? :)

nikomatsakis (Dec 09 2019 at 20:20, on Zulip):

being able to have a richer vocabulary than just "give me a token and I'll block"?

simulacrum (Dec 09 2019 at 20:20, on Zulip):

having just one thread/process doing the writing into the jobserver pipe to mitigate the contention

nikomatsakis (Dec 09 2019 at 20:20, on Zulip):

OK.

nikomatsakis (Dec 09 2019 at 20:21, on Zulip):

We've loosely concluded that the current jobserver, at least on linux, is showing fairly high contention inside the kernel (for read/write calls on the pipe), which we believe is due to ~all rustc threads waiting for a jobserver token getting woken up on every new token (vs. just one).

do we believe this is because of rayon's "wake-up-the-world" scheduler behavior?

simulacrum (Dec 09 2019 at 20:23, on Zulip):

I think, well, maybe, but mostly no -- this is more so because linux itself will wake up all T*N (in the large core count case, up to ~800 threads) threads when a single byte is written into the jobserver pipe, where all are doing read(&mut [_; 1]) basically

nikomatsakis (Dec 09 2019 at 20:24, on Zulip):

Ok. Yeah I was thinking that I didn't quite see how it could be linked to rayon since rayon threads are basically all just blocked

nikomatsakis (Dec 09 2019 at 20:24, on Zulip):

i.e., the rayon events will be going out to the threads, and if they were sleeping they'd wake up, but they're not sleeping, they're blocking waiting for jobserver events

simulacrum (Dec 09 2019 at 20:24, on Zulip):

it's only linked to rayon insofar as rayon currently doesn't quite support the "don't spawn T threads until you need them" / slow-start behavior, but that's most likely just exacerbating the problem rather than causing it

nikomatsakis (Dec 09 2019 at 20:25, on Zulip):

yes. the new scheduler in principle would help with that specific part of the problem perhaps

simulacrum (Dec 09 2019 at 20:25, on Zulip):

to be clear, ~all benchmarks were done with your fork of rayon

nikomatsakis (Dec 09 2019 at 20:27, on Zulip):

yes, ok

nikomatsakis (Dec 09 2019 at 20:27, on Zulip):

it won't help beyond the benchmarks

nikomatsakis (Dec 09 2019 at 20:27, on Zulip):

since they're already using it :)

simulacrum (Dec 09 2019 at 20:29, on Zulip):

I think that's all -- I don't think we've made other progress etc

nikomatsakis (Dec 09 2019 at 20:35, on Zulip):

thanks @simulacrum :heart:

Last update: Jan 28 2020 at 01:15UTC