Stream: t-compiler/wg-parallel-rustc

Topic: jobserver improvements


simulacrum (Nov 26 2019 at 15:21, on Zulip):

quick update on jobserver work: afaict, the new(?) rayon is buggy and releases jobserver tokens it didn't acquire

simulacrum (Nov 26 2019 at 15:22, on Zulip):

which may have led to magical performance that we were seeing, uncertain

simulacrum (Nov 26 2019 at 15:22, on Zulip):

in any case, investigating to see if I can fix now

Zoxc (Nov 26 2019 at 15:25, on Zulip):

As in rustc-rayon, used on master?

simulacrum (Nov 26 2019 at 15:52, on Zulip):

no, the one we're trialing

simulacrum (Nov 26 2019 at 15:53, on Zulip):

@Alex Crichton Am I correct that we expect rustc to never release a jobserver token until after it has acquired one itself? i.e., it should not release the token that cargo acquired to spawn it

Alex Crichton (Nov 26 2019 at 15:53, on Zulip):

Yes

simulacrum (Nov 26 2019 at 15:54, on Zulip):

(in particular, rayon does this before blocking -- I'm not sure yet what it's blocking on)

Zoxc (Nov 26 2019 at 16:00, on Zulip):

rustc can and will release the implicit token given to it by cargo though, which it didn't explicitly acquire.

simulacrum (Nov 26 2019 at 16:09, on Zulip):

@Zoxc I'm confused

simulacrum (Nov 26 2019 at 16:09, on Zulip):

@Alex Crichton just said that we should never do that

simulacrum (Nov 26 2019 at 16:10, on Zulip):

(and I tend to agree, otherwise we can end up with a bunch of rustcs even with -j1)

Zoxc (Nov 26 2019 at 16:23, on Zulip):

Ideally we shouldn't do it, but it's not buggy

simulacrum (Nov 26 2019 at 16:27, on Zulip):

well

simulacrum (Nov 26 2019 at 16:28, on Zulip):

I feel like it is buggy?

simulacrum (Nov 26 2019 at 16:28, on Zulip):

i.e., if you release that token then we're basically ending up in a situation where we can deadlock, right? since no one is able to make progress

simulacrum (Nov 26 2019 at 16:29, on Zulip):

you need each process to always have at least one token, otherwise it shouldn't do anything
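
(A minimal sketch of that invariant, using the jobserver crate's Client and Acquired types -- the names and structure are invented for illustration, not rustc's actual code: the implicit token from the parent is never represented as releasable state, so the process can never drop below one token.)

```rust
use jobserver::{Acquired, Client};

/// Tokens this process may legitimately release. The implicit token
/// granted by the parent (cargo/make) is deliberately not tracked here,
/// so it can never be handed back early.
struct TokenState {
    extra: Vec<Acquired>,
}

impl TokenState {
    fn acquire_one(&mut self, client: &Client) -> std::io::Result<()> {
        // Blocks until the jobserver has a token available.
        self.extra.push(client.acquire()?);
        Ok(())
    }

    fn release_one(&mut self) {
        // Dropping an `Acquired` returns its token to the shared pool;
        // only explicitly acquired tokens can ever be dropped.
        self.extra.pop();
    }
}
```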

Zoxc (Nov 26 2019 at 16:30, on Zulip):

It will immediately (or before) acquire the token back with the Rayon threadpool though.

simulacrum (Nov 26 2019 at 16:30, on Zulip):

but it can lose that race

simulacrum (Nov 26 2019 at 16:31, on Zulip):

at which point no one is going to make progress

Zoxc (Nov 26 2019 at 16:31, on Zulip):

How so?

simulacrum (Nov 26 2019 at 16:31, on Zulip):

hm, okay, I should clarify -- eventually, it'll make progress

simulacrum (Nov 26 2019 at 16:32, on Zulip):

but it'll stall until (potentially) N other rustcs try to finish

simulacrum (Nov 26 2019 at 16:32, on Zulip):

in any case it seems pretty clear that this behavior is suboptimal

simulacrum (Nov 26 2019 at 16:32, on Zulip):

(since it means that -j1 is no longer guaranteed to have at most one rustc process, etc)

simulacrum (Nov 26 2019 at 16:33, on Zulip):

and that to me even seems like just flat "buggy"

simulacrum (Nov 26 2019 at 16:35, on Zulip):

I guess I won't try to fix it and just hope things actually work, even if I can't really tell due to this

Zoxc (Nov 26 2019 at 16:42, on Zulip):

rustc can also "stall" in the same way due to Rayon giving up tokens to one of the threads it's using while it's idle, or due to rustc waiting on a query which is currently computing on another thread.

Zoxc (Nov 26 2019 at 16:49, on Zulip):

It would be nice to ensure that rustc always keeps at least one token. There was code to do this in some of the cases, but I removed it since it was buggy (https://github.com/rust-lang/rust/pull/59804/files). To fix the transfer of the jobserver token from the main thread to the thread pool would probably require some less pretty Rayon changes (an API to enter a thread pool while holding a token to be given to the thread pool).

simulacrum (Nov 26 2019 at 16:51, on Zulip):

well, it'd mostly be a matter of using the thread pool on the side

simulacrum (Nov 26 2019 at 16:51, on Zulip):

e.g. you'd not have the 'main' thread be a threadpool thread

simulacrum (Nov 26 2019 at 16:51, on Zulip):

(which I think is largely true most of the time? e.g. in non-rustc land)

Zoxc (Nov 26 2019 at 16:51, on Zulip):

It has to be a threadpool thread though, for efficiency and for WorkerLocal to work.

Zoxc (Nov 26 2019 at 16:52, on Zulip):

Using it off the threadpool hits the slow paths in Rayon.

simulacrum (Nov 26 2019 at 16:58, on Zulip):

okay, but that seems like a solvable problem

simulacrum (Nov 26 2019 at 16:58, on Zulip):

maybe this is a good reason to not use rayon (as we've discussed a little)

Zoxc (Nov 26 2019 at 17:03, on Zulip):

No, any other thread pool should be designed the same way and we should be running the main rustc thread on it =P

Maybe an API for a rustc thread to become one of the Rayon threadpool threads would work? Or just an API for spawning and joining the threadpool at once should also probably suffice.

Zoxc (Nov 26 2019 at 17:11, on Zulip):

Actually we could just provide a parameter to the thread pool with how many jobserver tokens it starts with and skip acquiring tokens for some threads, and then have an API to enter the thread pool without giving up any tokens.
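
(Sketching the shape of that proposal -- none of this API exists in rayon or rustc-rayon; the names are hypothetical:)

```rust
/// Hypothetical pool configuration for the proposal above.
struct PoolConfig {
    num_threads: usize,
    /// Threads 0..starting_tokens skip acquiring a jobserver token on
    /// startup and skip releasing one on exit: they are covered by
    /// tokens the process already holds (e.g. cargo's implicit token).
    starting_tokens: usize,
}

/// Hypothetical entry point: run `main` on the pool without first
/// giving up any of the tokens the caller holds.
fn enter_pool_keeping_tokens(config: PoolConfig, main: impl FnOnce()) {
    // Sketch only: a real implementation would install `main` as the
    // pool's first worker thread.
    let _ = config;
    main();
}
```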

simulacrum (Nov 26 2019 at 17:13, on Zulip):

I don't really know what the APIs etc need to be like

simulacrum (Nov 26 2019 at 17:13, on Zulip):

but it does seem like we need to more deeply integrate token management into rayon

Zoxc (Nov 26 2019 at 17:27, on Zulip):

I guess we can just do nothing with jobserver tokens when entering a thread pool then. So we just need a flag which tells Rayon that one of its threads doesn't need to release / acquire a jobserver token on startup / exit

Alex Crichton (Nov 26 2019 at 17:44, on Zulip):

Sorry haven't caught up on the discussion here, but it is incorrect for rustc to release its implicit token

Alex Crichton (Nov 26 2019 at 17:44, on Zulip):

and that is a bug we cannot ship with

Alex Crichton (Nov 26 2019 at 17:44, on Zulip):

part of the major purpose of the jobserver is to limit memory consumption

Alex Crichton (Nov 26 2019 at 17:44, on Zulip):

if you release the implicit token then it's possible to spawn unlimited rustc instances

Alex Crichton (Nov 26 2019 at 17:44, on Zulip):

and they're all just sitting idle consuming memory

Alex Crichton (Nov 26 2019 at 17:44, on Zulip):

for example if you run with -j1, only one will ever be doing work

Alex Crichton (Nov 26 2019 at 17:44, on Zulip):

but we may spawn hundreds of rustc instances in parallel

Alex Crichton (Nov 26 2019 at 17:45, on Zulip):

which can kill builds that rely on -j for limiting memory

Zoxc (Nov 26 2019 at 18:46, on Zulip):

That doesn't seem to be an easy property to guarantee

Alex Crichton (Nov 26 2019 at 18:51, on Zulip):

no, it is not, and this is one reason why the llvm backend is so complicated right now, it has to juggle around this "implicit token"

Zoxc (Nov 26 2019 at 19:00, on Zulip):

I'm guessing it would require fibers in Rayon to avoid giving up jobserver tokens when waiting on queries, but I'm not sure if that covers all cases. Fibers can also cause excessive memory usage if not limited by waiting, so it's not a great fix.

simulacrum (Nov 26 2019 at 20:32, on Zulip):

@Alex Crichton in theory cf92e2ebf129f4fd8075650b21c612d196b7a2f8 should be a valid try commit (you want alt build as usual, and you'll need to rebuild cargo -- I'm not sure if we uploaded a copy)

simulacrum (Nov 26 2019 at 20:33, on Zulip):

or how to get that copy if we did

simulacrum (Nov 26 2019 at 20:34, on Zulip):

https://github.com/Mark-Simulacrum/cargo/commit/c8373fbde42777e389c889e8b16a1d5f823e7a68 should be enough to rebuild cargo so hopefully not too hard

Alex Crichton (Nov 26 2019 at 23:45, on Zulip):

@simulacrum awesome, thanks!

Alex Crichton (Nov 26 2019 at 23:46, on Zulip):

my computer with 28 cores got shut down today and is being shipped to me over the thanksgiving break

Alex Crichton (Nov 26 2019 at 23:46, on Zulip):

so I'll get back to you next Monday on the numbers for that

simulacrum (Nov 26 2019 at 23:46, on Zulip):

okay, sounds good

simulacrum (Nov 26 2019 at 23:46, on Zulip):

I tried to do some loose profiling of locks and such but that mostly failed (wasn't able to get low enough overhead / enough samples)

simulacrum (Nov 26 2019 at 23:47, on Zulip):

mostly pointed at env::var calls in MIR interpret and Cargo, along with Cargo's interner

simulacrum (Nov 26 2019 at 23:47, on Zulip):

neither of which seemed... plausible to actually be a hot spot

simulacrum (Nov 26 2019 at 23:47, on Zulip):

hopefully Santiago will have more success

Alex Crichton (Nov 27 2019 at 02:20, on Zulip):

Hm well env accesses are actually locked

Alex Crichton (Nov 27 2019 at 02:20, on Zulip):

So not entirely implausible if something is slamming env vars

Alex Crichton (Nov 27 2019 at 02:21, on Zulip):

We may need to instrument rustc to try to learn about contended locks

simulacrum (Nov 27 2019 at 02:28, on Zulip):

https://hackmd.io/CBdNMYStS_GH2MawrgARfw?view#lock-contention

simulacrum (Nov 27 2019 at 02:28, on Zulip):

I think this pretty much does not see parking_lot locks at all, which might explain the high percentage here (though unclear -- the numbers from perf stat indicate that it sees all calls to futex, at least)

simulacrum (Nov 27 2019 at 02:29, on Zulip):

cargo::core::interning::InternedString::new, <rustc::mir::interpret::error::InterpErrorInfo as core::convert::From<...>>, cargo::util::profile::enabled_level

simulacrum (Nov 27 2019 at 02:31, on Zulip):

I think the cargo ones probably don't matter in practice

simulacrum (Nov 27 2019 at 02:32, on Zulip):

like, maybe we could get wins, but ultimately that's not important, cargo is mostly idle

simulacrum (Nov 27 2019 at 02:33, on Zulip):

<rustc::mir::interpret::error::InterpErrorInfo as core::convert::From<...>> I plan to file a PR switching it out for a lazy static or something like that
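
(A minimal sketch of that change, assuming -- from the rustc source of the time -- that the hot env::var call is the RUSTC_CTFE_BACKTRACE check; std's env access takes a process-global lock, so reading it once avoids the contention:)

```rust
use lazy_static::lazy_static;

lazy_static! {
    // Read the environment once per process instead of taking std's
    // env lock on every InterpErrorInfo construction.
    static ref CTFE_BACKTRACE: bool =
        std::env::var("RUSTC_CTFE_BACKTRACE").is_ok();
}

fn capture_backtrace() -> bool {
    *CTFE_BACKTRACE
}
```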

simulacrum (Nov 27 2019 at 02:38, on Zulip):

unfortunately our normal benchmarks (perf.rlo) probably won't notice, as the uncontended lock is presumably quite cheap

Alex Crichton (Nov 27 2019 at 03:23, on Zulip):

Cargo should be easy to fix yeah, the string one seems the most plausible

Alex Crichton (Nov 27 2019 at 03:23, on Zulip):

Can you swap out parking lot for libstd to get contention numbers?

simulacrum (Nov 27 2019 at 13:07, on Zulip):

hm, maybe, I can try that

simulacrum (Nov 27 2019 at 13:11, on Zulip):

std's APIs are different I think and sufficiently so that this might not be readily possible

simulacrum (Nov 27 2019 at 14:29, on Zulip):

@Alex Crichton also -- forgot to note this -- you'll want to run sudo rm -f /dev/shm/sem.jobserver-rust* between builds; obviously that's not tenable long-term but for now I believe it's necessary-ish

Zoxc (Nov 27 2019 at 14:55, on Zulip):

sync.rs has a custom Lock wrapper, so API doesn't matter too much (LockGuard is reused though)

simulacrum (Nov 27 2019 at 14:57, on Zulip):

parking lot parks ~2000 times total in the first 3 seconds from raw_mutex...

simulacrum (Nov 27 2019 at 15:02, on Zulip):

~6000 total futex calls from parking lot

simulacrum (Nov 27 2019 at 15:02, on Zulip):

so parking lot is not the source of trouble

simulacrum (Nov 27 2019 at 15:02, on Zulip):

(this is on a run with perf stat reporting 457,198 futex calls)

simulacrum (Nov 27 2019 at 15:03, on Zulip):

@Zoxc yeah, std RwLock/Mutex don't support mapping afaict

simulacrum (Nov 27 2019 at 17:42, on Zulip):

initial findings suggest that contention is happening in:

simulacrum (Nov 27 2019 at 17:44, on Zulip):

utilized this patch to libstd to gather data: https://gist.github.com/Mark-Simulacrum/6dfa7678f2d449175aa1f3d8856340f7
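
(The gist patches libstd itself; this is an equivalent standalone sketch of the technique, with the wrapper type invented for illustration: time every acquisition and report the ones that waited longer than a threshold.)

```rust
use std::sync::{Mutex, MutexGuard};
use std::time::{Duration, Instant};

struct TimedMutex<T>(Mutex<T>);

impl<T> TimedMutex<T> {
    fn lock(&self, threshold: Duration) -> MutexGuard<'_, T> {
        let start = Instant::now();
        let guard = self.0.lock().unwrap();
        let waited = start.elapsed();
        if waited > threshold {
            // The real patch records this centrally; printing is enough
            // to spot which locks are contended.
            eprintln!("contended lock: waited {:?}", waited);
        }
        guard
    }
}
```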

simulacrum (Nov 27 2019 at 17:44, on Zulip):

it has fairly high overhead, though, so measurements may not be super reliable

simulacrum (Nov 27 2019 at 17:46, on Zulip):

overhead is much lower if I adjust the elapsed threshold higher

simulacrum (Nov 27 2019 at 17:46, on Zulip):

(initial measurements were with ~100 nanoseconds)

simulacrum (Nov 27 2019 at 17:47, on Zulip):

but I can get ~65 job units in the 3 second run if I adjust to 300, which is about the same as I get without this

simulacrum (Nov 27 2019 at 17:51, on Zulip):

trying to run some more benchmarks and get a good sense of how reliable this is

simulacrum (Nov 27 2019 at 17:59, on Zulip):

I am consistently seeing stalls for hundreds of milliseconds in rustc_rayon_core::sleep::Sleep::wake_specific_thread

simulacrum (Nov 27 2019 at 17:59, on Zulip):

but maybe that's expected?

simulacrum (Nov 27 2019 at 18:03, on Zulip):

e.g., 51.508068ms; 772.124223ms; 773.960212ms; 58.569277ms; 951.101171ms; 51.476288ms; 866.302171ms; 867.230675ms

Alex Crichton (Dec 02 2019 at 17:15, on Zulip):

to make sure I understand this right, @simulacrum cf92e2ebf129f4fd8075650b21c612d196b7a2f8 is a build which is new rayon plus semaphore-for-jobserver, right? no other changes?

Alex Crichton (Dec 02 2019 at 17:17, on Zulip):

this looks to be a massive improvement

simulacrum (Dec 02 2019 at 17:26, on Zulip):

Correct, yes

simulacrum (Dec 02 2019 at 17:26, on Zulip):

Possibly modulo cargo update to latest master but I can't imagine that makes a difference

Alex Crichton (Dec 02 2019 at 17:27, on Zulip):

ok so the numbers aren't entirely exonerating

Alex Crichton (Dec 02 2019 at 17:27, on Zulip):

but I posted them to https://hackmd.io/bmB35-7oRzCeGOCanI0SkA?both

Alex Crichton (Dec 02 2019 at 17:27, on Zulip):

they look massively better than the prior version

simulacrum (Dec 02 2019 at 17:27, on Zulip):

I found a flag on epoll that might allow us to avoid thundering herd problem with the pipe, too, which might be enough that we can use that for now
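
(For reference, a minimal Linux-only sketch of that flag -- EPOLLEXCLUSIVE, here via the libc crate: when several processes wait on the same jobserver pipe, the kernel wakes only one exclusive waiter per event instead of the whole herd.)

```rust
// Block until `pipe_read_fd` is readable, registering as an exclusive
// waiter so that a single token written to the pipe wakes one process.
fn wait_exclusive(pipe_read_fd: libc::c_int) -> std::io::Result<()> {
    unsafe {
        let epfd = libc::epoll_create1(0);
        if epfd < 0 {
            return Err(std::io::Error::last_os_error());
        }
        let mut ev = libc::epoll_event {
            events: (libc::EPOLLIN | libc::EPOLLEXCLUSIVE) as u32,
            u64: pipe_read_fd as u64,
        };
        if libc::epoll_ctl(epfd, libc::EPOLL_CTL_ADD, pipe_read_fd, &mut ev) < 0 {
            let err = std::io::Error::last_os_error();
            libc::close(epfd);
            return Err(err);
        }
        let mut out = libc::epoll_event { events: 0, u64: 0 };
        // Returns once this (single) waiter is woken; the caller can
        // then attempt the non-blocking read of the token byte.
        libc::epoll_wait(epfd, &mut out, 1, -1);
        libc::close(epfd);
    }
    Ok(())
}
```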

Alex Crichton (Dec 02 2019 at 17:27, on Zulip):

oh nice

simulacrum (Dec 02 2019 at 17:28, on Zulip):

oh -- are you using the right cargo there?

simulacrum (Dec 02 2019 at 17:28, on Zulip):

you need to use the one from that toolchain

Alex Crichton (Dec 02 2019 at 17:28, on Zulip):

I ran cargo +...

Alex Crichton (Dec 02 2019 at 17:28, on Zulip):

basically what the script snippets say

simulacrum (Dec 02 2019 at 17:28, on Zulip):

I think rustup-toolchain-install-master doesn't do that though

Alex Crichton (Dec 02 2019 at 17:28, on Zulip):

oh fascinating

Alex Crichton (Dec 02 2019 at 17:28, on Zulip):

I may need to fix my thing then

simulacrum (Dec 02 2019 at 17:28, on Zulip):

i.e., you're not downloading cargo by default

Alex Crichton (Dec 02 2019 at 17:28, on Zulip):

also how did this

Alex Crichton (Dec 02 2019 at 17:29, on Zulip):

I must be using the equivalent of no jobserver then

Alex Crichton (Dec 02 2019 at 17:29, on Zulip):

ok I'll need to recollect data

simulacrum (Dec 02 2019 at 17:29, on Zulip):

yeah normal cargo would basically be no jobserver (well, cargo itself would limit to 28, but not internal parallelism)

Alex Crichton (Dec 02 2019 at 17:29, on Zulip):

I can't collect numbers right this second, will be a bit

simulacrum (Dec 02 2019 at 17:32, on Zulip):

sounds good -- I think you shouldn't need to build the cargo, but it's probably best to do so -- just to be safe -- the submodule in that commit was updated as needed to an appropriate commit I think

Alex Crichton (Dec 02 2019 at 17:38, on Zulip):

ok I think it's working now

Alex Crichton (Dec 02 2019 at 17:38, on Zulip):

I downloaded the precompiled cargo

Alex Crichton (Dec 02 2019 at 17:38, on Zulip):

and i confirmed that -j3 actually limits things

simulacrum (Dec 02 2019 at 17:38, on Zulip):

(it'll loosely not work due to new-rayon bugs that we spoke about last week)

simulacrum (Dec 02 2019 at 17:39, on Zulip):

i.e., I think we can get in a situation where we have more than N running rustcs and such, but we will (hopefully) never actually be doing more than N threads of work

Alex Crichton (Dec 02 2019 at 17:42, on Zulip):

ok those numbers look even better -- https://hackmd.io/bmB35-7oRzCeGOCanI0SkA?both

Alex Crichton (Dec 02 2019 at 17:42, on Zulip):

system time barely changes

Alex Crichton (Dec 02 2019 at 17:43, on Zulip):

top still shows a smidge of red, but nowhere near where it was before

Alex Crichton (Dec 02 2019 at 17:44, on Zulip):

14 threads seems to be a sweet spot for me

Alex Crichton (Dec 02 2019 at 17:44, on Zulip):

so not a huge amount of benefit for hyperthreads

simulacrum (Dec 02 2019 at 17:44, on Zulip):

do you know if you're using LLD by default locally?

Alex Crichton (Dec 02 2019 at 17:45, on Zulip):

I am not

Alex Crichton (Dec 02 2019 at 17:45, on Zulip):

gtg now

simulacrum (Dec 02 2019 at 17:46, on Zulip):

okay, was going to suggest disabling threading if yes

Zoxc (Dec 02 2019 at 18:30, on Zulip):

When compiling one crate on 8 cores, SMT made things slower until I landed some PRs which reduced contention. I'm using a Ryzen R7 1700, so I don't know how SMT affects performance on Intel chips, but I know AMD generally performs better in multi-threaded workloads.

Santiago Pastorino (Dec 02 2019 at 18:41, on Zulip):

@simulacrum https://github.com/rust-lang/rust/compare/master...spastorino:acquire_thread_then_spawn?expand=1

Santiago Pastorino (Dec 02 2019 at 18:42, on Zulip):

the handler code is basically the default handler code, with the acquire and release stuff added before and after spawning

bjorn3 (Dec 02 2019 at 19:03, on Zulip):

That would release the token before the spawned thread finished, right?

Santiago Pastorino (Dec 02 2019 at 19:24, on Zulip):

ouch, we want that right after the run call
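
(A minimal sketch of the corrected ordering, with the helper name invented for illustration and the jobserver crate API assumed: the token is dropped after the run body completes, not after spawn returns.)

```rust
use jobserver::Client;
use std::thread;

fn spawn_with_token(
    client: &Client,
    run: impl FnOnce() + Send + 'static,
) -> std::io::Result<thread::JoinHandle<()>> {
    // Block until the jobserver grants a token, *then* spawn.
    let token = client.acquire()?;
    Ok(thread::spawn(move || {
        run();
        // Release right after the run call, as noted above; dropping
        // the guard returns the token to the jobserver.
        drop(token);
    }))
}
```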

Santiago Pastorino (Dec 02 2019 at 19:36, on Zulip):

@simulacrum https://github.com/rust-lang/rust/pull/66972

Zoxc (Dec 02 2019 at 19:49, on Zulip):

@Santiago Pastorino Wouldn't that cause rustc to wait until it has all the tokens before doing anything? Or did you do some Rayon changes to thread spawning too?

Santiago Pastorino (Dec 02 2019 at 19:59, on Zulip):

did this at the request of @simulacrum, but we didn't touch rayon thread spawning

Zoxc (Dec 02 2019 at 20:01, on Zulip):

Rayon also acquires tokens when spawning / closing threads, so you'll need to remove those too

simulacrum (Dec 04 2019 at 20:18, on Zulip):

@Alex Crichton I've been looking into the EPOLLEXCLUSIVE improvement I mentioned, and it looks like it's definitely linux-specific (along with all of epoll). It also looks like kqueue on macOS is claimed to avoid the thundering herd problem, but I can find no evidence of this in manpages (just in random blog posts).

Right now as far as I can tell we can just break compatibility with make, and use semaphores (which are supported on macOS and Linux, at least, and are in POSIX so I expect pretty wide support). That's probably not really tenable though. The alternative then is to work on the Cargo integration to have a single process read/write the file descriptor I guess.

Do you feel that working on the Cargo integration would be reasonable? Should we consider 'just' breaking?

Alex Crichton (Dec 04 2019 at 20:23, on Zulip):

If we pursue the route of semaphores, here's what I think we should do:

Alex Crichton (Dec 04 2019 at 20:23, on Zulip):

well, let me just type it out

Alex Crichton (Dec 04 2019 at 20:24, on Zulip):

I think we should add dual support in the jobserver crate for semaphores and pipes. Cargo would then tell the compiler either (a) here's a semaphore, go wild, or (b) here's a pipe jobserver, you can spawn at most N threads where N is like 10.
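
(Sketching the shape of that idea -- these names are hypothetical; the jobserver crate had no such query at the time:)

```rust
/// Hypothetical: which backend a jobserver Client turned out to be.
enum ClientKind {
    /// POSIX semaphore created by cargo itself: rustc can "go wild".
    Semaphore,
    /// Pipe inherited from an external make jobserver: cap parallelism
    /// to keep the thundering herd small.
    Pipe,
}

fn rustc_thread_cap(kind: ClientKind, requested: usize) -> usize {
    match kind {
        ClientKind::Semaphore => requested,
        ClientKind::Pipe => requested.min(10), // "N is like 10"
    }
}
```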

Alex Crichton (Dec 04 2019 at 20:24, on Zulip):

Most compilations with Cargo do not integrate with an external jobserver

Alex Crichton (Dec 04 2019 at 20:24, on Zulip):

so that means most compilations would use the semaphore and can go wild

Alex Crichton (Dec 04 2019 at 20:24, on Zulip):

for those that do we'd limit rustc's parallelism to alleviate the thundering herd problem

Alex Crichton (Dec 04 2019 at 20:25, on Zulip):

and we can have a work item for later to fix this

Alex Crichton (Dec 04 2019 at 20:25, on Zulip):

basically Cargo just needs the ability to query a jobserver::Client if it's a semaphore or a pipe

Alex Crichton (Dec 04 2019 at 20:25, on Zulip):

and the jobserver crate auto-detects semaphores or pipes

Alex Crichton (Dec 04 2019 at 20:25, on Zulip):

@simulacrum does that sound reasonable?

Alex Crichton (Dec 04 2019 at 20:25, on Zulip):

I think that should give us a lot of bang for not a lot of buck, while leaving it possible to fix this in the future

simulacrum (Dec 04 2019 at 20:26, on Zulip):

my impression is that most compilations do shell out to cmake/make somewhere, which limits what we can do

simulacrum (Dec 04 2019 at 20:27, on Zulip):

but yes, that sounds reasonable

simulacrum (Dec 04 2019 at 20:27, on Zulip):

I can work on a PR to jobserver-rs to make that logic happen

simulacrum (Dec 04 2019 at 20:27, on Zulip):

(it's worth noting that on Windows, initial measurements suggest a semaphore there still suffers from the thundering herd problem, though we don't have super knowledgeable people testing that yet)

simulacrum (Dec 04 2019 at 20:29, on Zulip):

but this does sound reasonable

simulacrum (Dec 04 2019 at 20:29, on Zulip):

@Alex Crichton will go ahead and start work on this

Alex Crichton (Dec 04 2019 at 20:30, on Zulip):

Oh wait right

Alex Crichton (Dec 04 2019 at 20:30, on Zulip):

I forgot that

Alex Crichton (Dec 04 2019 at 20:30, on Zulip):

The cmake make stuff

Alex Crichton (Dec 04 2019 at 20:32, on Zulip):

@simulacrum let's leave this on the backburner for now

Alex Crichton (Dec 04 2019 at 20:32, on Zulip):

and fix windows first

Alex Crichton (Dec 04 2019 at 20:32, on Zulip):

and see what that requires

Alex Crichton (Dec 04 2019 at 20:32, on Zulip):

if semaphores don't work there we'll likely require a system where cargo manages everything anyway

simulacrum (Dec 04 2019 at 20:33, on Zulip):

hm, okay

simulacrum (Dec 04 2019 at 20:33, on Zulip):

so fixing windows is basically a "shrug" from me -- I can look around at documentation for the primitives we use today

simulacrum (Dec 04 2019 at 20:34, on Zulip):

but if those -- already being semaphores -- are still leading to a bunch of wakeups, then there's plausibly nothing we can do

Alex Crichton (Dec 04 2019 at 20:40, on Zulip):

well the thing we can do to fix that is what we'd do to fix the integration with the jobserver

Alex Crichton (Dec 04 2019 at 20:40, on Zulip):

which is that each rustc instances has a separate ipc mechanism connecting it to cargo

Alex Crichton (Dec 04 2019 at 20:40, on Zulip):

and cargo selects which rustc to wake up, only waking up one instead of all of them

Alex Crichton (Dec 04 2019 at 20:40, on Zulip):

like there would be N jobservers instead of just 1

Alex Crichton (Dec 04 2019 at 20:40, on Zulip):

(ish)

Alex Crichton (Dec 04 2019 at 20:41, on Zulip):

but not literally jobservers because cargo has to react to a request for a token
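
(A rough sketch of that arrangement -- protocol and names invented for illustration, not Cargo's actual implementation: each rustc gets its own channel to cargo, and cargo grants a token to exactly one waiter at a time.)

```rust
use std::collections::VecDeque;
use std::sync::mpsc::Sender;

/// Cargo-side token broker: one `Sender` per rustc instance, so a
/// grant wakes exactly one child instead of the whole herd.
struct Broker {
    free_tokens: usize,
    waiters: VecDeque<Sender<()>>,
}

impl Broker {
    /// A rustc asks for a token; grant immediately or queue it.
    fn request(&mut self, reply: Sender<()>) {
        if self.free_tokens > 0 {
            self.free_tokens -= 1;
            let _ = reply.send(());
        } else {
            self.waiters.push_back(reply);
        }
    }

    /// A rustc returned a token; hand it straight to the next waiter.
    fn release(&mut self) {
        match self.waiters.pop_front() {
            Some(next) => {
                let _ = next.send(());
            }
            None => self.free_tokens += 1,
        }
    }
}
```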

simulacrum (Dec 04 2019 at 20:41, on Zulip):

oh

simulacrum (Dec 04 2019 at 20:41, on Zulip):

I thought you meant fixing windows without doing the cargo thing

simulacrum (Dec 04 2019 at 20:55, on Zulip):

@Alex Crichton so do you expect to invest time into trying to fix windows semaphores, and then if that fails, we invest into making cargo be the one to dispatch all jobserver operations?

Alex Crichton (Dec 04 2019 at 20:56, on Zulip):

oh sorry so to clarify, I mean that we should "fix windows" in that we should find some solution that is on par with linux in performance

Alex Crichton (Dec 04 2019 at 20:57, on Zulip):

I don't think it matters what that is, and it could be the semaphores we have today with a tweak, but we should fix it somehow

Alex Crichton (Dec 04 2019 at 20:57, on Zulip):

and then we can go from there to see how to unify all the implementations and land this all for real

simulacrum (Dec 04 2019 at 20:57, on Zulip):

ah, okay

Alex Crichton (Dec 04 2019 at 20:57, on Zulip):

I don't think I'll personally have time to work on this, but I can allocate time if necessary

Zoxc (Dec 04 2019 at 20:57, on Zulip):

Windows won't really get on par with Linux though. It's pretty well known to be slow, especially with multithreading =P

simulacrum (Dec 04 2019 at 20:58, on Zulip):

makes sense; I can look at the flags to the semaphore primitives we use today at least

simulacrum (Dec 04 2019 at 20:58, on Zulip):

and then failing that take a look at making the cargo refactor

andjo403 (Dec 04 2019 at 21:21, on Zulip):

but I don't know how much the jobserver is affecting the times, because when I look at e.g. the cargo crate there is no other crate running, and I still get longer times with 32 threads compared with 4 threads, at least on Windows

Zoxc (Dec 04 2019 at 21:50, on Zulip):

There's likely contention with thread spawning in Windows and inside rustc too (miri, symbols, spans, unknown stuff)

simulacrum (Dec 04 2019 at 21:58, on Zulip):

@Zoxc fwiw, all the contention during normal execution we were able to detect with some loose benchmarks wasn't really at the syscall level -- of course, this may just be indicative of either bad measurement or lack of good parallelism in the compiler

simulacrum (Dec 06 2019 at 18:28, on Zulip):

as a quick update it looks like there's no better primitive to be using on windows at least that I can see

simulacrum (Dec 06 2019 at 18:28, on Zulip):

(based on MS docs)

simulacrum (Dec 06 2019 at 18:29, on Zulip):

I guess then the next step is to start working on making cargo act as the jobserver for rustc instances.. I'll start working on that soon then I suppose, though it feels quite painful

Last update: Dec 12 2019 at 00:50 UTC