Stream: t-compiler/wg-parallel-rustc

Topic: trying out the new rayon scheduler


nikomatsakis (Nov 14 2019 at 19:27, on Zulip):

So one of the work items we identified was to try out the new rayon scheduler to see if it made any difference to our jobserver woes.

nikomatsakis (Nov 14 2019 at 19:27, on Zulip):

I was talking to @Santiago Pastorino and they expressed an interest in doing this

nikomatsakis (Nov 14 2019 at 19:28, on Zulip):

I've been wanting to do it but I am not sure if I have time

nikomatsakis (Nov 14 2019 at 19:28, on Zulip):

I guess what would need to happen is:

nikomatsakis (Nov 14 2019 at 19:28, on Zulip):

maybe a first step, @Santiago Pastorino, would just be to try and reproduce @Alex Crichton's results

nikomatsakis (Nov 14 2019 at 19:28, on Zulip):

just so we have a solid basis for comparison

nikomatsakis (Nov 14 2019 at 19:29, on Zulip):

that's probably what I would do

Santiago Pastorino (Nov 14 2019 at 19:29, on Zulip):

those changes are on a rustc-rayon branch?

nikomatsakis (Nov 14 2019 at 19:29, on Zulip):

and then, to do this, there are two ways to go about it

nikomatsakis (Nov 14 2019 at 19:29, on Zulip):

no, that's the trick

nikomatsakis (Nov 14 2019 at 19:29, on Zulip):

rustc-rayon is a fork of rayon

nikomatsakis (Nov 14 2019 at 19:29, on Zulip):

and the changes are on a branch of rayon

nikomatsakis (Nov 14 2019 at 19:29, on Zulip):

so they have to be rebased atop rustc-rayon --

nikomatsakis (Nov 14 2019 at 19:29, on Zulip):

or else the rustc-rayon changes could be rebased atop the branch

nikomatsakis (Nov 14 2019 at 19:29, on Zulip):

rustc-rayon is basically tracking rayon with a custom diff

nikomatsakis (Nov 14 2019 at 19:29, on Zulip):

(hopefully it would eventually just be rayon, or we'd move to something custom written)

Santiago Pastorino (Nov 14 2019 at 19:30, on Zulip):

I see, so we need to put rayon's repo changes on rustc-rayon and follow from there, right?

Santiago Pastorino (Nov 14 2019 at 19:30, on Zulip):

sorry to re-ask silly questions, I have the English comprehension of a 3-year-old :P

Santiago Pastorino (Nov 14 2019 at 19:32, on Zulip):

@nikomatsakis actually, https://github.com/rayon-rs/rayon/branches there's no active branch there, or is the stuff just in master?

nikomatsakis (Nov 14 2019 at 19:33, on Zulip):

@Santiago Pastorino it's on my fork

Santiago Pastorino (Nov 14 2019 at 19:35, on Zulip):

I see :)

Santiago Pastorino (Nov 14 2019 at 19:36, on Zulip):

https://github.com/nikomatsakis/rayon/tree/latch-target-thread I guess

nikomatsakis (Nov 15 2019 at 19:15, on Zulip):

@Santiago Pastorino yes

nikomatsakis (Nov 15 2019 at 19:15, on Zulip):

do you want to follow up on this?

Santiago Pastorino (Nov 15 2019 at 19:15, on Zulip):

I'm literally doing this

Santiago Pastorino (Nov 15 2019 at 19:15, on Zulip):

was talking with @Alex Crichton in private

Santiago Pastorino (Nov 15 2019 at 19:15, on Zulip):

was also telling him, let's jump into the thread maybe? :P

Santiago Pastorino (Nov 15 2019 at 19:17, on Zulip):

gonna paste the same results that @Alex Crichton has shared but for my machine and for 8 cores

Santiago Pastorino (Nov 15 2019 at 19:23, on Zulip):

so running on cargo repo, for ...

Santiago Pastorino (Nov 15 2019 at 19:23, on Zulip):
rm -rf target    # start from a clean build
RUSTFLAGS=-Zthreads=8 /usr/bin/time --verbose cargo +8d78bf6b273848d17da8f5c92162c6a6b9b10dfd-alt check &
sleep 3          # look at only the first 3 seconds of the build
pkill cargo      # then kill it
Santiago Pastorino (Nov 15 2019 at 19:23, on Zulip):

got

Santiago Pastorino (Nov 15 2019 at 19:24, on Zulip):
    Command being timed: "cargo +8d78bf6b273848d17da8f5c92162c6a6b9b10dfd-alt check"
    User time (seconds): 6.93
    System time (seconds): 0.92
    Percent of CPU this job got: 259%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.02
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 160440
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 183788
    Voluntary context switches: 13599
    Involuntary context switches: 4617
    Swaps: 0
    File system inputs: 0
    File system outputs: 57104
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
Alex Crichton (Nov 15 2019 at 19:28, on Zulip):

FWIW the stats alone aren't too significant, but relative to others is where it helps a lot

Santiago Pastorino (Nov 15 2019 at 19:31, on Zulip):

yeah, you meant with different threads=N values?

Santiago Pastorino (Nov 15 2019 at 19:31, on Zulip):

I've done it with a bunch of values

Santiago Pastorino (Nov 15 2019 at 19:32, on Zulip):

my idea was to keep pasting the output but I got stuck for a couple of minutes trying to run perf

Santiago Pastorino (Nov 15 2019 at 19:32, on Zulip):
Error:  No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
Hint:   Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing/'
Santiago Pastorino (Nov 15 2019 at 19:32, on Zulip):

can just do it as root :P

Santiago Pastorino (Nov 15 2019 at 19:37, on Zulip):

I guess I'm going to paste this in a document as you did @Alex Crichton, otherwise it would be complicated to follow

Santiago Pastorino (Nov 15 2019 at 21:31, on Zulip):

unfortunately my perf installation is segfaulting when running perf trace

Santiago Pastorino (Nov 15 2019 at 21:31, on Zulip):

have reported an issue

Santiago Pastorino (Nov 15 2019 at 21:36, on Zulip):

gonna try building it from source code

nikomatsakis (Nov 18 2019 at 18:22, on Zulip):

so @Santiago Pastorino

nikomatsakis (Nov 18 2019 at 18:23, on Zulip):

yeah so here's the thing

nikomatsakis (Nov 18 2019 at 18:24, on Zulip):

rustc-rayon is itself a fork of upstream rayon

nikomatsakis (Nov 18 2019 at 18:24, on Zulip):

it has some number of commits relative to the base rayon

nikomatsakis (Nov 18 2019 at 18:24, on Zulip):

I think what we want to do is to rebase those commits atop my branch

nikomatsakis (Nov 18 2019 at 18:24, on Zulip):

this probably will generate some conflicts, I realize now, since I massively overhauled the sleep module

nikomatsakis (Nov 18 2019 at 18:24, on Zulip):

the other option would be to rebase my branch (squashed) atop rustc-rayon, but I think that will be harder

Santiago Pastorino (Nov 18 2019 at 18:25, on Zulip):

yeah, the second thing was what I was going after

Santiago Pastorino (Nov 18 2019 at 18:26, on Zulip):

but I've realized that yeah, I have no idea how different nikomatsakis/rayon is from rust-lang/rustc-rayon

Santiago Pastorino (Nov 18 2019 at 18:27, on Zulip):

@nikomatsakis I can do whatever you prefer

nikomatsakis (Nov 18 2019 at 18:27, on Zulip):

rust-lang/rustc-rayon should be relatively recent

nikomatsakis (Nov 18 2019 at 18:27, on Zulip):

@cuviper rebased it fairly recently

Santiago Pastorino (Nov 18 2019 at 18:27, on Zulip):

I'm not sure what process we follow with rustc-rayon

Santiago Pastorino (Nov 18 2019 at 18:28, on Zulip):

I can also rebase rustc-rayon atop rayon first

nikomatsakis (Nov 18 2019 at 18:28, on Zulip):

hmm

nikomatsakis (Nov 18 2019 at 18:28, on Zulip):

I wonder how out of date my branch is :)

Santiago Pastorino (Nov 18 2019 at 18:28, on Zulip):

and then do the rest

nikomatsakis (Nov 18 2019 at 18:28, on Zulip):

relative to base rayon

nikomatsakis (Nov 18 2019 at 18:28, on Zulip):

I'm also somewhat...unclear if this will actually help

nikomatsakis (Nov 18 2019 at 18:28, on Zulip):

I guess it's worth a try

Santiago Pastorino (Nov 18 2019 at 18:28, on Zulip):

it seems like there are 100 extra commits

nikomatsakis (Nov 18 2019 at 18:28, on Zulip):

oh yeah but the real question is

nikomatsakis (Nov 18 2019 at 18:28, on Zulip):

what it's based atop

Santiago Pastorino (Nov 18 2019 at 18:28, on Zulip):

so 100 commits that are not in sync with rustc-rayon besides the stuff you added

nikomatsakis (Nov 18 2019 at 18:29, on Zulip):

I don't think I ever rebased it, but rayon changes slowly

cuviper (Nov 18 2019 at 18:29, on Zulip):

one way I expect it to help is that more targeted wakeups should reduce jobserver traffic

nikomatsakis (Nov 18 2019 at 18:29, on Zulip):

oh you mean it's 100 commits "behind" rayon-rs:master?

cuviper (Nov 18 2019 at 18:29, on Zulip):

instead of storming all threads

nikomatsakis (Nov 18 2019 at 18:29, on Zulip):

one way I expect it to help is that more targeted wakeups should reduce jobserver traffic

we certainly wake up fewer things; we may also transition between sleeping/waking faster, not sure if that helps
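
For illustration, a minimal sketch of the targeted-wakeup idea being described here, using hypothetical types rather than the actual rayon sleep module: instead of broadcasting to every sleeping worker (and having most of them touch the jobserver just to go back to sleep), only the worker the new work is destined for gets woken.

    use std::sync::{Condvar, Mutex};

    // One sleep slot per worker thread (illustrative layout).
    struct WorkerSleep {
        asleep: Mutex<bool>,
        wakeup: Condvar,
    }

    struct Sleep {
        workers: Vec<WorkerSleep>,
    }

    impl Sleep {
        // "Storming all threads": every sleeping worker wakes up, looks for
        // work, and most immediately go back to sleep.
        fn wake_everyone(&self) {
            for w in &self.workers {
                *w.asleep.lock().unwrap() = false;
                w.wakeup.notify_all();
            }
        }

        // Targeted wakeup: only the worker the job/latch belongs to is woken,
        // so the other workers never stir at all.
        fn wake_worker(&self, index: usize) {
            let w = &self.workers[index];
            *w.asleep.lock().unwrap() = false;
            w.wakeup.notify_one();
        }
    }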

Santiago Pastorino (Nov 18 2019 at 18:30, on Zulip):

oh you mean it's 100 commits "behind" rayon-rs:master?

no there are 100 commits that rustc-rayon lacks

nikomatsakis (Nov 18 2019 at 18:30, on Zulip):

not including my branch, you mean

Santiago Pastorino (Nov 18 2019 at 18:30, on Zulip):

exactly

cuviper (Nov 18 2019 at 18:30, on Zulip):

https://github.com/rayon-rs/rayon/compare/master...nikomatsakis:latch-target-thread
https://github.com/rayon-rs/rayon/compare/master...rust-lang:rustc

cuviper (Nov 18 2019 at 18:31, on Zulip):

even github says rustc-rayon is "6 commits ahead, 18 commits behind rayon-rs:master", so I'm not sure where you get 100

Santiago Pastorino (Nov 18 2019 at 18:32, on Zulip):

yeah

Santiago Pastorino (Nov 18 2019 at 18:32, on Zulip):

I was going to do so

Santiago Pastorino (Nov 18 2019 at 18:32, on Zulip):

but this is what I'm talking about https://github.com/rust-lang/rustc-rayon/compare/master...nikomatsakis:master

Santiago Pastorino (Nov 18 2019 at 18:32, on Zulip):

130 commits

Santiago Pastorino (Nov 18 2019 at 18:32, on Zulip):

so when I try to put nikomatsakis:latch-target-thread on top of rustc-rayon master we have around 200

cuviper (Nov 18 2019 at 18:33, on Zulip):

why are you using niko's master? that's just plain old

nikomatsakis (Nov 18 2019 at 18:33, on Zulip):

yeah, nikomatsakis:master isn't really an interesting branch -- although I think it does happen to be the "merge-base" of latch-target-thread and rayon-rs:master

Santiago Pastorino (Nov 18 2019 at 18:33, on Zulip):

ahh your branch started from rayon-rs:master

Santiago Pastorino (Nov 18 2019 at 18:34, on Zulip):

https://github.com/rust-lang/rustc-rayon/compare/master...rayon-rs:master

Santiago Pastorino (Nov 18 2019 at 18:35, on Zulip):

this is the comparison we are talking about then?

Santiago Pastorino (Nov 18 2019 at 18:35, on Zulip):

ahhh the branch is called rustc

nikomatsakis (Nov 18 2019 at 18:35, on Zulip):

ah yeah :)

Santiago Pastorino (Nov 18 2019 at 18:35, on Zulip):

the default branch is rustc on rust-lang/rustc-rayon

nikomatsakis (Nov 18 2019 at 18:36, on Zulip):

I remember that now...

Santiago Pastorino (Nov 18 2019 at 18:36, on Zulip):

that was why I was getting zillions of commits

Santiago Pastorino (Nov 18 2019 at 18:36, on Zulip):

cool :)

Santiago Pastorino (Nov 18 2019 at 18:36, on Zulip):

ok, @nikomatsakis what do you prefer me to do then?

Santiago Pastorino (Nov 18 2019 at 18:37, on Zulip):

I can first add the missing commits from rayon-rs/rayon:master to rust-lang/rustc-rayon:rustc

Santiago Pastorino (Nov 18 2019 at 18:37, on Zulip):

and then add your branch on top of rust-lang/rustc-rayon:rustc

nikomatsakis (Nov 18 2019 at 18:38, on Zulip):

hmm yes so probably rebasing atop rayon-rs:master is a good first step

nikomatsakis (Nov 18 2019 at 18:38, on Zulip):

well

nikomatsakis (Nov 18 2019 at 18:38, on Zulip):

I think I would just rebase directly to nikomatsakis:latch-target-thread

nikomatsakis (Nov 18 2019 at 18:38, on Zulip):

given that this branch is itself behind rayon-rs:master

Santiago Pastorino (Nov 18 2019 at 18:38, on Zulip):

:+1:

Santiago Pastorino (Nov 18 2019 at 18:38, on Zulip):

ok

Santiago Pastorino (Nov 18 2019 at 19:06, on Zulip):

@nikomatsakis @cuviper I imagine that after this commit https://github.com/rayon-rs/rayon/commit/4dbd63b788cda2b9f8520da3b0cd41cdcd043294 we don't want SleepData, that structure is only in rustc-rayon and not in rayon

Santiago Pastorino (Nov 18 2019 at 19:08, on Zulip):

sorry, I think I've just confused myself

Santiago Pastorino (Nov 18 2019 at 19:08, on Zulip):

let me recheck

Santiago Pastorino (Nov 18 2019 at 19:13, on Zulip):

@nikomatsakis @cuviper yeah, what I've said is right

Santiago Pastorino (Nov 18 2019 at 19:14, on Zulip):

talking about this https://github.com/rust-lang/rustc-rayon/blob/rustc/rayon-core/src/sleep/mod.rs#L12

Santiago Pastorino (Nov 18 2019 at 19:16, on Zulip):

I guess that's kind of the point of https://github.com/rayon-rs/rayon/commit/4dbd63b788cda2b9f8520da3b0cd41cdcd043294 and we should remove SleepData

Santiago Pastorino (Nov 18 2019 at 19:17, on Zulip):

it also has a deadlock_check function that I'm not sure is covered on Niko's branch

nikomatsakis (Nov 18 2019 at 19:26, on Zulip):

@Santiago Pastorino let me look briefly..

nikomatsakis (Nov 18 2019 at 19:27, on Zulip):

ok so --

nikomatsakis (Nov 18 2019 at 19:27, on Zulip):

hmm

nikomatsakis (Nov 18 2019 at 19:27, on Zulip):

:)

nikomatsakis (Nov 18 2019 at 19:28, on Zulip):

ok so the deadlock checking in rustc-rayon

nikomatsakis (Nov 18 2019 at 19:28, on Zulip):

I'm trying to remember but I think there is some callback by which a thread says "I'm blocked now"

nikomatsakis (Nov 18 2019 at 19:28, on Zulip):

and the idea is that if all threads are either blocked or asleep

nikomatsakis (Nov 18 2019 at 19:28, on Zulip):

we can trigger the deadlock callback
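
A minimal sketch of that scheme, with illustrative names rather than the real rustc-rayon types: the registry counts workers that are asleep or have reported themselves blocked, and once every worker is accounted for it fires the user-supplied deadlock handler.

    use std::sync::Mutex;

    struct Counts {
        sleeping: usize,
        blocked: usize, // workers that said "I'm blocked now"
    }

    struct Registry {
        num_workers: usize,
        counts: Mutex<Counts>,
        deadlock_handler: Box<dyn Fn() + Send + Sync>,
    }

    impl Registry {
        fn mark_blocked(&self) {
            let mut c = self.counts.lock().unwrap();
            c.blocked += 1;
            self.maybe_deadlock(&c);
        }

        fn mark_unblocked(&self) {
            self.counts.lock().unwrap().blocked -= 1;
        }

        fn mark_sleeping(&self) {
            let mut c = self.counts.lock().unwrap();
            c.sleeping += 1;
            self.maybe_deadlock(&c);
        }

        fn maybe_deadlock(&self, c: &Counts) {
            // If every worker is either asleep or blocked on something
            // external, nothing can make progress: report a deadlock.
            if c.sleeping + c.blocked == self.num_workers {
                (self.deadlock_handler)();
            }
        }
    }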

nikomatsakis (Nov 18 2019 at 19:29, on Zulip):

one quick check, @Santiago Pastorino, you are rebasing the rustc-rayon commits atop the branch, right?

nikomatsakis (Nov 18 2019 at 19:29, on Zulip):

i.e., you're not rebasing all the branch commits

Santiago Pastorino (Nov 18 2019 at 19:30, on Zulip):

we can trigger the deadlock callback

registry::{mark_blocked, mark_unblocked} ?

Santiago Pastorino (Nov 18 2019 at 19:30, on Zulip):

one quick check, Santiago Pastorino, you are rebasing the rustc-rayon commits atop the branch, right?

I'm rebasing rustc's branch of rustc-rayon atop your branch

Santiago Pastorino (Nov 18 2019 at 19:31, on Zulip):

so, to be 100% clear, I'm on your branch

Santiago Pastorino (Nov 18 2019 at 19:31, on Zulip):

and did git rebase rustc-rayon/rustc

Santiago Pastorino (Nov 18 2019 at 19:32, on Zulip):

that is going to place your commits on top of rustc-rayon/rustc

Santiago Pastorino (Nov 18 2019 at 19:34, on Zulip):

@nikomatsakis trying to understand what would be the idea, so from what you're saying I guess we would need to keep all that SleepData stuff?

Santiago Pastorino (Nov 18 2019 at 19:35, on Zulip):

I'm mainly wondering how much of this is just rebasing, and how much of it is things working differently enough that we need to rewrite parts of the code and adapt things in a deeper way

Santiago Pastorino (Nov 18 2019 at 19:37, on Zulip):

if it's just rebasing I guess I can easily get this working, but if I need to do major surgery I may need to start understanding more deeply how things work

nikomatsakis (Nov 18 2019 at 19:37, on Zulip):

that is going to place your commits on top of rustc-rayon/rustc

I'm confused

nikomatsakis (Nov 18 2019 at 19:37, on Zulip):

this is the command you executed?

nikomatsakis (Nov 18 2019 at 19:37, on Zulip):

if so, I think you're in for a world of trouble :)

nikomatsakis (Nov 18 2019 at 19:37, on Zulip):

I'm mainly wondering how much of this is just rebasing, and how much of it is things working differently enough that we need to rewrite parts of the code and adapt things in a deeper way

I think we're going to have to do some deeper edits, yeah, I may have underestimated how much work it would be

nikomatsakis (Nov 18 2019 at 19:38, on Zulip):

maybe I should just put a bit of time into it -- or we could share screens

nikomatsakis (Nov 18 2019 at 19:38, on Zulip):

(that could, indeed, be the parallel rustc meeting...)

Santiago Pastorino (Nov 18 2019 at 19:38, on Zulip):

if so, I think you're in for a world of trouble :)

why?

Santiago Pastorino (Nov 18 2019 at 19:38, on Zulip):

unsure I'm understanding :)

Santiago Pastorino (Nov 18 2019 at 19:39, on Zulip):

(that could, indeed, be the parallel rustc meeting...)

definitely I can open tmux or something and share the current state of things

Santiago Pastorino (Nov 18 2019 at 19:39, on Zulip):

if so, I think you're in for a world of trouble :)

why?

didn't we want your commits on top of rustc-rayon's ones?

Santiago Pastorino (Nov 18 2019 at 19:40, on Zulip):

and indeed I got just a few unsynced from rayon-rs/rayon:master and then yours

nikomatsakis (Nov 18 2019 at 19:42, on Zulip):

didn't we want your commits on top of rustc-rayon's ones?

no, I think we want rustc-rayon's commits atop my branch

nikomatsakis (Nov 18 2019 at 19:42, on Zulip):

mainly because

nikomatsakis (Nov 18 2019 at 19:42, on Zulip):

there were like 6 rustc-rayon commits

nikomatsakis (Nov 18 2019 at 19:42, on Zulip):

and the branch has a very untidy, winding history

nikomatsakis (Nov 18 2019 at 19:42, on Zulip):

I mean I could squash the branch too

nikomatsakis (Nov 18 2019 at 19:43, on Zulip):

but .. in general we've been rebasing rustc-rayon atop rayon

nikomatsakis (Nov 18 2019 at 19:43, on Zulip):

definitely I can open tmux or something and share the current state of things

maybe we try this in a bit?

Santiago Pastorino (Nov 18 2019 at 19:44, on Zulip):

yes

Santiago Pastorino (Nov 18 2019 at 19:44, on Zulip):

at the meeting time?

Santiago Pastorino (Nov 18 2019 at 19:44, on Zulip):

which is ~15 minutes from now

Santiago Pastorino (Nov 18 2019 at 21:14, on Zulip):

@Alex Crichton I've just pushed a rayon branch and a compiler branch that uses it

Alex Crichton (Nov 18 2019 at 21:15, on Zulip):

Nice! The next steps would be to send that as a PR to rust-lang/rust and then run @bors: try

Santiago Pastorino (Nov 18 2019 at 21:16, on Zulip):

does that already run in parallel mode?

Santiago Pastorino (Nov 18 2019 at 21:17, on Zulip):

I thought you needed to do something with it

Santiago Pastorino (Nov 18 2019 at 21:19, on Zulip):

@Alex Crichton :point_up:

simulacrum (Nov 18 2019 at 21:20, on Zulip):

@Santiago Pastorino yeah, that'll produce parallel artifacts on the alt commit

simulacrum (Nov 18 2019 at 21:20, on Zulip):

Not sure if we want non-alt parallel artifacts (i.e., fast ones)

Santiago Pastorino (Nov 18 2019 at 21:20, on Zulip):

ahhh ok

Santiago Pastorino (Nov 18 2019 at 21:20, on Zulip):

well, done here https://github.com/rust-lang/rust/pull/66527

simulacrum (Nov 18 2019 at 21:21, on Zulip):

probably? then you'll want to comment out these two lines: https://github.com/rust-lang/rust/blob/a0d40f8bdfcc3c28355467973f97fd4c45ac5876/src/ci/run.sh#L61-L62

simulacrum (Nov 18 2019 at 21:21, on Zulip):

depends on the diffs we ran last time

Santiago Pastorino (Nov 18 2019 at 21:23, on Zulip):

ahh so should I stop the try run?

simulacrum (Nov 18 2019 at 23:43, on Zulip):

nah, this is fine too

simulacrum (Nov 18 2019 at 23:43, on Zulip):

good to have both

Santiago Pastorino (Nov 19 2019 at 13:16, on Zulip):

@Alex Crichton @simulacrum what else with https://github.com/rust-lang/rust/pull/66527 ?

Santiago Pastorino (Nov 19 2019 at 13:16, on Zulip):

I'm not sure I understood the process we are following to get perf results

Santiago Pastorino (Nov 19 2019 at 13:16, on Zulip):

there are tidy failures but I guess we don't need to bother fixing those

simulacrum (Nov 19 2019 at 13:17, on Zulip):

I don't think tidy failures matter, yes

simulacrum (Nov 19 2019 at 13:17, on Zulip):

I imagine Alex can go bench now? We can also get a try build with parallel but not, uh, llvm asserting

simulacrum (Nov 19 2019 at 13:17, on Zulip):

but I think that's not actually too useful for the kind of benchmarking we've been doing

Santiago Pastorino (Nov 19 2019 at 13:35, on Zulip):

ok, let's see what @Alex Crichton says then :)

Alex Crichton (Nov 19 2019 at 15:19, on Zulip):

Looks like the binaries don't work I think @Santiago Pastorino

Alex Crichton (Nov 19 2019 at 15:19, on Zulip):

rm -rf target && RUSTFLAGS=-Zthreads=28 cargo +540c0b9a1aea77a85ec449623ce5c509f0833899-alt check

Alex Crichton (Nov 19 2019 at 15:19, on Zulip):

that instantly deadlocks

Santiago Pastorino (Nov 19 2019 at 15:21, on Zulip):

:S

Santiago Pastorino (Nov 19 2019 at 15:24, on Zulip):

cc @nikomatsakis

nikomatsakis (Nov 19 2019 at 15:26, on Zulip):

got me :)

nikomatsakis (Nov 19 2019 at 15:26, on Zulip):

I can try to look later maybe tho

nikomatsakis (Nov 19 2019 at 15:26, on Zulip):

there are probably some bugs introduced by the rebase I guess

Santiago Pastorino (Nov 19 2019 at 15:31, on Zulip):

I can try the rebase again and check if I find something weird

simulacrum (Nov 19 2019 at 15:50, on Zulip):

hm, that seems surprising

simulacrum (Nov 19 2019 at 15:50, on Zulip):

oh, I guess when we're doing it in-tree we only have the one thread

nikomatsakis (Nov 19 2019 at 16:22, on Zulip):

I will say that the latch-target-thread branch may well be buggy

nikomatsakis (Nov 19 2019 at 16:22, on Zulip):

I wonder if we should discuss a bit

nikomatsakis (Nov 19 2019 at 16:23, on Zulip):

what we would do if we did not use rayon

nikomatsakis (Nov 19 2019 at 16:23, on Zulip):

created a topic for it

Santiago Pastorino (Nov 19 2019 at 17:45, on Zulip):

@nikomatsakis @simulacrum what would this https://github.com/spastorino/rustc-rayon/commit/70bb40ed4e2a04b95c9c6d0a05ad8718dc9b2d06 be useful for if we do not implement mark_blocked and mark_unblocked?

Santiago Pastorino (Nov 19 2019 at 17:45, on Zulip):

wouldn't it be better to just skip the commit for now?

simulacrum (Nov 19 2019 at 17:45, on Zulip):

should be able to, yeah

Santiago Pastorino (Nov 19 2019 at 17:46, on Zulip):

I guess it would be simpler

Santiago Pastorino (Nov 19 2019 at 17:46, on Zulip):

but yeah, that part of the rebase shouldn't be causing the problem anyway

Santiago Pastorino (Nov 19 2019 at 18:19, on Zulip):

after rebasing again, all the differences I've found are that commit, which I've removed now

Santiago Pastorino (Nov 19 2019 at 18:20, on Zulip):

and this difference ...

Santiago Pastorino (Nov 19 2019 at 18:20, on Zulip):

https://github.com/spastorino/rustc-rayon/commit/807a281581a1a998337fc35ac00ffaea5af0e655#diff-19c860748b8faafc557a11db2d185bd9R909

Santiago Pastorino (Nov 19 2019 at 18:20, on Zulip):

vs

Santiago Pastorino (Nov 19 2019 at 18:20, on Zulip):

https://github.com/spastorino/rustc-rayon/commit/64a886ed6c099677dc8fbde580f6294f7763609e#diff-19c860748b8faafc557a11db2d185bd9R884

Santiago Pastorino (Nov 19 2019 at 18:20, on Zulip):

but I don't think that matters

Santiago Pastorino (Nov 19 2019 at 18:21, on Zulip):

basically whether acquire_thread should wrap ...

Santiago Pastorino (Nov 19 2019 at 18:21, on Zulip):
    let my_terminate_latch = &registry.thread_infos[index].terminate;
    worker_thread.log(|| ThreadStart {
        worker: index,
        terminate_addr: my_terminate_latch.as_core_latch().addr(),
    });
Santiago Pastorino (Nov 19 2019 at 18:21, on Zulip):

or not

Santiago Pastorino (Nov 19 2019 at 18:25, on Zulip):

@nikomatsakis just in case you want to check this ^^^

Santiago Pastorino (Nov 19 2019 at 18:44, on Zulip):

more info ...

Santiago Pastorino (Nov 19 2019 at 18:44, on Zulip):

#0  0x00007fddfbe35c45 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x00007fddfe3afb17 in rustc_rayon_core::latch::LockLatch::wait () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#2  0x00007fddfe3b0598 in rustc_rayon_core::thread_pool::ThreadPool::wait_until_stopped () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#3  0x00007fddfc41a202 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
   from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#4  0x00007fddfc3f3dae in std::panicking::try::do_call () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#5  0x00007fddfbefcf6a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:81
#6  0x00007fddfc41cc41 in crossbeam_utils::thread::scope () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#7  0x00007fddfc423d93 in scoped_tls::ScopedKey<T>::set () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#8  0x00007fddfc401f64 in syntax::with_globals () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#9  0x00007fddfc43cb60 in rustc_interface::util::spawn_thread_pool () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#10 0x00007fddfc3d0020 in rustc_driver::run_compiler () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#11 0x00007fddfc41a9a4 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
   from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#12 0x00007fddfc3f3f4c in std::panicking::try::do_call () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#13 0x00007fddfbefcf6a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:81
#14 0x00007fddfc3d5999 in rustc_driver::catch_fatal_errors () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#15 0x00007fddfc3d6ab1 in rustc_driver::main () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#16 0x000055f21eb61423 in std::rt::lang_start::{{closure}} ()
#17 0x00007fddfbeebfd3 in std::rt::lang_start_internal::{{closure}} () at src/libstd/rt.rs:48
#18 std::panicking::try::do_call () at src/libstd/panicking.rs:287
#19 0x00007fddfbefcf6a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:81
#20 0x00007fddfbeecb4d in std::panicking::try () at src/libstd/panicking.rs:265
#21 std::panic::catch_unwind () at src/libstd/panic.rs:395
#22 std::rt::lang_start_internal () at src/libstd/rt.rs:47
#23 0x000055f21eb61412 in main ()
Santiago Pastorino (Nov 19 2019 at 18:56, on Zulip):

well maybe I should add more threads

Santiago Pastorino (Nov 19 2019 at 18:56, on Zulip):
(gdb) bt
#0  0x00007f0abb827c45 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x00007f0abdda6f02 in rustc_rayon_core::sleep::Sleep::sleep () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#2  0x00007f0abdda5f13 in rustc_rayon_core::registry::WorkerThread::wait_until_cold () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#3  0x00007f0abdda3e33 in rustc_rayon_core::registry::ThreadBuilder::run () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#4  0x00007f0abbe163f9 in scoped_tls::ScopedKey<T>::set () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#5  0x00007f0abbe10cd9 in std::thread::local::LocalKey<T>::with () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#6  0x00007f0abbe16650 in scoped_tls::ScopedKey<T>::set () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#7  0x00007f0abbe0e5e7 in crossbeam_utils::thread::ScopedThreadBuilder::spawn::{{closure}} () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#8  0x00007f0abbe178eb in std::sys_common::backtrace::__rust_begin_short_backtrace () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#9  0x00007f0abb8eef6a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:81
#10 0x00007f0abbe28799 in core::ops::function::FnOnce::call_once{{vtable-shim}} () from /home/santiago/.rustup/toolchains/540c0b9a1aea77a85ec449623ce5c509f0833899-alt/bin/../lib/librustc_driver-6c2442420060056d.so
#11 0x00007f0abb8bfccf in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/540c0b9a1aea77a85ec449623ce5c509f0833899/src/liballoc/boxed.rs:942
#12 0x00007f0abb8ed990 in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/540c0b9a1aea77a85ec449623ce5c509f0833899/src/liballoc/boxed.rs:942
#13 std::sys_common::thread::start_thread () at src/libstd/sys_common/thread.rs:13
#14 std::sys::unix::thread::Thread::new::thread_start () at src/libstd/sys/unix/thread.rs:79
#15 0x00007f0abb8214cf in start_thread () from /usr/lib/libpthread.so.0
#16 0x00007f0abb73e2d3 in clone () from /usr/lib/libc.so.6
Santiago Pastorino (Nov 19 2019 at 18:58, on Zulip):

most threads are locked in Sleep::sleep

Santiago Pastorino (Nov 19 2019 at 19:02, on Zulip):

I wonder if the release_thread calls are placed correctly https://github.com/spastorino/rustc-rayon/commit/64a886ed6c099677dc8fbde580f6294f7763609e#diff-c5cfc0fcc3d66d15f5c96fa0bc58fa93L136

Santiago Pastorino (Nov 19 2019 at 19:16, on Zulip):
   0x00007f0abdda6ee3 <+259>:   callq  *0xb3b8c7(%rip)        # 0x7f0abe8e27b0
   0x00007f0abdda6ee9 <+265>:   mov    (%r12),%rbp
   0x00007f0abdda6eed <+269>:   mov    %rbx,%rdi
   0x00007f0abdda6ef0 <+272>:   callq  *0xb3acb2(%rip)        # 0x7f0abe8e1ba8
   0x00007f0abdda6ef6 <+278>:   mov    %rbp,%rdi
   0x00007f0abdda6ef9 <+281>:   mov    %rax,%rsi
   0x00007f0abdda6efc <+284>:   callq  *0xb3db3e(%rip)        # 0x7f0abe8e4a40
   0x00007f0abdda6f02 <+290>:   movzbl 0x8(%r14),%eax
   0x00007f0abdda6f07 <+295>:   test   %al,%al
   0x00007f0abdda6f09 <+297>:   jne    0x7f0abdda6ff2 <_ZN16rustc_rayon_core5sleep5Sleep5sleep17ha30ef7a045af5111E+530>
   0x00007f0abdda6f0f <+303>:   cmpb   $0x0,0x9(%r14)
   0x00007f0abdda6f14 <+308>:   mov    %r14,%r13
   0x00007f0abdda6f17 <+311>:   jne    0x7f0abdda6ed0 <_ZN16rustc_rayon_core5sleep5Sleep5sleep17ha30ef7a045af5111E+240>
   0x00007f0abdda6f19 <+313>:   mov    %r14,(%rsp)
nikomatsakis (Nov 19 2019 at 21:23, on Zulip):

I wonder if the release_thread calls are placed correctly https://github.com/spastorino/rustc-rayon/commit/64a886ed6c099677dc8fbde580f6294f7763609e#diff-c5cfc0fcc3d66d15f5c96fa0bc58fa93L136

I think they are correct, @Santiago Pastorino

nikomatsakis (Nov 19 2019 at 21:23, on Zulip):

right now the release is basically right before we go to sleep

nikomatsakis (Nov 19 2019 at 21:24, on Zulip):

and the acquire is effectively right after

nikomatsakis (Nov 19 2019 at 21:24, on Zulip):

that seems correct

nikomatsakis (Nov 19 2019 at 21:24, on Zulip):

wouldn't it be better to just skip the commit for now?

would be ok, probably doesn't help tho

nikomatsakis (Nov 19 2019 at 21:25, on Zulip):

I guess I don't really know why it's deadlocking, could well be a bug in the code, or maybe some kind of rebase error

nikomatsakis (Nov 19 2019 at 21:26, on Zulip):

One thing we could do is to run with the logging enabled

nikomatsakis (Nov 19 2019 at 21:27, on Zulip):

that would require building with RUSTFLAGS='--cfg rayon_rs_log', or else altering the branch to enable logs all the time

nikomatsakis (Nov 19 2019 at 21:27, on Zulip):

the latter is prob easier

nikomatsakis (Nov 19 2019 at 21:28, on Zulip):

there is a line of code in rayon-core/src/log.rs like

pub(super) const LOG_ENABLED: bool = cfg!(any(rayon_rs_log, debug_assertions));

which we could change to

pub(super) const LOG_ENABLED: bool = true;
nikomatsakis (Nov 19 2019 at 21:28, on Zulip):

and then run with RAYON_RS_LOG=tail:killme.csv

nikomatsakis (Nov 19 2019 at 21:28, on Zulip):

that will make a file called killme.csv with the last 10K events or something like that

nikomatsakis (Nov 19 2019 at 21:28, on Zulip):

which is usually enough to kind of track what happened

Santiago Pastorino (Nov 19 2019 at 21:31, on Zulip):

I could do that

Santiago Pastorino (Nov 19 2019 at 21:31, on Zulip):

meanwhile I got some info on exactly where we are deadlocking

Santiago Pastorino (Nov 19 2019 at 21:32, on Zulip):

threads are locked in:

Santiago Pastorino (Nov 19 2019 at 21:32, on Zulip):

https://github.com/spastorino/rustc-rayon/blob/latch-target-thread-rustc/rayon-core/src/sleep/mod.rs#L206

Santiago Pastorino (Nov 19 2019 at 21:32, on Zulip):

https://github.com/spastorino/rustc-rayon/blob/latch-target-thread-rustc/rayon-core/src/registry.rs#L374

Santiago Pastorino (Nov 19 2019 at 21:32, on Zulip):

https://github.com/spastorino/rustc-rayon/blob/latch-target-thread-rustc/rayon-core/src/registry.rs#L543

Santiago Pastorino (Nov 19 2019 at 21:32, on Zulip):

https://github.com/spastorino/rustc-rayon/blob/latch-target-thread-rustc/rayon-core/src/registry.rs#L381

Santiago Pastorino (Nov 19 2019 at 21:32, on Zulip):

https://github.com/spastorino/rustc-rayon/blob/latch-target-thread-rustc/rayon-core/src/sleep/mod.rs#L369

nikomatsakis (Nov 19 2019 at 21:40, on Zulip):

https://github.com/spastorino/rustc-rayon/blob/latch-target-thread-rustc/rayon-core/src/registry.rs#L374

huh, that's an interesting one

nikomatsakis (Nov 19 2019 at 21:41, on Zulip):

ah wait

nikomatsakis (Nov 19 2019 at 21:42, on Zulip):

@Santiago Pastorino hmm ok I think I know the problem

nikomatsakis (Nov 19 2019 at 21:42, on Zulip):

I suspect we need to add drop(is_blocked) before the call to acquire_thread here

nikomatsakis (Nov 19 2019 at 21:43, on Zulip):

actually I'm a bit surprised I acquire that lock so early

nikomatsakis (Nov 19 2019 at 21:43, on Zulip):

though I think I have to do it before some of the other operations to avoid missing an interrupt

nikomatsakis (Nov 19 2019 at 21:43, on Zulip):

anyway, as the code is now, if we block in acquire_thread, other threads may fail when trying to wake us up

nikomatsakis (Nov 19 2019 at 21:44, on Zulip):

https://github.com/spastorino/rustc-rayon/blob/latch-target-thread-rustc/rayon-core/src/sleep/mod.rs#L369

the tip off is here -- we should not be blocked here

nikomatsakis (Nov 19 2019 at 21:44, on Zulip):

as is_blocked should never be locked for an indefinite amount of time
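
A minimal sketch of the bug pattern being pointed out, with illustrative names: holding the is_blocked mutex guard across the blocking jobserver call means any thread that needs that lock in order to wake this worker ends up blocking behind it.

    use std::sync::Mutex;

    // `acquire_thread` stands in for the blocking jobserver acquire.
    fn sleep_worker(is_blocked_mutex: &Mutex<bool>, acquire_thread: impl FnOnce()) {
        let mut is_blocked = is_blocked_mutex.lock().unwrap();
        *is_blocked = true;

        // Without this drop the guard lives to the end of the function, so
        // the lock stays held while we block below and nobody can wake us.
        drop(is_blocked);

        acquire_thread(); // may block indefinitely waiting for a jobserver token
    }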

Santiago Pastorino (Nov 20 2019 at 04:12, on Zulip):

@nikomatsakis yeah, adding a drop there works

Santiago Pastorino (Nov 20 2019 at 04:13, on Zulip):

just pushed to spastorino/rustc-rayon and pushed to the PR and re-ran try again

Santiago Pastorino (Nov 20 2019 at 11:53, on Zulip):

@Alex Crichton can you give it a try now?

Alex Crichton (Nov 20 2019 at 15:35, on Zulip):

Ok I've given the build a spin

Alex Crichton (Nov 20 2019 at 15:35, on Zulip):

I'm getting pretty similar numbers though

Alex Crichton (Nov 20 2019 at 15:35, on Zulip):

I don't think that there's really too much difference

Alex Crichton (Nov 20 2019 at 15:35, on Zulip):
   syscall              calls       total       min       avg       max
                                    (msec)    (msec)    (msec)    (msec)
   ----------------- -------- ------------ --------- --------- ---------

   futex                56856  1183229.780     0.000    20.811  2110.401
   sched_yield         189175  1042027.023     0.001     5.508   911.355
   poll                  7079   469753.013     0.000    66.359  1676.488
   read                 11149   446902.643     0.000    40.085  1186.069
   write                 8890    57014.269     0.000     6.413  1010.733
   munmap                7093    34554.982     0.000     4.872  1010.743
   rt_sigprocmask        2837    14168.891     0.000     4.994   204.912
Alex Crichton (Nov 20 2019 at 15:35, on Zulip):

that's still a huge amount of time in sched_yield

Alex Crichton (Nov 20 2019 at 15:41, on Zulip):

@Santiago Pastorino ^

Santiago Pastorino (Nov 20 2019 at 15:42, on Zulip):

:+1: cc @nikomatsakis

Santiago Pastorino (Nov 20 2019 at 15:42, on Zulip):

so I guess the rayon work didn't help that much

simulacrum (Nov 20 2019 at 16:10, on Zulip):

I really would like to get a percentage breakdown of where those calls are coming from

simulacrum (Nov 20 2019 at 16:11, on Zulip):

e.g. is it actually rayon? or are our locks for example not great (e.g., we don't do the "fast" path in Mutex I believe)
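
As a rough illustration of what the "fast path" means here (a hypothetical lock, not std's or parking_lot's implementation): an uncontended lock/unlock should be a couple of atomic operations in user space, only falling back to the kernel when there is real contention.

    use std::sync::atomic::{AtomicBool, Ordering};

    struct FastPathLock {
        locked: AtomicBool,
    }

    impl FastPathLock {
        // `block_until_woken` stands in for a futex wait / thread parking call.
        fn lock(&self, block_until_woken: impl Fn()) {
            loop {
                // Fast path: a single compare-and-swap when the lock is free.
                if self
                    .locked
                    .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                    .is_ok()
                {
                    return;
                }
                // Slow path: contention, so block in the kernel until woken.
                block_until_woken();
            }
        }

        fn unlock(&self) {
            self.locked.store(false, Ordering::Release);
        }
    }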

simulacrum (Nov 21 2019 at 01:57, on Zulip):

hm so I do not have 14/28 cores -- but I tried to run the benchmark on cargo locally, and I'm not seeing a similar problem

threads=1:

   syscall              calls       total       min       avg       max   calls/process
                                    (msec)    (msec)    (msec)    (msec)
   ----------------- -------- ------------ --------- --------- ---------  -------------

   futex                45212    60899.383     0.001     1.347  1237.811       64
   poll                   406    28968.423     0.002    71.351   937.439        0
   read                  5913    11218.518     0.001     1.897   391.518        8
   execve                 212      348.855     0.144     1.646    12.630        0
   clone                 1308      319.756     0.000     0.244     5.406        1
   openat               16447      139.963     0.002     0.009    16.200       23
   mmap                 11993      124.975     0.002     0.010     7.969       17
   munmap                3038      115.923     0.002     0.038     3.028        4
   write                 3199       63.374     0.001     0.020     0.360        4
   lstat                16320       56.009     0.001     0.003     0.144       23

threads=16:

   syscall              calls       total       min       avg       max   calls/process
                                    (msec)    (msec)    (msec)    (msec)
   ----------------- -------- ------------ --------- --------- ---------  -------------

   futex                49488   492724.271     0.001     9.956  2275.793       29
   poll                  4438   308205.607     0.002    69.447   777.625        2
   read                  8790   116879.351     0.001    13.297   293.349        5
   sched_yield         120716     3300.245     0.001     0.027    30.798       72
   clone                 3269     1608.880     0.000     0.492    32.415        1
   munmap                4961     1286.013     0.002     0.259    46.807        2
   write                 6558     1052.591     0.001     0.161    46.031        3
   mmap                 14455      425.569     0.002     0.029    11.968        8
   execve                 170      375.855     0.148     2.211    16.231        0
   madvise               1731      190.411     0.003     0.110    46.988        1
simulacrum (Nov 21 2019 at 01:57, on Zulip):

certainly many more sched_yield calls

simulacrum (Nov 21 2019 at 01:57, on Zulip):

but not drastically so

simulacrum (Nov 21 2019 at 01:58, on Zulip):

and very little time spent (only 3.3 seconds, across 16 cores over 3 seconds, so ~7% of time?)

simulacrum (Nov 21 2019 at 01:59, on Zulip):

going to try out the original commit

simulacrum (Nov 21 2019 at 02:01, on Zulip):

the old commit (8d78bf6b273848d17da8f5c92162c6a6b9b10dfd-alt) I also don't see anything like the numbers you report:

   syscall              calls       total       min       avg       max   calls/process
                                    (msec)    (msec)    (msec)    (msec)
   ----------------- -------- ------------ --------- --------- ---------  -------------

   futex                44951   414095.520     0.001     9.212  2084.394       29
   poll                  4403   386375.166     0.002    87.753   840.755        2
   read                  8081   168858.116     0.001    20.896   485.283        5
   sched_yield         250017     2598.084     0.001     0.010    10.362      165
   clone                 2951     1580.496     0.000     0.536    12.840        1
   write                 6045     1161.817     0.001     0.192    32.851        4
   mmap                 12679      465.769     0.002     0.037    27.262        8
   statx                 2214      355.845     0.002     0.161    34.590        1
   openat               11254      302.685     0.002     0.027    34.587        7
   execve                 146      211.512     0.145     1.449    13.039        0
simulacrum (Nov 21 2019 at 02:03, on Zulip):

certainly we're yielding more than we do at threads=1 but that seems ... reasonable

simulacrum (Nov 21 2019 at 02:06, on Zulip):

@Alex Crichton I'm on somewhat similar hardware (1800X ryzen) and running 5.0.0-36-generic as my kernel... I wonder if there's something about your system that is leading to these different numbers

simulacrum (Nov 21 2019 at 02:06, on Zulip):

I wonder if we can try to find another system to try this out on (to try and replicate)

simulacrum (Nov 21 2019 at 02:15, on Zulip):

separately trying to get a picture of how "good" we are at using the CPUs during those initial 3 seconds; this is from perf sched timehist -s

Idle stats:
    CPU  0 idle for    404.750  msec  ( 14.03%)
    CPU  1 idle for     59.332  msec  (  2.06%)
    CPU  2 idle for    156.965  msec  (  5.44%)
    CPU  3 idle for    436.694  msec  ( 15.14%)
    CPU  4 idle for      3.127  msec  (  0.11%)
    CPU  5 idle for    332.809  msec  ( 11.53%)
    CPU  6 idle for    390.287  msec  ( 13.53%)
    CPU  7 idle for    196.263  msec  (  6.80%)
    CPU  8 idle for    213.140  msec  (  7.39%)
    CPU  9 idle for    165.615  msec  (  5.74%)
    CPU 10 idle for    220.957  msec  (  7.66%)
    CPU 11 idle for    246.667  msec  (  8.55%)
    CPU 12 idle for    361.374  msec  ( 12.52%)
    CPU 13 idle for    415.110  msec  ( 14.39%)
    CPU 14 idle for    440.170  msec  ( 15.26%)
    CPU 15 idle for     93.169  msec  (  3.23%)
simulacrum (Nov 21 2019 at 02:16, on Zulip):

this also seems not too bad, although the numbers are higher than I'd like

Alex Crichton (Nov 21 2019 at 02:48, on Zulip):

@simulacrum I believe this problem is directly related to the number of cores actually

Alex Crichton (Nov 21 2019 at 02:49, on Zulip):

Can you try overcommitting with j28 and Z16?

Alex Crichton (Nov 21 2019 at 02:49, on Zulip):

The problem I think is happening by definition goes away on low core counts

simulacrum (Nov 21 2019 at 02:49, on Zulip):

hm, does -j28 have an effect? I sort of thought we "limited" it at num_cpus

Alex Crichton (Nov 21 2019 at 02:49, on Zulip):

We don't

simulacrum (Nov 21 2019 at 02:49, on Zulip):

ah okay will try

Alex Crichton (Nov 21 2019 at 02:49, on Zulip):

-j28 tells cargo to spawn 28 things

Alex Crichton (Nov 21 2019 at 02:50, on Zulip):

You just gotta override both rustc and cargo

simulacrum (Nov 21 2019 at 02:50, on Zulip):

and the numbers above are for first 3 seconds still right?

Alex Crichton (Nov 21 2019 at 02:50, on Zulip):

Yeah

simulacrum (Nov 21 2019 at 02:50, on Zulip):

and are you cargo check or cargo build?

Alex Crichton (Nov 21 2019 at 02:51, on Zulip):

If you look at top you should see a lot of red

Alex Crichton (Nov 21 2019 at 02:51, on Zulip):

Er htop

Alex Crichton (Nov 21 2019 at 02:51, on Zulip):

Check I think

Alex Crichton (Nov 21 2019 at 02:51, on Zulip):

like if you watch htop during a normal build it should be a lot of green

Alex Crichton (Nov 21 2019 at 02:51, on Zulip):

Given lots of parallelism it should go to a lot of red

Alex Crichton (Nov 21 2019 at 02:53, on Zulip):

basically I'm trying to quantify how we're wasting tons of cpu cycles on parallelism

Alex Crichton (Nov 21 2019 at 02:53, on Zulip):

in such a way that's actually hurtful sometimes

Alex Crichton (Nov 21 2019 at 02:53, on Zulip):

although I do not know precisely where all the wasted cpu cycles are going

simulacrum (Nov 21 2019 at 02:53, on Zulip):

hm

Alex Crichton (Nov 21 2019 at 02:54, on Zulip):

I'm just suspecting rayon's original management of threads, given that niko says threads take a while before going to sleep

simulacrum (Nov 21 2019 at 02:54, on Zulip):

so -Z16 and -j28 is indeed a little red, but not too much

Alex Crichton (Nov 21 2019 at 02:54, on Zulip):

try Z 28

simulacrum (Nov 21 2019 at 02:54, on Zulip):

z16,j28

   syscall              calls       total       min       avg       max   calls/process
                                    (msec)    (msec)    (msec)    (msec)
   ----------------- -------- ------------ --------- --------- ---------  -------------

   futex                49412   644980.216     0.000    13.053  2434.576       29
   poll                  5053   350509.408     0.000    69.367  1261.564        3
   read                  9007   218561.099     0.000    24.266  1343.709        5
   sched_yield         136958    23475.238     0.000     0.171   408.015       82
   clone                 3073     2463.893     0.000     0.802    94.421        1
   lstat                12284     2180.009     0.001     0.177   116.997        7
   write                 7050     1602.995     0.001     0.227   408.168        4
   mmap                 13393      835.740     0.000     0.062    58.253        8
   execve                 155      703.738     0.000     4.540    31.124        0
   statx                 2494      320.205     0.002     0.128   116.849        1
Alex Crichton (Nov 21 2019 at 02:54, on Zulip):

like I think it's an n^2 problem

Alex Crichton (Nov 21 2019 at 02:54, on Zulip):

I feel like that alone is somewhat damning though

Alex Crichton (Nov 21 2019 at 02:54, on Zulip):

er no, microseconds not milliseconds

Alex Crichton (Nov 21 2019 at 02:54, on Zulip):

but even still sched_yield is super high

Alex Crichton (Nov 21 2019 at 02:55, on Zulip):

when the compiler basically shouldn't ever need to call that

Alex Crichton (Nov 21 2019 at 02:55, on Zulip):

like that's pure wasted work

Alex Crichton (Nov 21 2019 at 02:55, on Zulip):

or inefficient locks

simulacrum (Nov 21 2019 at 02:55, on Zulip):

z28,j28

   syscall              calls       total       min       avg       max   calls/process
                                    (msec)    (msec)    (msec)    (msec)
   ----------------- -------- ------------ --------- --------- ---------  -------------

   futex                44449  1077188.759     0.000    24.234  2435.781       21
   poll                  5044   707108.738     0.000   140.188  1255.582        2
   read                  8110   395605.906     0.000    48.780  1044.792        3
   sched_yield         130912   361197.555     0.000     2.759   826.776       62
   write                 6331    16191.301     0.000     2.557   826.988        3
   munmap                4582     7346.600     0.000     1.603   424.413        2
   clone                 4029     5330.293     0.000     1.323   208.436        1
   madvise               1420     2435.015     0.000     1.715   424.451        0
   sigaltstack           5344     2205.971     0.000     0.413   424.403        2
   mmap                 14459     1444.194     0.000     0.100    38.904        6
Alex Crichton (Nov 21 2019 at 02:55, on Zulip):

it's also a massive amount of time in futex...

simulacrum (Nov 21 2019 at 02:55, on Zulip):

futex time is crazy, yes

simulacrum (Nov 21 2019 at 02:55, on Zulip):

I agree that these numbers are suspicious at best :)

Alex Crichton (Nov 21 2019 at 02:55, on Zulip):

I also think /usr/bin/time can be informative

Alex Crichton (Nov 21 2019 at 02:55, on Zulip):

https://hackmd.io/oUdvUU2lTk2ZxfOWQBT9xQ#Crates-in-the-first-3-seconds

Alex Crichton (Nov 21 2019 at 02:56, on Zulip):

like the involuntary context switches is really weird

simulacrum (Nov 21 2019 at 02:57, on Zulip):

28/28 I am seeing similar number of involuntary switches (64309)

Alex Crichton (Nov 21 2019 at 02:58, on Zulip):

also the user time is pretty indicative

Alex Crichton (Nov 21 2019 at 02:58, on Zulip):

like the more threads the more user time went down

simulacrum (Nov 21 2019 at 02:58, on Zulip):
        Command being timed: "/home/mark/.cargo/bin/cargo +ab0dc4e2702a383b051a08b88350c8139a8b47c7-alt check -j28"
        User time (seconds): 6.35
        System time (seconds): 13.35
        Percent of CPU this job got: 650%
Alex Crichton (Nov 21 2019 at 02:58, on Zulip):

which meant that the more parallelism you have the more that's being wasted in the kernel

simulacrum (Nov 21 2019 at 02:59, on Zulip):

with t8,j28

        Command being timed: "/home/mark/.cargo/bin/cargo +ab0dc4e2702a383b051a08b88350c8139a8b47c7-alt check -j28"
        User time (seconds): 28.44
        System time (seconds): 7.64
        Percent of CPU this job got: 1195%
Alex Crichton (Nov 21 2019 at 02:59, on Zulip):

for a compiler user time should be like 100%

Alex Crichton (Nov 21 2019 at 02:59, on Zulip):

like compare those numbers with the single-threaded compiler

simulacrum (Nov 21 2019 at 03:00, on Zulip):

single threaded is indeed on par with -t8 I think:

        Command being timed: "/home/mark/.cargo/bin/cargo +ab0dc4e2702a383b051a08b88350c8139a8b47c7 check -j28"
        User time (seconds): 31.49
        System time (seconds): 4.02
        Percent of CPU this job got: 1176%
Zoxc (Nov 21 2019 at 03:01, on Zulip):

Is sched_yield just called by locks? It doesn't seem like something Rayon / thread spawning would call

simulacrum (Nov 21 2019 at 03:01, on Zulip):

rayon threads yield while trying to sleep

simulacrum (Nov 21 2019 at 03:02, on Zulip):

https://github.com/rayon-rs/rayon/blob/401678ee554f90ab11abe70eb23737e26b489ddd/rayon-core/src/sleep/mod.rs#L63-L95
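
Roughly the shape of the loop linked above, heavily simplified and not the actual rayon code: an idle worker yields on every unsuccessful pass for a while before it really goes to sleep, which is where the sched_yield calls in the traces come from.

    use std::thread;

    const ROUNDS_UNTIL_ASLEEP: u32 = 64; // illustrative constant

    fn idle_until_work(mut find_work: impl FnMut() -> bool, go_to_sleep: impl FnOnce()) {
        let mut rounds = 0;
        while !find_work() {
            rounds += 1;
            if rounds < ROUNDS_UNTIL_ASLEEP {
                // This is the call that shows up as sched_yield.
                thread::yield_now();
            } else {
                // Give up the jobserver token and block for real.
                go_to_sleep();
                return;
            }
        }
    }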

Alex Crichton (Nov 21 2019 at 03:02, on Zulip):

like we may be able to ship what we have now if we just cap compiler threads at 6

Alex Crichton (Nov 21 2019 at 03:02, on Zulip):

instead of ncpus

Alex Crichton (Nov 21 2019 at 03:02, on Zulip):

is what I'm reading from this data

simulacrum (Nov 21 2019 at 03:02, on Zulip):

it does seem that way

simulacrum (Nov 21 2019 at 03:02, on Zulip):

it's a bit worrisome that I can't reproduce the sched_yield stuff no matter what, like, I get higher numbers but not astronomically like you do

Zoxc (Nov 21 2019 at 03:04, on Zulip):

Hm.. I don't see why Rayon should call thread::yield_now. There won't be any threads to switch to if everything works correctly

simulacrum (Nov 21 2019 at 03:04, on Zulip):

I think the thought is that it is "inexpensive sleep"

simulacrum (Nov 21 2019 at 03:04, on Zulip):

that might be false though :)

simulacrum (Nov 21 2019 at 03:04, on Zulip):

and it's not strictly true; it might not be effective

Alex Crichton (Nov 21 2019 at 03:04, on Zulip):

mutexes may call yield, depends on the implementation

simulacrum (Nov 21 2019 at 03:04, on Zulip):

that's only true on a system where rustc is the only thing

Zoxc (Nov 21 2019 at 03:06, on Zulip):

It's a very expensive sleep, std::sync::atomic::spin_loop_hint is probably what should be called?

simulacrum (Nov 21 2019 at 03:06, on Zulip):

hm, I don't know, maybe

simulacrum (Nov 21 2019 at 03:06, on Zulip):

it's called pretty eagerly on rayon master

simulacrum (Nov 21 2019 at 03:07, on Zulip):

and same with Niko's branch (https://github.com/spastorino/rustc-rayon/blob/latch-target-thread-rustc/rayon-core/src/sleep/mod.rs#L104)

Zoxc (Nov 21 2019 at 03:07, on Zulip):

We could patch it out and see if it helps with Alex's 14 cores.

simulacrum (Nov 21 2019 at 03:07, on Zulip):

I suspect it might just want to do nothing

simulacrum (Nov 21 2019 at 03:08, on Zulip):

cc @Santiago Pastorino would you be up to removing the yield_now calls from rayon and re-pushing that up as a try build?

simulacrum (Nov 21 2019 at 03:09, on Zulip):

But yeah I think it may be that we just limit ourselves to, say, -Zthreads=6 or something for now

simulacrum (Nov 21 2019 at 03:09, on Zulip):

otoh, we can't even get major performance wins with that (nowhere near linear speedup)

Zoxc (Nov 21 2019 at 03:10, on Zulip):

We'd want to limit rustc to the number of threads which gives the fastest builds anyway. There's probably some contention in rustc which limits that far below 14 cores currently

simulacrum (Nov 21 2019 at 03:11, on Zulip):

(e.g., worth noting that we shard to 32 - that might be too small if your thread count is half that)
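
For context on "we shard to 32", a minimal sketch with illustrative names rather than rustc's actual sharded types: instead of one global lock, the map is split into N independently locked shards chosen by key hash, so two threads only contend when they hit the same shard.

    use std::collections::hash_map::DefaultHasher;
    use std::collections::HashMap;
    use std::hash::{Hash, Hasher};
    use std::sync::Mutex;

    const SHARDS: usize = 32; // the shard count being discussed

    struct ShardedMap<K, V> {
        shards: Vec<Mutex<HashMap<K, V>>>,
    }

    impl<K: Hash + Eq, V> ShardedMap<K, V> {
        fn new() -> Self {
            ShardedMap {
                shards: (0..SHARDS).map(|_| Mutex::new(HashMap::new())).collect(),
            }
        }

        fn shard_for(&self, key: &K) -> &Mutex<HashMap<K, V>> {
            let mut hasher = DefaultHasher::new();
            key.hash(&mut hasher);
            // Only this shard's lock is taken; the other 31 stay free.
            &self.shards[(hasher.finish() as usize) % SHARDS]
        }

        fn insert(&self, key: K, value: V) {
            self.shard_for(&key).lock().unwrap().insert(key, value);
        }
    }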

simulacrum (Nov 21 2019 at 03:11, on Zulip):

anyway, I'm off for the night but hopefully we can rebenchmark with rayon's yielding removed at least

simulacrum (Nov 21 2019 at 03:12, on Zulip):

but either way seems like we have rough consensus that limiting thread count would be fine for now

Alex Crichton (Nov 21 2019 at 03:16, on Zulip):

It's a very expensive sleep, std::sync::atomic::spin_loop_hint is probably what should be called?

this makes me a bit worried... this sort of conveys a bit of a misunderstanding of what a spin loop is used for?

Alex Crichton (Nov 21 2019 at 03:17, on Zulip):

if you peg a core waiting for a mutex to get unlocked while an off-core thread actually holds the lock, you just eat your whole time slice spinning

Alex Crichton (Nov 21 2019 at 03:17, on Zulip):

the thinking is that you spin for a little bit, which I suspect that parking lot already does

Alex Crichton (Nov 21 2019 at 03:17, on Zulip):

and then you switch to the system, likely with a futex (hopefully) rather than a thread yield

Zoxc (Nov 21 2019 at 03:22, on Zulip):

Rayon spins a bit (64 iterations) before giving up its jobserver token and actually sleeping. It calls yield_now per iteration of that loop currently.

Zoxc (Nov 21 2019 at 03:22, on Zulip):

We can tune the number of iterations too, to see if that helps.

Zoxc (Nov 21 2019 at 03:30, on Zulip):

We probably want to replace the yield_now calls in Rayon with a user-mode sleep operation to avoid contention on Rayon's data structures, possibly with quadratic back-off.
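
A sketch of that alternative (not an agreed design): spin in user mode with a growing back-off instead of calling yield_now on every iteration, and only then hand back the jobserver token and block on the OS primitive, along the lines Alex describes above.

    use std::hint::spin_loop; // std::sync::atomic::spin_loop_hint at the time of this thread

    fn idle_with_backoff(mut find_work: impl FnMut() -> bool, go_to_sleep: impl FnOnce()) {
        let mut spins: u32 = 1;
        loop {
            if find_work() {
                return;
            }
            if spins > 1 << 10 {
                // Back-off exhausted: give up the jobserver token and block
                // for real (futex/condvar) instead of burning the time slice.
                go_to_sleep();
                return;
            }
            for _ in 0..spins {
                spin_loop(); // stays in user mode; no sched_yield syscall
            }
            spins *= 2; // exponential back-off; quadratic would also work
        }
    }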

Zoxc (Nov 21 2019 at 03:33, on Zulip):

Just removing them would let us verify that it is the Rayon calls that are showing up in the profile.

Zoxc (Nov 21 2019 at 03:44, on Zulip):

It would also be interesting to know how much thread spawning is slowing down small crates. Perhaps we could create a branch which just creates 27 threads that each acquire a jobserver token and release it, and uses just 1 thread for rustc itself, and compare how that performs on small crates versus 28 rustc threads

Zoxc (Nov 21 2019 at 03:53, on Zulip):

Limiting Rayon to only creating a thread every 5 ms or so might be a good idea. It would avoid contention in the kernel, avoid wasting time creating extra threads for small crates and it would also effectively load balance jobserver tokens between rustc processes if cargo spawns a bunch of rustc instances.

Alex Crichton (Nov 21 2019 at 04:41, on Zulip):

Yes I think a big issue is that for my system every rustc process spawns 28 threads which then yield a bunch and then sleep. Every single compiler does this while trading around who's got the token to idle for a bit I believe

Alex Crichton (Nov 21 2019 at 04:42, on Zulip):

Throttling thread creation would be a huge win I suspect, but I doubt time-based limiting is needed; rather, we'd only spin up a thread when there is parallel work to do

Alex Crichton (Nov 21 2019 at 04:42, on Zulip):

Rather than spawning all threads at the start

Alex Crichton (Nov 21 2019 at 04:43, on Zulip):

This means we would need some sort of coordinator thread or something like that but I don't think that's the end of the world

Zoxc (Nov 21 2019 at 05:33, on Zulip):

There is likely enough parallel work in a small crate (say typechecking 28 functions) to spawn 28 threads. We'll immediately do parallel parsing too, but small crates will probably have just a few files.

Santiago Pastorino (Nov 21 2019 at 12:43, on Zulip):

cc Santiago Pastorino would you be up to removing the yield_now calls from rayon and re-pushing that up as a try build?

I can try that yeah

simulacrum (Nov 21 2019 at 12:51, on Zulip):

if you have 28 functions then you shouldn't spin up 28 threads, right? Like, I would hope that we can tune rayon or w/e to be less eager

simulacrum (Nov 21 2019 at 12:52, on Zulip):

type checking 28 functions should likely be done on a single thread

Alex Crichton (Nov 21 2019 at 13:24, on Zulip):

Some rough thinking I have is that the main thread, the one that main starts on, is the one responsible for fielding requests to spin up new threads

Alex Crichton (Nov 21 2019 at 13:25, on Zulip):

It then always reads a jobserver token before it spins up a new thread

Alex Crichton (Nov 21 2019 at 13:25, on Zulip):

Which should ideally throttle it by quite a bit, and also cap the total number of threads

Alex Crichton (Nov 21 2019 at 13:25, on Zulip):

By adding a few ms of latency we can also amortize tiny bursts of parallelism, I'd hope
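
A sketch of that coordinator shape, with a closure standing in for the blocking jobserver read (a hypothetical structure, not current rustc or rayon code):

    use std::sync::mpsc;
    use std::thread;

    enum Request {
        NeedWorker(Box<dyn FnOnce() + Send>),
        Shutdown,
    }

    fn coordinator(rx: mpsc::Receiver<Request>, acquire_token: impl Fn()) {
        let mut workers = Vec::new();
        for req in rx {
            match req {
                Request::NeedWorker(work) => {
                    // Block until the build system grants a token: no token,
                    // no new thread, which both throttles and caps the pool.
                    acquire_token();
                    workers.push(thread::spawn(work));
                }
                Request::Shutdown => break,
            }
        }
        for w in workers {
            w.join().unwrap();
        }
    }

    fn main() {
        let (tx, rx) = mpsc::channel();
        // In this sketch the "jobserver" grants a token immediately.
        let coord = thread::spawn(move || coordinator(rx, || {}));
        tx.send(Request::NeedWorker(Box::new(|| {
            println!("parallel work ran once a token was available");
        })))
        .unwrap();
        tx.send(Request::Shutdown).unwrap();
        coord.join().unwrap();
    }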

Santiago Pastorino (Nov 21 2019 at 15:42, on Zulip):

https://github.com/rust-lang/rust/pull/66608

Santiago Pastorino (Nov 21 2019 at 15:42, on Zulip):

I thought I'd already shared that but it seems I didn't (?)

Santiago Pastorino (Nov 21 2019 at 15:43, on Zulip):

anyway, try didn't finish yet

Santiago Pastorino (Nov 21 2019 at 19:26, on Zulip):

@simulacrum https://github.com/rust-lang/rust/pull/66608#issuecomment-557234516 unsure if that means that we need to run try again or what

simulacrum (Nov 21 2019 at 19:32, on Zulip):

Yeah bors retry should do it

Santiago Pastorino (Nov 22 2019 at 15:23, on Zulip):

done and we have a try build ready https://github.com/rust-lang/rust/pull/66608#issuecomment-557569098

Santiago Pastorino (Nov 22 2019 at 15:23, on Zulip):

can you run your tests again?

Santiago Pastorino (Nov 22 2019 at 15:23, on Zulip):

cc @Alex Crichton @simulacrum

Alex Crichton (Nov 22 2019 at 16:35, on Zulip):

building now

Alex Crichton (Nov 22 2019 at 16:36, on Zulip):

well that definitely erased sched_yield from the perf output

Alex Crichton (Nov 22 2019 at 16:37, on Zulip):
   syscall              calls       total       min       avg       max
                                    (msec)    (msec)    (msec)    (msec)
   ----------------- -------- ------------ --------- --------- ---------

   futex                52673    45873.689     0.001     0.871  1289.374
   poll                   608    12383.221     0.001    20.367   325.203
   read                  7374     2967.659     0.001     0.402   171.086
   clone                 1593      777.233     0.000     0.488     5.518
   execve                 232      224.929     0.081     0.970    14.779
   openat               27286      126.343     0.001     0.005    18.379
Alex Crichton (Nov 22 2019 at 16:37, on Zulip):

er spoke too soon

Alex Crichton (Nov 22 2019 at 16:37, on Zulip):

that's threads=1

Alex Crichton (Nov 22 2019 at 16:37, on Zulip):

this is threads=28

   syscall              calls       total       min       avg       max
                                    (msec)    (msec)    (msec)    (msec)
   ----------------- -------- ------------ --------- --------- ---------

   futex                70130  1146037.646     0.001    16.342  1918.354
   read                 14774   472861.060     0.001    32.006   787.389
   poll                 10155   295364.289     0.000    29.086   787.293
   sched_yield          15723    22342.877     0.001     1.421   124.090
   write                12407    13890.783     0.001     1.120   787.618
   clone                 5097    13820.337     0.000     2.711    69.469
   munmap                8538     4180.142     0.000     0.490    66.131
Alex Crichton (Nov 22 2019 at 16:39, on Zulip):

still a massive amount of system time

Alex Crichton (Nov 22 2019 at 16:39, on Zulip):

for threads=28

Alex Crichton (Nov 22 2019 at 16:39, on Zulip):
        User time (seconds): 26.96
        System time (seconds): 40.44
Alex Crichton (Nov 22 2019 at 16:39, on Zulip):

vs threads=1

        User time (seconds): 36.97
        System time (seconds): 4.15
Santiago Pastorino (Nov 22 2019 at 17:07, on Zulip):

doh :(

Santiago Pastorino (Nov 22 2019 at 17:08, on Zulip):

@Alex Crichton for threads=28 did this change something or not much?

Alex Crichton (Nov 22 2019 at 17:08, on Zulip):

it probably changed something but doesn't seem to fix much of the issue

Santiago Pastorino (Nov 22 2019 at 17:10, on Zulip):

yeah I see what you mean but ...

Santiago Pastorino (Nov 22 2019 at 17:10, on Zulip):

with the old commit (8d78bf6b273848d17da8f5c92162c6a6b9b10dfd-alt) I also don't see anything like the numbers you report:

   syscall              calls       total       min       avg       max  calls/process
                                    (msec)    (msec)    (msec)    (msec)
   ----------------- -------- ------------ --------- --------- --------- -------------

   futex                44951   414095.520     0.001     9.212  2084.394            29
   poll                  4403   386375.166     0.002    87.753   840.755             2
   read                  8081   168858.116     0.001    20.896   485.283             5
   sched_yield         250017     2598.084     0.001     0.010    10.362           165
   clone                 2951     1580.496     0.000     0.536    12.840             1
   write                 6045     1161.817     0.001     0.192    32.851             4
   mmap                 12679      465.769     0.002     0.037    27.262             8
   statx                 2214      355.845     0.002     0.161    34.590             1
   openat               11254      302.685     0.002     0.027    34.587             7
   execve                 146      211.512     0.145     1.449    13.039             0

if this was threads=28

simulacrum (Nov 22 2019 at 17:10, on Zulip):

that was on my system

Santiago Pastorino (Nov 22 2019 at 17:10, on Zulip):

we were getting 250000 calls and are now at 15000

simulacrum (Nov 22 2019 at 17:10, on Zulip):

(where sched_yield never featured highly for whatever reason)

Santiago Pastorino (Nov 22 2019 at 17:11, on Zulip):

what I meant is ... in some of the results @Alex Crichton shared I saw more than 100,000 calls, in others more than 200,000, and we are now at 15,000

Santiago Pastorino (Nov 22 2019 at 17:12, on Zulip):

I understand that didn't fix the problem but this seems like an improvement I guess (?)

simulacrum (Nov 22 2019 at 17:13, on Zulip):

It is an improvement for sure

Alex Crichton (Nov 22 2019 at 17:13, on Zulip):

yes sched_yield is clearly down

Alex Crichton (Nov 22 2019 at 17:14, on Zulip):

but the issue is that the system time is crazy huge

Alex Crichton (Nov 22 2019 at 17:14, on Zulip):

which at this point probably means a massive amount of contention

Alex Crichton (Nov 22 2019 at 17:14, on Zulip):

like my compiler is getting slower the more threads I throw at it

Santiago Pastorino (Nov 22 2019 at 17:14, on Zulip):

yep

Alex Crichton (Nov 22 2019 at 17:14, on Zulip):

which is the opposite of what we want

simulacrum (Nov 22 2019 at 17:14, on Zulip):

I would like to try to figure out where that contention is coming from

Santiago Pastorino (Nov 22 2019 at 17:14, on Zulip):

yeah, for sure something like that's going on :)

Santiago Pastorino (Nov 22 2019 at 17:15, on Zulip):

I would like to try to figure out where that contention is coming from

I wonder what kind of tool you use to figure that out

Santiago Pastorino (Nov 22 2019 at 17:15, on Zulip):

I'd need to research a bit

simulacrum (Nov 22 2019 at 17:16, on Zulip):

perf lock looked possibly promising but seems to need a kernel compiled with some flags

Santiago Pastorino (Nov 22 2019 at 17:16, on Zulip):

but we want to get a summary of the state of the rest of the threads on every lock acquisition

Santiago Pastorino (Nov 22 2019 at 17:17, on Zulip):

perf lock looked possibly promising but seems to need a kernel compiled with some flags

I could try to play with all that next week

simulacrum (Nov 22 2019 at 17:17, on Zulip):

I'm hoping to spend some time this weekend trying out a few ideas I have -- we'll see how far I get

Santiago Pastorino (Nov 22 2019 at 17:17, on Zulip):

like 10 years ago I used to tweak all the kernel flags on my system, today I'm too old to do that :P

Santiago Pastorino (Nov 22 2019 at 17:17, on Zulip):

but if needed of course I can play with it again :)

Santiago Pastorino (Nov 22 2019 at 17:17, on Zulip):

I'm hoping to spend some time this weekend trying out a few ideas I have -- we'll see how far I get

better then :), :+1:

Santiago Pastorino (Nov 22 2019 at 17:18, on Zulip):

if it's not a problem, please try to write down as much as you can; it may be something cool to document or share, or at the least it may be good for me to check out :)

simulacrum (Nov 22 2019 at 17:29, on Zulip):

mhm, will try to

Zoxc (Nov 23 2019 at 08:50, on Zulip):

I'd try sharding the symbol and maybe the span interner. Those are behind a single lock currently.

Zoxc (Nov 23 2019 at 08:51, on Zulip):

Bumping the number of shards might also help for Alex's 14 cores. The current number is sufficient for my 8 cores.
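
For concreteness, a toy version of that sharding approach (nothing like rustc's actual interner; the shard count and id scheme here are made up): hash the string, pick a shard, and only lock that shard's map, so threads interning different symbols rarely contend.

    use std::collections::hash_map::DefaultHasher;
    use std::collections::HashMap;
    use std::hash::{Hash, Hasher};
    use std::sync::Mutex;

    // Bumping this is the "more shards for more cores" knob mentioned above.
    const SHARDS: usize = 32;

    struct ShardedInterner {
        // One lock per shard instead of one global lock.
        shards: Vec<Mutex<HashMap<String, u32>>>,
    }

    impl ShardedInterner {
        fn new() -> Self {
            ShardedInterner {
                shards: (0..SHARDS).map(|_| Mutex::new(HashMap::new())).collect(),
            }
        }

        fn intern(&self, s: &str) -> u32 {
            let mut hasher = DefaultHasher::new();
            s.hash(&mut hasher);
            let shard_idx = (hasher.finish() as usize) % SHARDS;
            let mut shard = self.shards[shard_idx].lock().unwrap();
            if let Some(&id) = shard.get(s) {
                return id;
            }
            // Encode the shard index into the id so ids stay unique
            // across shards (a real interner would use a shared arena).
            let id = (shard.len() * SHARDS + shard_idx) as u32;
            shard.insert(s.to_owned(), id);
            id
        }
    }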

Zoxc (Nov 25 2019 at 16:36, on Zulip):

Oh, and Miri could probably use some sharding too

Last update: Dec 12 2019 at 00:55UTC