Stream: t-compiler/wg-parallel-rustc

Topic: contention debugging


simulacrum (Nov 27 2019 at 18:14, on Zulip):

trying a different tack -- perf record can give me the mutex pointers and those are always stable in our case so going to log those on creation time, hopefully there's a) not too many mutexes being created and b) creation isn't as such hot enough that it causes problems

simulacrum (Nov 27 2019 at 20:14, on Zulip):

hm, okay, so that sort of didn't succeed - the address that is given to futex is not of the pthread_mutex_t it seems

simulacrum (Nov 27 2019 at 20:35, on Zulip):

it does seem like a lot of the futex calls are not coming from rustc? but this does not jive well with what we're seeing elsewhere, i.e., in alex's foo.rs tool

    808 cargo
 236398 ld
 125675 rustc
simulacrum (Nov 27 2019 at 20:35, on Zulip):

I wonder if that tool is buggy

simulacrum (Nov 27 2019 at 20:44, on Zulip):

otoh, 125k is still pretty high

simulacrum (Nov 27 2019 at 21:36, on Zulip):

I am consistently getting most futex calls in ld, not rustc

simulacrum (Nov 27 2019 at 21:36, on Zulip):

and total futex call count is still around 400k so I don't think it's because I've made changes

simulacrum (Nov 27 2019 at 21:39, on Zulip):

I am kinda confused about the futex calls from ld as I didn't think that was parallelized -- I wonder if the ld I have is not the normal ld

simulacrum (Nov 27 2019 at 21:40, on Zulip):

ah, indeed, I'm using lld as my default linker it seems

Alex Crichton (Nov 28 2019 at 00:54, on Zulip):

What in the world is the linker using futexes for

Alex Crichton (Nov 28 2019 at 00:54, on Zulip):

Isn't it single threaded?

simulacrum (Nov 28 2019 at 00:55, on Zulip):

lld is not, apparently, according to some random googling

Alex Crichton (Nov 28 2019 at 00:55, on Zulip):

Interesting, so my script should probably change

Alex Crichton (Nov 28 2019 at 00:55, on Zulip):

And only account for rustc things

simulacrum (Nov 28 2019 at 00:55, on Zulip):

(to be clear, my interpretation of your script was that it tried to do so)

Alex Crichton (Nov 28 2019 at 00:55, on Zulip):

The linker we don't care too much about and likely just adds noise

Alex Crichton (Nov 28 2019 at 00:55, on Zulip):

Oh, nvmd then

simulacrum (Nov 28 2019 at 00:56, on Zulip):

though maybe it failed since it does seem like the numbers I'm getting are heavily inconsistent with the theory that the script was successful at doing so

simulacrum (Nov 28 2019 at 00:57, on Zulip):

it's also possible that perf trace has different properties than what I'm doing now, which is (I think) more low level

simulacrum (Nov 28 2019 at 00:58, on Zulip):

right now I've largely been struggling to figure out how to track down callers of e.g. Mutex::new -- collecting backtraces via std::backtrace is apparently way too slow (I get something like an order of magnitude, or two, slowdown)

simulacrum (Nov 28 2019 at 00:59, on Zulip):

I'm going to move it to just the lock() method or something

Alex Crichton (Nov 28 2019 at 01:00, on Zulip):

I'd actually warn against that

Alex Crichton (Nov 28 2019 at 01:00, on Zulip):

Or take.the numbers with a grain of salt

Alex Crichton (Nov 28 2019 at 01:00, on Zulip):

Backtraces are single threaded and extremely expensive to capture

Alex Crichton (Nov 28 2019 at 01:01, on Zulip):

So it may not be a very reliable way to gather contention data that way

simulacrum (Nov 28 2019 at 01:01, on Zulip):

to be clear the new plan only collects the backtrace once we've already detected at least some contention

simulacrum (Nov 28 2019 at 01:01, on Zulip):

but yes, I agree that it's likely not perfect

simulacrum (Nov 28 2019 at 01:01, on Zulip):

(I'm grasping at straws a bit, to be honest)

simulacrum (Nov 28 2019 at 01:03, on Zulip):

one possible easy win that I should try though I have no numbers to back me up is to try and switch rayon over to parking_lot

simulacrum (Nov 28 2019 at 01:03, on Zulip):

since at least primitive benchmarks seem to make us believe that parking_lot is much better at both the contended and uncontended cases, and I am seeing at least some evidence pointing at rayon being a cause of contention

simulacrum (Nov 28 2019 at 01:04, on Zulip):

(in particular, parking lot currently barely calls futex)

simulacrum (Nov 28 2019 at 23:21, on Zulip):

parking lot didn't seem to help much

simulacrum (Nov 29 2019 at 00:01, on Zulip):

here's flamegraph of the futex calls in the first 3 seconds of cargo check on cargo for master, the rayon branch, and rayon+semaphores:

futex-rayon.svg futex-master.svg futex-semaphores.svg

simulacrum (Dec 02 2019 at 21:13, on Zulip):

@Alex Crichton sudo --preserve-env perf record -m256 --switch-output=600M --aio=4 -e syscalls:sys_enter_futex --call-graph dwarf -o ~/perf.data -- timeout 3s $CARGO check -j$T with RUSTC and CARGO set appropriately and T=28, then perf script into stack-collapse / flamegraph (I use inferno, but shouldn't matter)

simulacrum (Dec 02 2019 at 21:13, on Zulip):

(I also have my own custom perf script that's quite a bit faster, but I haven't gotten around to publishing it -- it should be about equivalent in terms of output though)

Alex Crichton (Dec 02 2019 at 21:13, on Zulip):

woo thanks!

simulacrum (Dec 02 2019 at 21:13, on Zulip):

I'll try and type up the other commands I was using

bjorn3 (Dec 02 2019 at 21:24, on Zulip):

How long does processing a 1GB perf.data file captured using perf record --call-graph dwarf normally take with perf script? I got one which still wasn't done after an hour. Is it just my computer or is it normal?

simulacrum (Dec 02 2019 at 21:27, on Zulip):

that's normal

simulacrum (Dec 02 2019 at 21:27, on Zulip):

my tool is much faster (which is why I wrote it)

simulacrum (Dec 02 2019 at 21:27, on Zulip):

I should get around to figuring out licensing etc and publish it

bjorn3 (Dec 02 2019 at 21:39, on Zulip):

Looking forward to see it being publish! I want to be abable to get a flamegraph of a full cargo build run for the relatively small project I profiled it for.

Last update: Dec 12 2019 at 00:50UTC