Stream: t-compiler/wg-parallel-rustc

Topic: cargo timing results


simulacrum (Oct 03 2019 at 18:15, on Zulip):

https://mark.rousskov.org/parallel-compiler-data/ might be of interest to @Alex Crichton -- in particular, I'm not sure if I got the CPU usage %ages right, but those are really low compared to what I was expecting across the board

simulacrum (Oct 03 2019 at 18:16, on Zulip):

this might be cool to roll into cargo for diffs between -Ztimings runs

simulacrum (Oct 03 2019 at 18:16, on Zulip):

CPU %ages are just being summed and divided by count, ignoring time(?) in the first parameter

Alex Crichton (Oct 03 2019 at 18:44, on Zulip):

@simulacrum the CPU usage via average may not be quite right because samples aren't taken with a consistent time interval

Alex Crichton (Oct 03 2019 at 18:44, on Zulip):

I think you'd also need to factor in how far apart each sample is

simulacrum (Oct 03 2019 at 18:45, on Zulip):

Okay, I thought that might be the case

Alex Crichton (Oct 03 2019 at 18:45, on Zulip):

cargo should probably put this on the report

Alex Crichton (Oct 03 2019 at 18:45, on Zulip):

and do the math for you

simulacrum (Oct 03 2019 at 18:45, on Zulip):

I'll look into some scaling or something - probably linear, for simplicity

Alex Crichton (Oct 03 2019 at 18:46, on Zulip):

(this is awesome data btw, thank you so much for collecting all of it!)

Alex Crichton (Oct 03 2019 at 18:46, on Zulip):

in theory it's all based on trapezoids

Alex Crichton (Oct 03 2019 at 18:46, on Zulip):

average each pair of measurements and multiply by their distance

Alex Crichton (Oct 03 2019 at 18:46, on Zulip):

sum all that up and divide by total time * 100

Alex Crichton (Oct 03 2019 at 18:46, on Zulip):

and that should be cpu usage
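A minimal sketch of the trapezoidal calculation described above, contrasted with the naive sum-divided-by-count average. It assumes each sample is a hypothetical `(seconds_since_start, cpu_percent)` pair whose percentages are already in the 0 to 100 range, so no extra `* 100` is applied. The type and function names are illustrative only, not Cargo's actual `-Ztimings` internals.

```rust
/// One CPU-usage sample: (seconds since build start, CPU usage in percent).
/// Hypothetical shape for illustration; not Cargo's internal representation.
type Sample = (f64, f64);

/// Naive mean: sums the percentages and divides by the sample count,
/// ignoring how far apart the samples are in time.
fn naive_mean(samples: &[Sample]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    samples.iter().map(|&(_, pct)| pct).sum::<f64>() / samples.len() as f64
}

/// Time-weighted mean via the trapezoidal rule: average each adjacent pair of
/// measurements, multiply by the time between them, sum, and divide by total time.
fn trapezoid_mean(samples: &[Sample]) -> f64 {
    if samples.len() < 2 {
        // Zero or one sample: there is no interval to weight by.
        return samples.first().map_or(0.0, |&(_, pct)| pct);
    }
    // Area under the CPU% curve, in percent-seconds.
    let mut area = 0.0;
    for pair in samples.windows(2) {
        let (t0, p0) = pair[0];
        let (t1, p1) = pair[1];
        area += (p0 + p1) / 2.0 * (t1 - t0);
    }
    let total_time = samples[samples.len() - 1].0 - samples[0].0;
    if total_time <= 0.0 {
        // All samples share one timestamp; fall back to the plain average.
        return naive_mean(samples);
    }
    area / total_time
}

fn main() {
    // Irregular sampling: a long quiet stretch, then a burst of close samples.
    let samples = [(0.0, 10.0), (8.0, 10.0), (9.0, 90.0), (9.5, 90.0), (10.0, 90.0)];
    println!("naive mean:       {:.1}%", naive_mean(&samples));
    println!("trapezoidal mean: {:.1}%", trapezoid_mean(&samples));
}
```

With these samples the naive mean is 58.0%, because the three closely spaced high readings dominate the count, while the trapezoidal mean is 22.0%, reflecting that CPU usage was low for 8 of the 10 seconds.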

Alex Crichton (Oct 03 2019 at 18:47, on Zulip):

these results do look sort of sad unfortunately

Alex Crichton (Oct 03 2019 at 18:47, on Zulip):

but I'm hoping that's due to excessive jobserver use

Alex Crichton (Oct 03 2019 at 18:47, on Zulip):

(if that hasn't changed since I last measured)

simulacrum (Oct 03 2019 at 18:47, on Zulip):

I did some spot checking and they look pretty realistic

Alex Crichton (Oct 03 2019 at 18:48, on Zulip):

oh I believe them

Alex Crichton (Oct 03 2019 at 18:48, on Zulip):

we just definitely can't ship this

Alex Crichton (Oct 03 2019 at 18:48, on Zulip):

as-is that is

simulacrum (Oct 03 2019 at 18:48, on Zulip):

Hm, why? It's better than master?

Alex Crichton (Oct 03 2019 at 18:49, on Zulip):

my take on these numbers at least is that it's better, but nowhere near the "mountains better" that you'd expect from parallelism

simulacrum (Oct 03 2019 at 18:49, on Zulip):

Oh, sure

Alex Crichton (Oct 03 2019 at 18:49, on Zulip):

and there's currently what I perceive as a very high cost in terms of implementation and bugs

Alex Crichton (Oct 03 2019 at 18:49, on Zulip):

so at least to me it seems like not a great tradeoff right now

Alex Crichton (Oct 03 2019 at 18:49, on Zulip):

but I think this is all due to the jobserver integration if that hasn't changed

Alex Crichton (Oct 03 2019 at 18:49, on Zulip):

like, on my machine, building cargo regressed when given full parallelism

simulacrum (Oct 03 2019 at 18:50, on Zulip):

Hm yeah that doesn't seem to be the case based on these measurements

Alex Crichton (Oct 03 2019 at 18:50, on Zulip):

I've also got 28 cores though, and it looks like it gets worse with more cores

simulacrum (Oct 03 2019 at 18:50, on Zulip):

That's quite plausible

simulacrum (Oct 03 2019 at 18:50, on Zulip):

I have 8/16

Alex Crichton (Oct 03 2019 at 18:50, on Zulip):

e.g. sccache-opt got worse in your benchmarks with 16 cores vs 8 cores

Alex Crichton (Oct 03 2019 at 18:51, on Zulip):

is "master" here -j1

Alex Crichton (Oct 03 2019 at 18:51, on Zulip):

or is it a non-parallel-capable-compiler?

simulacrum (Oct 03 2019 at 18:51, on Zulip):

No, non parallel compiler

Alex Crichton (Oct 03 2019 at 18:51, on Zulip):

ah ok, that's probably what we mostly care about anyway

simulacrum (Oct 03 2019 at 18:51, on Zulip):

This data does not contain j1 data or its equivalent

simulacrum (Oct 03 2019 at 18:51, on Zulip):

I think j1 is basically not interesting

simulacrum (Oct 03 2019 at 18:52, on Zulip):

Maybe for our poor 2-core CI

simulacrum (Oct 03 2019 at 18:53, on Zulip):

This is also all with default cargo settings, btw

simulacrum (Oct 03 2019 at 18:53, on Zulip):

So e.g. incremental is on

simulacrum (Oct 03 2019 at 18:54, on Zulip):

But that shouldn't matter too much, I tried servo with and without, made very little difference

Alex Crichton (Oct 03 2019 at 18:54, on Zulip):

nah yeah this is great, these are the measurements I think we largely care about

Alex Crichton (Oct 03 2019 at 18:54, on Zulip):

is this using RUSTFLAGS?

simulacrum (Oct 03 2019 at 18:54, on Zulip):

At least to the shape of the CPU graph

simulacrum (Oct 03 2019 at 18:56, on Zulip):

Yeah rustflags -Z threads=8

Alex Crichton (Oct 03 2019 at 18:56, on Zulip):

for the compiler you're using, the default is -Zthreads=1, right?

Alex Crichton (Oct 03 2019 at 18:57, on Zulip):

the reason I ask is that if Cargo receives --target, like it does for rustc, RUSTFLAGS isn't passed to procedural macros

Alex Crichton (Oct 03 2019 at 18:57, on Zulip):

or build scripts

simulacrum (Oct 03 2019 at 18:57, on Zulip):

Ah, yeah, but I always override it

simulacrum (Oct 03 2019 at 18:57, on Zulip):

Er

simulacrum (Oct 03 2019 at 18:58, on Zulip):

Yeah, so for proc macros and build scripts I guess we're building with -Zthreads=1 which is "worst case"

Alex Crichton (Oct 03 2019 at 18:58, on Zulip):

right yeah

Alex Crichton (Oct 03 2019 at 18:58, on Zulip):

probably not for most of these, only for the rustc ones

Alex Crichton (Oct 03 2019 at 18:58, on Zulip):

where I'm assuming you instrumented rustbuild to pass -Ztimings

simulacrum (Oct 03 2019 at 18:58, on Zulip):

Yes

Alex Crichton (Oct 03 2019 at 18:59, on Zulip):

can I install the toolchain you're using?

simulacrum (Oct 03 2019 at 18:59, on Zulip):

Yes

Alex Crichton (Oct 03 2019 at 18:59, on Zulip):

I want to double-check my measurement that cargo is slower

simulacrum (Oct 03 2019 at 18:59, on Zulip):

the commit hashes should be in the results

Alex Crichton (Oct 03 2019 at 18:59, on Zulip):

(what's the command again?)

Alex Crichton (Oct 03 2019 at 18:59, on Zulip):

it's the alt version right?

simulacrum (Oct 03 2019 at 18:59, on Zulip):

rustup-toolchain-install-master or https://gist.githubusercontent.com/nikomatsakis/81e50fdf7254da8870c682109c404694/raw/d364e5d69809c54bdd2694f6bd304c0032de1552/bors-curl

simulacrum (Oct 03 2019 at 19:00, on Zulip):

no, the master and parallel commits are specifically built as "default"

simulacrum (Oct 03 2019 at 19:00, on Zulip):

you don't need alt builds

Alex Crichton (Oct 03 2019 at 19:00, on Zulip):

so install 702b45e409495a41afcccbe87a251a692b0cefab and the previous commit

Alex Crichton (Oct 03 2019 at 19:00, on Zulip):

and those are the two comparison toolchains?

Alex Crichton (Oct 03 2019 at 19:00, on Zulip):

wait I'm hurting myself in confusion

simulacrum (Oct 03 2019 at 19:01, on Zulip):

for parallel you should use dc78b8ba1, and 702b45e40 for master

Alex Crichton (Oct 03 2019 at 19:01, on Zulip):

ok got it

simulacrum (Oct 03 2019 at 19:01, on Zulip):

neither is an alt build and the parallel commit defaults to -Zthreads=1

Alex Crichton (Oct 03 2019 at 19:05, on Zulip):

ok so building cargo in debug mode w/ 28 threads is slightly faster, 46 -> 43s
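For scale, the two timings quoted here (46 s, presumably the baseline, and 43 s with 28 threads) work out as follows; this is plain arithmetic on the reported numbers, not an additional measurement:

```latex
\text{speedup} = \frac{46\,\mathrm{s}}{43\,\mathrm{s}} \approx 1.07,
\qquad
\text{wall time saved} = \frac{46\,\mathrm{s} - 43\,\mathrm{s}}{46\,\mathrm{s}} \approx 6.5\%
```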

Alex Crichton (Oct 03 2019 at 19:05, on Zulip):

but that's still nothing close to what I would expect

Alex Crichton (Oct 03 2019 at 19:06, on Zulip):

"File upload is not yet available for your browser." omg seriously

simulacrum (Oct 03 2019 at 19:06, on Zulip):

I guess we probably do worse due to contention or so

Alex Crichton (Oct 03 2019 at 19:07, on Zulip):

anyway these are just known bugs

Alex Crichton (Oct 03 2019 at 19:07, on Zulip):

we can't ship until we fix jobserver things

Alex Crichton (Oct 03 2019 at 19:07, on Zulip):

and we'll likely need to recollect data after jobserver things are sorted out

simulacrum (Oct 03 2019 at 19:09, on Zulip):

indeed, yes

simulacrum (Oct 03 2019 at 19:25, on Zulip):

@Alex Crichton is there more data you'd be interested in me gathering? we have this, -Zthreads=1 diff, and will soon have -Zthreads=8 vs. master for single-crate on perf.rlo

Alex Crichton (Oct 03 2019 at 19:26, on Zulip):

nah this is perfect imo

Alex Crichton (Oct 03 2019 at 19:26, on Zulip):

like this is clearly showing me that we can't "just turn on -Zthreads=1 on nightly" and we also can't "just ship what's there today"

Alex Crichton (Oct 03 2019 at 19:26, on Zulip):

so we've still got work to do before calling out to internals

Alex Crichton (Oct 03 2019 at 19:26, on Zulip):

and this also is a lot of data to go on for making quantitative evaluations of parallel rustc

simulacrum (Oct 03 2019 at 19:29, on Zulip):

okay sounds good

simulacrum (Oct 03 2019 at 19:29, on Zulip):

I'll pivot to audit work then

simulacrum (Oct 03 2019 at 19:29, on Zulip):

well, preparing for auditing

lqd (Oct 03 2019 at 19:37, on Zulip):

(unrelated to parallelism but deduplicating syn/quote could also be an interesting win for these cargo build times — I know it wouldn't be exactly easy because they're transitive dependencies but still worth mentioning)

simulacrum (Oct 03 2019 at 22:17, on Zulip):

for anyone following -- updated the site to use the trapezoidal approximation for CPU usage

simulacrum (Oct 03 2019 at 22:17, on Zulip):

basically didn't change anything though I think

Zoxc (Oct 05 2019 at 05:47, on Zulip):

It seems like small crates get hurt a lot here, presumably due to spawning threads. Have you tried builds with only 2 and 4 threads too?

And are the check/debug builds here incremental?

simulacrum (Oct 05 2019 at 15:48, on Zulip):

yes, though no significant difference was noted for an incremental vs. non-incremental check build

simulacrum (Oct 05 2019 at 15:48, on Zulip):

I/we decided that it's best to measure the reality, which is that check and debug builds are usually incremental

simulacrum (Oct 05 2019 at 15:49, on Zulip):

We did not test 2 or 4 threads -- but I do not expect much difference, to be honest. We already always spawn 1 thread for all builds, maybe even 2? -- not sure.

Zoxc (Oct 05 2019 at 22:18, on Zulip):

I guess crates.io dependencies won't get incremental anyway

simulacrum (Oct 05 2019 at 23:02, on Zulip):

indeed, though for e.g. servo the primary weight comes from in-tree deps

simulacrum (Oct 05 2019 at 23:03, on Zulip):

regardless I think @Alex Crichton's conclusion that the tradeoffs are not yet worth it is sound.

Zoxc (Oct 06 2019 at 08:02, on Zulip):

I'll take compiler check builds in half the time, thank you very much =P

simulacrum (Oct 06 2019 at 12:12, on Zulip):

eh, it's unlikely that it's representative -- that's an entirely clean build, whereas most of the time you'll see incremental play a good role in check builds

Zoxc (Oct 06 2019 at 13:06, on Zulip):

I do clean check builds all the time, but that's mostly because our build system is horrible =P

simulacrum (Oct 06 2019 at 13:20, on Zulip):

Hm it should definitely not be necessary anymore

Zoxc (Oct 06 2019 at 14:01, on Zulip):

Did you make it use a separate folder for check builds?

simulacrum (Oct 06 2019 at 14:19, on Zulip):

Hm, no, I don't think so

simulacrum (Oct 06 2019 at 14:19, on Zulip):

I guess I don't switch too often between check and build

simulacrum (Oct 06 2019 at 14:20, on Zulip):

we could definitely do a separate folder -- somewhat easier than before, in fact

simulacrum (Oct 06 2019 at 14:20, on Zulip):

But I would personally expect that a separate folder is no longer necessary, unless there are cargo bugs

simulacrum (Oct 06 2019 at 14:20, on Zulip):

since we don't ourselves delete folders anymore

nikomatsakis (Oct 07 2019 at 19:16, on Zulip):

Quoting Alex Crichton: "like this is clearly showing me that we can't 'just turn on -Zthreads=1 on nightly' and we also can't 'just ship what's there today'"

I'm a bit more optimistic -- I agree it prob doesn't make sense to turn it on yet, but it'd be good to discuss what threshold we are looking for. When we discussed this in the meeting, it seemed to me that in terms of perf work it'd be good to look more at what we can do to get better wins than seq overhead per se, since the max seq overhead is really just a few seconds of walltime

nikomatsakis (Oct 07 2019 at 20:50, on Zulip):

In particular, based on the results, it seems like it's probably worth it for me to enable parallel rustc in my personal use? :)

cuviper (Oct 07 2019 at 21:06, on Zulip):

AKA dogfood it

Last update: Nov 17 2019 at 07:20UTC