Stream: t-compiler/wg-parallel-rustc

Topic: overall strategy


nikomatsakis (Sep 23 2019 at 20:57, on Zulip):

So I think we should discuss our overall strategy here. Unfortunately, I'm pretty sure @Aaron Turon won't be putting more time into this effort after all. I think we have one fundamental choice, presuming of course we still want to ship (and I do!):

- we can try to complete the work of removing shared state before we try to ship
- or we can try to start getting people experimenting and find bugs the "ol' fashioned way".

Of course, the work of removing and refactoring shared state can continue in parallel (no pun intended).

We have previously planned to land the changes so that we build a parallel compiler by default and then give people a period to opt-in to testing, a period to opt-out, and eventually make this the default. This is kind of the "second plan". I kind of feel like we should begin the period of making this available for people to play with, and we can pursue documentation / refactorings simultaneously. If we find people are not reporting a lot of bugs, that will also give us a lot of increased confidence, obviously.

The main downside is that we will suffer a (one-time) slowdown for sequential work, since we'll be shipping a parallel-enabled build by default. But IIRC we had kind of determined that this cost was pretty minimal before.

Anyway, maybe the first question is -- what is the best way to discuss this and how do we make the overall decision? We already had a design meeting where the state was roughly similar, and it seemed like we approved the overall plan, though I feel like there was also a desire to document more of the design (which has not really happened). Maybe we should just run from that consensus?

I've been talking to @Santiago Pastorino about trying to do some of the "organizational" work involved here. I assume @Zoxc will also be around to fix bugs that come up, and I think that probably we'd be able to also start to ramp up a small set of folks around that (e.g., @simulacrum, @Paul Faria).

Anyway, I wanted to kick off these conversations. (Also, cc @pnkfelix)

simulacrum (Sep 23 2019 at 21:03, on Zulip):

Based on https://rust-lang.zulipchat.com/#narrow/stream/187679-t-compiler.2Fwg-parallel-rustc/topic/Gathering.20parallel.20data and my own recollections I don't think shipping a parallel compiler with -Zthreads=1 being the default is viable, that's a 10-20% perf hit on a single crate and 5% on a crate graph. We do recover that perf on crate graphs with ~2 threads (or more) but that still seems suboptimal. It's also unclear to me that we're ready from a pipelining etc perspective for rustc to consume more than 1 job token throughout the compilation -- I think that'd pretty much kill all of the benefits of pipelining due to starvation of tokens.

nikomatsakis (Sep 23 2019 at 21:04, on Zulip):

Hmm, maybe we want to focus on knocking down those limitations first, then

simulacrum (Sep 23 2019 at 21:05, on Zulip):

what is the best way to discuss this and how do we make the overall decision?

I think we can enable parallel compilers on the -alt builders today; this isn't really good from a "I want faster compiler" standpoint because those builders also ship with llvm asserts, but that does give us the advantage of having regular published builds

simulacrum (Sep 23 2019 at 21:07, on Zulip):

Personally I think we should aim to get -Zthreads=1 to be at parity (<5% loss on a single crate compilation) before we start shipping parallel compilers by default

Paul Faria (Sep 23 2019 at 21:07, on Zulip):

There are also definite race conditions if we ship as-is. I know the current cache predecessors can definitely deadlock, and that's not going to be fun for people to run into. I've been talking to @oli in #t-compiler/wg-mir-opt about options for this one

nikomatsakis (Sep 23 2019 at 21:08, on Zulip):

Personally I think we should aim to get -Zthreads=1 to be at parity (<5% loss on a single crate compilation) before we start shipping parallel compilers by default

We set some thresholds before

nikomatsakis (Sep 23 2019 at 21:08, on Zulip):

I'm trying to remember what they were

nikomatsakis (Sep 23 2019 at 21:08, on Zulip):

Anyway, I think that's a reasonable goal

nikomatsakis (Sep 23 2019 at 21:09, on Zulip):

I think we can enable parallel compilers on the -alt builders today; this isn't really good from a "I want faster compiler" standpoint because those builders also ship with llvm asserts, but that does give us the advantage of having regular published builds

This seems promising -- in particular, we could start asking people to help us test

nikomatsakis (Sep 23 2019 at 21:09, on Zulip):

There are also definite race conditions if we ship as-is. I know the current cache predecessors can definitely deadlock, and that's not going to be fun for people to run into. I've been talking to oli in #t-compiler/wg-mir-opt about options for this one

OK. I am not sure if it can really deadlock but I definitely agree that the design is not what we ultimately want.

nikomatsakis (Sep 23 2019 at 21:09, on Zulip):

I mean ultimately having some kind of sense of "where the shared state is and what the protocols are for each one" being documented feels very good.

nikomatsakis (Sep 23 2019 at 21:10, on Zulip):

So maybe there are like two or three prongs to think about

nikomatsakis (Sep 23 2019 at 21:11, on Zulip):

- what do we need to do to get feedback from people in practice? (alt builds is a key enabler here!)
- what do we need to ship -Zthreads=1 enabled? (let's say 5%, as @simulacrum wrote)
- what are the "big pieces" that should be documented in rustc-guide, and how can we get motion on that?
- how can we audit/document/refactor the shared state?

nikomatsakis (Sep 23 2019 at 21:12, on Zulip):

In particular, I didn't ask "when can we ship this", because that is perhaps informed by the results. i.e., if we start getting people using it and they are having problems, that might push us to work more thoroughly through all locks/shared-state. If things are working ok but people aren't seeing speedups, that might also help us direct work. Finally, if things are working ok and people are seeing speed-ups, maybe we move more to shipping faster and let the doc effort trail behind.

simulacrum (Sep 23 2019 at 21:12, on Zulip):

- what do we need to do to get feedback from people in practice? (alt builds is a key enabler here!)

I think this is blocked on "what feedback do we want?" -- I think at this point we don't want to say "is this faster for you?" since it likely isn't or isn't as good as it can be

nikomatsakis (Sep 23 2019 at 21:12, on Zulip):

You mean because alt builds won't give us that

nikomatsakis (Sep 23 2019 at 21:12, on Zulip):

Or just because the overall perf doesn't scale amazingly

simulacrum (Sep 23 2019 at 21:13, on Zulip):

that, and also, based on what I saw -- we're not seeing like anywhere near linear scaling

simulacrum (Sep 23 2019 at 21:13, on Zulip):

like, if you throw 8 cores at rustc you don't see more than like ~20% gain at most, usually less, IIRC
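For a rough sense of scale, the figure above can be run through Amdahl's law. This is only a back-of-the-envelope sketch: it assumes Amdahl's model applies to rustc at all, and it reads "~20% gain" as a 1.2x speedup, neither of which is stated in the thread. The function names are made up for illustration.

```rust
/// Amdahl's law: speedup on `n` cores when a fraction `p` of the work
/// parallelizes perfectly and the rest stays sequential.
fn amdahl_speedup(p: f64, n: f64) -> f64 {
    1.0 / ((1.0 - p) + p / n)
}

/// The same formula solved for `p`: the parallel fraction implied by an
/// observed speedup `s` on `n` cores.
fn implied_parallel_fraction(s: f64, n: f64) -> f64 {
    (1.0 - 1.0 / s) / (1.0 - 1.0 / n)
}
```

Under those assumptions, `implied_parallel_fraction(1.2, 8.0)` comes out at roughly 0.19, i.e. about a fifth of the work would be running in parallel.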

nikomatsakis (Sep 23 2019 at 21:14, on Zulip):

Yes. I think that's..maybe ok.

nikomatsakis (Sep 23 2019 at 21:14, on Zulip):

I mean not amazing :)

nikomatsakis (Sep 23 2019 at 21:14, on Zulip):

but like 2x is 2x

nikomatsakis (Sep 23 2019 at 21:14, on Zulip):

well, 20% is not so great :)

nikomatsakis (Sep 23 2019 at 21:14, on Zulip):

I seem to recall cargo check was much better

nikomatsakis (Sep 23 2019 at 21:14, on Zulip):

at least in some cases

nikomatsakis (Sep 23 2019 at 21:15, on Zulip):

but regardless it seems like alt builds are going to help us most with flushing out bugs, and less so with perf wins

simulacrum (Sep 23 2019 at 21:15, on Zulip):

hm, perhaps. Anyway -- I think I can get a PR rolling for parallel-enabled alt builds

simulacrum (Sep 23 2019 at 21:15, on Zulip):

That'd at least get us to a point where you can reasonably simply bootstrap the rust compiler, for example, off of parallel rustc

nikomatsakis (Sep 23 2019 at 21:15, on Zulip):

I guess a downside of this is that if people try it out it might start getting a reputation as being slow, but we can probably message this appropriately (and clearly we wouldn't trumpet alt builds over the main rust blog or anything)

simulacrum (Sep 23 2019 at 21:16, on Zulip):

(It could be true that we say "hey, the LLVM asserts on these alt builders aren't buying us anything anyway, let's disable them and enable just parallel")

nikomatsakis (Sep 23 2019 at 21:17, on Zulip):

I have no idea how many people use that mechanism

nikomatsakis (Sep 23 2019 at 21:17, on Zulip):

I certainly never do but...

simulacrum (Sep 23 2019 at 21:17, on Zulip):

It.. might be feasible to find out based on e.g. cloudfront stats, not sure

nikomatsakis (Sep 23 2019 at 21:18, on Zulip):

folks from @WG-llvm might have some idea -- hey, do y'all use alt builds to get llvm assertions enabled?

nikomatsakis (Sep 23 2019 at 21:18, on Zulip):

I sort of forget who requested them in the first place

nikomatsakis (Sep 23 2019 at 21:18, on Zulip):

I think it was us, the compiler team

nagisa (Sep 23 2019 at 21:18, on Zulip):

I build my own build.

nikomatsakis (Sep 23 2019 at 21:21, on Zulip):

Ok.

Paul Faria (Sep 23 2019 at 21:22, on Zulip):

OK. I am not sure if it can really deadlock but I definitely agree that the design is not what we ultimately want.

After double checking, you're right. There's a case where the data gets computed twice, but no deadlock. My mistake. I was thinking back to a similar situation I ran into in C++
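The "computed twice, but no deadlock" situation can be sketched roughly as follows. This is not rustc's actual cache code; `Cache`, `get_or_compute`, and `race_two_threads` are hypothetical names, and the `Barrier` exists only to force both threads into the computation at the same time so the effect is reproducible.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Hypothetical single-slot cache (an assumption for illustration).
struct Cache {
    slot: Mutex<Option<u64>>,
    computations: AtomicUsize,
}

impl Cache {
    fn get_or_compute(&self, compute: impl FnOnce() -> u64) -> u64 {
        // The lock is held only for the lookup; the guard is dropped
        // at the end of this statement.
        let cached = *self.slot.lock().unwrap();
        if let Some(v) = cached {
            return v;
        }
        // No lock is held while computing, so this cannot deadlock,
        // but two threads can both miss and both do the work.
        let v = compute();
        self.computations.fetch_add(1, Ordering::SeqCst);
        *self.slot.lock().unwrap() = Some(v);
        v
    }
}

/// Runs two threads that both miss the cache before either one stores.
fn race_two_threads() -> (u64, u64, usize) {
    let cache = Arc::new(Cache {
        slot: Mutex::new(None),
        computations: AtomicUsize::new(0),
    });
    let barrier = Arc::new(Barrier::new(2));
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let cache = Arc::clone(&cache);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                cache.get_or_compute(|| {
                    // Wait until the other thread is also computing.
                    barrier.wait();
                    42
                })
            })
        })
        .collect();
    let results: Vec<u64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    (results[0], results[1], cache.computations.load(Ordering::SeqCst))
}
```

Both threads get the same answer and the program terminates, but the computation counter ends at 2: wasted work rather than a hang.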

nikomatsakis (Sep 23 2019 at 21:24, on Zulip):

Still, I think the other concern that @Aaron Turon raised is pretty possible. In particular, it's not hard to imagine that some of the existing structs have two fields under separate locks that need to be mutated atomically for correctness; under a ref-cell, you'd never be able to observe the 'in between' state, but this is not the case for locks.
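A minimal sketch of that concern, using a made-up `SplitLocks` struct (not a real rustc type) whose two fields are meant to change together. The `between` callback is an artificial hook standing in for another thread being scheduled after the first lock is released and before the second is taken.

```rust
use std::sync::Mutex;

/// Hypothetical struct. Intended invariant: `total == items.len()`.
struct SplitLocks {
    items: Mutex<Vec<u32>>,
    total: Mutex<usize>,
}

impl SplitLocks {
    fn new() -> Self {
        SplitLocks {
            items: Mutex::new(Vec::new()),
            total: Mutex::new(0),
        }
    }

    /// Updates both fields, each under its own lock. `between` runs in
    /// the gap where neither lock is held.
    fn push_with_gap(&self, value: u32, between: impl FnOnce(&Self)) {
        self.items.lock().unwrap().push(value); // first lock released here
        between(self);
        *self.total.lock().unwrap() += 1;
    }

    fn is_consistent(&self) -> bool {
        let len = self.items.lock().unwrap().len();
        let total = *self.total.lock().unwrap();
        len == total
    }
}

/// Returns (consistency seen inside the gap, consistency afterwards).
fn observe_gap() -> (bool, bool) {
    let s = SplitLocks::new();
    let mut seen_inside = true;
    s.push_with_gap(7, |s| seen_inside = s.is_consistent());
    (seen_inside, s.is_consistent())
}
```

Here the observer in the gap sees `items` updated and `total` not yet updated. With both fields in a single `RefCell`, any attempt to borrow during the mutable borrow would panic rather than see that state, which is the difference being pointed out.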

simulacrum (Sep 23 2019 at 21:24, on Zulip):

Concretely it sounds like we want:

nikomatsakis (Sep 23 2019 at 21:25, on Zulip):

Yeah, and maybe try to avoid general purpose locks and replace them with safer patterns, etc

simulacrum (Sep 23 2019 at 21:26, on Zulip):

Indeed, yes -- I'm thinking that eventually we probably want to be at a place where Lock (or Mutex, etc) aren't really seen inside the compiler proper, since for the most part they aren't needed

nikomatsakis (Sep 23 2019 at 21:26, on Zulip):

ps @Santiago Pastorino asked me what alt builds are under privmsg, I thought I'd answer here as probably lots of folks don't know :)

nikomatsakis (Sep 23 2019 at 21:26, on Zulip):

basically we publish an "alternative" version of our builds (all of them? I forget) that is built with different compiler flags than the standard distribution

Santiago Pastorino (Sep 23 2019 at 21:26, on Zulip):

didn't want to distract you from the wonderful discussion you're having :)

nikomatsakis (Sep 23 2019 at 21:26, on Zulip):

right now that is used to distribute a build with llvm assertions enabled :)

nikomatsakis (Sep 23 2019 at 21:27, on Zulip):

but we are discussing "repurposing" this channel

nikomatsakis (Sep 23 2019 at 21:27, on Zulip):

you can install them through rustup

Santiago Pastorino (Sep 23 2019 at 21:27, on Zulip):

I see :+1:

nikomatsakis (Sep 23 2019 at 21:28, on Zulip):

Concretely it sounds like we want:

ok it feels like we're making some sort of progress here

nikomatsakis (Sep 23 2019 at 21:28, on Zulip):

in particular, it feels like "it's premature to talk of shipping" but we should start by setting a variety of smaller goals

nikomatsakis (Sep 23 2019 at 21:28, on Zulip):

and a key first milestone is to get the seq overhead (-Zthreads=1) down to a level where we are comfortable flipping the switch

nikomatsakis (Sep 23 2019 at 21:29, on Zulip):

enabling alt builds is a way for us to also be advancing our confidence in correctness and gathering data in the meantime

nikomatsakis (Sep 23 2019 at 21:29, on Zulip):

and a key first milestone is to get the seq overhead (-Zthreads=1) down to a level where we are comfortable flipping the switch

for some reason I thought we were there already

simulacrum (Sep 23 2019 at 21:29, on Zulip):

hm, well, maybe we are -- but perhaps not yet there in terms of impl confidence

nikomatsakis (Sep 23 2019 at 21:30, on Zulip):

we should try to be sure I guess

simulacrum (Sep 23 2019 at 21:30, on Zulip):

in particular I think deadlocking is fine at this point (alpha quality) but e.g. corrupting because we didn't lock for some intermediate bit is _not_

nikomatsakis (Sep 23 2019 at 21:30, on Zulip):

I kind of think neither is ok :)

nikomatsakis (Sep 23 2019 at 21:30, on Zulip):

I guess a good question is "what would help us increase our confidence" the most, though

simulacrum (Sep 23 2019 at 21:31, on Zulip):

I would personally want to see less sort of arbitrary field-level locks -- this might mean that we just have a doc on each, though, I guess

nikomatsakis (Sep 23 2019 at 21:31, on Zulip):

yes, I would be happy if we either (a) moved most of those locks to the outermost level or (b) documented the invariants cleanly

nikomatsakis (Sep 23 2019 at 21:31, on Zulip):

typically the latter would be done by (in my view) extracting a struct with the operations

nikomatsakis (Sep 23 2019 at 21:32, on Zulip):

well, that may be overkill. depends.

nikomatsakis (Sep 23 2019 at 21:34, on Zulip):

(but comments at least :)
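One possible shape for "extracting a struct with the operations", sketched with a hypothetical `Registry` type (nothing here is taken from rustc). The lock sits at the outermost level, the invariant is documented once, and callers can only go through methods that hold the lock for the whole update.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Data guarded by a single lock.
/// Invariant: `total` always equals `items.len()`.
struct Inner {
    items: Vec<u32>,
    total: usize,
}

/// Hypothetical wrapper: the `Mutex` is private, so the invariant can
/// only be touched through the operations below.
pub struct Registry {
    inner: Mutex<Inner>,
}

impl Registry {
    pub fn new() -> Self {
        Registry {
            inner: Mutex::new(Inner { items: Vec::new(), total: 0 }),
        }
    }

    /// Both fields change under one guard, so no other thread can
    /// observe the state in between.
    pub fn push(&self, value: u32) {
        let mut guard = self.inner.lock().unwrap();
        guard.items.push(value);
        guard.total += 1;
    }

    /// Reads both fields under the same guard.
    pub fn snapshot(&self) -> (usize, usize) {
        let guard = self.inner.lock().unwrap();
        (guard.items.len(), guard.total)
    }
}

/// Pushes from several threads and returns the final snapshot.
fn push_from_threads(threads: u32, per_thread: u32) -> (usize, usize) {
    let registry = Arc::new(Registry::new());
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let registry = Arc::clone(&registry);
            thread::spawn(move || {
                for i in 0..per_thread {
                    registry.push(t * per_thread + i);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    registry.snapshot()
}
```

Whether this is worth doing for a given lock is the "may be overkill" judgment call above; for a lock guarding a single independent field, a comment stating what it protects may be enough.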

simulacrum (Sep 23 2019 at 21:40, on Zulip):

I think it would be feasible to do that refactoring / extraction within the next month or so, depending on review queue

nikomatsakis (Sep 23 2019 at 21:41, on Zulip):

yep

nikomatsakis (Sep 23 2019 at 21:42, on Zulip):

I count 175 uses of \bLock\b in src/lib* :)

nikomatsakis (Sep 23 2019 at 21:42, on Zulip):

not that many

Nikita Popov (Sep 23 2019 at 21:43, on Zulip):

folks from @WG-llvm might have some idea -- hey, do y'all use alt builds to get llvm assertions enabled?

Yeah, it's pretty convenient to be able to get an assertion-enabled build via rustup-toolchain-install-master. TBH I thought that having LLVM assertions was the entire purpose of the alt builds.

nikomatsakis (Sep 23 2019 at 21:44, on Zulip):

It was...

Pietro Albini (Sep 24 2019 at 06:43, on Zulip):

I think we can enable parallel compilers on the -alt builders today; this isn't really good from a "I want faster compiler" standpoint because those builders also ship with llvm asserts, but that does give us the advantage of having regular published builds

Pietro Albini (Sep 24 2019 at 06:44, on Zulip):

please ping me if you do that as crater is using alt builds today

Pietro Albini (Sep 24 2019 at 06:44, on Zulip):

I just don't want our run times to get even slower

simulacrum (Sep 24 2019 at 11:05, on Zulip):

@Pietro Albini Oh, for all runs?

Pietro Albini (Sep 24 2019 at 11:06, on Zulip):

iirc we're only using alt builds

simulacrum (Sep 24 2019 at 11:06, on Zulip):

/me goes off to add item to infra agenda to not do that

Last update: Nov 17 2019 at 07:20UTC