Stream: t-compiler/wg-parallel-rustc

Topic: #68171


simulacrum (Feb 03 2020 at 23:40, on Zulip):

I think some other folks should probably weigh in (this is the "parallel always runs all pieces"). We appear to be at an impasse with Zoxc and I'm not sure how to move forward. A design meeting has been suggested; we can pursue that I guess.

Zoxc (Feb 07 2020 at 01:53, on Zulip):

We do want to remove all FatalErrors in case none of this will matter, so I don't really feel like it warrants a design meeting.

Zoxc (Feb 07 2020 at 02:05, on Zulip):

I know that you don't think we need determinism here, but my counter argument is that executing all the iterations is useful for our test suite (so it matches the non-parallel compiler) and it also doesn't really cost us anything.

The case you seem to be concerned about where a fatal error will poison the global state can happen with any parallelism. Say for example:

join(|| {
    fatal_error_which_poisons_global_state();
}, || {
    function_which_ICEs_on_poisoned_global_state()
});

So to have any parallelism our fatal errors must not poison global state or cause any ICEs. Once we remove all fatal errors this won't be a concern though.

simulacrum (Feb 07 2020 at 02:06, on Zulip):

I'm mostly concerned about it being more complicated (at least, we need to think about it a little when adding new parallel abstractions).

simulacrum (Feb 07 2020 at 02:07, on Zulip):

Maybe that's a false concern, but I also don't really see that it buys us that much -- for the test suite, -Zthreads=1 seems fine for where it matters. And, as you say, eventually we'll not have fatal errors in which case it won't matter even more so long as we sort or so.

simulacrum (Feb 07 2020 at 02:08, on Zulip):

Do you agree that we don't need determinism? i.e., that error output need not be deterministic?

simulacrum (Feb 07 2020 at 02:08, on Zulip):

I think that's the first point of disagreement

Zoxc (Feb 07 2020 at 02:18, on Zulip):

-Zthreads=1 just makes the error output deterministic, it doesn't make it match the non-parallel compiler. I do not want to have separate test outputs for the parallel compiler or sort test output (and basically touch all test files).

Zoxc (Feb 07 2020 at 02:20, on Zulip):

I think our error output should be as deterministic as possible, especially if it doesn't come at a performance cost or complexity cost. In this case the performance cost is not relevant and the complexity is contained to sync.rs abstractions.

simulacrum (Feb 07 2020 at 02:27, on Zulip):

I think the complexity cost in mental overhead is higher than that. At least for me having to think "this iterator will continue to run" even on panics is not something that most Rust code is used to

simulacrum (Feb 07 2020 at 02:28, on Zulip):

No matter what we'd not be touching all files. x.py test --stage 1 src/test/ui --bless passes almost entirely for me on a parallel compiler with almost no changes

simulacrum (Feb 07 2020 at 02:28, on Zulip):

maybe 4 files or so

Zoxc (Feb 07 2020 at 02:30, on Zulip):

Try removing all catch_panic calls from sync.rs, since you want to undo that design decision.

Zoxc (Feb 07 2020 at 02:30, on Zulip):

I also think that we need catch_panic in this case anyway, since we don't want changes to Rayon iterators to affect rustc's test output.

simulacrum (Feb 07 2020 at 02:39, on Zulip):

It was my impression that this is not a design decision in the sense that we've not really discussed it, more so that it is "just a choice" that was made at some point.

simulacrum (Feb 07 2020 at 02:40, on Zulip):

I also don't think isolating ourselves from hypothetical rayon changes is necessary

simulacrum (Feb 07 2020 at 02:40, on Zulip):

and catch panic wouldn't help there anyway, since (non fatal) errors are going to get emitted in different orders regardless?

simulacrum (Feb 07 2020 at 02:40, on Zulip):

even across different runs

simulacrum (Feb 07 2020 at 02:40, on Zulip):

well, at least, presuming rayon is non-deterministic, which it must be

Zoxc (Feb 07 2020 at 02:44, on Zulip):

I'm pretty sure it was discussed, but it was probably years ago =P

Zoxc (Feb 07 2020 at 02:45, on Zulip):

Rayon is deterministic with 1 thread (which we use for tests)

simulacrum (Feb 07 2020 at 02:49, on Zulip):

okay, so:

simulacrum (Feb 07 2020 at 02:49, on Zulip):

does that represent an accurate summary from your perspective?

Zoxc (Feb 07 2020 at 02:56, on Zulip):

The tests case isn't accurate. For tests 2 decisions was made:
1) Use -Zthread=1 to make the output of the parallel compiler deterministic. This is an alternative to sorting the error message order for tests (which I think is bad since error order matter, in particular the first error should always be a usable one).

2) Use catch_panicto ensure all the parallel parts execute in both the parallel compiler and the non-parallel compiler. This ensures they both output the same set of error messages (ignoring the order). This is an alternative to manually marking which errors appear in the parallel and the non-parallel compiler in test files. Note that sorting doesn't help here.

simulacrum (Feb 07 2020 at 02:58, on Zulip):

Hm, so is it accurate that we could only use catch_panic if -Zthreads=1, to satisfy your concern over test determinism?

(I'm not saying that's a good idea -- it would increase complexity -- but just to make sure I understand)

simulacrum (Feb 07 2020 at 03:01, on Zulip):

(I'm off for the night, but I think we should keep discussing here, this has felt more productive than the PR)

Zoxc (Feb 07 2020 at 03:04, on Zulip):

Yeah, you need both 1) -Zthreads=1 and 2) catch_panic to make the test output for the non-parallel and parallel compiler match.

simulacrum (Feb 07 2020 at 12:56, on Zulip):

is the reason that we need catch panic, over just -Zthreads=1, that rayon (even in "deterministic mode" with one thread) does not execute an iterator/join sequentially, exactly like non-parallel compilers would?

Zoxc (Feb 07 2020 at 13:12, on Zulip):

I think it executes them sequentially, but on a panic it will abort some iterations, but not others (it uses catch_panic internally), so it doesn't match the non-parallel compiler which in master aborts all later iterations

simulacrum (Feb 07 2020 at 13:14, on Zulip):

I think it would be helpful, for me, if we had a document or so that said "here is how rayon -Zthreads=1 differs from regular iterators" for each API we use in rayon

simulacrum (Feb 07 2020 at 13:14, on Zulip):

because to me it sounds like there is some difference, but I don't have a good handle on it

simulacrum (Feb 07 2020 at 13:14, on Zulip):

and perhaps that difference is one that we could resolve upstream (or say that it is acceptable)

Zoxc (Feb 07 2020 at 13:16, on Zulip):

We can just use catch_panic to get the behavior we want from Rayon iterators. We don't need to change the upstream, especially since we won't keep the catch_panic long-term.

simulacrum (Feb 07 2020 at 13:26, on Zulip):

That's possibly true, yes. But I would be hesitant to sign off on that as a reviewer without understanding the behavior we're patching over

simulacrum (Feb 07 2020 at 13:27, on Zulip):

And I still feel like if the order is not the same as in the non-rayon case, then maybe we're not actually getting determinism due to the global state issues I mentioned before. If the order is the same, then I'm confused why we need to catch panics ourselves.

Zoxc (Feb 07 2020 at 13:46, on Zulip):

The order is not the problem, it is which iterations which will execute once one (or more) iteration panics that differs.

Zoxc (Feb 07 2020 at 13:48, on Zulip):

My PR uses catch_panic so Rayon iterator will see no panics and thus always execute all iterations. So we don't have to rely on how Rayon deals with panics there.

simulacrum (Feb 07 2020 at 14:59, on Zulip):

I realize that. I'm asking - what is the default behavior?

Zoxc (Feb 07 2020 at 15:04, on Zulip):

I'm guessing it groups up chunks to run in a single thread and then runs these in parallel, but the iterator code isn't very readable so I don't know.

simulacrum (Feb 07 2020 at 15:23, on Zulip):

I think if we can't get the order to be the same then it doesn't matter that much as to whether we're catching panics or not.

Zoxc (Feb 07 2020 at 15:35, on Zulip):

Again, the order is not the problem, it seems to match.

simulacrum (Feb 07 2020 at 15:44, on Zulip):

But rayon catches panics already at some non-single-item level?

simulacrum (Feb 07 2020 at 15:45, on Zulip):

(Why?)

Zoxc (Feb 07 2020 at 16:03, on Zulip):

It uses join I think which require catch_panic for soundness

simulacrum (Feb 07 2020 at 16:06, on Zulip):

hm, I guess that makes sense.

simulacrum (Feb 07 2020 at 16:06, on Zulip):

Do we agree that there's no point in us catching panics inside rustc when there are multiple threads?

Zoxc (Feb 07 2020 at 16:12, on Zulip):

Doing that will give us a deterministic set of errors, but the ordering of them will still be non-deterministic

simulacrum (Feb 07 2020 at 16:13, on Zulip):

hm, I don't follow

simulacrum (Feb 07 2020 at 16:14, on Zulip):

it is definitely true that if we don't catch panics in the many thread case that we may not execute the same work

Zoxc (Feb 07 2020 at 16:14, on Zulip):

Assuming Rayon iterators picks their chunks dynamically (which would make sense), but I don't know if they actually do that

simulacrum (Feb 07 2020 at 16:16, on Zulip):

The goals you are setting are:

simulacrum (Feb 07 2020 at 16:16, on Zulip):

Is that accurate?

Zoxc (Feb 07 2020 at 16:18, on Zulip):

Multiple threads: Should emit the same set of error messages as the non-parallel compiler, but possibly in a different order

simulacrum (Feb 07 2020 at 16:20, on Zulip):

(and the first bullet is as you understand?)

Zoxc (Feb 07 2020 at 16:21, on Zulip):

Yeah

simulacrum (Feb 07 2020 at 16:22, on Zulip):

Okay, so I'm not sure that I entirely agree with these constraints. But I think I now understand them, at least.

simulacrum (Feb 07 2020 at 16:23, on Zulip):

Goals:

simulacrum (Feb 07 2020 at 16:28, on Zulip):

@Zoxc So I think I would prefer that we not roll this code into sync.rs (at least for the calls to rayon), and instead have it live inside upstream rayon. It seems similar conceptually to panic_fuse, just doing the reverse. What do you think about upstreaming it instead of having it in sync.rs?

Zoxc (Feb 07 2020 at 16:35, on Zulip):

It might work as a combinator (which ensure all iterators run in the presence of panics). That way users can select to pay the cost for it.

cc @cuviper @nikomatsakis

Zoxc (Feb 07 2020 at 16:41, on Zulip):

Or maybe it won't help much, since we still need to deal with the non-parallel cases

simulacrum (Feb 07 2020 at 16:42, on Zulip):

Well, we're going to jettison the non-parallel case eventually

Zoxc (Feb 07 2020 at 16:48, on Zulip):

And FatalErrors, in which case we wouldn't need said combinator

cuviper (Feb 07 2020 at 16:49, on Zulip):

A delayed-panic combinator does sound interesting, even if rustc ends up not needing it

Last update: Feb 25 2020 at 04:25UTC