Stream: t-compiler/wg-parallel-rustc

Topic: mir stealing


nikomatsakis (Sep 06 2019 at 18:17, on Zulip):

So this is the set of queries for MIR right now

        /// Fetch the MIR for a given `DefId` right after it's built - this includes
        /// unreachable code.
        query mir_built(_: DefId) -> &'tcx Steal<mir::Body<'tcx>> {}

        /// Fetch the MIR for a given `DefId` up till the point where it is
        /// ready for const evaluation.
        ///
        /// See the README for the `mir` module for details.
        query mir_const(_: DefId) -> &'tcx Steal<mir::Body<'tcx>> {
            no_hash
        }

        query mir_validated(_: DefId) -> &'tcx Steal<mir::Body<'tcx>> {
            no_hash
        }

        /// MIR after our optimization passes have run. This is MIR that is ready
        /// for codegen. This is also the only query that can fetch non-local MIR, at present.
        query optimized_mir(key: DefId) -> &'tcx mir::Body<'tcx> {
            cache_on_disk_if { key.is_local() }
            load_cached(tcx, id) {
                let mir: Option<crate::mir::Body<'tcx>> = tcx.queries.on_disk_cache
                                                            .try_load_query_result(tcx, id);
                mir.map(|x| &*tcx.arena.alloc(x))
            }
        }

        query promoted_mir(key: DefId) -> &'tcx IndexVec<mir::Promoted, mir::Body<'tcx>> { }
nikomatsakis (Sep 06 2019 at 18:18, on Zulip):

from librustc/query/mod.rs -- well, a mildly outdated copy, apparently

nikomatsakis (Sep 06 2019 at 18:18, on Zulip):

also this file has changed a lot since I last looked at it :)

simulacrum (Sep 06 2019 at 18:19, on Zulip):

Is there a reason for having the separate queries? If we don't expect them to run in parallel with each other on a given DefId might it make sense to move all this into a single query that dispatches to calling mir_const etc in a non-query fashion, with &mut Mir?

nikomatsakis (Sep 06 2019 at 18:19, on Zulip):

I was just thinking about it

nikomatsakis (Sep 06 2019 at 18:19, on Zulip):

there is some reason

nikomatsakis (Sep 06 2019 at 18:19, on Zulip):

e.g., in an IDE, you want be able to do the borrow check

nikomatsakis (Sep 06 2019 at 18:20, on Zulip):

(to produce errors)

nikomatsakis (Sep 06 2019 at 18:20, on Zulip):

without necessarily doing all the optimizations

Aaron Turon (Sep 06 2019 at 18:20, on Zulip):

@Paul Faria note: looks like there's a "mir stealing" topic now

nikomatsakis (Sep 06 2019 at 18:20, on Zulip):

the current setup allows for that fairly gracefully

nikomatsakis (Sep 06 2019 at 18:20, on Zulip):

that said, I'm not sure how imp't that is, especially since we don't do any real optimization right now

simulacrum (Sep 06 2019 at 18:20, on Zulip):

sounds reasonable to me -- I'd be willing to spend some time trying to refactor the handler in librustc_errors simultaneously with an audit of sorts to try and come up with some docs on how to do it and such

nikomatsakis (Sep 06 2019 at 18:20, on Zulip):

iow if just had one optimized_mir query that did all the things, it would sometimes do more work than is needed

simulacrum (Sep 06 2019 at 18:20, on Zulip):

presuming that's useful?

Aaron Turon (Sep 06 2019 at 18:21, on Zulip):

@Paul Faria i think you can move your messages if you want

nikomatsakis (Sep 06 2019 at 18:21, on Zulip):

probably also for cargo check

simulacrum (Sep 06 2019 at 18:22, on Zulip):

so we also have an abstraction -- for the non-tcx queries -- that essentially wraps a generator such that you can "access" a piece of state, and then move on to the next piece of the function (optionally)

Paul Faria (Sep 06 2019 at 18:23, on Zulip):

I might have messed up some messages in this topic. I assumed (incorrectly) that moving "later replies" would just be my own, and not everyone else's as well.

nikomatsakis (Sep 06 2019 at 18:23, on Zulip):

heh, I was wondering what happened there

simulacrum (Sep 06 2019 at 18:24, on Zulip):

https://github.com/rust-lang/rust/blob/master/src/librustc_data_structures/box_region.rs is the data structure

nikomatsakis (Sep 06 2019 at 18:24, on Zulip):

in any case, @Paul Faria, I think that would be too hard to do in practice, we did think about things like this; you could also (for example) build MIR out of persistent data structures, but that'd be a big change

simulacrum (Sep 06 2019 at 18:24, on Zulip):

but essentially it's quite similar to async/await except without the "not ready" (i.e., it's synchronous)

simulacrum (Sep 06 2019 at 18:25, on Zulip):

so that could allow some form of "linearizability" where you can ask for things but we guarantee in one place that you're always >= the previous so to speak

nikomatsakis (Sep 06 2019 at 18:25, on Zulip):

@simulacrum interesting. this gets into the direction of the more ambitious proposals where we extend the query system somewhat

Paul Faria (Sep 06 2019 at 18:26, on Zulip):

@nikomatsakis no worries, I might stick with the predecessors issue then if no one else is hoping to pick it up

nikomatsakis (Sep 06 2019 at 18:26, on Zulip):

you could certainly imagine having some process that encapaulates some state (including the mir)

nikomatsakis (Sep 06 2019 at 18:26, on Zulip):

and can be pushed along to various points

nikomatsakis (Sep 06 2019 at 18:26, on Zulip):

and caches the results of those points

nikomatsakis (Sep 06 2019 at 18:26, on Zulip):

hmm so e.g. you could have like

nikomatsakis (Sep 06 2019 at 18:26, on Zulip):
fn mir(def_id) -> MirBox
Wesley Wiser (Sep 06 2019 at 18:27, on Zulip):

iow if just had one optimized_mir query that did all the things, it would sometimes do more work than is needed

It was also necessary add a new query so that miri (the const eval engine not the tool) could access promoted MIR while we're processing the body those promotions came from during optimize_mir. This is part of the ongoing work to deduplicate code between miri and rustc_mir::transform::ConstProp.

nikomatsakis (Sep 06 2019 at 18:27, on Zulip):

and then

struct MirBox {
    fn borrow_check(&self) -> BorrowCheckResult;
    fn optimized_mir(&self) -> &Mir;
}

and whatever the major outputs are

nikomatsakis (Sep 06 2019 at 18:27, on Zulip):

(the idea being that MirBox kind of encapsulates and hides this state that moves monotonically along, but only as far as it needs to)

nikomatsakis (Sep 06 2019 at 18:28, on Zulip):

it'd be sort of a mini query system I guess ;)

simulacrum (Sep 06 2019 at 18:28, on Zulip):

Yeah, that was what I was thinking

nikomatsakis (Sep 06 2019 at 18:28, on Zulip):

It was also necessary add a new query so that miri (the const eval engine not the tool) could access promoted MIR ...

yeah, we'd have to accommodate this

simulacrum (Sep 06 2019 at 18:28, on Zulip):

In some sense though this doesn't quite work out

nikomatsakis (Sep 06 2019 at 18:28, on Zulip):

what sense do you mean?

nikomatsakis (Sep 06 2019 at 18:29, on Zulip):

well

simulacrum (Sep 06 2019 at 18:29, on Zulip):

well, you always have the potential for panic or some similar functionality -- if you make the "subqueries" public then it's hard to audit precisely I guess

nikomatsakis (Sep 06 2019 at 18:29, on Zulip):

the thing that I exactly proposed is "not great" because it's still shared mutable state and is going to play poorly with incremental etc

simulacrum (Sep 06 2019 at 18:29, on Zulip):

in that anyone can come along and call a prior state

nikomatsakis (Sep 06 2019 at 18:29, on Zulip):

oh, that's not an issue the way I specified it

nikomatsakis (Sep 06 2019 at 18:30, on Zulip):

the idea would be that you will cache the results of (say) borrow-check

nikomatsakis (Sep 06 2019 at 18:30, on Zulip):

and any outputs "publicly vislble"

simulacrum (Sep 06 2019 at 18:30, on Zulip):

right, I'm talking about the inputs -- borrow check wants unoptimized MIR

nikomatsakis (Sep 06 2019 at 18:30, on Zulip):

that's not a problem

nikomatsakis (Sep 06 2019 at 18:30, on Zulip):

unless I'm misunderstanding you

simulacrum (Sep 06 2019 at 18:30, on Zulip):

I guess you're saying that we'd always run it if someone asked for optimized MIR?

nikomatsakis (Sep 06 2019 at 18:30, on Zulip):

yes, same as today

nikomatsakis (Sep 06 2019 at 18:30, on Zulip):

basically internally there is a series of steps

nikomatsakis (Sep 06 2019 at 18:31, on Zulip):

that are ALWAYS run in a fixed order

nikomatsakis (Sep 06 2019 at 18:31, on Zulip):

and never go backwards

nikomatsakis (Sep 06 2019 at 18:31, on Zulip):

but we only run through as many as we have to

nikomatsakis (Sep 06 2019 at 18:31, on Zulip):

to satisfy the things requested thus far

simulacrum (Sep 06 2019 at 18:31, on Zulip):

Ah, okay -- so I misunderstood then -- I guess my thinking was that we do want the ability to say "get me optimized MIR, don't run borrowck"

nikomatsakis (Sep 06 2019 at 18:31, on Zulip):

if you ask for optimized-mir first, it'll do them all (and produce all the outputs alon the way)

nikomatsakis (Sep 06 2019 at 18:31, on Zulip):

ah, no

nikomatsakis (Sep 06 2019 at 18:31, on Zulip):

I think I might like to think of it as a "query group"

simulacrum (Sep 06 2019 at 18:31, on Zulip):

but if that's not a desirable outcome, then a MirBox of sorts seems like a very reasonable thing to have

nikomatsakis (Sep 06 2019 at 18:32, on Zulip):

i.e., you define a linear set of queries that will have some "shared, hidden state"

simulacrum (Sep 06 2019 at 18:32, on Zulip):

Right, but that state is essentially a bunch of Onces and a single Lock<Mir> that is acquired for the duration of the query

nikomatsakis (Sep 06 2019 at 18:32, on Zulip):

and in their provider functions, they get ownership of this state and they return a tuple (their result for the outside world, and the state for next step)

simulacrum (Sep 06 2019 at 18:32, on Zulip):

(or equivalent, I guess)

nikomatsakis (Sep 06 2019 at 18:33, on Zulip):

yeah, somehow behind the scenes the query engine can track how far along you are

nikomatsakis (Sep 06 2019 at 18:33, on Zulip):

the advantage of this would be that it is well exposed to incremental

nikomatsakis (Sep 06 2019 at 18:33, on Zulip):

so e.g. we can see that at some point we can skip over the remaining steps

simulacrum (Sep 06 2019 at 18:33, on Zulip):

hm, so I feel like integrating with incremental might be somewhat of a non-goal?

nikomatsakis (Sep 06 2019 at 18:33, on Zulip):

why?

simulacrum (Sep 06 2019 at 18:33, on Zulip):

like, you only ever do all/nothing it feels like for some given compilation

nikomatsakis (Sep 06 2019 at 18:34, on Zulip):

not at all

simulacrum (Sep 06 2019 at 18:34, on Zulip):

though I guess that might not be true for e.g. IDEs and such

nikomatsakis (Sep 06 2019 at 18:34, on Zulip):

no, it's not true in rustc

nikomatsakis (Sep 06 2019 at 18:34, on Zulip):

like, if I only change the body of one function

nikomatsakis (Sep 06 2019 at 18:34, on Zulip):

why should I have to reoptimize all the rest?

simulacrum (Sep 06 2019 at 18:34, on Zulip):

ah, so I meant all/nothing (for a given DefId)

simulacrum (Sep 06 2019 at 18:34, on Zulip):

so sort of less granular than today, presumably

nikomatsakis (Sep 06 2019 at 18:34, on Zulip):

ah. yes, it might be sufficient to track the "query series" as a unit, but that's not obvious to me

nikomatsakis (Sep 06 2019 at 18:35, on Zulip):

it's probably true

nikomatsakis (Sep 06 2019 at 18:35, on Zulip):

I mean there's no theoretical reason for it, just probably true enough in practice

simulacrum (Sep 06 2019 at 18:35, on Zulip):

I guess internally it'd potentially serialize and deserialize to w/e state it was in, too, so maybe it just all sort of works out anyway to fine-grained tracking

nikomatsakis (Sep 06 2019 at 18:35, on Zulip):

(you could e.g. have made some change that desugars into the same MIR, or which winds up with equivalent MIR after some optimization is done)

nikomatsakis (Sep 06 2019 at 18:36, on Zulip):

I guess I think that if it makes life easier, then foregoing incremental/parallelization between members of a "query series" is fine

nikomatsakis (Sep 06 2019 at 18:36, on Zulip):

the main thing is that I don't think this should live "on the side"

nikomatsakis (Sep 06 2019 at 18:36, on Zulip):

with some shared mutable state not exposed to the query system

nikomatsakis (Sep 06 2019 at 18:37, on Zulip):

/me ponders

nikomatsakis (Sep 06 2019 at 18:37, on Zulip):

wait

nikomatsakis (Sep 06 2019 at 18:37, on Zulip):

maybe I'm wrong

simulacrum (Sep 06 2019 at 18:37, on Zulip):

So you mean like that the mutable state should be inside the MirBox? That was sort of implied, I felt?

simulacrum (Sep 06 2019 at 18:37, on Zulip):

I'm not sure where the "on the side" would be / what we'd put there

nikomatsakis (Sep 06 2019 at 18:37, on Zulip):

in the mir box is "on the side" in my terminology :)

nikomatsakis (Sep 06 2019 at 18:38, on Zulip):

it's not stored in the query hashmaps

nikomatsakis (Sep 06 2019 at 18:38, on Zulip):

the query system doesn't know about it or manage it

simulacrum (Sep 06 2019 at 18:38, on Zulip):

hm, but the mirbox is? Right?

simulacrum (Sep 06 2019 at 18:38, on Zulip):

Like, the mir(DefId) -> &MirBox query is a proper query

nikomatsakis (Sep 06 2019 at 18:38, on Zulip):

yes I somewhat take back what I said

nikomatsakis (Sep 06 2019 at 18:38, on Zulip):

well, no, it's not proper in the sense we mean oday

nikomatsakis (Sep 06 2019 at 18:38, on Zulip):

in that query results are supposed to be "pure values"

simulacrum (Sep 06 2019 at 18:38, on Zulip):

it's just that it evaluates to some state that can change down the road (like Steal, today) but in some non-stealing fashion

nikomatsakis (Sep 06 2019 at 18:39, on Zulip):

that never cahnge

nikomatsakis (Sep 06 2019 at 18:39, on Zulip):

otherwise, the whole premise of incremental falls apart

nikomatsakis (Sep 06 2019 at 18:39, on Zulip):

but I think you can make it work

simulacrum (Sep 06 2019 at 18:39, on Zulip):

hm, okay, so I was thinking that it's fine for the return to change so long as the _query_ always resolves to the "same" thing

simulacrum (Sep 06 2019 at 18:39, on Zulip):

like, I don't see why it'd be a problem for incremental, since presumably it'd still work out

nikomatsakis (Sep 06 2019 at 18:40, on Zulip):

it basically depends on what hash/eq mean

simulacrum (Sep 06 2019 at 18:40, on Zulip):

though we might need some custom code to unify an old MirBox and a possible new one

nikomatsakis (Sep 06 2019 at 18:40, on Zulip):

I think what you could do is this

nikomatsakis (Sep 06 2019 at 18:40, on Zulip):

the "identity" of a MirBox is some fresh uuid

nikomatsakis (Sep 06 2019 at 18:41, on Zulip):

(and/or maybe a hash of the initial mir that you made)

nikomatsakis (Sep 06 2019 at 18:41, on Zulip):

when we serialize a mirbox, we capture whatever values we cached so far

nikomatsakis (Sep 06 2019 at 18:41, on Zulip):

now when incremental re-runs --

nikomatsakis (Sep 06 2019 at 18:41, on Zulip):

if the inputs to the mir query have changed (basically, the HIR of the function, or relevant types etc)

nikomatsakis (Sep 06 2019 at 18:42, on Zulip):

we'll rebuild the MIR

nikomatsakis (Sep 06 2019 at 18:42, on Zulip):

since it's identity is a fresh UUID (let's run with that for a sec)

nikomatsakis (Sep 06 2019 at 18:42, on Zulip):

it will always be dirty relative to the previous, cached mir

nikomatsakis (Sep 06 2019 at 18:42, on Zulip):

but that's ok

nikomatsakis (Sep 06 2019 at 18:42, on Zulip):

that means we'll just have none of the "downstream steps" (borrow-check, etc)

nikomatsakis (Sep 06 2019 at 18:42, on Zulip):

cached

nikomatsakis (Sep 06 2019 at 18:42, on Zulip):

and hence we will re-run them

nikomatsakis (Sep 06 2019 at 18:42, on Zulip):

but if the inputs to the mir did not change, we can deserialize the mir box as normal

nikomatsakis (Sep 06 2019 at 18:43, on Zulip):

I think this would also work fine if you used instead a hash of the MIR as the "identity" of the mir-box

nikomatsakis (Sep 06 2019 at 18:43, on Zulip):

except that maybe you'd get a hair more re-use, because sometimes your serialized value would be found to be equal

simulacrum (Sep 06 2019 at 18:43, on Zulip):

because MIR encodes a "lower" form than HIR and such?

nikomatsakis (Sep 06 2019 at 18:43, on Zulip):

in any case the key point is that the "identity" is tied to the initial output from the mir query -- the other, derived work (optimized form, etc) are all pure functions from that

nikomatsakis (Sep 06 2019 at 18:43, on Zulip):

because MIR encodes a "lower" form than HIR and such?

yes

nikomatsakis (Sep 06 2019 at 18:43, on Zulip):

that doesn't strike mas a very realistic proposition in most cases

nikomatsakis (Sep 06 2019 at 18:44, on Zulip):

but a hash may still be better just because uuid is a drag and non-reproducible

nikomatsakis (Sep 06 2019 at 18:44, on Zulip):

that doesn't strike mas a very realistic proposition in most cases

(as you suggested)

simulacrum (Sep 06 2019 at 18:44, on Zulip):

I think the derive work being pure from that is -- perhaps -- not true? And can't be true?

simulacrum (Sep 06 2019 at 18:45, on Zulip):

e.g. I would assume we depend on bits from HIR?

nikomatsakis (Sep 06 2019 at 18:45, on Zulip):

yeah so the problem is that they want to invoke other queries in some cases

nikomatsakis (Sep 06 2019 at 18:45, on Zulip):

e.g., inlining

nikomatsakis (Sep 06 2019 at 18:45, on Zulip):

and so the set of inputs that go into the mir box actually grows over time

nikomatsakis (Sep 06 2019 at 18:46, on Zulip):

actually, I want something very similar to this for salsa + chalk

nikomatsakis (Sep 06 2019 at 18:46, on Zulip):

well, for chalk (but I was thining about adding it to salsa)

nikomatsakis (Sep 06 2019 at 18:46, on Zulip):

that is, some mechanism for saying "I have a process that has advanced some of the way; it will resume later, and may grow more inputs as it goes"

nikomatsakis (Sep 06 2019 at 18:46, on Zulip):

that's encouraging, that there is more than one use

simulacrum (Sep 06 2019 at 18:46, on Zulip):

anyway, I need to run -- but it does sound like getting a more general structure here for sort of "long-term" queries or "stateful queries" is useful

nikomatsakis (Sep 06 2019 at 18:47, on Zulip):

yeah, it seems useful, though in the short term we could fix the immediate problem in the simpler way

nikomatsakis (Sep 06 2019 at 18:47, on Zulip):

I also should go

simulacrum (Sep 06 2019 at 18:47, on Zulip):

I can try and type up some of this in a shorter form in the doc when I get a chance, probably later tonight

simulacrum (Sep 06 2019 at 18:47, on Zulip):

(unless of course someone beats me to it)

nikomatsakis (Sep 06 2019 at 18:51, on Zulip):

no worries, I might stick with the predecessors issue then if no one else is hoping to pick it up

(@Paul Faria seems good)

Last update: Nov 17 2019 at 06:55UTC