Stream: t-compiler

Topic: `ReadOnlyBodyAndCache` ergonomics


ecstatic-morse (Mar 26 2020 at 21:18, on Zulip):

@Paul Faria Why do I need to a ReadOnlyBodyAndCache to visit an immutable MIR body?

Paul Faria (Mar 26 2020 at 21:32, on Zulip):

So there was work done for the parallelization effort to remove interior mutability since it would cause issues with parallelization. I had worked through this design with Oli (can't find his handle at the moment). Unfortunately the maintainable version required the cache and body to be independent, but still tracked together.

Paul Faria (Mar 26 2020 at 21:33, on Zulip):

The cache lazily computes some cases on read, which is why it affects immutable Mir bodies.

ecstatic-morse (Mar 26 2020 at 21:35, on Zulip):

At the moment, the cache holds only the predecessors of each basic block (reverse CFG). When would we need to access this when immutably visiting a MIR body?

ecstatic-morse (Mar 26 2020 at 21:37, on Zulip):

Also, ReadOnlyBodyAndCache can't recompute anything lazily, since it just has a &Cache.

Paul Faria (Mar 26 2020 at 21:39, on Zulip):

My mistake, I made the comment without reviewing the code. In that case there are some usages where the visitors read the data from the cache, but it's guaranteed to have already been precomputed. This is the PR where this went in for reference: https://github.com/rust-lang/rust/pull/64736

Paul Faria (Mar 26 2020 at 21:42, on Zulip):

I believe I tried making versions where the cache wasn't needed so &Body could be passed through, but there were so many interdependencies that I could never get it to compile. There were other version proposed that didn't require these explicits types, but they were much more fragile in the sense that a change in one part of the compiler could inadvertently cause a panic in another (I believe this was when we experimented with manually clearing and recomputing the cache at certain points during compilation).

Paul Faria (Mar 26 2020 at 21:42, on Zulip):

@oli (not sure why, but couldn't link you in the earlier comment on Mobile)

Paul Faria (Mar 26 2020 at 21:51, on Zulip):

If you're working on something I've got plenty of free time the next few days to pair if that would help.

ecstatic-morse (Mar 26 2020 at 22:01, on Zulip):

@Paul Faria Can you point me to a place in the code or a comment that illustrates whyvisit_body doesn't take &mir::Body? If we need to always keep a reference to the predecessor cache with the body, why do we still pass a plain mir::Body anywhere?

ecstatic-morse (Mar 26 2020 at 22:05, on Zulip):

My assumption was that ReadOnlyBodyAndCache would gradually replace &mir::Body throughout the codebase and then be renamed to BodyRef, since having to choose whether each function takes &mir::Body or ReadOnlyBodyAndCache is minorly annoying. Do you recall discussing this? Do you also find the status quo confusing or am I alone here?

Paul Faria (Mar 26 2020 at 22:40, on Zulip):

I agree with you that it is not easy to work with, there were easier proposals, but they made me very uncomfortable regarding the panics I mentioned above. I had some personal events occur that kept me from working on this for the last few months, so I'll have to review the code to see if/where there's a comment describing the behavior (I'll take care of that tomorrow since I'll be on and off tonight). I might have only changed Body to BodyAndCache where it was required simply because it touched so many pieces of the compiler, and I wanted to minimize the impact of the initial change.

Paul Faria (Mar 26 2020 at 22:41, on Zulip):

I definitely hadn't intended to stop there, but the personal events took up all of my spare time for the last few months. As of earlier this week that all opened up so I'll have lots of free time to continue the work on this.

ecstatic-morse (Mar 26 2020 at 22:45, on Zulip):

No worries. If you can't think of a reason not to, I'll have a go at changing the signature of visit_body.

ecstatic-morse (Mar 26 2020 at 22:47, on Zulip):

As for broader changes, there's no rush, I just wanted to know if there's a longer-term plan in place.

eddyb (Mar 26 2020 at 23:10, on Zulip):

@oli already tried that in the --bless mir-opt PR and I think it works

eddyb (Mar 26 2020 at 23:10, on Zulip):

but I had him use a TypeVisitor instead for more correctness and smaller diff

ecstatic-morse (Mar 26 2020 at 23:15, on Zulip):

See #70449, which does the bare minimum. I could change some of the surrounding function signatures back as well, but it's not a high priority.

Paul Faria (Mar 27 2020 at 13:22, on Zulip):

@ecstatic-morse I like the change. Looks like it will be easier to use visitors.

ecstatic-morse (Mar 29 2020 at 22:12, on Zulip):

So every time I create a ReadOnlyBodyAndCache, I have to compute the reverse CFG. This means that before #70449 every time I needed to call a Visitor impl, we had to compute the reverse CFG. Is this correct? This seems very wasteful.

ecstatic-morse (Mar 29 2020 at 22:15, on Zulip):

(#70449 doesn't actually remove any uses of ReadOnlyBodyAndCache, so there's no perf benefits, but someone could go in now and remove them)

Paul Faria (Mar 31 2020 at 19:53, on Zulip):

It just ensures that the cache is precomputed before you can create a ReadOnlyBodyAndCache. If it was already computed before, then it's a no-op.

Paul Faria (Mar 31 2020 at 19:54, on Zulip):

So if you create two ReadOnlyBodyAndCache from one BodyAndCache, the cache should only be computed once, not twice.

ecstatic-morse (Mar 31 2020 at 21:07, on Zulip):

Said another way, what percentage of the times when I'm forced to create a ReadOnlyBodyAndCache to call visit_bodydo I actually use it to look at predecessors.

ecstatic-morse (Mar 31 2020 at 23:03, on Zulip):

@oli @Paul Faria I'd like to revert #64736 in favor of a Mutex<Option<...>> to hold the predecessors cache. With the current solution, each function which doesn't need mutable access to a given Body must encode whether any of its callees plan to compute the reverse CFG into its type signature. This makes things like adding support for backward dataflow analyses to the current framework, which I am working on today, more difficult than it should be. Only backward dataflow analyses need access to the reverse CFG, but I am forced to either precompute the predecessor graph for all analyses (ReadOnlyBodyAndCache), or keep around a mutable reference to the body (&mut BodyAndCache) so I can compute it on demand, despite the fact that dataflow does not actually mutate the MIR.

ecstatic-morse (Mar 31 2020 at 23:07, on Zulip):

Invalidating the predecessor cache won't actually require taking the lock, since any operation that would invalidate the cache requires a unique reference (&mut) to the MIR anyway. That means we only actually lock when calling predecessors, and I don't think it's worth the current complexity to avoid a few uncontended mutex operations.

Paul Faria (Apr 01 2020 at 01:55, on Zulip):

The only other parts of the compiler that might be affected are in librustc_codegen_ssa and librustc_mir/borrow_check. I can't remember how it was all tied together at the moment, but do you think it would cause any issues with those parts of the compiler? It definitely sounds easier to use than what I came up with, and I think none of the stages will affect the same body simultaneously. I think @oli can speak best to that.

oli (Apr 01 2020 at 08:10, on Zulip):

Hmm... While I do like the typed approach we have right now, it is a lot of complexity...

oli (Apr 01 2020 at 08:12, on Zulip):

Another alternative would be to just move the cache back into the body and force run the predecessor computation at the end of optimized_mir

Paul Faria (Apr 01 2020 at 18:08, on Zulip):

@oli was that the version where we had to manually recompute the cache rather than it being computed on demand? I vaguely remember running into hidden panics because it wasn't always obvious where a part of the compiler might invalidate the cache. If so, I also remember worrying about future changes to the compiler possibly causing panics in areas of code unrelated to the changes. My concern was that would be even more complicated to debug because it wouldn't be obvious to anyone not familiar with the cache.

ecstatic-morse (Apr 02 2020 at 19:33, on Zulip):

@oli Thoughts? We still need a solution for computing predecessors during borrow checking and the optimization pipeline. It would be much easier if this could be done without requiring &mut Body in these places.

eddyb (Apr 02 2020 at 19:35, on Zulip):

I suggested something at some point but I forget where

eddyb (Apr 02 2020 at 19:35, on Zulip):

which would handle terminators differently

eddyb (Apr 02 2020 at 19:35, on Zulip):

and replacing a terminator would automatically adjust predecessors, incrementally

eddyb (Apr 02 2020 at 19:35, on Zulip):

i.e. make MIR a managed (control-flow) graph

eddyb (Apr 02 2020 at 19:35, on Zulip):

instead of doing full recomptations after invalidations

eddyb (Apr 02 2020 at 19:36, on Zulip):

you could never get a &mut Terminator

eddyb (Apr 02 2020 at 19:37, on Zulip):

but instead you'd have e.g. TerminatorMutRef<'a> instead of &'a mut Terminator or w/e, without being able to mutate the BasicBlock destinations

ecstatic-morse (Apr 02 2020 at 19:47, on Zulip):

That would be ideal. Since the predecessor graph would always be computed and up-to-date, &Body could be used everywhere. However, it's going to be a lot of work to audit all existing callers of basic_blocks_mut, and I would like to find a stopgap solution.

eddyb (Apr 02 2020 at 19:47, on Zulip):

well, you wouldn't audit, you'd just hammer them to fit the new API shape :P

eddyb (Apr 02 2020 at 19:48, on Zulip):

they don't have to cooperate to preserve correctness, they can be forced to use an API that does

ecstatic-morse (Apr 02 2020 at 19:49, on Zulip):

Replace audit with update then

Jonas Schievink (Apr 02 2020 at 19:52, on Zulip):

That sounds like work I might enjoy, if no one else gets to it

ecstatic-morse (Apr 11 2020 at 17:31, on Zulip):

I'm going to submit a PR replacing BodyAndCache with interior mutability on Body. Besides helping me finish backward dataflow, it will be easier to migrate to the incremental approach, which will also use only a single type.

eddyb (Apr 11 2020 at 18:13, on Zulip):

@ecstatic-morse so basically we're reverting the BodyAndCache approach?

eddyb (Apr 11 2020 at 18:14, on Zulip):

oh I see by incremental you mean the "managed CFG" approach

eddyb (Apr 11 2020 at 18:14, on Zulip):

where you never get direct mutable access to BasicBlock targets in a TerminatorKind

ecstatic-morse (Apr 11 2020 at 18:14, on Zulip):

Yeah, and by "incremental" I meant the approach you proposed

eddyb (Apr 11 2020 at 18:15, on Zulip):

why did we even go to BodyAndCache? the parallelism effort to remove locks?

ecstatic-morse (Apr 11 2020 at 18:16, on Zulip):

Correct. The choice was between Mutex<PredecessorCache> and the status quo.

eddyb (Apr 11 2020 at 18:16, on Zulip):

well, not Mutex

ecstatic-morse (Apr 11 2020 at 18:16, on Zulip):

Do we have a custom lock type?

eddyb (Apr 11 2020 at 18:16, on Zulip):

there are special types that are Cell/RefCell when parallelism isn't enabled

eddyb (Apr 11 2020 at 18:17, on Zulip):

everything uses those

eddyb (Apr 11 2020 at 18:17, on Zulip):

IIRC the name might be Lock for what's needed here

eddyb (Apr 11 2020 at 18:17, on Zulip):

it wouldn't hurt to name them CellOrAtomic, RefCellOrMutex and RefCellOrRwLock :P

eddyb (Apr 11 2020 at 18:18, on Zulip):

also Lrc would be RcOrArc

centril (Apr 11 2020 at 18:19, on Zulip):

@eddyb How about Arrc<T> :pirate: ? ;)

eddyb (Apr 11 2020 at 18:19, on Zulip):

... the point is to be clear about what it is :P

eddyb (Apr 11 2020 at 18:20, on Zulip):

IIRC the L in Lrc doesn't really mean anything

ecstatic-morse (Apr 11 2020 at 21:38, on Zulip):

Doing a perf run on #71044 now.

ecstatic-morse (Apr 11 2020 at 21:38, on Zulip):

I think perf uses the single-threaded compiler? So this will only give part of the picture.

Paul Faria (Apr 16 2020 at 12:54, on Zulip):

@ecstatic-morse any way I can help out? I started a new job recently, so I was unavailable the last few weeks. I should have time now, a couple hours before and after work (US EST timezone atm)

Paul Faria (Apr 16 2020 at 13:02, on Zulip):

If not I'll just help out rustc-dev-guide for the time being

ecstatic-morse (Apr 16 2020 at 16:09, on Zulip):

@Paul Faria #71044 is a small regression in performance. Do you recall whether the original PR that replaced the RefCell had any perf impact? #71044 is equivalent to the RefCell approach for the single-threaded compiler, which I'm pretty sure what is used by perf.rust-lang.org

ecstatic-morse (Apr 16 2020 at 16:11, on Zulip):

AFAICT, the regression is in metadata_register_crate, so it's encoding/decoding that's slow.

eddyb (Apr 16 2020 at 16:15, on Zulip):

@ecstatic-morse I think we don't (or used to not) encode caches and we don't (or used to not) compute them on decoding

ecstatic-morse (Apr 16 2020 at 16:16, on Zulip):

Yeah #71044 includes the old no-op impls for encoding/decoding: https://github.com/rust-lang/rust/pull/71044/commits/b3922699b9744f0b894d8309f8d107859d1cfa83#diff-f3e3c79bda8c6a5aa3cd4fa1f57c002cR47

ecstatic-morse (Apr 16 2020 at 16:21, on Zulip):

I also added some #[inline], which helped a bit. I just found an emit_unit impl that is used cross-crate but not #[inline]. I'll fix that.

eddyb (Apr 16 2020 at 16:21, on Zulip):

oh dear

eddyb (Apr 16 2020 at 16:21, on Zulip):

/me forgets about #[inline] all the time

ecstatic-morse (Apr 16 2020 at 16:29, on Zulip):

@eddyb These primitive impls would likely benefit from inlining as well right?
https://github.com/rust-lang/rust/blob/4e4d49d60fd696c4036d438292673a2d7fd34519/src/libserialize/serialize.rs#L391

eddyb (Apr 16 2020 at 16:29, on Zulip):

but those get inlined already, don't they?

eddyb (Apr 16 2020 at 16:30, on Zulip):

the methods are generic

ecstatic-morse (Apr 16 2020 at 16:30, on Zulip):

Ah yes. My bad

eddyb (Apr 16 2020 at 16:30, on Zulip):

it's only emit_foo in a non-generic impl that's the problem

ecstatic-morse (Apr 16 2020 at 16:35, on Zulip):

Oooh. I found a gem: https://github.com/rust-lang/rust/blob/master/src/librustc_middle/ty/query/on_disk_cache.rs#L910

ecstatic-morse (Apr 16 2020 at 16:36, on Zulip):
macro_rules! encoder_methods {
    ($($name:ident($ty:ty);)*) => {
        #[inline]
        $(fn $name(&mut self, value: $ty) -> Result<(), Self::Error> {
            self.encoder.$name(value)
        })*
    }
}
ecstatic-morse (Apr 16 2020 at 16:41, on Zulip):

at least the first one's fast XD

Paul Faria (Apr 17 2020 at 01:56, on Zulip):

ecstatic-morse said:

Paul Faria #71044 is a small regression in performance. Do you recall whether the original PR that replaced the RefCell had any perf impact? #71044 is equivalent to the RefCell approach for the single-threaded compiler, which I'm pretty sure what is used by perf.rust-lang.org

I remember doing a perf analysis with @Santiago Pastorino months ago and we saw that there was no major changes in perf. I don't remember if this was before or after I addressed all of the comments though. There could have been other changes since then that maybe have a different impact now? I remember wanting to go back and review which fns needed to be marked inline, but never had time to take care of that.

Paul Faria (Apr 17 2020 at 02:00, on Zulip):

We had run it, but the results aren't available anymore: https://github.com/rust-lang/rust/pull/64736#issuecomment-548308975 .

ecstatic-morse (Apr 17 2020 at 04:35, on Zulip):

Old perf runs are deleted from the server occasionally.

ecstatic-morse (Apr 20 2020 at 16:35, on Zulip):

Inlining emit_unit fixed the perf problem.

nikomatsakis (Apr 20 2020 at 22:24, on Zulip):

I have to say I'm not super psyched to see more mutexes back in, but I won't stand in the way for now

nikomatsakis (Apr 20 2020 at 22:25, on Zulip):

That said, I guess I should review the PR, I'd be happy if we can keep the mutex so that it only is held long enough to return an Arc<X> or something

nikomatsakis (Apr 20 2020 at 22:25, on Zulip):

as opposed to returning a mutex guard, which seems like it opens the door to deadlocks if you are manipulating two MIRs at once (granted, we rarely do that, and probably never access preds when we do, but still)

nikomatsakis (Apr 20 2020 at 22:25, on Zulip):

@ecstatic-morse :point_up:

ecstatic-morse (Apr 20 2020 at 22:46, on Zulip):

nikomatsakis said:

That said, I guess I should review the PR, I'd be happy if we can keep the mutex so that it only is held long enough to return an Arc<X> or something

Since you can only mutate the MIR via unique ownership (&mut), I believe we could use an AtomicPtr<PredecessorCache> instead of a Lock<PredecessorCache>. Because the ability to invalidate the predecessor cache requires unique ownership of the AtomicPtr as well, the invalidator knows that they are responsible for reclaiming the predecessor cache. Those with shared ownership would compute the predecessor cache and then compare and swap the result into the AtomicPtr.

ecstatic-morse (Apr 20 2020 at 22:50, on Zulip):

If a shared owner of the MIR loses a compare-and-swap, they throw away their copy of the predecessor cache and load the one that won the race.

ecstatic-morse (Apr 20 2020 at 22:51, on Zulip):

This is similar to RCU, except that we always know who is responsible for freeing the cache. There's no need for hazard pointers or epoch-based reclamation.

ecstatic-morse (Apr 20 2020 at 23:01, on Zulip):

@nikomatsakis ^

ecstatic-morse (Apr 20 2020 at 23:39, on Zulip):

I guess the single-threaded version of this is Cell<Option<PredecessorCache>> with Cell::set/Cell::get_mut.

ecstatic-morse (Apr 20 2020 at 23:47, on Zulip):

Err, no. That doesn't quite work. You need to be able to go from a shared reference to the Cell to a shared reference to the predecessor cache.

nikomatsakis (Apr 21 2020 at 15:58, on Zulip):

@ecstatic-morse sure, that's kind of a micro-optimization though. THe main thing I wanted was a Mutex<Arc<PredecessorData>> and an API like

fn predecessor_map(&self) -> Arc<PredecessorData> {
    /* lazilly compute and return */
}
nikomatsakis (Apr 21 2020 at 15:58, on Zulip):

it would also be ok to use a CAS, but I'm not very concerned

nikomatsakis (Apr 21 2020 at 15:59, on Zulip):

I am surprised that the mutator always has &mut self, maybe I don't understand what's going on

nikomatsakis (Apr 21 2020 at 15:59, on Zulip):

the main thing I do not want is

nikomatsakis (Apr 21 2020 at 15:59, on Zulip):
fn predecessor_map(&self) -> ReadGuard<'_, PredecessorData> { .. }
nikomatsakis (Apr 21 2020 at 16:00, on Zulip):

I mean I guess it's not really possible to deadlock, since computing one set of predecessors can't in turn compute another set of predecessors

nikomatsakis (Apr 21 2020 at 16:00, on Zulip):

but I still prefer the Arc setup because it's sort of obvious that it's fine...

nikomatsakis (Apr 21 2020 at 16:01, on Zulip):

all that said, the biggest thing I want is clear docs on every mutex explaining

  1. Why it's there, who locks it and why
  2. Why it won't lead to deadlocks

and further that every mutex is in a private field where one audit the accesses with relative ease

ecstatic-morse (Apr 21 2020 at 16:13, on Zulip):

That's fine too. Do you want me to r- #71044? I would prefer to let it land and then switch to Mutex<Arc<PredecessorCache>> in a follow-up PR since #71044 is liable to bitrot. AFAIK, we never actually access the predecessor cache for a MIR body concurrently from different threads, so deadlock won't be an issue in the interim.

ecstatic-morse (Apr 21 2020 at 16:31, on Zulip):

(deleted)

nikomatsakis (Apr 21 2020 at 16:56, on Zulip):

@ecstatic-morse I am fine with follow-up PR

nikomatsakis (Apr 21 2020 at 16:57, on Zulip):

and yeah it's not that I think there is an actual bug

nikomatsakis (Apr 21 2020 at 16:57, on Zulip):

it's that, in terms of the parallel work (which is sorta stalled, but yeah) we were going for "guarantee that things are documented and as easily verified as possible"

nikomatsakis (Apr 21 2020 at 16:57, on Zulip):

if you want to open an issue to track the refactoring that might be nice

ecstatic-morse (Apr 21 2020 at 16:58, on Zulip):

For sure. I'm in full agreement with eliminating all potential sources of deadlock.

ecstatic-morse (Apr 21 2020 at 16:58, on Zulip):

I'm working on the follow-up now XD, so an issue won't be necessary unless you want one for other reasons.

ecstatic-morse (Apr 21 2020 at 17:39, on Zulip):

#71392

Last update: Jun 04 2020 at 18:40UTC