Stream: t-compiler/help

Topic: dependent dataflow analyses


tmandry (Jun 14 2019 at 21:19, on Zulip):

Within a dataflow analysis, what's the cleanest way to depend on the results of another dataflow pass?

Right now I'm calling state_for_location at each step, which makes it O(N^2) in the number of statements in a block

tmandry (Jun 14 2019 at 21:20, on Zulip):

I also need to be able to get the results of both analyses out (right now I'm computing the first and then borrowing it while I compute the second)

tmandry (Jun 14 2019 at 21:24, on Zulip):

I could start calling the statement_effect callbacks myself, but that seems like it could be brittle..

tmandry (Jun 14 2019 at 21:25, on Zulip):

cc @pnkfelix

tmandry (Jun 14 2019 at 21:27, on Zulip):

Ideally I'd also get the actual effects at each statement, not just the resulting state

tmandry (Jun 14 2019 at 21:51, on Zulip):

here's the code for what I'm doing now

tmandry (Jun 14 2019 at 21:54, on Zulip):

Right now, the most efficient way seems to be to keep track of my own BlockSets and update them during every callback in BitDenotation ("unroll" the implementation of state_for_location into my type, essentially)

tmandry (Jun 14 2019 at 21:55, on Zulip):

I'm not sure how comfortable people will be with this though

tmandry (Jun 14 2019 at 21:56, on Zulip):

cc @nikomatsakis since you're also listed as a dataflow expert :)

nikomatsakis (Jun 14 2019 at 21:56, on Zulip):

Hmm

nikomatsakis (Jun 14 2019 at 21:57, on Zulip):

This reminds me that I wanted to sync up with you on the work you've been doing just to get an overview and try to get up to speed

nikomatsakis (Jun 14 2019 at 21:57, on Zulip):

Think you'd be game to schedule some time to do that?

tmandry (Jun 14 2019 at 21:57, on Zulip):

Yeah, sure!

nikomatsakis (Jun 14 2019 at 21:57, on Zulip):

Also that we should really start converting this code to datafrog :P

tmandry (Jun 14 2019 at 21:58, on Zulip):

(re: scheduling, I'm free Mon/Tue afternoon PST, or Wed morning)

tmandry (Jun 14 2019 at 21:59, on Zulip):

Also that we should really start converting this code to datafrog :P

I didn't realize this was the official plan :)

nikomatsakis (Jun 14 2019 at 21:59, on Zulip):

I guess it'd prob be best to sched for the week after next, owing to Moz all hands

nikomatsakis (Jun 14 2019 at 21:59, on Zulip):

I didn't realize this was the official plan :)

"official"... well, it's my plan

nikomatsakis (Jun 14 2019 at 22:00, on Zulip):

(in all seriousness, having dependent analyses is pretty easy in that case)

tmandry (Jun 14 2019 at 22:00, on Zulip):

I guess it'd prob be best to sched for the week after next, owing to Moz all hands

same schedule then

nikomatsakis (Jun 14 2019 at 22:01, on Zulip):

that said, a possibly shorter term plan might be to do a "cursor" sort of API -- I think state_for_location kind of repeats all the work each time

nikomatsakis (Jun 14 2019 at 22:01, on Zulip):

but presumably you are walking through the MIR in a certain order

nikomatsakis (Jun 14 2019 at 22:01, on Zulip):

the idea would be to have an API that lets you reposition the cursor arbitrarily

nikomatsakis (Jun 14 2019 at 22:01, on Zulip):

but to optimize for sequentially accessing statements within the same block

tmandry (Jun 14 2019 at 22:02, on Zulip):

ah, yeah, that sounds good

nikomatsakis (Jun 14 2019 at 22:02, on Zulip):

but I'm not sure quite how difficult that would be; it's something @eddyb suggested some time back and I thought it was clever

tmandry (Jun 14 2019 at 22:02, on Zulip):

I don't _think_ it would be difficult

nikomatsakis (Jun 14 2019 at 22:02, on Zulip):

I'm trying to page this back in -- I feel like @pnkfelix did think about the idea of dependent analyses a bit so let me go scout a bit to see if I see anything promising :)

nikomatsakis (Jun 14 2019 at 22:02, on Zulip):

I don't _think_ it would be difficult

doesn't seem that hard indeed

tmandry (Jun 14 2019 at 22:03, on Zulip):

I could probably hack it together as part of my PR

tmandry (Jun 14 2019 at 22:03, on Zulip):

implement it* I mean :)

nikomatsakis (Jun 14 2019 at 22:03, on Zulip):

I'm basically trying to remember if there already exists any such cases

nikomatsakis (Jun 14 2019 at 22:05, on Zulip):

it could probably be done just by hacking on FlowAtLocation, @tmandry

nikomatsakis (Jun 14 2019 at 22:05, on Zulip):

(like it's not that far from what's there already)

tmandry (Jun 14 2019 at 22:07, on Zulip):

ah yes, that seems promising

nikomatsakis (Jun 14 2019 at 22:08, on Zulip):

I don't see anything doing quite what you're doing

nikomatsakis (Jun 14 2019 at 22:08, on Zulip):

so I think that's my best suggestion

nikomatsakis (Jun 14 2019 at 22:09, on Zulip):

besides porting everything to datafrog

tmandry (Jun 14 2019 at 22:09, on Zulip):

ok. I think a light wrapper around FlowAtLocation that remembers its last location sounds best, probably

tmandry (Jun 14 2019 at 22:10, on Zulip):

rather than having two ways of using the same object

tmandry (Jun 14 2019 at 22:10, on Zulip):

thanks for the pointer @nikomatsakis

tmandry (Jun 14 2019 at 22:15, on Zulip):

conceptually this makes sense... DataflowResultsConsumer is the main "driver" for FlowAtLocation today, whereas if you're being driven by something else, you need a DataflowResultsCursor instead

tmandry (Jun 14 2019 at 22:26, on Zulip):

(actually, maybe FlowAtLocation should just do this)

nagisa (Jun 15 2019 at 11:19, on Zulip):

@tmandry the cleanest way to write dependent dataflow passes I have found so far is to interleave them

nagisa (Jun 15 2019 at 11:21, on Zulip):

However the traditional dataflow implementations don’t usually allow for that easily.

nagisa (Jun 15 2019 at 11:21, on Zulip):

if you’re interested in that despite the caveats, feel free to ping me and we can chat in more detail.

eddyb (Jun 16 2019 at 13:47, on Zulip):

the idea would be to have an API that lets you reposition the cursor arbitrarily, but to optimize for sequentially accessing statements within the same block

@nikomatsakis @tmandry I've not just suggested it but also implemented it, for a different dataflow system (a bidirectional one :P)

eddyb (Jun 16 2019 at 13:47, on Zulip):

or I should say, "symmetrical"

eddyb (Jun 16 2019 at 13:49, on Zulip):

ah dammit I keep finding and linking to #46321 when I mean #47954

eddyb (Jun 16 2019 at 13:55, on Zulip):

anyway this is what I had https://github.com/rust-lang/rust/pull/47954/commits/ed395ca5dfffff13d4f52f8b2c831e0d8fa294a3#diff-dfa62bf4fa7aefc62a25f618c203f921R302

eddyb (Jun 16 2019 at 13:55, on Zulip):

I called it an "observer"

eddyb (Jun 16 2019 at 14:10, on Zulip):

which I guess fit more with "eventflow" than regular dataflow

nagisa (Jun 16 2019 at 15:05, on Zulip):

Doesn’t that require maintaining extra data? I can imagine situations where it isn’t always possible to undo changes to factset which occurred as part of a transfer function over a statement.

eddyb (Jun 16 2019 at 15:19, on Zulip):

@nagisa while I maintained extra data, it could reset from the start of the block if not moved forward

nagisa (Jun 16 2019 at 15:20, on Zulip):

I see… so it is still O(n^2) with some caching to mitigate that

eddyb (Jun 16 2019 at 15:20, on Zulip):

normal usage would not be quadratic, it's more like a fallback

tmandry (Jun 17 2019 at 20:59, on Zulip):

@eddyb yeah I ended up writing something similar, just not bidirectional

tmandry (Jun 17 2019 at 22:27, on Zulip):

@nagisa by interleave, do you mean computing them simultaneously?

nagisa (Jun 17 2019 at 23:17, on Zulip):

@tmandry yes. This is done by composing the transfer functions of the different analyses you’re interested in and running them all for a single statement before proceeding to the next one.

nagisa (Jun 17 2019 at 23:18, on Zulip):

This way any transfer function can gain access (if you want them to) to any intermediate factset that gets generated by any other analysis as well and this way you also get optimal results for any given combination of dataflow analyses.

tmandry (Jun 17 2019 at 23:19, on Zulip):

@nagisa makes sense, but as you say it's not very convenient in the current framework (if you need to get the individual passes back out)

nagisa (Jun 17 2019 at 23:20, on Zulip):

yeah, it makes the dataflow loop a tight knot between all the involved analyses..

tmandry (Jun 17 2019 at 23:21, on Zulip):

Actually, it still might not work in my case. I need to know the final state of a different pass at each point to compute my pass (not just the gen/kill set for individual statements)

tmandry (Jun 17 2019 at 23:21, on Zulip):

or at least, I think I do :)

nagisa (Jun 17 2019 at 23:22, on Zulip):

That is fine, the dataflow analysis keeps running until a fixed point is reached, so you will re-visit a certain statement if there is new information known about it after it gets visited a first/2nd/3rd time etc.

nagisa (Jun 17 2019 at 23:22, on Zulip):

the only thing you won’t know is whether any particular visitation is the last one.

tmandry (Jun 17 2019 at 23:23, on Zulip):

ah okay, that should work then

nagisa (Jun 17 2019 at 23:24, on Zulip):

I will try to find a paper which describes this technique

nagisa (Jun 17 2019 at 23:24, on Zulip):

http://research.microsoft.com/en-us/um/people/simonpj/Papers/c--/hoopl-haskell10.pdf describes it.

nagisa (Jun 17 2019 at 23:25, on Zulip):

It also has motivating examples for this specific approach over the traditional dataflow analysis implementations.

pnkfelix (Jun 20 2019 at 21:56, on Zulip):

I'm trying to page this back in -- I feel like pnkfelix did think about the idea of dependent analyses a bit so let me go scout a bit to see if I see anything promising :)

finally getting back to this

pnkfelix (Jun 20 2019 at 21:56, on Zulip):

the main example of a dependent analysis that I had worked on

pnkfelix (Jun 20 2019 at 21:56, on Zulip):

was the hacked up version of two-phase borrows

pnkfelix (Jun 20 2019 at 21:56, on Zulip):

that I did in PR #46537

pnkfelix (Jun 20 2019 at 21:57, on Zulip):

it has since been thrown away

pnkfelix (Jun 20 2019 at 21:57, on Zulip):

but you can see at least my comments documenting what I did here

pnkfelix (Jun 20 2019 at 21:59, on Zulip):

however I would certainly not say that I was happy with this design

pnkfelix (Jun 20 2019 at 21:59, on Zulip):

and I will probably take a look at the hoopl paper that @nagisa linked above

pnkfelix (Jun 20 2019 at 22:02, on Zulip):

however I would certainly not say that I was happy with this design

in fact, reviewing the PR now, I am reminded of all the terrible kludges that were in it. For example, If i recall correctly, I had a bunch of ugly stuff, like making the bitvector twice as big as you'd expect, so that it would store the intermediate values for both of the two analyses in a single bitvector.

Last update: Nov 11 2019 at 22:00UTC