Stream: t-compiler/wg-nll

Topic: compress-the-cfg (polonius#20)


nikomatsakis (May 28 2018 at 11:21, on Zulip):

@Julien Cretin so — the idea there is to compress the points in some way — @Frank McSherry pointed out that we could probably look for edges P -> Q where the set of live regions is the same for both P and Q. That's a good starting point, though I think we'd have to do better.

Julien Cretin (ia0) (May 28 2018 at 11:22, on Zulip):

Ok, I'll start with that. Should I create a new algorithm or edit an existing one?

nikomatsakis (May 28 2018 at 11:22, on Zulip):

I thnk that to start you should make a new algorithm "forked" from the naive

nikomatsakis (May 28 2018 at 11:22, on Zulip):

it's a lot smaller and easier to wrap your head around

nikomatsakis (May 28 2018 at 11:22, on Zulip):

we can port the concept to the optimized one

Julien Cretin (ia0) (May 28 2018 at 11:23, on Zulip):

ok

Julien Cretin (ia0) (May 28 2018 at 11:30, on Zulip):

FYI: My current status is reading the naive algorithm

Julien Cretin (ia0) (May 28 2018 at 12:11, on Zulip):

Progress checkpoint: We are going to write a function condense_cfg(all_facts: AllFacts<Region, Loan, Point>) -> AllFacts<Region, Loan, Point> that we're going to call before naive::compute.
This function would "merge" successor points into predecessor points according to the criteria defined in polonius#20 (and "fixing" the facts that mention the removed successor point)

lqd (May 28 2018 at 12:15, on Zulip):

(Sorry on my phone — I think the exact algo from the issue won’t have much impact on clap; the last criteria has no matches in the dataset IIRC. Compressing is definitely the next big speedup for sure; just FYI wrt these exact rules)

lqd (May 28 2018 at 12:39, on Zulip):

(I think 15 out the 60-61 cfg edges from the “issue” dataset have no impact on the result)

nikomatsakis (May 28 2018 at 12:49, on Zulip):

(I agree the precise criteria in the issue are prob not the right ones; not 100% sure what the right ones are — but experimenting a bit seems good)

Julien Cretin (ia0) (May 28 2018 at 12:58, on Zulip):

inputs/issue-47680/nll-facts/main has 37 nodes to be merged according to the polonius#20 criteria. I'll experiment on this one. I haven't implemented the merge yet, just the filtering of nodes to be merged.

nikomatsakis (May 28 2018 at 13:20, on Zulip):

I guess there are two mildly separable concerns — which nodes to merge, and how to adjust the rules to reflect the merging of nodes

nikomatsakis (May 28 2018 at 13:21, on Zulip):

I think probably two points which have the same live regions can be merged so long as there is no Invalidate at either of them

nikomatsakis (May 28 2018 at 13:21, on Zulip):

well, sorry, and there is only one outgoing cfg edge from the prior point

nikomatsakis (May 28 2018 at 13:21, on Zulip):

(basically, points within a basic block)

nikomatsakis (May 28 2018 at 13:21, on Zulip):

but I might be missing something

lqd (May 28 2018 at 13:44, on Zulip):

(A bunch of the Start to Mid edges looked very filterable)

nikomatsakis (May 28 2018 at 13:47, on Zulip):

yes certainly those; I expect the majority mid-to-start edges can also be filtered

nikomatsakis (May 28 2018 at 13:47, on Zulip):

but we have to figure out just the right rules I guess, not sure if what I wrote is in fact correct

nikomatsakis (May 28 2018 at 13:48, on Zulip):

I suppose this might be a case where #19 could help

nikomatsakis (May 28 2018 at 13:48, on Zulip):

in particular, there are some patterns in the way we generate facts that may not be evident from the current setup I think

lqd (May 28 2018 at 13:48, on Zulip):

agreed

nikomatsakis (May 28 2018 at 13:48, on Zulip):

though I'm trying to think of some concrete examples

nikomatsakis (May 28 2018 at 13:48, on Zulip):

presuming that the set of live regions at each point is the same....

nikomatsakis (May 28 2018 at 13:49, on Zulip):

I suppose the danger would be something where you setup half a path in one opint

nikomatsakis (May 28 2018 at 13:49, on Zulip):

and the other half in the other

nikomatsakis (May 28 2018 at 13:49, on Zulip):

but they are never connected

nikomatsakis (May 28 2018 at 13:49, on Zulip):

e.g., say that A and B are live at P and Q (and only A and B), and you have outlives facts like:

outlives(A, TMP1, P)
outlives(TMP1, B, Q)
nikomatsakis (May 28 2018 at 13:50, on Zulip):

but I .. sort of doubt we ever do that

nikomatsakis (May 28 2018 at 13:50, on Zulip):

(I guess we could merge points where the regions in outlives relations are disjoint or something, though computing that takes some work)

nikomatsakis (May 28 2018 at 13:51, on Zulip):

e.g., say that A and B are live at P and Q (and only A and B), and you have outlives facts like:

to spool this out a bit more:

If we had this scenario, under the naive rules, we would not find that A: B, because transitioning from A to B that outlives relation would be dropped. But if we merged P and Q, we might, because we would have outlives(A, TMP1) and outlives(TMP1, B) at the same (merged) point.

nikomatsakis (May 28 2018 at 13:52, on Zulip):

in any case I can imagine various ways to handle that

lqd (May 28 2018 at 13:52, on Zulip):

frank has interesting « equiv/InEquiv edges/points » rules I had implemented in a datafrog computation — how to produce the expected borrow_live_at + errors we needed was a bit more complex :)

nikomatsakis (May 28 2018 at 13:52, on Zulip):

yeah, I should look at those in more depth

lqd (May 28 2018 at 13:53, on Zulip):

:thumbs_up: (but should also enjoy the holiday ;)

nikomatsakis (May 28 2018 at 13:54, on Zulip):

@lqd can you send a link? :)

lqd (May 28 2018 at 13:55, on Zulip):

I can try (phone + subway ;)

lqd (May 28 2018 at 13:59, on Zulip):

The mobile app doesnt really havie links but Franks message was : «  E.g. we could put together a relation Equiv(P,Q) which is CFG(P,Q) minus any pairs for which RLA(R,P), !RLA(R,Q) or any Killed(R,P) or Killed(R,Q). » if Search works. He had exact datalog rules a bit later; I coded them and gisted them on Zulip recently — I’ll now look for them

lqd (May 28 2018 at 14:06, on Zulip):

https://gist.github.com/lqd/743abae3913bc3785a955b749289ca03

Julien Cretin (ia0) (May 28 2018 at 14:22, on Zulip):

https://github.com/ia0/polonius/compare/master...ia0:condense_cfg <= current implementation, I'll try to look at small examples manually and compare with the naive algorithm

Janito Vaqueiro Ferreira Filho (May 28 2018 at 14:34, on Zulip):

Hello! Here's my first attempt: https://github.com/jvff/polonius/compare/master...jvff:condense-cfg

nikomatsakis (May 28 2018 at 14:36, on Zulip):

@Julien Cretin @Janito Vaqueiro Ferreira Filho (are you two working together or are these competing sets of changes?)

Julien Cretin (ia0) (May 28 2018 at 14:36, on Zulip):

Those are "competing" set of changes

nikomatsakis (May 28 2018 at 14:36, on Zulip):

ok

nikomatsakis (May 28 2018 at 14:37, on Zulip):

seems ok, we can take the "best of both", just wanted to know whether to look at them both

Janito Vaqueiro Ferreira Filho (May 28 2018 at 14:37, on Zulip):

We're three people doing 3 different implementations, but we're mostly learning about the code and learning with each other

emilsp (May 28 2018 at 14:52, on Zulip):

I too am working on a competing solution. But I'm pretty far behind at this point in time.

nikomatsakis (May 28 2018 at 15:04, on Zulip):

/me looks at these branches (sorry for delay)

nikomatsakis (May 28 2018 at 15:08, on Zulip):

ok, so, @Julien Cretin, i'm curious if you got back any stats on how often those changes apply

nikomatsakis (May 28 2018 at 15:08, on Zulip):

if I understood your branch, I guess it does a kind of "pre-pass" to compress the facts?

nikomatsakis (May 28 2018 at 15:09, on Zulip):

one thing I like about this approach: if we rework the AllFacts with custom code, we can then feed them into the naive and optimized algorithms and measure the difference in speed — it gives us a kind of "lower bound" on how effective this can be

nikomatsakis (May 28 2018 at 15:10, on Zulip):

I suspect longer term we would want to code up the compression in :frog: or else maybe even do it on the rustc side, not entirely clear, depends on what precise conditions we wind up with (I suspect we may find that some conditions hold sort of "structurally" much of the time and that rustc could enforce them quite cheaply)

nikomatsakis (May 28 2018 at 15:12, on Zulip):

@Janito Vaqueiro Ferreira Filho looking at your branch, which I think is taking a similar approach, similar comments apply — though I am a bit confused by the merge_edge routine, which seems to basically just delete things relating to the second point, right? (not so much "merging" then...)

Janito Vaqueiro Ferreira Filho (May 28 2018 at 15:16, on Zulip):

Yes, true. It merges the cfg_edge vector though. But the code needs some improvements, and I think tomorrow I'll start playing with the edge_is_mergeable function to see what rules can be applied.

Janito Vaqueiro Ferreira Filho (May 28 2018 at 15:18, on Zulip):

We're wrapping up for today, but we'll be back here tomorrow

nikomatsakis (May 28 2018 at 15:21, on Zulip):

great :)

nikomatsakis (May 28 2018 at 15:21, on Zulip):

I'll try to wake up early tomorrow to overlap with CET as much as possible

lqd (May 29 2018 at 09:37, on Zulip):

another data point: 2870 cfg edges out of the 51896 have no impact on the borrow_live_at results of clap, and most of those are goto edges between blocks. (instead of trying to think about the rules I bruteforced the checks while I was away, here are those results) -- I'll check later today whether filtering those 5% out has interesting performance benefits, if anyone doesn't get to it first :) (maybe we can also easily reverse engineer the rules from the data)

nikomatsakis (May 29 2018 at 09:51, on Zulip):

I have not had time to pursue https://github.com/rust-lang/rust/pull/50938 any further, but it seems related. In particular, it removes a lot of "duplicate" outlives relations — strangely, when I tested polonius with it, it seemed to make things go slower. :/ Not sure why that would be yet!

nikomatsakis (May 29 2018 at 09:52, on Zulip):

that said, we got to land the rustc integration — it was showing some problems with polonius (actually related to #24), in terms of failing tests. This may indicate a problem with the optimized variant.

nikomatsakis (May 29 2018 at 09:52, on Zulip):

/me goes back to his coffee

lqd (May 29 2018 at 09:54, on Zulip):

oh yes I remember we talked about Location:All temporarily creating outlives facts -- the slowdown is interesting indeed. So there are at least 2 directions to investigate, both the outlives and cfg_edge relations

lqd (May 29 2018 at 10:02, on Zulip):

I guess a problem located solely in the optimized variant would be surprising, since it has the same output as the naive variant (except if we're talking errors ofc)

nikomatsakis (May 29 2018 at 10:19, on Zulip):

I guess a problem located solely in the optimized variant would be surprising, since it has the same output as the naive variant

on our tests so far :)

(except if we're talking errors ofc)

I would expect the result to be the same for errors =)

lqd (May 29 2018 at 10:20, on Zulip):

haha yes :)

lqd (May 29 2018 at 10:22, on Zulip):

as I mentioned to Felix, it's also why I really wanted to add the frontend to be able to have more diverse tests on our side :) eg these potentially diverging ones that are only in rustc at the moment

nikomatsakis (May 29 2018 at 10:24, on Zulip):

yes my expectation is that once we have integration going we will add more tests as we find interesting examples

nikomatsakis (May 29 2018 at 10:25, on Zulip):

(Note: I have no evidence that the problem is the opt diverging from naive, just saying it's possible)

nikomatsakis (May 29 2018 at 10:26, on Zulip):

anyway, gotta run for a bit — I should add that I've seen these failures before, they are caused by the return type not being properly enforced everywhere...

lqd (May 29 2018 at 10:26, on Zulip):

sure :)

Julien Cretin (ia0) (May 29 2018 at 12:26, on Zulip):

Shouldn't the region_live_at relation be closed under the subset relation? I mean adding the following rule:
region_live_at(R2, P) :- region_live_at(R1, P), subset(R1, R2, P).
Or is this somehow guaranteed in the nll-facts input?

nikomatsakis (May 29 2018 at 12:28, on Zulip):

that is not necessarily true, no

nikomatsakis (May 29 2018 at 12:29, on Zulip):

region-live-at implies that a reference whose type includes that region will be used

nikomatsakis (May 29 2018 at 12:29, on Zulip):

you can think of regions as sets of loans

nikomatsakis (May 29 2018 at 12:30, on Zulip):

and it is true that loans "flow through" subset in this way

nikomatsakis (May 29 2018 at 12:30, on Zulip):

(that is, if you have L in R1 and R1 <= R2 and region_live_at(R2) at a given point, then borrow_live_at(L))

Julien Cretin (ia0) (May 29 2018 at 12:36, on Zulip):

Ok I see, thanks!

Julien Cretin (ia0) (May 29 2018 at 14:52, on Zulip):

# Status
Added a --condense-cfg flag in https://github.com/ia0/polonius/compare/master...ia0:condense_cfg.
Added the example A of the blog in https://github.com/ia0/polonius/compare/master...ia0:blog_example.
Print errors for naive algorithm in https://github.com/ia0/polonius/compare/master...ia0:naive_errors.
# Observations
The set of errors is preserved for inputs/{{blog,issue-47680}/nll-facts/main,clap-rs/app-parser-\{\{impl\}\}-add_defaults}.
Only issue-47680 has merged nodes.
The clap-rs example is slightly slower when condensing the CFG although the facts are equivalent (only the order changes).

Last update: Nov 22 2019 at 00:05UTC