Stream: t-compiler/wg-polonius

Topic: notes on datafrog


lqd (Sep 24 2019 at 19:30, on Zulip):

I mentioned in the list from a while ago that there was possible work to be done in datafrog: I've looked at a couple of the easy items recently and will explain them here

lqd (Sep 24 2019 at 19:32, on Zulip):

the 1st thing was the idea of more specialized Relations: right now they're all Vecs of tuples

lqd (Sep 24 2019 at 19:32, on Zulip):

very general, always works, :thumbs_up:

lqd (Sep 24 2019 at 19:33, on Zulip):

in some cases, they might consume too much memory, or we're not using the flexibility

lqd (Sep 24 2019 at 19:34, on Zulip):

example: killed. We're propagating loans in origins along the CFG only if the loan is not killed at a specific point
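
(roughly, the propagation rule from the naive variant looks something like this, with killed only ever appearing negated:)

requires(R, B, Q) :-
   requires(R, B, P),
   !killed(B, P),
   cfg_edge(P, Q),
   region_live_at(R, Q).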

lqd (Sep 24 2019 at 19:34, on Zulip):

we're never using this relation to produce tuples

lqd (Sep 24 2019 at 19:35, on Zulip):

thus, we don't really need the data, we only need to know if a specific tuple is absent from this relation

lqd (Sep 24 2019 at 19:36, on Zulip):

for this use-case there could be a specific type of Relation, only interested in the absence of data, for example using a probabilistic data structure like a Bloom filter
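
(a minimal sketch of such a membership-only Relation, with illustrative names rather than an existing datafrog type; an exact set is used here, and a probabilistic structure like a Bloom filter could be swapped in behind the same query:)

use std::collections::HashSet;
use std::hash::Hash;

// Sketch: a relation we never iterate over or join on; we only ask whether a
// specific tuple (e.g. a killed(B, P) fact) is present in it.
struct MembershipRelation<T: Eq + Hash> {
    elements: HashSet<T>,
}

impl<T: Eq + Hash> MembershipRelation<T> {
    fn from_tuples(tuples: impl IntoIterator<Item = T>) -> Self {
        Self { elements: tuples.into_iter().collect() }
    }

    // The only query we need. A probabilistic structure could back this
    // instead, trading exactness for memory.
    fn contains(&self, tuple: &T) -> bool {
        self.elements.contains(tuple)
    }
}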

lqd (Sep 24 2019 at 19:38, on Zulip):

the killed relation is usually pretty small I think, especially compared to the others; clap had the biggest one at 1000 tuples in our dataset (I haven't looked at dumping p.rl.o facts yet, but @Albin Stjerna has some of those benchmarks and then some, so maybe they can confirm or refute this), so the memory saving would be small in absolute terms (but a big relative reduction) for roughly the same query performance at this size

lqd (Sep 24 2019 at 19:39, on Zulip):

after this filter_anti Leaper specialized case, there's also the "filtering" case

lqd (Sep 24 2019 at 19:41, on Zulip):

extend_with also ends up being used for filtering, because filter_with is for semijoins with the source Relation

lqd (Sep 24 2019 at 19:43, on Zulip):

but it's made for proposing as well, so there's some work done in count to prepare for that (here). Again, it's not much, but there are more count calls than intersect calls in the filtering case, so this work could be done in intersect instead when filtering
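
(to illustrate, a sketch of a hypothetical filter Leaper that defers its test from count to intersect, assuming datafrog's Leaper trait of count/propose/intersect; LazyFilter is a made-up name, not an existing datafrog type:)

use datafrog::treefrog::Leaper; // module path assumed

// Sketch: a pure filter that does no work in `count` and performs its test
// in `intersect` instead; `count` only signals "I won't propose anything".
struct LazyFilter<Func> {
    predicate: Func,
}

impl<'leap, Tuple, Val, Func> Leaper<'leap, Tuple, Val> for LazyFilter<Func>
where
    Tuple: Ord,
    Func: Fn(&Tuple) -> bool,
{
    fn count(&mut self, _prefix: &Tuple) -> usize {
        usize::MAX // never the cheapest proposer, so `propose` is never called
    }

    fn propose(&mut self, _prefix: &Tuple, _values: &mut Vec<&'leap Val>) {
        unreachable!("filters do not propose values");
    }

    fn intersect(&mut self, prefix: &Tuple, values: &mut Vec<&'leap Val>) {
        // Do the filtering here: keep all proposed values or drop them all.
        if !(self.predicate)(prefix) {
            values.clear();
        }
    }
}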

lqd (Sep 24 2019 at 19:44, on Zulip):

along the same lines, we're not using region_live_at to produce tuples either

lqd (Sep 24 2019 at 19:44, on Zulip):

therefore another Relation dedicated to filtering could be tried

lqd (Sep 24 2019 at 19:46, on Zulip):

and there's a single index

lqd (Sep 24 2019 at 19:48, on Zulip):

just to have an example, what I've tried here was an FxHashMap of FxHashSets: the construction is slower, and the Relation is quite big, but it does make a relative difference on clap (which is the only benchmark I had tried in all these tests)
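
(one plausible shape for that map-of-sets, with rustc_hash's Fx types; the Origin/Point index types and the keying by origin are assumptions for illustration:)

use rustc_hash::{FxHashMap, FxHashSet};

// Hypothetical index types standing in for the interned indices.
type Origin = u32;
type Point = u32;

// Sketch: region_live_at as a map from an origin to the set of points where
// it is live, used only for membership tests rather than producing tuples.
struct LivenessIndex {
    live_at: FxHashMap<Origin, FxHashSet<Point>>,
}

impl LivenessIndex {
    fn new(facts: impl IntoIterator<Item = (Origin, Point)>) -> Self {
        let mut live_at: FxHashMap<Origin, FxHashSet<Point>> = FxHashMap::default();
        for (origin, point) in facts {
            live_at.entry(origin).or_default().insert(point);
        }
        Self { live_at }
    }

    fn is_live_at(&self, origin: Origin, point: Point) -> bool {
        self.live_at
            .get(&origin)
            .map_or(false, |points| points.contains(&point))
    }
}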

lqd (Sep 24 2019 at 19:51, on Zulip):

setting up the data shouldn't be much of a concern, especially compared to what we do right now, where rustc builds Vecs that are cloned a bit, sorted, and deduplicated; we materialize the free regions everywhere on the CFG, compute liveness and some things about initialization (which I assume is related to liveness as well), sometimes multiple times, etc. :) but we'll get there

lqd (Sep 24 2019 at 19:54, on Zulip):

ideally we should be getting the Relations, all ready to go, in the variants, but of course there's a slight caveat: Variables take ownership of them, so this can probably only be partially done now; at least enough to fix the easy issue of the Hybrid variant

lqd (Sep 24 2019 at 19:55, on Zulip):

on another theme, all the Relations, Variables, Leapers, etc. are very generic in datafrog, and generics + a crate dependency = no inlining opportunities; so that's an easy PR right there :)
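
(a sketch of the kind of one-line change meant, assuming it is mostly about adding #[inline] hints to small, hot methods; the type here is illustrative, not datafrog's actual code:)

// In the upstream crate: #[inline] hints that this small method is worth
// inlining at its call sites, including across the crate boundary.
pub struct SortedRelation<T: Ord> {
    elements: Vec<T>, // assumed sorted and deduplicated
}

impl<T: Ord> SortedRelation<T> {
    #[inline]
    pub fn contains(&self, value: &T) -> bool {
        self.elements.binary_search(value).is_ok()
    }
}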

lqd (Sep 24 2019 at 19:57, on Zulip):

and lastly for now, there's the well-formedness of leapjoins: this assert bakes in the assumption that a leapjoin should produce some tuples

lqd (Sep 24 2019 at 19:58, on Zulip):

however

errors(B, P) :-
   requires(R, B, P),
   invalidates(B, P),
   region_live_at(R, P).

is only filtering requires

lqd (Sep 24 2019 at 19:59, on Zulip):

relaxing this constraint wouldn't by itself make a lot of things faster, since it's just the regular joins; however, it could allow removing some indexes

lqd (Sep 24 2019 at 20:00, on Zulip):

and maintaining those, especially with bigger relations like 2 out of the 3 here, can matter

lqd (Sep 24 2019 at 20:01, on Zulip):

there might be a small Leaper (like the ValueFilter/PrefixFilter) we could add to just pass through the source tuples, so that it feels like tuples are produced even though they are just going down the chain to be filtered by the next 2 leapers
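
(one way such a pass-through Leaper could look, under the same Leaper-trait assumption as before; PassThrough and the single dummy value are illustrative, not an existing datafrog API:)

use datafrog::treefrog::Leaper; // module path assumed

// Sketch: a leaper that always proposes exactly one dummy value, so that a
// leapjoin whose other leapers are all filters still has a proposer and the
// source tuples effectively pass through to be filtered by the others.
struct PassThrough;

impl<'leap, Tuple: Ord> Leaper<'leap, Tuple, ()> for PassThrough {
    fn count(&mut self, _prefix: &Tuple) -> usize {
        1 // always the cheapest proposer
    }

    fn propose(&mut self, _prefix: &Tuple, values: &mut Vec<&'leap ()>) {
        values.push(&()); // one dummy value per source tuple
    }

    fn intersect(&mut self, _prefix: &Tuple, _values: &mut Vec<&'leap ()>) {
        // Nothing to restrict: the dummy value survives unless a filter
        // leaper clears it.
    }
}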

lqd (Sep 24 2019 at 20:04, on Zulip):

I haven't done anything for this last point, while for the others I have experiments here and there (which I accidentally depended on in the previous equality-tracking prototype) so if anyone is interested, this could be a thing to look at

lqd (Sep 24 2019 at 20:06, on Zulip):

(also another easy one, in moving from the original leapjoin API of an array of Leapers, to the tuple of Leapers datafrog uses now, we have "lost" the ability to have a leapjoin with a single extend_with leaper — again, not really efficiency-critical except maybe removing indexes, and that should be a one-line PR)

Albin Stjerna (Sep 24 2019 at 20:30, on Zulip):

the killed relation is usually pretty small I think, especially compared to the others; clap had the biggest one at 1000 tuples in our dataset (I haven't looked at dumping p.rl.o facts yet, but Albin Stjerna has some of those benchmarks and then some, so maybe they can confirm or refute this), so the memory saving would be small in absolute terms (but a big relative reduction) for roughly the same query performance at this size

I'll read and reply tomorrow, but yes, killed is relatively small. Still, I suspect it could do with compression. Here is a box plot of the number of tuples in my dataset!

Albin Stjerna (Sep 24 2019 at 20:33, on Zulip):

Speaking of this, we don't really need to compute liveness for every variable, right? We only need it if the variable "connects" to an origin, either in the variable's type or in a reference inside a struct that implements its own Drop? I think there is definitely some filtering that can be done here too.

lqd (Sep 24 2019 at 20:35, on Zulip):

a lot of filtering can be done indeed, for example when a value ultimately flows into 'static

lqd (Sep 24 2019 at 20:36, on Zulip):

and thank you for the tuple numbers :)

Albin Stjerna (Sep 25 2019 at 13:46, on Zulip):

Actually, I think all the inputs relating to variables are just used for filtering as well, though that may be positive filtering; I suspect this is best solved at the fact-generation phase (i.e. only emitting liveness inputs for variables whose liveness we may actually care about)

Albin Stjerna (Sep 25 2019 at 13:48, on Zulip):

@lqd In your benchmarks, how are you measuring time? Do you use the complete solve time or just the "borrow check" part of it, ignoring liveness? I'm worried that liveness now occupies a very large portion of the runtime, given how the correlations between total runtime and input variable sizes look (the number of variables and other very liveness-related inputs completely dominate).

Albin Stjerna (Sep 25 2019 at 13:48, on Zulip):

I really want to fix that benchmark mode

lqd (Sep 25 2019 at 13:49, on Zulip):

I'm benchmarking the analysis itself, so I'm only using the computation_start.elapsed() value; the rest is either reading files and interning, cloning data, or computing other things

lqd (Sep 25 2019 at 13:51, on Zulip):

(this should all probably go in a dedicated topic, but yes, liveness filtering should probably be done in rustc like it's done in NLL)

Albin Stjerna (Sep 25 2019 at 14:00, on Zulip):

I'm benchmarking the analysis itself, so I'm only using the computation_start.elapsed() value; the rest is either reading files and interning, cloning data, or computing other things

Ok, so you measure liveness computations and everything?

lqd (Sep 25 2019 at 14:16, on Zulip):

no, liveness and initialization are done before creating the analysis timer; this is what I'm measuring
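
(schematically, with illustrative names, the split is something like:)

use std::time::Instant;

// Rough sketch of the measurement split described above: reading and
// interning the facts, cloning data, and computing liveness/initialization
// all happen before the timer, and only the analysis itself is measured.
fn benchmark_analysis() {
    // ... read and intern facts, clone data, compute liveness and
    // initialization ...
    let computation_start = Instant::now();
    // ... run the borrow-check analysis (Naive, Hybrid, ...) ...
    let analysis_duration = computation_start.elapsed();
    println!("analysis: {:?}", analysis_duration);
}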

Albin Stjerna (Sep 25 2019 at 14:17, on Zulip):

Ah, ok. Fair enough!

Albin Stjerna (Sep 25 2019 at 14:17, on Zulip):

So then we don't know if those explode completely

Albin Stjerna (Sep 25 2019 at 14:17, on Zulip):

Which, I guess, makes sense

lqd (Sep 25 2019 at 14:33, on Zulip):

they do not explode completely on what I've tried, eg clap, where Naive + filtering is like <300ms

Albin Stjerna (Sep 25 2019 at 15:00, on Zulip):

I'm really looking forward to some systematic benchmarking at some point though :)

Albin Stjerna (Sep 25 2019 at 15:01, on Zulip):

In particular if we could get continuous benchmarking

Last update: Nov 15 2019 at 20:50UTC