Stream: t-compiler/wg-nll

Topic: optimizing-transitive-closure


nikomatsakis (May 11 2018 at 23:20, on Zulip):

I spent some time optimizing the borrow-check transitive closure computation. The more optimized version no longer computes the full transitive closure at each point; instead, it only computes the transitive closure for those regions that are going out of scope along an edge. This can probably be made better still, but already it makes quite a difference:

Mr-Darcy. cargo run --release -- -a TimelyOpt inputs/clap-rs/app-parser-{{impl}}-add_defaults/ --skip-tuples
    Finished release [optimized] target(s) in 0.07s
     Running `target/release/borrow-check -a TimelyOpt 'inputs/clap-rs/app-parser-{{impl}}-add_defaults/' --skip-tuples`
--------------------------------------------------
Directory: inputs/clap-rs/app-parser-{{impl}}-add_defaults/
Time: 63.513s
Mr-Darcy. cargo run --release -- -a Naive inputs/clap-rs/app-parser-{{impl}}-add_defaults/ --skip-tuples
    Finished release [optimized] target(s) in 0.22s
     Running `target/release/borrow-check -a Naive 'inputs/clap-rs/app-parser-{{impl}}-add_defaults/' --skip-tuples`
--------------------------------------------------
Directory: inputs/clap-rs/app-parser-{{impl}}-add_defaults/
Time: 131.559s
nikomatsakis (May 11 2018 at 23:21, on Zulip):

I wonder if tables work. Let's find out.

version time
before 131s
after 63s
nikomatsakis (May 11 2018 at 23:21, on Zulip):

obviously still some way to go

nikomatsakis (May 11 2018 at 23:26, on Zulip):

though I strongly suspect that https://github.com/rust-lang-nursery/borrow-check/issues/20 would help a lot -- @qmx, did you ever do anything in that direction btw?

nikomatsakis (May 11 2018 at 23:27, on Zulip):

I gotta figure out the best way to do some profiling of the timely stuff though. It's sort of opaque to me still. Frank McSherry recommended adding count calls at one point to try and observe how many tuples were flying around...

nikomatsakis (May 11 2018 at 23:29, on Zulip):

made one other improvement, and now it's done to 40s

nikomatsakis (May 11 2018 at 23:39, on Zulip):

opened a pull request. Not sure who might want to review it =)

qmx (May 11 2018 at 23:40, on Zulip):

That's my weekend project

nikomatsakis (May 11 2018 at 23:40, on Zulip):

ok. I'm stuck in GRU for a few more hours so I may poke at it

nikomatsakis (May 11 2018 at 23:41, on Zulip):

or something else, I'm going to first see if I can instrument the code more to get more insights

nikomatsakis (May 11 2018 at 23:41, on Zulip):

I still can't believe it's taking 40s :)

qmx (May 11 2018 at 23:41, on Zulip):

Go for it, I'm studying the basics

nikomatsakis (May 11 2018 at 23:41, on Zulip):

how many fricking billions of computer cycles is that :P

nikomatsakis (May 11 2018 at 23:42, on Zulip):

(good news: I checked, and my new analysis produces precisely the same final result as the old one)

nikomatsakis (May 11 2018 at 23:48, on Zulip):

woah, this surprises me, I have to admit:

nikomatsakis (May 11 2018 at 23:48, on Zulip):
Mr-Darcy. Mr-Darcy. wc -l outlives.facts
  534327 outlives.facts
Mr-Darcy. wc -l region_live_at.facts
  782146 region_live_at.facts
Mr-Darcy. wc -l cfg_edge.facts
   51896 cfg_edge.facts
nikomatsakis (May 11 2018 at 23:48, on Zulip):

well, maybe not the region-live-at, but still, those are big numbers

nikomatsakis (May 11 2018 at 23:48, on Zulip):

esp. the first one

nikomatsakis (May 11 2018 at 23:49, on Zulip):

and an awful lot of those are bidirectional, just by inspection:

nikomatsakis (May 11 2018 at 23:49, on Zulip):
"\'_#13627r"    "\'_#7r"    "Mid(bb0[3])"
"\'_#13628r"    "\'_#8r"    "Mid(bb0[3])"
"\'_#7r"    "\'_#13627r"    "Mid(bb0[3])"
"\'_#8r"    "\'_#13628r"    "Mid(bb0[3])"
qmx (May 11 2018 at 23:50, on Zulip):

huh?

nikomatsakis (May 11 2018 at 23:51, on Zulip):

when we have an invariant lifetime or are just requiring equality, we will generate 'a: 'b and 'b: 'a

nikomatsakis (May 11 2018 at 23:51, on Zulip):

so two edges

nikomatsakis (May 11 2018 at 23:51, on Zulip):

I am hypothesizing that this maybe contributes to the remarkable number of outlives facts there (500k)

nikomatsakis (May 11 2018 at 23:51, on Zulip):

but i'm not sure if that's really true

nikomatsakis (May 11 2018 at 23:51, on Zulip):

as I dig a bit more

nikomatsakis (May 11 2018 at 23:52, on Zulip):

mm, actually, it does seem to happena lot

nikomatsakis (May 11 2018 at 23:52, on Zulip):

e.g. here as well:

"\'_#563r"  "\'_#559r"  "Mid(bb240[6])"
"\'_#564r"  "\'_#560r"  "Mid(bb240[6])"
"\'_#565r"  "\'_#561r"  "Mid(bb240[6])"
"\'_#566r"  "\'_#562r"  "Mid(bb240[6])"
"\'_#559r"  "\'_#563r"  "Mid(bb240[6])"
"\'_#560r"  "\'_#564r"  "Mid(bb240[6])"
"\'_#561r"  "\'_#565r"  "Mid(bb240[6])"
"\'_#562r"  "\'_#566r"  "Mid(bb240[6])"
qmx (May 11 2018 at 23:52, on Zulip):

would it make sense to add metadata to an edge? like marking it as a invariant

qmx (May 11 2018 at 23:53, on Zulip):

or even better, do we even care that things are invariant or not?

nikomatsakis (May 11 2018 at 23:55, on Zulip):

I'm not sure yet what makes sense, I just think it's interesting.

nikomatsakis (May 11 2018 at 23:55, on Zulip):

you could imagine that adding a new relation -- like subset -- just for equality

nikomatsakis (May 11 2018 at 23:56, on Zulip):

but it would make everyhting more complex

nikomatsakis (May 11 2018 at 23:56, on Zulip):

I'd prefer not to

nikomatsakis (May 11 2018 at 23:56, on Zulip):

but still might be something we can exploit somehow

nikomatsakis (May 12 2018 at 00:18, on Zulip):

oh, hmm. I see part of the problem is the hack I did around the return type, which causes outlives relations to be added at all points

nikomatsakis (May 12 2018 at 00:54, on Zulip):

https://github.com/rust-lang-nursery/borrow-check/issues/24

Frank McSherry (May 12 2018 at 09:59, on Zulip):

Here I is! I'm heading over to the repo to try and page in the Datalog and see if I can say something intelligent. Immediate (superficial) response: if there are equivalence classes of nodes all mutually connected by bi-directional edges, then coalescing them down could help a lot. The SCC logic does this for transitive closure, but it may need to be something different here (not up to speed enough to be sure).

Frank McSherry (May 12 2018 at 10:18, on Zulip):

The times you are seeing do seem quite large for the number of tuples you are producing as output. By comparison, using DD to do a memory aliasing computation on the linux kernel produces ~1B facts and takes about 200s; a dataflow (null pointer flow) computation on httpd produces 10M output facts and takes about 10s.

I've got the code and am starting to work through it, but one thing that stands out is a large number of "cyclic joins", which are triangle-like joins of the form:

Result(x,y,z) <- A(x,y), B(y,z), C(x,z)

These can be painful because the A(x,y), B(y,z) join may produce many intermediate results, which may then be substantially reduced by C(x,z). This may be fundamental, but there are some clever ways to execute these joins (e.g. in this example of triangle counting using differential dataflow).

I'll take a look and see if re-wiring these joins to use the improved join patterns helps. One way to "check" is to take the final results for the relations and check out the cardinalities. If for example y has a high degree in A and B then A(x,y),B(y,z) will probably be massive, perhaps much larger than C, and the improved algorithms will likely help.

Frank McSherry (May 12 2018 at 10:25, on Zulip):

Another optimization that the triangle example above points out, which could shave constant factors but not asymptotic terms, is to use the "arrangement" infrastructure to arrange each (collection, key) pair at most once.

For example, there is a lot of

some_other_collection.semijoin(&region_live_at);

in the code, and each invocation will form a new indexed form of the contents of region_live_at. Instead, you can write

let region_live_at_by_self = region_live_at.arrange_by_self();

which gives you an arranged stream, one that you can use with join_core to effect a semijoin without rebuilding a region_live_at index for each use:

some_other_collection.join_core(&region_live_at_by_self, |&key, &val, &()| Some((key, val)));

This can reduce the memory and compute footprint by factors that depend on the amount of re-use of these relations. This will not take anything from 40s to 1s, though.

Frank McSherry (May 12 2018 at 11:03, on Zulip):

Another random (possibly out-dated) observation: In the computation of subset there is the rule

    // subset(R1, R3, P) :-
    //   subset(R1, R2, P),
    //   subset(R2, R3, P).

These sorts of rules are "quadratic" and often terrifying. The problem is that when we show up with 7.5M facts in subset (its final size) then there is a potential massive explosion of facts here, even if most of them end up being redundant.

In a transitive closure computation, we would likely instead write

    // subset(R1, R3, P) :-
    //   subset(R1, R2, P),
    //   outlives(R2, R3, P).

which grows subset more slowly (over more rounds) but also doesn't experience a massive (potentially quadratic) derivation spike in any one round. If I make that change with the Naive strategy the time to derive subset drops from 40s down to 15s.

IMPORTANTLY: This change also produces the wrong answer, because you have other subset facts produced by crawling along the cfg_edge relation, so it isn't always applicable, but I thought I'd point it out anyhow as an example of how some restructuring can mitigate some of the performance pain.

Frank McSherry (May 12 2018 at 11:14, on Zulip):

Another random observation: in the Naive execution, although subset produces 7.5M distinct derived results, there are actually 59,275,254 derived facts over 1,360 rounds. This implies a certain amount of redundancy, which might be legit but is also starting to explain where the 40s are coming from. :D

Off to look into the TimelyOpt variant and see if I can grok that.

lqd (May 12 2018 at 11:14, on Zulip):

IIRC this subset rule was where most of the time came from in the Naive computation (eg removing it dropped from 140s to 9s)

lqd (May 12 2018 at 11:16, on Zulip):

(the results were then very wrong of course ;)

Frank McSherry (May 12 2018 at 11:21, on Zulip):

Looking at the new version of the logic, the first question that comes up (applies to the old version too) is whether it is important to produce all of subset and then determine requires, or whether the derivation of requires can drive the derivation of subset. I.e. it seems we only need subset here:

    // requires(R2, B, P) :-
    //   requires(R1, B, P),
    //   subset(R1, R2, P).

which might mean we can co-derive subset and requires together, restricting the derivation of subset to those (R1, P) present in requires.

Frank McSherry (May 12 2018 at 11:25, on Zulip):

Similarly, but perhaps less complicatedly, can we fuse the subset logic into the derivation of requires?

In the Naive derivation, we have a bunch of rules like

    // subset(R1, R2, P) :- outlives(R1, R2, P).

    // subset(R1, R3, P) :-
    //   subset(R1, R2, P),
    //   subset(R2, R3, P).

    // subset(R1, R2, Q) :-
    // subset(R1, R2, P),
    //   cfg_edge(P, Q),
    //   (region_live_at(R1, Q); universal_region(R1)),
    //   (region_live_at(R2, Q); universal_region(R2)).

and it seems possible that we could replace

     // requires(R2, B, P) :-
     //   requires(R1, B, P),
     //   subset(R1, R2, P).

with flavors of the above rules and avoid deriving subset at all.

Frank McSherry (May 12 2018 at 11:41, on Zulip):

Niko asked elsewhere

I find myself wanting to do a kind of “split” – that is, take a set of tuples, and do a join and an antijoin, and process the resulting tuples differently depending on which way they go. Is there an optimized way to do that?

The answer should be yes. Let me try talking it out to convince myself. This is the implementation of antijoin:

        self.concat(&self.semijoin(other).negate())

so it computes self.semijoin(other) as part of its definition. Essentially it determines an antijoin by taking the original stream and subtracting off the semijoined terms. You could totally do the same thing by hand:

let semi = rel1.semijoin(&rel2);
let anti = semi.negate().concat(&rel1);

This does no more work than antijoin and gives you a handle to both semi and anti, which should be what you want.

Important caveat: antijoin only works correctly under the assumption that rel2 has maximum cardinality one; the semijoin implementation is not a restriction so much as a multiplication, so if rel2 has records with multiplicity 2 or more, the antijoin logic is messed up. This can always be fixed with a distinct, but it isn't done automatically in case you have other ways to guarantee that distinctness holds.

lqd (May 12 2018 at 12:10, on Zulip):

(if it helps visualize the datalog, this is a graph of the Naive rules) Naive-precedence-graph.png

Frank McSherry (May 12 2018 at 12:17, on Zulip):

Ooo, neat.

lqd (May 12 2018 at 12:20, on Zulip):

(this is generated by soufflé's debug-report) it could be neat to have a graph of the timely computation, the different operators and so on

Frank McSherry (May 12 2018 at 12:20, on Zulip):

Some more data from TimelyOpt: the new subset computation takes about 16s and produces 2,714,121 distinct facts. The requires computation then takes 35s longer and produces 9,585,213 distinct facts. It produces about 30m facts pre-distinctness, which isn't massively out of line with a 1us per fact rate that other computations are doing. If there really are 9.5M result requires tuples needed, it might be that it is time to start engineering the computation rather than the query.

Frank McSherry (May 12 2018 at 12:21, on Zulip):

@lqd Yeah, we've had that in the past. The logging machinery logs all dataflow creation events, and you can feed them in to a visualizer of your choosing (Andrea Lattuada had one that would produce static graphs). You .. can end up with a bit of a tangle, though. :)

Frank McSherry (May 12 2018 at 12:22, on Zulip):

That picture is much prettier (simpler, let's say) than what we usually produce (with all the map and concat and loop feedback nodes and edges).

lqd (May 12 2018 at 12:22, on Zulip):

:)

lqd (May 12 2018 at 13:32, on Zulip):

still trying out soufflé with the Naive rules, its profiler is cool, eg:

Relations — sorted by Total time, desc

Name ID Total Time Non Rec Time Rec Time Copy Time Tuples % of Time % of Tuples
subset R1 16.07m 0.056s 16.03m 2.31s 7.53M 92.3 42.0
requires R2 78.9s 0.002s 72.9s 6.06s 9.59M 7.56 53.4
borrow_live_at R3 1.43s 1.43s 0 0 832K 0.137 4.64

Rules — sorted by Total time, desc

Name ID Total Time Non Rec Time Rec Time Copy Time Tuples % of Time % of Tuples
subset(R1,R3,P) :- subset(R1,R2,P), subset(R2,R3,P). C1.1 15.80m 0 15.75m 2.51s 4.97M 90.8 27.7
requires(R2,B,P) :- requires(R1,B,P), subset(R1,R2,P). C2.1 56.4s 0 53.6s 2.74s 5.42M 5.40 30.2
requires(R,B,Q) :- requires(R,B,P), cfg_edge(P,Q), region_live_at(R,Q), !killed(B,P). C2.2 12.2s 0 10.6s 1.54s 3.06M 1.16 17.0
requires(R,B,Q) :- requires(R,B,P), cfg_edge(P,Q), universal_region(R), !killed(B,P). C2.3 9.08s 0 8.52s 0.556s 1.10M 0.869 6.14
subset(R1,R2,Q) :- subset(R1,R2,P), cfg_edge(P,Q), region_live_at(R1,Q), region_live_at(R2,Q). C1.2 6.37s 0 5.45s 0.921s 1.82M 0.610 10.2
subset(R1,R2,Q) :- subset(R1,R2,P), cfg_edge(P,Q), region_live_at(R1,Q), universal_region(R2). C1.3 3.89s 0 3.81s 0.076s 151K 0.372 0.843
subset(R1,R2,Q) :- subset(R1,R2,P), cfg_edge(P,Q), universal_region(R1), region_live_at(R2,Q). C1.4 3.67s 0 3.65s 0.021s 42.4K 0.352 0.236
subset(R1,R2,Q) :- subset(R1,R2,P), cfg_edge(P,Q), universal_region(R1), universal_region(R2). C1.5 3.32s 0 3.32s 0.006s 11.2K 0.318 0.062
borrow_live_at(P,B) :- requires(R,B,P), region_live_at(R,P). N3.1 1.09s 1.09s 0 0 803K 0.104 4.47
borrow_live_at(P,B) :- requires(R,B,P), universal_region(R). N3.2 0.325s 0.325s 0 0 29.8K 0.031 0.166
subset(R1,R2,P) :- outlives(R1,R2,P). N1.1 0.056s 0.056s 0 0 534K 0.005 2.98
requires(R,B,P) :- borrow_region(R,B,P). N2.1 0.002s 0.002s 0 0 1.89K 0.000 0.011
lqd (May 12 2018 at 13:33, on Zulip):

(yes it takes 17mins over the clap facts on this laptop :p) I'll try and get some numbers from the TimelyOpt rules

Frank McSherry (May 12 2018 at 13:38, on Zulip):

That is super cool (the souffle profiling). 17 mins is mental though. DD takes less time even if you count the build time. :D Is that the compiled mode, or interpreted, @lqd?

lqd (May 12 2018 at 13:39, on Zulip):

it's in compiled mode, and with 2 threads ;) but as niko said, it's just the original rule ordering, ie reordering it better would surely make it a lot faster — but with a 15mins feedback loop I'm not even going to look into that :p

lqd (May 12 2018 at 13:40, on Zulip):

niko mentioned having seen 10x better times from reordering

Frank McSherry (May 12 2018 at 13:40, on Zulip):

Also, more measurements and thoughts on TimelyOpt. The borrow_live_at relation, which I think is the main output of the analysis, has only 832,392 output facts, which is a substantial reduction from the 10M or so from requires. The rule that produces it is

    // borrow_live_at(B, P) :- requires(R, B, P), region_live_at(R, P)

which makes me think that there is the potential to short-circuit a bunch of the requires derivations. It isn't obvious how to do this yet, but it seems reasonable that we might be able to re-write the requires derivation so that each borrow_live_at fact acts as an antibody and suppresses further derivations of the fact (though, we have to avoid interfering with other derivations).

lqd (May 12 2018 at 13:42, on Zulip):

(I mostly wanted to see time %ages, and tuple %ages and absolute counts, per rule)

lqd (May 12 2018 at 13:43, on Zulip):

(and for comparison the Naive DD runs in 200-240s on the same machine)

Frank McSherry (May 12 2018 at 13:47, on Zulip):

Gah, Niko has disabled multi-threaded execution here:

    timely::execute_from_args(vec![].into_iter(), { ...

That would probably bump up perf a bit more.

Frank McSherry (May 12 2018 at 13:54, on Zulip):

Souffle's profiling info also looks better than timely/differential's:

TimelyOpt-profiling.png

Frank McSherry (May 12 2018 at 13:56, on Zulip):

About a 10% perf bump using github master DD, fwiw. (vs crates.io version).

lqd (May 12 2018 at 14:00, on Zulip):

@Frank McSherry here's an upload of its html profiler for you, if you wanted to check it out: https://lqd.github.io/profiler_html/main.html

Frank McSherry (May 12 2018 at 14:01, on Zulip):

steals

lqd (May 12 2018 at 14:02, on Zulip):

(the data is separate, it's a json file, very easy to steal :p)

Frank McSherry (May 12 2018 at 14:04, on Zulip):

The ETHZ group has had a mind to do a D3 attachment that takes the timely logs (they stream out too) into a live dataflow graph, so you can watch the computation and see these sorts of stats live. Just need to find someone who wants to spend the time. >.<

lqd (May 12 2018 at 17:26, on Zulip):

Just in case it's useful to anyone, here's some info about the (tentative) datalog soufflé version of TimelyOpt, with precedence graph timelyopt-precedence-graph.png and SCC graph timelyopt-scc-graph.png.

And a profile, for tuple numbers & relative %ages:
Relations — sorted by Total time, desc

Name ID Total Time Non Rec Time Rec Time Copy Time Tuples % of Time % of Tuples
requires R4 52.8s 268.8µs 46.3s 6.50s 9.59M 50.2 71.9
dead_can_reach R3 36.1s 0 36.1s 0.074s 170K 34.4 1.27
subset R1 12.6s 0.061s 11.4s 1.13s 2.70M 12.0 20.3
live_to_dead_regions R2 2.01s 0 2.01s 0.008s 38.1K 1.92 0.286
borrow_live_at R5 1.53s 1.53s 0 0 832K 1.46 6.25

Rules — sorted by Total time, desc

Name ID Total Time Non Rec Time Rec Time Copy Time Tuples % of Time % of Tuples
dead_can_reach(R1,R3,P,Q) :- dead_can_reach(R1,R2,P,Q), subset(R2,R3,P), !region_live_at(R2,Q), !universal_region(R2). C3.2 30.8s 0 30.8s 0.061s 94.2K 29.5 0.706
requires(R2,B,P) :- requires(R1,B,P), subset(R1,R2,P). C4.1 30.3s 0 24.7s 5.63s 8.73M 29.0 65.5
requires(R,B,Q) :- requires(R,B,P), cfg_edge(P,Q), region_live_at(R,Q), !killed(B,P). C4.2 12.4s 0 11.9s 0.520s 807K 11.8 6.05
requires(R,B,Q) :- requires(R,B,P), cfg_edge(P,Q), universal_region(R), !killed(B,P). C4.3 9.61s 0 9.57s 0.032s 50.1K 9.18 0.376
dead_can_reach(R2,R3,P,Q) :- live_to_dead_regions(_R1,R2,P,Q), subset(R2,R3,P). C3.1 5.25s 0 5.20s 0.049s 75.8K 5.02 0.568
subset(R1,R2,Q) :- subset(R1,R2,P), cfg_edge(P,Q), region_live_at(R1,Q), region_live_at(R2,Q). C1.1 4.32s 0 2.93s 1.39s 2.15M 4.13 16.2
subset(R1,R3,Q) :- live_to_dead_regions(R1,R2,P,Q), dead_can_reach(R2,R3,P,Q), region_live_at(R3,Q). C1.5 2.97s 0 2.96s 0.010s 15.4K 2.84 0.116
subset(R1,R2,Q) :- subset(R1,R2,P), cfg_edge(P,Q), region_live_at(R1,Q), universal_region(R2). C1.2 2.04s 0 2.04s 0 0 1.95 0.000
subset(R1,R2,Q) :- subset(R1,R2,P), cfg_edge(P,Q), universal_region(R1), region_live_at(R2,Q). C1.3 1.70s 0 1.70s 11.6µs 18 1.62 0.000
subset(R1,R2,Q) :- subset(R1,R2,P), cfg_edge(P,Q), universal_region(R1), universal_region(R2). C1.4 1.63s 0 1.63s 0 0 1.56 0.000
borrow_live_at(P,B) :- requires(R,B,P), region_live_at(R,P). N5.1 1.16s 1.16s 0 0 803K 1.11 6.02
live_to_dead_regions(R1,R2,P,Q) :- subset(R1,R2,P), cfg_edge(P,Q), region_live_at(R1,Q), !region_live_at(R2,Q), !universal_region(R2). C2.1 1.14s 0 1.13s 0.011s 17.3K 1.09 0.130
live_to_dead_regions(R1,R2,P,Q) :- subset(R1,R2,P), cfg_edge(P,Q), universal_region(R1), !region_live_at(R2,Q), !universal_region(R2). C2.2 0.827s 0 0.813s 0.013s 20.8K 0.790 0.156
borrow_live_at(P,B) :- requires(R,B,P), universal_region(R). N5.2 0.346s 0.346s 0 0 29.8K 0.331 0.224
subset(R1,R2,P) :- outlives(R1,R2,P). N1.1 0.061s 0.061s 0 0 534K 0.058 4.01
requires(R,B,P) :- borrow_region(R,B,P). N4.1 248.9µs 248.9µs 0 0 1.89K 0.000 0.014
lqd (May 12 2018 at 17:31, on Zulip):

(this is not taking into account niko's recent transitive loan propagation removal)

nikomatsakis (May 14 2018 at 09:14, on Zulip):

Some more data from TimelyOpt: the new subset computation takes about 16s and produces 2,714,121 distinct facts. The requires computation then takes 35s longer and produces 9,585,213 distinct facts. It produces about 30m facts pre-distinctness, which isn't massively out of line with a 1us per fact rate that other computations are doing. If there really are 9.5M result requires tuples needed, it might be that it is time to start engineering the computation rather than the query.

I'm catching up here, but @Frank McSherry how did you gather these numbers? Can you did it with my latest version as well (which is substantially faster)?

nikomatsakis (May 14 2018 at 09:15, on Zulip):

Niko has disabled multi-threaded execution here

Oh, I did? =) I forgot about that. We should see what happens, though I think there's still plenty of room to improve on a single thread.

nikomatsakis (May 14 2018 at 09:17, on Zulip):

@Frank McSherry

Another optimization that the triangle example above points out, which could shave constant factors but not asymptotic terms, is to use the "arrangement" infrastructure to arrange each (collection, key) pair at most once.

So I saw the "arrangement" infrastructure but I totally didn't understand it. Are there docs or examples you can point me at?

Frank McSherry (May 14 2018 at 10:58, on Zulip):

I pulled the new code, and will try it out tonight. Iirc, things get a bit more complicated and I have a less good intuition about what is what, but I can attach the counters to the stream output and report back.

Frank McSherry (May 14 2018 at 11:00, on Zulip):

Yeah, let's see. I have a pile of non-public code that demonstrates it. I'll send you a copy via email (it .. relates to some other published work of people I don't want to antagonize). I'll also take a swing through the code and see if there are any opportunities for it; there were in the Naive code, and I may show you those changes to explain, even if that code is mostly dead.

lqd (May 14 2018 at 12:01, on Zulip):

wouldn't it be interesting to have a way to see those counters, maybe displayed with the -v flag ?

Frank McSherry (May 14 2018 at 12:06, on Zulip):

I can totally add that @lqd. It's just an optional e.g.

subset
    .map(|_| ())
    .consolidate()
    .inspect(|x| println!("Subset counts: {:?}", x.2));
Frank McSherry (May 14 2018 at 12:07, on Zulip):

(( Currently un-breaking something I broke in timely, so distracted, but can get around to that too ))

nikomatsakis (May 14 2018 at 13:01, on Zulip):

@Frank McSherry thanks; I tried attaching the counters but had a hard time interpreting the output. But I was missing the .map(|_| ()).consolidate(), which I think would make all the difference (I was seeing way too many tuples)

nikomatsakis (May 14 2018 at 13:07, on Zulip):

@Frank McSherry when I use the "arrange", would I want to arrange both things being joined?

nikomatsakis (May 14 2018 at 13:08, on Zulip):

I guess that only makes sense if it is joined more than once, and often one side is an "ephemeral" result

nikomatsakis (May 14 2018 at 13:10, on Zulip):

also, reading the code, I guess I could not do something like .semijoin(&arranged_region_live) -- the code seems to call arrange_by_self on the result, and I don't see a "fast path" for Arranged

nikomatsakis (May 14 2018 at 13:10, on Zulip):

that is, Arranged doesn't implement arrange_by_self as a no-op

nikomatsakis (May 14 2018 at 13:10, on Zulip):

so I guess I have to use join_core then?

nikomatsakis (May 14 2018 at 13:11, on Zulip):

which admittedly doesn't seem too bad

nikomatsakis (May 14 2018 at 13:41, on Zulip):

ok, I think I have it figured out

nikomatsakis (May 14 2018 at 13:50, on Zulip):

that improved things to 14.2s; I updated the PR

nikomatsakis (May 14 2018 at 14:40, on Zulip):

(fyi, just quickly hacking in a -w 8 argument on my local machine seemed to be approx a 2x win -- 7.9s instead of 14s)

Frank McSherry (May 14 2018 at 14:53, on Zulip):

@Frank McSherry when I use the "arrange", would I want to arrange both things being joined?

As you note, it's only helpful if there is re-use, but even if you don't arrange things the non-arranged join and such will internally do an arrange for you. They all call in to join_core, and put arranges in place if you haven't already done so yourself (and they do not re-arrange an arrangement, so shouldn't be worries there).

nikomatsakis (May 14 2018 at 14:53, on Zulip):

@Frank McSherry yep, I kind of pieced it together from reading the source. Using arrange proved to be a small win. I kept it.

nikomatsakis (May 14 2018 at 14:53, on Zulip):

That said, I'm not using it yet for antijoin, because I was too lazy :P

nikomatsakis (May 14 2018 at 14:54, on Zulip):

as an aside, I've been thinking btw about how to write a "intro to differential dataflow" tutorial =) it'd be interesting to chat over with you at some point

Frank McSherry (May 14 2018 at 14:55, on Zulip):

@nikomatsakis depending on your rig, it may be faster to use fewer workers than you have hardware threads. The timely workers are pretty impolite (and just sit on the cores) which can cause hiccups if the OS decides it has to do some things (one of the workers gets suspended, which can block parts of the compute).

nikomatsakis (May 14 2018 at 14:55, on Zulip):

this is why we should add a flag so we can experiment with the number :) but that said I expect in the real compiler to have many simul. jobs, so it may not typically be worth using >1 worker.

nikomatsakis (May 14 2018 at 14:55, on Zulip):

that said, I imagine we might want to make it depend on the size of the fn

nikomatsakis (May 14 2018 at 14:55, on Zulip):

in any case I still think there's a lot of improvements to be made without jumping to parallelism

Frank McSherry (May 14 2018 at 14:57, on Zulip):

Totally. Is it correct that the "output" for the computation is just the borrow_live_at relation? Is it important to compute all of subset and requires other than for determining bla, and might it be possible that derived bla tuples can suppress derivations in subset and requires?

qmx (May 14 2018 at 14:57, on Zulip):

heh, I was going to ask if timely used rayon work-stealing :)

Frank McSherry (May 14 2018 at 14:59, on Zulip):

@qmx It doesn't (use rayon). It's a sane question to ask if that makes sense, but it conflicts a bit with the dataflow design elsewhere (e.g. the inter-operator shared state all needs mutexes, at least if done naively).

nikomatsakis (May 14 2018 at 15:01, on Zulip):

Is it correct that the "output" for the computation is just the borrow_live_at relation?

Mostly; in fact, I think we can make an even more narrow relation -- this is what @Santiago Pastorino and @Reed Koser have been working on in issue #4. Basically, we can narrow down to a set of points and loans where we need borrow_live_at.

That said, there is one other piece of the check that I have to think about, where we might wind up needing subset in some other cases. (But still not the full relation)

Is it important to compute all of subset and requires other than for determining bla

No

and might it be possible that derived bla tuples can suppress derivations in subset and requires?

I think you are saying "is it possible that -- because borrow_live_at only cares about some region that requires a loan L -- we might save some time because right now we compute all regions that require that loan L?" If so, that might be true. I'm not really sure how much in practice now that I've suppressed the transitive computation. I suppose we could try to figure it out.

nikomatsakis (May 14 2018 at 15:02, on Zulip):

I was also wondering whether we will save time just by knowing that we only care about a subset of loans in any case. But It's hard to say how often that will be true.

Frank McSherry (May 14 2018 at 15:03, on Zulip):

In previous version (as of a day or two back) the size of requires was about 10x that of borrow_live_at, which made it seem like perhaps there were ways to thin down redundant derivations. Total speculation, though.

nikomatsakis (May 14 2018 at 15:04, on Zulip):

makes sense; it's worth checking again

nikomatsakis (May 14 2018 at 15:05, on Zulip):

in any case I'd like to add some kind of count calls that can dump out the size of each relation

lqd (May 14 2018 at 21:04, on Zulip):

"Regular Datalog" for incremental view maintenance from this paper https://arxiv.org/abs/1804.10565 could be interesting in our case (but maybe too limiting)

Frank McSherry (May 16 2018 at 12:34, on Zulip):

I just wrote a Datalog-specific framework, not built on timely dataflow. It is a lot simpler, and could in principle be a lot faster. At the moment it is not actually faster, due to be being lazy about how I do merge sort. It's about 300 lines of code, and currently correctly does a reference reachability computation. The intent here is that if it ends up both i. faster and ii. easier to integrate into rustc, then great!

nikomatsakis (May 16 2018 at 12:36, on Zulip):

@Frank McSherry I'm not sure what you are saying, is this a new project?

lqd (May 16 2018 at 12:36, on Zulip):

@Frank McSherry I was looking at Jamie Brandon's older Imp ideas for the same reason if you remember his work (some with DD :)

Frank McSherry (May 16 2018 at 12:36, on Zulip):

Yeah, brand new as of this morning.

nikomatsakis (May 16 2018 at 12:37, on Zulip):

interesting. I was wondering whether a "pared down differential-dataflow" might do well for us

Frank McSherry (May 16 2018 at 12:37, on Zulip):

Currently trying to sort out some performance issues (using sort_unstable increases running time by about 1.5x over sort).

Frank McSherry (May 16 2018 at 12:38, on Zulip):

It's a super-simple implementation. I'll try and document it a bit more this evening. Currently does monotonic variables with join and map, but semijoin and antijoin should be easy.

nikomatsakis (May 16 2018 at 12:39, on Zulip):

cool.

Frank McSherry (May 16 2018 at 12:40, on Zulip):

Here's a gist at the moment: https://gist.github.com/frankmcsherry/60be7cfe54b5bf096e879c9d15d679e1

it's not well explained yet, but roughly: variables have a pile of facts, a "recent" pile of facts (new as of the last round) and a "to_add" pile of facts that should be introduced in the next round (and made recent). All of the operators look at variables as inputs, and produce new outputs "to_add".

Frank McSherry (May 16 2018 at 12:41, on Zulip):

This part is what a program looks like:

    let mut iteration = Iteration::new();

    let variable1 = iteration.variable::<(u32,u32)>("nodes");
    let variable2 = iteration.variable::<(u32,u32)>("edges");

    variable1.insert(nodes.into());
    variable2.insert(edges.into());

    while iteration.changed() {

        // N(a,c) <-  N(a,b), E(b,c)
        join_into(&variable1, &variable2, &variable1, |_b, &a, &c| (c,a));

    }

    let reachable = variable1.complete();
lqd (May 16 2018 at 12:43, on Zulip):

the return of gallop:)

Frank McSherry (May 16 2018 at 12:43, on Zulip):

It's the same one that Jamie used, fwiw. :)

qmx (May 16 2018 at 12:48, on Zulip):

this might be not the end solution, but is immensely useful to me on the educational front to understand what's going on :)

Frank McSherry (May 16 2018 at 12:53, on Zulip):

I figure another advantage might be that, that this is a bit easier to grok, maintain, and debug (correctness/performance) if things go wrong (or if I get hit by a bus).

Frank McSherry (May 16 2018 at 12:53, on Zulip):

Also, this reachability example at least takes 1.3s to compile in release mode. That might be welcome too. :)

nikomatsakis (May 16 2018 at 12:56, on Zulip):

I hadn't really considered this, but I love the idea of having a rust-lang-nursery crate that is just the core "datalog-y" framework that is owned by compiler team

nikomatsakis (May 16 2018 at 12:56, on Zulip):

it does address some of my other concerns about taking on differential-dataflow as a dependency

nikomatsakis (May 16 2018 at 12:56, on Zulip):

e.g., I would prefer we keep rustc deps easily auditable

nikomatsakis (May 16 2018 at 12:57, on Zulip):

it actually doesn't matter whether it's owned by rust-lang-nursery or not, but basically I just mean a project that can grow its own community

nikomatsakis (May 16 2018 at 12:57, on Zulip):

and presumably be usable outside of just borrow-check

nikomatsakis (May 16 2018 at 12:57, on Zulip):

I feel like a number of compiler optimizations etc might benefit from the same infrastructure

nikomatsakis (May 16 2018 at 12:58, on Zulip):

obvious example; we compute liveness by hand with a home-grown thing

nikomatsakis (May 16 2018 at 12:59, on Zulip):

(I have contemplated integrating that computaiton into borrow-check as well at some point...)

nikomatsakis (May 16 2018 at 13:15, on Zulip):

@Frank McSherry the more I think about this the more I like it :) question: how long do you think it will take you to get something usable, and, would there be ways for us to help you :)

nikomatsakis (May 16 2018 at 13:16, on Zulip):

"you're not alone man"

Frank McSherry (May 16 2018 at 13:35, on Zulip):

It's up and running. I'll push a version to github, and you can take a peek / comment on what's missing etc.

nikomatsakis (May 16 2018 at 13:35, on Zulip):

we definitely need antijoin etc

nikomatsakis (May 16 2018 at 13:35, on Zulip):

though only I think on facts

nikomatsakis (May 16 2018 at 13:35, on Zulip):

maybe I can see if I can ascertain how one might add it ;)

Frank McSherry (May 16 2018 at 13:35, on Zulip):

do we name things datalog-rs or datalog_rs or what?

Frank McSherry (May 16 2018 at 13:35, on Zulip):

Oh, antijoin is super easy; I can do that, just lazy.

nikomatsakis (May 16 2018 at 13:35, on Zulip):

crate names should have -

nikomatsakis (May 16 2018 at 13:36, on Zulip):

Oh, antijoin is super easy; I can do that, just lazy.

I figured, it might just be interesting for me to contemplate how to do it :)

nikomatsakis (May 16 2018 at 13:36, on Zulip):

in order to make myself understand code better

nikomatsakis (May 16 2018 at 13:36, on Zulip):

although it looked prety clear from my skim

nikomatsakis (May 16 2018 at 13:36, on Zulip):

(I don't know if I'll have time for that today anyway, i'm on a compressed schedule since I have to take my daughter to the hospital later)

Frank McSherry (May 16 2018 at 13:36, on Zulip):

Ah, in which case I can give you a chance. :) There is some code with the signature commented out, but the logic in the body is totally not correct. (in the gist).

Jake Goulding (May 16 2018 at 13:36, on Zulip):

do we name things datalog-rs or datalog_rs or what?

and it's recommended to not have "rs". I think Cargo will remove it automatically now

nikomatsakis (May 16 2018 at 13:36, on Zulip):

@Frank McSherry heh don't feel the need to wait for me :)

nikomatsakis (May 16 2018 at 13:37, on Zulip):

oh, that too

nikomatsakis (May 16 2018 at 13:37, on Zulip):

but I think the github project can have it, just the cargo name typically doesn't

nikomatsakis (May 16 2018 at 13:37, on Zulip):

for obvious reasons

qmx (May 16 2018 at 13:37, on Zulip):

@nikomatsakis should I halt the other efforts in lieu of integrating this?

nikomatsakis (May 16 2018 at 13:38, on Zulip):

@qmx I still think we want to do that factoring

nikomatsakis (May 16 2018 at 13:38, on Zulip):

but probably not worth investigating how to get differential-dataflow into rustc

nikomatsakis (May 16 2018 at 13:38, on Zulip):

until we decide one way or the other here

nikomatsakis (May 16 2018 at 13:38, on Zulip):

that is, we still want to factor borrow-check itself into a reusable library

qmx (May 16 2018 at 13:39, on Zulip):

grr, I need to take a month-long vacation to play with rust

nikomatsakis (May 16 2018 at 13:39, on Zulip):

@qmx do it do it

qmx (May 16 2018 at 13:39, on Zulip):

/me looks at the clock

Frank McSherry (May 16 2018 at 13:40, on Zulip):

https://github.com/frankmcsherry/datalog-rs

Frank McSherry (May 16 2018 at 13:58, on Zulip):

What does :crab: mean?

Frank McSherry (May 16 2018 at 13:58, on Zulip):

It's like the bat-sign, but for Rust code? :)

nikomatsakis (May 16 2018 at 13:59, on Zulip):

http://www.rustacean.net/

nikomatsakis (May 16 2018 at 13:59, on Zulip):

I use it to indicate ferris =)

nikomatsakis (May 16 2018 at 14:01, on Zulip):

![ferris](http://www.rustacean.net/assets/rustacean-orig-noshadow.png)

nikomatsakis (May 16 2018 at 14:01, on Zulip):

a Rusty high five ;)

Frank McSherry (May 16 2018 at 14:02, on Zulip):

I was just trying to suss out the social implication. Like, if all of you had bowed your heads and clicked your hands as you scuttled side to side, you know "gotcha, but weird". :)

Frank McSherry (May 16 2018 at 14:02, on Zulip):

Rusty high five is better. ;) Antijoin is live, btw. Not tested yet, though.

nikomatsakis (May 16 2018 at 14:03, on Zulip):

if we could configure this part of Zulip, I would make the + key add the :crab: instead of :+1:

nikomatsakis (May 16 2018 at 14:03, on Zulip):

maybe I should open an issue

Frank McSherry (May 16 2018 at 14:04, on Zulip):

Wow. + does do that!

Frank McSherry (May 16 2018 at 14:57, on Zulip):

Repo updated with a hypothetical borrow checker (from naive, but without the universal region stuff). I haven't checked that it is correct yet, and so very bold to claim it would work. I did verify the reachability computation, but I'll try and get some facts loaded in and see if it produces the right number of tuples at least.

lqd (May 16 2018 at 14:58, on Zulip):

(the zoidberg salute :3)

Frank McSherry (May 16 2018 at 14:59, on Zulip):

Also, you can see from the borrow_check.rs example, https://github.com/frankmcsherry/datalog-rs/blob/master/src/bin/borrow_check.rs, that the syntax can be a bit heavy. No method chaining at the moment, in the interest of keeping things lightweight (each call needs some more state managed, and we ask the programmer to do that manually rather than stashing it somewhere secret and managing it).

lqd (May 16 2018 at 15:01, on Zulip):

if need be we can also generate calls to this API, parsed from datalog clauses

nikomatsakis (May 16 2018 at 15:01, on Zulip):

I guess a key difference here is that you are not — as in differential-dataflow — modeling a program, but rather directly executing the actions

nikomatsakis (May 16 2018 at 15:01, on Zulip):

right?

nikomatsakis (May 16 2018 at 15:01, on Zulip):

(hence the while and so forth)

Frank McSherry (May 16 2018 at 15:02, on Zulip):

That is definitely a difference, yes. Each of the "direct executions" says "pick up any new records since last time around, and do what needs to be done wrt adding records to the output".

Frank McSherry (May 16 2018 at 15:02, on Zulip):

but it does mean that someone needs to name the output, as the loop itself doesn't persist any state from iteration to iteration.

nikomatsakis (May 16 2018 at 15:03, on Zulip):

I'm not sure what you mean by "name" yet

Frank McSherry (May 16 2018 at 15:03, on Zulip):

Could probably clean this up a bit idiomatically, but could also let it be. It was pretty formulaic writing out the names of the intermediate relations, if tedious.

nikomatsakis (May 16 2018 at 15:03, on Zulip):
subset_r1p.from_map(&subset, |&(r1,r2,p)| ((r1,p),r2));
nikomatsakis (May 16 2018 at 15:04, on Zulip):

I guess subset_r1p is the "named output"

Frank McSherry (May 16 2018 at 15:04, on Zulip):

"name" meaning only "the operator itself does not create such a thing for you". So you need to point the operator at some place for it to put the results.

nikomatsakis (May 16 2018 at 15:04, on Zulip):

you are referring to?

Frank McSherry (May 16 2018 at 15:04, on Zulip):

Yes! The code is written as a lot of relation.from_operator(...) where the relation gets some more facts from the application of whatever the operator is.

nikomatsakis (May 16 2018 at 15:05, on Zulip):

right, ok. I mean I think that's fine for now.

nikomatsakis (May 16 2018 at 15:05, on Zulip):

it is still pretty easy to map to-from the underlying datalog

Frank McSherry (May 16 2018 at 15:05, on Zulip):

Old code was rel1.join(&rel2).join(&rel3).map(..).blah(...) where each step along the way the framework built the temporary collection to manage stuff; here the programmer has to do it.

nikomatsakis (May 16 2018 at 15:07, on Zulip):

yes I get it

nikomatsakis (May 16 2018 at 15:07, on Zulip):

clearly the old code was nicer

nikomatsakis (May 16 2018 at 15:07, on Zulip):

but this seems ok too

nikomatsakis (May 16 2018 at 15:07, on Zulip):

we might be able to make some scheme where you create temporaries

nikomatsakis (May 16 2018 at 15:07, on Zulip):

that are not named

nikomatsakis (May 16 2018 at 15:07, on Zulip):

well, I guess they have to persist across loop iterations

nikomatsakis (May 16 2018 at 15:07, on Zulip):

so that doesn't work

Frank McSherry (May 16 2018 at 15:09, on Zulip):

Yeah, some things like map could be simplified, but (at least in this implementation) you need a Variable wherever you want a collection indexed by some key so that you can do a join/antijoin.

Frank McSherry (May 16 2018 at 15:10, on Zulip):

So for example the subset variable could probably be removed, as it contributes nothing over the three indexed variants. One can be produced, and the other two derived from it with from_map().

nikomatsakis (May 16 2018 at 15:10, on Zulip):

I was wondering whether this model allowed for us to optimize cases where we know something will be joined many times

nikomatsakis (May 16 2018 at 15:10, on Zulip):

e.g. something like arrange

nikomatsakis (May 16 2018 at 15:11, on Zulip):

(I didn't read closely enough yet to see)

Frank McSherry (May 16 2018 at 15:11, on Zulip):

Already does! :D

Frank McSherry (May 16 2018 at 15:11, on Zulip):

For example, in the definition of subset, the two uses of &region_live_at are just the same variable.

nikomatsakis (May 16 2018 at 15:12, on Zulip):

/me looks

nikomatsakis (May 16 2018 at 15:12, on Zulip):

I see

Frank McSherry (May 16 2018 at 15:13, on Zulip):

This .. loses a lot of the flexibility of differential dataflow. The Variable types are roughly "two-version" relations, reflecting an "old" version and a "new" version. But because we know exactly when we want to use old and new, and when the entire world ticks forward from old to new (at the bottom of each loop), it is a bit easier to share the state.

nikomatsakis (May 16 2018 at 15:13, on Zulip):

@Frank McSherry are variables always (K, V) tuples?

Frank McSherry (May 16 2018 at 15:13, on Zulip):

Variables are arbitrary Tuple: Ord types, and join applies when they are (Key, Val) pairs.

Frank McSherry (May 16 2018 at 15:16, on Zulip):

I guess they don't even have to be tuples, so perhaps that is a misnomer.

Frank McSherry (May 16 2018 at 15:17, on Zulip):

btw, (looking at the code) there are some weird perf gotchas. When we first create relations (sorted lists of tuples) sort_unstable() is faster than sort(), but when we merge relations then sort() is about 1.5x faster than sort_unstable(). Not sure what is up, but there are probably similar gotchas to watch out for perf-wise.

Frank McSherry (May 16 2018 at 15:17, on Zulip):

On the plus side, can just point a profiler at it and see wtf is going on, and change it, I guess. :)

Frank McSherry (May 16 2018 at 16:02, on Zulip):

The datalog crate name is taken, so working name is now datafrog. Complaints can be directed to .. well the maintainer of the datalog crate first, but then to /dev/null.

nikomatsakis (May 16 2018 at 16:02, on Zulip):

I think I prefer datafrog

nikomatsakis (May 16 2018 at 16:02, on Zulip):

I see the logo already :P

Frank McSherry (May 16 2018 at 16:04, on Zulip):

:frog:

nikomatsakis (May 16 2018 at 16:05, on Zulip):

even better, it's pre-drawn for you

lqd (May 16 2018 at 16:06, on Zulip):

someone will expect leapfrog triejoin then :p

Frank McSherry (May 16 2018 at 16:08, on Zulip):

Oh wow, good point @lqd

Frank McSherry (May 16 2018 at 17:45, on Zulip):

I've pushed some doctests, it seems that in simple examples at least join, antijoin, and map work as planned. I'm actually flying to Vienna for some talks tomorrow morning (8pm here now) so not sure how much I'll get done wrt trying out borrow_check.rs. Happy to help out and put some time in when I get it, but I'll probably also take some time to check out Vienna too! :)

nikomatsakis (May 16 2018 at 18:09, on Zulip):

This is all in that same repo ?

lqd (May 16 2018 at 18:10, on Zulip):

yes

lqd (May 16 2018 at 18:10, on Zulip):

@Frank McSherry bon voyage :)

lqd (May 17 2018 at 01:12, on Zulip):

@Frank McSherry I've continued your borrowck-data :frog: example: 1) loaded nll facts, 2) added borrow_live_at(but couldn't test it) 3) and begun testing the computation: the first iteration, computing subset, works and produces the same tuples as DD. However, the next iteration, for requires doesn't: I've surely made a mistake in the inputs or something, but I don't know where (that's also the reason why I couldn't test if borrow_live_at is correctly done or not). My branch is here https://github.com/lqd/borrow-check/tree/datafrog — I've also temporarily disabled timely there so that it builds in 15s. I was testing with the smaller issue-47680 input, requiresproduces around 100 tuples instead of approx 150.

Jake Goulding (May 17 2018 at 02:40, on Zulip):

I'm very pleased with "datafrog". Well done.

Frank McSherry (May 17 2018 at 05:37, on Zulip):

I'll check it out @lqd. One possible outcome (other than "you erred" or "I erred") is that it is computing the same answer, just differently. This implementation intentionally holds data back a bit longer to make sure everyone gets the same single view of the data, and that might mean that it takes more rounds for the facts to propagate (roughly the same amount of work though).

Ima investigate, but if you notice that it does indeed reach a different limit (different number of bla facts, for example) that is 100% evidence that something is wrong, whereas "second round different" might just be a warning sign.

Frank McSherry (May 17 2018 at 05:47, on Zulip):

Ok, totally not the right total number of tuples:

Echidnatron% cargo +nightly run --release -- -a Datafrog inputs/clap-rs/app-parser-{{impl}}-add_defaults/
    Finished release [optimized] target(s) in 0.08s
     Running `target/release/borrow-check -a Datafrog 'inputs/clap-rs/app-parser-{{impl}}-add_defaults/'`
subset is complete: 7531526
requires is complete: 84101
borrow_live_at is complete: 31933
--------------------------------------------------
Directory: inputs/clap-rs/app-parser-{{impl}}-add_defaults/
Time: 33.721s
Frank McSherry (May 17 2018 at 05:49, on Zulip):

The count for subset looks "correct" (previously reported as 7.53M tuples), so will dive in to requires and what might be up there!

Frank McSherry (May 17 2018 at 05:50, on Zulip):

Ah, one important thing (sorry!): the borrow_check.rs code that I hacked up (which looks like what you've based things on) totally ignored universal regions. I saw the issue that they were going away, and didn't port them for reasons of simplicity. But as long as the region_live_at relations don't yet have the universal regions in them that could be a source of discrepancy. Could be other issues too, of course. :D

Frank McSherry (May 17 2018 at 06:00, on Zulip):

Further study (no answers yet) but it will almost 100% be the case that the number of tuples in each round of derivation are different from before. In particular each operator application takes one round to take effect, including easy operators like from_map. That shouldn't cause the computation to reach a different limit, so the glitchy outputs at the moment are still unresolved. Possible answers: i. bugs (datafrog loves bugs), ii. universal regions.

Frank McSherry (May 17 2018 at 06:20, on Zulip):

Wait (reading your code more) you seem to have noticed the universal region thing and addressed it! I'm looking at the same issue you are looking at, and it seems small enough to get data out that are not what we expect. I'm not sure of the easiest way to get a dump of the legit requires facts from -a Naive, but if we have those and can start to debug the "when was this fact supposed to show up" process, that would be my next step.

Frank McSherry (May 17 2018 at 06:21, on Zulip):

It does totally look like Datafrog bug though. Other options seem to have expired.

Frank McSherry (May 17 2018 at 06:26, on Zulip):

Boarding in six minutes, so unlikely to solve right now. I can pick up the debugging later tonight, but next step is to find a tuple (or all 50) that should be derived but are not, so that we can check for the moment that should have happened and see what silly code prevented it.

lqd (May 17 2018 at 07:23, on Zulip):

@Frank McSherry thank you for looking into this! I'll keep digging don't worry about it, enjoy the talks and Vienna :) I was going to try and compare the requires subrules to what Native outputs (rn it's dumped with -v, as a list named restricts in the output). I'll try later today to get more information about the missing tuples. thanks again

lqd (May 17 2018 at 07:25, on Zulip):

"my kingdom for why-not provenance"

nikomatsakis (May 17 2018 at 08:07, on Zulip):

@lqd nice!

Frank McSherry (May 17 2018 at 14:03, on Zulip):

My plan for this evening: grab the facts output from Datafrog and check that i. if you run the computation again you get no more facts (if it has reached fixed point you shouldn't, but maybe you do because bugs) and ii. when you input them into Souffle with the same rules you should get a few more facts (up to the ~150). These new facts are the ones that should be derived but are not, and should be easy to check out (with the same interned numbers as Datafrog).

@lqd, do you have a Souffle implementation of the Naive rules on hand? I know you had one version, once, but is it something you can check in to your repo?

nikomatsakis (May 17 2018 at 14:04, on Zulip):

I've not had a chance to check out datafrog yet

nikomatsakis (May 17 2018 at 14:04, on Zulip):

/me dying to do that

lqd (May 17 2018 at 14:04, on Zulip):

sure, I have both the naive and opt one

lqd (May 17 2018 at 14:04, on Zulip):

(I made a bit of progress I'll explain a bit later)

Frank McSherry (May 17 2018 at 14:05, on Zulip):

Cool; between meetings now, and can check back in a few hours!

lqd (May 17 2018 at 14:08, on Zulip):

@Frank McSherry can't commit rn, but here's a gist of both in the meantime https://gist.github.com/lqd/cb4c1e615be1eb0bcbf17919c0cef937 (the opt one worked in my tests but I'm not sure if I correctly translated the universal_region removal; no biggie since you wanted the naive one)

Frank McSherry (May 17 2018 at 14:09, on Zulip):

Tyvm!

lqd (May 17 2018 at 14:09, on Zulip):

:frog:

lqd (May 17 2018 at 14:09, on Zulip):

(datafrog is indeed a name as cool as Frank's previous "differential dataflog" :D)

Frank McSherry (May 17 2018 at 14:15, on Zulip):

@lqd

Echidnatron% souffle -c ~/Projects/borrow-check/naive.dl | wc
     106     208    2241
Echidnatron%
Frank McSherry (May 17 2018 at 14:16, on Zulip):

Are we sure that 150-ish is the right answer?

lqd (May 17 2018 at 14:17, on Zulip):

I remember something like 135 now lemme check

Frank McSherry (May 17 2018 at 14:17, on Zulip):

Oh wait, this is bla as output, and you are worried about requires. Nevermind me!

lqd (May 17 2018 at 14:20, on Zulip):

think soufflé finds 152 requires

lqd (May 17 2018 at 14:22, on Zulip):

I think the last of the requires rules is interesting, commenting out the rest basically gives the same output as DD

lqd (May 17 2018 at 14:23, on Zulip):

the tuple I was about to try and locate was :
timely (hopefully soufflé as well :p) has

"Mid(bb2[0])"    "\'_#6r"         "bw0"
"Mid(bb2[0])"    "\'_#6r"         "bw2"
lqd (May 17 2018 at 14:23, on Zulip):

datafrog only has the bw0 one

Frank McSherry (May 17 2018 at 14:24, on Zulip):

Cool; this is helpful! My plan is to try and get souffle going on the interned index values; do you think going backwards from the interned indices to strings is better (I'm not clear on how to do that)

lqd (May 17 2018 at 14:25, on Zulip):

I was just doing that rn

lqd (May 17 2018 at 14:25, on Zulip):

InternerTables "untern" could I think be used to get back from those strings to the indices

lqd (May 17 2018 at 14:26, on Zulip):

but maybe using the interned values with soufflé would be less of a hassle I'm not sure

Frank McSherry (May 17 2018 at 14:26, on Zulip):

Unless you strongly object, I'm just going to hack Datafrog to dump the interned tables as text, so that I can Souffle them up and use those values.

lqd (May 17 2018 at 14:27, on Zulip):

oh I'm not objecting anything :D

lqd (May 17 2018 at 14:27, on Zulip):

(don't take this sentence out of context)

Frank McSherry (May 17 2018 at 14:29, on Zulip):

okies, off to meetings! I should get some dataz for you soon (well, later this evening).

lqd (May 17 2018 at 14:33, on Zulip):

awesome :) (just a heads up I might not have access to a computer this evening)

lqd (May 17 2018 at 16:38, on Zulip):

interestingly enough, all the 47 missing tuples from requires seem to be for the bw2 loan

lqd (May 17 2018 at 17:35, on Zulip):

@Frank McSherry I might be onto something, around the killed antijoin; killed is a <L, P> and it might need to be like the other semijoins, like <(L, P), ()>

lqd (May 17 2018 at 17:39, on Zulip):

or maybe not :/ but something is up with this antijoin

lqd (May 17 2018 at 18:05, on Zulip):

some more results: hacking the worst antijoin of all time, joining requires_bp((b,p),r) into requires_1 only if (b, p) is not in killed, we have:

subset is complete: 30
requires is complete: 152
borrow_live_at is complete: 102

of course these 102 data:frog: bla are the same ones timely returns :3

nikomatsakis (May 17 2018 at 18:07, on Zulip):

@lqd I don't understand that last line -- oh, bla is borrow_live_at

lqd (May 17 2018 at 18:07, on Zulip):

yes sorry

nikomatsakis (May 17 2018 at 18:07, on Zulip):

so basically antijoin is the problem, definitely

lqd (May 17 2018 at 18:07, on Zulip):

probably more my inputs to it I think

lqd (May 17 2018 at 18:08, on Zulip):

but there's a mismatch between what frank's requires_1 expected (eg a relation) vs what i could be (if it's the semijoin structure I mentioned)

lqd (May 17 2018 at 18:09, on Zulip):

it's definitely around this predicate for sure tho, so that's good to know :)

nikomatsakis (May 17 2018 at 18:09, on Zulip):

we gotta rename borrow_live_at to loan_live_at, so that the acronym is lla :)

nikomatsakis (May 17 2018 at 18:09, on Zulip):

/me gets confused each time he sees "bla"

lqd (May 17 2018 at 18:09, on Zulip):

and all Bs to Ls !

lqd (May 17 2018 at 18:40, on Zulip):

tried it with clap just to check the number of tuples and we're at 832392 borrow_live_at like we expect :tada: (I don't want to talk about how long it took :p)

nikomatsakis (May 17 2018 at 18:51, on Zulip):

that's with the "worst antijoin ever"?

lqd (May 17 2018 at 18:52, on Zulip):

yes indeed, and more copies/indexing than necessary (as I was trying to stay as close to the initial rules) as some of the indexing work can be shared between the 1st 2 "iterations" (as was the case with frank's original example btw)

lqd (May 17 2018 at 18:53, on Zulip):

for the smaller input it's very competitive even in this state

lqd (May 17 2018 at 18:55, on Zulip):

/me facepalms

lqd (May 17 2018 at 18:55, on Zulip):

lemme check something real quick lol

lqd (May 17 2018 at 18:57, on Zulip):

ok I take it back I do want to talk about it

lqd (May 17 2018 at 18:58, on Zulip):

with all the caveats I mentioned before, we're at 60s naive data-:frog: vs 148s for naive DD

lqd (May 17 2018 at 18:59, on Zulip):

of course among the long command line, sorting & uniquing and diffing the output, of course the "--release" characters were missing

nikomatsakis (May 17 2018 at 18:59, on Zulip):

that's awesome!

lqd (May 17 2018 at 19:01, on Zulip):

sounds good for a future :frog:-opt (which I'll do tomorrow if nobody beats me to it, as I'll be mobile-only tonight) -- frank will know how to deal with the antijoin

lqd (May 17 2018 at 19:02, on Zulip):

@nikomatsakis did I also mention polonius with datafrog compiles in 8s vs 2.5mins with timely ? :)

lqd (May 17 2018 at 19:03, on Zulip):

(on this beefy machine, at home it's like 10 mins)

lqd (May 17 2018 at 19:05, on Zulip):

(the antijoin is sooner in the datafrog example compared to naive DD, and it doesn't work the same way so of course it's a "grain of salt" situation for this comparison)

lqd (May 17 2018 at 19:12, on Zulip):

(there might be some possible parallelism to have when creating tuples in the joins each round, à la "R×S = (R delta × S) + (R × S delta) + (R delta × S delta)")

nikomatsakis (May 17 2018 at 19:35, on Zulip):

@lqd can't wait, compile times have been killin' me

Frank McSherry (May 17 2018 at 20:01, on Zulip):

This all looks really good. I can believe the antijoin is glitchy, but not obvious to me at the moment what is wrong. The argument is a Relation to try and ensure that no one uses a variable (that might grow with iteration); .. that shouldn't make it wrong, but .. it could lead to bugs? I'm not sure.

Frank McSherry (May 17 2018 at 20:02, on Zulip):

I'll try and get at this tomorrow. Like a moron, I left my power cable at IST in Vienna, so it won't be until tomorrow that I get back there, and on battery power for now.

nikomatsakis (May 17 2018 at 20:04, on Zulip):

sad that there is no "crying frog" emoji

lqd (May 17 2018 at 20:10, on Zulip):

@Frank McSherry I'm on mobile but here's teh ugliness https://gist.github.com/lqd/c2c645ac053ecc8340ca12e8b0f5d03b

Frank McSherry (May 17 2018 at 20:10, on Zulip):

Ah sweet; was just about to beg for that. :D

lqd (May 17 2018 at 20:11, on Zulip):

TIL copying big files on a phone, not that easy

Frank McSherry (May 17 2018 at 20:13, on Zulip):

Ok, I'll look into this. I've got some code that is super similar but still producing the wrong answer. Not at all clear why.

lqd (May 17 2018 at 20:13, on Zulip):

I have full confidence in your laptop, it has bested many big data clusters before :wink:

Frank McSherry (May 17 2018 at 20:14, on Zulip):

Well, it goes plenty fast on all of these computations, just gets the wrong answer. Didn't say anything about it being more accurate than the clusters. ;)

lqd (May 17 2018 at 20:14, on Zulip):

:simple_smile:

Frank McSherry (May 17 2018 at 20:15, on Zulip):

Also, yours is not the worst antijoin of all time; mine is basically the same but uses Vec::contains().

lqd (May 17 2018 at 20:16, on Zulip):

I had that before as well !

Frank McSherry (May 17 2018 at 20:17, on Zulip):

But mine gives also gives the wrong answer, so I have you beat there!

lqd (May 17 2018 at 20:18, on Zulip):

everything's a competition I see how it is :smiley:

Frank McSherry (May 17 2018 at 20:19, on Zulip):

Want to know something crazy: my antijoin isn't suppressing any tuples. Everything it sees it emits as output. Somehow it isn't seeing the right inputs, or something...

lqd (May 17 2018 at 20:21, on Zulip):

it must be doing something right, the test looked sane IIRC

lqd (May 17 2018 at 20:23, on Zulip):

that's also why I was wondering whether it was the input, which didn't look like the other semijoins (but might be handled by the operator)

Frank McSherry (May 17 2018 at 20:23, on Zulip):

I FIGURED IT OUT!

lqd (May 17 2018 at 20:23, on Zulip):

yes!

Frank McSherry (May 17 2018 at 20:24, on Zulip):

So, my antijoin was producing the right tuples. It was just producing them badly. It presumed that if you filter (K,V) then that remains in order, so it doesn't need to re-sort anything. But it forgot about the map that is being applied.

Frank McSherry (May 17 2018 at 20:24, on Zulip):
Duration { secs: 0, nanos: 213561 } subset is complete: 30
requires, initial size: 3
Duration { secs: 0, nanos: 758060 } requires is complete: 152
Duration { secs: 0, nanos: 821829 } borrow_live_at is complete: 102
lqd (May 17 2018 at 20:25, on Zulip):

awesome :simple_smile:

Frank McSherry (May 17 2018 at 20:26, on Zulip):

pushed

lqd (May 17 2018 at 20:28, on Zulip):

@Frank McSherry thanks very much! I'll do the timelyopt conversion tomorrow, I'm really looking forward to seeing those numbers as well

Frank McSherry (May 17 2018 at 20:33, on Zulip):

No worries @lqd. Fun to do something helpful and relatively understandable.

nikomatsakis (May 18 2018 at 00:31, on Zulip):

btw I was poking at your datafrog branch @lqd — seems like we don't need to manually specify the types of all variables, just

let subset_r1p = iteration1.variable("subset_r1p");

will suffice. (I was curious why those types couldn't be inferred. Turns out, they can.)

lqd (May 18 2018 at 00:32, on Zulip):

nice

lqd (May 18 2018 at 00:32, on Zulip):

I found them helpful to debug :simple_smile: ofc they were initially written by Frank

lqd (May 18 2018 at 00:33, on Zulip):

also helpful for the "mechanized translations" from the datalog rules, but very verbose indeed

nikomatsakis (May 18 2018 at 00:34, on Zulip):

yeah I can imagine they're handy anyway — but if we were going to e.g. generate this code from a macro, it's nice to not need them.

lqd (May 18 2018 at 00:34, on Zulip):

absolutely

lqd (May 18 2018 at 00:36, on Zulip):

I'll clean them up tomorrow :simple_smile:

nikomatsakis (May 18 2018 at 00:37, on Zulip):

I've finally got a spare moment to read into datafrog

nikomatsakis (May 18 2018 at 00:37, on Zulip):

very nice

lqd (May 18 2018 at 00:39, on Zulip):

feels very promising

lqd (May 18 2018 at 00:41, on Zulip):

also wonder whether rayon could fit in there

nikomatsakis (May 18 2018 at 00:41, on Zulip):

if nothing else one could use the par_sort methods

nikomatsakis (May 18 2018 at 00:41, on Zulip):

but maybe elsewhere too...

nikomatsakis (May 18 2018 at 00:41, on Zulip):

...it's sort of tempting to rewrite to not use Rc<RefCell<..>> etc

nikomatsakis (May 18 2018 at 00:41, on Zulip):

i.e., make each Variable an index and have Iteration hold all the data for them

nikomatsakis (May 18 2018 at 00:42, on Zulip):

then you would write stuff like iteration.map(target, source, closure) or whatever

nikomatsakis (May 18 2018 at 00:42, on Zulip):

instead of target.from_map(source, closure)

nikomatsakis (May 18 2018 at 00:42, on Zulip):

totally irrelevant of course

nikomatsakis (May 18 2018 at 00:42, on Zulip):

though maybe relevant if you were going to use Rayon

nikomatsakis (May 18 2018 at 00:42, on Zulip):

depending

lqd (May 18 2018 at 00:44, on Zulip):

oh interesting

nikomatsakis (May 18 2018 at 00:44, on Zulip):

anyway the real question is what will perf look like with the 'timely-opt' variant I guess

lqd (May 18 2018 at 00:44, on Zulip):

yup

nikomatsakis (May 18 2018 at 00:44, on Zulip):

but it seems like the naive was faster

nikomatsakis (May 18 2018 at 00:44, on Zulip):

(if I read your msg correctly)

lqd (May 18 2018 at 00:44, on Zulip):

it was indeed

nikomatsakis (May 18 2018 at 00:44, on Zulip):

though not necessarily faster than w/ distinct_total

nikomatsakis (May 18 2018 at 00:44, on Zulip):

(I'd be curious to see that comparison)

nikomatsakis (May 18 2018 at 00:44, on Zulip):

that is, naive w/ distinct_total

nikomatsakis (May 18 2018 at 00:45, on Zulip):

I never measured that :)

lqd (May 18 2018 at 00:45, on Zulip):

could be surprising :simple_smile:

nikomatsakis (May 18 2018 at 00:45, on Zulip):

one might (I don't know?) expect a similar ratio?

nikomatsakis (May 18 2018 at 00:45, on Zulip):

I guess I really have no idea

nikomatsakis (May 18 2018 at 00:45, on Zulip):

just gotta measure and see

lqd (May 18 2018 at 00:46, on Zulip):

I remember Jamie's Brandon doing some datafrog work one could say and mentioning exactly that, how important predictable performance was to them

lqd (May 18 2018 at 00:47, on Zulip):

I'll try this in the morning

nikomatsakis (May 18 2018 at 01:14, on Zulip):

Oh just realized that as each variable is of distinct type you can't use indices

qmx (May 18 2018 at 01:40, on Zulip):

can't you use a composite index? (type,id)

Frank McSherry (May 18 2018 at 07:28, on Zulip):

quick comments (heading out to the Apple store soon!):

The Rc and RefCells are optional; they are there to allow a binding of the "read" and "write" halves of each variable, and to allow the Iteration struct to manage all of its variables. If we wanted to break that and insist that users manage both halves and manually called .changed() on each pair, we could avoid them I think. It is perhaps a bit more error-prone, though, in that I could imagine people adding new variables to an existing program and forgetting to step them, which would effectively make them non-recursive (fixed at whatever their initial values are). Could easily complain in Drop if they clearly haven't been stepped completely, but I'm not sure of the right trade-off of "bullet-proofing" vs "simple Send types".

lqd (May 18 2018 at 07:50, on Zulip):

@Reed Koser thank you that's great!

nikomatsakis (May 18 2018 at 09:32, on Zulip):

@Frank McSherry re: the Rc, not a big thing. I think the current solution works out quite elegantly.

lqd (May 18 2018 at 09:34, on Zulip):

@nikomatsakis you were wondering yesterday, naive DD as is vs naive DD w/ distinct_total comes at 142s vs 118s

nikomatsakis (May 18 2018 at 09:34, on Zulip):

encouraging

lqd (May 18 2018 at 09:36, on Zulip):

naive DF being 52s

Frank McSherry (May 18 2018 at 13:02, on Zulip):

Can probably speed that up a bit, too. Each of the non-arranged collections (e.g. subset, requires, without a _xyz suffix) can probably be removed and we just use from_map to change one of them to the others.

lqd (May 18 2018 at 13:13, on Zulip):

yeah I was also wondering how to use the same indexed inputs between iterations (eg region_live_at and cfg_edge)

lqd (May 18 2018 at 13:16, on Zulip):

(I was removing some redundant reindexing between iterations and noticed those)

nikomatsakis (May 18 2018 at 13:40, on Zulip):

I was entertaining the idea of making a macro to produce the programs in question

nikomatsakis (May 18 2018 at 13:41, on Zulip):

but I decided to "sleep in" instead (until 5:30am :)

nikomatsakis (May 18 2018 at 13:41, on Zulip):

kind of glad I didn't though since it sounds like they don't yet reflect "best practice" :)

nikomatsakis (May 18 2018 at 13:41, on Zulip):

I don't quite yet see how "arranged" things are reflected.. are those a distinct kind of variable?

lqd (May 18 2018 at 13:44, on Zulip):

I was thinking it was through the "indices" variables (but then again I'm not extremely competent with any of this :clown_face: )

lqd (May 18 2018 at 13:46, on Zulip):

sorry for being slow with the frogopt translation, I'm trying to internalize and come up the mechanized translations rules that would come naturally to the both of you

nikomatsakis (May 18 2018 at 13:49, on Zulip):

I was thinking it was through the "indices" variables (but then again I'm not extremely competent with any of this :clown_face: )

such modesty :)

nikomatsakis (May 18 2018 at 13:49, on Zulip):

I didn't look that closely, apparently, or at least I didn't notice variables named indices

nikomatsakis (May 18 2018 at 13:49, on Zulip):

I'll look again later...

nikomatsakis (May 18 2018 at 13:50, on Zulip):

I'm mostly hoping you two will just solve all our problems :)

lqd (May 18 2018 at 13:51, on Zulip):

they're not named indices but suffixed with the key

lqd (May 18 2018 at 13:51, on Zulip):

eg subset_r1p

nikomatsakis (May 18 2018 at 13:53, on Zulip):

ah those

nikomatsakis (May 18 2018 at 13:53, on Zulip):

ok

lqd (May 18 2018 at 13:53, on Zulip):

I could be very wrong !!

Frank McSherry (May 18 2018 at 14:04, on Zulip):

@lqd You should be able to pretty cheaply re-use the indices. If you .complete() the variables, you get a sorted Vec<_> out, and if you insert that in a new Variable it should be pretty cheap: the list will be re-sorted, but that should just be a linear scan if the right sort algorithm is used (sort does this, sort_unstable might).

Frank McSherry (May 18 2018 at 14:05, on Zulip):

I can demo the "remove subset and just use subset_r1p" thing if you'd like. It's not too magical, and probably just a snippet should make it clear.

lqd (May 18 2018 at 14:05, on Zulip):

@Frank McSherry oh true I didn't think of this, indeed

lqd (May 18 2018 at 14:05, on Zulip):

I think I did this, lemme get you a gist

lqd (May 18 2018 at 14:06, on Zulip):

https://gist.github.com/lqd/2fbd14646f1404d2222a7d823ed4f1bd

Frank McSherry (May 18 2018 at 14:06, on Zulip):

E.g., where right now we have:

    subset_r1p.from_map(&subset, |&(r1,r2,p)| ((r1,p),r2));
    subset_r2p.from_map(&subset, |&(r1,r2,p)| ((r2,p),r1));
    subset_p.from_map(&subset, |&(r1,r2,p)| (p,(r1,r2)));

we can just skip subset and do

    // subset_r1p is now the thing we build up
    subset_r2p.from_map(&subset_r1p, |&((r1,p),r2)| ((r2,p),r1));
    subset_p.from_map(&subset_r1p, |&((r1,p),r2)| (p,(r1,r2)));
lqd (May 18 2018 at 14:07, on Zulip):

oh no so it's not what I was talking about, (and I think you actually already did this) using the indexed relations from iteration to the next when possible

lqd (May 18 2018 at 14:08, on Zulip):

very nice, thank you. I'll try to see if I can add to frogopt when I've progressed enough

Frank McSherry (May 18 2018 at 14:09, on Zulip):

Sorry, two things here:

1. Capturing subset_r1p from the first iteration (subset) and using it in the second iteration (requires). You are doing this in the fragment, which is great.
2. Not having a subset or a requires, and only keeping indexed lists.

lqd (May 18 2018 at 14:11, on Zulip):

this will help as well indeed, they are pretty big on clap

lqd (May 18 2018 at 14:11, on Zulip):

I like that there's a lot of potential to improve

lqd (May 18 2018 at 14:25, on Zulip):

@Frank McSherry such indexed lists would be where in DD we'd use "arranged" lists right ?

Frank McSherry (May 18 2018 at 14:25, on Zulip):

Erm, sure, except so much simpler here.

Frank McSherry (May 18 2018 at 14:26, on Zulip):

A Relation's into method sorts and deduplicates, which isn't free, but if the list is already sorted it is really. cheap.

Frank McSherry (May 18 2018 at 14:27, on Zulip):

The spirit is the same (cheap to re-use) but operationally they are pretty different. The arranged re-use is more like when we use a Variable multiple times in the same query.

lqd (May 18 2018 at 14:28, on Zulip):

:thumbs_up:

Frank McSherry (May 18 2018 at 14:33, on Zulip):

Btw, if you want any help at any point just holler. I'm done with my work for the week and have power again. :) Resting for a bit before heading out to explore Vienna.

lqd (May 18 2018 at 14:43, on Zulip):

I'm just at the point I think I've translated the 1st join lol

lqd (May 18 2018 at 14:44, on Zulip):

I was a bit unsure what the granularity was to cut the queries into different Iterations

lqd (May 18 2018 at 14:44, on Zulip):

(and I had the same problem knowing when to use scopes with DD)

lqd (May 18 2018 at 14:47, on Zulip):

esp since this timelyopt conversion has a more complicated graph than the naive one (and which you did 99% of)

Frank McSherry (May 18 2018 at 14:55, on Zulip):

I think in principle you can just put everything in one iteration, if you want.

Frank McSherry (May 18 2018 at 14:56, on Zulip):

The only reason you "need" multiple ones is when you want to use antijoin. Otherwise, all of the relations contribute to each other positively, and can just co-develop. Nothing especially good happens by completing one of them first and then doing the next (perhaps lower peak memory utilization?).

Frank McSherry (May 18 2018 at 14:57, on Zulip):

when you want to use antijoin.

meaning, when you want to derive something iteratively, and then use it as the right-hand side of an antijoin.

nikomatsakis (May 18 2018 at 14:58, on Zulip):

yeah I was gonna say that our use of antijoin should be fine

nikomatsakis (May 18 2018 at 14:58, on Zulip):

with one iteration loop

lqd (May 18 2018 at 15:00, on Zulip):

oh very interesting, and rn we're only doing antijoins of inputs

Frank McSherry (May 18 2018 at 15:01, on Zulip):

Right, all the rules are monotonic (positively so) with respect to all variables other than killed, which is static over the rounds of iteration.

nikomatsakis (May 18 2018 at 15:01, on Zulip):

right. usually datalog people talk about "stratifying" the programs in layers here when it comes to negation, basically — that is, if you do an antijoin of a non-input relation R, you have to have a "layer" such that R and all of its dependencies are completed first

lqd (May 18 2018 at 15:04, on Zulip):

thanks to the both of you :)

lqd (May 18 2018 at 15:05, on Zulip):

btw @Frank McSherry have you seen a sketch of what the "timelyopt" dl looked like ? something like https://gist.github.com/lqd/88f6eb5dd75e3850d2abc1242dbcea4a (some things are missing/different but giving the same answers on the specific test datasets)

Frank McSherry (May 18 2018 at 15:09, on Zulip):

I have seen that; I think @nikomatsakis has an even newer version though, right? E.g. stuff like

    // dead_region_requires(R, B, P, Q) :-
    //   requires(R, B, P),
    //   !killed(B, P),
    //   cfg_edge(P, Q),
    //   !region_live_at(R, Q).
lqd (May 18 2018 at 15:09, on Zulip):

I tihnk it's the same except universal_regions

Frank McSherry (May 18 2018 at 15:10, on Zulip):

I'm up for implementing any random program, but don't want to grab that away from you. :)

Frank McSherry (May 18 2018 at 15:11, on Zulip):

Yeah current timely_opt.rs has all sorts of heinous subset/requires co-development, and lots of dead_can_reach_live horror story plotlines going on in it.

lqd (May 18 2018 at 15:11, on Zulip):

(eg in soufflé I wasn't easily able to have facts containing the universal region points, so I used the rules until then; not a problem with DD or data-:frog:)

nikomatsakis (May 18 2018 at 15:11, on Zulip):

I believe that Polonius master has the latest version of the timely-opt computation

lqd (May 18 2018 at 15:12, on Zulip):

@Frank McSherry I'd gladly take help but won't take you away from visiting Vienna

nikomatsakis (May 18 2018 at 15:12, on Zulip):

Yeah current timely_opt.rs has all sorts of heinous subset/requires co-development, and lots of dead_can_reach_live horror story plotlines going on in it.

( I think this can all be one big iteration though )

lqd (May 18 2018 at 15:12, on Zulip):

oh I didn't see the latest revision then :/

Frank McSherry (May 18 2018 at 15:13, on Zulip):

this is me looking at Niko's "detransitize" branch; maybe not the current one (no clue what "polonius" is; some sort of Bond villian org codename probably).

nikomatsakis (May 18 2018 at 15:13, on Zulip):

https://github.com/rust-lang-nursery/polonius/

nikomatsakis (May 18 2018 at 15:13, on Zulip):

it's the new name for the borrow-check repo :)

nikomatsakis (May 18 2018 at 15:14, on Zulip):

Now! With extra fun Shakespeare references!

nikomatsakis (May 18 2018 at 15:14, on Zulip):

anyway, this is the most up to date

nikomatsakis (May 18 2018 at 15:14, on Zulip):

it is probably the same as the detransitize branch

Frank McSherry (May 18 2018 at 15:16, on Zulip):

Okies, I may go and try to hack on that for a bit. Maybe grab a glass of wein on the Danube (in Wien) and type a bit taking in the evening.

lqd (May 18 2018 at 15:18, on Zulip):

I myself am verifying I correctly did like the 3rd clause ... live_to_dead_regions

nikomatsakis (May 18 2018 at 15:18, on Zulip):

/me is having a lot of fun watching this unfold

lqd (May 18 2018 at 15:18, on Zulip):

(slowly getting the hang of it)

lqd (May 18 2018 at 15:19, on Zulip):

no pressure :p

nikomatsakis (May 18 2018 at 15:19, on Zulip):

![popcorn](http://i0.kym-cdn.com/photos/images/newsfeed/000/895/845/2f9.jpg)

lqd (May 18 2018 at 15:23, on Zulip):

yesss, 1 done, 1 million to go

Frank McSherry (May 18 2018 at 15:26, on Zulip):

Random observation from before, probably still applies: in the clap example there were several thousand rounds of derivations, which makes me think there is lots of cfg crawling going on. If there is still a plan to collapse down indistinguishable regions, that could probably still have a positive effect!

lqd (May 18 2018 at 15:26, on Zulip):

there are plans indeed :D

lqd (May 18 2018 at 15:27, on Zulip):

https://github.com/rust-lang-nursery/polonius/issues/20

Frank McSherry (May 18 2018 at 15:28, on Zulip):

Neat! We can probably do it in Datalog too, fwiw (connected components of edges are indistinguishable if their application is unimpeded by any of the region_live_at transitions, and killed, I think).

Frank McSherry (May 18 2018 at 15:29, on Zulip):

sneaking off for a bit; will report back later!

nikomatsakis (May 18 2018 at 15:48, on Zulip):

Neat! We can probably do it in Datalog too, fwiw (connected components of edges are indistinguishable if their application is unimpeded by any of the region_live_at transitions, and killed, I think).

that was my plan

nikomatsakis (May 18 2018 at 15:48, on Zulip):

but we don't have all the pieces we need for that yet

nikomatsakis (May 18 2018 at 15:48, on Zulip):

almost

nikomatsakis (May 18 2018 at 15:48, on Zulip):

anyway, I'd like to keep that in our back pocket :)

nikomatsakis (May 18 2018 at 15:48, on Zulip):

it seems like we've got a decent chance of getting the perf we need without

nikomatsakis (May 18 2018 at 15:48, on Zulip):

in which case it's just gravy :)

qmx (May 18 2018 at 15:54, on Zulip):

<insert homer mmm gravy meme here>

Frank McSherry (May 18 2018 at 15:57, on Zulip):

One thing I don't currently like about datafrog is that the variable declarations happen so far from their use. Hard to write rules without lots of scrolling around. =/

Frank McSherry (May 18 2018 at 15:58, on Zulip):

Btw, sitting on the river, having some dark czech beer, enjoying the weather. just fyi about life choices.

lqd (May 18 2018 at 16:09, on Zulip):

I'm writing each mecanized step as well so I have 100 lines between one operator call and the var decl :D

lqd (May 18 2018 at 16:09, on Zulip):

fwiw I'm in sunny south of france frank :p

lqd (May 18 2018 at 16:10, on Zulip):

(not far from where Prolog was invented)

Frank McSherry (May 18 2018 at 16:12, on Zulip):

How's boston, @nikomatsakis ?

qmx (May 18 2018 at 16:13, on Zulip):

/me can't complain about sunny Florida

Frank McSherry (May 18 2018 at 16:14, on Zulip):

3/4 :D

lqd (May 18 2018 at 16:15, on Zulip):

:D

Frank McSherry (May 18 2018 at 16:21, on Zulip):

ok, polonius version "code complete"!

Frank McSherry (May 18 2018 at 16:21, on Zulip):

(meaning: typed a lot, now time to see what happens when I cargo check it.)

lqd (May 18 2018 at 16:23, on Zulip):

I had barely finished dead_region_requires ...

Frank McSherry (May 18 2018 at 16:26, on Zulip):

I'm trying out a weird programming style. Write all the rules w/o declaring anything, see what you need, then go and define the things.

lqd (May 18 2018 at 16:27, on Zulip):

while I spend 90% of my time scrolling

Frank McSherry (May 18 2018 at 16:29, on Zulip):

Yeah, I did that for a while and then thought "I have so much beer in my belly, why don't I just write the program first will surely work out".

lqd (May 18 2018 at 16:31, on Zulip):

:beer: -driven development

Frank McSherry (May 18 2018 at 16:40, on Zulip):

Rapidly approaching Ballmer peak.

Frank McSherry (May 18 2018 at 16:42, on Zulip):

it's running...

Frank McSherry (May 18 2018 at 16:43, on Zulip):

super fast.

lqd (May 18 2018 at 16:43, on Zulip):

awesome

Frank McSherry (May 18 2018 at 16:43, on Zulip):
     Running `target/release/borrow-check -a Datafrog 'inputs/clap-rs/app-parser-{{impl}}-add_defaults' --skip-tuples`
Duration { secs: 0, nanos: 523194318 }  borrow_live_at is complete: 0
--------------------------------------------------
Directory: inputs/clap-rs/app-parser-{{impl}}-add_defaults
Time: 0.549s
Echidnatron%
Frank McSherry (May 18 2018 at 16:43, on Zulip):

sec.

lqd (May 18 2018 at 16:44, on Zulip):

:)

Frank McSherry (May 18 2018 at 16:45, on Zulip):

debugging this is going to be rough. I forgot to load requires with borrow_region, but still not right yet:

     Running `target/release/borrow-check -a Datafrog 'inputs/clap-rs/app-parser-{{impl}}-add_defaults' --skip-tuples`
Duration { secs: 2, nanos: 122990259 }  borrow_live_at is complete: 391740
--------------------------------------------------
Directory: inputs/clap-rs/app-parser-{{impl}}-add_defaults
Time: 2.174s
Echidnatron%
Frank McSherry (May 18 2018 at 16:46, on Zulip):

only off by about 2x, right? >.>

lqd (May 18 2018 at 16:47, on Zulip):

:D

lqd (May 18 2018 at 16:47, on Zulip):

yeah debugging was a bit rough for me even on the simpler naive program

lqd (May 18 2018 at 16:47, on Zulip):

that's why I went step by step comparing to DD, but that requires a 2.5mins compilation per clause ....

Frank McSherry (May 18 2018 at 16:49, on Zulip):
warning: unused variable: `dead_region_requires`
  --> src/output/datafrog.rs:47:13
   |
47 |         let dead_region_requires = iteration.variable::<(Region, Loan, Point, Point)>("dead_region_requires");
   |             ^^^^^^^^^^^^^^^^^^^^ help: consider using `_dead_region_requires` instead
Frank McSherry (May 18 2018 at 16:49, on Zulip):

Hint #1

lqd (May 18 2018 at 16:49, on Zulip):

(mostly to side step ungrounded vars w/ soufflé)

lqd (May 18 2018 at 16:49, on Zulip):

this compiler is helpful!

lqd (May 18 2018 at 16:51, on Zulip):

I'm coming up to dead_can_reach -- I'm loving theses clauses btw, it's like a Walking Dead homage, dead_can_reach_origins the prequel, dead_can_reach_live the Broadway musical

Frank McSherry (May 18 2018 at 16:52, on Zulip):

fwiw: https://gist.github.com/frankmcsherry/adb9ed3c433eb9f4ddaf591ad9888dea

Frank McSherry (May 18 2018 at 16:52, on Zulip):

I had it as "dead can reach live? oh no! run live, run!"

lqd (May 18 2018 at 16:55, on Zulip):

mine is 100 comments per join :sob: https://gist.github.com/lqd/e6bcc6bde2b0fe9a14e3f2fa1b57904c

Frank McSherry (May 18 2018 at 16:55, on Zulip):

One thing I'm not sure of is whether I grokked Niko's DD code. Embarrassingly, it is apparently hard to read... :)

lqd (May 18 2018 at 16:56, on Zulip):

tbf the beers might be part of the reason why :D

nikomatsakis (May 18 2018 at 16:57, on Zulip):

@Frank McSherry beautiful right now :) springtime is the best ...

nikomatsakis (May 18 2018 at 16:59, on Zulip):

One thing I'm not sure of is whether I grokked Niko's DD code. Embarrassingly, it is apparently hard to read... :)

uh oh =)

nikomatsakis (May 18 2018 at 17:00, on Zulip):

I tried to keep datalog comments because — I confess — find the DD by itself kind of "write only" (but quite mechanical and easy to write once you have datalog)

nikomatsakis (May 18 2018 at 17:00, on Zulip):

maybe that's not enough though

lqd (May 18 2018 at 17:00, on Zulip):

(Frank BTW, any time it's not fun debugging anymore vs enjoying Vienna, I can also "help")

Frank McSherry (May 18 2018 at 17:00, on Zulip):

Well, all the beer is drunk, so I'm afraid I must go.

Frank McSherry (May 18 2018 at 17:01, on Zulip):

I'll just leave this:

Running `target/release/borrow-check -a Datafrog 'inputs/clap-rs/app-parser-{{impl}}-add_defaults' --skip-tuples`
Duration { secs: 16, nanos: 743834090 } borrow_live_at is complete: 832392
--------------------------------------------------
Directory: inputs/clap-rs/app-parser-{{impl}}-add_defaults
Time: 16.839s
Echidnatron%
lqd (May 18 2018 at 17:01, on Zulip):

he's already debugged it

nikomatsakis (May 18 2018 at 17:01, on Zulip):

is that the right number of tuples? :)

lqd (May 18 2018 at 17:01, on Zulip):

yeah

nikomatsakis (May 18 2018 at 17:02, on Zulip):

seems to be approx the same time, which is interesting

nikomatsakis (May 18 2018 at 17:02, on Zulip):

oh well I guess

nikomatsakis (May 18 2018 at 17:02, on Zulip):

I am comparing two different computers ;)

lqd (May 18 2018 at 17:05, on Zulip):

we might need the cfg compression after all, fiddle with different sorting methods (even without rayon's par sort), try and reduce alloc and temporaries

nikomatsakis (May 18 2018 at 17:06, on Zulip):

I'm not worried yet :)

nikomatsakis (May 18 2018 at 17:06, on Zulip):

between CFG compression + the location-insensitive-pre-filter I think we'll make big wins

lqd (May 18 2018 at 17:08, on Zulip):

even if it had exactly DD like performance (not thinking about ease of writing, which we could macro) it might be still worth it for the compile time win, easier maintenance, good optimization potential, etc

nikomatsakis (May 18 2018 at 17:11, on Zulip):

oh, definitely!

nikomatsakis (May 18 2018 at 17:11, on Zulip):

this seems like a much easier story re: integration

nikomatsakis (May 18 2018 at 17:11, on Zulip):

(actually, @pnkfelix, I wonder if it's worth waiting for the datafrog story to play out — that might completely obviate the need to integrate an executable into rustc)

lqd (May 18 2018 at 17:14, on Zulip):

(I'll also convert the pre-pass to datafrog, at a snail's pace evidently)

Frank McSherry (May 18 2018 at 17:14, on Zulip):

Sorry! In actual fact it started raining and the beer-man decided to close up shop, meaning I had to boogie. Obvs not all the beer had been drunk (it's Austria!).

Frank McSherry (May 18 2018 at 17:14, on Zulip):

Gist updated with the correct version.

Frank McSherry (May 18 2018 at 17:16, on Zulip):

Full detail:

    Finished release [optimized] target(s) in 18.62s
     Running `target/release/borrow-check -a Datafrog 'inputs/clap-rs/app-parser-{{impl}}-add_defaults' --skip-tuples`
Duration { secs: 16, nanos: 548798709 } borrow_live_at is complete: 832392
--------------------------------------------------
Directory: inputs/clap-rs/app-parser-{{impl}}-add_defaults
Time: 16.638s
Echidnatron%

means it now runs faster than it takes to compile. Ball is in your court, @nikomatsakis ;)

Frank McSherry (May 18 2018 at 17:19, on Zulip):

Looks like (profiling) almost all the time is spent in deduplication. Probably what's going on is that there are relatively fewer distinct_total calls in the differential code, whereas datafrog is doing a distinct for every variable (incl intermediates, etc).

nikomatsakis (May 18 2018 at 17:20, on Zulip):

that makes sense; I did spend some time thinking about where duplicates could be introduced

nikomatsakis (May 18 2018 at 17:20, on Zulip):

I suppose we could make that explicit in datafrog?

nikomatsakis (May 18 2018 at 17:21, on Zulip):

ps I really like the model of "manually" compiling to a nice set of primitives

nikomatsakis (May 18 2018 at 17:21, on Zulip):

that is, I like us having the ability to map it back to datalog, but also not being reliant on some automated optimizer

Frank McSherry (May 18 2018 at 17:21, on Zulip):

for perf reasons probably makes sense. could probably just have a bool with each variable that is "do distinct", at which point it is on the programmer to make sure they have distincts set up to converge properly (no cycles w/o distincts).

nikomatsakis (May 18 2018 at 17:21, on Zulip):

(to apply domain knowledge of this kind)

lqd (May 18 2018 at 17:24, on Zulip):

on my machine, timely 11s vs :frog: 15s -- manual distincts would be :thumbs_up:

Frank McSherry (May 18 2018 at 17:25, on Zulip):

Manual distincts:

Running `target/release/borrow-check -a Datafrog 'inputs/clap-rs/app-parser-{{impl}}-add_defaults' --skip-tuples`
Duration { secs: 9, nanos: 640609767 }  borrow_live_at is complete: 832392
--------------------------------------------------
Directory: inputs/clap-rs/app-parser-{{impl}}-add_defaults
Time: 9.729s
Echidnatron%
nikomatsakis (May 18 2018 at 17:26, on Zulip):

on my machine, timely 11s vs :frog: 15s -- manual distincts would be :thumbs_up:

I think you mean: :clock1: 11s vs :frog: 15s

nikomatsakis (May 18 2018 at 17:27, on Zulip):

also i confess I find data-:frog: hard to read and prefer just :frog:

nikomatsakis (May 18 2018 at 17:27, on Zulip):

emoji-word hybrids blow my mind

nikomatsakis (May 18 2018 at 17:27, on Zulip):

Now that is the nit to end all nits.

lqd (May 18 2018 at 17:29, on Zulip):

(if you'll allow me, I just want to say that Niko, Frank -- and also Jamie who's not doing Rust anymore -- are 3 of my favorite technical writers/people/etc in the world, so I'm extremely happy rn)

Frank McSherry (May 18 2018 at 17:29, on Zulip):

Jamie is still doing Rust I think...

Frank McSherry (May 18 2018 at 17:29, on Zulip):

Just, not as publicly. >.>

lqd (May 18 2018 at 17:30, on Zulip):

:) very Julia rn, think he did a talk extremely recently on Imp

lqd (May 18 2018 at 17:32, on Zulip):

compilation staging seemed key to this work

Frank McSherry (May 18 2018 at 17:33, on Zulip):

Profiling on the manual distinct runs, btw:

Frank McSherry (May 18 2018 at 17:33, on Zulip):

Screen-Shot-2018-05-18-at-19.32.49.png

nikomatsakis (May 18 2018 at 17:33, on Zulip):

I don't know how to interpret that :)

nikomatsakis (May 18 2018 at 17:33, on Zulip):

except that "shuffling data" is a big part...

lqd (May 18 2018 at 17:34, on Zulip):

@Frank McSherry do you think we could process some of the tuples creation for each loop iteration, in parallel (and if it would help) as we have the base relations and the deltas and can join them I think independently + union the 3 subresults

Frank McSherry (May 18 2018 at 17:34, on Zulip):

The sorting and such does a bit of memmove action. I can probably tidy some of it up, but it's probably somewhat intrinsic to the approach.

Frank McSherry (May 18 2018 at 17:35, on Zulip):

Yes, that could totally work. All of the update steps do read-only access to their inputs, populate a vec, and then append that vec to a "todo list" for each variable.

lqd (May 18 2018 at 17:35, on Zulip):

(and rayon's par_iter and par_sort)

Frank McSherry (May 18 2018 at 17:35, on Zulip):

Although, looking at the profiling, not so much time is spend in join_into and antijoin_into, so perhaps it would be better to parallelize the remaining distincts.

Frank McSherry (May 18 2018 at 17:37, on Zulip):

I'm sure there are plenty more optimizations too. Each of the named relations is in un-indexed form, and then indexed a few different ways. That should mean we can remove at least one instance of each named relation, which should do something positive probably. >.>

Frank McSherry (May 18 2018 at 17:38, on Zulip):

Random other things too: the way I merge two sorted lists is to append one on to the other and call sort(), which .. you might have opinions about.

nikomatsakis (May 18 2018 at 17:38, on Zulip):

what is the current :timer_clock: vs :frog: results?

nikomatsakis (May 18 2018 at 17:38, on Zulip):

with 'less dedup' I mean

lqd (May 18 2018 at 17:39, on Zulip):

(in my recent tests of your previous ideas removing some of those bigger relations resulted in visible wins, albeit on the Naive prog / :frog:)

Frank McSherry (May 18 2018 at 17:39, on Zulip):

It's a bit hard to say; lemme push the manual distinct repo, and then gist my code.

Frank McSherry (May 18 2018 at 17:41, on Zulip):

Manual distinct pushed (new method: variable_indistinct()) and gist updated with code that uses it.

Frank McSherry (May 18 2018 at 17:42, on Zulip):

But, perf seems good enough that next step is probably collapsing CFG. ;)

lqd (May 18 2018 at 17:43, on Zulip):

:timer_clock: 11s vs :frog: 8s

lqd (May 18 2018 at 17:43, on Zulip):

(on this beefy machine)

nikomatsakis (May 18 2018 at 17:43, on Zulip):

yeah, this is great!

nikomatsakis (May 18 2018 at 17:44, on Zulip):

25% win, rounding down a bit

lqd (May 18 2018 at 17:44, on Zulip):

:frog: compiles in 12s here vs :timer_clock: 2.5mins

nikomatsakis (May 18 2018 at 17:44, on Zulip):

and..there is that...

lqd (May 18 2018 at 17:44, on Zulip):

:3

nikomatsakis (May 18 2018 at 17:45, on Zulip):

we should discuss how to transition the repo

nikomatsakis (May 18 2018 at 17:45, on Zulip):

a bit tricky with e.g. @Santiago Pastorino's in-flight changes but I think those are simple enough we can comment them out and re-do

lqd (May 18 2018 at 17:45, on Zulip):

@Frank McSherry <3 (you comin to rustfest paris next week BTW ?)

Frank McSherry (May 18 2018 at 17:46, on Zulip):

Negative @lqd. That sounds like it might have been a good plan, but didn't think that far ahead.

Frank McSherry (May 18 2018 at 17:46, on Zulip):

I'll have a proxy there, though. @Andrea Lattuada should be showing up.

lqd (May 18 2018 at 17:47, on Zulip):

@nikomatsakis we can let Santiago land his work while I convert the pre-pass, then convert the difference, and then you can decide ?

nikomatsakis (May 18 2018 at 17:47, on Zulip):

sounds fine

lqd (May 18 2018 at 17:47, on Zulip):

or is that too .. late ?

nikomatsakis (May 18 2018 at 17:47, on Zulip):

there's no great hurry— but I would say that if we can reduce the build time from minutes to seconds

nikomatsakis (May 18 2018 at 17:48, on Zulip):

it will help push further iteration along :)

nikomatsakis (May 18 2018 at 17:48, on Zulip):

I'd prioritize that personally

nikomatsakis (May 18 2018 at 17:48, on Zulip):

anyway I think @Santiago Pastorino is busy today

lqd (May 18 2018 at 17:48, on Zulip):

maybe then land these 2 :frog: sooner ?

nikomatsakis (May 18 2018 at 17:49, on Zulip):

if you want to prep a PR, I'd merge it

nikomatsakis (May 18 2018 at 17:49, on Zulip):

in any case I think we should consider reframing the 'insensitive' thing not as its own analysis

nikomatsakis (May 18 2018 at 17:49, on Zulip):

but as a refinement to timely-opt

nikomatsakis (May 18 2018 at 17:49, on Zulip):

ths might be an opportunity to do that reshuffling :)

lqd (May 18 2018 at 17:49, on Zulip):

alright I'll do this between this evening (about to grab dinner) and tomorrow

nikomatsakis (May 18 2018 at 17:49, on Zulip):

the only thing I would say is:

nikomatsakis (May 18 2018 at 17:50, on Zulip):

do we want to think about macros to make it easier to write?

nikomatsakis (May 18 2018 at 17:50, on Zulip):

I guess that can prob wait

nikomatsakis (May 18 2018 at 17:50, on Zulip):

particularly as we've already ported it

lqd (May 18 2018 at 17:51, on Zulip):

:thumbs_up:

lqd (May 18 2018 at 17:51, on Zulip):

(heading off to dinner, bbl)

Frank McSherry (May 18 2018 at 17:52, on Zulip):

The writing was pretty formulaic, btw. But not obviously macro-able. Working through each join and naming each relation I thought I'd need, then seeing what those were (in rustc error messages) and prepping each of them as from_map of their source relation.

nikomatsakis (May 18 2018 at 17:53, on Zulip):

yeah I think it's better to wait until we're all done, sort of, and then see if we can add macros or what

nikomatsakis (May 18 2018 at 17:53, on Zulip):

I have a feeling it will evolve further before we're done

nikomatsakis (May 18 2018 at 17:53, on Zulip):

no need to bake things in yet

qmx (May 18 2018 at 17:54, on Zulip):

@nikomatsakis IMO yea, macros very last

qmx (May 18 2018 at 17:54, on Zulip):

and I'm really really interested into how this will be done

Frank McSherry (May 18 2018 at 18:06, on Zulip):

Btw, just added a few more variable_indistinct calls, doing less distinct work, and things actually slowed down a bit (presumably because the non-distinct relations made more work for others). So, might be a bit of exploration to do here.

nikomatsakis (May 18 2018 at 18:08, on Zulip):

ah so you were allowing some duplication to slip in basically?

nikomatsakis (May 18 2018 at 18:08, on Zulip):

(I tried to add distinct_total in places I thought might add duplications, but I may have left some because there was a distinct coming later, can't remember)

nikomatsakis (May 18 2018 at 18:08, on Zulip):

(and I could have just been wrong)

Frank McSherry (May 18 2018 at 18:11, on Zulip):

I tried removing all distincts except for a few that make all recursive definitions distinct (subset, requires, and dead_can_reach). So, fewer "unnecessary" distincts, but it seems like they may occasionally help performance.

nikomatsakis (May 18 2018 at 18:13, on Zulip):

interesting

Frank McSherry (May 18 2018 at 18:24, on Zulip):

fwiw, :frog: is taking 5,764 rounds to converge, and I think a lot of the distinct work is essentially looking up relatively few facts in larger base relations (vs many facts at once, derived over fewer rounds).

Frank McSherry (May 18 2018 at 18:25, on Zulip):

If we had a "large step" cfg relation, even without collapsing down indistinguishable components, it should probably speed things up just due to the way that doing distincts in bulk is more efficient than doing them drawn out over multiple rounds.

nikomatsakis (May 18 2018 at 18:25, on Zulip):

what do you mean by "large step"? sort of "reachable"?

nikomatsakis (May 18 2018 at 18:26, on Zulip):

like, transitive clsoure? (or some subset of it?)

Frank McSherry (May 18 2018 at 18:26, on Zulip):

well, as opposed to "small step" (random PL vocabulary sneaking back in).

nikomatsakis (May 18 2018 at 18:26, on Zulip):

oh well that clears everything up

nikomatsakis (May 18 2018 at 18:26, on Zulip):

;)

Frank McSherry (May 18 2018 at 18:27, on Zulip):

If we took all the steps that were legit to take at once (i.e. to all points that are not kills, nor at which there is a change in the region_live_at relation).

nikomatsakis (May 18 2018 at 18:27, on Zulip):

yeah I have to think about what that means

nikomatsakis (May 18 2018 at 18:27, on Zulip):

that is, you can't just compute transitive closure of CFG

nikomatsakis (May 18 2018 at 18:27, on Zulip):

but definitely you can imagine taking multiple steps

nikomatsakis (May 18 2018 at 18:28, on Zulip):

on a related note I've wondered if we can aggregate the effects of (e.g.) a basic block

nikomatsakis (May 18 2018 at 18:28, on Zulip):

and then compute at the boundaries of basic blocks

nikomatsakis (May 18 2018 at 18:28, on Zulip):

(as is often done...)

Frank McSherry (May 18 2018 at 18:28, on Zulip):

E.g. we could put together a relation Equiv(P,Q) which is CFG(P,Q) minus any pairs for which RLA(R,P), !RLA(R,Q) or any Killed(R,P) or Killed(R,Q).

nikomatsakis (May 18 2018 at 18:28, on Zulip):

right

nikomatsakis (May 18 2018 at 18:30, on Zulip):

that makes sense

Frank McSherry (May 18 2018 at 18:30, on Zulip):

In principle we could collapse down everything to some representative of Equiv, but even skipping that and just applying all the rules at once (rather than over .. ~100 rounds or whatever) would do something solid.

Frank McSherry (May 18 2018 at 18:47, on Zulip):

Fwiw, if you impl the rules

        // InEquiv(P,Q) :- CFG(P,Q), Killed(B,P).
        // InEquiv(P,Q) :- CFG(P,Q), RLA(R,P), !RLA(R,Q).

then there are 6,253 inequiv edges out of 51,896 cfg edges.

lqd (May 18 2018 at 19:01, on Zulip):

we could also process — gross oversimplification — 4 tuples at a time and have the operators use SIMD

lqd (May 18 2018 at 19:02, on Zulip):

(maybe ? sometimes ?)

Frank McSherry (May 18 2018 at 19:02, on Zulip):

The EmptyHeaded folks use it to do efficient galloping, but .. I think it involves some aggressive pre-computation for the relations.

Frank McSherry (May 18 2018 at 19:03, on Zulip):

Not exactly multiple tuples at a time, but searching for multiple matches for one tuple at a time.

Frank McSherry (May 18 2018 at 19:04, on Zulip):

Lots to do, but I think with those equiv vs cfg numbers, there is a 10x speed-up waiting if one consolidates indistinguishable components of the cfg.

lqd (May 18 2018 at 19:04, on Zulip):

I also remember LegoBase (I think) wrt to merging of operators and the likes

lqd (May 18 2018 at 19:04, on Zulip):

ofc they staged queries but we know them ahead of time

lqd (May 18 2018 at 19:05, on Zulip):

agreed, the cfg looks like a good avenue

lqd (May 18 2018 at 19:07, on Zulip):

maybe could also try automatically different orders for the rules "evaluation"

lqd (May 18 2018 at 19:08, on Zulip):

(but that may overfitting to clap in particular, who knows)

lqd (May 18 2018 at 19:09, on Zulip):

as you said reducing the millions of tuples flying about will surely always be the best bet :)

Frank McSherry (May 18 2018 at 19:15, on Zulip):

fwiw, breakdown of total tuples produced (in the 9.5s or so):

FINAL: "region_live_at_p"   1076158
FINAL: "inequiv"    1154468
FINAL: "killed_p"   980
FINAL: "cfg_edge_p" 51896
FINAL: "region_live_at" 1076158
FINAL: "dead_can_reach_live_r1pq"   38684
FINAL: "dead_can_reach_live"    38684
FINAL: "dead_can_reach_r2q" 194748
FINAL: "dead_can_reach_1"   156064
FINAL: "dead_can_reach" 194748
FINAL: "dead_can_reach_origins" 43222
FINAL: "dead_region_requires_rpq"   50555
FINAL: "dead_region_requires_2" 907164
FINAL: "dead_region_requires_1" 858650
FINAL: "dead_region_requires"   50555
FINAL: "live_to_dead_regions_r2pq"  38098
FINAL: "live_to_dead_regions_p" 38098
FINAL: "live_to_dead_regions_2" 2719027
FINAL: "live_to_dead_regions_1" 2931089
FINAL: "live_to_dead_regions"   38098
FINAL: "requires_rp"    858674
FINAL: "requires_bp"    858674
FINAL: "requires_2" 859328
FINAL: "requires_1" 858650
FINAL: "requires"   858674
FINAL: "subset_p"   2714121
FINAL: "subset_r2p" 2714121
FINAL: "subset_r1p" 2714121
FINAL: "subset_2"   2661662
FINAL: "subset_1"   2785151
FINAL: "subset" 2714121
FINAL: "subset" 2714121
FINAL: "subset_1"   2785151
FINAL: "subset_2"   2661662
FINAL: "subset_r1p" 2714121
FINAL: "subset_r2p" 2714121
FINAL: "subset_p"   2714121
FINAL: "requires"   858674
FINAL: "requires_1" 858650
FINAL: "requires_2" 859328
FINAL: "requires_bp"    858674
FINAL: "requires_rp"    858674
FINAL: "borrow_live_at" 0
FINAL: "live_to_dead_regions"   38098
FINAL: "live_to_dead_regions_1" 2931089
FINAL: "live_to_dead_regions_2" 2719027
FINAL: "live_to_dead_regions_p" 38098
FINAL: "live_to_dead_regions_r2pq"  38098
FINAL: "dead_region_requires"   50555
FINAL: "dead_region_requires_1" 858650
FINAL: "dead_region_requires_2" 907164
FINAL: "dead_region_requires_rpq"   50555
FINAL: "dead_can_reach_origins" 43222
FINAL: "dead_can_reach" 194748
FINAL: "dead_can_reach_1"   156064
FINAL: "dead_can_reach_r2q" 194748
FINAL: "dead_can_reach_live"    38684
FINAL: "dead_can_reach_live_r1pq"   38684
FINAL: "region_live_at" 1076158
FINAL: "cfg_edge_p" 51896

NB: each relation seems to be dropped twice. Not sure what is up with that.

Frank McSherry (May 18 2018 at 19:17, on Zulip):

Oh, the boxed versions getting dropped (the ones Iteration holds on to).

Frank McSherry (May 18 2018 at 19:20, on Zulip):

Ooo, optimization:

        // live_to_dead_regions(R1, R2, P, Q) :-
        //   subset(R1, R2, P),
        //   cfg_edge(P, Q),
        //   region_live_at(R1, Q),
        //   !region_live_at(R2, Q).

should be

        // live_to_dead_regions(R1, R2, P, Q) :-
        //   subset(R1, R2, P),
        //   cfg_edge(P, Q),
        //   !region_live_at(R2, Q).
        //   region_live_at(R1, Q),
lqd (May 18 2018 at 19:20, on Zulip):

should we try and get all antijoins as soon as possible ?

lqd (May 18 2018 at 19:21, on Zulip):

the earliest we filter the better

Frank McSherry (May 18 2018 at 19:21, on Zulip):

Not sure, but looking at the sizes up above, there is a 2.9M -> 2.7M change with the semijoin, and then a 2.7M -> 38K change with the antijoin. Not sure if it is general, though.

Frank McSherry (May 18 2018 at 19:22, on Zulip):

Both the antijoin and the semijoin filter, so it isn't as obvious as that unfortunately. (I made the mistake of thinking it was when I posted "Ooo, optimization").

Andrea Lattuada (May 18 2018 at 19:22, on Zulip):

:wave: Hi folks! Frank’s proxy here :) looking forward to RustFest, and happy to chat about Differential Dataflow-related things!

lqd (May 18 2018 at 19:22, on Zulip):

true, so did I

Frank McSherry (May 18 2018 at 19:23, on Zulip):

But in this case I think that joining subset and cfg first might be wrong (should probably join cfg and rla, then semijoin against subset).

Frank McSherry (May 18 2018 at 19:24, on Zulip):

Heyo @Andrea Lattuada !

Frank McSherry (May 18 2018 at 19:31, on Zulip):

Pondering whether I should just go to Paris anyhow and see what is going on...

lqd (May 18 2018 at 19:32, on Zulip):

tickets were sold out, but I saw them talking about last minute tickets

lqd (May 18 2018 at 19:34, on Zulip):

(I think Felix is also part of the organization :thinking_face:)

Frank McSherry (May 18 2018 at 19:34, on Zulip):

Was mostly just thinking about going to Paris. >.>

lqd (May 18 2018 at 19:35, on Zulip):

lol

nikomatsakis (May 18 2018 at 19:38, on Zulip):

Was mostly just thinking about going to Paris. >.>

I agree with this reasoning

lqd (May 18 2018 at 19:38, on Zulip):

@nikomatsakis do you want me to replace naive.rs with the :frog: naive, and in general remove DD usage ? what to do about location_insensitive, want me to translate with the PR ?

lqd (May 18 2018 at 19:39, on Zulip):

or later

nikomatsakis (May 18 2018 at 19:39, on Zulip):

@nikomatsakis do you want me to replace naive.rs with the :frog: naive, and in general remove DD usage ?

yes

nikomatsakis (May 18 2018 at 19:39, on Zulip):

re: location-insensitve, i'd be ok with commenting it out for now

nikomatsakis (May 18 2018 at 19:39, on Zulip):

translating it would be a good exercise

nikomatsakis (May 18 2018 at 19:39, on Zulip):

for someone else to get familiar :)

lqd (May 18 2018 at 19:41, on Zulip):

@Frank McSherry do you want me to git commit with your GH email as the author ? since you are, you know, the author

Frank McSherry (May 18 2018 at 19:42, on Zulip):

Oh, however you like @lqd. I'm happy to donate authorship of that stuff. Whichever way is least likely to cause problems down the road (if the commits are serious about authorship and don't like screwing around, then probably yes).

lqd (May 18 2018 at 19:43, on Zulip):

I wouldn't know myself, maybe @nikomatsakis can chime in ?

nikomatsakis (May 18 2018 at 19:43, on Zulip):

I don't think it matters either way

nikomatsakis (May 18 2018 at 19:44, on Zulip):

in general we state that, if you open a PR, you are claiming willingness to release that code under the license

lqd (May 18 2018 at 19:44, on Zulip):

maybe I just integrate :frog: with our naive version and then frank opens a PR for the opt version ?

nikomatsakis (May 18 2018 at 19:45, on Zulip):

either wfm

nikomatsakis (May 18 2018 at 19:45, on Zulip):

@Frank McSherry may not want to :)

lqd (May 18 2018 at 19:45, on Zulip):

:)

Frank McSherry (May 18 2018 at 19:45, on Zulip):

oh geez, yeah that means I have to figure out how to PR at you all without deleting everything else in your repo.

nikomatsakis (May 18 2018 at 19:45, on Zulip):

heh that was sort of the reaction I expected

nikomatsakis (May 18 2018 at 19:45, on Zulip):

I say @lqd just do it all and credit @Frank McSherry in the comment

Frank McSherry (May 18 2018 at 19:46, on Zulip):

"hrm, looks like someone reverted all of those PartialOrd and Ord changes from several versions back..."

lqd (May 18 2018 at 20:01, on Zulip):

@Frank McSherry is the gist from a couple hours back the version I should add ?

Frank McSherry (May 18 2018 at 20:05, on Zulip):

probably, lemme check. the one just post-variable_indistinct is good.

lqd (May 18 2018 at 20:06, on Zulip):

yeah https://gist.github.com/frankmcsherry/adb9ed3c433eb9f4ddaf591ad9888dea is right with the indistincts

Frank McSherry (May 18 2018 at 20:07, on Zulip):

seems right to me.

Frank McSherry (May 18 2018 at 20:07, on Zulip):

I tidied it a bit (no need to have a scope to get bla) but it's essentially the right thing for now.

Frank McSherry (May 18 2018 at 20:07, on Zulip):

I suspect it will continue to evolve a bit.

lqd (May 18 2018 at 20:07, on Zulip):

so without the antijoins ordering, or removing/adding some distincts you were mentioning

lqd (May 18 2018 at 20:08, on Zulip):

yeah and niko really wants to switch away from blas ;)

Frank McSherry (May 18 2018 at 20:08, on Zulip):

no, haven't tried out the antijoin order change, and the other distincts were small order changes (0.1s-ish).

lqd (May 18 2018 at 20:08, on Zulip):

alright, thank you

Frank McSherry (May 18 2018 at 20:09, on Zulip):

No worries, thanks for guiding all the work around!

lqd (May 18 2018 at 20:10, on Zulip):

you did all the work! :p

lqd (May 18 2018 at 20:23, on Zulip):

I'm loving the 20x reduction in compile time :3

lqd (May 18 2018 at 20:46, on Zulip):

/me fixes his "Datafro" typos

lqd (May 18 2018 at 21:06, on Zulip):

@nikomatsakis unless I made some git mistakes, it might be time to :frog: https://github.com/rust-lang-nursery/polonius/pull/36

nikomatsakis (May 18 2018 at 21:07, on Zulip):

love the PR title :)

lqd (May 18 2018 at 21:07, on Zulip):

:)

Frank McSherry (May 18 2018 at 21:07, on Zulip):

:frog:

Frank McSherry (May 18 2018 at 21:08, on Zulip):

Kinda thinking Niko gets 50% authorship too, since I copy/pasted all of his comments.

lqd (May 18 2018 at 21:08, on Zulip):

I did 50% of those copy/pastes frank

nikomatsakis (May 18 2018 at 21:09, on Zulip):

I think I'll double-check by hand it produces the same tuples, then merge

nikomatsakis (May 18 2018 at 21:09, on Zulip):

though I should really read the core carefully at some point

lqd (May 18 2018 at 21:09, on Zulip):

(kidding ofc, but I actually did integrate more of the timelyopt comments so we don't lose them)

Frank McSherry (May 18 2018 at 21:09, on Zulip):

def take a peek; I unsafe cast a few u32s to usizes, but pretty sure it's all good

Frank McSherry (May 18 2018 at 21:10, on Zulip):

(( i don't do anything like that, but since the datafrog crate is under my control, .. muaaahaaha. ))

lqd (May 18 2018 at 21:10, on Zulip):

agreed wrt to manual checking first

nikomatsakis (May 18 2018 at 21:10, on Zulip):

yeah, that'll just have to wait till later in the day

nikomatsakis (May 18 2018 at 21:10, on Zulip):

anyway nice job y'all!

lqd (May 18 2018 at 21:12, on Zulip):

it will be nice to have :timer_clock: vs :frog: numbers from your machine as well

nikomatsakis (May 18 2018 at 21:13, on Zulip):

ping @pnkfelix — this branch may change the integration story

nikomatsakis (May 18 2018 at 21:14, on Zulip):

@lqd doing a local build now

lqd (May 18 2018 at 21:14, on Zulip):

I also used the word bespoke as I know you like it

nikomatsakis (May 18 2018 at 21:14, on Zulip):

lol

nikomatsakis (May 18 2018 at 21:14, on Zulip):

I actually picked it up from @Frank McSherry ;)

nikomatsakis (May 18 2018 at 21:14, on Zulip):

I think I recall him talking about "bespoke union find analyses" or something

lqd (May 18 2018 at 21:14, on Zulip):

it all makes sense

nikomatsakis (May 18 2018 at 21:15, on Zulip):

:heavy_large_circle:

lqd (May 18 2018 at 21:15, on Zulip):

(seriously tho, CHR might be good for bespoke union find (ofc Chalk comes to mind))

lqd (May 18 2018 at 21:17, on Zulip):

(I'll redirect most talks about this to eternaleye, who would precisely explain why, better than I could — this goes for this topic and many others ;)

nikomatsakis (May 18 2018 at 21:18, on Zulip):

"CHR, what else"

lqd (May 18 2018 at 21:40, on Zulip):

if all goes well, after merging we should also create an issue for the location-insensitive :frog: conversion, I left some comments about that in the PR as well

lqd (May 18 2018 at 23:18, on Zulip):

@nikomatsakis pushed another cleanup commit — measurements are variable on my laptop, but it's a 5% max improvement

lqd (May 18 2018 at 23:22, on Zulip):

I also suspect mw, zoxc, or njn will be able to improve all this :)

nikomatsakis (May 18 2018 at 23:39, on Zulip):

had a chance to test it locally:

Version Compile Time Run Time
:timer_clock: 210s 14.191s
:frog: 11s 8s
nikomatsakis (May 18 2018 at 23:40, on Zulip):

I'd...say we have a winner :)

nikomatsakis (May 18 2018 at 23:40, on Zulip):

/me afk

lqd (May 18 2018 at 23:47, on Zulip):

we rarely saw results with more than one :timer_clock: worker AFAICT ?

nikomatsakis (May 19 2018 at 00:19, on Zulip):

With threads (8 core machine):

Version Workers Compile Time Run Time
:timer_clock: 1 210s 14.191s
:timer_clock: 2 210s 11.035s
:timer_clock: 4 210s 8.898s
:timer_clock: 8 210s 7.953s
:frog: 1 11s 8s
lqd (May 19 2018 at 00:23, on Zulip):

oh awesome thank you :)

Frank McSherry (May 19 2018 at 07:34, on Zulip):

Comment on the PR, @lqd. Just noticed another 1.5s we can shave off by further _indistinct() use (for some reason I forgot to do it to all of the requires_ variants).

lqd (May 19 2018 at 08:11, on Zulip):

oh very nice :simple_smile:

lqd (May 19 2018 at 08:15, on Zulip):

I also tried "compressed", with the equiv / inequiv rules. As you mentioned, it can likely be a big win to use that successfully

lqd (May 19 2018 at 08:45, on Zulip):

it is on everyone's mind :smiley: https://twitter.com/martinkl/status/997500551746158593

lqd (May 19 2018 at 08:47, on Zulip):

which reminds me we could also look at Eve's Rust runtime, since they supposedly did just that

lqd (May 19 2018 at 08:51, on Zulip):

(Expressing CRDTs using Datalog :thinking_face:)

lqd (May 19 2018 at 09:07, on Zulip):

I also pushed the requires_* indistincts, 10-15% gain for free

nikomatsakis (May 19 2018 at 09:11, on Zulip):

Comment on the PR, @lqd. Just noticed another 1.5s we can shave off by further _indistinct() use (for some reason I forgot to do it to all of the requires_ variants).

taking that into account, I get:

With threads (8 core machine):

Version Workers Compile Time Run Time
:timer_clock: 1 210s 14.191s
:timer_clock: 2 210s 11.035s
:timer_clock: 4 210s 8.898s
:timer_clock: 8 210s 7.953s
:frog: 1 11s 7.154s
lqd (May 19 2018 at 09:30, on Zulip):

nice :)

nikomatsakis (May 19 2018 at 09:32, on Zulip):

I'm making a few tweaks as I go; just got another 5% win incidentally =)

lqd (May 19 2018 at 09:33, on Zulip):

if we could "uncompress/unpack" from the InEquiv points, it would be :thumbs_up: — unless I'm horribly mistaken, computing these points is very fast, using these edges instead of the full cfg is very fast, but don't know yet how to get to the original results back (but that would make it < 2s on clap)

nikomatsakis (May 19 2018 at 09:33, on Zulip):

(so down to 6.8s on my machine)

nikomatsakis (May 19 2018 at 09:34, on Zulip):

I've not yet gotten to that point, still going over the relations bit by bit, but yes that was my intuition when filing #20

lqd (May 19 2018 at 09:34, on Zulip):

("compressing" is <1s on clap on my slow laptop)

lqd (May 19 2018 at 09:37, on Zulip):

hacky, and full of clones, but here's the InEquiv computation https://gist.github.com/lqd/743abae3913bc3785a955b749289ca03

nikomatsakis (May 19 2018 at 09:38, on Zulip):

lovin' the faster compile times

lqd (May 19 2018 at 09:39, on Zulip):

oh yeah your 5% win, a bit like adapting timelyopt's previous "micro-optimization arrangements" :thumbs_up: there were a couple of those before

nikomatsakis (May 19 2018 at 09:43, on Zulip):

yeah, although I tried in one other cases and it was not a win

nikomatsakis (May 19 2018 at 09:43, on Zulip):

I'm not really sure why

nikomatsakis (May 19 2018 at 09:44, on Zulip):

that is, removing dead_can_reach_r2q went from 6.8s to 6.9s

nikomatsakis (May 19 2018 at 09:44, on Zulip):

(almost sounds like noise, but it was quite consistent)

nikomatsakis (May 19 2018 at 09:44, on Zulip):

at this point though I would say

nikomatsakis (May 19 2018 at 09:44, on Zulip):

we need more tests

nikomatsakis (May 19 2018 at 09:44, on Zulip):

that is, our "benchmark suite" is rather limited :)

nikomatsakis (May 19 2018 at 09:44, on Zulip):

anyway, I read through the datafrog-opt code now fairly closely

nikomatsakis (May 19 2018 at 09:44, on Zulip):

looks great!

nikomatsakis (May 19 2018 at 09:46, on Zulip):

just gonna run rustfmt then merge

lqd (May 19 2018 at 09:46, on Zulip):

yay

nikomatsakis (May 19 2018 at 09:46, on Zulip):

insert appropriate celebratory emojis here

nikomatsakis (May 19 2018 at 09:47, on Zulip):

:cake:

lqd (May 19 2018 at 09:47, on Zulip):

:tada: :clap: :handshake: :thumbs_up: :confetti_ball:

lqd (May 19 2018 at 09:47, on Zulip):

and of course :frog:

nikomatsakis (May 19 2018 at 09:49, on Zulip):

ha yes

nikomatsakis (May 19 2018 at 09:49, on Zulip):

it'd be nice to move from a git dependency to a crates.io dep

lqd (May 19 2018 at 09:49, on Zulip):

yup naive has 3 iterations but doesn't really need it

nikomatsakis (May 19 2018 at 09:49, on Zulip):

but I don't know if the latest datafrog is fully published there?

nikomatsakis (May 19 2018 at 09:49, on Zulip):

anyway not a big thing

lqd (May 19 2018 at 09:50, on Zulip):

and because of that the indices were not shared like they are in opt (plus I didn't know I could collapse them into one at the time, but we can do that if you want)

lqd (May 19 2018 at 09:50, on Zulip):

I don't think frank has published it on crates.io

nikomatsakis (May 19 2018 at 09:53, on Zulip):

and because of that the indices were not shared like they are in opt (plus I didn't know I could collapse them into one at the time, but we can do that if you want)

I'd like to keep naive as "obviously correct" as we can, but actually one iteration is somehow more "obvious" than multiple, so perhaps it's worth it.

nikomatsakis (May 19 2018 at 09:53, on Zulip):

that is, in "the datalog source", we don't have multiple iterations

lqd (May 19 2018 at 09:54, on Zulip):

true

lqd (May 19 2018 at 09:54, on Zulip):

and it'd probably be more idiomatic to actually share those indices

lqd (May 19 2018 at 09:54, on Zulip):

more datafroggy

lqd (May 19 2018 at 09:55, on Zulip):

I can take care of it whenever, but it would also be a "good first issue" for others, get into the API, with an executable test with known and comparable results

nikomatsakis (May 19 2018 at 09:57, on Zulip):

true — can you maybe file an issue around it?

lqd (May 19 2018 at 09:57, on Zulip):

will do

lqd (May 19 2018 at 09:57, on Zulip):

and there also needs a similar one for location-insensitive work

nikomatsakis (May 19 2018 at 09:58, on Zulip):

yep

nikomatsakis (May 19 2018 at 10:00, on Zulip):

I wanted to write something like that last night in any case — I want to think about whether the location-insensitive work ought to be done as a 'separate analysis' or as part of the datafrog-opt analysis. (maybe we should just clone datafrog-opt and layer the location-insensitive in there, but that might not make sense either.)

nikomatsakis (May 19 2018 at 10:01, on Zulip):

I probably won't get much more time to poke at this during the weekend, but we can pick it up on monday I suppose

nikomatsakis (May 19 2018 at 10:01, on Zulip):

I guess to start it makes sense to port the location-insensitive analysis as it was

lqd (May 19 2018 at 10:05, on Zulip):

maybe others will get to it before then, otherwise yeah I can do both this weekend, or get back to it after that

nikomatsakis (May 19 2018 at 10:08, on Zulip):

either way; it seems fine to me to let e.g. @Santiago Pastorino or others do some of it, to get a feel for how :frog: works

lqd (May 19 2018 at 10:12, on Zulip):

I can't add labels but here's the first issue https://github.com/rust-lang-nursery/polonius/issues/37

lqd (May 19 2018 at 10:13, on Zulip):

do we need a new issue for the location-insensitive port or would I mention it on https://github.com/rust-lang-nursery/polonius/issues/29 ?

nikomatsakis (May 19 2018 at 10:14, on Zulip):

I'd say just mention it in the existing issue

lqd (May 19 2018 at 10:23, on Zulip):

alright done, https://github.com/rust-lang-nursery/polonius/issues/29#issuecomment-390395464

Frank McSherry (May 19 2018 at 10:24, on Zulip):

I'm up for pushing at crates.io whenever. I just pushed a version with a perf improvement. Nothing for borrow_check, but another example (graspan1.rs) got about 2x out of it. It .. lead me to something I'm not sure I know the right idiom for:

The Relation type has a From implementation for I: IntoIterator, which is pretty general but not actually the right implementation for Vec<Tuple>, because it drains the iterator and collects it into a new allocation. It seems like you'd also want a From for Vec<Tuple>, perhaps using specialization (?). For now, there is a from_vec() method, and using that gives some wins for me in the other project (much larger inputs relative to derived facts).

Frank McSherry (May 19 2018 at 10:25, on Zulip):

I've also prepped a blog post about the engine, meant to explain things from the ground up. Much editing to do, but here is a peek: https://github.com/frankmcsherry/blog/blob/master/posts/2018-05-19.md

lqd (May 19 2018 at 10:25, on Zulip):

I was seeing a number of mergeswith empty relations, removing those out didn't make much a difference I think, but could be wrong

Frank McSherry (May 19 2018 at 10:26, on Zulip):

If any of @lqd , @nikomatsakis , @qmx do not want to be ack'd in it, please let me know and I will not.

Frank McSherry (May 19 2018 at 10:27, on Zulip):

Oh, interesting. Probably insert isn't checking for non-emptiness before pushing an empty vector on the list. Lemme add that just so that things are tasteful (it should be almost no cost, as merge() is cheap to merge an empty list).

lqd (May 19 2018 at 10:27, on Zulip):

(I don't mind being praised in public by my idols, but don't feel obligated to do it :p — joking, of course :)

nikomatsakis (May 19 2018 at 10:29, on Zulip):

It seems like you'd also want a From for Vec<Tuple>, perhaps using specialization (?). For now, there is a from_vec() method, and using that gives some wins for me in the other project (much larger inputs relative to derived facts).

Yeah this seems like a classic case for specialization

Frank McSherry (May 19 2018 at 15:21, on Zulip):

Random further thoughts (I'll poke at things with time, but want to write them down): in several cases we have e.g.

        let subset_r1p = iteration.variable_indistinct("subset_r1p");
        let subset_r2p = iteration.variable_indistinct("subset_r2p");
        let subset_p = iteration.variable_indistinct("subset_p");

where if the tuples are ordered p_r1_r2 say, then the same sorted lists work for _r1p and _p. We just have to swing around the order to _pr1 and then describe a trait ProvidesKeyVal or something that allows one to dance around using either key (p or pr1).

Frank McSherry (May 19 2018 at 15:29, on Zulip):

Hrm. Maybe more complicated than I imagine, due to the difficulty of going from a &(X, Y, Z) to a &(X, Y).

lqd (May 19 2018 at 15:56, on Zulip):

@Frank McSherry possibly stupid question, should we actually have (some?) inputs as indistinct variables ?

Frank McSherry (May 19 2018 at 15:57, on Zulip):

Totally could, but I don't think anything good happens. The indistinctness prevents comparisons of new data against existing data, but this doesn't happen for the input collections.

lqd (May 19 2018 at 15:58, on Zulip):

oh only affecting tuple creations

lqd (May 19 2018 at 16:01, on Zulip):

don't those variables pass through the regular mechanism at the beginning though ?

lqd (May 19 2018 at 16:02, on Zulip):

eg fill one the Variable inputs (not Relations) and it'll transition from the different tuple storage during the .changed() rounds

lqd (May 19 2018 at 16:04, on Zulip):

(esp for some of the bigger inputs — if what I'm saying makes sense)

lqd (May 19 2018 at 16:10, on Zulip):

ofc even if it did gain a bit, it'd be more worthwhile to go for the 10x cfg equiv route

Frank McSherry (May 19 2018 at 16:13, on Zulip):

They do pass through the reg mechanisms, but it's all meant to be zero cost for the first relation that comes in. That is, a relation is sorted and distinct, so you only do any work once you have two of them. The transition from to_add to recent to stable is just a pointer move, until you have conflicting relations.

lqd (May 19 2018 at 16:13, on Zulip):

(trying indistinct out on rla and cfg_edge_pseemed to make a bit of improvement)

Frank McSherry (May 19 2018 at 16:13, on Zulip):

Hrm. Interesting that it did anything! It's possible I've botched the logic somewhere.

lqd (May 19 2018 at 16:14, on Zulip):

could be noise as well :3 (but it seemed to have faster rounds than usual) just another thing we can note

lqd (May 19 2018 at 16:14, on Zulip):

oh

Frank McSherry (May 19 2018 at 16:15, on Zulip):

Hrm. Yeah, double checking the insert and changed logic, the intent is that nothing should happen if there is just one relation that comes in.

lqd (May 19 2018 at 16:15, on Zulip):

you might also have made those improvements earlier today, while the rev polonius used is from yesterday

Frank McSherry (May 19 2018 at 16:16, on Zulip):

Some improvements landed, not sure. The ones I know landed seemed to have zero impact on polonius, but helped me out on some other analyses.

lqd (May 19 2018 at 16:16, on Zulip):

I'll try with everything uptodate, but it's probably not that important anyway :)

Frank McSherry (May 19 2018 at 16:17, on Zulip):

These thing can matter, if the logic is dodgy. :) The 2x improvement I got in non-polonius compute was adding a From implementation for Vec<Tuple> rather than using the IntoIterator one (which drains the vec and then collects it).

lqd (May 19 2018 at 16:19, on Zulip):

cough even without the 2x it'd be faster than the ASPLOS paper cough

lqd (May 19 2018 at 16:36, on Zulip):

the indistinct could be noise for the couple % over yesterday's rev

lqd (May 19 2018 at 16:37, on Zulip):

I did try just updating to today's and seemed overall still a wee bit faster than yersterday's anyway (that is without any other change than the new rev)

Frank McSherry (May 19 2018 at 16:42, on Zulip):

Cool; I can take a peek. It certainly shouldn't hurt, I just don't understand why it helps either. :)

lqd (May 19 2018 at 16:43, on Zulip):

it doesn't really anymore so it was surely just a fluke of this high variance machine :/

lqd (May 19 2018 at 16:43, on Zulip):

which makes perfect sense that it wouldn't :)

Reed Koser (May 19 2018 at 17:17, on Zulip):

Minor nit on your blog post @Frank McSherry , I think datalog :- for bindings rather than <-. Otherwise, very impressive work =D

nikomatsakis (May 19 2018 at 18:04, on Zulip):

(I think there are competing conventions)

Santiago Pastorino (May 19 2018 at 18:12, on Zulip):

either way; it seems fine to me to let e.g. @Santiago Pastorino or others do some of it, to get a feel for how :frog: works

what needs to be done?

Santiago Pastorino (May 19 2018 at 18:12, on Zulip):

sorry, ETOOMANYMESSAGES

Santiago Pastorino (May 19 2018 at 18:12, on Zulip):

didn't read

Santiago Pastorino (May 19 2018 at 18:12, on Zulip):

tried to read some and figured that a lot of stuff was already using a context which I lack

Santiago Pastorino (May 19 2018 at 18:13, on Zulip):

probably you have been talking since a while about some stuff and I have no clue :P

Santiago Pastorino (May 19 2018 at 18:13, on Zulip):

just figured that you changed datalog with datafrog, I still need to see what's that about

Reed Koser (May 19 2018 at 18:14, on Zulip):

classic... Sorry Frank

nikomatsakis (May 19 2018 at 18:15, on Zulip):

I'm gonna spin off a new topic @Santiago Pastorino for just this :)

Frank McSherry (May 20 2018 at 06:24, on Zulip):

More random optimization thoughts (not sure how best to effect, without substantial changes): queries like

    // subset(R1, R2, Q) :-
    //   subset(R1, R2, P) :-
    //   cfg_edge(P, Q),
    //   region_live_at(R1, Q),
    //   region_live_at(R2, Q).

require a few collections that we maintain (subsets _1 and _2). In fact, once we get going we don't need these collections, because none of the other relations change, so no one actually consults them at all. We need the tuples to be sorted and batched, but not maintained (distinctly or not). So the random merging and sorting of lists isn't really a big deal.

Frank McSherry (May 20 2018 at 06:44, on Zulip):

subset_r2p looks dead in datafrog_opt; can anyone confirm that? I get a small bump by deleting it.

nikomatsakis (May 20 2018 at 09:31, on Zulip):

@Frank McSherry I don't quite understand the distinction you are drawing between "sorted/batched" and "maintained"

lqd (May 20 2018 at 09:45, on Zulip):

(subset_r2p is unused indeed)

lqd (May 20 2018 at 09:49, on Zulip):

(so is live_to_dead_regions_p)

lqd (May 20 2018 at 09:55, on Zulip):

maybe the distinction is that we could need less data, as being only used as temporaries, only the round's recent batch is effectively used, while :frog: will still assume we might want to .complete() it one day ?

nikomatsakis (May 20 2018 at 10:02, on Zulip):

it's subtle. But I can see something along those lines, perhaps specifically because they are not distinct?

nikomatsakis (May 20 2018 at 10:02, on Zulip):

or maybe I'm misremembering

nikomatsakis (May 20 2018 at 10:03, on Zulip):

anyway I'm still having my morning coffee =) not fully awake yet

lqd (May 20 2018 at 10:07, on Zulip):

(tbf so am I, but it's noon :p)

nikomatsakis (May 20 2018 at 10:09, on Zulip):

PS a PR removing subset_r2p and live_to_dead_regions_p would not be unwelcome :)

lqd (May 20 2018 at 10:10, on Zulip):

on it (and updating to frog's latest rev)

nikomatsakis (May 20 2018 at 10:19, on Zulip):

it's subtle. But I can see something along those lines, perhaps specifically because they are not distinct?

to elaborate on what I mean: I think @lqd you were suggesting that those variables don't need a "stable"? But I thought it was still used to sometimes catch duplicates — but I guess the point is that if there is some new tuple in subset, it will always be new in subset_1 (which is subset join cfg_edge). That... it true here, I guess, and distinct has nothing to do with it.

(But I suppose that if the closure given to from_join were to drop some fields, it might not be true, in which case distinct would be important, right?)

lqd (May 20 2018 at 10:22, on Zulip):

I thought it could be that — but I'm no Frank :D — and for this very specific case of indexing, where we wouldn't/shouldn't drop fields to get this benefit ?

lqd (May 20 2018 at 10:23, on Zulip):

PR is up https://github.com/rust-lang-nursery/polonius/pull/40 (I haven't ran it a lot for a before/after, but the numbers seem lower than before, on this slow/high variance machine)

nikomatsakis (May 20 2018 at 10:26, on Zulip):

yeah, I see some slight win, although the numbers are a bit jumpier today than yesterday for some weird reason

Frank McSherry (May 20 2018 at 10:32, on Zulip):

Good news (esp for @lqd ): I implemented leapfrog triejoin.

Frank McSherry (May 20 2018 at 10:32, on Zulip):

Or, leapfrog join, let's say. No tries involved yet.

lqd (May 20 2018 at 10:32, on Zulip):

:D

Frank McSherry (May 20 2018 at 10:33, on Zulip):

I've got a few changes in place, but they are about a 1.5s improvement over what I had before. Still poking at things though.

Frank McSherry (May 20 2018 at 10:34, on Zulip):

Main gist is that when doing multiple joins to extend just one attribute, you can do it all at once and save yourself the pain of intermediate materialization:

            // subset(R1, R2, Q) :-
            //   subset(R1, R2, P) :-
            //   cfg_edge(P, Q),
            //   region_live_at(R1, Q),
            //   region_live_at(R2, Q).
            //
            // Carry `R1 <= R2` from P into Q if both `R1` and
            // `R2` are live in Q.
            use datafrog::leapfrog::{LeapFrog, Wrapper, leapfrog_into};
            let mut wrapper1 = Wrapper::from(&cfg_edge_rel,       |&(p,(_r1,_r2))| p);
            let mut wrapper2 = Wrapper::from(&region_live_at_rel, |&(_p,(r1,_r2))| r1);
            let mut wrapper3 = Wrapper::from(&region_live_at_rel, |&(_p,(_r1,r2))| r2);
            let mut leapers: [&mut LeapFrog<_,_>; 3] = [&mut wrapper1, &mut wrapper2, &mut wrapper3];
            leapfrog_into(&subset_p, &mut leapers, &subset_p, |&(_p,(r1,r2)),&q| (q,(r1,r2)));
lqd (May 20 2018 at 10:34, on Zulip):

this work has both scientific and comedic value

Frank McSherry (May 20 2018 at 10:35, on Zulip):

You just need to indicate for each of the participating relations how they should search for a tuple prefix, which clues them in on how to propose/reject possible extension values.

lqd (May 20 2018 at 10:36, on Zulip):

Leaper::from

Frank McSherry (May 20 2018 at 10:37, on Zulip):

Interesting. We'll need a name for the antijoin variant (currently AntiWrapper) that leaps around avoiding things. Frogger, perhaps?

Frank McSherry (May 20 2018 at 10:38, on Zulip):

E.g.

            // live_to_dead_regions(R1, R2, P, Q) :-
            //   subset(R1, R2, P),
            //   cfg_edge(P, Q),
            //   region_live_at(R1, Q),
            //   !region_live_at(R2, Q).

            use datafrog::leapfrog::{LeapFrog, Wrapper, AntiWrapper, leapfrog_into};
            let mut wrapper1 = Wrapper::from(&cfg_edge_rel,       |&(p,(_r1,_r2))| p);
            let mut wrapper2 = Wrapper::from(&region_live_at_rel, |&(_p,(r1,_r2))| r1);
            let mut wrapper3 = AntiWrapper::from(&region_live_at_rel, |&(_p,(_r1,r2))| r2);
            let mut leapers: [&mut LeapFrog<_,_>; 3] = [&mut wrapper1, &mut wrapper2, &mut wrapper3];
            leapfrog_into(&subset_p, &mut leapers, &live_to_dead_regions, |&(p,(r1,r2)),&q| (r1,r2,p,q));
lqd (May 20 2018 at 10:38, on Zulip):

(I feel niko wanted to have the bad guy from the muppets movie in there somewhere)

Frank McSherry (May 20 2018 at 10:39, on Zulip):

Doc Hopper?

lqd (May 20 2018 at 10:40, on Zulip):

Constantine ? https://rust-lang.zulipchat.com/#narrow/stream/122657-wg-nll/subject/porting-to-datafrog/near/126803992 (I know very little about the topic sorry :p)

lqd (May 20 2018 at 10:43, on Zulip):

in any case, super good news

Frank McSherry (May 20 2018 at 10:44, on Zulip):

It's all a bit hack-y at the moment. May take a day or two to tidy this up. I don't have a sense for whether there is a time crunch on anything here. Perhaps eventually good to push a crate, once the types settled down a bit.

lqd (May 20 2018 at 10:52, on Zulip):

maybe the concept of parts/steps/hops/leap

let mut cfg_edge_leap = cfg_edge_rel.join_leap(|&(p,(_r1,_r2))| p);
let mut region_live_at_leap = region_live_at_rel.join_leap(&(_p,(r1,_r2))| r1);
let mut region_live_at_anti_leap = region_live_at_rel.antijoin_leap(|&(_p,(_r1,r2))| r2);
let mut leaps = [&mut cfg_edge_leap, &mut region_live_at_leap, &mut region_live_at_anti_leap];
live_to_dead_regions.from_leaps(&subset_p, &mut leaps, |&(p,(r1,r2)),&q| (r1,r2,p,q));
Frank McSherry (May 20 2018 at 11:02, on Zulip):

Also, it looks like requires is only ever used followed by a !killed, which makes me think we could put it as part of the requires derivation. I.e. make requires into requires_but_not_killed or something like that. Save all the antijoins with killed (which is not large, but which currently results in materializations).

Off to explore a bit! Will check back later.

lqd (May 20 2018 at 11:02, on Zulip):

(or ofc extend_from_leaps)

Reed Koser (May 20 2018 at 16:01, on Zulip):

So I've been playing with trying to build a more efficient implementation of Relation::merge that actually zips the two tuple lists instead of just appending them and then doing a sort/dedup

Reed Koser (May 20 2018 at 16:02, on Zulip):

I'm not sure I'll be able to make it faster, but one interesting thing I noticed is that there are a not-insignificant number of cases where one of the tuple lists basically just picks up after the other one leaves off
i.e. elements1[elements1.len() - 1] < elements2[0]

Reed Koser (May 20 2018 at 16:03, on Zulip):

not sure why exactly, but that "fast path" might be worth keeping even if my other stuff doesn't work out

Jake Goulding (May 20 2018 at 16:07, on Zulip):

Good news (esp for @lqd ): I implemented leapfrog triejoin.

clearly this should be triefrog. leaptriefrog?

lqd (May 20 2018 at 16:12, on Zulip):

unfortunately there are no tries ATM, only joins :3 <insert dog logic meme here>

Frank McSherry (May 20 2018 at 16:25, on Zulip):

@Reed Koser That's interesting! (the continuation thing). I was a bit lazy here, relying on the fact that sort() does a merge sort, and so it should i. identify the sorted regions, then ii. do a merge as fast as Rust knows how. In principle this can be sped up (especially when we want to do multiple merges), but it may not be too far off of the "right answer".

Another random observation: the join orders and tuple orders and stuff have implications for sorting. There are times where the tuple layout is different, but the order is the same or very similar (e.g. flipping the last two elements). Paying attention to these things might make sorting less painful (though I didn't pay much attention myself).

Reed Koser (May 20 2018 at 16:30, on Zulip):

:+1: I haven't looked in to how map etc. maintain the sorted invariants. Initial experiments seem to indicate that my zipper merge is a little faster, but I'm currently playing whack-a-mole with system issues (curse you dynamic frequency scaling!) to make sure my benchmarks are testing what I think they are

Frank McSherry (May 20 2018 at 16:31, on Zulip):

Another way to speed up the computation, perhaps, is to observe that all of the Loan computations are independent (there are no rules that change the loan associated with the tuple). This means that one could implement everything as an outer loop over borrows, with the inner component of each loop having a much smaller working set. It's not obviously a good fit for :frog: which prefers to do everything at the same time, but worth keeping in mind (also, perhaps more obvious when one could stop early: a loan requirement can't cross cfg edges where it is killed, independent of the number of regions alive there).

Frank McSherry (May 20 2018 at 16:34, on Zulip):

@Reed Koser Another related optimization: the self.stable field could probably be one Vec<Tuple> which has delineated regions that are sorted. This means that rather than lots of merges, one just removes delineations and sorts (merges) regions. This saves the allocations and memcpys and such, which .. are some fraction of the work that is going on.

Frank McSherry (May 20 2018 at 16:36, on Zulip):

Also, though perhaps this isn't the best time (boarding!) I'd love to get a rundown of the intent of the _opt variants. Clearly the subset and requires they compute are different, and I'd love to know how and why (are there better names for them, for someone starting from the naive implementation?).

Reed Koser (May 20 2018 at 16:37, on Zulip):

I can look in to that once I'm sure my merge is faster
One disadvantage of the approach I'm taking is that it always allocates. I could probably play some clever indexing tricks and shuffle things around in memory to make it work, but I don't think it's possible to do what I want without either allocating or using lots of unsafe

lqd (May 20 2018 at 16:42, on Zulip):

IIRC the opt was started mainly to limit the TC over subset which Frank had warned against doing :)

lqd (May 20 2018 at 16:43, on Zulip):

at first, then also requires(edited with what I actually meant to say)

lqd (May 20 2018 at 16:44, on Zulip):

yes we can see some of the steps Niko took in the PR https://github.com/rust-lang-nursery/polonius/pull/23/commits

Reed Koser (May 20 2018 at 16:45, on Zulip):

yeah, I think eventually we wanted to drive computation of the subset relation from the invalidates facts

Reed Koser (May 20 2018 at 16:46, on Zulip):

I think @Santiago Pastorino 's PRs are the first steps in that direction (https://github.com/rust-lang-nursery/polonius/pull/28 and https://github.com/rust-lang-nursery/polonius/pull/34)

Frank McSherry (May 20 2018 at 16:46, on Zulip):

Other random observations: even in frog-opt, about 0.5M facts in subset are symmetric relation facts, e.g. (p, (r, r)).

Frank McSherry (May 20 2018 at 16:47, on Zulip):

These are probably logically redundant somehow. Trying to map out how many distinct equivalence classes of region there are.

lqd (May 20 2018 at 16:50, on Zulip):

maybe Yannakakis' algorithm could provide us with a simple plan wrt join orders ? eg at the beginning of coding a computation, we'd describe the variables and needed joins, and "something" could give us some of this info, eg iteration.explain_analyze() ;)

Reed Koser (May 20 2018 at 17:31, on Zulip):
datafrog_opt            time:   [1.2379 ms 1.2448 ms 1.2527 ms]
                        change: [-27.409% -23.825% -20.116%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

:ok_hand: :frog:

Reed Koser (May 20 2018 at 17:32, on Zulip):

I /think/ I measured it correctly

Reed Koser (May 20 2018 at 17:39, on Zulip):

https://github.com/frankmcsherry/datafrog/pull/2

Jake Goulding (May 20 2018 at 17:47, on Zulip):

@Reed Koser Dunno if it applies, but could maybe use https://docs.rs/itertools/0.7.8/itertools/trait.Itertools.html#method.merge ?

Reed Koser (May 20 2018 at 17:48, on Zulip):

No, it's not just a simple merge. We also need to de-duplicate at the same time

Reed Koser (May 20 2018 at 17:49, on Zulip):

I mean we don't /need/ to but we can and I think it's faster

Reed Koser (May 20 2018 at 17:49, on Zulip):

Haven't tested it though so who knows

lqd (May 20 2018 at 17:53, on Zulip):

(I had mentioned the empty rels I saw in merge before, but in my limited testing special casing them didn't make a big difference) the zipper merge does improve datafrog_opt on clap indeed, sweet :thumbs_up:

Reed Koser (May 20 2018 at 18:10, on Zulip):

Nice, I'm glad my benchmarking did actually mean something =P

Reed Koser (May 21 2018 at 00:29, on Zulip):

So, I've spent some time profiling polonius

Reed Koser (May 21 2018 at 00:31, on Zulip):

It seems like Relation::merge is still the largest contributor to runtime, followed by gallop (as part of join_core, presumably)

Reed Koser (May 21 2018 at 00:31, on Zulip):

which I guess is unsurprising

Reed Koser (May 21 2018 at 00:31, on Zulip):

I'm trying to get data about which relations/operations are the worst offenders now

Reed Koser (May 21 2018 at 02:16, on Zulip):

pasted image I started making graphs again :wink:

Reed Koser (May 21 2018 at 03:26, on Zulip):

@Frank McSherry did you ever play with using some sort of hash-based join instead of the sort-and-zip approach? I feel like it might help perf since instead of maintaining a bunch of independent "views" of the data (subset_r1p, subset_r2p, etc.), we can keep a single "source of truth" vector and have the joins accelerated by something like HashMap<K, usize> maps. Right now it seems like tuples flow through the system, sometimes with a latency of like 3 rounds, before finally coming back around to impact the same set that kicked off their insertion in the first place. So you get this kind of rippling effect though all of the variables as changes express themselves after some latency. I know we'll have to accept some of that, but from what I've seen in the profiling data the best thing to do to improve perf seems to be reducing the number of variables (and maybe rounds?) that we have to do merges on.

Reed Koser (May 21 2018 at 03:30, on Zulip):

It might end up all coming out in the wash though, since to support my vision of hash-based joins we still need to maintain a O(N) sized dictionary of key->indices plus we would have garbage cache locality on accesses to the datastore in all likelyhood

Reed Koser (May 21 2018 at 03:30, on Zulip):

/me is like 30% of the way through implementing this so I guess we'll see

Reed Koser (May 21 2018 at 05:03, on Zulip):

... and I think that's about as far as I'm going to get. The ownership issues are many, and I'd either need to do a boatload of Rc based pointer chasing or train cars full of copies to get it compiling. Either way memory usage would probably be much higher for very little perf gain. It was educational, if not necessarily a step in the right direction.

Reed Koser (May 21 2018 at 05:05, on Zulip):

and the borrow checker really saved me here, if it was C++ I would have gotten really sad after chasing iterator invalidation bugs forever =)

Frank McSherry (May 21 2018 at 07:29, on Zulip):

Looming updates (on my laptop):

1. Pointing polonius at new datafrog master: 7.680s -> 6.934s.
2. Using treefrog leapjoin: 6.934s -> 5.744s.

The treefrog leapjoin (lftj is logicblox's property) structures some joins differently, and it would be good to explain this so that folks can take advantage. Roughly, the more constraints you can put in at the same time the better, and the more you break apart queries into little relations the less better. I'll get a blog post out today I suspect.

Also, with current numbers, on clap the subset relation is larger than all the others put together. It is also a great candidate for TFLJing, in that it is mostly a set of rules for how to extend requires, rather than anything we need to materialize. That possibility might be speculation on my part, so I'll try and explain how inlining subset could help come blogpost time.

lqd (May 21 2018 at 09:01, on Zulip):

(I was looking at reducing the cfgrelation, nothing particular to report at this time; apart from the exact idea mentioned in issue #20 I think, will not help for clap itself)

Frank McSherry (May 21 2018 at 09:12, on Zulip):

Continued:

3. Write own binary_search method: 5.744s -> 5.335s

lqd (May 21 2018 at 09:22, on Zulip):

maybe a different sorting algorithm

Frank McSherry (May 21 2018 at 09:45, on Zulip):

Maybe! Though, it's not so much the performance of Rust's binary_search as the guarantees it does/doesn't provide.

Frank McSherry (May 21 2018 at 09:46, on Zulip):

It is important to have a binary search that finds the first match of a thing (e.g. a key, in a list of (key,val)), so that we can scan forward and find all other matches.

Frank McSherry (May 21 2018 at 09:46, on Zulip):

I think the current implementation does this, but the team declined to commit to it doing this.

Frank McSherry (May 21 2018 at 10:57, on Zulip):

https://github.com/frankmcsherry/blog/blob/master/posts/2018-05-19.md#addendum-2018-05-21-treefrog-leapjoin

Frank McSherry (May 21 2018 at 11:04, on Zulip):

Associated with this, I have a highly modified datafrog_opt.rs that might want a different name to land. It should be the same logic, but it is different enough that perhaps it makes sense to keep both versions around for the moment?

nikomatsakis (May 21 2018 at 12:39, on Zulip):

Other random observations: even in frog-opt, about 0.5M facts in subset are symmetric relation facts, e.g. (p, (r, r)).

wait, are they literally r <= r? then we really don't need them...

nikomatsakis (May 21 2018 at 12:40, on Zulip):

Also, though perhaps this isn't the best time (boarding!) I'd love to get a rundown of the intent of the _opt variants. Clearly the subset and requires they compute are different, and I'd love to know how and why (are there better names for them, for someone starting from the naive implementation?).

@Frank McSherry basically my intution was to avoid computing the transitive closure, and instead only propagate edges where needed. The purpose of computing the transitive closure, for the most part, was that sometimes we would remove intermediate regions but we wanted to keep the overall effect. Example:

At some point P, you add r1 <= r2 and r2 <= r3. Then in the successor point Q, r2 goes dead. We still want to ensure that (at Q) we know that r1 <= r3.

We used to compute TC and then remove everything related to r2. We now do this more lazilly, only computing the reachability for regions like r2 that (A) are dead in Q but (B) have a live predecessor in Q.

lqd (May 21 2018 at 23:23, on Zulip):

Associated with this, I have a highly modified datafrog_opt.rs that might want a different name to land. It should be the same logic, but it is different enough that perhaps it makes sense to keep both versions around for the moment?

if it's mostly about using the leapfrog join, it could also be good for someone to reimplement it and gain familiarity :)

Frank McSherry (May 22 2018 at 06:59, on Zulip):

That sounds good too. I've got them locally now, but can gist them around or hand them off to anyone who wants to learn more.

Apropos that, I'm trying to grok the _opt rules to see if there are opportunities to use treefrog better. The main thing is that with this infrastructure, the actions that "cost" are adding a new variable; the large number of involved relations only improve the situation (by constraining the variable more). The "number of joins" is not especially important. With that in mind, it is possible that the _opt rules could be improved by fusing together a bunch of the rules, to try and get roughly one leapjoin per variable introduced. It may not be exactly that easy, but I'll try and fish out some examples to show what I mean.

lqd (May 22 2018 at 07:13, on Zulip):

a bit like folding "subset" into "requires" like you mentioned earlier

Frank McSherry (May 22 2018 at 07:16, on Zulip):

Yeah, if that would work out it would be great. I .. can't understand the _opt logic well enough to know if that will be the case, though. :)

lqd (May 22 2018 at 07:22, on Zulip):

haha :simple_smile: — I was rereading your "DD computation explanations" paper yesterday since I was about to track tuples flowing

lqd (May 22 2018 at 07:26, on Zulip):

tbf this could rely on the limited datasets too much, might be prudent to wait for proper testing (or just focus on the rules but that's like, harder, man)

lqd (May 22 2018 at 07:32, on Zulip):

(tbf2 the invalidates data could/should also cut down the cfg — in addition to another PR helping subset by reducing outlives by 50%)

Frank McSherry (May 22 2018 at 07:52, on Zulip):

Yeah, I would be interested to see a bunch more inputs before specializing too much; that's a good point. Is the clap example the known-worst case, or is there a spectrum?

Frank McSherry (May 22 2018 at 07:52, on Zulip):

I'll just chill for a bit then. ;)

Frank McSherry (May 22 2018 at 07:53, on Zulip):

Btw, Datalog has simpler explanations than the DD explanations. There is a paper by https://yanniss.github.io which has detail (and other good datalog links there).

Frank McSherry (May 22 2018 at 07:55, on Zulip):

This is the explanations paper I'm thinking of: https://yanniss.github.io/DeclarativeDebugging.pdf

lqd (May 22 2018 at 08:21, on Zulip):

thanks, I was also reading this one :)

lqd (May 22 2018 at 08:23, on Zulip):

I was wondering the same about clap and worst-cases, it looks like to be one of the slow cases / the slowest case that is tracked on https://perf.rust-lang.org/ but wondered how "realistic" it was

lqd (May 22 2018 at 08:23, on Zulip):

we can generate our own facts easily though

lqd (May 22 2018 at 08:24, on Zulip):

which examples would be best, surely niko and felix know

nikomatsakis (May 22 2018 at 09:34, on Zulip):

OK, I finally caught up with y'all on this TreeFrog LeapJoin business. =) Very cool!

lqd (May 22 2018 at 11:29, on Zulip):

@nikomatsakis I think you said you wanted to keep #20 (condensing the CFG) in your backpocket for now, was this mostly because of the location::all issue ? or maybe also because of the current lack of tests/benchmarks, to validate if potential improvements are correct and not overfitted to the repo's couple datasets ?

lqd (May 22 2018 at 12:40, on Zulip):

@Frank McSherry btw, in your leapjoin section about dead_region_requires((R, P, Q), B) I'm assuming you switched from joining with requires_bp like in the current version, to requires_rp because after leapjoining everything (let's call it, a leap surgery :3) you found out that you didn't need the requires_bp index anymore ? (not that it changes the results of course, just checking)

Frank McSherry (May 22 2018 at 12:48, on Zulip):

Yeah, that is a good point: some of the performance improvement comes from many of the multiply-indexed relations simplifying down to just one relation. For example, there is only one copy of each of subset and requires.

lqd (May 22 2018 at 12:50, on Zulip):

awesome thank you

Frank McSherry (May 22 2018 at 12:55, on Zulip):

https://gist.github.com/frankmcsherry/d3a5c56458779fa7df5934cbd68cb571

Frank McSherry (May 22 2018 at 12:55, on Zulip):

for the treefrogged version.

lqd (May 22 2018 at 12:59, on Zulip):

I was trying it out with the Naive rules first :) which is nice

nikomatsakis (May 22 2018 at 13:07, on Zulip):

@Frank McSherry I was thinking as I rode the subway today: first off, I've not looked at your actual code, and I guess I have to review the blog post, but it seems like it would be useful if the count function provided some feedback to the function that filters tuples later on (e.g., where to start looking for matches etc)

nikomatsakis (May 22 2018 at 13:07, on Zulip):

but secondly, we often have some information about ordering we could supply, though I'm not sure if that is useful

Frank McSherry (May 22 2018 at 13:09, on Zulip):

I'm not sure I grok the sort of feedback that could be provided. Can you give an example? The intersect method picks up mostly where count leaves off, looking at the same range that it just went and checked the bounds of to get the count back.

nikomatsakis (May 22 2018 at 13:10, on Zulip):

mm I guess a simple example would be that we know that cfg_edge will produce relatively matches

nikomatsakis (May 22 2018 at 13:11, on Zulip):

so e.g. when I would try to optimize, I would try to order joins with cfg_edge earlier

nikomatsakis (May 22 2018 at 13:11, on Zulip):

that said, it seems like the count mechanism is just more general and this may not be something we need to manually specify anymore

nikomatsakis (May 22 2018 at 13:11, on Zulip):

which is even better

Frank McSherry (May 22 2018 at 13:12, on Zulip):

I think there isn't much advantage to get here (unless I'm mis-understanding): the count method will return something like 1 pretty often (for cfg) and we will determine that it should be the one to propose the extension.

nikomatsakis (May 22 2018 at 13:12, on Zulip):

right, I agree

nikomatsakis (May 22 2018 at 13:12, on Zulip):

I guess my thought was that we might avoid invoking count — but then it's just doing work we kind of have to do anyway

Frank McSherry (May 22 2018 at 13:13, on Zulip):

Yeah, count happens to id where we need to head in the big array of tuples anyhow (and that info is cached for propose and intersect). Even if we hard-wired in "always use cfg don't bother doing count" I think the times would be pretty similar. We could check though.

nikomatsakis (May 22 2018 at 13:15, on Zulip):

nope, I'm convinced =)

lqd (May 23 2018 at 07:52, on Zulip):

@Frank McSherry hello :wave: — just checking, even with leapfrog we couldn't easily fold subset or requires right ? or borrow_live_at into the recent errors(B, P) :- invalidates(P, B), borrow_live_at(B, P). because all of those end up using more than one dynamic variable instead of just Relations. (I'm guessing we somehow can make a specific version of borrow_live_at that contains the invalidates join, for the regular mode; while in verbose mode we "have" to produce those triples; and I guess it might also change depending on how much of this error computation etc can be done in polonius rather than rustc)

Frank McSherry (May 23 2018 at 08:06, on Zulip):

That sounds right. It isn't too hard to treefrog together two variables, but it isn't supported right now. You would need to i. write wrapper for Variables, which would need to check each of the relations when they count/propose/intersect, and ii. make sure that you write each of the update leapjoins (i.e. a leapjoin stemming from each of the involved variables, as each one just responds to changes in one relation).

lqd (May 23 2018 at 08:09, on Zulip):

oh cool :) Maybe it can be a "good first issue" on the datafrog repo for people to contribute

lqd (May 23 2018 at 08:09, on Zulip):

so that you don't have to do all the work heh

nikomatsakis (May 23 2018 at 09:08, on Zulip):

@lqd

while in verbose mode we "have" to produce those triples; and I guess it might also change depending on how much of this error computation etc can be done in polonius rather than rustc

I think we can just drop "verbose" mode if it's getting in the way of something.

lqd (May 23 2018 at 09:10, on Zulip):

totally; until we know exactly what rustc will require between the errors and the borrow_live_at tuples, this can be in flux for a while

lqd (May 23 2018 at 10:14, on Zulip):

(nothing to be worried about anyway: even the multi variable join we know can be done would just be a bonus; leapfrog producing errors (but not borrow_live_at) is slightly faster than leapfrog producing borrow_live_at, we can even produce both by checking verbose mode during the iteration, and it still approximately the same — and both of those 2 are already faster than without leapfrog, like 20–30%; that being said I would also like to make sure the errorsare actually correct otherwise those numbers are not that useful :) so I'll finish the frontend soon; meanwhile frank can rest and land this new API at his leisure :)

Frank McSherry (May 23 2018 at 10:21, on Zulip):

@lqd ideally yeah, we trick someone most Rust-focused into coming up to speed on how :frog: works, so that its understanding does live only in my head, and the code doesn't have the risk of rotting due to it not being clear how to make it do a specific thing. Such issues could be good, esp if you all id such a person.

Reed Koser (May 26 2018 at 21:33, on Zulip):

I took a stab at writing a merge that could handle multiple relations at once. It seems like it's generally slower, but here's the code https://github.com/rust-lang-nursery/datafrog/compare/master...bobtwinkles:multimerge?expand=1 so people can learn from my mistakes =P
I'm just creating iterators over all the relations up front, and then stepping through them to generated a sorted list. Results (naming convention is merge_(# of relations)_(average relation size)/algorithm:

merge/merge_003x100/Individual
                        time:   [2.7191 us 2.7451 us 2.7767 us]
                        change: [-8.8104% -7.1505% -5.6703%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  7 (7.00%) high severe
merge/merge_003x100/MultiMerge
                        time:   [4.3975 us 4.4096 us 4.4266 us]
                        change: [-1.6184% -0.7284% +0.0416%] (p = 0.09 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  8 (8.00%) high severe

merge/merge_004x100000/Individual
                        time:   [7.0525 ms 7.0865 ms 7.1178 ms]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
merge/merge_004x100000/MultiMerge
                        time:   [7.9256 ms 7.9569 ms 7.9988 ms]
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

My guess is that it's slower due to having poor cache locality (jumping all over memory to read from the different lists) compared to just merging things one at a time

lqd (Jun 28 2018 at 13:33, on Zulip):

I was wondering whether using indexes would help doing the :frog: joins, in addition to limiting the Variable indexes, or for such small relations maybe it wouldn't be that useful (maybe a compressed bitset, à la roaring bitmaps?)

lqd (Jun 28 2018 at 13:36, on Zulip):

and also if @Frank McSherry (hello ;) had thoughts about using a Factorized representation would be possible/useful/stupid (as IIRC there's also a worst-case optimal join algorithm that can work with them) and it seemed to lower both the space and time complexity (linked to the hypertree width) :)

nikomatsakis (Jun 28 2018 at 13:36, on Zulip):

what kind of indexes are you referring to ?

lqd (Jun 28 2018 at 13:37, on Zulip):

roaring bitmaps you mean ?

lqd (Jun 28 2018 at 13:37, on Zulip):

or using them in the Variables themselves ?

lqd (Jun 28 2018 at 13:39, on Zulip):

if the former: https://roaringbitmap.org/ a kind of hybrid structure

lqd (Jun 28 2018 at 13:45, on Zulip):

(with tricks similar to map fusion, to enable the multiple chunks to use bitsets/arrays/etc independently)

nikomatsakis (Jun 28 2018 at 13:54, on Zulip):

roaring bitmaps you mean ?

I've never heard of that either, but you wrote:

...I was wondering whether using indexes would help..

nikomatsakis (Jun 28 2018 at 13:55, on Zulip):

I'm not sure what you meant by 'indexes' there

nikomatsakis (Jun 28 2018 at 13:55, on Zulip):

I guess I have heard of roaring bitmaps

nikomatsakis (Jun 28 2018 at 13:56, on Zulip):

we do use sparse bitsets in the compiler

nikomatsakis (Jun 28 2018 at 13:57, on Zulip):

roaring's techniques feel potentially like overkill, but I'm not sure

lqd (Jun 28 2018 at 13:57, on Zulip):

yeah, I was wondering how to compare using galloping vs computing the intersection of keys using indices and then joining the values

lqd (Jun 28 2018 at 14:00, on Zulip):

and the other questions were about using factorized representations which seemed interesting to both reduce the number of materialized results, and the time to compute the join as well -- but maybe only on bigger datasets that we never reach with the NLL facts I'm not sure (and hoped Frank who knows everything would know whether this was a good idea before looking too much into it ;)

Frank McSherry (Jun 28 2018 at 16:30, on Zulip):

So, the variables and galloping are pretty much a lightweight LSM index. They are throughput optimized rather than lookup optimized, so if you felt that there were perhaps many rounds producing few tuples, a random access optimized index might be better. The current design does at most one linear pass across each index per round, and likely to a sparse subset based on the relative sizes (driven by galloping). As long as the number of facts is substantially larger than the number of iterations, this should be .. pretty good, I think?

Frank McSherry (Jun 28 2018 at 16:32, on Zulip):

It's totally reasonable to ask about swapping in a HashMap or the like, as an actual index, at which point you trade away some throughput and gain some reduction in minimum latency. I have no data on which is better / worse, but one advantage of the sorted lists is that you can re-use a (a,b,c) list for a joins, (a,b) joins, and (a,b,c) semijoins, unlike a HashMap where you need to pick the keys explicitly.

Frank McSherry (Jun 28 2018 at 16:33, on Zulip):

Factorized stuff could work, but it depends a lot on the query. If there are opportunities for factorization then they can be great, and if there are not then (I think) nothing good results. Nothing stood out at me in the current query. (For Niko's benefit: factorization is maintaining joined results in "cross-product of lists" form rather than expanded out as tuples, most commonly beneficial if another join can be done that can consume the cross product representation (meaning the next keys are not fields left unexpanded).

lqd (Jun 28 2018 at 16:38, on Zulip):

@Frank McSherry thanks a lot, as always :)

Frank McSherry (Jun 28 2018 at 16:38, on Zulip):

I'm not aware of easy idiomatic ways to integrate factorized query stuff, except perhaps hooking the gallop results as an iterator, and allow you to call either flat_map or collect. That being said, I've only come across a query where it helps once (it helped a lot), and I don't have a lot of experience with using them idiomatically.

Frank McSherry (Jun 28 2018 at 16:39, on Zulip):

Hey no worries! :D

Last update: Nov 21 2019 at 13:35UTC