Stream: t-compiler/wg-nll

Topic: stats' branch


qmx (May 10 2018 at 12:30, on Zulip):

so @nikomatsakis I've managed to make your branch to compile, but I'm not sure that's what you wanted it to do

qmx (May 10 2018 at 12:31, on Zulip):

https://github.com/qmx/borrow-check/commit/1cadd3c8e4334bf2f723bc381ffca9c103648605

nikomatsakis (May 10 2018 at 12:32, on Zulip):

maybe instead put a *x in the closure

nikomatsakis (May 10 2018 at 12:32, on Zulip):

e.g. flat_map(|(r1, r2s)| r2s.iter().map(move |r2| (r1, r2))) could be:

nikomatsakis (May 10 2018 at 12:32, on Zulip):
flat_map(|(r1, r2s)| r2s.iter().map(move |r2| (*r1, *r2)))
nikomatsakis (May 10 2018 at 13:17, on Zulip):

@qmx ok

qmx (May 10 2018 at 13:17, on Zulip):

I think I need to step back a little bit and understand the bigger picture here

nikomatsakis (May 10 2018 at 13:17, on Zulip):

I guess first thing is I have to refresh my memory what that branch was even doing

nikomatsakis (May 10 2018 at 13:17, on Zulip):

but yes, that sounds right

qmx (May 10 2018 at 13:17, on Zulip):

we wanted to print a histogram from the analysis data

nikomatsakis (May 10 2018 at 13:18, on Zulip):

right, I remember that much :)

nikomatsakis (May 10 2018 at 13:18, on Zulip):

probably, I suspect, many histograms

nikomatsakis (May 10 2018 at 13:18, on Zulip):

but let's start with one

nikomatsakis (May 10 2018 at 13:18, on Zulip):

anyway, how far back do you want to step?

qmx (May 10 2018 at 13:18, on Zulip):

but that led me to more philosophical questions like what (Region, Region) actually means

qmx (May 10 2018 at 13:18, on Zulip):

(beyond tuple of regions, of course :joy: )

nikomatsakis (May 10 2018 at 13:20, on Zulip):

:)

nikomatsakis (May 10 2018 at 13:20, on Zulip):

so the "underlying model" here is sort of "sets of tuples"

nikomatsakis (May 10 2018 at 13:20, on Zulip):

you can think of that in many ways

nikomatsakis (May 10 2018 at 13:20, on Zulip):

but e.g. if we think about the blog post

nikomatsakis (May 10 2018 at 13:20, on Zulip):

it talked about something like, at this point P in the code, we know that r1 <= r2

nikomatsakis (May 10 2018 at 13:21, on Zulip):

and maybe at some other point Q we know that r2 <= r3

nikomatsakis (May 10 2018 at 13:21, on Zulip):

(with me so far?)

qmx (May 10 2018 at 13:21, on Zulip):

yep!

nikomatsakis (May 10 2018 at 13:21, on Zulip):

we could represent those facts as a set of 3-tuples: (Point, Region, Region)

nikomatsakis (May 10 2018 at 13:21, on Zulip):

so we would have:

(P, r1, r2)
(Q, r2, r3)
nikomatsakis (May 10 2018 at 13:22, on Zulip):

what that all_subsets function aims to do is to collapse that to (one sec, afk)

nikomatsakis (May 10 2018 at 13:24, on Zulip):

right so we are going to reduce those 3-tuples to ignore the points

nikomatsakis (May 10 2018 at 13:25, on Zulip):

so basically the result in this case would just be

(r1, r2)
(r2, r3)
nikomatsakis (May 10 2018 at 13:25, on Zulip):

:heavy_check_mark: ?

qmx (May 10 2018 at 13:25, on Zulip):

gotcha

nikomatsakis (May 10 2018 at 13:26, on Zulip):

I sort of forget why I was doing that

nikomatsakis (May 10 2018 at 13:26, on Zulip):

well, basically, I think what I figured is

nikomatsakis (May 10 2018 at 13:26, on Zulip):

in the real analysis, the subset relation kind of grows and shrinks

nikomatsakis (May 10 2018 at 13:26, on Zulip):

but if we look at the union over all points

nikomatsakis (May 10 2018 at 13:26, on Zulip):

we get a kind of "upper bound" on how many edges result

nikomatsakis (May 10 2018 at 13:27, on Zulip):

anyway, it's not clear that this is the right kind of stats to be gathering

nikomatsakis (May 10 2018 at 13:27, on Zulip):

(but it seems like an ok starting point)

nikomatsakis (May 10 2018 at 13:27, on Zulip):

(oh, I sort of remember, I think I wanted to print out the maximum in-degree and out-degree of a node at any particular point?)

qmx (May 10 2018 at 13:28, on Zulip):

exactly

nikomatsakis (May 10 2018 at 13:28, on Zulip):

right so I think I was going to take those 2-tuples and then produce two results:

nikomatsakis (May 10 2018 at 13:28, on Zulip):

like, for each region R, how many tuples are there like (R, _) -- that is kind of the "out-degree"

nikomatsakis (May 10 2018 at 13:29, on Zulip):

because each tuple (A, B) represents an edge A -> B in the "subset graph"

nikomatsakis (May 10 2018 at 13:29, on Zulip):

another way of saying it is that we are computing "how regions is R is a subset of"

nikomatsakis (May 10 2018 at 13:29, on Zulip):

then the other result would be "how many tuples are there like (_, R)"

nikomatsakis (May 10 2018 at 13:29, on Zulip):

that answers roughly the opposite question ("how many regions are a subset of R")

qmx (May 10 2018 at 13:30, on Zulip):

this makes sense

nikomatsakis (May 10 2018 at 13:30, on Zulip):

(it's interesting to note though that these tuples will be after we've computed the transitive closure operation; we might later be interested in the same results, but with "transitive implications" removed...)

nikomatsakis (May 10 2018 at 13:30, on Zulip):

anyway both of those computations begin with the all_tuples computation

nikomatsakis (May 10 2018 at 13:30, on Zulip):

though it occurs to me now

nikomatsakis (May 10 2018 at 13:30, on Zulip):

that differential-dataflow would make this more elegant

nikomatsakis (May 10 2018 at 13:30, on Zulip):

maybe we should be using that :)

qmx (May 10 2018 at 13:31, on Zulip):

/me is going back to read differential-dataflow examples

nikomatsakis (May 10 2018 at 13:32, on Zulip):

but in "normal Rust" code you might do something like:

for "each region `r`" {
  let out_degree = all_subset.tuples().filter(|(a, _)| if a == r).count();
  let in_degree = all_subset.tuples().filter(|(_, b)| if b == r).count();
}
nikomatsakis (May 10 2018 at 13:32, on Zulip):

@qmx let's do it in normal Rust first

nikomatsakis (May 10 2018 at 13:32, on Zulip):

I think it would be easier to learn 1 thing at a time ;)

nikomatsakis (May 10 2018 at 13:32, on Zulip):

then we can port to differential-dataflow

qmx (May 10 2018 at 13:32, on Zulip):

fair enough

nikomatsakis (May 10 2018 at 13:34, on Zulip):

so, once you compute those in- and out-degrees

nikomatsakis (May 10 2018 at 13:34, on Zulip):

you can then add those two pnts to the histogram

nikomatsakis (May 10 2018 at 13:34, on Zulip):
let mut out_histogram, in_histogram;
for "each region `r`" {
  let out_degree = all_subset.tuples().filter(|(a, _)| if a == r).count();
  let in_degree = all_subset.tuples().filter(|(_, b)| if b == r).count();
  out_histogram.add(out_degree);
  in_histogram.add(in_degree);
}
out_histogram.print();
in_histogram.print();
nikomatsakis (May 10 2018 at 13:35, on Zulip):

that was the idea, anyhow

qmx (May 10 2018 at 13:36, on Zulip):

sounds like a good start, lemme give it a try

nikomatsakis (May 10 2018 at 13:37, on Zulip):

@qmx I suppose a more efficient formulation might be like this

nikomatsakis (May 10 2018 at 13:38, on Zulip):
let mut in_count, out_count = FxHashMap::default();
for (a, b) in all_subsets {
  *out_count.entry(a).or_insert(0) += 1;
  *in_count.entry(a).or_insert(0) += 1;
}
nikomatsakis (May 10 2018 at 13:38, on Zulip):

assuming no duplicates

nikomatsakis (May 10 2018 at 13:39, on Zulip):

(we can remove duplicates in various ways)

nikomatsakis (May 10 2018 at 13:40, on Zulip):

that said, I guess what we care about is really the in-degree or out-degree at any particular point

nikomatsakis (May 10 2018 at 13:40, on Zulip):

anyway, I'll stop badgering you and let you get something

nikomatsakis (May 10 2018 at 13:40, on Zulip):

it'll be easy to tweak

lqd (May 10 2018 at 13:42, on Zulip):

tracking::RegionDegrees has similar information and updating code right ?

qmx (May 10 2018 at 13:42, on Zulip):

lol, that's what I was going to ask now

nikomatsakis (May 10 2018 at 13:43, on Zulip):

heh, yes... we could use that :P

nikomatsakis (May 10 2018 at 13:43, on Zulip):

I forgot that was there

qmx (May 10 2018 at 14:58, on Zulip):
In-degree stats
# Number of samples = 22
# Min = 1
# Max = 3
#
# Mean = 1.3636363636363638
# Standard deviation = 0.5677270907634907
# Variance = 0.3223140495867768
#
# Each ∎ is a count of 1
#
 1 ..  2 [ 15 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 2 ..  3 [  6 ]: ∎∎∎∎∎∎
 3 ..  4 [  1 ]: ∎
 4 ..  5 [  0 ]:
 5 ..  6 [  0 ]:
 6 ..  7 [  0 ]:
 7 ..  8 [  0 ]:
 8 ..  9 [  0 ]:
 9 .. 10 [  0 ]:
10 .. 11 [  0 ]:
Out-degree stats
# Number of samples = 22
# Min = 1
# Max = 3
#
# Mean = 1.3636363636363638
# Standard deviation = 0.5677270907634907
# Variance = 0.3223140495867768
#
# Each ∎ is a count of 1
#
 1 ..  2 [ 15 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
 2 ..  3 [  6 ]: ∎∎∎∎∎∎
 3 ..  4 [  1 ]: ∎
 4 ..  5 [  0 ]:
 5 ..  6 [  0 ]:
 6 ..  7 [  0 ]:
 7 ..  8 [  0 ]:
 8 ..  9 [  0 ]:
 9 .. 10 [  0 ]:
10 .. 11 [  0 ]:
qmx (May 10 2018 at 15:09, on Zulip):

I'm having mixed feelings with the output tho, is this supposed to be something separate from dump? Cause it's a lot of info :)

lqd (May 10 2018 at 15:10, on Zulip):

maybe for -v with the other in/out degree info ? (or replacing it)

qmx (May 10 2018 at 15:12, on Zulip):

the thing is that this information has been only generated when you ask for -v

qmx (May 10 2018 at 15:13, on Zulip):

and I wonder if this is the expected behavior here

nikomatsakis (May 10 2018 at 15:13, on Zulip):

I imagined a --stats option (or family of options)

nikomatsakis (May 10 2018 at 15:15, on Zulip):

anyway I guess we should think about what else we want to know --

nikomatsakis (May 10 2018 at 15:16, on Zulip):

one thing I would like is to get some idea of how many regions "go dead" at each point. Meaning that:

nikomatsakis (May 10 2018 at 15:17, on Zulip):

@qmx if you wanted to mess about with differential-dataflow, we could talk through how to compute that

nikomatsakis (May 10 2018 at 15:17, on Zulip):

not sure if you still have more time for the morning :)

qmx (May 10 2018 at 15:18, on Zulip):

whoa, I didn't expect the clap analysis to be that slow (and big!)

qmx (May 10 2018 at 15:18, on Zulip):

@nikomatsakis I can talk now, and work on it later :)

qmx (May 10 2018 at 15:19, on Zulip):

gonna dump what I have on a PR later today

nikomatsakis (May 10 2018 at 15:21, on Zulip):

@qmx well consider this code

nikomatsakis (May 10 2018 at 15:21, on Zulip):

that is currently what "moves" subset relations from a point P to its successor Q

nikomatsakis (May 10 2018 at 15:23, on Zulip):

the differential-dataflow version of the datalog is a bit complex because of the "or" (;)

nikomatsakis (May 10 2018 at 15:24, on Zulip):

I glossed over that in my blog post; it's sort of a micro-perf-opt, actually

nikomatsakis (May 10 2018 at 15:24, on Zulip):

the idea is that there are some regions (the "universal regions") that are known to be live at every point

nikomatsakis (May 10 2018 at 15:24, on Zulip):

we could generate a fact region_live_at(R, P) for each point P

nikomatsakis (May 10 2018 at 15:24, on Zulip):

but instead we generate one universal_region(R) fact

nikomatsakis (May 10 2018 at 15:24, on Zulip):

this is kind of annoying and maybe wasn't the right call =)

nikomatsakis (May 10 2018 at 15:25, on Zulip):

anyway if we look at the first bit:

subset.map(|(r1, r2, p)| (p, (r1, r2))).join(&cfg_edge);
lqd (May 10 2018 at 15:25, on Zulip):

at least it's <5 universal regions, vs 50k region_live_at facts per universal region :)

nikomatsakis (May 10 2018 at 15:25, on Zulip):

that is equivalent to this in datalog:

subset(R1, R2, Q) :-
    subset(R1, R2, P),
    cfg_edge(P, Q).
nikomatsakis (May 10 2018 at 15:26, on Zulip):

differentia-dataflow operates over streams of tuples

nikomatsakis (May 10 2018 at 15:26, on Zulip):

the base primitive is join

nikomatsakis (May 10 2018 at 15:26, on Zulip):

which takes one stream of tuples of type (A, B) and another of type (A, C) and produces a third stream (A, B, C). think SQL join.

nikomatsakis (May 10 2018 at 15:26, on Zulip):

i.e., tuples are matched up when the first part is equal

nikomatsakis (May 10 2018 at 15:27, on Zulip):

so in this case we started out with tuples like (r1, r2, p) (that's the subset relation)

nikomatsakis (May 10 2018 at 15:27, on Zulip):

and we have the cfg_edge relation with has tuples (p, q)

nikomatsakis (May 10 2018 at 15:27, on Zulip):

we use map to reorder things so that the thing we want to join on is first, and the rest is latter

nikomatsakis (May 10 2018 at 15:27, on Zulip):
subset.map(|(r1, r2, p)| (p, (r1, r2)))
nikomatsakis (May 10 2018 at 15:27, on Zulip):

produces tuples (p, (r1, r2))

nikomatsakis (May 10 2018 at 15:28, on Zulip):

same data, different configuration

nikomatsakis (May 10 2018 at 15:28, on Zulip):

now we join that with the cfg-edge (p, q)

nikomatsakis (May 10 2018 at 15:28, on Zulip):

and so we will get tuples like (p, (r1, r2), q)

nikomatsakis (May 10 2018 at 15:28, on Zulip):

make sense?

qmx (May 10 2018 at 15:28, on Zulip):

yep, it's dense but makes sense

nikomatsakis (May 10 2018 at 15:29, on Zulip):

that's the basic pattern, but there are two other operations used from time to time

nikomatsakis (May 10 2018 at 15:29, on Zulip):

one is semijoin -- this is kind of like "filter"

nikomatsakis (May 10 2018 at 15:29, on Zulip):

the other is "antijoin", which is kind of like join except it removes things that match

qmx (May 10 2018 at 15:30, on Zulip):

/me can see more uses for that

nikomatsakis (May 10 2018 at 15:31, on Zulip):

anyway so what I wanted to compute, expressed in datalog, is something like

subset(R1, R2, Q) :-
  subset(R1, R2, P),
  cfg_edge(P, Q),
  !((region_live_at(R1, Q); universal_region(R1)); (region_live_at(R2, Q); universal_region(R2))).

(or something like that)

nikomatsakis (May 10 2018 at 15:31, on Zulip):

that is probably not legal datalog but anyway

nikomatsakis (May 10 2018 at 15:31, on Zulip):

i.e., where there was a subset relation at the point P (subset(R1, R2, P)) ,and there is an edge P->Q (cfg_edge(P, Q)), but either r1 or r2 is not live at Q

nikomatsakis (May 10 2018 at 15:32, on Zulip):

we should be able to translate that into differential-dataflow with a bit of ... shuffling around of the existing equations :P

nikomatsakis (May 10 2018 at 15:33, on Zulip):

I think I have the precise detail wrong there

nikomatsakis (May 10 2018 at 15:34, on Zulip):

said simpler, it would be something like this:

subset(R1, R2, Q) :-
  subset(R1, R2, P),
  cfg_edge(P, Q),
  (!viable_at(R1, Q)); (!viable_at(R2, Q)).

viable_at(R, P) :- region_live_at(R, P).
viable_at(R, P) :- universal_region(R). // <-- though this is not valid datalog but anyway
qmx (May 10 2018 at 15:34, on Zulip):

/me will need to re-read that a few times for it to sink in

nikomatsakis (May 10 2018 at 15:34, on Zulip):

we can probably jsut ignore the universal region stuff for now

nikomatsakis (May 10 2018 at 15:34, on Zulip):

"desugar" it

nikomatsakis (May 10 2018 at 15:35, on Zulip):
subset(R1, R2, Q) :-
  subset(R1, R2, P),
  cfg_edge(P, Q),
  !region_live_at(R1, Q).

subset(R1, R2, Q) :-
  subset(R1, R2, P),
  cfg_edge(P, Q),
  !region_live_at(R2, Q).
nikomatsakis (May 10 2018 at 15:35, on Zulip):

that would give rules like that (also avoiding ;, which is sort of confusing)

lqd (May 10 2018 at 15:36, on Zulip):

@nikomatsakis your automatic "datalog queries to DD" builder idea is looking better and better by the minute :)

nikomatsakis (May 10 2018 at 15:36, on Zulip):

heh yeah would be fairly easy

nikomatsakis (May 10 2018 at 15:36, on Zulip):

once you grok the pattern, that is

nikomatsakis (May 10 2018 at 15:36, on Zulip):

until then it looks completely opaque

lqd (May 10 2018 at 16:14, on Zulip):

I guess the only unknown for translation here would be when to use DD's collection::iterate rather than operators on other parts of the computation ? is that because there's one "rel1(...) :- relB(...)." (eg in subsetand requires) which is not present in borrow_live_atfor example ?

lqd (May 10 2018 at 16:29, on Zulip):

(I think it could influence how @qmx would implement viable_at)

lqd (May 10 2018 at 18:10, on Zulip):

or maybe more likely because those are inputs

nikomatsakis (May 10 2018 at 22:59, on Zulip):

@qmx I'd like to measure how often the conditions described in #20 happen

qmx (May 10 2018 at 23:08, on Zulip):

https://github.com/rust-lang-nursery/borrow-check/pull/21

qmx (May 10 2018 at 23:14, on Zulip):

ugh, I think travis got the new nightly :P

nikomatsakis (May 10 2018 at 23:16, on Zulip):

hmm so it really is broken? sigh, I guess I have to file an issue. I was sort of half-hoping it was just my system.

nikomatsakis (May 10 2018 at 23:16, on Zulip):

though not really since then I would have to figure out why my system is different :P

nikomatsakis (May 10 2018 at 23:17, on Zulip):

I think I added a rust-toolchain file with an older nightly locally

qmx (May 10 2018 at 23:19, on Zulip):

do we have all the feature we're using in beta?

nikomatsakis (May 10 2018 at 23:20, on Zulip):

ooh, ooh, we can use https://github.com/rust-lang-nursery/cargo-bisect-rustc

nikomatsakis (May 10 2018 at 23:20, on Zulip):

@qmx no way man, we're bleeding edge

qmx (May 10 2018 at 23:20, on Zulip):

:kitchen_knife:

nikomatsakis (May 10 2018 at 23:26, on Zulip):

ok alex informed me that we must now have edition = 2018 for that to work in our cargo.tomt

nikomatsakis (May 10 2018 at 23:27, on Zulip):

although..

nikomatsakis (May 10 2018 at 23:29, on Zulip):

well, there's still some bugs in the newer nightlies -- specifically doc tests

qmx (May 10 2018 at 23:58, on Zulip):

sooo, for #20 you want only the analysis/statistics first?

nikomatsakis (May 11 2018 at 00:03, on Zulip):

seems like a good starting point

nikomatsakis (May 11 2018 at 00:03, on Zulip):

I'd be curious to know how many such opportunities there are

nikomatsakis (May 11 2018 at 00:03, on Zulip):

seems like a precursor anyway if we are going to try to take advantage of them

Last update: Nov 22 2019 at 00:15UTC