Stream: t-compiler/wg-nll

Topic: stats-for-differential-dataflow

nikomatsakis (May 14 2018 at 13:50, on Zulip):

So the discussion over in the optimizing-transitive-closure thread got me thinking. Maybe we should alter the -z stats option to gather up internal statistics about the differential-dataflow computation itself (@qmx this might be nice "in road" for playing around with it; @lqd you might enjoy too). The idea would be fairly simple: we would insert the count operations at strategic points and dump out the results, trying to get a feel for how many tuples are flowing around at each point. I'm not sure that the hist package is very useful for this, though, we'd probably want to dump the results more in tabular form.

lqd (May 14 2018 at 13:55, on Zulip):

possibly related to but just with total numbers (as dumping 18M triples in a graph would be hard to read :)

nikomatsakis (May 14 2018 at 13:56, on Zulip):
(image attachment not preserved in the archive)


nikomatsakis (May 14 2018 at 13:56, on Zulip):

I should probably merge my branch

nikomatsakis (May 14 2018 at 13:56, on Zulip):

since obviously it is the way to go

nikomatsakis (May 14 2018 at 13:56, on Zulip):

I mean

nikomatsakis (May 14 2018 at 13:56, on Zulip):

though I ought to fix that assertion

lqd (May 14 2018 at 13:57, on Zulip):

agreed, it improves things so much :thumbs_up:

nikomatsakis (May 14 2018 at 14:05, on Zulip):

to be honest, I think an immediate next step would be cleaning up the code there

nikomatsakis (May 14 2018 at 14:05, on Zulip):

there are a number of repetitive patterns that I was too lazy to abstract

nikomatsakis (May 14 2018 at 14:05, on Zulip):

for example:

nikomatsakis (May 14 2018 at 14:05, on Zulip):

    let dead_can_reach = Variable::from(Collection::new(
        None::<(((Region, Point), (Region, Point)), _, _)>

nikomatsakis (May 14 2018 at 14:06, on Zulip):

it would be nice to make some sort of let dead_can_reach = Variable::empty(); instead

nikomatsakis (May 14 2018 at 14:06, on Zulip):

or maybe:

let dead_can_reach = Variable::<((Region, Point), (Region, Point))>::empty();

(I actually like declaring the types here)
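
A std-only toy model of the ergonomics being sketched here; `Variable` below is a hypothetical stand-in, not the real differential-dataflow type (which, as noted later in the thread, would also need a handle to the scope), and `Region`/`Point` are placeholder aliases for the real interned index types:

```rust
// Hypothetical stand-in for differential-dataflow's `Variable`,
// illustrating the proposed `Variable::<T>::empty()` constructor.
struct Variable<T> {
    tuples: Vec<T>,
}

impl<T> Variable<T> {
    // The proposed constructor: an empty variable with an explicit tuple type.
    fn empty() -> Self {
        Variable { tuples: Vec::new() }
    }
}

// Placeholder aliases standing in for the real interned Region/Point indices.
type Region = u32;
type Point = u32;

fn main() {
    let dead_can_reach = Variable::<((Region, Point), (Region, Point))>::empty();
    assert!(dead_can_reach.tuples.is_empty());
}
```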

nikomatsakis (May 14 2018 at 14:07, on Zulip):

similarly all the calls to join_core would be nicer if abstracted

Frank McSherry (May 14 2018 at 14:51, on Zulip):

Should be mostly easy to put in a Variable::empty() or Variable::new(), but it will need some handle to the scope. It is currently getting this from the empty input collection it is handed, but will otherwise need a handle in order to create the correct internal timely dataflow structures.

nikomatsakis (May 14 2018 at 14:54, on Zulip):

yeah, I was just sketching...

nikomatsakis (May 14 2018 at 14:54, on Zulip):

I need to open an issue for that stuff

lqd (May 14 2018 at 16:44, on Zulip):

@nikomatsakis did you want counts for only the main relations (region_live_at/subset/requires) or for the subrelations as well (which would require aggregation over the timely rounds, I think)?

nikomatsakis (May 14 2018 at 16:45, on Zulip):

@lqd seems like we could start with just the "main" relations

nikomatsakis (May 14 2018 at 16:45, on Zulip):

that said, I'd be sort of interested in all of them I imagine

nikomatsakis (May 14 2018 at 16:45, on Zulip):

I think it... would probably require aggregation? I'm not sure I understand that part yet

nikomatsakis (May 14 2018 at 16:45, on Zulip):

specifically, I don't know if a call to consolidate() would suffice

nikomatsakis (May 14 2018 at 16:46, on Zulip):

@Frank McSherry can probably answer — specifically, if we call consolidate(), but there are internal rounds (due to a variable, etc), will we still have to sum over those rounds? That said, it seems...kind of interesting to see the full counts

nikomatsakis (May 14 2018 at 16:46, on Zulip):

i.e., what we want is kind of a measure of how much "work" is happening?

lqd (May 14 2018 at 16:47, on Zulip):

from the limited testing I did, if I put the consolidate + inspect inside the "subscope" I'd see more than one count output

lqd (May 14 2018 at 16:49, on Zulip):

if you want the main relations totals before the PR, here they are :p

    'region_live_at' tuple counts: 1076158
    'subset' tuple counts: 2714121
    'requires' tuple counts: 858674

nikomatsakis (May 14 2018 at 16:51, on Zulip):

@lqd thanks; region_live_at is the "input" tuples anyway, right? (but still interesting)

nikomatsakis (May 14 2018 at 16:51, on Zulip):

it'd be good to also print out how many Point and Region instances there are

nikomatsakis (May 14 2018 at 16:51, on Zulip):

oh, bother, I had some changes to the Intern to make that easy

lqd (May 14 2018 at 16:52, on Zulip):

oh yes true, it's just in the verbose output for some reason, so I did those 3

lqd (May 14 2018 at 16:52, on Zulip):

I also wanted to add (and did, before) the number of points + a borrow_live_at sum for the --skip-tuples case

lqd (May 14 2018 at 16:54, on Zulip):

eg: Output: 48992 keys, 832392 values

Frank McSherry (May 14 2018 at 17:07, on Zulip):

Re: consolidate(), you can put that measurement fragment in a few places:

1. outside the loop that defines each of the relations (if you .leave() either of them),
2. inside the loop after the distinct_total() (which will report new facts each round; if you print out the .1 field in addition to .2 it will report the "timestamp" == round of derivation),
3. inside the loop before the distinct_total() which should show you the work going in to the distinct, which might reveal large numbers of redundant derivations that the distinct suppresses (or confirm that they do not exist).

Frank McSherry (May 14 2018 at 17:09, on Zulip):

What consolidate does is flatten the collection down to at most one (data, time) pair for each distinct data and time, consolidating up the diff fields. When you map the data to () you just get the distinct times which are rounds of derivation, and the accumulated diff which should be their count (either pre- or post-distinct).
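
Frank's description of consolidate can be modeled with plain std collections: collapse a set of (data, time, diff) updates down to at most one entry per distinct (data, time), summing the diff fields. This is only a sketch of the semantics, not the differential-dataflow implementation:

```rust
use std::collections::BTreeMap;

// Std-only model of `consolidate`: at most one (data, time) entry survives,
// with the diff fields accumulated.
fn consolidate<D: Ord + Clone>(updates: &[(D, u64, i64)]) -> Vec<(D, u64, i64)> {
    let mut acc: BTreeMap<(D, u64), i64> = BTreeMap::new();
    for (data, time, diff) in updates {
        *acc.entry((data.clone(), *time)).or_insert(0) += *diff;
    }
    acc.into_iter()
        .filter(|(_, diff)| *diff != 0) // drop fully cancelled updates
        .map(|((d, t), diff)| (d, t, diff))
        .collect()
}

fn main() {
    // Mapping the data to () first leaves one count per round of derivation:
    // the distinct times are the rounds, the accumulated diff is the count.
    let updates = [((), 1, 10), ((), 1, 5), ((), 2, 7)];
    let per_round = consolidate(&updates);
    assert_eq!(per_round, vec![((), 1, 15), ((), 2, 7)]);
}
```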

Frank McSherry (May 14 2018 at 17:31, on Zulip):

Another thing I can try for DD is to sort out the timely logging infra. In principle when enabled it produces all of the record counts and time spent in each operator across the computation. There is some work to do to synthesize it back into something intelligible (it will be a lot of "Join" and "Map"), but it seems like something I should do, and would probably be a lot more helpful than having you all place random measurement code everywhere.

I'll look into this and see what I can get quickly.

lqd (May 14 2018 at 17:55, on Zulip):

@Frank McSherry for the timestamp, we should print the inner timestamp, right? (rn it's ((Root, ()), 1) but I assume that's because of the subscopes)

Frank McSherry (May 14 2018 at 17:55, on Zulip):

Ok, I've got vanilla timely up and producing logs about what's been run and for how long. It shouldn't be too hard to aggregate times by operator and message count by edge. I'm going to look into whether DD can help put names on all of the operators and such, to try and clean up the presentation.

Frank McSherry (May 14 2018 at 17:56, on Zulip):

@lqd yeah, the .inner field is the one you want. It should be the iteration counter, where the outer () is the empty timestamp indicating that no, you cannot change the inputs at all (which is what gives access to distinct_total).

lqd (May 14 2018 at 17:56, on Zulip):

awesome, thank you :)

Frank McSherry (May 14 2018 at 18:08, on Zulip):

I'm going to grab some food, but I wanted to drop some notes on the logging stuff. It may ultimately be as helpful as custom instrumentation (more flexible, for sure). I put a PR up at

which outlines some of the things, but the main gist is that you get a live timely dataflow stream of all the logging events, and can then whip up your own computation that consumes them (e.g. a DD computation that maintains accumulated counts and joins against the constructed names to produce more appealing reports).

I'll try and put together a prototype that may do a fine job of reporting record counts on every edge in the dataflow graph, without further modifying the computation (and without working around the measurement code). The downside is that it only sees record counts, and doesn't get to peek inside. If you have one record with 10M derivations, meaning a .2 field of 10M, that will be one record possibly.
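
The log-consuming side Frank describes can be sketched with std collections alone: fold a stream of (operator name, records sent) observations into accumulated counts per operator. The event shape here is a hypothetical simplification; the real stream carries timely's own logging event types:

```rust
use std::collections::BTreeMap;

fn main() {
    // Hypothetical logging events: (operator name, records observed on an edge).
    let events = [("Join", 1000u64), ("Map", 250), ("Join", 500)];

    // Accumulate record counts per operator, roughly what a DD computation
    // over the logging stream would maintain and join against operator names.
    let mut totals: BTreeMap<&str, u64> = BTreeMap::new();
    for &(op, records) in events.iter() {
        *totals.entry(op).or_insert(0) += records;
    }

    assert_eq!(totals["Join"], 1500);
    println!("{:?}", totals);
}
```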

lqd (May 14 2018 at 18:22, on Zulip):

@nikomatsakis gtg for a while, but I have something like this atm -- the first file is --stats -v to have the timely rounds stats, and the second one just --stats

Frank McSherry (May 14 2018 at 20:39, on Zulip):

Good news and bad news!

Good news is that I have a differential dataflow program that consumes a live timely logging stream and produces counts for records on all edges, described by the names of the operators.

Bad news is that it is lots of maps and joins and stuff. Here it is on the current BorrowCheck execution:

Frank McSherry (May 14 2018 at 20:42, on Zulip):

I'm not sure I've really got the name <--> channel mapping nailed down perfectly (more research to do here), but you can probably start to get the gist. The only change needed in the borrow_check program is to initiate the timely computation like so:

    let logger_config = timely::logging::LoggerConfig::new(
        |setup| {
            let addr = format!("{}", 8000 + setup.index);
            let send = ::std::net::TcpStream::connect(addr).unwrap();
            // …
        },
        |_setup| {
            // No support for communication threads in this example.
            let addr = format!("{}", 8001);
            let send = ::std::net::TcpStream::connect(addr).unwrap();
            // …
        },
    );

    timely::execute_from_args_logging(vec![].into_iter(), logger_config, {
        let result = result.clone();
        // …
which yes uses a TCP socket to grab the logging data. In principle it can go to any W: Write, or Kafka, or a bunch of other places.

Frank McSherry (May 14 2018 at 20:44, on Zulip):

Another downside, unfortunately: the arrangement infrastructure is hiding some of the sizes. The records coming out of arrangements are the number of batches, rather than the number of records in the batch. This means that getting the details out of records into a distinct vs records out of a distinct means we have to look at the input to the arrange leading into the distinct, rather than just the distinct itself.

lqd (May 14 2018 at 21:16, on Zulip):

@nikomatsakis here's the commit for the stats I posted earlier if you want a PR (10 min compile times on this tiny laptop are bru·tal :) — and then we can integrate frank's new shinies later ?

Frank McSherry (May 14 2018 at 21:17, on Zulip):

btw, the change proposed up above should be conditional somehow. It will panic if there isn't anyone listening on that port, and you probably don't want that most of the time (also, some run-time overhead for the logging).

Frank McSherry (May 14 2018 at 21:28, on Zulip):

It's midnight here, but if you want to check it out,

is the code for the log analysis. It's all just joining together things, and a bunch of consulting to remember which fields are which. More work to do to make it intelligible, and perhaps some thinking about how to better allow naming / grouping of operators in code so that the stats can be better interpreted.

lqd (May 14 2018 at 21:36, on Zulip):

@Frank McSherry very nice :)

nikomatsakis (May 15 2018 at 15:18, on Zulip):

@lqd btw I've not had a chance to look over what you did yet — will try to do that now

lqd (May 15 2018 at 15:46, on Zulip):

no worries :)

Last update: Jul 02 2020 at 19:25 UTC