Stream: t-compiler/wg-nll

Topic: datafrog-blog-post-feedback


nikomatsakis (May 19 2018 at 10:35, on Zulip):

@Frank McSherry total nit, but I wonder if impl Trait would make this more readable:

before:

fn join_helper<Key: Ord, Val1, Val2, F: FnMut(&Key, &Val1, &Val2)>(
    input1: &Relation<(Key,Val1)>,
    input2: &Relation<(Key,Val2)>,
    mut result: F) {

    // do some stuff probably.

}

after (I'm also using the "rustfmt" style here):

fn join_helper<Key: Ord, Val1, Val2>(
    input1: &Relation<(Key,Val1)>,
    input2: &Relation<(Key,Val2)>,
    mut result: impl FnMut(&Key, &Val1, &Val2),
) {

    // do some stuff probably.

}
Frank McSherry (May 19 2018 at 10:37, on Zulip):

Interesting! I hadn't got around to testing impl Trait yet, but that does seem more tasteful.

nikomatsakis (May 19 2018 at 10:37, on Zulip):

so far it all reads very nicely to me — but then i'm definitely knee deep in this space right now :)

nikomatsakis (May 19 2018 at 10:38, on Zulip):

I find impl Trait in argument position is basically made for closures ;)

nikomatsakis (May 19 2018 at 10:38, on Zulip):

@Frank McSherry in this line, you use Fn not FnMut, is there a reason for that (I don't immediately see why):

pub fn join_into<Key: Ord, Val1: Ord, Val2: Ord, Result: Ord, F: Fn(&Key, &Val1, &Val2)->Result>(
nikomatsakis (May 19 2018 at 10:39, on Zulip):

though I guess most of the helpers wouldn't mutate environmental state anyway

Frank McSherry (May 19 2018 at 10:39, on Zulip):

Actually, while I have you, why must I write this:

pub fn join_into<Key: Ord, Val1: Ord, Val2: Ord, Result: Ord, F: Fn(&Key, &Val1, &Val2)->Result>(
    input1: &Variable<(Key, Val1)>,
    input2: &Variable<(Key, Val2)>,
    output: &Variable<Result>,
    logic: F) {

    let mut results = Vec::new();

    let recent1 = input1.recent.borrow();
    let recent2 = input2.recent.borrow();

    for batch2 in input2.stable.borrow().iter() {
        join_helper(&recent1, &batch2, |k,v1,v2| results.push(logic(k,v1,v2)));
    }

    for batch1 in input1.stable.borrow().iter() {
        join_helper(&batch1, &recent2, |k,v1,v2| results.push(logic(k,v1,v2)));
    }

    join_helper(&recent1, &recent2, |k,v1,v2| results.push(logic(k,v1,v2)));

    output.insert(Relation::from_vec(results));
}

and not this

pub fn join_into<Key: Ord, Val1: Ord, Val2: Ord, Result: Ord, F: Fn(&Key, &Val1, &Val2)->Result>(
    input1: &Variable<(Key, Val1)>,
    input2: &Variable<(Key, Val2)>,
    output: &Variable<Result>,
    logic: F) {

    let mut results = Vec::new();

    let recent1 = input1.recent.borrow();
    let recent2 = input2.recent.borrow();

    let mut closure = |k,v1,v2| results.push(logic(k,v1,v2));

    for batch2 in input2.stable.borrow().iter() {
        join_helper(&recent1, &batch2, closure);
    }

    for batch1 in input1.stable.borrow().iter() {
        join_helper(&batch1, &recent2, closure);
    }

    join_helper(&recent1, &recent2, closure);

    output.insert(Relation::from_vec(results));
}
nikomatsakis (May 19 2018 at 10:40, on Zulip):

actually I'm not sure that you must :)

Frank McSherry (May 19 2018 at 10:40, on Zulip):

I get a bunch of errors (can link) that seem to be about the lifetimes in the for <'r, 's, 't> HRLs.

nikomatsakis (May 19 2018 at 10:40, on Zulip):

oh

Frank McSherry (May 19 2018 at 10:41, on Zulip):
Echidnatron% cargo test
   Compiling datafrog v0.1.0 (file:///Users/mcsherry/Projects/datafrog)
error[E0631]: type mismatch in closure arguments
  --> src/join.rs:19:9
   |
16 |     let mut closure = |k,v1,v2| results.push(logic(k,v1,v2));
   |                       -------------------------------------- found signature of `fn(&Key, &Val1, &Val2) -> _`
...
19 |         join_helper(&recent1, &batch2, closure);
   |         ^^^^^^^^^^^ expected signature of `for<'r, 's, 't0> fn(&'r Key, &'s Val1, &'t0 Val2) -> _`
   |
note: required by `join::join_helper`
  --> src/join.rs:51:1
   |
51 | fn join_helper<K: Ord, V1, V2, F: FnMut(&K,&V1,&V2)>(mut slice1: &[(K,V1)], mut slice2: &[(K,V2)], mut result: F) {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0271]: type mismatch resolving `for<'r, 's, 't0> <[closure@src/join.rs:16:23: 16:61 results:_, logic:_] as std::ops::FnOnce<(&'r Key, &'s Val1, &'t0 Val2)>>::Output == ()`
  --> src/join.rs:19:9
   |
19 |         join_helper(&recent1, &batch2, closure);
   |         ^^^^^^^^^^^ expected bound lifetime parameter, found concrete lifetime
   |
note: required by `join::join_helper`
  --> src/join.rs:51:1
   |
51 | fn join_helper<K: Ord, V1, V2, F: FnMut(&K,&V1,&V2)>(mut slice1: &[(K,V1)], mut slice2: &[(K,V2)], mut result: F) {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
nikomatsakis (May 19 2018 at 10:41, on Zulip):

yeah you would need type annotations on the parameters

nikomatsakis (May 19 2018 at 10:41, on Zulip):

the reason is that, to infer which parameters have HRL, we use the "expected type"

nikomatsakis (May 19 2018 at 10:41, on Zulip):

that is, we look at the fn you are calling and the declared type on its parameter

nikomatsakis (May 19 2018 at 10:41, on Zulip):

if there is no expected type (as in the new code), then we will only infer a higher-ranked lifetime if it is manually given to us

nikomatsakis (May 19 2018 at 10:42, on Zulip):

I hope to improve this actually

nikomatsakis (May 19 2018 at 10:42, on Zulip):

but inferring higher-ranked things is a mite tricky so it may not work out

Frank McSherry (May 19 2018 at 10:42, on Zulip):

Re the Fn vs FnMut, no reason that I can think of. I think morally I imagined it would be a functional transformation of the inputs, but I can't see why FnMut shouldn't be permitted. Lemme fix that up too.

nikomatsakis (May 19 2018 at 10:42, on Zulip):

(I think that, if I could go back in time, I would require that closure's parameters have a type given if there is no expected type, but ...)

nikomatsakis (May 19 2018 at 10:43, on Zulip):

so iow something like this would probably work:

nikomatsakis (May 19 2018 at 10:43, on Zulip):
    let mut closure = |k: &Key, v1: &Val1, v2: &Val2 results.push(logic(k,v1,v2));
nikomatsakis (May 19 2018 at 10:43, on Zulip):

or whatever the right types are

Frank McSherry (May 19 2018 at 10:45, on Zulip):

okies, that seems likely to work. getting errors now about the fact that the closure has a &mut in it, so can't be cloned; will have to take closure by reference rather than owned, I think but should be ok.

Frank McSherry (May 19 2018 at 10:45, on Zulip):

Does logic: &mut impl Fn(&blah) work as well? (( also trying it out, but .. ))

nikomatsakis (May 19 2018 at 10:46, on Zulip):

yes that would work

nikomatsakis (May 19 2018 at 10:46, on Zulip):

however

nikomatsakis (May 19 2018 at 10:46, on Zulip):

you should not have to modify the callee I don't thnk

nikomatsakis (May 19 2018 at 10:46, on Zulip):

because FnMut is implemented for &mut F where F: FnMut

nikomatsakis (May 19 2018 at 10:46, on Zulip):

(iirc)

nikomatsakis (May 19 2018 at 10:46, on Zulip):

so you should be able to get by with just logic: impl FnMut(..)

nikomatsakis (May 19 2018 at 10:46, on Zulip):

and then on the caller side doing ..., &mut closure)

nikomatsakis (May 19 2018 at 10:47, on Zulip):

though sometimes I forget just which adapters we actually have and I could be wrong :)

Frank McSherry (May 19 2018 at 10:49, on Zulip):

Cool, this seems to work:

pub fn join_into<Key: Ord, Val1: Ord, Val2: Ord, Result: Ord>(
    input1: &Variable<(Key, Val1)>,
    input2: &Variable<(Key, Val2)>,
    output: &Variable<Result>,
    logic: &mut impl FnMut(&Key, &Val1, &Val2)->Result) {

    let mut results = Vec::new();

    let recent1 = input1.recent.borrow();
    let recent2 = input2.recent.borrow();

    {
        let mut closure = |k: &Key, v1: &Val1, v2: &Val2| results.push(logic(k,v1,v2));

        for batch2 in input2.stable.borrow().iter() {
            join_helper(&recent1, &batch2, &mut closure);
        }

        for batch1 in input1.stable.borrow().iter() {
            join_helper(&batch1, &recent2, &mut closure);
        }

        join_helper(&recent1, &recent2, &mut closure);
    }

    output.insert(Relation::from_vec(results));
}
Frank McSherry (May 19 2018 at 10:50, on Zulip):

Scope needed to allow closure to drop and release borrow on results, so that it can be moved into insert.

nikomatsakis (May 19 2018 at 10:50, on Zulip):

Well enough that I'm still blocked on them confirming that we've actually computed the same thing, before getting too sassy.

:rolling_on_the_floor_laughing:

Frank McSherry (May 19 2018 at 10:50, on Zulip):

Yeah, well you can't see how long it takes them. Want to guess the speed-up?

Frank McSherry (May 19 2018 at 10:51, on Zulip):

This is ASPLOS, one of the top architecture/systems conferences, right?

nikomatsakis (May 19 2018 at 10:51, on Zulip):

confirm

Frank McSherry (May 19 2018 at 10:52, on Zulip):

It's about a 200x speed-up. And it's totally the right answer. Same number of tuples as them, but they rounded to millions so hard to be 100% certain.

lqd (May 19 2018 at 10:52, on Zulip):

introducing :frog: COST

nikomatsakis (May 19 2018 at 10:52, on Zulip):

wow, I was going to guess 10x — then realized that must be too conservative

nikomatsakis (May 19 2018 at 10:52, on Zulip):

so I was going to up to 100x :)

Frank McSherry (May 19 2018 at 10:52, on Zulip):

they take 11mins to do it; 3 seconds of compute with :frog:.

nikomatsakis (May 19 2018 at 10:53, on Zulip):

reading :frog:'s implementation reminded me of some excellent lectures I was watching by Alex Stepanov

Frank McSherry (May 19 2018 at 10:53, on Zulip):

That's doing null pointer reachability on httpd. If you do it on postgres, it takes them 143.8 minutes. Guess for :frog:.

nikomatsakis (May 19 2018 at 10:53, on Zulip):

he often claimed "computers are array processing machines", and — paraphrased — using any other data structure is folly

nikomatsakis (May 19 2018 at 10:53, on Zulip):

uh..10s :)

nikomatsakis (May 19 2018 at 10:54, on Zulip):

/me just picks a number out of the air

Frank McSherry (May 19 2018 at 10:54, on Zulip):

5s to load the data, then 9s to complete the compute.

nikomatsakis (May 19 2018 at 10:54, on Zulip):

wow

lqd (May 19 2018 at 10:54, on Zulip):

amazing

Frank McSherry (May 19 2018 at 10:54, on Zulip):

Current "bee in bonnet" is "people who have to invent systems because the existing systems are so bad".

Frank McSherry (May 19 2018 at 10:55, on Zulip):

Arguably the ASPLOS authors are sympathetic characters here. They took a Stanford system, and instead of 3s, or even 11 minutes, it takes them (Stanford) four hours to do the httpd null pointer analysis. =/

nikomatsakis (May 19 2018 at 10:56, on Zulip):

I am definitely feeling a desire to read those papers now

Frank McSherry (May 19 2018 at 10:56, on Zulip):

So they (the ASPLOS authors) are in a bit of a bind, in that the "big data" systems are just not suitable for what they are trying to do. And so they learn a bit about how to do this, but are just learning as they go.

nikomatsakis (May 19 2018 at 10:57, on Zulip):

PS your link for "real paper" leads to https://github.com/frankmcsherry/blog/blob/master/posts

nikomatsakis (May 19 2018 at 10:57, on Zulip):

which is maybe not what you meant

nikomatsakis (May 19 2018 at 10:57, on Zulip):

are they distributing the computation over a big cluster or something?

nikomatsakis (May 19 2018 at 10:57, on Zulip):

i.e., is part of the problem that they are paying a lot of synchronization overhead, but the quantity of data doesn't justify that?

Frank McSherry (May 19 2018 at 10:58, on Zulip):

Oh, right a few broken links at the moment sorry. (typed while out and about w/o internet).

nikomatsakis (May 19 2018 at 10:58, on Zulip):

no worries

nikomatsakis (May 19 2018 at 10:58, on Zulip):

just thought I'd mention it

Frank McSherry (May 19 2018 at 10:58, on Zulip):

No, this is single machine stuff they are doing. https://www.ics.uci.edu/~guoqingx/papers/wang-asplos17.pdf

Frank McSherry (May 19 2018 at 11:00, on Zulip):

That impl Fn stuff is great. Much cleaner, even if just syntax. :)

lqd (May 19 2018 at 11:53, on Zulip):

We are going to reproduce their results!

Readers will be able to predict how this will end

lqd (May 19 2018 at 11:54, on Zulip):

very nice post

Last update: Nov 22 2019 at 00:55UTC