Stream: t-compiler/wg-polonius

Topic: Alternative `Relation` representations

ecstatic-morse (Jan 01 2021 at 19:51, on Zulip):

I alluded to this in #t-compiler/wg-polonius > Optimization PRs, but I don't see a fundamental reason why Relations must be stored as lists of tuples. One obstacle is the fact that Leaper currently operates on Vec<&'_ Val> instead of Vec<Val>, but this can be changed (see rust-lang/datafrog#19).

ecstatic-morse (Jan 01 2021 at 19:54, on Zulip):

We spend a lot of time binary searching cfg_edge while running the initialization rules, and I suspect there's other dense relations that would benefit from O(1) lookup on their keys

lqd (Jan 01 2021 at 22:27, on Zulip):

I don't think there are fundamental reasons for that, just that it's easy to implement a datafrog engine that way

lqd (Jan 01 2021 at 22:27, on Zulip):

some context which might be important: I think Frank did most of this in a weekend or something

lqd (Jan 01 2021 at 22:28, on Zulip):

and that was enough to be better than using DD (both in compile time and running time)

lqd (Jan 01 2021 at 22:37, on Zulip):

Leapers did seem to offer a way to separate the representation (or at least offer a way to make different representations work together), but now that I think about it I've only experimented with different relations for antijoins with bloom and counting quotient filters, without changing or abstracting the representation. This impression was probably wrong, I didn't even hit the API snag you mention

lqd (Jan 01 2021 at 22:57, on Zulip):

(also note: the move/init rules you mention above are incomplete, probably contain bugs, and are currently doing more work than necessary; albin and niko worked on them last sprint in this thread if you're interested in them specifically)

ecstatic-morse (Jan 01 2021 at 23:39, on Zulip):

lqd said:

some context which might be important: I think Frank did most of this in a weekend or something

Souffle also doesn't use an "adjacency list" representation, it's BTrees or a custom radix tree (called a brie), so it's not just datafrog, but point taken.

lqd (Jan 04 2021 at 13:40, on Zulip):

if we're optimizing binary search, I was thinking some Relations could also benefit from having a different layout while still using the same representation, like using van Emde Boas or Eytzinger layouts (classic paper from pvk for our joins with static inputs

ecstatic-morse (Jan 04 2021 at 21:14, on Zulip):

I wonder how much time we spend doing range queries vs. single-element lookups. sorted lists are optimal in terms of instructions once the first element is found, and should be close to cache-optimal as well. I wonder how vEB trees or B-trees stack up here.

ecstatic-morse (Jan 04 2021 at 21:14, on Zulip):

Also, I wonder how much of datafrog's performance is due to the way the stable part of a variable is optimized for writes

ecstatic-morse (Jan 04 2021 at 21:15, on Zulip):

(Not sure what to call the list of sorted lists with exponentially increasing lengths, it's kinda like an in-memory log-structured merge tree I guess?)

ecstatic-morse (Jan 04 2021 at 21:16, on Zulip):

It's gotta be a lot faster than writing to the middle of a B-tree.

ecstatic-morse (Jan 04 2021 at 21:19, on Zulip):

(One of the optimizations I was gonna investigate was a larger minimum capacity for the smallest list in stable. It should be a cache-line's worth of data, but I think it's 1 right now?)

ecstatic-morse (Jan 04 2021 at 21:22, on Zulip):

Anyways, I'm kind of off optimizing datafrog right now. Time would be better spent adding features (arithmetic predicates, automatic stratification, etc.) IMO.

lqd (Jan 04 2021 at 21:28, on Zulip):

it very well could be

lqd (Jan 04 2021 at 21:30, on Zulip):

(esp. if it were to be used in rustc for some of the various analyses?)

Last update: Jun 20 2021 at 01:00UTC