Stream: t-compiler/wg-polonius

Topic: Optimization PRs


ecstatic-morse (Dec 29 2020 at 19:34, on Zulip):

I haven't been around much lately, but I had some down time around the holidays and spent some of it reading large portions of both chalk and polonius. Polonius was a bit more approachable for me, and seems to have fewer contributors even during the sprints. I know the project currently in a fallow period, but I'm wondering if there's any point in filing PRs to optimize datafrog/polonius. Perhaps to be reviewed during the next sprint?

My impression from my quick review is that the rulesets will change quite a bit before they are production ready (perhaps this is more true for the initialization/liveness analysis, which seems pretty slow, than for the borrow checker). With that in mind, It would be better to focus on code that is totally encapsulated and are very likely to be hot even if the ruleset changes significantly. Relation::merge is a good example, and I have a few other spots in mind.

That said, I don't think it's the best use of WG-polonius time to be reviewing unsafe code and reproducing micro-benchmarks. Perhaps there's other issues I could work on asynchronously? For example, I think I could removeLocation::All, although I suspect it would be a lot harder for me than for members of this WG. Improving the datafrog documentation and experimenting with O(1) relations (mainly for cfg_edge) also came to mind.

lqd (Dec 29 2020 at 19:35, on Zulip):

hey Dylan :)

ecstatic-morse (Dec 29 2020 at 19:36, on Zulip):

Hey Rémy

lqd (Dec 29 2020 at 19:36, on Zulip):

the rulesets will likely change indeed, not all the upfront design work, nor integration with chalk has been worked out yet

lqd (Dec 29 2020 at 19:37, on Zulip):

my personal hunch is I feel it's unlikely we can get datalog production ready

lqd (Dec 29 2020 at 19:37, on Zulip):

but who knows

lqd (Dec 29 2020 at 19:38, on Zulip):

your experience with rustc's dataflow could say that looking at move/init it's unlikely as well

ecstatic-morse (Dec 29 2020 at 19:38, on Zulip):

Got it. I meant to ask, was there an initial prototype that just used Souffle?

lqd (Dec 29 2020 at 19:38, on Zulip):

probably not a prototype per se

lqd (Dec 29 2020 at 19:38, on Zulip):

however

lqd (Dec 29 2020 at 19:38, on Zulip):

the rules themselves were definitely tried there

lqd (Dec 29 2020 at 19:39, on Zulip):

and there was a timely/differential dataflow prototype

lqd (Dec 29 2020 at 19:39, on Zulip):

you couldn't use the soufflé rules inside rustc, only their tools were used IIRC

lqd (Dec 29 2020 at 19:40, on Zulip):

at least I never did rust + soufflé integration

lqd (Dec 29 2020 at 19:41, on Zulip):

I seem to recall (that was a couple years ago) there were some caveats where some of the things allowed in DD/datafrog were not acceptable datalog, but there probably was a workaround

ecstatic-morse (Dec 29 2020 at 19:41, on Zulip):

I was a bit surprised that development seemed to jump right to differential dataflow and then to datafrog.

ecstatic-morse (Dec 29 2020 at 19:41, on Zulip):

I was wondering if I had missed a step in the process

lqd (Dec 29 2020 at 19:42, on Zulip):

IIRC there were some other repos from Niko

lqd (Dec 29 2020 at 19:42, on Zulip):

but since that was part of NLL back then rather than polonius per se

lqd (Dec 29 2020 at 19:42, on Zulip):

it's hard for me to say

ecstatic-morse (Dec 29 2020 at 19:43, on Zulip):

I didn't see a triejoin/leapjoin equivalent in Souffle, so I thought maybe that was it. Other than that, they're both semi-naive solvers AFAICT.

lqd (Dec 29 2020 at 19:43, on Zulip):

right

lqd (Dec 29 2020 at 19:43, on Zulip):

they have richer data structures and parallelism

ecstatic-morse (Dec 29 2020 at 19:43, on Zulip):

Okay, but maybe it's better for me to not work on datafrog so much?

lqd (Dec 29 2020 at 19:44, on Zulip):

as for your other questions, whether it's worth our time to review microbenchmarks: the work you did is valuable and I'm personally very thankful

lqd (Dec 29 2020 at 19:44, on Zulip):

it's not wasted work at all

lqd (Dec 29 2020 at 19:44, on Zulip):

whether it's the best use of your time is another interesting question

lqd (Dec 29 2020 at 19:45, on Zulip):

there are some interesting things to do in datafrog for sure

lqd (Dec 29 2020 at 19:46, on Zulip):

especially wrt to data representation, possibly parallelism (the semi naive evaluation is trivially parallelisable but that won't probably work super well :) probabilistic data structures

lqd (Dec 29 2020 at 19:46, on Zulip):

(eg for the latter point some relations are only used in antijoins)

lqd (Dec 29 2020 at 19:47, on Zulip):

for the 1st, more compact data would also help, but since datafrog is super tied to binary search and sorting and all, it's not super clear how to tackle that at first

lqd (Dec 29 2020 at 19:48, on Zulip):

or ways to limit copying in dataflow-like situations, you know all this

ecstatic-morse (Dec 29 2020 at 19:49, on Zulip):

Yeah, the O(1) relation change is going would be pretty invasive, but I think it's worth doing if we're committed to datafrog as the solver going forward, and I think I could make it happen.

ecstatic-morse (Dec 29 2020 at 19:49, on Zulip):

lqd said:

my personal hunch is I feel it's unlikely we can get datalog production ready

But this comment makes it feel like that's maybe not the case?

lqd (Dec 29 2020 at 19:50, on Zulip):

that's my own feeling, I'm not sure it's shared by others

lqd (Dec 29 2020 at 19:50, on Zulip):

it's not going away that's for sure, whether it'll be the only way to do the analyses is another idea

lqd (Dec 29 2020 at 19:51, on Zulip):

like, if we need to run the rules on crater, and some of the huge functions out there, all this would still be worthwhile

lqd (Dec 29 2020 at 19:53, on Zulip):

whether it'd be improving polonius itself, or the rules, or datafrog

ecstatic-morse (Dec 29 2020 at 19:55, on Zulip):

lqd said:

that's my own feeling, I'm not sure it's shared by others

Is the problem with datafrog the frontend? You want something that accepts actual datalog rules so you can iterate faster?

lqd (Dec 29 2020 at 19:55, on Zulip):

it's also true that reviews of datafrog would probably be tougher

lqd (Dec 29 2020 at 19:56, on Zulip):

not especially to iterate faster, I have a datafrog generator to help with that https://github.com/lqd/datapond/, and Vytautas has revamped it completely for "rustql" (I forgot the new name of the project) with a proc-macro and everything

ecstatic-morse (Dec 29 2020 at 19:57, on Zulip):

Is it architectural (i.e. you want something that uses a different strategy or defaults to BTrees or works out-of-memory)? Or just general code quality?

lqd (Dec 29 2020 at 19:57, on Zulip):

just to be more efficient than allocations in a hot loop, for the cases when it can be avoided; say reachability queries, or TC over dynamic graphs

lqd (Dec 29 2020 at 19:59, on Zulip):

a different strategy could be interesting to avoid some of the problems of bottom-up evaluation, but I've been trying for a while to transform the rules to demand data like a top-down evaluation would

lqd (Dec 29 2020 at 20:00, on Zulip):

I know the literature always mention scalability to huge codebases, but I don't remember off hand what the actual numbers are, it seems like they're comparing "queries that resolve overnight" to "a few seconds"

lqd (Dec 29 2020 at 20:00, on Zulip):

a few seconds is orders of magnitude too slow

lqd (Dec 29 2020 at 20:01, on Zulip):

I'm not sure what is attainable :)

lqd (Dec 29 2020 at 20:02, on Zulip):

but it's likely that a kind of "graded borrow-checking" would be required, that is, there'd need to be a strong focus on the lowest grades, like the location-insensitive pass

ecstatic-morse (Dec 29 2020 at 20:02, on Zulip):

lqd said:

a different strategy could be interesting to avoid some of the problems of bottom-up evaluation, but I've been trying for a while to transform the rules to demand data like a top-down evaluation would

I'm way outside my area of expertise here, but bottom-up seemed like the correct paradigm to me since we want to find all errors in the function. It seemed like top-down (and optimizations like magic-set) are more for queries with one or more constants in the head.

lqd (Dec 29 2020 at 20:04, on Zulip):

both should find the same errors I feel, but you could also say that we're checking specific loans and origins

lqd (Dec 29 2020 at 20:04, on Zulip):

that is only the loans which are invalidated, and the origins through which they flow need tracking

ecstatic-morse (Dec 29 2020 at 20:05, on Zulip):

My long-term ideas were more in line with your first comment, using specialized strategies for TC and different storage for dense relations.

lqd (Dec 29 2020 at 20:05, on Zulip):

location-insensitivity seems super important to me, and my prototypes on this analysis, without datafrog, were promising

lqd (Dec 29 2020 at 20:05, on Zulip):

right

ecstatic-morse (Dec 29 2020 at 20:06, on Zulip):

I haven't read the location-insensitive or optimized rules at all, just the naive ones.

lqd (Dec 29 2020 at 20:06, on Zulip):

the location-insensitive ones are super simple thankfully

lqd (Dec 29 2020 at 20:07, on Zulip):

but it's ultimately simple reachability queries

lqd (Dec 29 2020 at 20:07, on Zulip):

which to me doesn't map nicely to datalog for efficiency ^^

lqd (Dec 29 2020 at 20:07, on Zulip):

but it does work :)

lqd (Dec 29 2020 at 20:08, on Zulip):

I don't know if I was really helpful answering your questions, but gotta go eat dinner

ecstatic-morse (Dec 29 2020 at 20:08, on Zulip):

Yeah, the "ideological purity" of Horn clauses isn't as appealing when you need to implement a custom solver for each one

ecstatic-morse (Dec 29 2020 at 20:10, on Zulip):

No worries, thanks for chatting

lqd (Dec 29 2020 at 20:10, on Zulip):

specialized relations would be interesting for sure, as well as cleverness in the rules, or fact generation

lqd (Dec 29 2020 at 20:10, on Zulip):

e.g IIRC we duplicate MIR locations into 2 polonius points, I think, to easily order effects

ecstatic-morse (Dec 29 2020 at 20:11, on Zulip):

I have a few more datafrog micro-optimizations that I'll post tonight since they're so easy and have pretty big wins, then I'll spend some time trying to grok the other rulesets. Seems like Jake Goulding might be able to do a preliminary review.

lqd (Dec 29 2020 at 20:11, on Zulip):

it's possible that with more clever rules, that wouldn't be needed at all, and severely limit the size of the cfg which is would be helpful on the whole computation

lqd (Dec 29 2020 at 20:12, on Zulip):

I have elsewhere prototypes about CFG compression and the likes

lqd (Dec 29 2020 at 20:12, on Zulip):

I forgot to say one thing: we were more focused on completeness and correctness at first

lqd (Dec 29 2020 at 20:13, on Zulip):

to be sure that the final rules would be good and bug-free (and that they don't, say, be too lenient in what they accept) before focusing on efficiency

lqd (Dec 29 2020 at 20:14, on Zulip):

that ties into what you were saying about whether the rules would change before optimizing

lqd (Dec 29 2020 at 20:15, on Zulip):

we're not there yet, in some sense we're not far, but there definitely are unknown unknowns :)

ecstatic-morse (Dec 29 2020 at 20:18, on Zulip):

Well lemme know if you think of any good discrete tasks. Otherwise I have some ideas I want to explore. And go eat XD.

lqd (Dec 29 2020 at 20:19, on Zulip):

it's also very likely that Niko would have ideas on what cool things you could do :)

lqd (Dec 29 2020 at 20:20, on Zulip):

alright, cheers, bbl

lqd (Dec 29 2020 at 20:20, on Zulip):

(also: thanks so much)

lqd (Dec 29 2020 at 21:04, on Zulip):

so yeah, datalog is not going away, and work on datafrog could be more stable than the rules per se (although both will be ultimately needed)

lqd (Dec 29 2020 at 21:30, on Zulip):

(also: many thanks to Jake Goulding for the early reviews on that PR :)

Josh Triplett (Dec 30 2020 at 07:12, on Zulip):

@ecstatic-morse Thank you for working on this. I'm excited to see more activity on polonius.

Josh Triplett (Dec 30 2020 at 07:12, on Zulip):

Including performance.

Josh Triplett (Dec 30 2020 at 07:13, on Zulip):

I've used it on nightly a few times, and I'm always impressed by what it can handle and how comfortable it feels.

lqd (Dec 30 2020 at 08:45, on Zulip):

hopefully it wasn't one of those cases where the parts that are not done yet missed actual errors :sweat: (we really should add the "this is an unfinished feature" warning like const generics had)

Vytautas Astrauskas (Dec 31 2020 at 13:54, on Zulip):

lqd said:

not especially to iterate faster, I have a datafrog generator to help with that https://github.com/lqd/datapond/, and Vytautas has revamped it completely for "rustql" (I forgot the new name of the project) with a proc-macro and everything

It was reborn as Qrates. Let me know if you have any questions about what is implemented.

ecstatic-morse (Jan 01 2021 at 20:12, on Zulip):

I wrote up some of my thoughts across a few thread, but I'm worried that I've been focused on the wrong problem. There's no point in optimizing things if we can't express these rules in datalog, and https://smallcultfollowing.com/babysteps/blog/2019/01/21/hereditary-harrop-region-constraints/ makes it seem like that question might not be settled. Has there been any progress on that front in the intervening years?

lqd (Jan 01 2021 at 23:16, on Zulip):

there has been some

lqd (Jan 01 2021 at 23:17, on Zulip):

the tentative plan (which is mostly inside niko's head, and possibly parts of matthew's) is to have chalk and polonius cooperate on that problem

lqd (Jan 01 2021 at 23:18, on Zulip):

<handwave>like chalk would see the higher-order constraints and spit out first-order constraints for polonius as-is to solve</handwave>

lqd (Jan 01 2021 at 23:20, on Zulip):

there has been some related work in chalk, I recall seeing branches from niko on that subject, but that's about all I remember

lqd (Jan 01 2021 at 23:25, on Zulip):

that being said, chalk is more active and with more contributors, and has had very little perf work done, that may be a better use of your time and skills

lqd (Jan 01 2021 at 23:26, on Zulip):

it would not be wasted at all

ecstatic-morse (Jan 01 2021 at 23:37, on Zulip):

Understood. I'm still drawn to polonius. I guess because it seems a bit narrower in scope than chalk. I'll keep trying to gain some intuition for the borrow checking rules. Maybe I'll be useful during the next sprint.

Last update: Jun 20 2021 at 00:00UTC