One of the reasons polonius is slower than rustc is that it operates on a CFG where every node has exactly one (technically half of one) statement. This is hardly groundbreaking, but we could speed things up by operating on the CFG as it is stored in rustc. To get more familiar with the language, I tried to express the way the dataflow solver operates as a set of datalog rules:
```
// Block transfer functions
trans_gen(var, bb) :- gen(var, [bb, i]), !kill(var, [bb, j]), j > i.
trans_kill(var, bb) :- kill(var, [bb, i]), !gen(var, [bb, j]), j > i.

// Interblock (global)
set_at_block_exit(var, bb) :- set_at_block_entry(var, bb), !trans_kill(var, bb).
set_at_block_exit(var, bb) :- trans_gen(var, bb).
set_at_block_entry(var, bb) :- set_at_block_exit(var, pred), block_edge(pred, bb).

// Intrablock (local)
set_at(var, [bb, i]) :- gen(var, [bb, i-1]).
set_at(var, [bb, 0]) :- set_at_block_entry(var, bb).
set_at(var, [bb, i]) :- set_at(var, [bb, i-1]), !kill(var, [bb, i-1]).
```
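To make the rules concrete, here's a rough Python sketch of the same computation as an explicit gen/kill solver over basic blocks. The relation contents and the tiny two-block CFG are made up purely for illustration; relations are just sets of tuples.

```python
from collections import defaultdict

# Made-up facts: gen/kill at (block, statement-index), and CFG edges.
gen = {("x", ("bb0", 0)), ("y", ("bb1", 1))}
kill = {("x", ("bb1", 0))}
block_edge = {("bb0", "bb1")}
num_stmts = {"bb0": 1, "bb1": 2}

# Block transfer functions: a var survives in trans_gen(var, bb) if some
# statement gens it and no *later* statement in the same block kills it
# (this is the `!kill(var, [bb, j]), j > i` part of the rule).
def transfer(facts, other):
    out = set()
    for var, (bb, i) in facts:
        if not any((var, (bb, j)) in other for j in range(i + 1, num_stmts[bb])):
            out.add((var, bb))
    return out

trans_gen = transfer(gen, kill)
trans_kill = transfer(kill, gen)

# Interblock fixpoint: propagate entry/exit sets along CFG edges until
# nothing changes, mirroring the three "Interblock (global)" rules.
entry = defaultdict(set)
exit_ = defaultdict(set)
changed = True
while changed:
    changed = False
    for bb in num_stmts:
        new_exit = (entry[bb] - {v for v, b in trans_kill if b == bb}) \
                   | {v for v, b in trans_gen if b == bb}
        if new_exit != exit_[bb]:
            exit_[bb] = new_exit
            changed = True
    for pred, bb in block_edge:
        if not exit_[pred] <= entry[bb]:
            entry[bb] |= exit_[pred]
            changed = True
```

The intrablock rules aren't materialized here: once the per-block entry sets have reached a fixpoint, the per-statement sets can be recovered on demand by walking forward through a single block.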
If you replace "gen" with "becomes_init", "kill" with "becomes_uninit", and "set" with "is_init", you have the start of an initialization analysis (ignoring move paths).
I'm pretty sure this is legal datalog, although it uses arithmetic statements and record types, which Souffle supports but datafrog does not (though I suppose you could emulate them with relations).
Anyway, this is one way to do "CFG compression" as discussed in rust-lang/polonius#20. I think it's different from the approach that lqd took in their experiment.
While it's more invasive, it also maps nicely to the internal structure used by rustc.
What would be really cool is to generate rules for the basic block CFG from the ones for the single-node CFG. I'm not sure if there's a way to do this in general, however.
(we've indeed talked about doing the analyses on basic blocks before)
very interesting in any case
my own thoughts were: it'll likely require cloning the dataflow state between blocks, even when we don't need to for DAGs
(and even if that's somewhat off-topic for this thread, the way I did CFG compression was indeed different: I didn't really try changing the rules, I only removed points, and their associated facts, when they made no meaningful contribution to the loan or subset propagation -- which is similar in spirit to some of the DatafrogOpt variant rules; especially with an eye towards computing errors rather than the loan contents of each origin at each point)
but also I suspect we could do some of the computation on SCCs -- probably for the subset graph here rather than the CFG's :) since location-insensitivity is about reachability -- and use the iterated dominance frontier for propagating loans
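a quick sketch of the SCC idea, in case it helps: collapse the subset graph into its condensation so reachability (which is all location-insensitive propagation cares about) is computed once per component instead of once per origin. Kosaraju's algorithm on a made-up example graph:

```python
from collections import defaultdict

# Hypothetical subset edges; a, b, c form a cycle (one SCC), d is separate.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]

graph = defaultdict(list)
rgraph = defaultdict(list)
nodes = set()
for u, v in edges:
    graph[u].append(v)
    rgraph[v].append(u)
    nodes |= {u, v}

# First pass: DFS postorder on the forward graph.
def postorder(g):
    seen, order = set(), []
    def visit(n):
        seen.add(n)
        for m in g[n]:
            if m not in seen:
                visit(m)
        order.append(n)
    for n in sorted(nodes):
        if n not in seen:
            visit(n)
    return order

order = postorder(graph)

# Second pass: DFS on the reversed graph in reverse postorder;
# each DFS tree is one strongly connected component.
seen = set()
sccs = []
for n in reversed(order):
    if n in seen:
        continue
    comp, stack = [], [n]
    while stack:
        m = stack.pop()
        if m in seen:
            continue
        seen.add(m)
        comp.append(m)
        stack.extend(rgraph[m])
    sccs.append(frozenset(comp))
```

everything within one component has identical reachability, so loans/subsets would only need to be propagated over the (acyclic) condensation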