At the moment, dataflow iteration begins processing blocks in order of ascending IDs (
BasicBlock(1), etc.). However, for maximum efficiency during forward dataflow, basic blocks should be processed in reverse post-order. My guess is that the current ID ordering is pretty close to RPO for most functions, but it might be worth doing a perf run anyways.
Additionally, there is currently a backward dataflow analysis for liveness which is also iterating over blocks in order of ascending IDs. I think this could be sped up by initializing the
WorkQueue in postorder (or alternatively reverse post-order on the reverse CFG).
Does anyone have an opinion on whether this is worth pursuing? The implementation is pretty simple, I would just need someone to trigger a perf run.
Worth doing, yes.
@nagisa, should I open an issue for this as well to solicit discussion? Or just a PR?
PR is sufficient.
Okay, I think I'll do the forward and reverse cases separately.
That way they can get benchmarked independently
Forward is #62062
Reverse is #62063
@nagisa bors failed for #62062. Both PRs include a change to the API for
WorkQueue since I assumed they both had to be based on
How should I proceed?
They both should be independently based on master, yes. This looks like an unrelated failure to me.
@nagisa Some tests failed since there were unreachable basic blocks in the
mir::Body. The easiest way to preserve the original behavior is to re-add all basic-block indexes after doing traversal, which will pick up any unreachable ones. A further optimization might be to add a dead block elimination pass that gets run before dataflow, but that is probably outside the scope of this PR.
Results are in for forward analysis: https://perf.rust-lang.org/compare.html?start=d6884aedd5b8709c44c849d055d454db2f78042e&end=0905d6a630cb4afc3f894f9e91c1a7a20c32416b
Seems pretty :shrug: to me? This suggests that either forward dataflow is a small component of perf, or that basic block ordering is pretty close to RPO.
deep-vector-opt tops the list for both forwards and backwards ordering, despite containing only a single basic block. I'm thinking maybe it starts with a panic edge for each
vec.insert which then gets optimized away?
There's the source for
remember that ordering the blocks in whatever order is also work.
though little enough of it that I think it does not matter.
I might try logging the number of dataflow iterations (maybe as a percentage of basic blocks as well). In the backwards case, I expect we're doing about twice as many iterations as we need to, but CFGs are small enough that it doesn't outweigh the overhead from doing an extra traversal in most cases.
I'll try testing locally.
Also, how do I print the time taken in each phase of the compiler? Is it granular enough to get time in dataflow?
Found it :smile: