Stream: t-compiler

Topic: Iteration order for dataflow


Dylan MacKenzie (Jun 22 2019 at 18:48, on Zulip):

At the moment, dataflow iteration begins processing blocks in order of ascending IDs (BasicBlock(0), BasicBlock(1), etc.). However, for maximum efficiency during forward dataflow, basic blocks should be processed in reverse post-order. My guess is that the current ID ordering is pretty close to RPO for most functions, but it might be worth doing a perf run anyways.

Additionally, there is currently a backward dataflow analysis for liveness which is also iterating over blocks in order of ascending IDs. I think this could be sped up by initializing the WorkQueue in postorder (or alternatively reverse post-order on the reverse CFG).

Dylan MacKenzie (Jun 22 2019 at 18:51, on Zulip):

Does anyone have an opinion on whether this is worth pursuing? The implementation is pretty simple, I would just need someone to trigger a perf run.

nagisa (Jun 22 2019 at 19:29, on Zulip):

Worth doing, yes.

Dylan MacKenzie (Jun 22 2019 at 19:33, on Zulip):

@nagisa, should I open an issue for this as well to solicit discussion? Or just a PR?

nagisa (Jun 22 2019 at 19:35, on Zulip):

PR is sufficient.

Dylan MacKenzie (Jun 22 2019 at 19:42, on Zulip):

Okay, I think I'll do the forward and reverse cases separately.

Dylan MacKenzie (Jun 22 2019 at 19:43, on Zulip):

That way they can get benchmarked independently

Dylan MacKenzie (Jun 22 2019 at 20:16, on Zulip):

Forward is #62062

Dylan MacKenzie (Jun 22 2019 at 20:48, on Zulip):

Reverse is #62063

Dylan MacKenzie (Jun 22 2019 at 20:59, on Zulip):

@nagisa bors failed for #62062. Both PRs include a change to the API for WorkQueue since I assumed they both had to be based on master.

Dylan MacKenzie (Jun 22 2019 at 20:59, on Zulip):

How should I proceed?

nagisa (Jun 22 2019 at 21:01, on Zulip):

They both should be independently based on master, yes. This looks like an unrelated failure to me.

Dylan MacKenzie (Jun 22 2019 at 21:01, on Zulip):

Ah, okay

Dylan MacKenzie (Jun 22 2019 at 21:50, on Zulip):

@nagisa Some tests failed since there were unreachable basic blocks in the mir::Body. The easiest way to preserve the original behavior is to re-add all basic-block indexes after doing traversal, which will pick up any unreachable ones. A further optimization might be to add a dead block elimination pass that gets run before dataflow, but that is probably outside the scope of this PR.

Dylan MacKenzie (Jun 23 2019 at 18:02, on Zulip):

Results are in for forward analysis: https://perf.rust-lang.org/compare.html?start=d6884aedd5b8709c44c849d055d454db2f78042e&end=0905d6a630cb4afc3f894f9e91c1a7a20c32416b

Dylan MacKenzie (Jun 23 2019 at 18:04, on Zulip):

Seems pretty :shrug: to me? This suggests that either forward dataflow is a small component of perf, or that basic block ordering is pretty close to RPO.

Dylan MacKenzie (Jun 23 2019 at 20:24, on Zulip):

And now for backward: https://perf.rust-lang.org/compare.html?start=a96ba969156d257e5d5b692946fa8fe40ed6543a&end=56b7200c037e2a958f7d907e778b87e58beb7079

Dylan MacKenzie (Jun 23 2019 at 20:29, on Zulip):

Interestingly, deep-vector-opt tops the list for both forwards and backwards ordering, despite containing only a single basic block. I'm thinking maybe it starts with a panic edge for each vec.insert which then gets optimized away?

Dylan MacKenzie (Jun 23 2019 at 20:29, on Zulip):

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018

Dylan MacKenzie (Jun 23 2019 at 20:29, on Zulip):

There's the source for deep-vector

nagisa (Jun 23 2019 at 20:36, on Zulip):

remember that ordering the blocks in whatever order is also work.

nagisa (Jun 23 2019 at 20:37, on Zulip):

though little enough of it that I think it does not matter.

Dylan MacKenzie (Jun 23 2019 at 22:09, on Zulip):

I might try logging the number of dataflow iterations (maybe as a percentage of basic blocks as well). In the backwards case, I expect we're doing about twice as many iterations as we need to, but CFGs are small enough that it doesn't outweigh the overhead from doing an extra traversal in most cases.

Dylan MacKenzie (Jun 23 2019 at 22:10, on Zulip):

I'll try testing locally.

Dylan MacKenzie (Jun 23 2019 at 22:11, on Zulip):

Also, how do I print the time taken in each phase of the compiler? Is it granular enough to get time in dataflow?

Dylan MacKenzie (Jun 23 2019 at 22:12, on Zulip):

-Z time-passes

Dylan MacKenzie (Jun 23 2019 at 22:12, on Zulip):

https://forge.rust-lang.org/profile-queries.html

Dylan MacKenzie (Jun 23 2019 at 22:12, on Zulip):

Found it :smile:

Last update: Nov 22 2019 at 04:30UTC