Stream: t-compiler/wg-polonius

Topic: benchmark suite


Albin Stjerna (Mar 18 2019 at 09:56, on Zulip):

@nikomatsakis I'm a bit unsure of what to do about the benchmarking task (as well). First of all, are there any benchmarks already that I should use? Second, as I said, would I be free to use Criterion or do you work with some specific benchmarking infrastructure that I should follow? If so, is there anyone I can ask about that?

Albin Stjerna (Mar 18 2019 at 15:51, on Zulip):

I had a meeting with Tobias (my sort of shadow supervisor) today and he suggested I do something similar to what Crater does and generate benchmark input from a few popular crates (or preferably many).

nikomatsakis (Mar 19 2019 at 14:47, on Zulip):

@Albin Stjerna well, yes, we have an existing benchmark suite that the compiler uses, and it has tools for running it. You can view the "official" results on <https://perf.rust-lang.org/>. @Santiago Pastorino earlier ran this same suite, but with Polonius enabled (discussion here). Repeating this would give you a good "overall picture" of rustc compilation time, though with some caveats, in that the integration right now is not optimal.

However, I think Tobias is correct that it would be good to create more sets of benchmarking facts in polonius. Presently we have just one set, clap-rs, which was a case that was historically problematic. The README for polonius talks about how to generate facts for a random rs file.

Which leaves the question of how to choose the cases we include. Probably we should pick some representatives from the perf.rust-lang.org benchmark suite.

nikomatsakis (Mar 19 2019 at 14:50, on Zulip):

Looking over that thread, @Santiago Pastorino posted some results already we could look at.

nikomatsakis (Mar 19 2019 at 14:52, on Zulip):

In terms of benchmark inputs, one challenge is that the polonius inputs are per fn, whereas the perf suite is testing the compilation time of an entire crate.

nikomatsakis (Mar 19 2019 at 14:54, on Zulip):

I would not pick tuple-stress (5000x overhead... hmm....) just yet, because I think that's a separate issue. Maybe we can start by picking out some of the smaller test cases that are relatively simple, e.g. keccak.

nikomatsakis (Mar 19 2019 at 14:54, on Zulip):

The sources for perf can be found here https://github.com/rust-lang-nursery/rustc-perf I believe

nikomatsakis (Mar 19 2019 at 14:55, on Zulip):

(@Santiago Pastorino is pretty busy these days but I bet they could help you with running the test suite on your own, if that thread doesn't cover it for you)

nikomatsakis (Mar 19 2019 at 14:55, on Zulip):

I guess we could start by trying to convert keccak into a benchmark just because I know it's one big fn, so that's an easy one. It's also an interesting edge case, iirc. I think it uses a macro to generate a really big control-flow graph.

nikomatsakis (Mar 19 2019 at 14:56, on Zulip):

I'm a bit surprised the location insensitive didn't help more there

nikomatsakis (Mar 19 2019 at 14:56, on Zulip):

(Actually, @Santiago Pastorino -- those results -- what exactly were they comparing, do you know? Was that "no NLL at all" vs "NLL with polonius enabled"?)

Santiago Pastorino (Mar 19 2019 at 16:07, on Zulip):

(Actually, Santiago Pastorino -- those results -- what exactly were they comparing, do you know? Was that "no NLL at all" vs "NLL with polonius enabled"?)

which ones specifically?

Santiago Pastorino (Mar 19 2019 at 16:07, on Zulip):

was everything against AST if I'm not wrong

Albin Stjerna (Mar 19 2019 at 16:08, on Zulip):

I did do a GitHub search for repositories with code enabling NLL and generated facts from about twelve of them or so, but nothing takes any measurable time to execute on my machine, except fact generation, which for some instances generated hundreds of megabytes of tuples

Albin Stjerna (Mar 19 2019 at 16:09, on Zulip):

(I figured it's not so interesting to analyse code not using NLL, in particular with the hybrid algorithm)

nikomatsakis (Mar 19 2019 at 16:53, on Zulip):

@Santiago Pastorino ok -- if they were comparing against the ast borrow checker, I guess what we really need to be doing is comparing AST vs NLL vs Polonius.

Santiago Pastorino (Mar 19 2019 at 17:14, on Zulip):

yeah, if I remember correctly we had

Santiago Pastorino (Mar 19 2019 at 17:14, on Zulip):

AST vs NLL and AST vs Polonius

nikomatsakis (Mar 19 2019 at 17:46, on Zulip):

maybe we can get those numbers into a spreadsheet or something so we can compare better

Albin Stjerna (Mar 19 2019 at 18:25, on Zulip):

Do you mean mine or @Santiago Pastorino's?

Jake Goulding (Mar 19 2019 at 19:27, on Zulip):

to analyse code not using NLL

Everything in Rust 2018 will be using (some form of) NLL

Albin Stjerna (Mar 19 2019 at 20:00, on Zulip):

@Jake Goulding ok, so it means I should look for either edition = 2018 or the NLL feature flag?

Jake Goulding (Mar 19 2019 at 20:02, on Zulip):

I would think that's more accurate (it's not 100% complete as there's lots of ways to specify the edition)

Albin Stjerna (Mar 19 2019 at 21:16, on Zulip):

Here's a CSV file with the results of running Polonius' various algorithms against the crates I found, but as I mentioned they really are mostly at zero.

Albin Stjerna (Mar 19 2019 at 21:17, on Zulip):

Numbers are not per-crate but per-function as usual

Albin Stjerna (Mar 19 2019 at 21:18, on Zulip):

(if I'm going with something like this, I should try to reverse engineer at least part of what Crater does and do that probably, but this was something of a proof-of-concept of sorts)

Albin Stjerna (Mar 20 2019 at 17:01, on Zulip):

@nikomatsakis I generated tuples on keccak and ran them through Polonius, but I still get roughly zero seconds on every function; I'm starting to suspect I'm doing something wrong when generating the inputs

Albin Stjerna (Mar 20 2019 at 17:02, on Zulip):

@Santiago Pastorino Could you share how you ran rustc-perf with Polonius? I haven't managed to figure out how to do that from reading the READMEs in the repository

nikomatsakis (Mar 20 2019 at 17:44, on Zulip):

@Albin Stjerna did you read the topic? There was some discussion about how to do it in there

nikomatsakis (Mar 20 2019 at 17:48, on Zulip):

Have you tried just running the benchmark suite unedited?

nikomatsakis (Mar 20 2019 at 17:49, on Zulip):

You might ping @simulacrum -- if they're around, they might be able to help

nikomatsakis (Mar 20 2019 at 17:49, on Zulip):

It looks like to enable polonius, the idea was to edit the sources and add the command line parameter for now

Albin Stjerna (Mar 20 2019 at 18:42, on Zulip):

Have you tried just running the benchmark suite unedited?

I didn't manage to get the build command for collector to produce any binaries, but it compiles now.

Albin Stjerna (Mar 20 2019 at 18:43, on Zulip):

Ok, so the settings enabled when benchmarking are not set in perf/collector but in rustc itself? So I guess I should compile a local copy of rustc and run that

Albin Stjerna (Mar 20 2019 at 18:54, on Zulip):

Ah ok I need Linux to run the performance benchmarks

nikomatsakis (Mar 20 2019 at 19:09, on Zulip):

Ok, so the settings enabled when benchmarking are not set in perf/collector but in rustc itself? So I guess I should compile a local copy of rustc and run that

wait no

nikomatsakis (Mar 20 2019 at 19:10, on Zulip):

@Albin Stjerna they are set in the perf/collector -- you can add some custom flags

nikomatsakis (Mar 20 2019 at 19:10, on Zulip):

Ah ok I need Linux to run the performance benchmarks

oh :)

Albin Stjerna (Mar 20 2019 at 19:11, on Zulip):

Ah ok I need Linux to run the performance benchmarks

oh :)

It's fine, I already have a Vagrant box up

Albin Stjerna (Mar 20 2019 at 19:11, on Zulip):

Just surprised :)

Albin Stjerna (Mar 20 2019 at 19:12, on Zulip):

Albin Stjerna they are set in the perf/collector -- you can add some custom flags

Ah, right, and I took the link in the Zulip message you linked to a bit too literally, I see

simulacrum (Mar 20 2019 at 19:14, on Zulip):

:wave:

simulacrum (Mar 20 2019 at 19:14, on Zulip):

@Albin Stjerna I'll be around for the next ~hour or so if you have questions

simulacrum (Mar 20 2019 at 19:14, on Zulip):

The reason we need linux is that we only support perf as the stats collection backend

Albin Stjerna (Mar 20 2019 at 19:14, on Zulip):

Albin Stjerna I'll be around for the next ~hour or so if you have questions

Thanks! I'll let you know. :)

simulacrum (Mar 20 2019 at 19:14, on Zulip):

I think in the future it's plausible we'll support other things but I don't have the time to do the legwork

Albin Stjerna (Mar 20 2019 at 19:15, on Zulip):

That sounds like an incredibly reasonable set of priorities

Santiago Pastorino (Mar 20 2019 at 20:25, on Zulip):

Santiago Pastorino Could you share how you ran rustc-perf with Polonius? I haven't managed to figure out how to do that from reading the READMEs in the repository

did you figure this out?

Santiago Pastorino (Mar 20 2019 at 20:25, on Zulip):

sorry but I have been very busy

Santiago Pastorino (Mar 20 2019 at 20:25, on Zulip):

after Rust Latam I'd be a new person :smiley:

Albin Stjerna (Mar 20 2019 at 20:40, on Zulip):

Santiago Pastorino Could you share how you ran rustc-perf with Polonius? I haven't managed to figure out how to do that from reading the READMEs in the repository

did you figure this out?

I think so, but thanks!

Albin Stjerna (Mar 20 2019 at 20:47, on Zulip):

after Rust Latam I'd be a new person :smiley:

As I usually say to my friends when they are organising things (or otherwise seem overburdened); don't die! (I know it's a very low bar)

Albin Stjerna (Mar 21 2019 at 10:44, on Zulip):

Update: I think the benchmarks are running now. I get no output whatsoever, but the fans on my computer sound like it is about to take off from my desk, so something is happening anyway

Santiago Pastorino (Mar 21 2019 at 14:08, on Zulip):

after Rust Latam I'd be a new person :smiley:

As I usually say to my friends when they are organising things (or otherwise seem overburdened); don't die! (I know it's a very low bar)

hehehe, it's a very time consuming thing to do :)

Albin Stjerna (Mar 25 2019 at 21:43, on Zulip):

Ok, I'm not sure if it was a good idea, but I have now tweaked rustc-perf to also run a Polonius pass, similar to the NLL pass, and it seems to be working: cargo-check wall clock time, cargo-check max-rss

Albin Stjerna (Mar 25 2019 at 21:44, on Zulip):

Tomorrow I'll try running it for all of them and not just cargo and see how they behave. It would be interesting to do some clustering to figure out if the overhead for using Polonius over NLL is constant or if it only happens in some packages; hopefully I can use that information to extract a few types and construct some benchmarks

Albin Stjerna (Mar 26 2019 at 16:03, on Zulip):
Albin Stjerna (Mar 26 2019 at 16:04, on Zulip):

I guess a few of those in the middle would make good benchmarking targets?

Albin Stjerna (Mar 26 2019 at 16:38, on Zulip):

Just for kicks, I did some k-means clustering on the same data (that is, relative performance for RSS and wall-clock time), gave it 8 classes (a number I arbitrarily made up), and it came up with one each for html5ever, tuple-stress, wg-grammar, ucd, encoding, inflate, and clap-rs, and everything else it binned in the same category. This seems fairly reasonable to me, as those are outliers and the others have roughly similar performance.
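
The clustering described above can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the function names, the seeding strategy (farthest-point, chosen here so the result is deterministic) and the sample numbers are all assumptions. Each benchmark is represented as a pair (relative wall-clock overhead, relative max-rss overhead), both in percent.

```python
def relative_overhead(baseline, candidate):
    """Percent change of candidate relative to baseline (positive = worse)."""
    return (candidate - baseline) / baseline * 100.0


def dist2(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means; returns one cluster label per input point.

    Seeding is farthest-point (an assumption made for determinism):
    start from the first point, then repeatedly add the point that is
    farthest from every centroid chosen so far.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        )
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if a cluster ended up empty.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return [min(range(k), key=lambda i: dist2(p, centroids[i])) for p in points]


# Hypothetical (time %, rss %) overheads, invented purely for illustration.
sample = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (50.0, 1.2), (1.0, 30.0)]
labels = kmeans(sample, k=3)
```

With well-separated outliers like the invented ones here, each outlier ends up in its own cluster and the remaining points share one, which matches the shape of the result reported above.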

Albin Stjerna (Mar 26 2019 at 17:05, on Zulip):

It's also interesting to note that inflate, ucd, and wg-grammar all have worse relative performance in terms of memory than in terms of time, for whatever that might be worth

Albin Stjerna (Mar 26 2019 at 17:05, on Zulip):

I guess that has to have something to do with the number of tuples shoveled between Polonius and rustc?

Albin Stjerna (Mar 27 2019 at 09:39, on Zulip):

Oh, and this is how the relative difference (in percent) looks plotted

Albin Stjerna (Apr 09 2019 at 19:35, on Zulip):

Ok, so for everyone who wasn't at the meeting but who might be interested (that is, everyone except @nikomatsakis); I hacked together a set of scripts to a) scrape GitHub and crates.io for crates, in order of popularity (stars/recent downloads) and b) time cargo check under both NLL and Polonius on them.

I ran each cargo check three times, and recorded the smallest runtime (as measured in wall-clock time). I also used the three records to calculate p-values for the null hypothesis (that is, no difference between NLL and Polonius).

Here is a CSV with about 900 of the results
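
As a rough sketch of the measurement described above: take the smallest of the repeated wall-clock times, and compute a p-value for the null hypothesis of no difference. The thread does not say which statistical test was used, so the exact permutation test below is a swapped-in assumption, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def best_of(runs):
    """Smallest wall-clock time among repeated runs of the same build."""
    return min(runs)


def permutation_p_value(a, b):
    """Two-sided exact permutation test on the difference of means.

    Enumerates every way of splitting the pooled timings into groups of
    the original sizes and counts how often the absolute difference of
    means is at least as large as the observed one.
    """
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance so float rounding does not drop exact ties.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total
```

One consequence of this choice: with only three runs per configuration there are 20 possible splits, so the smallest p-value this test can ever report is 2/20 = 0.1. More repetitions would be needed to detect small differences with it.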

Albin Stjerna (Apr 09 2019 at 19:38, on Zulip):

I also have a list of the crates I discarded because they didn't compile (or, in some cases, because they had wonky pull URLs)

Albin Stjerna (Apr 15 2019 at 09:15, on Zulip):

I tried using Criterion for benchmarking the facts currently in the repo, under both the hybrid method and the optimised Datafrog algorithm, and the results seem to show a minor improvement for some cases with a minor overhead in others.

Albin Stjerna (Apr 15 2019 at 09:16, on Zulip):

(The clap-rs facts take too long to be viable; Criterion is only good for micro-benchmarks)

Albin Stjerna (Apr 15 2019 at 09:16, on Zulip):

Unless you want to wait literal hours for results

Albin Stjerna (Apr 15 2019 at 09:29, on Zulip):

Code is in the polonius-µbench branch of my fork and is executed by just running cargo bench (works on non-nightly as well).

Albin Stjerna (Apr 19 2019 at 07:53, on Zulip):

OK, because I didn't sleep well and had to do something more brainless, I kept running the trial compilation and now I can report from building a little over 3000 crates that there seems to be very little difference between Polonius and NLL:

Albin Stjerna (Apr 19 2019 at 07:53, on Zulip):

Actually, this is suspiciously good, I'll have to investigate

Albin Stjerna (Apr 19 2019 at 07:55, on Zulip):

The most likely explanation is that the entire build process is timed and that dependencies are not built with Polonius enabled; the build process is a bit opaque to me
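
One way to check that hypothesis would be to make sure the flag reaches every rustc invocation rather than only the top-level crate. The sketch below assumes the flag is spelled `-Zpolonius` on a nightly compiler (the thread never names it) and that it is passed through the `RUSTFLAGS` environment variable, which cargo forwards to the rustc invocations it makes, dependencies included. The helper names are hypothetical and this is not the script that produced the numbers above.

```python
import os
import subprocess
import time


def with_rustflags(extra, base_env=None):
    """Return a copy of the environment with `extra` appended to RUSTFLAGS."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("RUSTFLAGS", "")
    env["RUSTFLAGS"] = (existing + " " + extra).strip()
    return env


def time_command(argv, env, cwd=None, repeats=3):
    """Run `argv` `repeats` times and return the wall-clock time of each run.

    Note: cargo caches build artifacts, so a real harness would need to
    clear them between runs; otherwise later runs measure almost nothing.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            argv,
            env=env,
            cwd=cwd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


# Assumed flag name; requires a nightly toolchain.
polonius_env = with_rustflags("-Zpolonius")
# Example (not executed here):
#   time_command(["cargo", "check"], polonius_env, cwd="path/to/crate")
```

If the timings with and without the flag still match once it is applied this way, the explanation would have to lie elsewhere.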

nikomatsakis (Apr 19 2019 at 19:44, on Zulip):

Actually, this is suspiciously good, I'll have to investigate

Heh, indeed

Last update: Nov 15 2019 at 20:05UTC