Stream: t-compiler/wg-nll

Topic: perf.rlo benchmarks


simulacrum (Jun 29 2018 at 15:37, on Zulip):

@nikomatsakis Do you think it's time to enable more of the crates currently disabled for NLL on perf? We've lowered the max to ~300% now, and I think you have further fixes in the works. I'm uncertain whether it'd be helpful to enable a few more -- what do you think?

nikomatsakis (Jun 29 2018 at 17:04, on Zulip):

can't hurt, I suppose =) I'm still only profiling a subset of what we have, but having a better "overall picture" would be good

nikomatsakis (Jun 29 2018 at 17:15, on Zulip):

btw, I guess some of those perf runs may have completed by now? I guess I should just test, eh?

nikomatsakis (Jun 29 2018 at 17:15, on Zulip):

it'd be so nice if the queue were visible :)

simulacrum (Jun 29 2018 at 17:16, on Zulip):

I don't think they have

nikomatsakis (Jun 29 2018 at 17:17, on Zulip):

looks like no

simulacrum (Jun 29 2018 at 17:19, on Zulip):

They should start in around an hour and then it'll be ~6 hours before they all finish

nikomatsakis (Jun 29 2018 at 17:19, on Zulip):

heh, ok

simulacrum (Jul 01 2018 at 16:38, on Zulip):

@nikomatsakis New project for you: tuple-stress 17,715,830,784.00 2,917,195,841,536.00 16,466.60%

simulacrum (Jul 01 2018 at 16:38, on Zulip):

cc @lqd

lqd (Jul 01 2018 at 16:39, on Zulip):

lol

simulacrum (Jul 01 2018 at 16:39, on Zulip):

80x worse time-wise
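
As a quick check on the tuple-stress figures quoted a few messages up, the percentage appears to be the second number expressed as a percentage of the first. A minimal sketch, assuming the two figures are the clean and NLL measurements in that order (the function name is made up for illustration):

```rust
/// Express `nll` as a percentage of `clean` (100.0 would mean "same cost").
fn percent_of_clean(clean: f64, nll: f64) -> f64 {
    nll / clean * 100.0
}

fn main() {
    // The two figures quoted for tuple-stress. That they are the clean and
    // NLL measurements, in that order, is an assumption.
    let clean = 17_715_830_784.0_f64;
    let nll = 2_917_195_841_536.0_f64;
    println!("{:.2}%", percent_of_clean(clean, nll));
}
```

Read this way the measured quantity grew by a factor of about 165, which is a separate figure from the 80x wall-time slowdown mentioned here.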

lqd (Jul 01 2018 at 16:40, on Zulip):

IIRC it's a very synthetic test, not extracted from real code ?

simulacrum (Jul 01 2018 at 16:40, on Zulip):

Yes, but I've seen cases of similar code in std library and elsewhere

simulacrum (Jul 01 2018 at 16:40, on Zulip):

(less extreme -- but still similar)

lqd (Jul 01 2018 at 16:41, on Zulip):

oh interesting

simulacrum (Jul 01 2018 at 16:42, on Zulip):

it's a program with ~65k lines of tuples; the type of the static is [(i32, (f64, f64, f64)); 0xffff]

simulacrum (Jul 01 2018 at 16:43, on Zulip):

But technically I don't see why NLL would need to spend so much time on verifying this since there's no particular behavior that looks relevant to NLL (i.e., it's all constant data)
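
For readers who have not seen the benchmark, a scaled-down stand-in for the kind of program described here might look like the following. Only the element type comes from the discussion; the array length `N = 4`, the name `DATA`, and the values are made up for illustration.

```rust
// Scaled-down stand-in for the tuple-stress input: one big static array of
// tuples. The real static is described as having 0xffff elements; N = 4 and
// the values here are invented.
const N: usize = 4;

static DATA: [(i32, (f64, f64, f64)); N] = [
    (0, (0.0, 0.5, 1.0)),
    (1, (1.0, 1.5, 2.0)),
    (2, (2.0, 2.5, 3.0)),
    (3, (3.0, 3.5, 4.0)),
];

fn main() {
    // No references or lifetimes appear anywhere in the element type.
    let sum: f64 = DATA.iter().map(|&(_, (a, b, c))| a + b + c).sum();
    println!("{} elements, sum = {}", DATA.len(), sum);
}
```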

lqd (Jul 01 2018 at 16:43, on Zulip):

at least this one doesn't OOM :) I'll try and see what callgrind says — that is to say the valgrind run will take at least 800x longer

simulacrum (Jul 01 2018 at 16:44, on Zulip):

it's probably doing the same thing over and over again

lqd (Jul 01 2018 at 16:45, on Zulip):

yeah, and not caching it

lqd (Jul 01 2018 at 16:45, on Zulip):

IIRC it was the case for the previous outlier (coercions?) which was at 1000% until niko's caching PR landed

simulacrum (Jul 01 2018 at 16:48, on Zulip):

The good news is that the other benchmark I enabled (crates.io) is only 123% of clean so much better

lqd (Jul 01 2018 at 16:49, on Zulip):

were they disabled because of performance problems at the time ?

lqd (Jul 01 2018 at 16:51, on Zulip):

good news indeed for crates.io, that's not so bad :)

simulacrum (Jul 01 2018 at 16:51, on Zulip):

Somewhat, I think so

simulacrum (Jul 01 2018 at 16:52, on Zulip):

Before we enable NLL by default we'd want to make sure it's performant enough across the suite of benchmarks we do have though I think

lqd (Jul 01 2018 at 16:53, on Zulip):

yeah, and thanks for looking at those on perf.rlo

lqd (Jul 02 2018 at 14:38, on Zulip):

for those following at home, tuple-stress took around 3 hours in callgrind :) and it seems 97% is in liveness (of which the work seems to be split in between a couple different rustc::ty::fold::TypeFoldable visit fns, and a closure inside MIR borrowck's add_liveness_constraints)

nikomatsakis (Jul 02 2018 at 15:37, on Zulip):

@lqd huh. which SHA are you benchmarking?

lqd (Jul 02 2018 at 15:38, on Zulip):

this morning's 45935640f058405c95c96308f3acfd5ac1535698

nikomatsakis (Jul 02 2018 at 15:41, on Zulip):

ok, I wonder if https://github.com/rust-lang/rust/pull/51896 will help

nikomatsakis (Jul 02 2018 at 15:41, on Zulip):

I'm curious though why it's spending so much time in folding

nikomatsakis (Jul 02 2018 at 15:42, on Zulip):

perhaps you mean a different liveness code than I meant :)

nikomatsakis (Jul 02 2018 at 15:42, on Zulip):

ah, I guess you mean the "liveness constraint" code

nikomatsakis (Jul 02 2018 at 15:42, on Zulip):

I should do a quick perf profile

lqd (Jul 02 2018 at 15:42, on Zulip):

yeah somewhere in liveness indeed ;)

nikomatsakis (Jul 02 2018 at 15:43, on Zulip):

I rewrote region inference over the weekend ;) hoping to open a PR soon, I've just gotten it compiling (haven't even run the tests yet)... should be faster though, I would think...

nikomatsakis (Jul 02 2018 at 15:43, on Zulip):

now computing the SCCs instead of just iterating to fixed point
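
To illustrate the general idea of solving per strongly connected component rather than iterating to a fixed point, here is a toy sketch. It is not the code from the PR: the choice of Tarjan's algorithm, the edge direction (an edge `v -> w` means `v` must contain everything in `w`), and the `u64` bitset values are all assumptions made for the example.

```rust
/// State for Tarjan's SCC algorithm over adjacency lists.
struct Tarjan<'g> {
    graph: &'g [Vec<usize>],
    index: Vec<Option<usize>>, // discovery order, None if unvisited
    lowlink: Vec<usize>,
    on_stack: Vec<bool>,
    stack: Vec<usize>,
    next_index: usize,
    scc_of: Vec<usize>, // SCC id assigned to each node
    scc_count: usize,
}

impl<'g> Tarjan<'g> {
    fn visit(&mut self, v: usize) {
        self.index[v] = Some(self.next_index);
        self.lowlink[v] = self.next_index;
        self.next_index += 1;
        self.stack.push(v);
        self.on_stack[v] = true;

        let graph = self.graph; // copy the shared reference out of `self`
        for &w in &graph[v] {
            match self.index[w] {
                None => {
                    self.visit(w);
                    self.lowlink[v] = self.lowlink[v].min(self.lowlink[w]);
                }
                Some(w_index) if self.on_stack[w] => {
                    self.lowlink[v] = self.lowlink[v].min(w_index);
                }
                Some(_) => {} // already assigned to a finished SCC
            }
        }

        if self.index[v] == Some(self.lowlink[v]) {
            // `v` is the root of an SCC: pop its members off the stack.
            loop {
                let w = self.stack.pop().expect("root is still on the stack");
                self.on_stack[w] = false;
                self.scc_of[w] = self.scc_count;
                if w == v {
                    break;
                }
            }
            self.scc_count += 1;
        }
    }
}

/// Least solution where each node holds its base bits plus the bits of
/// every node it has an edge to.
fn solve(graph: &[Vec<usize>], base: &[u64]) -> Vec<u64> {
    let n = graph.len();
    let mut t = Tarjan {
        graph,
        index: vec![None; n],
        lowlink: vec![0; n],
        on_stack: vec![false; n],
        stack: Vec::new(),
        next_index: 0,
        scc_of: vec![0; n],
        scc_count: 0,
    };
    for v in 0..n {
        if t.index[v].is_none() {
            t.visit(v);
        }
    }

    // Group nodes by SCC and seed each SCC with its members' base bits.
    let mut members = vec![Vec::new(); t.scc_count];
    let mut scc_val = vec![0u64; t.scc_count];
    for v in 0..n {
        members[t.scc_of[v]].push(v);
        scc_val[t.scc_of[v]] |= base[v];
    }
    // Tarjan finishes successor SCCs first, so one pass in id order suffices.
    for s in 0..t.scc_count {
        for &v in &members[s] {
            for &w in &graph[v] {
                let succ = t.scc_of[w];
                if succ != s {
                    scc_val[s] |= scc_val[succ];
                }
            }
        }
    }
    (0..n).map(|v| scc_val[t.scc_of[v]]).collect()
}

fn main() {
    // Nodes 0 and 1 form a cycle; node 1 also points at node 2.
    let graph = vec![vec![1], vec![0, 2], vec![]];
    let base: [u64; 3] = [0b001, 0b010, 0b100];
    println!("{:?}", solve(&graph, &base));
}
```

Every node in a cycle ends up with the same value, so the value is computed once per component and each edge is looked at once, instead of re-propagating around cycles until nothing changes. The recursive `visit` is fine for a toy but would need an explicit stack on very deep graphs.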

lqd (Jul 02 2018 at 15:44, on Zulip):

sounds fast just talking about it :)

lqd (Jul 02 2018 at 15:44, on Zulip):

do you want me to check with the dirty list PR ? (or do you already have a build with it?)

lqd (Jul 02 2018 at 15:45, on Zulip):

(I can look at the profile again to be more precise about the bits of liveness I was casually mentioning)

nikomatsakis (Jul 02 2018 at 15:46, on Zulip):

I don't think the PR will help

nikomatsakis (Jul 02 2018 at 15:46, on Zulip):

if you have the profile around though

nikomatsakis (Jul 02 2018 at 15:46, on Zulip):

and can cite the actual fn names

nikomatsakis (Jul 02 2018 at 15:46, on Zulip):

that might be helpful

lqd (Jul 02 2018 at 15:48, on Zulip):

it's a bit mangled + it's a closure but this is the 12% of time spent in liveness (pasted image)

nikomatsakis (Jul 02 2018 at 15:48, on Zulip):

yep, ok. Curious.

lqd (Jul 02 2018 at 15:48, on Zulip):

the rest are all in TypeFoldable

lqd (Jul 02 2018 at 15:49, on Zulip):

(so I'm guessing visits of some kind, visit_with and super_visit_with)

nikomatsakis (Jul 02 2018 at 15:50, on Zulip):

could be the canonicalization logic

nikomatsakis (Jul 02 2018 at 15:50, on Zulip):

lots of stuff builds on type-foldable really

nikomatsakis (Jul 02 2018 at 15:50, on Zulip):

I guess I can do a profile of my own readily enough

lqd (Jul 02 2018 at 15:51, on Zulip):

btw did you see this weekend's other "interesting discovery" of the memory consumption in the html5ever benchmark ?

nikomatsakis (Jul 02 2018 at 15:52, on Zulip):

I saw that there was a lot of memory consumption

nikomatsakis (Jul 02 2018 at 15:52, on Zulip):

did we track down cause at all?

lqd (Jul 02 2018 at 15:52, on Zulip):

I don't think so no

nikomatsakis (Jul 02 2018 at 15:53, on Zulip):

ok

nikomatsakis (Jul 02 2018 at 16:35, on Zulip):

@lqd in case you are curious, this is the region inference work I was talking about https://github.com/rust-lang/rust/pull/51987

lqd (Jul 02 2018 at 16:52, on Zulip):

narrator "lqd was, in fact, curious"

lqd (Jul 02 2018 at 20:39, on Zulip):

@simulacrum @nikomatsakis do we want/need updated benchmark versions for 1) clap-rs ? (cargo check NLL on clap master almost is 3x faster than the perf.rlo version), 2) inflate ? (cargo check NLL on inflate master is almost 25x faster than the perf.rlo version) — and if so, what to call them since perf.rlo has switched from having the crate's version in the benchmark name ?

simulacrum (Jul 02 2018 at 20:41, on Zulip):

Generally this is an unanswered question -- we might want to add them, but our capacity for adding new benchmarks to perf.rlo is somewhat limited

simulacrum (Jul 02 2018 at 20:41, on Zulip):

We're already regularly behind 3-5 commits and probably have many holes in history because of how slow the server is

lqd (Jul 02 2018 at 20:42, on Zulip):

unfortunate

simulacrum (Jul 02 2018 at 20:44, on Zulip):

I think the long-term future here is shifting to some sort of architecture where we push this work off to multiple machines (but keep a single benchmark on a single machine)

lqd (Jul 02 2018 at 20:44, on Zulip):

in any case, if/when we decide we need and can afford them, I can take care of it so you don't have to do it all ;)

simulacrum (Jul 02 2018 at 20:44, on Zulip):

Sure, yeah -- clap and inflate should go fairly quickly if we want to add them I imagine

lqd (Jul 02 2018 at 20:44, on Zulip):

yeah IIRC adam (anp) wanted/needed a similar setup for lolbench

simulacrum (Jul 02 2018 at 20:44, on Zulip):

I'm not sure if it's worth the benefits, though, nor am I sure what the benefits are

lqd (Jul 02 2018 at 20:48, on Zulip):

I assume it was to track a more modern version so that it would be more up-to-date with what people would experience compiling them, rather than versions exercising particular slow paths in the compiler at the time we added the benchmarks

lqd (Jul 02 2018 at 20:48, on Zulip):

but yeah indeed, the cost / reward seems high for these cases and maybe in general — in any case, I have PRs ready if/when we want those 2, just let me know

simulacrum (Jul 02 2018 at 20:49, on Zulip):

sure, yeah -- it's a question though of how often we should then "update", sort of, I think

lqd (Jul 02 2018 at 20:49, on Zulip):

I wonder if "transient" benchmarks would do the job

lqd (Jul 02 2018 at 20:49, on Zulip):

not as data points to track how we improve on a specific rev

lqd (Jul 02 2018 at 20:50, on Zulip):

but just to say "the current version of clap, which changes all the time, how often TBD, now compiles this fast"

lqd (Jul 02 2018 at 20:51, on Zulip):

I guess it's a bit more telemetry-like than perf.rlo like

lqd (Jul 02 2018 at 20:53, on Zulip):

(also I forgot mw's specific needs/issues so this might be stupid :)

simulacrum (Jul 02 2018 at 21:02, on Zulip):

Hm, that might be an idea -- instead of benchmarking everything every commit have a "core set" and then do the others for every, say, 20th commit or something
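
The "core set every commit, the rest every Nth commit" idea could be sketched roughly as follows. This is only an illustration of the proposal, not how perf.rlo is implemented; the function name, the use of a commit counter, and the split of benchmarks into core and extended sets are all hypothetical.

```rust
/// Pick the benchmarks to run for a commit: always the core set, plus the
/// extended set on every `every`-th commit (hypothetical policy).
fn benchmarks_for<'a>(
    commit_number: u64,
    core: &[&'a str],
    extended: &[&'a str],
    every: u64,
) -> Vec<&'a str> {
    let mut selected = core.to_vec();
    if every > 0 && commit_number % every == 0 {
        selected.extend_from_slice(extended);
    }
    selected
}

fn main() {
    // Which benchmarks belong in which set is invented for the example.
    let core = ["clap-rs", "inflate"];
    let extended = ["tuple-stress", "html5ever"];
    for commit in 39..=41 {
        println!("commit {}: {:?}", commit, benchmarks_for(commit, &core, &extended, 20));
    }
}
```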

lqd (Jul 02 2018 at 21:04, on Zulip):

and for "tracking" how these crate users experience compiling, we could just remember some points of interest, eg the data at Rust releases: rust version + benchmark crate version

nikomatsakis (Jul 02 2018 at 21:06, on Zulip):

I think that for the purpose of judging whether NLL is "fast enough to ship", it would be very useful to have numbers for modern crates

nikomatsakis (Jul 02 2018 at 21:06, on Zulip):

i.e., things representative of users' "soon to be" lived experiences

simulacrum (Jul 02 2018 at 21:06, on Zulip):

sure -- we already have https://perf.rust-lang.org/dashboard.html which is sort of like that

simulacrum (Jul 02 2018 at 21:06, on Zulip):

Anyway, @lqd -- I'd feel okay updating clap/inflate either in-place (with a rename) or by adding a new clap-sdfsdf where that would be the commit sha prefix

lqd (Jul 02 2018 at 21:07, on Zulip):

(yeah, I'm guessing the dashboard was an answer to me ;)

simulacrum (Jul 02 2018 at 21:07, on Zulip):

yes

lqd (Jul 02 2018 at 21:09, on Zulip):

I'll let you and niko decide whichever way you feel is best, for whichever of the crates we track, and I will do it (as you both know what we need and the tradeoffs involved)

lqd (Jul 02 2018 at 21:21, on Zulip):

(btw I think I showed this mostly to mw and haven't had the chance to continue since: an old-ish slow bad mockup I was working on to try and make the perf result "summarized" https://lqd.github.io/perf/report.html — it doesn't yet achieve this goal as I'd need to generate more of a useful summary so that you wouldn't need to glance at the results, but it's a starting point at least)

simulacrum (Jul 02 2018 at 21:26, on Zulip):

Looks quite interesting!

lqd (Jul 03 2018 at 06:33, on Zulip):

oh the SCC PR results are not available yet (https://github.com/rust-lang/rust/pull/51987#issuecomment-401876530) did it fail maybe ? or just taking more time than expected probably

nikomatsakis (Jul 03 2018 at 09:19, on Zulip):

I think something's messed up, yeah.

lqd (Jul 03 2018 at 10:04, on Zulip):

@nikomatsakis would you rather we stop tracking the old versions of clap/inflate by updating the benchmark crates to a new version, or have the modern versions as additional benchmarks ? (clap is not _that old_ so the added cost of tracking each commit might be prohibitive vs us regularly checking a set of crates ? that is, until we have lighterweight tracking like we talked about yesterday)

nikomatsakis (Jul 03 2018 at 15:30, on Zulip):

hmm @lqd I think that on perf I would ideally want to track both

nikomatsakis (Jul 03 2018 at 15:30, on Zulip):

it seems good to have stress tests

lqd (Jul 03 2018 at 15:35, on Zulip):

ok then I'll prepare a PR adding those 2 -- but just out of curiosity, would having a set of benchmarks we track "casually" (and not commit by commit) be interesting to you ? (to have both precise tracking on a _bespoke set_ :simple_smile: of benchmarks for every commit + try builds, and a lighter weight different set, which could be to have an idea of users' "soon to be lived" experience)

nikomatsakis (Jul 03 2018 at 16:56, on Zulip):

@lqd ok looking at tuple-stress it seems like ~50% of the time is spent looking at the types of live variables; an additional 20% is spent iterating and doing other stuff around liveness.

nikomatsakis (Jul 03 2018 at 16:56, on Zulip):

so it may be that there are a few related sources to improve

lqd (Jul 03 2018 at 16:56, on Zulip):

(the silver lining is I think polonius handles itself shakespearly with tuple-stress)

nikomatsakis (Jul 03 2018 at 16:57, on Zulip):

well, all of this liveness code still executes even with polonius

lqd (Jul 03 2018 at 16:57, on Zulip):

lol

nikomatsakis (Jul 03 2018 at 16:57, on Zulip):

though maybe someday we'll rewrite it all to be datafrog based

lqd (Jul 03 2018 at 16:57, on Zulip):

I retract my previous statement!

lqd (Jul 03 2018 at 16:58, on Zulip):

then let's improve it until/if we do switch to :frog: :)

nikomatsakis (Jul 03 2018 at 17:00, on Zulip):

actually almost all of the optimization we've been doing would still benefit -Zpolonius

nikomatsakis (Jul 03 2018 at 17:00, on Zulip):

the SCC stuff is the one exception I think

nikomatsakis (Jul 03 2018 at 17:00, on Zulip):

at least as currently architected

lqd (Jul 03 2018 at 17:01, on Zulip):

since we're in a constant array, do those live variables have mostly the same type ?

lqd (Jul 03 2018 at 17:06, on Zulip):

the visit_ty should be mostly the same from point to point (but can't tell if caching here would be interesting)

nikomatsakis (Jul 03 2018 at 17:10, on Zulip):

I suspect that none of the types involved even have regions

nikomatsakis (Jul 03 2018 at 17:10, on Zulip):

I was going to open an issue to experiment with a small tweak that checks the type flags and skips the for_each_free_region call entirely in that case
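
The shape of that tweak, checking a cached flag before walking the type at all, can be shown on a toy type representation. Everything here (`Kind`, `Ty`, `has_regions`, `walk_regions`, `walk_regions_if_any`) is a hypothetical stand-in and not rustc's actual types or API; the real change would consult the compiler's own type flags before calling for_each_free_region.

```rust
/// Toy type representation; `Ref` carries a region id and a pointee type.
enum Kind {
    Int,
    Float,
    Ref(u32, Box<Ty>),
    Tuple(Vec<Ty>),
}

struct Ty {
    kind: Kind,
    /// Cached at construction, in the spirit of a "type flags" bit.
    has_regions: bool,
}

impl Ty {
    fn new(kind: Kind) -> Ty {
        let has_regions = match &kind {
            Kind::Int | Kind::Float => false,
            Kind::Ref(..) => true,
            Kind::Tuple(elems) => elems.iter().any(|t| t.has_regions),
        };
        Ty { kind, has_regions }
    }
}

/// Full walk: calls `f` on every region, returns the number of nodes visited.
fn walk_regions<F: FnMut(u32)>(ty: &Ty, f: &mut F) -> usize {
    match &ty.kind {
        Kind::Int | Kind::Float => 1,
        Kind::Ref(region, pointee) => {
            (*f)(*region);
            1 + walk_regions(pointee, &mut *f)
        }
        Kind::Tuple(elems) => {
            let mut visited = 1;
            for e in elems {
                visited += walk_regions(e, &mut *f);
            }
            visited
        }
    }
}

/// Fast path: skip the walk entirely when the cached flag says "no regions".
fn walk_regions_if_any<F: FnMut(u32)>(ty: &Ty, f: &mut F) -> usize {
    if !ty.has_regions {
        return 0;
    }
    walk_regions(ty, f)
}

fn main() {
    // The tuple-stress element type, (i32, (f64, f64, f64)), has no regions.
    let triple = Ty::new(Kind::Tuple(vec![
        Ty::new(Kind::Float),
        Ty::new(Kind::Float),
        Ty::new(Kind::Float),
    ]));
    let elem = Ty::new(Kind::Tuple(vec![Ty::new(Kind::Int), triple]));
    let mut regions = Vec::new();
    let visited = walk_regions_if_any(&elem, &mut |r| regions.push(r));
    println!("visited {} nodes, found {} regions", visited, regions.len());
}
```

The flag is computed once when the type is built, so the check is a single field read per live variable rather than a traversal of the whole type.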

lqd (Jul 03 2018 at 17:11, on Zulip):

is there an easy way to check if the types have regions in the mir output ? I do have part of it handy

nikomatsakis (Jul 03 2018 at 17:13, on Zulip):

not sure but I am doing a run now that should dump it w/ a debug!

lqd (Jul 03 2018 at 17:13, on Zulip):

(maybe the regions are dumped in the comments I mean ;)

lqd (Jul 03 2018 at 17:13, on Zulip):

oh good

nikomatsakis (Jul 03 2018 at 18:30, on Zulip):

filed https://github.com/rust-lang/rust/issues/52027

lqd (Jul 03 2018 at 18:32, on Zulip):

yay :) we can talk about it at today's meeting but I'll gladly do it

nikomatsakis (Jul 03 2018 at 18:34, on Zulip):

probably a good idea to get it done. It's a simple change. I suspect we'll want to do more afterwards, though.

nikomatsakis (Jul 03 2018 at 18:34, on Zulip):

this is about 50% but we need something like 90% :)

lqd (Jul 03 2018 at 18:42, on Zulip):

sure :)

nikomatsakis (Jul 03 2018 at 22:05, on Zulip):

@lqd this is a good follow-up re: tuple-stress =) https://github.com/rust-lang/rust/issues/52034

lqd (Jul 03 2018 at 22:19, on Zulip):

nice :D

lqd (Jul 03 2018 at 22:33, on Zulip):

I opened the PR for the previous tuple-stress issue, will r? ping when it passes travis if I'm still awake

lqd (Jul 04 2018 at 00:04, on Zulip):

that is, https://github.com/rust-lang/rust/pull/52037

lqd (Jul 04 2018 at 00:21, on Zulip):

and matthew has found why webrender and piston-image didn't compile https://github.com/rust-lang/rust/issues/51372#issuecomment-402306996

lqd (Jul 04 2018 at 00:28, on Zulip):

hmm, didn't futures have issues as well ? answer: doesn't seem like it no — so, good news again :)

lqd (Jul 04 2018 at 00:33, on Zulip):

(apart from the CPU cost of tuple-stress and the memory cost of html5ever)

simulacrum (Jul 04 2018 at 01:06, on Zulip):

nll benchmarks for servo crates are in: 130/120% worse

lqd (Jul 04 2018 at 06:52, on Zulip):

@nikomatsakis I updated 52037 to remove super_ty

nikomatsakis (Jul 04 2018 at 09:10, on Zulip):

great! r+

lqd (Jul 04 2018 at 10:54, on Zulip):

oh but this new #52034 issue has already been requested by another contributor, too bad but I guess I'll focus on triaging the crater results when they are available

nikomatsakis (Jul 04 2018 at 13:01, on Zulip):

@simulacrum btw I'm going to want a perf run for https://github.com/rust-lang/rust/pull/51987 — I just started the bors try though

nikomatsakis (Jul 04 2018 at 13:16, on Zulip):

ok I got a new profile of clap-rs with the SCC stuff in place

nikomatsakis (Jul 04 2018 at 13:17, on Zulip):

solving regions is now 3% of MIR borrowck (and hence a very small percentage of total time)

nikomatsakis (Jul 04 2018 at 13:26, on Zulip):

type-checking remains 25% of MIR borrowck, split amongst:

nikomatsakis (Jul 04 2018 at 13:54, on Zulip):

(regarding dataflow, this does include @Pramod Bisht's improvements, though iirc they were not especially beneficial to clap-rs)

nikomatsakis (Jul 04 2018 at 13:54, on Zulip):

I should profile another case or two

Jake Goulding (Jul 04 2018 at 13:57, on Zulip):

I need a "go faster" emoji response

nikomatsakis (Jul 04 2018 at 13:59, on Zulip):

interesting; cargo looks similar-ish but more time in liveness

nikomatsakis (Jul 04 2018 at 13:59, on Zulip):

#52034 seems more imp't than ever

nikomatsakis (Jul 04 2018 at 14:00, on Zulip):

oh, hmm, interesting

nikomatsakis (Jul 04 2018 at 14:01, on Zulip):

looking more closely at Liveness::defs_uses, I see more room for improvement

nikomatsakis (Jul 04 2018 at 14:01, on Zulip):

although... huh. I thought that njn optimized this

nikomatsakis (Jul 04 2018 at 14:01, on Zulip):

ah, https://github.com/rust-lang/rust/pull/51870 didn't land yet

lqd (Jul 04 2018 at 14:09, on Zulip):

@Jake Goulding Gankro's version of Ferris, Sonic-like "gotta go fast" comes to mind :)

lqd (Jul 04 2018 at 14:26, on Zulip):

+ it looks like liveness is also linked to the html5ever OOM, from njn's comments on the issue

lqd (Jul 04 2018 at 16:47, on Zulip):

njn's 51870 has now landed :tada:

nikomatsakis (Jul 04 2018 at 17:03, on Zulip):

woot. Now if only the perf server would finish https://github.com/rust-lang/rust/pull/51987 :)

nikomatsakis (Jul 04 2018 at 17:03, on Zulip):

I doubt it will happen today though

simulacrum (Jul 04 2018 at 17:09, on Zulip):

@nikomatsakis Working on a queue now :)

nikomatsakis (Jul 04 2018 at 17:11, on Zulip):

what do you mean by a queue?

simulacrum (Jul 04 2018 at 17:12, on Zulip):

i.e. you can see what order, and probably estimate when, things will be benchmarked

nikomatsakis (Jul 04 2018 at 17:12, on Zulip):

nice :)

simulacrum (Jul 04 2018 at 17:12, on Zulip):

it will also mean soon-ish that people can request try builds for benchmarking, possibly with a priority

lqd (Jul 04 2018 at 17:15, on Zulip):

will these perf build requests possibly be using "craterbot" ?

simulacrum (Jul 04 2018 at 17:17, on Zulip):

could, but maybe just the same account -- it'll for now be a different service for sure

lqd (Jul 04 2018 at 18:42, on Zulip):

was it recently enabling some NLL benchmarks which caused problems for perf.rlo ?

nikomatsakis (Jul 04 2018 at 18:47, on Zulip):

it doesn't help, but perf's turnaround times have always been long-ish

nikomatsakis (Jul 04 2018 at 18:47, on Zulip):

I wonder if we could commission a second machine

nikomatsakis (Jul 04 2018 at 18:47, on Zulip):

(to be used for new benchmarks)

nikomatsakis (Jul 04 2018 at 18:48, on Zulip):

obviously for any one benchmark we probably prefer to keep a consistent piece of hardware

simulacrum (Jul 04 2018 at 19:05, on Zulip):

We can -- however, that's rather expensive on most cloud platforms because we need dedicated hardware

simulacrum (Jul 04 2018 at 19:05, on Zulip):

We're moving in the direction where that's easier on the code part, too

nikomatsakis (Jul 04 2018 at 19:09, on Zulip):

I thought perf stuff was running on an actual computer under somebody-or-other's desk?

simulacrum (Jul 04 2018 at 19:16, on Zulip):

It is

nikomatsakis (Jul 04 2018 at 19:17, on Zulip):

so presumably we could buy a second one

nikomatsakis (Jul 04 2018 at 19:17, on Zulip):

but anyway

nikomatsakis (Jul 04 2018 at 19:18, on Zulip):

or we can get NLL up to snuff so we don't have to run 2 rounds of profiles :)

simulacrum (Jul 04 2018 at 19:20, on Zulip):

Sure, though I don't know that NLL costs us all that much today. Now, eliminating Servo... that's an hour of savings.

simulacrum (Jul 04 2018 at 19:20, on Zulip):

(IIRC)

nikomatsakis (Jul 04 2018 at 19:21, on Zulip):

:)

simulacrum (Jul 04 2018 at 20:16, on Zulip):

@nikomatsakis Queue is live, https://perf.rust-lang.org/status.html (bottom of page)

simulacrum (Jul 04 2018 at 20:17, on Zulip):

Haven't worked on the on-demand adding of try commits but it's coming

lqd (Jul 04 2018 at 20:30, on Zulip):

nice :) (perf seems offline now but I'm sure it is nice)

simulacrum (Jul 04 2018 at 20:31, on Zulip):

@lqd DNS is changing over -- if you flush local cache that might be enough

nikomatsakis (Jul 04 2018 at 20:32, on Zulip):

@simulacrum that is very exciting!

nikomatsakis (Jul 04 2018 at 20:32, on Zulip):

I guess I have to figure out how to flush my DNS :)

simulacrum (Jul 04 2018 at 20:32, on Zulip):

sudo dscacheutil -flushcache;sudo killall -HUP mDNSResponder on macOS apparently

nikomatsakis (Jul 04 2018 at 20:33, on Zulip):

ok, that worked

nikomatsakis (Jul 04 2018 at 20:33, on Zulip):

I see something very ... text-y

nikomatsakis (Jul 04 2018 at 20:33, on Zulip):

but it does include a queue

nikomatsakis (Jul 04 2018 at 20:33, on Zulip):

ah, this is the "status" page, ok

lqd (Jul 04 2018 at 20:34, on Zulip):

what could be nice is if it also showed the future "perf results URL"

nikomatsakis (Jul 04 2018 at 20:34, on Zulip):

oh dear god yes please :)

nikomatsakis (Jul 04 2018 at 20:35, on Zulip):

I take it from this that try commit 8e6d8db2bf93a is next?

nikomatsakis (Jul 04 2018 at 20:35, on Zulip):

if so, I think that's my SCC PR

nikomatsakis (Jul 04 2018 at 20:35, on Zulip):

which is exciting to me :P

lqd (Jul 04 2018 at 20:35, on Zulip):

how timely

lqd (Jul 04 2018 at 20:36, on Zulip):

runs do take a couple hours right ?

simulacrum (Jul 04 2018 at 20:36, on Zulip):

Well, you're going to have to adjust your master URL or wait for like 12 more hours but yes

simulacrum (Jul 04 2018 at 20:36, on Zulip):

2.5 hours on average per commit, yes

nikomatsakis (Jul 04 2018 at 20:36, on Zulip):

"Well, you're going to have to adjust your master URL or wait for like 12 more hours but yes"

yeah, I know

nikomatsakis (Jul 04 2018 at 20:37, on Zulip):

but I think I can compare against more-or-less any recent commit and get useful-ish info

nikomatsakis (Jul 04 2018 at 20:37, on Zulip):

probably should not include njn's latest PR

nikomatsakis (Jul 04 2018 at 23:17, on Zulip):

ok, the updated results for the SCC branch are available

nikomatsakis (Jul 04 2018 at 23:18, on Zulip):

43% win for inflate :)

nikomatsakis (Jul 04 2018 at 23:18, on Zulip):

the rest kind of looks like noise to me

nikomatsakis (Jul 04 2018 at 23:19, on Zulip):

though I do see a fair amount of red

nikomatsakis (Jul 04 2018 at 23:19, on Zulip):

still seems probably worth it

nikomatsakis (Jul 04 2018 at 23:19, on Zulip):

(there are e.g. comparable swings in the NON-nll build times, which I cannot imagine being affected)

simulacrum (Jul 04 2018 at 23:21, on Zulip):

noise is small time-wise anyway

simulacrum (Jul 04 2018 at 23:21, on Zulip):

might be side-effects of me checking on it every couple minutes, honestly

nikomatsakis (Jul 04 2018 at 23:22, on Zulip):

heh :)

simulacrum (Jul 05 2018 at 03:36, on Zulip):

Fancy in-progress commit status display is in (but, who knows, could break): https://perf.rust-lang.org/status.html

simulacrum (Jul 05 2018 at 03:36, on Zulip):

I also need to test out the try commit on demand code -- so, basically, go run a try commit :)

simulacrum (Jul 05 2018 at 17:48, on Zulip):

@nikomatsakis Thought: Stop running NLL benchmarks on anything except check

simulacrum (Jul 05 2018 at 17:48, on Zulip):

I don't see how optimizations or debug can make any difference to NLL times/perf so that seems like a cheap win

nikomatsakis (Jul 05 2018 at 18:01, on Zulip):

@simulacrum :+1:

simulacrum (Jul 05 2018 at 18:17, on Zulip):

deployed

simulacrum (Jul 05 2018 at 21:06, on Zulip):

@nikomatsakis Try build requests are live: https://github.com/rust-lang/rust/pull/52083#issuecomment-402838238

simulacrum (Jul 05 2018 at 21:06, on Zulip):

(you should have permissions and I think that wall of comments resolved the quirks, mostly)

nikomatsakis (Jul 05 2018 at 21:08, on Zulip):

holy mackerel

nikomatsakis (Jul 05 2018 at 21:08, on Zulip):

how do I use it exactly?

nikomatsakis (Jul 05 2018 at 21:08, on Zulip):

do I still have to schedule a try build?

nikomatsakis (Jul 05 2018 at 21:08, on Zulip):

or does it handle all of that?

lqd (Jul 05 2018 at 21:21, on Zulip):

apparently you schedule a try build, and ask the bot to queue the merge commit that bors answered

simulacrum (Jul 05 2018 at 21:58, on Zulip):

@nikomatsakis @lqd was correct -- you need to wait for the try build to go through in Travis but then you can schedule it with this bot and it'll queue in perf (visible on the status page). Right now it won't let you know when the build is complete, but at least it's a start

simulacrum (Jul 05 2018 at 21:58, on Zulip):

@rust-timer build <full commit hash> is the invocation

nikomatsakis (Jul 05 2018 at 22:21, on Zulip):

very cool

nikomatsakis (Jul 05 2018 at 22:21, on Zulip):

rustc-guide please :)

nikomatsakis (Jul 05 2018 at 22:22, on Zulip):

I am hypocritical, I have at least 2 or 3 things I should open PRs for

nikomatsakis (Jul 05 2018 at 22:22, on Zulip):

/me goes to do that now maybe

simulacrum (Jul 05 2018 at 22:23, on Zulip):

/me mumbles about wanting to add more before doing that

simulacrum (Jul 05 2018 at 22:23, on Zulip):

but yes, I should document it somewhere

nikomatsakis (Jul 06 2018 at 14:56, on Zulip):

@simulacrum can we close #51372 ?

simulacrum (Jul 06 2018 at 16:01, on Zulip):

@nikomatsakis Let me enable the relevant benchmarks on perf.r-l.o and then we can make sure they compile, I'll close after that

simulacrum (Jul 06 2018 at 16:17, on Zulip):

Enabled, should appear in the next run I think

simulacrum (Jul 06 2018 at 21:02, on Zulip):

@nikomatsakis Both finished successfully so I closed the issue

simulacrum (Jul 13 2018 at 20:27, on Zulip):

@nikomatsakis FYI, all benchmarks on perf.rlo are now enabled

simulacrum (Jul 13 2018 at 20:28, on Zulip):

https://perf.rust-lang.org/nll-dashboard.html

Matthew Jasper (Jul 13 2018 at 20:30, on Zulip):

5,731.76% :dizzy_face:

nikomatsakis (Jul 14 2018 at 05:04, on Zulip):

seems like html5ever-opt got worse due to my PR (#51987) but then worse again due to some other seemingly unrelated change (#52266)

nikomatsakis (Jul 16 2018 at 18:16, on Zulip):

@simulacrum did I do something wrong here? I'm trying to enqueue that build

nikomatsakis (Jul 16 2018 at 18:16, on Zulip):

I don't see it in the status page though

simulacrum (Jul 16 2018 at 18:16, on Zulip):

hm, it doesn't look like it, let me look at logs

simulacrum (Jul 16 2018 at 18:21, on Zulip):

I'll take care of it once I figure out what the bug is

simulacrum (Jul 16 2018 at 18:21, on Zulip):

Seems like it's probably not you

simulacrum (Jul 16 2018 at 18:25, on Zulip):

hm, okay, no idea what was wrong

simulacrum (Jul 16 2018 at 18:26, on Zulip):

Seems to have resolved itself though so "great"

simulacrum (Jul 16 2018 at 18:27, on Zulip):

oh, might be a GH problem https://status.github.com/messages

lqd (Jul 16 2018 at 18:28, on Zulip):

while using the playground earlier, the github gists API was very slow as well

lqd (Jul 16 2018 at 18:30, on Zulip):

that's going to be not so great for our bot heavy infra :)

simulacrum (Jul 16 2018 at 18:31, on Zulip):

Most bots are resilient internally, just not when interacting with humans :/

simulacrum (Jul 16 2018 at 18:31, on Zulip):

i.e perf will mostly work fine without GH I think, it'll go through the existing queue just fine

Eh2406 (Jul 18 2018 at 16:06, on Zulip):

Are there issues that I could jump in on to help get the https://perf.rust-lang.org/nll-dashboard.html to look happier?

nikomatsakis (Jul 18 2018 at 16:12, on Zulip):

we have one bigger idea for exploring, but I had another thought recently I've been meaning to write-up that might be a bit more targeted...

Eh2406 (Jul 18 2018 at 17:47, on Zulip):

Well https://github.com/rust-lang/rust/pull/52250#issuecomment-406015802 looks promising!

nikomatsakis (Jul 18 2018 at 17:51, on Zulip):

Certainly...

nikomatsakis (Jul 18 2018 at 17:51, on Zulip):

I'm not clear on how fully rebased that is though

Eh2406 (Jul 18 2018 at 17:58, on Zulip):

The try build's parent is https://github.com/rust-lang/rust/commit/cd5f5a129fab998a1ee7c72204d093dc475981d1
and looking at the changes https://github.com/rust-lang/rust/pull/52364
might be a problem?

nikomatsakis (Jul 18 2018 at 18:04, on Zulip):

hmm, that PR might screw up my in-progress PR...

lqd (Jul 24 2018 at 14:25, on Zulip):

95% of the time in html5ever is because of this huge static

davidtwco (Jul 24 2018 at 14:26, on Zulip):

That's quite the static.

lqd (Jul 24 2018 at 14:27, on Zulip):

AFAICT it's a phf map of the tokenizer's named entities

nikomatsakis (Jul 24 2018 at 14:27, on Zulip):

that was similar to tuple-stress

lqd (Jul 24 2018 at 14:27, on Zulip):

very

nikomatsakis (Jul 24 2018 at 14:27, on Zulip):

except that there are lifetimes -- just 'static

nikomatsakis (Jul 24 2018 at 14:27, on Zulip):

we could plausibly special case it...

lqd (Jul 24 2018 at 14:27, on Zulip):

as we were thinking for tuple stress :)

lqd (Jul 24 2018 at 14:27, on Zulip):

this is the thing which was taking 11GB to compile as well, before David's PR

davidtwco (Jul 24 2018 at 14:29, on Zulip):

There's still too much memory being used even post that-PR though, right?

lqd (Jul 24 2018 at 14:29, on Zulip):

2GB IIRC

nikomatsakis (Jul 24 2018 at 14:29, on Zulip):

so the main thing is

nikomatsakis (Jul 24 2018 at 14:29, on Zulip):

in the case of html5ever

nikomatsakis (Jul 24 2018 at 14:29, on Zulip):

we.. maybe? don't really need to compute liveness.. all regions are 'static

nikomatsakis (Jul 24 2018 at 14:29, on Zulip):

or at least we might be able to readily detect that

nikomatsakis (Jul 24 2018 at 14:30, on Zulip):

that also explains why the bit sets were so densely occupied

lqd (Jul 24 2018 at 14:49, on Zulip):

I was trying to get some logs to see which functions were the most problematic, but probably the ones njn has mentioned before

lqd (Jul 24 2018 at 17:05, on Zulip):

said log gathering is still running, 52GB of liveness logs so far :)

lqd (Jul 24 2018 at 17:12, on Zulip):

niko and santiago have been working around this area before, like 3 weeks ago

lqd (Jul 24 2018 at 19:21, on Zulip):

@nikomatsakis could we / would it help if we only call push_type_live_constraint once per (live_local_ty x location), rather than per (live_local x location) here

nikomatsakis (Jul 24 2018 at 19:23, on Zulip):

plausibly, depends how many locals there are w/ duplicate types I suppose :)

lqd (Jul 24 2018 at 19:23, on Zulip):

all of them :3

nikomatsakis (Jul 24 2018 at 19:23, on Zulip):

would probably be a relatively easy change to try

nikomatsakis (Jul 24 2018 at 19:23, on Zulip):

I see, I see

nikomatsakis (Jul 24 2018 at 19:23, on Zulip):

in this particular case you mean

lqd (Jul 24 2018 at 19:24, on Zulip):

I'm just seeing them through the logs so I might be mistaken

nikomatsakis (Jul 24 2018 at 19:24, on Zulip):

no I mean seems very plausible actually

nikomatsakis (Jul 24 2018 at 19:24, on Zulip):

particularly for big constants

lqd (Jul 24 2018 at 19:24, on Zulip):

right

nikomatsakis (Jul 24 2018 at 19:24, on Zulip):

where e.g. you are assembling a [Foo; N]

nikomatsakis (Jul 24 2018 at 19:24, on Zulip):

there are likely to be about N instances of type Foo =)

nikomatsakis (Jul 24 2018 at 19:25, on Zulip):

well we could just add a little hash set or whatever
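
A minimal sketch of the "little hash set" idea, assuming a toy model where a type is just a string and a constraint is a (type, location) pair. The names `Ty`, `Location` and `push_live_constraints_dedup` are hypothetical and are not rustc APIs; the point is only that N live locals of one type produce a single constraint per location. It only helps when the types actually compare equal.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for illustration only (not rustc types).
type Ty = &'static str;
type Location = usize;

/// Pushes one liveness constraint per distinct type at `location`,
/// instead of one per live local. Returns how many were pushed.
fn push_live_constraints_dedup(
    live_local_tys: &[Ty],
    location: Location,
    out: &mut Vec<(Ty, Location)>,
) -> usize {
    let mut seen: HashSet<Ty> = HashSet::new();
    let mut pushed = 0;
    for &ty in live_local_tys {
        // `insert` returns false when this type was already handled here.
        if seen.insert(ty) {
            out.push((ty, location));
            pushed += 1;
        }
    }
    pushed
}
```

With a thousand live locals that all share one type, a call at a given location pushes a single constraint rather than a thousand.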

lqd (Jul 24 2018 at 19:25, on Zulip):

here there's around 90k I think, which are all AFAICT tuples of (&str, (u32, u32))

lqd (Jul 24 2018 at 19:25, on Zulip):

and that's per location I think, which might be around 90k as well

lqd (Jul 24 2018 at 19:28, on Zulip):

and here there doesn't seem to be any live locals in those 92K points

nikomatsakis (Jul 24 2018 at 19:29, on Zulip):

hmm one problem probably is

nikomatsakis (Jul 24 2018 at 19:29, on Zulip):

each of those tuples has a distinct lifetime in its type

nikomatsakis (Jul 24 2018 at 19:29, on Zulip):

I suspect

lqd (Jul 24 2018 at 19:30, on Zulip):

do they show up in the logs by any chance (maybe would I need verbose mode) ?

nikomatsakis (Jul 24 2018 at 19:30, on Zulip):

that would probably help, yes

nikomatsakis (Jul 24 2018 at 19:30, on Zulip):

-Zverbose

lqd (Jul 25 2018 at 17:29, on Zulip):

@nikomatsakis is there anything I can do to help y'all investigate html5ever ? tracing / debugging / profiling (I have a callgrind profile) etc ?

nikomatsakis (Jul 25 2018 at 17:31, on Zulip):

hmm so can you summarize for me again — there are basically a ton of local variables, each with a type like &'X T?

nikomatsakis (Jul 25 2018 at 17:32, on Zulip):

I'm wondering what tricks we can play :)

lqd (Jul 25 2018 at 17:32, on Zulip):

there seem to be multiple areas this test shows

lqd (Jul 25 2018 at 17:32, on Zulip):

some quadratic behaviour njn reported

nikomatsakis (Jul 25 2018 at 17:32, on Zulip):

I guess the main trick would be

nikomatsakis (Jul 25 2018 at 17:32, on Zulip):

or, one trick we might do....

nikomatsakis (Jul 25 2018 at 17:33, on Zulip):

if we are in a static or constant, maybe we can just skip liveness somehow

nikomatsakis (Jul 25 2018 at 17:33, on Zulip):

I'm trying to remember though if that is true

lqd (Jul 25 2018 at 17:33, on Zulip):

and the thing I was seeing wrt these locals having effectively &'static str strings, but in the liveness code it was seeing it as separate regions (I'm not sure the logs should/would show them as 'static at this point)

nikomatsakis (Jul 25 2018 at 17:33, on Zulip):

I was imagining essentially adding some kind of 'static bound to everything

nikomatsakis (Jul 25 2018 at 17:33, on Zulip):

the logs wouldn't show them that way

nikomatsakis (Jul 25 2018 at 17:34, on Zulip):

i.e., the first step in MIR land is that we erase all the regions and make them fresh variables

nikomatsakis (Jul 25 2018 at 17:34, on Zulip):

then find out the required relationships between them

nikomatsakis (Jul 25 2018 at 17:34, on Zulip):

at this stage, we won't have computed that they have to live for static lifetime

nikomatsakis (Jul 25 2018 at 17:34, on Zulip):

the question then is -- in a static -- is there some way to have a shorter-lived region that's important

lqd (Jul 25 2018 at 17:34, on Zulip):

(and I'm not even sure this would matter a lot even if this simulate_block was more efficient wrt to these common types)

nikomatsakis (Jul 25 2018 at 17:34, on Zulip):

but yes the other option is just to increase the efficiency

nikomatsakis (Jul 25 2018 at 17:35, on Zulip):

in some way

lqd (Jul 25 2018 at 17:35, on Zulip):

yeah and njn pointed other areas which looked interesting

lqd (Jul 25 2018 at 17:36, on Zulip):

let me get you a link to one of those

lqd (Jul 25 2018 at 17:37, on Zulip):

eg https://github.com/rust-lang/rust/blob/master/src/librustc_mir/borrow_check/places_conflict.rs#L32-L37

nikomatsakis (Jul 25 2018 at 17:37, on Zulip):

so that I have an idea for

nikomatsakis (Jul 25 2018 at 17:37, on Zulip):

I've been meaning to write it up

nikomatsakis (Jul 25 2018 at 17:37, on Zulip):

I may have left a cryptic comment somewhere

nikomatsakis (Jul 25 2018 at 17:37, on Zulip):

what %age of time does that fn account for?

nikomatsakis (Jul 25 2018 at 17:37, on Zulip):

the good news is that optimizing that fn would help across the board

nikomatsakis (Jul 25 2018 at 17:38, on Zulip):

right now we do a pretty naive comparison

nikomatsakis (Jul 25 2018 at 17:38, on Zulip):

my basic idea would be to reorganize how mir::Place is represented, to start

lqd (Jul 25 2018 at 17:38, on Zulip):

I think it's the hottest function of them all 28,184,735,859 instr

nikomatsakis (Jul 25 2018 at 17:38, on Zulip):

I think we would want something like Place { base: PlaceBase, projections: &[PlaceProjections] }

nikomatsakis (Jul 25 2018 at 17:38, on Zulip):

and then we would try to hash the "loans in scope" by the base

nikomatsakis (Jul 25 2018 at 17:38, on Zulip):

so that e.g. if you have a borrow of the variable _X and an access to _Y

nikomatsakis (Jul 25 2018 at 17:39, on Zulip):

we don't ever have to compare them against one another

nikomatsakis (Jul 25 2018 at 17:39, on Zulip):

in particular when the number of loans is growing linearly as we traverse the code...
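
A rough sketch of the proposed shape, under stated assumptions: `PlaceBase`, `Projection`, `Place` and `LoansInScope` below are simplified, hypothetical types (the message above suggests a borrowed slice for projections; a `Vec` is used here only to keep the example self-contained). Indexing the loans by base means an access to `_Y` never looks at borrows of `_X`.

```rust
use std::collections::HashMap;

// Hypothetical, simplified types for illustration; not the rustc definitions.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum PlaceBase {
    Local(u32),
    Static(u32),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Projection {
    Deref,
    Field(u32),
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct Place {
    base: PlaceBase,
    projections: Vec<Projection>,
}

/// Loans in scope, bucketed by the base of the borrowed place.
#[derive(Default)]
struct LoansInScope {
    by_base: HashMap<PlaceBase, Vec<Place>>,
}

impl LoansInScope {
    fn add_borrow(&mut self, place: Place) {
        self.by_base.entry(place.base).or_default().push(place);
    }

    /// Returns only the borrows that share a base with `access`;
    /// borrows of other bases are never compared at all.
    fn candidates(&self, access: &Place) -> &[Place] {
        self.by_base
            .get(&access.base)
            .map(|v| v.as_slice())
            .unwrap_or(&[])
    }
}
```

The projection-by-projection comparison would then run only over the returned candidates, which is where the remaining cost would sit if many borrows share one base.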

nikomatsakis (Jul 25 2018 at 17:40, on Zulip):

I've been wanting to write it up but also wanting to dig in and figure out in a bit more detail how that would work

nikomatsakis (Jul 25 2018 at 17:40, on Zulip):

actually just changing the structure of places

nikomatsakis (Jul 25 2018 at 17:40, on Zulip):

would already let us write that fn more efficiently

lqd (Jul 25 2018 at 17:40, on Zulip):

oh sounds interesting indeed

nikomatsakis (Jul 25 2018 at 17:40, on Zulip):

we wouldn't have to do the "unroll" trick

nikomatsakis (Jul 25 2018 at 17:40, on Zulip):

I know eddyb wanted it anyway

nikomatsakis (Jul 25 2018 at 17:40, on Zulip):

that I can write up right now

lqd (Jul 25 2018 at 17:42, on Zulip):

for %age I think the various unroll_place monomorphizations tally up to 12-13% of the total

nikomatsakis (Jul 25 2018 at 17:42, on Zulip):

fascinating

nikomatsakis (Jul 25 2018 at 17:42, on Zulip):

lots of loans :)

nikomatsakis (Jul 25 2018 at 17:42, on Zulip):

and lots of accesses :)

lqd (Jul 25 2018 at 17:42, on Zulip):

it's really called a lot for this static variable :)

nikomatsakis (Jul 25 2018 at 17:42, on Zulip):

without the hashing, you'd still have to do the comparisons, but they'd be a lot faster

lqd (Jul 25 2018 at 17:44, on Zulip):

njn also mentioned precompute_borrows_out_of_scope being problematic but I haven't looked at it yet

lqd (Jul 25 2018 at 17:47, on Zulip):

and ofc the push_type_live_constraint we mentioned before, where they all end up being the same type but we don't know/notice it here or before; indeed it seems like we could handle 'static in some quick/fast way somewhere :/

nikomatsakis (Jul 25 2018 at 17:47, on Zulip):

@lqd I wrote up https://github.com/rust-lang/rust/issues/52708

nikomatsakis (Jul 25 2018 at 17:48, on Zulip):

as for precompute_borrows_out_of_scope, I don't really know how to do that much better tbh

lqd (Jul 25 2018 at 17:49, on Zulip):

super, thanks for writing this up :)

nikomatsakis (Jul 25 2018 at 17:54, on Zulip):

do you think you might poke at it? :)

nikomatsakis (Jul 25 2018 at 17:55, on Zulip):

I could try to be more specific

nikomatsakis (Jul 25 2018 at 17:55, on Zulip):

it's one of those "nice, pure refactoring" sort of tasks I guess

nikomatsakis (Jul 25 2018 at 17:55, on Zulip):

maybe @Santiago Pastorino might be interested :)

Santiago Pastorino (Jul 25 2018 at 17:56, on Zulip):

@nikomatsakis sorry, wasn't following up, interested in what?

lqd (Jul 25 2018 at 17:56, on Zulip):

do you think it's something one can get done in a couple days (before I leave on vacation) ?

nikomatsakis (Jul 25 2018 at 17:56, on Zulip):

maybe more than a couple of days

nikomatsakis (Jul 25 2018 at 17:56, on Zulip):

hard to tell

DPC (Jul 25 2018 at 17:56, on Zulip):

i can chip in if you don't complete it

nikomatsakis (Jul 25 2018 at 17:56, on Zulip):

but it will affect a non-trivial chunk of code

nikomatsakis (Jul 25 2018 at 17:56, on Zulip):

@Santiago Pastorino interested in #52708

Santiago Pastorino (Jul 25 2018 at 17:56, on Zulip):

sounds good

nikomatsakis (Jul 25 2018 at 17:56, on Zulip):

I'd have to think about the order of steps to take

nikomatsakis (Jul 25 2018 at 17:57, on Zulip):

I'm also curious to hear what @Eduard-Mihai Burtescu thinks

Santiago Pastorino (Jul 25 2018 at 17:57, on Zulip):

but didn't understand if @lqd was going to tackle it

nikomatsakis (Jul 25 2018 at 17:57, on Zulip):

I think it's still up in the air :)

Santiago Pastorino (Jul 25 2018 at 17:57, on Zulip):

I can do it if you want

lqd (Jul 25 2018 at 17:57, on Zulip):

apart from time constraint I'm game, that is we can collaborate on it :)

Santiago Pastorino (Jul 25 2018 at 17:57, on Zulip):

had no tasks right now

Santiago Pastorino (Jul 25 2018 at 17:57, on Zulip):

@lqd if you want, go ahead

Santiago Pastorino (Jul 25 2018 at 17:58, on Zulip):

I'm fine with whatever you guys want :)

lqd (Jul 25 2018 at 17:58, on Zulip):

is there something we can do about the 'statics specifically as well ?

nikomatsakis (Jul 25 2018 at 17:59, on Zulip):

it feels like there must be

nikomatsakis (Jul 25 2018 at 17:59, on Zulip):

so e.g.

nikomatsakis (Jul 25 2018 at 17:59, on Zulip):

any variable that winds up in the final value

nikomatsakis (Jul 25 2018 at 17:59, on Zulip):

clearly, all of its lifetimes must be 'static

nikomatsakis (Jul 25 2018 at 18:00, on Zulip):

so maybe we could skip computing liveness on it

nikomatsakis (Jul 25 2018 at 18:00, on Zulip):

I guess I should dump the html5ever MIR

nikomatsakis (Jul 25 2018 at 18:00, on Zulip):

@lqd do you have a minimized version of the troublesome code?

nikomatsakis (Jul 25 2018 at 18:00, on Zulip):

that would be helpful actually

lqd (Jul 25 2018 at 18:00, on Zulip):

sure

lqd (Jul 25 2018 at 18:00, on Zulip):

of the source not the MIR right ?

lqd (Jul 25 2018 at 18:00, on Zulip):

if so: https://play.rust-lang.org/?gist=c3e6c21848eb4c2aee3ec047e1b9d911&version=nightly&mode=debug&edition=2015

lqd (Jul 25 2018 at 18:01, on Zulip):

(rustc will OOM if you compile tho ;)

lqd (Jul 25 2018 at 18:03, on Zulip):

although I'm not sure one could call that "minimized" :/ (but easy to do so, reducing the number of entries in the arrays, à la https://play.rust-lang.org/?gist=c80f5a6133a739bcb084677f09ae4501&version=nightly&mode=debug&edition=2015)

nikomatsakis (Jul 25 2018 at 18:10, on Zulip):

right

nikomatsakis (Jul 25 2018 at 18:11, on Zulip):

that seems awfully minimal :)

lqd (Jul 25 2018 at 18:12, on Zulip):

it should at least show the &str regions :)

nikomatsakis (Jul 25 2018 at 18:13, on Zulip):

presumably something like this ought to show the general pattern, right

#![feature(nll)]

pub struct Map<K: 'static, V: 'static> {
    pub key: u64,
    pub disps: Slice<(u32, u32)>,
    pub entries: Slice<(K, V)>,
}

pub enum Slice<T: 'static> {
    Static(&'static [T]),
}

pub static NAMED_ENTITIES: Map<&'static str, (u32, u32)> = Map {
    key: 1897749892740154578,
    disps: Slice::Static(&[
        (1, 4),
        (1, 4),
        (1, 4),
        (1, 4),
        (1, 4),
        (1, 4),
        (1, 4),
    ]),
    entries: Slice::Static(&[
        ("GreaterSlan", (0, 0)),
        ("GreaterSlan", (0, 0)),
        ("GreaterSlan", (0, 0)),
        ("GreaterSlan", (0, 0)),
        ("GreaterSlan", (0, 0)),
        ("GreaterSlan", (0, 0)),
        ("GreaterSlan", (0, 0)),
    ]),
};


fn main() {

}

lqd (Jul 25 2018 at 18:13, on Zulip):

I believe so

nikomatsakis (Jul 25 2018 at 18:14, on Zulip):

ok so I could imagine doing a special-cased analysis targeting huge statics

nikomatsakis (Jul 25 2018 at 18:15, on Zulip):

which seems to occur with non-trivial frequency ;)

nikomatsakis (Jul 25 2018 at 18:15, on Zulip):

basically starting from _0, we walk backwards to find those locals whose data gets stored into _0

nikomatsakis (Jul 25 2018 at 18:15, on Zulip):

and we don't compute liveness for them

nikomatsakis (Jul 25 2018 at 18:15, on Zulip):

but rather force all their regions to 'static

nikomatsakis (Jul 25 2018 at 18:15, on Zulip):

something like that

nikomatsakis (Jul 25 2018 at 18:15, on Zulip):

we ought to be able to do this pretty fast using a union-find sort of thing

nikomatsakis (Jul 25 2018 at 18:15, on Zulip):

linear pass over the MIR I imagine

nikomatsakis (Jul 25 2018 at 18:17, on Zulip):

basically whenever you have _1 = _2 or _1 = Aggregate { _2 },

nikomatsakis (Jul 25 2018 at 18:17, on Zulip):

you would union _1 and _2

nikomatsakis (Jul 25 2018 at 18:17, on Zulip):

then we would look at the end at the set of things union'd with _0

nikomatsakis (Jul 25 2018 at 18:17, on Zulip):

something like that
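
The union-find pass described above could look roughly like this. It is a sketch over a toy model: `Assign` is a hypothetical stand-in for a MIR assignment (`dest = src` or `dest = Aggregate { operands }`), locals are plain indices with `_0` as index 0, and nothing here is actual rustc code.

```rust
/// Minimal union-find over local indices (`_0`, `_1`, ...).
struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }

    fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            // Path halving: point `x` at its grandparent to keep trees shallow.
            let grandparent = self.parent[self.parent[x]];
            self.parent[x] = grandparent;
            x = grandparent;
        }
        x
    }

    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[ra] = rb;
        }
    }
}

/// Hypothetical simplified statement: `dest = op` or `dest = Aggregate { ops }`.
struct Assign {
    dest: usize,
    operands: Vec<usize>,
}

/// One linear pass over the statements, then collect everything union'd with `_0`.
fn locals_flowing_into_return(num_locals: usize, stmts: &[Assign]) -> Vec<usize> {
    let mut uf = UnionFind::new(num_locals);
    for s in stmts {
        for &op in &s.operands {
            uf.union(s.dest, op);
        }
    }
    let root = uf.find(0);
    (0..num_locals).filter(|&l| uf.find(l) == root).collect()
}
```

Because union is symmetric, this over-approximates: a local that merely reads from one of these locals also lands in the set, so a real implementation would have to decide whether that is acceptable.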

lqd (Jul 25 2018 at 18:18, on Zulip):

would we do this right before the liveness::generate calls ?

nikomatsakis (Jul 25 2018 at 18:18, on Zulip):

presumably yes

lqd (Jul 25 2018 at 18:19, on Zulip):

I can investigate that

nikomatsakis (Jul 25 2018 at 18:24, on Zulip):

should I open another issue about it?

nikomatsakis (Jul 25 2018 at 18:25, on Zulip):

it'd also be good I think to look again at tuple-stress...

nikomatsakis (Jul 25 2018 at 18:25, on Zulip):

since IIRC it is still pretty slow

nikomatsakis (Jul 25 2018 at 18:26, on Zulip):

even though we basically eliminated liveness entirely there, right?

nikomatsakis (Jul 25 2018 at 18:26, on Zulip):

might give us a hint what the next bottleneck will be

lqd (Jul 25 2018 at 18:26, on Zulip):

opening an issue would be :thumbs_up:

lqd (Jul 25 2018 at 18:26, on Zulip):

I can check tuple-stress, I feel like liveness is not a bottleneck there anymore but can't remember for sure hmm

nikomatsakis (Jul 25 2018 at 18:27, on Zulip):

shouldn't be

lqd (Jul 25 2018 at 18:27, on Zulip):

agreed

lqd (Jul 25 2018 at 18:27, on Zulip):

but statics again could come into play

simulacrum (Jul 25 2018 at 18:27, on Zulip):

tuple stress is one big static, isn't it?

lqd (Jul 25 2018 at 18:28, on Zulip):

yeah

lqd (Jul 25 2018 at 18:29, on Zulip):

since it's surely not liveness there anymore, it would be interesting to see what this static exercises so much

nikomatsakis (Jul 25 2018 at 18:32, on Zulip):

exactly

nikomatsakis (Jul 25 2018 at 18:32, on Zulip):

filed https://github.com/rust-lang/rust/issues/52713

lqd (Jul 25 2018 at 18:33, on Zulip):

thanks niko

nikomatsakis (Jul 25 2018 at 18:33, on Zulip):

whatever is affecting tuple-stress will probably affect html5ever once we do #52713 :)

lqd (Jul 25 2018 at 18:36, on Zulip):

I assume any big lump of statics will exercise regions renumbering

lqd (Jul 25 2018 at 18:37, on Zulip):

is this a different walk/visit, should we skip tys without regions here also ? (maybe it's just log heavy and doesn't contribute much to the overall time, I'll need to check)

nikomatsakis (Jul 25 2018 at 19:02, on Zulip):

I assume any big lump of statics will exercise regions renumbering

region renumbering? it shows up on the profiles, but usually very small

lqd (Jul 25 2018 at 19:09, on Zulip):

here, dataflow looks interesting, looking at logs while valgrind slowly runs

lqd (Jul 25 2018 at 19:09, on Zulip):

eg drop_flag_effects

lqd (Jul 25 2018 at 19:10, on Zulip):

of which there are a lot

nikomatsakis (Jul 25 2018 at 19:10, on Zulip):

hmmmmmmm

nikomatsakis (Jul 25 2018 at 19:10, on Zulip):

what on earth is that

lqd (Jul 25 2018 at 19:10, on Zulip):

and the logs are coming slooooowly

nikomatsakis (Jul 25 2018 at 19:10, on Zulip):

I've seen it on the profiles before

nikomatsakis (Jul 25 2018 at 19:11, on Zulip):

not very familiar to me

nikomatsakis (Jul 25 2018 at 19:11, on Zulip):

oh, it's just general manipulation of drop paths etc

nikomatsakis (Jul 25 2018 at 19:11, on Zulip):

so I guess this is just computing what is initialized etc hmm

nikomatsakis (Jul 25 2018 at 19:11, on Zulip):

we might be able to speed that up, that code is kinda old and not (I don't think) esp optimized, iirc

lqd (Jul 25 2018 at 19:12, on Zulip):

I'm looking forward to the profile, if it has clear indicators, otherwise it's just talkative logs

lqd (Jul 25 2018 at 19:40, on Zulip):

the profile is available -- 35% memset, 43% bitslice::bitwise -- mostly called from here apparently, and from other places in dataflow, do_mir_borrowck -- all in all mostly from FlowsAtLocation: apply_local_effect, and reconstruct_statement_effect

tuple-stress-nll.txt

lqd (Jul 25 2018 at 19:40, on Zulip):

(bbl)

nikomatsakis (Jul 25 2018 at 19:43, on Zulip):

huh

nikomatsakis (Jul 25 2018 at 19:43, on Zulip):

interesting

lqd (Jul 25 2018 at 20:43, on Zulip):

I'm guessing Ariel would be the most familiar with this, slightly unfortunate :)

nikomatsakis (Jul 25 2018 at 20:44, on Zulip):

@pnkfelix too

nikomatsakis (Jul 25 2018 at 20:44, on Zulip):

but basically it sounds like

nikomatsakis (Jul 25 2018 at 20:44, on Zulip):

it's computing the initialization sets

nikomatsakis (Jul 25 2018 at 20:44, on Zulip):

and the other various bits of moveck

nikomatsakis (Jul 25 2018 at 20:44, on Zulip):

those are a lot of bits and a lot of variables

nikomatsakis (Jul 25 2018 at 20:45, on Zulip):

not sure yet how we can avoid that... maybe there are some tricks we can do

lqd (Jul 25 2018 at 20:45, on Zulip):

yeah maybe Felix will have ideas as well

nikomatsakis (Aug 07 2018 at 20:38, on Zulip):

So folks: which benchmarks do you think we should focus on?

Here are the measurements that @lqd did: link. These compare the current master performance for most crates. Looking at these, cargo is the worst (at 16%). It's also part of bootstrap, so that seems important.

The latest clap-rs and html5ever are also ungreat, at about 15% each.

It'd be a nice goal to get the "master" versions of all crates under 10% overhead I think. Perhaps that should be the goal.

nikomatsakis (Aug 07 2018 at 20:38, on Zulip):

I'd also like to measure the html5ever that's on perf but with the liveness optimization applied, just to get a sense of that.

lqd (Aug 07 2018 at 20:48, on Zulip):

Do we know if/feel like the non-liveness parts of html5ever and tuple stress will happen in the wild ?

nikomatsakis (Aug 07 2018 at 20:51, on Zulip):

don't know

nikomatsakis (Aug 07 2018 at 20:51, on Zulip):

I mean I suppose other people will make massive constants :)

nikomatsakis (Aug 07 2018 at 20:51, on Zulip):

I think we should definitely optimize them

lqd (Aug 07 2018 at 20:53, on Zulip):

another round of profiling on the master crates might be worthwhile to find “in the wild” patterns we don’t know about yet, vs the ones on perf.rlo we already know about and are working on; cargo, clap, html5ever just to see if there’s low hanging fruit ?

nikomatsakis (Aug 07 2018 at 20:55, on Zulip):

I'm currently profiling cargo master

lqd (Aug 07 2018 at 21:00, on Zulip):

hopefully it matches with the other perf ideas you had and mentioned recently. 10% seems an interesting goal (I wonder how much parallel queries improve things)

nikomatsakis (Aug 07 2018 at 21:01, on Zulip):

well, initial results on cargo...

nikomatsakis (Aug 07 2018 at 21:02, on Zulip):

most of the time is in nll::compute_regions (58% of borrowck) and in particular the typeck (48%)

nikomatsakis (Aug 07 2018 at 21:02, on Zulip):

canonicalization winds up at about 6% of borrowck (not a ton)

nikomatsakis (Aug 07 2018 at 21:02, on Zulip):

there's no glaringly obvious hot spots

lqd (Aug 07 2018 at 21:04, on Zulip):

on html5ever on perf, disabling liveness altogether yielded -25% IIRC, so with the liveness perf optimization it will probably still be too slow

nikomatsakis (Aug 07 2018 at 21:04, on Zulip):

well we know how fast it is

nikomatsakis (Aug 07 2018 at 21:05, on Zulip):

2,081.20%

nikomatsakis (Aug 07 2018 at 21:05, on Zulip):

instead of 3000%

nikomatsakis (Aug 07 2018 at 21:05, on Zulip):

so yeah a solid win but not there yet :)

lqd (Aug 07 2018 at 21:06, on Zulip):

true :) I had completely forgotten you had finished this line of work

lqd (Aug 07 2018 at 21:07, on Zulip):

the rest should be the moveck/tuple stress situation, I don't think Felix was able to look at it before leaving

lqd (Aug 07 2018 at 21:07, on Zulip):

so a promising area albeit harder to tackle I guess

nikomatsakis (Aug 07 2018 at 21:08, on Zulip):

wins are definitely getting harder :)

nikomatsakis (Aug 07 2018 at 21:09, on Zulip):

it's interesting that we spent 11% of our time in prove_predicate

nikomatsakis (Aug 07 2018 at 21:10, on Zulip):

actually also kind of interesting...the typeck seems to have a lot of overhead that is just kind of "looking at the MIR" I guess

nikomatsakis (Aug 07 2018 at 21:10, on Zulip):

ok, well, I should look at some other tests

nikomatsakis (Aug 07 2018 at 21:10, on Zulip):

I don't think i'm seeing anything too obvious in cargo

nikomatsakis (Aug 07 2018 at 21:10, on Zulip):

I have some vague ideas but they are along the lines of "try this refactoring and see if we can clean it up"

lqd (Aug 07 2018 at 21:11, on Zulip):

are there such mir passes we can maybe combine so we "look at the mir" less in typeck ?

nikomatsakis (Aug 07 2018 at 21:12, on Zulip):

it does do 2 passes I think

nikomatsakis (Aug 07 2018 at 21:12, on Zulip):

not necessarily for a good reason

nikomatsakis (Aug 07 2018 at 21:12, on Zulip):

might help

nikomatsakis (Aug 07 2018 at 21:12, on Zulip):

iirc the first pass is a kind of sanity check

nikomatsakis (Aug 07 2018 at 21:12, on Zulip):

to let the later pass make more assertions

nikomatsakis (Aug 07 2018 at 21:18, on Zulip):

in case you are curious @lqd I'm taking some notes in this google spreadsheet

nikomatsakis (Aug 07 2018 at 21:20, on Zulip):

(pretty minimal, still it's interesting how similar the two are)

lqd (Aug 07 2018 at 21:20, on Zulip):

oh nice! I was indeed curious :)

nikomatsakis (Aug 07 2018 at 21:24, on Zulip):

liveness::generate is still a non-trivial percentage, interestingly

nikomatsakis (Aug 07 2018 at 21:24, on Zulip):

approx. 10%

nikomatsakis (Aug 07 2018 at 21:24, on Zulip):

@lqd to build html5ever, it seems like cargo-curl didn't work for me

nikomatsakis (Aug 07 2018 at 21:24, on Zulip):

did you download from the main repo?

nikomatsakis (Aug 07 2018 at 21:25, on Zulip):

I guess they have tags

nikomatsakis (Aug 07 2018 at 21:25, on Zulip):

though they don't have a v0.5.4 tag :)

nikomatsakis (Aug 07 2018 at 21:26, on Zulip):

well I can just use the version in rustc-perf I guess

nikomatsakis (Aug 07 2018 at 21:34, on Zulip):

yeah so the html5ever (from perf) profile is completely different

nikomatsakis (Aug 07 2018 at 21:34, on Zulip):

each_borrow_involving_path is 41%

nikomatsakis (Aug 07 2018 at 21:35, on Zulip):

which means that https://github.com/rust-lang/rust/issues/53159 might be huge

lqd (Aug 07 2018 at 21:36, on Zulip):

html5ever I git cloned IIRC

nikomatsakis (Aug 07 2018 at 21:36, on Zulip):

basically all the rest is the "collect borrows in scope"

nikomatsakis (Aug 07 2018 at 21:36, on Zulip):

I don't yet have a smart idea how to do that much better

nikomatsakis (Aug 07 2018 at 21:36, on Zulip):

but we could get it down to 10x overhead ;)

nikomatsakis (Aug 07 2018 at 21:37, on Zulip):

ooh, I had an idea just now...

nikomatsakis (Aug 07 2018 at 21:37, on Zulip):

well, might not work

nikomatsakis (Aug 07 2018 at 21:37, on Zulip):

in particular, it might force us to compute liveness again :

nikomatsakis (Aug 07 2018 at 21:37, on Zulip):

but, for posterity, the idea is:

nikomatsakis (Aug 07 2018 at 21:37, on Zulip):

right now, we trace out the blocks where each borrow is in scope

nikomatsakis (Aug 07 2018 at 21:37, on Zulip):

we do this in a kind of silly way and I think I could get some wins from refactoring that

nikomatsakis (Aug 07 2018 at 21:37, on Zulip):

but leave that aside

nikomatsakis (Aug 07 2018 at 21:38, on Zulip):

well, what I want to do is: if, during this trace, we were also looking for accesses to paths that conflict with the borrow

nikomatsakis (Aug 07 2018 at 21:38, on Zulip):

then we could stop tracing as soon as we see that the borrowed path is dead

nikomatsakis (Aug 07 2018 at 21:38, on Zulip):

but .. I think that this might not help here, because the paths do not go dead until the very end

nikomatsakis (Aug 07 2018 at 21:38, on Zulip):

still, I also think...

nikomatsakis (Aug 07 2018 at 21:38, on Zulip):

hmm

nikomatsakis (Aug 07 2018 at 21:38, on Zulip):

for some borrows, there is no reason to record the borrow,

nikomatsakis (Aug 07 2018 at 21:38, on Zulip):

because there is no possible thing that the user could do

nikomatsakis (Aug 07 2018 at 21:39, on Zulip):

which could conflict with the borrow

nikomatsakis (Aug 07 2018 at 21:39, on Zulip):

this might be the case here

nikomatsakis (Aug 07 2018 at 21:39, on Zulip):

an example of such a borrow:

nikomatsakis (Aug 07 2018 at 21:39, on Zulip):

if you have x: &u32 and you do let y = &*x

nikomatsakis (Aug 07 2018 at 21:39, on Zulip):

there is no action that you can do with x that invalidates y
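
A small illustration of that claim, as a sketch rather than anything taken from the compiler: `y` reborrows the referent of `x`, so even overwriting `x` leaves `y` valid. The function name is made up for the example.

```rust
/// Reborrows through a shared reference, then overwrites the original
/// reference while the reborrow is still in use.
fn reborrow_outlives_reassignment() -> u32 {
    let a = 1u32;
    let b = 2u32;
    let mut x: &u32 = &a;
    let y = &*x; // borrows `*x` (that is, `a`), not the variable `x` itself
    x = &b;      // overwriting `x` does not invalidate `y`
    *y + *x      // `y` still points at `a`, `x` now points at `b`
}
```

Since nothing done to `x` can conflict with `y`, a borrow like this is one the checker would never need to report against.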

lqd (Aug 07 2018 at 21:40, on Zulip):

html5ever 0.5.4 is from perf though right ? if so, and for all other perf crates, I timed them from a rustc-perf repo clone, as the version is not enough a lot of times, that is perf had a specific commit rather than a released version

nikomatsakis (Aug 07 2018 at 21:40, on Zulip):

this is not quite true for let y = &x, even if x is not mutable, because we have to ensure that you don't move x etc

nikomatsakis (Aug 07 2018 at 21:40, on Zulip):

but if x is a static value

nikomatsakis (Aug 07 2018 at 21:40, on Zulip):

as is the case here

nikomatsakis (Aug 07 2018 at 21:40, on Zulip):

then I think maybe we can just not track this borrow at all

nikomatsakis (Aug 07 2018 at 21:40, on Zulip):

that would basically "crack" the case I think

nikomatsakis (Aug 07 2018 at 21:41, on Zulip):

actually maybe these cases are even just the simpler &* sort of thing

nikomatsakis (Aug 07 2018 at 21:42, on Zulip):

...looks like a bit of both, but most of the borrows are &* borrows

nikomatsakis (Aug 07 2018 at 21:42, on Zulip):

the other case is a bit trickier

nikomatsakis (Aug 07 2018 at 21:43, on Zulip):

we sometimes at least have StorageDead for temporaries; I don't quite remember the rules we wound up with there, @eddyb might

lqd (Aug 07 2018 at 22:08, on Zulip):

would hashing the borrows in #53159 depend on the new Place repr from #52708 ?

nikomatsakis (Aug 07 2018 at 22:09, on Zulip):

I thought at first it would but I realize now it doesn't

nikomatsakis (Aug 07 2018 at 22:09, on Zulip):

that would just make it mildly faster

nikomatsakis (Aug 07 2018 at 22:09, on Zulip):

I went ahead and implemented https://github.com/rust-lang/rust/issues/53176

nikomatsakis (Aug 07 2018 at 22:09, on Zulip):

it is part of https://github.com/rust-lang/rust/pull/53177

nikomatsakis (Aug 07 2018 at 22:09, on Zulip):

so we can measure the performance

lqd (Aug 07 2018 at 22:13, on Zulip):

already :) oh this one should be good

nikomatsakis (Aug 07 2018 at 22:23, on Zulip):

I'll do a local opt build to see if it works .. I've not really tested it very hard I guess

nikomatsakis (Aug 07 2018 at 22:24, on Zulip):

but I basically gotta go now

nikomatsakis (Aug 07 2018 at 22:35, on Zulip):

ok so assuming that this change is not wildly unsound...

nikomatsakis (Aug 07 2018 at 22:35, on Zulip):

my local measurements suggest that html5ever goes to 2983msec (NLL) vs 1912msec (AST)

nikomatsakis (Aug 07 2018 at 22:36, on Zulip):

ratio of 1.516 :tada:

nikomatsakis (Aug 07 2018 at 22:36, on Zulip):

still not great but ...

nikomatsakis (Aug 07 2018 at 22:37, on Zulip):

tuple-stress gets to a ratio of 2.29

lqd (Aug 07 2018 at 22:39, on Zulip):

nicely done :)

lqd (Aug 07 2018 at 22:40, on Zulip):

there seems to be a new bench since I did the measurements — keccak — which is at x1.8 now

nikomatsakis (Aug 07 2018 at 22:41, on Zulip):

hmm

nikomatsakis (Aug 07 2018 at 22:41, on Zulip):

never heard of it :)

lqd (Aug 07 2018 at 22:42, on Zulip):

me neither

lqd (Aug 07 2018 at 22:43, on Zulip):

but will be interesting to analyze :)

lqd (Aug 07 2018 at 22:47, on Zulip):

1.516 is huge for html5ever, awesome

nikomatsakis (Aug 07 2018 at 22:48, on Zulip):

I'm wondering what the rest is

nikomatsakis (Aug 07 2018 at 22:48, on Zulip):

I can do a perf record...

nikomatsakis (Aug 07 2018 at 22:48, on Zulip):

presumably it's the dataflow stuff

lqd (Aug 07 2018 at 22:54, on Zulip):

It would make some sense that it is

nikomatsakis (Aug 07 2018 at 22:55, on Zulip):

actually, no

lqd (Aug 07 2018 at 22:56, on Zulip):

oh and perf.rlo is testing the latest "escaping paths" PR so we'll have more numbers soon

lqd (Aug 07 2018 at 22:56, on Zulip):

/me's world crumbles

nikomatsakis (Aug 07 2018 at 22:56, on Zulip):

at least, not just that

nikomatsakis (Aug 07 2018 at 22:56, on Zulip):

e.g. type-check is 19%

nikomatsakis (Aug 07 2018 at 22:56, on Zulip):

(of borrowck)

nikomatsakis (Aug 07 2018 at 22:57, on Zulip):

liveness is still relatively expensive... kind of interesting... I think we're still paying some cost to "walk the flow graph" or something

nikomatsakis (Aug 07 2018 at 22:57, on Zulip):

we should check for the case where there are literally zero variables that we are computing liveness for

nikomatsakis (Aug 07 2018 at 22:57, on Zulip):

I believe that is the case here

nikomatsakis (Aug 07 2018 at 22:57, on Zulip):

that'd be like a 13% win :)

lqd (Aug 07 2018 at 22:58, on Zulip):

:)

nikomatsakis (Aug 07 2018 at 22:58, on Zulip):

otherwise I think some non-trivial percentage is just that the bit sets are really, really big

lqd (Aug 07 2018 at 23:00, on Zulip):

I would have not expected that it'd be so different from tuple-stress at this point :/

lqd (Aug 07 2018 at 23:01, on Zulip):

even though some percentage of the time for the rest of the crate is not coming from the big static ofc

nikomatsakis (Aug 07 2018 at 23:01, on Zulip):

something that occurred to me

nikomatsakis (Aug 07 2018 at 23:01, on Zulip):

if we have a constraint 'foo: 'static

nikomatsakis (Aug 07 2018 at 23:01, on Zulip):

it's kind of a special case

nikomatsakis (Aug 07 2018 at 23:02, on Zulip):

we could tweak the SCC computation

nikomatsakis (Aug 07 2018 at 23:02, on Zulip):

to always put them in the same SCC

nikomatsakis (Aug 07 2018 at 23:02, on Zulip):

that might dramatically reduce the number of regions we need to worry about

nikomatsakis (Aug 07 2018 at 23:02, on Zulip):

in the case of html5ever, anyway

nikomatsakis (Aug 07 2018 at 23:03, on Zulip):

(we'd basically be adding synthetic edges from 'static to every other region, from the POV of the SCC computation)

nikomatsakis (Aug 07 2018 at 23:03, on Zulip):

basically SCC is not aware that for<'a> 'static: 'a
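
To make the effect concrete, here is a toy sketch. Assumptions: regions are small integers, a constraint `'a: 'b` is modelled as an edge `a -> b`, and SCCs are counted by brute-force mutual reachability, which is not how rustc computes them. `count_sccs` and `with_static_edges` are hypothetical helpers for illustration.

```rust
/// Counts strongly connected components by mutual reachability
/// (cubic transitive closure; fine for a demonstration only).
fn count_sccs(n: usize, edges: &[(usize, usize)]) -> usize {
    // reach[a][b] is true when there is a path a -> b; every node reaches itself.
    let mut reach = vec![vec![false; n]; n];
    for i in 0..n {
        reach[i][i] = true;
    }
    for &(a, b) in edges {
        reach[a][b] = true;
    }
    for k in 0..n {
        for i in 0..n {
            for j in 0..n {
                if reach[i][k] && reach[k][j] {
                    reach[i][j] = true;
                }
            }
        }
    }
    // A node starts a new SCC if no lower-numbered node is mutually reachable.
    (0..n)
        .filter(|&i| (0..i).all(|j| !(reach[i][j] && reach[j][i])))
        .count()
}

/// Adds the synthetic edges `'static -> r` for every other region `r`.
fn with_static_edges(
    n: usize,
    static_region: usize,
    edges: &[(usize, usize)],
) -> Vec<(usize, usize)> {
    let mut all = edges.to_vec();
    all.extend(
        (0..n)
            .filter(|&r| r != static_region)
            .map(|r| (static_region, r)),
    );
    all
}
```

With region 0 standing for 'static and three regions each constrained to outlive it, the plain graph has four components; after adding the synthetic edges they collapse into one, while a region with no `: 'static` constraint stays separate.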

nikomatsakis (Aug 07 2018 at 23:03, on Zulip):

grr I guess I should file an issue for that too

nikomatsakis (Aug 07 2018 at 23:03, on Zulip):

I really gotta go :)

lqd (Aug 07 2018 at 23:04, on Zulip):

:) we can file it later

nikomatsakis (Aug 07 2018 at 23:05, on Zulip):

filed https://github.com/rust-lang/rust/issues/53178

nikomatsakis (Aug 07 2018 at 23:05, on Zulip):

ok, ciao!

lqd (Aug 07 2018 at 23:06, on Zulip):

:wave:

lqd (Aug 08 2018 at 02:20, on Zulip):

as expected from previous results #53168 improves html5ever's cpu and memory usage nicely :) dashboard for convenience

nikomatsakis (Aug 08 2018 at 08:55, on Zulip):

@lqd I just did a profile of keccak

nikomatsakis (Aug 08 2018 at 08:55, on Zulip):

one thing jumps out immediately: dominators computation at 17% !

nikomatsakis (Aug 08 2018 at 08:56, on Zulip):

type-check only 9%

nikomatsakis (Aug 08 2018 at 08:56, on Zulip):

it must have some crazy control flow

nikomatsakis (Aug 08 2018 at 08:56, on Zulip):

also, lots of time in unroll_place

davidtwco (Aug 08 2018 at 08:56, on Zulip):

There's not a lot in keccak: https://github.com/rust-lang-nursery/rustc-perf/blob/master/collector/benchmarks/keccak/src/lib.rs

nikomatsakis (Aug 08 2018 at 08:57, on Zulip):

well, those unroll24! macros etc

nikomatsakis (Aug 08 2018 at 08:57, on Zulip):

presumably are generating just a lot of code

nikomatsakis (Aug 08 2018 at 08:58, on Zulip):

but basically zero long-term borrows

lqd (Aug 08 2018 at 09:00, on Zulip):

unroll_place should be bypassed with the « hash borrows » issue right ?

nikomatsakis (Aug 08 2018 at 09:02, on Zulip):

that was the goal, yes

nikomatsakis (Aug 08 2018 at 09:02, on Zulip):

I updated the google drive spreadsheet

nikomatsakis (Aug 08 2018 at 09:03, on Zulip):

one thing that occurs to me with the hash borrows issue, at least for the simplistic variant I proposed to do first: if you have a lot of small borrows of the same place, then every access to that place will iterate over all of them

nikomatsakis (Aug 08 2018 at 09:03, on Zulip):

I had an idea that required a lot more refactoring that would solve that though

nikomatsakis (Aug 08 2018 at 09:04, on Zulip):

I should try to write it up, the idea was basically to combine determining the borrow's scope with checking for errors

nikomatsakis (Aug 08 2018 at 09:04, on Zulip):

essentially, we walk forward from the point of each borrow until it goes out of scope (as today)

nikomatsakis (Aug 08 2018 at 09:04, on Zulip):

we stop if we encounter something that kills the borrow

nikomatsakis (Aug 08 2018 at 09:05, on Zulip):

and — as we go — we check each place that is accessed for conflicts with just that borrow

nikomatsakis (Aug 08 2018 at 09:05, on Zulip):

instead of walking the set of places, and checking against all active borrows

nikomatsakis (Aug 08 2018 at 09:05, on Zulip):

sort of similar to the "invert liveness" idea

nikomatsakis (Aug 08 2018 at 09:05, on Zulip):

I think this would be better but it requires us doing some non-trivial refactoring

nikomatsakis (Aug 08 2018 at 09:05, on Zulip):

among other things, if you have many borrows with small duration, you will only check the things that occur in their (small) durations against them
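
A toy sketch of that "walk forward from each borrow" scheme, heavily simplified: it assumes straight-line code instead of a real control-flow graph, and it treats any access to the borrowed place as a conflict. The `Stmt` enum and `conflicts_per_borrow` are hypothetical names, not rustc types.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Stmt {
    Borrow { loan: usize, place: usize }, // start a borrow of `place`
    Access { place: usize },              // read or write `place`
    Kill { loan: usize },                 // the borrow goes out of scope
}

// For each borrow, walk forward until it is killed and check only the
// accesses seen along the way against that single borrow.
// Returns (loan, statement index) pairs for every conflicting access.
fn conflicts_per_borrow(stmts: &[Stmt]) -> Vec<(usize, usize)> {
    let mut found = Vec::new();
    for (start, stmt) in stmts.iter().enumerate() {
        if let Stmt::Borrow { loan, place } = *stmt {
            for (offset, later) in stmts[start + 1..].iter().enumerate() {
                match *later {
                    // Stop as soon as something kills this borrow.
                    Stmt::Kill { loan: killed } if killed == loan => break,
                    Stmt::Access { place: accessed } if accessed == place => {
                        found.push((loan, start + 1 + offset));
                    }
                    _ => {}
                }
            }
        }
    }
    found
}

// One short borrow of place 0, with accesses inside and outside its scope.
fn example_body() -> Vec<Stmt> {
    vec![
        Stmt::Borrow { loan: 0, place: 0 },
        Stmt::Access { place: 1 },
        Stmt::Access { place: 0 }, // inside the borrow's scope: conflict
        Stmt::Kill { loan: 0 },
        Stmt::Access { place: 0 }, // after the kill: never examined for loan 0
    ]
}

fn main() {
    println!("{:?}", conflicts_per_borrow(&example_body()));
}
```

With many short-lived borrows, each one only pays for the statements inside its own scope, which is the property described above.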

nikomatsakis (Aug 08 2018 at 09:06, on Zulip):

e.g., here, in keccak, we have many accesses to array[x][y]

nikomatsakis (Aug 08 2018 at 09:06, on Zulip):

those may or may not be borrows, not sure

lqd (Aug 08 2018 at 09:17, on Zulip):

oh interesting

nikomatsakis (Aug 08 2018 at 09:18, on Zulip):

from keccak:

    bb25993: {

lqd (Aug 08 2018 at 09:18, on Zulip):

wow

nikomatsakis (Aug 08 2018 at 09:19, on Zulip):

there may be no borrows at all...

nikomatsakis (Aug 08 2018 at 09:19, on Zulip):

emacs is very slow to search this file :)

nikomatsakis (Aug 08 2018 at 09:20, on Zulip):

well, there must be some

nikomatsakis (Aug 08 2018 at 09:20, on Zulip):

but still:

rg '= &' mir_dump/rustc.f1600.-------.nll.0.mir

nikomatsakis (Aug 08 2018 at 09:20, on Zulip):

but then how could unroll_place take so much time....?

nikomatsakis (Aug 08 2018 at 09:22, on Zulip):

oh...

nikomatsakis (Aug 08 2018 at 09:22, on Zulip):

check_for_invalidation_at_exit also calls it

nikomatsakis (Aug 08 2018 at 09:22, on Zulip):

as does check_if_reassignment_to_immutable_state

nikomatsakis (Aug 08 2018 at 09:22, on Zulip):

hmm

nikomatsakis (Aug 08 2018 at 09:23, on Zulip):

that last one is atrocious

nikomatsakis (Aug 08 2018 at 09:24, on Zulip):

check_for_invalidation_at_exit is only called for each borrow

nikomatsakis (Aug 08 2018 at 09:24, on Zulip):

something about check_if_reassignment_to_immutable_state seems very wrong to me

nikomatsakis (Aug 08 2018 at 09:24, on Zulip):

it's iterating over all things that have been initialized

nikomatsakis (Aug 08 2018 at 09:25, on Zulip):

there must be some faster way to map to the bit we are interested in
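
To illustrate the difference being described, here is a small sketch comparing "iterate over every initialized bit" with "map straight to the bit we care about". Everything in it (`BitSet`, `is_init_slow`, `is_init_fast`, the place numbering) is a hypothetical stand-in and does not mirror the real data structures behind check_if_reassignment_to_immutable_state.

```rust
use std::collections::HashMap;

// Minimal fixed-size bit set (illustrative only).
struct BitSet {
    words: Vec<u64>,
    len: usize,
}

impl BitSet {
    fn new(len: usize) -> Self {
        BitSet { words: vec![0; (len + 63) / 64], len }
    }
    fn insert(&mut self, i: usize) {
        self.words[i / 64] |= 1u64 << (i % 64);
    }
    fn contains(&self, i: usize) -> bool {
        self.words[i / 64] & (1u64 << (i % 64)) != 0
    }
    fn iter(&self) -> impl Iterator<Item = usize> + '_ {
        (0..self.len).filter(move |&i| self.contains(i))
    }
}

// Slow: scan every initialized bit and compare its place with the one we want.
fn is_init_slow(init: &BitSet, place_of_bit: &[u32], place: u32) -> bool {
    init.iter().any(|bit| place_of_bit[bit] == place)
}

// Fast: look up the bit for the place directly, then test that single bit.
fn is_init_fast(init: &BitSet, bit_of_place: &HashMap<u32, usize>, place: u32) -> bool {
    bit_of_place.get(&place).map_or(false, |&bit| init.contains(bit))
}

// Four places (10, 20, 30, 40) mapped to bits 0..4; places 10 and 30 are initialized.
fn example() -> (BitSet, Vec<u32>, HashMap<u32, usize>) {
    let place_of_bit = vec![10, 20, 30, 40];
    let bit_of_place = place_of_bit.iter().enumerate().map(|(bit, &p)| (p, bit)).collect();
    let mut init = BitSet::new(place_of_bit.len());
    init.insert(0);
    init.insert(2);
    (init, place_of_bit, bit_of_place)
}

fn main() {
    let (init, place_of_bit, bit_of_place) = example();
    for place in [10, 20, 30, 99] {
        println!(
            "place {}: slow={} fast={}",
            place,
            is_init_slow(&init, &place_of_bit, place),
            is_init_fast(&init, &bit_of_place, place)
        );
    }
}
```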

nikomatsakis (Aug 08 2018 at 10:07, on Zulip):

filed https://github.com/rust-lang/rust/issues/53189 — I think that check_if_reassignment_to_immutable_state is both slow and wrong

nikomatsakis (Aug 08 2018 at 10:07, on Zulip):

a good combination :)

nikomatsakis (Aug 10 2018 at 08:40, on Zulip):

OK, I was doing some more benchmarks...

nikomatsakis (Aug 10 2018 at 08:40, on Zulip):

it seems like tuple-stress -- as @lqd has noted in the past -- is entirely dominated by the "initialization" checks, for some reason that is not clear to me

nikomatsakis (Aug 10 2018 at 08:41, on Zulip):

i.e., I don't know why it is so different from html5ever but it appears to be?

nikomatsakis (Aug 10 2018 at 08:42, on Zulip):

though reconstruct_statement_effect and friends are still prominent in html5ever

nikomatsakis (Aug 10 2018 at 08:43, on Zulip):

it feels like we ought to be able to skip some of that work "by construction"

nikomatsakis (Aug 10 2018 at 08:44, on Zulip):

for example, the compiler temporaries that we generate

nikomatsakis (Aug 10 2018 at 08:44, on Zulip):

we do not need to do the full initialization checks on them

nikomatsakis (Aug 11 2018 at 00:35, on Zulip):

After https://github.com/rust-lang/rust/pull/53258, I see html5ever as a 1.40 ratio on my machine

nikomatsakis (Aug 11 2018 at 01:40, on Zulip):

before:

Screen-Shot-2018-08-10-at-9.27.01-PM.png

after:

Screen-Shot-2018-08-10-at-9.27.31-PM.png

nikomatsakis (Aug 11 2018 at 01:40, on Zulip):

not bad

nikomatsakis (Aug 11 2018 at 01:41, on Zulip):

now if only the try build would finish on https://github.com/rust-lang/rust/pull/53258 so I could run perf...

nikomatsakis (Aug 11 2018 at 01:41, on Zulip):

/me -> bed

Jake Goulding (Aug 11 2018 at 01:42, on Zulip):

I see html5ever as a 1.40 ratio on my machine

This buries the lede a bit. It previously was 31.5x! Now html5ever is no longer the slowest!

Jake Goulding (Aug 11 2018 at 01:43, on Zulip):

At least a few percent improvement on almost every other test as well

lqd (Aug 11 2018 at 07:31, on Zulip):

wow awesome job! (and perf is running the PR as I write this)

davidtwco (Aug 11 2018 at 07:42, on Zulip):

That's awesome.

lqd (Aug 11 2018 at 09:03, on Zulip):

I’ll rerun the up-to-date versions once I’m back, and try and help simulacrum with the lightweight/daily tracking

nikomatsakis (Aug 11 2018 at 09:59, on Zulip):

#53258 looks pretty good. Not quite as good as I had hoped...

nikomatsakis (Aug 11 2018 at 09:59, on Zulip):

but very good on keccak

lqd (Aug 11 2018 at 10:22, on Zulip):

the escaping paths and redundant borrows optimization worked beautifully *kisses fingers*

nikomatsakis (Aug 11 2018 at 10:36, on Zulip):

yeah, I'm happy with that

nikomatsakis (Aug 11 2018 at 10:36, on Zulip):

I'm toying a bit with https://github.com/rust-lang/rust/issues/52460 locally

nikomatsakis (Aug 11 2018 at 10:36, on Zulip):

I've been looking at cargo and some of the other cases

nikomatsakis (Aug 11 2018 at 10:36, on Zulip):

vs the outliers

nikomatsakis (Aug 11 2018 at 10:36, on Zulip):

and I have to admit that it is getting hard to see places to improve :/

nikomatsakis (Aug 11 2018 at 10:38, on Zulip):

I still think we'll be able to make further progress, but I think the best avenue is sort of stepping back and seeing if we can do things smarter

nikomatsakis (Aug 11 2018 at 10:38, on Zulip):

though I'd like to get njn's opinion, they have an eye for hotspots

nikomatsakis (Aug 11 2018 at 10:38, on Zulip):

I'll ping them on irc maybe (done)

lqd (Aug 11 2018 at 12:42, on Zulip):

they do have an eye for hotspots, said eye which they put into writing the cachegrind tool :D

lqd (Aug 11 2018 at 12:43, on Zulip):

so good

memoryruins (Aug 20 2018 at 17:40, on Zulip):

IMG_0748.PNG

memoryruins (Aug 20 2018 at 17:41, on Zulip):

keccak no longer up top and tuple-stress is so far down the list

lqd (Aug 20 2018 at 17:43, on Zulip):

(html5ever is missing from the list since it doesn't currently build)

memoryruins (Aug 20 2018 at 17:44, on Zulip):

(rip)

memoryruins (Aug 20 2018 at 17:53, on Zulip):

curious whether this will noticeably change html5ever's build memory usage https://github.com/rust-lang/rust/pull/53384#issuecomment-414089811

lqd (Aug 20 2018 at 17:57, on Zulip):

it doesn't seem like max-rss changed

Wesley Wiser (Aug 20 2018 at 18:03, on Zulip):

max-rss for html5ever is down significantly in #53327: https://perf.rust-lang.org/compare.html?start=50503497492e9bab8bc8c5ad3fe403a3a80276d3&end=ed285b7a46c0949465c4c1af1d968de39cc1dbbc&stat=max-rss

lqd (Aug 20 2018 at 18:04, on Zulip):

wow, down to 600MB :)

Wesley Wiser (Aug 20 2018 at 18:07, on Zulip):

Yeah, still a ways to go :)

Jake Goulding (Aug 20 2018 at 20:14, on Zulip):

640MB ought to be enough for anybody

memoryruins (Aug 20 2018 at 20:54, on Zulip):

except for my vps ;-; (really should upgrade that node..)

nikomatsakis (Aug 21 2018 at 15:20, on Zulip):

(html5ever is missing from the list since it doesn't currently build)

wait what

nikomatsakis (Aug 21 2018 at 15:20, on Zulip):

I missed that...

nikomatsakis (Aug 21 2018 at 15:20, on Zulip):

why doesn't it build?

kennytm (Aug 21 2018 at 15:22, on Zulip):

the errors encountered by perf can be found in https://perf.rust-lang.org/status.html

kennytm (Aug 21 2018 at 15:22, on Zulip):

caused by use_extern_macro i think

kennytm (Aug 21 2018 at 15:23, on Zulip):

error: macro-expanded `macro_export` macros from the current crate cannot be referred to by absolute paths
  --> /home/alex/.cargo/registry/src/github.com-1ecc6299db9ec823/string_cache-0.2.29/src/lib.rs:71:13
   |
71 |     pub use atom;
   |             ^^^^
   |
note: the macro is defined here
  --> /tmp/.tmpVgWdo3/target/debug/build/string_cache-3124df44a494850b/out/atom_macro.rs:2:1
   |
2  | / macro_rules! atom {
3  | | (\"sdev\") => { $crate::Atom { unsafe_data: 0x2 } };
4  | | (\"onstart\") => { $crate::Atom { unsafe_data: 0x100000002 } };
5  | | (\"overflow\") => { $crate::Atom { unsafe_data: 0x200000002 } };
...  |
127| | (\"stroke-miterlimit\") => { $crate::Atom { unsafe_data: 0x4f400000002 } };
127| | }
   | |_^

nikomatsakis (Aug 21 2018 at 16:24, on Zulip):

@kennytm hmm -- seems .. bad? That was compiling on stable, I believe? Are we tracking this?

simulacrum (Aug 21 2018 at 16:29, on Zulip):

@nikomatsakis Yes, @Vadim Petrochenkov said it was needed but we can make it a future compat warning for now

nikomatsakis (Aug 21 2018 at 16:29, on Zulip):

ok

nikomatsakis (Aug 21 2018 at 16:29, on Zulip):

can we workaround for the purpose of perf? it's a shame to not be tracking html5ever...

simulacrum (Aug 21 2018 at 16:31, on Zulip):

Yeah I need to write up a patch, hoping for sometime today

simulacrum (Aug 21 2018 at 16:32, on Zulip):

(it's actually in a dependency, string-cache)

simulacrum (Aug 21 2018 at 16:32, on Zulip):

I believe it should be trivial to fix

nikomatsakis (Aug 22 2018 at 14:19, on Zulip):

so I did a bit of measurements on clap-rs. After #53314, I get the following results (very coarse-grained, intentionally) for clap-rs:

athena. perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 5 --tree-max-depth 2 --relative
Matcher    : {do_mir_borrowck}
Matches    : 91
Not Matches: 378
Percentage : 100%

Tree
| matched `{do_mir_borrowck}` (100% total, 0% self)
: | rustc_mir::dataflow::do_dataflow (41% total, 3% self)
: : | <rustc_mir::dataflow::DataflowAnalysis<'a, 'tcx, BD>>::propagate (15% total, 1% self) [...]
: : | <rustc_mir::dataflow::DataflowAnalysis<'a, 'tcx, D>>::propagate_bits_into_entry_set_for (14% total, 13% self) [...]
: | rustc_mir::borrow_check::nll::compute_regions (37% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check (28% total, 0% self) [...]

nikomatsakis (Aug 22 2018 at 14:20, on Zulip):

based on this, it seems like #53328 might be a win (since it reduces the amount of dataflow we do), but it's a bit hard to tell (in particular, it'd be useful to separate out the various dataflows)

nikomatsakis (Aug 22 2018 at 14:20, on Zulip):

improving the compute_regions and type_check code is the other avenue; I guess more digging is in order there

nikomatsakis (Aug 24 2018 at 00:48, on Zulip):

did some more measurements. This time I broke down the dataflow by inserting dummy fns

nikomatsakis (Aug 24 2018 at 00:49, on Zulip):

It looks like flow_inits, flow_ever_inits, and flow_move_outs each comprise about 1/3 of the dataflow time

nikomatsakis (Aug 24 2018 at 00:49, on Zulip):

this explains why #53394 is a win (which @Santiago Pastorino you and I have to get working on again)

nikomatsakis (Aug 24 2018 at 00:50, on Zulip):

it is kind of amazing that flow_ever_inits is so expensive

nikomatsakis (Aug 24 2018 at 00:50, on Zulip):

I think we are computing way more bits there than we probably have to

nikomatsakis (Aug 24 2018 at 00:50, on Zulip):

however #53328 is less obviously going to be a win -- it might be, but it won't really come from skipping the dataflow per se

nikomatsakis (Aug 24 2018 at 00:51, on Zulip):

Screen-Shot-2018-08-23-at-8.51.01-PM.png

nikomatsakis (Aug 24 2018 at 00:51, on Zulip):

that's the relevant part of the profile

nikomatsakis (Aug 24 2018 at 00:51, on Zulip):

(those are "percent of time spent in borrowck")

nikomatsakis (Aug 24 2018 at 00:52, on Zulip):

also, this is a handy macro

macro_rules! inline_never {
    ($name:ident, $body:expr) => {
        {
            #[inline(never)]
            fn $name<R>(arg: impl FnOnce() -> R) -> R { arg() }
            $name(|| $body)
        }
    }
}

nikomatsakis (Aug 24 2018 at 00:53, on Zulip):

have to keep that in my back pocket...

nikomatsakis (Aug 24 2018 at 00:53, on Zulip):

you can basically do let foo = inline_never!(bar, something); and the time spent evaluating something will show up in the profile labeled bar
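
A small usage sketch of the macro, under the assumption that it is compiled as a standalone program. `sum_of_squares` and the label `profile_sum_of_squares` are made-up names for illustration; the macro itself is the one quoted above.

```rust
// The macro from the message above, repeated so this example is self-contained.
macro_rules! inline_never {
    ($name:ident, $body:expr) => {
        {
            // A never-inlined wrapper whose name shows up as its own frame in a profile.
            #[inline(never)]
            fn $name<R>(arg: impl FnOnce() -> R) -> R { arg() }
            $name(|| $body)
        }
    }
}

// Hypothetical piece of work we would like to see as a separate profile entry.
fn sum_of_squares(n: u64) -> u64 {
    (1..=n).map(|x| x * x).sum()
}

fn main() {
    // Time spent evaluating `sum_of_squares(1_000)` is attributed to a
    // function named `profile_sum_of_squares` in the profile.
    let total = inline_never!(profile_sum_of_squares, sum_of_squares(1_000));
    println!("{}", total);
}
```

Because the body is wrapped in a closure, it can still borrow local variables from the surrounding function, so the macro can be dropped around an existing expression without restructuring it.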

nikomatsakis (Aug 24 2018 at 00:54, on Zulip):

yeah so we only ever look up locals in the ever_inits map

nikomatsakis (Aug 24 2018 at 00:54, on Zulip):

though this could plausibly change in the future

Santiago Pastorino (Aug 24 2018 at 02:06, on Zulip):

@nikomatsakis let’s get into it whenever you want/like :slight_smile:

lqd (Aug 24 2018 at 16:16, on Zulip):

@simulacrum does this look correct to you ? https://github.com/rust-lang-nursery/rustc-perf/pull/278

lqd (Aug 24 2018 at 16:16, on Zulip):

oh but there's CI so we'll see soon enough :)

simulacrum (Aug 24 2018 at 16:17, on Zulip):

@lqd Left a comment, but looks good

lqd (Aug 24 2018 at 16:18, on Zulip):

only one run coming up

lqd (Aug 24 2018 at 16:20, on Zulip):

done

simulacrum (Aug 24 2018 at 16:21, on Zulip):

Ping me when CI passes but I'll try to check in later today as well

lqd (Aug 24 2018 at 16:21, on Zulip):

how long does CI usually take ?

lqd (Aug 24 2018 at 16:24, on Zulip):

(answering my own question, and for people at home, from travis' history, around 30-40mins)

simulacrum (Aug 24 2018 at 16:28, on Zulip):

And this is with the servo and cargo benchmarks removed because they are so slow....

lqd (Aug 24 2018 at 16:59, on Zulip):

@simulacrum I waited patiently for it ... to fail, because even though I knew a Cargo.lock was needed, the crate .gitignore'd it and I didn't notice :)

lqd (Aug 24 2018 at 18:04, on Zulip):

@simulacrum CI's green https://github.com/rust-lang-nursery/rustc-perf/pull/278

lqd (Aug 28 2018 at 19:29, on Zulip):

ok so I automated a bit of the timings, similar to rustc-perf (but I can't use perf so it's just manual). It's meant to be informative only (I don't have the exact same profiles as perf.rlo, and I might have made mistakes), but this is on the nightly from 2 days ago, and it's looking good I think (if accurate...):

| crate | version | cargo check range | NLL cargo check range | min ratio |
|---|---|---|---|---|
| cargo | perf.rlo – 0.29.0 69d61e | 8158 ms - 8879 ms | 8953 ms - 9139 ms | 1.09 |
| cargo | master – 0.30.0 4e53ce | 8512 ms - 8856 ms | 9342 ms - 9374 ms | 1.09 |
| clap-rs | perf.rlo – 2.29.0 | 4241 ms - 4486 ms | 5342 ms - 5552 ms | 1.26 |
| clap-rs | latest – 2.32.0 | 2674 ms - 2720 ms | 2936 ms - 2968 ms | 1.09 |
| html5ever | perf.rlo – 0.5.4 | 2203 ms - 2394 ms | TODO | TODO |
| html5ever | latest – 0.22.3 | 1416 ms - 1449 ms | 1532 ms - 1564 ms | 1.08 |
| hyper | perf.rlo – 0.5.0 | 1584 ms - 1650 ms | 1687 ms - 1773 ms | 1.06 |
| hyper | latest – 0.12.7 | 2311 ms - 2464 ms | 2454 ms - 2742 ms | 1.06 |
| inflate | perf.rlo – 0.1.0 | 1836 ms - 1882 ms | 2286 ms - 2369 ms | 1.24 |
| inflate | latest – 0.4.3 | 507 ms - 537 ms | 540 ms - 629 ms | 1.06 |
| piston-image | perf.rlo – 0.10.3 | 2279 ms - 2490 ms | 2438 ms - 2597 ms | 1.07 |
| piston-image | latest – 0.19.0 | 2727 ms - 2944 ms | 2912 ms - 2948 ms | 1.06 |
| ripgrep | perf.rlo – 0.8.1 a383d5 | 1284 ms - 1326 ms | 1374 ms - 1465 ms | 1.07 |
| ripgrep | master – 0.8.1 d857ad | 1310 ms - 1315 ms | 1404 ms - 1551 ms | 1.07 |
| serde | perf.rlo – 1.0.37 6e206c | 5273 ms - 5324 ms | 5886 ms - 6241 ms | 1.11 |
| serde | master – 1.0.70 4e54aa | 5785 ms - 6277 ms | 6023 ms - 6088 ms | 1.04 |
| style-servo | perf.rlo | 32209 ms - 33289 ms | 34631 ms - 35339 ms | 1.07 |
| syn | perf.rlo – 0.11.11 | 1131 ms - 1206 ms | 1217 ms - 1258 ms | 1.07 |
| syn | latest – 0.14.5 | 2031 ms - 2138 ms | 2111 ms - 2259 ms | 1.04 |
| ucd | perf.rlo | 6533 ms - 6809 ms | 54169 ms - 55417 ms | I don't wanna talk about it ok it's 8.29 |
| webrender | perf.rlo – 0.57.2 bb354a | 4152 ms - 4369 ms | 4480 ms - 4826 ms | 1.08 |
| webrender | master – 0.57.2 cf9b7803 | 4312 ms - 4609 ms | 4618 ms - 4816 ms | 1.07 |

nikomatsakis (Aug 28 2018 at 19:29, on Zulip):

\o/ nice

lqd (Aug 28 2018 at 19:32, on Zulip):

(that was with 3 warmup builds + 5 timed builds, and the min ratio between those; a mean would be more realistic but less good looking ;)
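
A sketch of how a "min ratio" and a "mean ratio" could differ, under one plausible reading of the message above: the min ratio divides the fastest NLL build by the fastest baseline build, and the mean ratio divides the averages. The timings below are made-up illustrative numbers, not the measurements behind the results above, and the function names are hypothetical.

```rust
// Fastest timed NLL build divided by the fastest timed baseline build.
fn min_ratio(base_ms: &[f64], nll_ms: &[f64]) -> f64 {
    let min = |xs: &[f64]| xs.iter().copied().fold(f64::INFINITY, f64::min);
    min(nll_ms) / min(base_ms)
}

// Average NLL build time divided by the average baseline build time.
fn mean_ratio(base_ms: &[f64], nll_ms: &[f64]) -> f64 {
    let mean = |xs: &[f64]| xs.iter().sum::<f64>() / xs.len() as f64;
    mean(nll_ms) / mean(base_ms)
}

// Five hypothetical timed builds each (milliseconds), after warmups.
fn example_timings() -> (Vec<f64>, Vec<f64>) {
    let base = vec![100.0, 104.0, 110.0, 102.0, 120.0];
    let nll = vec![110.0, 130.0, 125.0, 112.0, 140.0];
    (base, nll)
}

fn main() {
    let (base, nll) = example_timings();
    println!("min ratio:  {:.2}", min_ratio(&base, &nll));
    println!("mean ratio: {:.2}", mean_ratio(&base, &nll));
}
```

The min is less sensitive to noisy outlier runs, which is why it tends to look better than the mean when a few builds are slow.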

Last update: Nov 22 2019 at 00:15UTC