@lqd (cc @Frank McSherry) — it looks like some new benchmarks were recently added to perf — in particular
webrender seems to exhibit poor NLL perf, maybe we should add that one for testing.
did Reed's PR to emit the `invalidates` facts land in rustc?
at least so I can quickly try 1) leapfrog on it, 2) the location-insensitive prepass on it
/me tries -Znll-facts :3, result: it does have them all yay
one challenge: figuring out which fns are slow. Might just be the biggest ones. Previously I did timing measurements. (Would also be worth looking again at clap to see which other fns are slow.) I don't know whether those timing measurements are in nightly though (you could see the results for individual functions with
-Ztime-passes; I'd probably have to recreate the patch)
this is what was inside my reply box but didn't hit send :) webrender has 4-5 subcrates, dependencies, etc & must have a lot of functions
More benchmarks is great. Is there any reason a :frog: based non-NLL phase (perhaps this is the location-invariant version) which does the old LL borrow checking should be any slower? Probably not worth worrying too much about the perf on problems well addressed by that, and rather just on the gap between the two.
(( Also, just responded to @nikomatsakis's mention, and then realized that there is already some chitchat, sorry. ))
this is basically the location-insensitive variant (although it is not lexical)
the location-insensitive variant is actually less expressive in some ways — notably it doesn't track where a borrow was introduced and limit its effects to locations reachable from there — but I think likely still serves as an effective pre-screen
keep in mind though that the NLL numbers we see on perf — at least based on the profiles i've done elsewhere, i'll have to repeat for the new cases — are often registering overhead that occurs before/after the "core analysis" that polonius models anyway
(although a major source of the overhead for clap is still squarely in the dataflow that polonius subsumes)
(if needed, we could make the location-insensitive variant somewhat more precise of course)
just trying it on cargo/webrender would tell us whether we need to make it more precise, right?
or would the imprecision most likely be in what it doesn't output?
My intuition (based only on programs that I've written) is that most borrowing / lifetimes don't require NLL, and for the borrows that can be dispatched early with traditional reasoning, they just all get dropped from the NLL input and hooray.
if we try it and find it emits no errors, then we are satisfied (for now)
if it does emit errors, then either we try and adopt the approach where we use those errors to limit the work of the location-sensitive variant — or else we make it more precise so that it screens out errors
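To make the screening scheme above concrete, here's a minimal Rust sketch of the proposed control flow. Everything here is a hypothetical stand-in (the `Facts`/`Error` types and both check functions are toys, not the real polonius API); only the shape of the two-phase decision matters:

```rust
// Hypothetical stand-ins for the real polonius input/output types.
struct Facts {
    loans: Vec<u32>, // loan identifiers, purely illustrative
}

#[derive(Debug, Clone, PartialEq)]
struct Error {
    loan: u32,
}

// Cheap, location-insensitive pass: may report false positives,
// but must never miss a real error. (Toy condition for illustration.)
fn location_insensitive_check(facts: &Facts) -> Vec<Error> {
    facts
        .loans
        .iter()
        .filter(|&&l| l % 2 == 1)
        .map(|&l| Error { loan: l })
        .collect()
}

// Expensive, location-sensitive pass, run only on the flagged candidates,
// screening out the false positives. (Again a toy condition.)
fn location_sensitive_check(_facts: &Facts, candidates: &[Error]) -> Vec<Error> {
    candidates.iter().filter(|e| e.loan > 10).cloned().collect()
}

fn borrow_check(facts: &Facts) -> Vec<Error> {
    let candidates = location_insensitive_check(facts);
    if candidates.is_empty() {
        // The pre-screen found nothing: satisfied (for now).
        return Vec::new();
    }
    // Otherwise, limit the precise pass's work to the flagged candidates.
    location_sensitive_check(facts, &candidates)
}
```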
Sort of related Q: are there any/many benchmarks of programs that exercise the NLL nature of NLL? Like, large programs that don't pass current borrow_ck but should.
I guess they will come in time :)
So, hypothetically if NLL gets turned on there may be a spike in "whoa, we didn't see this sort of behavior before."
(relating to your intuition, clearly every extant rust program does not need NLL...)
I'm not sure exactly what you mean by "this behavior" — like, these sorts of compile times? oh, just "programs exhibiting these properties"?
I guess I was thinking "performance defects in NLL reasoning" akin to whatever might be stressing out webrender.
I certainly expect a period — after turning on NLL — of bug reports related to it, whether they be perf or correctness...
can we get facts when using cargo ?
cargo rustc -- -Znll-facts probably works
It would be pretty not-hard to add in a bit of diagnostic code: in each of the join steps, one can run a timer and attribute the resulting Duration to the destination relation, and print everything out in Drop code. I've got something that does this for tuple counts already, and it could be extended should the need arise for more consistent diagnostic machinery (à la Soufflé).
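The timing idea just described could look roughly like this in plain Rust (datafrog itself omitted; `time_join` and the relation name are hypothetical, and a real version would wrap each `from_join` call):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Accumulates time per destination relation, printing a summary on Drop.
#[derive(Default)]
struct JoinTimers {
    per_relation: HashMap<&'static str, Duration>,
}

impl JoinTimers {
    // Run one join step, attributing its duration to the `dest` relation.
    fn time_join<R>(&mut self, dest: &'static str, join: impl FnOnce() -> R) -> R {
        let start = Instant::now();
        let result = join();
        *self.per_relation.entry(dest).or_default() += start.elapsed();
        result
    }
}

impl Drop for JoinTimers {
    fn drop(&mut self) {
        // Dump the accumulated per-relation totals at the end of the run.
        for (relation, total) in &self.per_relation {
            eprintln!("{relation}: {total:?}");
        }
    }
}
```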
unfortunately, dumping facts with cargo doesn't produce anything; this is going to be tougher than I expected :) it's time to bring out cargo -v :3
@lqd I think you need to add #![feature(nll)] too (or the suitable
oh interesting, rustc-perf is a bit obscure from the outside :)
good to know, indeed cargo + the feature = :thumbs_up:
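For posterity, the combination that worked here (flag and attribute names as of the nightly in this discussion; they may have changed since):

```shell
# In the crate root (src/lib.rs or src/main.rs), opt in to NLL:
#   #![feature(nll)]

# Then ask rustc, via cargo, to dump the polonius input facts:
cargo rustc -- -Znll-facts

# The facts should end up under ./nll-facts/, one subdirectory per function.
```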
interesting, just checking the facts: it might also not be a single slow function. They seem small-ish for webrender itself (the biggest 20 fns combined are smaller than the clap dataset), so maybe time-passes would indeed be worthwhile (or the time could also be spent in slow dependencies)
@nikomatsakis what I'm seeing is this:
1) time: 130.060; rss: 292MB	MIR borrow checking
2) a couple thousand solve_nll_region_constraints timed at 0.000
3) 2 or 3 timed at 0.001
Should I be looking at something in particular?
(btw, is rustc doing the NLL analysis in parallel, e.g. $nb_cores functions at a time? if not, could we? there must be some intricacies in collating results, but at least spinning up multiple datafrog computations at the same time seems doable)
We are not. I would encourage you not to think about parallelism: I think we should strive to make it work on a single core.
1. we are actively working on adding parallelism within crates and queries, which would mean that we would process N functions at once.
2. we often compile N crates at once
3. once we have those pieces in place, we can yes imagine doing parallel sorts and so forth — but we would want to balance resource usage overall
I think we have a story there, but we shouldn't look to parallelism alone as the "salvation", I guess is what i'm saying .. often the cores will be busy elsewhere :)
agreed, it was just a random thought :)
that said, we should probably try it out
so let me weaken my statement :)
that is, once we have those pieces in place — in particular, rustc will have a fork of rayon that will hopefully eventually be the real rayon — it'd be nice if we already knew how best to take advantage of it!
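When that lands, per-function parallelism could look roughly like the sketch below, using std scoped threads in place of rayon (`analyze_fn` and the "body sizes" are toy stand-ins for the real per-fn analysis; a real version would cap the thread count or just use rayon's `par_iter`):

```rust
use std::thread;

// Toy stand-in for the per-function borrow-check analysis.
fn analyze_fn(body_size: u64) -> u64 {
    (0..body_size).sum()
}

// Analyze each function body on its own scoped thread, collecting
// results in the original order. The analyses are independent, so
// no coordination is needed beyond joining the handles.
fn analyze_all(bodies: &[u64]) -> Vec<u64> {
    thread::scope(|s| {
        let handles: Vec<_> = bodies
            .iter()
            .map(|&b| s.spawn(move || analyze_fn(b)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```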
@nikomatsakis btw did you see the previous "130s mir borrow checking" without easy-to-notice slow subtasks? is there maybe a way to get more information about the timed passes (besides profiling rustc)?
@lqd so time-passes is basically useless
and you should ignore it
that is, it is not telling you what it looks like it is telling you
we are in the process of replacing it with something that will give realistic numbers
e.g., in that output, it is not clear what composes those 130s
it includes at least mir borrow checking...but quite possibly other things, like mir construction
that said, I had a locally extended version of the compiler
that hijacked time-passes to dump per-fn information in a very narrow way :)
and I was using that to identify slow functions
so @lqd what info were you looking for exactly? (before I went on my rant...)
I guess the short answer is no, there is no easy way to get info — profiling rustc (e.g., with perf) is the way to do it
I tried compiling webrender with NLL and it was indeed "slow", so I was looking for a way to narrow down where this time was spent
more precisely which "use case" could be extracted for benchmarking in polonius
(before I leave for rustfest until tuesday)
let me go looking for my patch
If anyone feels like setting up LTTng on their machine, I do have a local patch that dumps some stats about variable updates over its user-space tracing (UST) channels.
Should be possible to upstream it; it's all behind a feature flag, but I'm not sure how useful it would be
@Reed Koser LTTng is an interesting project! Do you happen to know if it supports macOS, or of an alternative that might?
I believe it's Linux-only, unfortunately. Probably not for technical reasons (i.e. you could port it), but just because there's only a relatively small number of contributors. I don't know what OSX has for tracing, unfortunately
most of the really robust tracing tools are deeply hooked in to the kernels of their respective systems (general purpose tracers) or bespoke (things like Chrome/V8's internal profiler/Gecko's profiling tools, etc.)
even LTTng started as a kernel tracing tool, and TraceCompass (the officially sanctioned graphical frontend) is... subpar. You define visualizations using this weird and feature-incomplete XML DSL, or by using
babeltrace to pull the traces into Python and then using some of the stuff from Python's data science community to generate imagery
cross-platform userspace tracing is on the (extremely long...) list of yaks I want to shave some day :upside_down_face: