We're holding a second team meeting in this stream on Wednesday, May 8th at 7:00 AM EDT / 13:00 CEST (click for more time zones). We expect the meeting to be about 1 hour long.
Proposed meeting agenda:
- Discuss the current state of the tool
- Validate that the next step in the tracking issue (#58967) regarding changes to rustc-perf is correct and still valid.
- Discuss requested features, enhancements, and changes with a focus on beginner level issues to help onboard new contributors.
- Other topics?
All are welcome to attend!
So the agenda is as posted above.
Is there anything we should add?
I guess we'll just run down the agenda top to bottom then
So status update on the tool
We've got a library, measureme, which provides a fast binary encoding for profiler events.
As of ~1 month ago, this got merged into rustc, so any recent nightly is using it
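(As an aside, here's a minimal sketch of what a compact binary encoding for profiler events could look like. This is a hypothetical illustration, not measureme's actual on-disk format: fixed-size little-endian records that reference a string table instead of storing inline strings.)

```rust
// Hypothetical illustration only -- NOT measureme's actual format.
use std::io::{self, Write};

/// A made-up fixed-size profiler event: interned string IDs for the
/// event kind and label, plus start/end timestamps in nanoseconds.
struct Event {
    kind: u32,     // index into a string table, e.g. "query"
    label: u32,    // index into a string table, e.g. "typeck"
    start_ns: u64,
    end_ns: u64,
}

impl Event {
    /// Serialize as 24 little-endian bytes; fixed-size records keep
    /// encoding cheap and make the file easy to scan without a parser.
    fn write_to<W: Write>(&self, out: &mut W) -> io::Result<()> {
        out.write_all(&self.kind.to_le_bytes())?;
        out.write_all(&self.label.to_le_bytes())?;
        out.write_all(&self.start_ns.to_le_bytes())?;
        out.write_all(&self.end_ns.to_le_bytes())
    }
}

fn main() -> io::Result<()> {
    let mut buf = Vec::new();
    Event { kind: 0, label: 1, start_ns: 1_000, end_ns: 5_250 }.write_to(&mut buf)?;
    assert_eq!(buf.len(), 24); // two u32s + two u64s
    Ok(())
}
```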
We also have a few tools utilizing the binary format
@Wesley Wiser do you want to give the update?
mmview, which is basically just cat but for the binary format
Yeah, feel free to chime in though :)
summarize, which gives a more detailed version of the old -Z self-profile output
It also provides a JSON export format
And then a few graphical tools
collapse-stacks gives query/activity info in a format that the flame graph tools understand
crox translates the event dump into the Chrome profiler format
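(For reference: flame graph tooling consumes the standard "folded stacks" text format, one semicolon-separated stack per line with a trailing sample count, so collapse-stacks output might look roughly like this -- the query names here are made up for illustration:)

```
typeck;typeck_item 1200
codegen;LLVM_module_codegen 4800
```

(And crox targets Chrome's documented Trace Event JSON format, essentially an array of events; values again illustrative:)

```json
[
  { "name": "typeck", "cat": "query", "ph": "X", "ts": 0, "dur": 1200, "pid": 1, "tid": 1 }
]
```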
We've also had some recent community contributions: documentation and various enhancements.
I think that's the current state
yes, maybe some details on the compiler side:
we do keep all the data in memory until the process exits (at least on Linux)
so max-rss information is skewed
only a 2-3% bump, which is fine by me
yeah, it's not that bad
the question is, does it make the collected data less valuable
probably not really
max-rss is noisy anyway
anyway, the compiler side is almost ready for the MVP
still lots of future improvements possible of course :)
I think that's all my additions to the status update
Ah, yes. I forgot about the output directory discussion. I'll push up a PR for that this week.
Does the filename need to be "predictable" as well?
Currently it's like
I don't think so
For perf.r-l.o, we need to be able to find it
it should include the crate name for readability
for perf.rlo I'd expect there to be a new temp directory for each invocation
there'll be 3 files anyway
sure, seems reasonable-ish (I don't _think_ we have temp dirs today, but I could imagine doing that)
Ok. I can make those changes this week.
(ideally the temp dir would be in a RAM drive, to keep disk IO at a minimum)
we can probably do that, the current machine has 32GB of RAM iirc
but we can discuss this in more detail later, if this seems too complicated for perf.rlo
I'd need to investigate to be sure, will ping folks
So I guess the next question is, can we just turn on -Z self-profile for perf.rlo runs as we were hoping to?
Or is the overhead still too high?
niko asked about this yesterday and had some thoughts in this stream as well https://rust-lang.zulipchat.com/#narrow/stream/187831-t-compiler.2Fwg-self-profile/topic/overhead
I think so, yes -- looking at wall time the biggest differences are a second or two
one thing to consider: this is "virtual" overhead anyway
the user won't see it
i.e. it's not a performance regression
right, yeah -- that's part of why I'm not too concerned
the main question is: will it somehow skew the results?
i.e. will it make our measurements inaccurate in some way
by favoring big benchmarks for example
I would say no. Any cost that measureme imposes is, I think, approximately linear with respect to "work"
or certain test cases
I think so too
we do write a lot of data to the disk though
(it certainly hurts large test cases more than small ones, but that's consistent with that theory)
so we might be more I/O bound?
if we're writing at the end of the session on linux then that's not a problem
There's probably some overhead which isn't uniform across every recorded event. For example, when we reach the end of one of the mmap'd pages and the OS has to allocate another. But that's still roughly "constant" skew over the entire run
so, I'd also say that the data is as useful as it is now
And again, it would scale linearly with the total number of events recorded
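(To illustrate the amortization argument, here's a minimal sketch of an append-only paged buffer -- my own construction, not measureme's actual code. Most appends are a cheap copy; crossing a page boundary occasionally pays for a fresh allocation, analogous to mapping a new page, so the extra cost stays roughly linear in the number of events.)

```rust
// Sketch: an append-only paged buffer. The common case is a cheap copy;
// the occasional slow path allocates a new page (in measureme's case, a
// new mmap'd page), which amortizes across all recorded events.
const PAGE_SIZE: usize = 4096;

struct PagedBuffer {
    pages: Vec<Box<[u8; PAGE_SIZE]>>,
    len_in_last: usize,
}

impl PagedBuffer {
    fn new() -> Self {
        PagedBuffer { pages: vec![Box::new([0; PAGE_SIZE])], len_in_last: 0 }
    }

    fn append(&mut self, bytes: &[u8]) {
        assert!(bytes.len() <= PAGE_SIZE);
        if self.len_in_last + bytes.len() > PAGE_SIZE {
            // Occasional slow path: allocate a fresh page.
            self.pages.push(Box::new([0; PAGE_SIZE]));
            self.len_in_last = 0;
        }
        let page = self.pages.last_mut().unwrap();
        page[self.len_in_last..self.len_in_last + bytes.len()].copy_from_slice(bytes);
        self.len_in_last += bytes.len();
    }
}

fn main() {
    let mut buf = PagedBuffer::new();
    for _ in 0..1_000 {
        buf.append(&[0u8; 24]); // one fixed-size event record
    }
    // 170 records fit per 4096-byte page, so 1000 records span 6 pages.
    assert_eq!(buf.pages.len(), 6);
}
```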
one downside might be that we never measure what the user actually executes
but that's fine
we don't ... really do that today anyway, for the most part
our data isn't super accurate anyway because it's exactly one machine on one platform :)
alright, it sounds like we agree that turning on by default is what we want?
So, I think I'm seeing consensus that we should enable it on perf.rlo and that the overhead is within acceptable margins?
how large are the artifacts being produced?
e.g. for librustc would we expect 1GB, 10GB, or what for the profile data?
tens to hundreds of MBs, I think
ah, okay, so actually not that big -- that sounds good.
@Wesley Wiser do you have numbers around?
I don't think we've actually measured librustc
I've mostly just been playing with the
script-servo would be interesting as an extreme case
Which is ~10MB, I believe
I'm relatively unconcerned then
I have rustc-perf configured on my desktop here, so I can do a test run and see how big the artifacts are
should we make an action item to gather some data here?
@Wesley Wiser yes, please
I think that'd be good (it sort of ties in to some questions I have on the next few bullet points)
Ok. I'll also do that hopefully this week
@simulacrum Do you want to go over those questions now?
Is the profile data itself useful beyond running tooling over it locally? That is, would it make sense to want to keep it around (uploading it to S3, for example)?
I'd say: no
it does compress rather well, otoh
but still, I'd say, do all the post-processing locally and only keep around the results of that
The reason I ask is that if the answer is no then we'd presumably want to run tools on the perf server, which might not be ideal (i.e., need to run N tools and upload N results vs. just local data)
I guess it depends how much functionality we want to expose from perf.rlo
summarize's output is minuscule and that's what the MVP has in it
we could also upload the entire data, but only keep it around for a few days
collapse-stacks output is also very small and pretty useful IMO
So that would be very nice to have available
the other reason I ask is that I suspect it might be far easier to implement blindly uploading these artifacts than trying to figure out other ways to expose it
my initial vision was: run benchmarks -> post-process data -> upload post-processing artifacts -> delete all local data
where post-processing is just running a few of the measureme tools
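(A rough sketch of that post-processing step, assuming the three tools named above are on PATH and accept the profile path as a plain argument -- their exact CLIs are an assumption here, and the paths are hypothetical.)

```rust
// Sketch of "run benchmarks -> post-process -> upload -> delete":
// invoke each measureme tool over the raw profile and keep only the
// small outputs. Tool CLIs and paths are assumptions for illustration.
use std::fs;
use std::process::Command;

fn post_process(profile: &str, out_dir: &str) -> std::io::Result<()> {
    fs::create_dir_all(out_dir)?;
    for tool in ["summarize", "crox", "collapse-stacks"] {
        let output = Command::new(tool).arg(profile).output()?;
        assert!(output.status.success(), "{} failed", tool);
        fs::write(format!("{}/{}.out", out_dir, tool), &output.stdout)?;
    }
    // Upload the (small) post-processing artifacts, then delete the
    // raw profile data locally.
    Ok(())
}

fn main() -> std::io::Result<()> {
    post_process("profile-12345", "postprocessed")
}
```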
If there were an easy way to retrieve the artifacts, then I could see people using it. If not, then it might not be that important to keep around.
hm, okay -- how long does it take to run the measureme tools?
< 1 second each
okay, so that's not a problem then -- I ask because
perf script takes minutes on a big crate's perf record dump
it depends on the data size
(in --release mode :laughing: )
@Wesley Wiser can you collect some data on that as well?
Even for a massive crate, I'd imagine it's < 10 seconds
I'll just test the largest output from
So long as it's less than a minute I'm not too worried; I don't want to be spending an additional half hour per run just running measureme tooling, though :)
like, just making sure it's not more than a few seconds for script-servo would be enough
how parallelizable is this?
you mean running N tools simultaneously?
i.e. if we did it all at the end, then wall-time would be lower
or yes, if we have N different tools, we can do that too
that would be the easiest thing to do
let's wait on @Wesley Wiser's results and decide then
so here's another interesting thing:
most benchmarks are run through N iterations, right?
we somehow need to combine that data
min is taken for measurements today
ok, so we'd only combine (via min) the summarize report
historically my thought has been to just use one run's data to start with
for flamegraph etc, we just take the shortest one, maybe?
We could just use the results from whichever run was the min
well, we do it per statistic... which is maybe not great but
so like max-rss might come from a different run than wall-time
Oh I see
ok, sticking to the current behavior seems fine to me
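(A small sketch of the per-statistic min behavior just described -- note how each statistic's min can come from a different run.)

```rust
// Combine N iterations by taking the minimum per statistic, so e.g.
// max-rss and wall-time may end up coming from different runs.
use std::collections::BTreeMap;

fn combine_runs(runs: &[BTreeMap<String, f64>]) -> BTreeMap<String, f64> {
    let mut combined: BTreeMap<String, f64> = BTreeMap::new();
    for run in runs {
        for (stat, &value) in run {
            combined
                .entry(stat.clone())
                .and_modify(|v| *v = v.min(value))
                .or_insert(value);
        }
    }
    combined
}

fn main() {
    let run1 = BTreeMap::from([("wall-time".to_string(), 2.1), ("max-rss".to_string(), 900.0)]);
    let run2 = BTreeMap::from([("wall-time".to_string(), 1.9), ("max-rss".to_string(), 950.0)]);
    let combined = combine_runs(&[run1, run2]);
    // wall-time comes from run 2, max-rss from run 1.
    assert_eq!(combined["wall-time"], 1.9);
    assert_eq!(combined["max-rss"], 900.0);
}
```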
is the summary output text? JSON? CSV?
okay, that's nice
Yeah, we have JSON
We're also going to support outputting a JSON diff
Between two different runs
yeah, I've been thinking: that's not so useful for perf.rlo
because the user can ask for the diff between arbitrary commits
so the diffing has to be done on the fly
in the browser I mean
ah, we send the data for all compare pages from Rust, so I don't see why it'd be client side
I guess it could be
ah, ok, so you have a webservice implemented in Rust for that?
yeah, perf has a raw hyper backend that serves all the pages and data
(this runs separately from the collection infrastructure)
anyway, what's probably more important is what we want the result to look like
yeah -- I think that result is clickable links on every benchmark run
I had something like this in mind: https://hackmd.io/s/rkERN4lnE
with tables sortable by column, maybe?
and a separate comparison table for each thing we measure
seems feasible -- this might also be too much data, though, not sure
too cluttered, you mean?
we could only show lines above a certain threshold by default
makes sense to me
@Wesley Wiser how does that sound to you?
There was similar discussion actually about the default
Whatever value we decide probably makes sense to apply there as well
maybe we don't need to show all tables
we will want the summarize output to include everything inside perf and then filter (maybe client side) for perf.r-l.o
--json returns all the data
The other options are just for the stdout display
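(A sketch of the threshold-filter idea discussed above, with a placeholder 1% cutoff -- not a decided value.)

```rust
// Show a comparison row only if its relative change exceeds a cutoff;
// the full data is still there, this just filters the default view.
fn filter_rows<'a>(rows: &[(&'a str, f64)], threshold_pct: f64) -> Vec<(&'a str, f64)> {
    rows.iter()
        .copied()
        .filter(|&(_, delta_pct)| delta_pct.abs() >= threshold_pct)
        .collect()
}

fn main() {
    let rows = [("typeck", 3.2), ("parsing", 0.4), ("codegen", -2.5)];
    // With a 1% cutoff, the 0.4% change is hidden by default.
    assert_eq!(filter_rows(&rows, 1.0).len(), 2);
}
```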
I think I'm happy then wrt a plan for perf -- I should also have more time after tomorrow to maybe start looking into an implementation
one more thing:
@simulacrum Feel free to ping me with work items. I'm happy to work on them
the postprocessing tools and the compiler must be in sync, kind of
how do we make sure they are?
does perf.rlo use rustup or something like that?
we use... well, proto rustup
(read: download artifacts from CI ourselves)
I think it would make sense to have measureme tools be a rustup component anyway
but that doesn't contain a Cargo.lock or anything we could use to check for equal versions; I think there was some discussion of the tools "knowing" that they should not run, or otherwise erroring out
would that help here too?
in theory, yes, in practice it's likely more work than just installing them because we don't actually support components today
though we do want to (Windows check builds for winapi being a prominent use case)
Should we try to migrate to using rustup then? Or is that too big a project to block on?
I'd not block on it
we probably shouldn't block, yeah
presumably for now we can just depend on the right version which seems easy enough
I guess we could add a flag to rustc to have it tell us what version of measureme it's using?
(are these tools all CLI or cargo-deps?)
yeah, the format shouldn't change too often
I'm not sure what you mean by cargo-deps
crates for Cargo.toml
They're separate crates in the measureme repo/Cargo workspace
But they're not uploaded to crates.io
right, but libraries vs. binaries is the distinction I'm asking about
Yeah they're all binaries
okay, sounds good -- I wouldn't worry then about versioning anything just yet as a blocking bit
we can update as needed
let's try to add the version header to the file format ASAP, so we know when something's wrong
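(A sketch of the version-header idea: magic bytes plus a format version up front, so mismatched tools can fail fast. The magic value and version number here are made up for illustration.)

```rust
// Check a hypothetical file header: 4 magic bytes + a little-endian u32
// format version, rejecting files a given tool build doesn't understand.
use std::io::{self, Read};

const MAGIC: &[u8; 4] = b"MMPF";       // made-up magic value
const SUPPORTED_VERSION: u32 = 1;      // made-up version number

fn check_header<R: Read>(input: &mut R) -> io::Result<()> {
    let mut magic = [0u8; 4];
    input.read_exact(&mut magic)?;
    if &magic != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a profile file"));
    }
    let mut version = [0u8; 4];
    input.read_exact(&mut version)?;
    let version = u32::from_le_bytes(version);
    if version != SUPPORTED_VERSION {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("file format v{} but this tool supports v{}", version, SUPPORTED_VERSION),
        ));
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let file = [b'M', b'M', b'P', b'F', 1, 0, 0, 0];
    check_header(&mut &file[..])
}
```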
I will try to get some issues or something opened on perf over the next weekish to assign to @Wesley Wiser
(or, well, anyone in the community if we want to diversify)
alright, we should make a list of action items and add them to the tracking issue
(or make existing points there more concrete)
I think that's it?
nothing else I can think of
Since we're over time, let's call this meeting unless @mw has anything to add?
(They seem to have dropped off)
happy to call the meeting, yes
Thanks all! \o
Sorry, yes, I was actually late to the next meeting already