(placeholder for upcoming meeting)
I took some notes for what the meeting could be about: https://hackmd.io/@michaelwoerister/Sy7D3pxPH/edit
contains a suggested agenda
I'll go have lunch now :)
I think @pnkfelix wanted to attend too
i'm here now
let's start then
so my proposed agenda would be:
1. establish the current status
2. find out what we want to focus on next
3. come up with action items
shall we do that? did I forget something important?
Sounds good to me
so, the big project is still the "MVP"
i.e. getting self-profiling onto perf.rlo
we lost a bit of steam over the summer :)
which is fine
it's pretty far along actually
Indeed, yes -- I got the impression when I last looked at this (I think a few months ago) that we are pretty much ready to go ahead and add support to perf.rlo, just that we haven't quite gotten around to it mostly
did we properly specify what it should look like on perf.rlo
I know we chatted about it
I think no -- I recall lots of discussion -- but I think we also decided that an initial bit could be to just get the output of `measureme summarize` or so on the page (via links or something)
the todo list in the tracking issue is still up-to-date, I think
I remember that one should be able to "zoom into" a specific benchmark
and have a comparison of time spent in each query
This may or may not be related: I would like the freedom to add new measurements that capture data at a finer grain, but @mw was worried about that perturbing the UX for people just trying to see the overall picture (presumably via
i.e. you click on a comparison in the comparison view and get to a new page comparing each query
hm, yes, I do recall something to this effect
So we may want to say up front what events will be filtered out from the perf.rlo presentation?
yes, filtering is already implemented and perf.rlo would only do stuff that doesn't add much overhead
(We talked about the details view around here https://rust-lang.zulipchat.com/#narrow/stream/187831-t-compiler.2Fwg-self-profile/topic/meeting.202019-05-08/near/165155575 I think)
ah, well, I think @pnkfelix is talking about visual filtering, not collection wise
I suppose both topics are relevant
I think perf.rlo would basically give you what `summarize` does at the moment
but basically, I want to know what the process is for me to add new instrumentation for my own use that won't perturb what perf.rlo shows
plus comparison between two runs
I am not too concerned with collection overhead, bors is so slow these days that we can keep up no problem (especially with one of the servo crates broken...)
it sounds like the answer is "don't change
I suspect that we'll probably want a `summarize-perf` or equivalent at some point soonish, too, for that reason
perf.rlo has to deal with event types being added and removed anyway, right?
because new queries get added
or removed or renamed
Not today, but yes, in general it does
So, I did land `summarize diff` over the summer, which can handle new/missing events between runs https://github.com/rust-lang/measureme/tree/master/summarize#the-diff-sub-command
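The core of a diff like that can be sketched pretty compactly. This is an illustrative stand-in (the names `diff_runs` etc. are made up here, not measureme's actual code): compare per-query times from two runs, treating a query that only appears in one run as having zero time in the other, so added/removed queries just show their full time as the delta.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch of what a diff between two profiling runs might
/// compute: per-query deltas, tolerating queries that were added or
/// removed between the runs. (Illustrative only; the real logic lives
/// in `summarize diff`.)
fn diff_runs(
    base: &BTreeMap<String, u64>,
    new: &BTreeMap<String, u64>,
) -> BTreeMap<String, i64> {
    let mut out = BTreeMap::new();
    // Visit every query name seen in either run; a missing entry
    // simply counts as 0, so an added query's delta is its full time
    // and a removed query's delta is fully negative.
    for key in base.keys().chain(new.keys()) {
        let b = *base.get(key).unwrap_or(&0) as i64;
        let n = *new.get(key).unwrap_or(&0) as i64;
        out.insert(key.clone(), n - b);
    }
    out
}

fn main() {
    let mut base = BTreeMap::new();
    base.insert("typeck".to_string(), 100u64);
    base.insert("old_query".to_string(), 40); // removed in the new run
    let mut new = BTreeMap::new();
    new.insert("typeck".to_string(), 120u64);
    new.insert("new_query".to_string(), 10); // added in the new run
    let d = diff_runs(&base, &new);
    assert_eq!(d["typeck"], 20);
    assert_eq!(d["old_query"], -40);
    assert_eq!(d["new_query"], 10);
    println!("{:?}", d);
}
```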
`diff` likely won't work well for us, as we don't want to keep the raw event files around, I presume? Though we could plausibly upload them to S3 if they're not too big
(I recall some size estimates earlier)
yeah, we only want the post-processed artifacts, I think
The `diff` subcommand can actually work with the processed json files as well as the raw event files.
that would be a lot of data otherwise
ah, cool, okay -- so maybe we start by getting those processed json files stored as a first step
then we can figure out the rest once we have the raw data
presuming the processed json is "sufficient" for ~all uses?
I think it is for the MVP
anyway, I can likely invest time into this
I think it's sufficient for anything we'd want to expose from perf.rlo at this time
(likely with support from @Wesley Wiser if I have questions about self-profile filtering and such)
I'd be glad to help out!
so, I like roughly the table that `summarize diff` produces as a web page
with the ability to sort by the different columns
I think that would be pretty useful
indeed, yes -- that seems like a pretty good target for MVP (wouldn't be surprised if hardest bit is the sorting :)
the second part of the MVP is reviewing if we are recording the right events
we already have queries and some LLVM and front-end stuff
but trait selection would be interesting too, I think
I can do a review of the kinds of data we are recording
I know we're not currently recording anything for trait selection beyond what's captured by the general query infrastructure.
and make a PR, and maybe in the process, come up with guidelines for when someone wants to add their own stuff
I think one concern from my end is that getting sort of into trait selection etc might be a bit prone to tracking a moving target? I don't really have a good sense of how quickly we expect polonius (so, regionck changes), chalk, etc. to start coming in
But generally speaking the review and PR sounds good!
hm, I guess chalk is still some way off
and we have to expect changes anyway
True, yes, I was just not sure if "some way" was weeks or months
from the outside it would mostly look like a query got removed or added
/me thinks time until chalk is the basis of implementation is measured in months
@nikomatsakis ^ ?
If months then yeah, I think makes sense to invest
We also don't really have a strategy for allowing dependent crates to use
it's not much of an investment and it's no big deal if it goes away again soon, I'd say
can you elaborate, @Wesley Wiser ?
e.g. polonius might like to use `measureme` and have their instrumentations included in the overall
I never thought about that, actually
As it stands, I think doing this currently would result in separate trace files.
hm, can polonius integrate into the rustc event system? maybe via a feature flag?
well, everything goes through the `Profiler` object, right?
It sounds like we sort of want something similar to
though not necessarily decoupled so much
so, any other crates would just need a way of taking a `Profiler` from the outside instead of instantiating their own
the api is pretty much string based, so that would be no problem
That's fair. I think there are some options currently, we just haven't really thought through them. It would be good to do so and have some documentation so that crates like polonius have a guide to integrating with
@simulacrum, I think you are suggesting having some kind of "interface" crate (i.e. `log`) and then an implementation crate, right?
But I don't think we need to talk about that right now.
Yes -- I agree to not needing to discuss right now
I wonder if it would make sense to split out some of the stuff into
but yes, let's table that for now
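For whenever this gets picked back up: the "take a `Profiler` from the outside" idea can be sketched as a small string-based trait that dependent crates accept, while rustc passes in an implementation backed by its real profiler. All of these names (`EventSink`, `CountingSink`, `run_analysis`) are hypothetical, not anything that exists in measureme today.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical interface-crate trait: a minimal, string-based sink
/// that crates like polonius could record events into, mirroring how
/// the real API is "pretty much string based".
trait EventSink: Send + Sync {
    fn record(&self, event_kind: &str, label: &str, nanos: u64);
}

/// Trivial implementation standing in for a real measureme-backed
/// profiler; it just collects the events it is handed.
#[derive(Default)]
struct CountingSink {
    events: Mutex<Vec<(String, String, u64)>>,
}

impl EventSink for CountingSink {
    fn record(&self, event_kind: &str, label: &str, nanos: u64) {
        self.events
            .lock()
            .unwrap()
            .push((event_kind.to_string(), label.to_string(), nanos));
    }
}

/// What a dependent crate's entry point could look like: it takes the
/// sink from the outside instead of instantiating its own profiler,
/// so its events land in the same trace as rustc's.
fn run_analysis(profiler: &Arc<dyn EventSink>) {
    profiler.record("generic_activity", "polonius_borrow_check", 1_234);
}

fn main() {
    let sink: Arc<dyn EventSink> = Arc::new(CountingSink::default());
    run_analysis(&sink);
    println!("events recorded through a shared sink");
}
```

The `log`-style split would then mean the trait lives in a tiny interface crate while rustc's implementation crate provides the concrete sink.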
alright, so I think that is the status of the MVP
the two other things that come up are:
1. people don't know how to use selfprofiling
2. they want a feature that tells them where the compiler spends its time (i.e. on what functions)
topic 1, ergonomics, is pretty important, I think
although, the MVP would take away some of the pressure because perf.rlo would do the hard stuff
it's still rather important though :)
Personally for the ergonomics side I think it might be worth looping in Cargo folks, they've been doing some pretty good work with cargo -Ztiming and we might be able to integrate self-profile into that fairly easily (since Cargo knows about pids, etc)
It's a small thing, but the rustc guide on profiling didn't mention `-Z self-profile`, so I added a pointer to our docs https://rust-lang.github.io/rustc-guide/profiling.html
interesting. I need to look up what cargo -Ztiming even is :)
-Ztiming flag is really cool
see e.g. benchmarks -- we definitely see those being removed/added over time
how much do we care about parity with `-Z time-passes`, in terms of how much information is captured?
I assume it doesn't really matter for MVP
since people who want the `-Z time-passes` data can continue to use it
I think "not at all" personally, time-passes is sort of similar but queries are more precise
(at least, for the parts where we have queries, and my understanding is we can approximate them for the areas where we don't with the current profiling infra)
nice! cargo -Ztiming definitely seems to overlap with what I had in mind for multi-crate self-profiling
I guess my point is more that: if our plan is to eventually tear out the `-Z time-passes` code, then it might be good to first ensure that `-Z self-profile` is covering at least the important cases that `-Z time-passes` did (which may not necessarily mean all the cases it covers)
we are in no hurry to remove -Ztime-passes though, I think
but again, that does not seem to be a requirement for the MVP itself.
so, I think the big task for ergonomics is coming up with a description of what ergonomic usage would even look like
i.e. what is the workflow we want to have
I don't think that is something we should discuss here in detail
but it would be interesting to know if somebody would want to look into that
I'd like to hear from people who've used or tried to use measureme before. We have a lot of data available now but we should know what people are trying to do with it. Is it just finding slow queries or are there other things our tools should optimize for?
yes, that sounds smart :)
so you mean do a general survey?
(maybe on users.rlo ?)
Are general rust users the target audience for measureme?
(I assumed it would be more people hacking on rustc)
internals might be the better venue
oh I suppose it's more an internals.rlo thing
I think at this point no, though we may want to pivot to that a little once we have the span-based tracking of cost
I know a couple people have tried to use measureme but I don't know if they found it useful or not.
@pnkfelix you mentioned using it to investigate incremental performance the other day
i had to add more calls (events?) to get the data I needed
I think it still makes sense to come up with a vision ourselves first, right?
I know @nikomatsakis was trying to use it over the summer as well.
(which is why I've been talking about the workflow for that)
or rather a set of use-cases and then drill down into those
that's a good point
hm, coming up with the set of use cases might be something that would benefit from asking on internals
in particular, I can imagine using this for comparing:
1. two different versions of the compiler
but, let's not forget, we are users too :)
2. two runs of same compiler with different flags
3. a single run of the compiler on some interesting benchmark
4. I suppose one could also use it to observe two runs of same compiler with same flags on two variants of a fixed benchmark.
shall we open a thread on internals, where we just publicly collect use cases?
... does that sound ... plausibly exhaustive?
and others can chime in
I think opening a thread sounds good, though I do feel like that's pretty exhaustive :)
I think concrete examples of problems people tried to investigate might still be helpful
i.e. the use cases are still pretty abstract
@Wesley Wiser, would you be up for starting that thread?
alright, we are at 50 minutes
shall we talk about the other feature people want?
i.e. assigning compilation time to specific items in your code?
Yes, I think that's probably related to some of the feedback we're going to get on irlo
I think that one would need quite a bit of planning
In some sense, we're already tracking it (or could be) as an "input" to queries via tcx.at(span).query(...)
in some sense, yes :)
so one option might be to just dump all those spans into the data we're collecting and then dump the span table at compilation end or so
it's still complicated though, because queries are cached
I would personally be really happy with an MVP that ignored incremental to start
but queries are cached even in non-incremental mode
sorry, what does "ignore incremental" mean?
as in, you can't trust the output you get when incremental compilation is turned on?
it seems like we can just loosely attribute time to all the spans that a query is run with, right?
query invocations form a directed acyclic graph
I think this topic is too complicated for this meeting
I think the issue is that if, for example, we ask for `optimized_mir(foo)` twice, the first time the query runs, it will take some amount of time. But the second time, it will already be cached in the DepGraph so it will be basically free.
So counting all of the time both for the first place that asked for the query and again for the second isn't correct.
Because removing the first site might not actually improve compilation time.
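The double-counting problem can be made concrete with a toy sketch (illustrative only, not rustc/measureme code; all names here are made up): charging every call site with the query's full execution cost claims more work than actually happened, while charging each site with what was actually spent there keeps the totals honest.

```rust
use std::collections::HashMap;

/// One invocation of a query from some call-site span. On a cache hit
/// the executed time is nearly zero; only the first call pays the
/// full cost.
struct Invocation<'a> {
    span: &'a str,
    executed_ns: u64,
}

/// Naive attribution: every call site is charged the full execution
/// cost of the query, double-counting cached invocations.
fn naive(invs: &[Invocation], full_cost_ns: u64) -> HashMap<String, u64> {
    invs.iter()
        .map(|i| (i.span.to_string(), full_cost_ns))
        .collect()
}

/// Measured attribution: each call site is charged only the time that
/// was actually spent there, so cache hits are nearly free.
fn measured(invs: &[Invocation]) -> HashMap<String, u64> {
    let mut out = HashMap::new();
    for i in invs {
        *out.entry(i.span.to_string()).or_insert(0) += i.executed_ns;
    }
    out
}

fn main() {
    let invs = [
        // first call actually runs the query (~50ms)
        Invocation { span: "a.rs:10", executed_ns: 50_000_000 },
        // second call hits the DepGraph cache (~1µs)
        Invocation { span: "b.rs:20", executed_ns: 1_000 },
    ];
    let n = naive(&invs, 50_000_000);
    let m = measured(&invs);
    // Naive counting claims ~100ms of work, but only ~50ms happened.
    assert_eq!(n.values().sum::<u64>(), 100_000_000);
    assert_eq!(m.values().sum::<u64>(), 50_001_000);
}
```

Removing the call site at `a.rs:10` would just shift the full cost to `b.rs:20`, which is exactly why the naive numbers mislead people about what to optimize.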
so, we know there is interest in this feature, but when do we want to spend time on it?
hm, but presumably it's rarely the case that the span changes across those two invocations?
I think it is sort of MVP 2.0
Isn't this info incorporated into the presentation in some fashion? Or am I misremembering?
yes, it is
we can keep track of the whole graph, including cache hits
the information is there, we just need to find out how to usefully interpret it
so is this really a matter of improving the presentation, so that user expectations are managed better?
I think it's mostly a matter of communicating that to the user
Most people seem to want to see something like

```
$ summarize code-profile /path/to/rustc-results

Total time in rustc: 120 seconds
----------------------------------------
| % time | Item                        |
| ------ | --------------------------- |
| 20.4%  | example::foo::bar::clone()  |
| 10.2%  | example::baz::widget::bla() |
(more rows)
```
Accurately communicating where time was spent but also how to actually improve compilation time in such a simple format seems very difficult.
the feature is pretty extensive, actually. a whole sub-product, kind of
I also think most people might be okay with us not initially helping with the latter
Like, if I know the function then I can get LLVM IR, etc and do that work myself more readily
I get the sense that (some) people want this to make changes to their code so it compiles faster. If we don't provide data that helps them achieve that, the tool won't be useful to them.
would it be helpful to have a place where we can just collect ideas and thoughts about this feature?
but where? :)
topic here ?
Maybe a dedicated repo? Similar to unsafe code guidelines etc
Is this feature on topic for the irlo post?
I think that has worked well for us as a community
@Wesley Wiser, not quite
although it will probably come up
(There is an issue for the feature here https://github.com/rust-lang/measureme/issues/51)
We could discuss on that issue for now
@simulacrum, in that repo, where would the discussion happen?
Issues for the most part
yes, let's stick to the existing GH issue for now
and ask folks like @nikomatsakis to post their wishlist/ideas there
alright, does everybody feel that they've got enough action items for the immediate future?
I have the event review + PR for me
I think so
perf.rlo updates for @simulacrum and @Wesley Wiser
(my action item is to start the pre-triage for the rustc meeting. oh wait, that's probably not what you meant)
@Wesley Wiser will post about ergonomics on internals
I'll post the irlo thread, write up meeting notes, and work with @simulacrum on perf.rlo
((i will try to at least provide feedback on my own use case on the internals post))
@pnkfelix, very good!
if you want to be extra helpful, then you could also describe there what using self-profiling would have looked like in the ideal world for your use case
/me merges back into PGO debugging for the rest of the week :ghost:
thanks for attending the meeting everyone !!! :heart: