Stream: t-compiler

Topic: -Zself-profile


nikomatsakis (Nov 30 2018 at 19:22, on Zulip):

BTW @Wesley Wiser are you currently working on extending -Zself-profile in any way?

I ran against the style-servo crate and I got this:

| Phase            | Time (ms)      | Queries        | Hits (%) |
| ---------------- | -------------- | -------------- | -------- |
| Parsing          | 143            |                |          |
| Expansion        | 3532           |                |          |
| TypeChecking     | 16340          | 95991164       | 97.21    |
| BorrowChecking   | 1525           | 718742         | 64.41    |
| Codegen          | 33423          | 3779295        | 87.41    |
| Linking          | 614            | 474496         | 91.10    |
| Other            | 7901           | 85816990       | 97.52    |

Seem like one obvious thing is it'd be nice to have %ages for the time in each of those categories

Wesley Wiser (Nov 30 2018 at 19:23, on Zulip):

@nikomatsakis @mw asked for that the other day

Wesley Wiser (Nov 30 2018 at 19:23, on Zulip):

It's on my TODO list :)

nikomatsakis (Nov 30 2018 at 19:23, on Zulip):

heh ok

Wesley Wiser (Nov 30 2018 at 19:24, on Zulip):

I'm in the middle of tracking down an issue with the inline changes I've been working on but after I've fixed this one, I'll take a break on work on the profiler

nikomatsakis (Nov 30 2018 at 19:25, on Zulip):

ok

nikomatsakis (Dec 28 2018 at 14:15, on Zulip):

So @Wesley Wiser I think the next steps would be:

Longer term:

nikomatsakis (Dec 28 2018 at 14:16, on Zulip):

@mw may have thoughts, though

nikomatsakis (Dec 28 2018 at 14:16, on Zulip):

although I think they are on PTO :)

Wesley Wiser (Dec 28 2018 at 14:23, on Zulip):

Display percentages and perhaps a bit more info -- but I suppose if we can emit JSON or CSV we can kind of play with that independently

Do you mean like this? https://github.com/rust-lang/rust/pull/56702

nikomatsakis (Dec 28 2018 at 14:37, on Zulip):

I suppose I do =)

nikomatsakis (Dec 28 2018 at 14:37, on Zulip):

@Wesley Wiser do we have JSON or CSV output?

Wesley Wiser (Dec 28 2018 at 14:38, on Zulip):

JSON

Wesley Wiser (Dec 28 2018 at 14:38, on Zulip):

Thanks to @simulacrum, it's actually being collected on perf.rlo https://github.com/rust-lang-nursery/rustc-perf/issues/299#issuecomment-444736451

Wesley Wiser (Dec 28 2018 at 14:39, on Zulip):

There's no visualization yet though

simulacrum (Dec 28 2018 at 14:39, on Zulip):

Yep, we're waiting on a proposal though mainly. FWIW I have the time to invest in the next ~week or two so proposals are welcome :)

mw (Jan 16 2019 at 12:50, on Zulip):

I'd like to re-evaluate what we want from -Zself-profile and then set some goals from there.

mw (Jan 16 2019 at 12:52, on Zulip):

Ideally self-profile would answered all questions about "where is the compiler spending its time". perf and cachegrind are not so good for that (for various reasons).

mw (Jan 16 2019 at 12:55, on Zulip):

So some of the requirements for this are:

Björn Steinbrink (Jan 16 2019 at 12:58, on Zulip):

A better break-down of which queries are actually executed would be nice. Just seeing Other or Type checking taking 80% of the time isn't all that useful.

mw (Jan 16 2019 at 12:59, on Zulip):
mw (Jan 16 2019 at 12:59, on Zulip):

A better break-down of which queries are actually executed would be nice.

Agreed, but I think want to go even a step further and record actual start and end timestamps.

Björn Steinbrink (Jan 16 2019 at 12:59, on Zulip):

I also noticed that the time attribution seems to be a bit off. Time increased in one area, but the query count in another. Most likely this is because the timing only starts if the query is not answered from cache?

Björn Steinbrink (Jan 16 2019 at 13:01, on Zulip):

How much more info does -Zprofile-queries already provide? The only time I tried to use that, rustc got OOM'd, 32GB just wasn't enough

mw (Jan 16 2019 at 13:02, on Zulip):
mw (Jan 16 2019 at 13:03, on Zulip):

How much more info does -Zprofile-queries already provide?

I think it collects a lot of data but is not optimized enough for actual profiling.

mw (Jan 16 2019 at 13:04, on Zulip):

I probably stores query keys too.

mw (Jan 16 2019 at 13:05, on Zulip):

but to be honest, I've never used it in earnest and would opt for removing it from the compiler

mw (Jan 16 2019 at 13:07, on Zulip):
mw (Jan 16 2019 at 13:09, on Zulip):
mw (Jan 16 2019 at 13:13, on Zulip):
mw (Jan 16 2019 at 13:22, on Zulip):

I think with detailed profiles like described above we can finally easily answer questions like:

mw (Jan 16 2019 at 13:25, on Zulip):
mw (Jan 16 2019 at 13:30, on Zulip):

Can anyone think of anything else? Does anyone see a problem with my proposed requirements?

Wesley Wiser (Jan 16 2019 at 14:21, on Zulip):

It would be nice to have a way to see the query stacks per query. IE: query a was invoked from query b which was invoked from query c.

mw (Jan 16 2019 at 14:22, on Zulip):

yes, that would be cool

mw (Jan 16 2019 at 14:22, on Zulip):

I mean, this can be inferred once you have time-stamps

Wesley Wiser (Jan 16 2019 at 14:23, on Zulip):

Probably only in single threaded mode right?

mw (Jan 16 2019 at 14:23, on Zulip):

I'd like to record time stamps per thread

Wesley Wiser (Jan 16 2019 at 14:23, on Zulip):

Oh ok

nikomatsakis (Jan 16 2019 at 14:24, on Zulip):

I was thinking that we can have a "profile" mode that records basically everything about execution as a series of "per thread events", right? I thnk this is probably what @mw is describing too. Then we can reconstruct whatever view we want from that

mw (Jan 16 2019 at 14:24, on Zulip):

yes

mw (Jan 16 2019 at 14:25, on Zulip):

it should not be invasive performance-wise

nikomatsakis (Jan 16 2019 at 14:25, on Zulip):

e.g., you might log:

nikomatsakis (Jan 16 2019 at 14:25, on Zulip):

I feel like..even if it is... we can kind of "account for it"

nikomatsakis (Jan 16 2019 at 14:25, on Zulip):

but it probably shouldn't that expensive

mw (Jan 16 2019 at 14:25, on Zulip):

but if we only record the data but don't process it immediately that should be fine

nikomatsakis (Jan 16 2019 at 14:25, on Zulip):

yes, that

nikomatsakis (Jan 16 2019 at 14:25, on Zulip):

right now at least our compilation times are sufficiently big that I am not very worried about this at all =)

mw (Jan 16 2019 at 14:26, on Zulip):

:)

Wesley Wiser (Jan 16 2019 at 14:26, on Zulip):

EDIT: If there is a cheap way to also record the net time spent in a query (i.e. total time minus time spent in sub-queries) that would be nice to have too. The detailed profiling data has the information via timestamps, but maybe we can also do it for summary queries somehow.

This is actually what the summary times currently show. We need to start tracking at the query level though instead of just the category level to get most of the other requirements.

mw (Jan 16 2019 at 14:28, on Zulip):

so if a "TypeCheck" query calls an "Other" query, the execution time of the "Other" query is substracted from the "TypeCheck" query?

Wesley Wiser (Jan 16 2019 at 14:30, on Zulip):

Yes, exactly

nikomatsakis (Jan 16 2019 at 14:33, on Zulip):

it seems like producing the log is step 1

Wesley Wiser (Jan 16 2019 at 14:35, on Zulip):

It would be great if this also worked out of the box with parallel queries.

Wesley Wiser (Jan 16 2019 at 14:35, on Zulip):

I don't really have a handle on how parallel queries works though

nikomatsakis (Jan 16 2019 at 14:50, on Zulip):

I don't think that's a big problem

nikomatsakis (Jan 16 2019 at 14:50, on Zulip):

as long as we make the logs "thread-local" basically

Wesley Wiser (Jan 29 2019 at 22:34, on Zulip):

I have an initial (incomplete) implementation which gives this output

Wesley Wiser (Jan 29 2019 at 22:35, on Zulip):

To make the output easier to understand, I decided to omit queries that took less than 1ms in total. Feel free to push back if you don't like that.

Zoxc (Jan 29 2019 at 22:36, on Zulip):

Could you right align the time columns? =P

Wesley Wiser (Jan 29 2019 at 22:37, on Zulip):

Probably :)

mw (Jan 30 2019 at 10:54, on Zulip):

Nice!

mw (Jan 30 2019 at 10:56, on Zulip):

It would be great if there was a way to easily spot the most expensive queries. Maybe omit the category and just sort them?

mw (Jan 30 2019 at 10:59, on Zulip):

You know what would be awesome for the verbose profiling mode: If we store absolute timestamps (i.e. not relative to the beginning of the rustc process) we can profile the entire crate graph compilation, not just a single crate.

Wesley Wiser (Jan 31 2019 at 00:45, on Zulip):

@mw They are sorted most-to-least expensive under the categories. I think I'll also sort the categories by total time so the most expensive query should be at the top.

We're also storing absolute times in the event log so that is possible :)

Wesley Wiser (Jan 31 2019 at 03:27, on Zulip):
Self profiling results:

| Phase                                     | Time (ms)      | Time (%) | Queries        | Hits (%)
| ----------------------------------------- | -------------- | -------- | -------------- | --------
| Other                                     |            417 |    54.02 |             -1 |       -1
| - {time spent not running queries}        |            352 |    45.60 |             -1 |       -1
| - const_eval                              |             16 |     2.07 |             -1 |       -1
| - type_of                                 |             10 |     1.30 |             -1 |       -1
| - adt_def                                 |              8 |     1.04 |             -1 |       -1
| - const_eval_raw                          |              8 |     1.04 |             -1 |       -1
| - visible_parent_map                      |              7 |     0.91 |             -1 |       -1
| - item_attrs                              |              6 |     0.78 |             -1 |       -1
| - item_children                           |              6 |     0.78 |             -1 |       -1
| - adt_dtorck_constraint                   |              3 |     0.39 |             -1 |       -1
| - adt_destructor                          |              1 |     0.13 |             -1 |       -1
| Linking                                   |            245 |    31.74 |             -1 |       -1
| - {time spent not running queries}        |            245 |    31.74 |             -1 |       -1
| TypeChecking                              |             62 |     8.03 |             -1 |       -1
| - trait_impls_of                          |             10 |     1.30 |             -1 |       -1
| - typeck_item_bodies                      |              9 |     1.17 |             -1 |       -1
| - typeck_tables_of                        |              9 |     1.17 |             -1 |       -1
| - {time spent not running queries}        |              9 |     1.17 |             -1 |       -1
| - evaluate_obligation                     |              7 |     0.91 |             -1 |       -1
| - const_is_rvalue_promotable_to_static    |              5 |     0.65 |             -1 |       -1
| - is_copy_raw                             |              5 |     0.65 |             -1 |       -1
| - rvalue_promotable_map                   |              5 |     0.65 |             -1 |       -1
| - dropck_outlives                         |              2 |     0.26 |             -1 |       -1
| - layout_raw                              |              1 |     0.13 |             -1 |       -1
| Codegen                                   |             30 |     3.89 |             -1 |       -1
| - {time spent not running queries}        |             17 |     2.20 |             -1 |       -1
| - collect_and_partition_mono_items        |             11 |     1.42 |             -1 |       -1
| - mir_const                               |              1 |     0.13 |             -1 |       -1
| - mir_validated                           |              1 |     0.13 |             -1 |       -1
| Expansion                                 |             17 |     2.20 |             -1 |       -1
| - {time spent not running queries}        |             17 |     2.20 |             -1 |       -1
| BorrowChecking                            |              1 |     0.13 |             -1 |       -1
| - borrowck                                |              1 |     0.13 |             -1 |       -1
| Parsing                                   |              0 |     0.00 |             -1 |       -1

Optimization level: No
Incremental: off
mw (Jan 31 2019 at 10:31, on Zulip):

Yeah, I think that's a good solution. There is no clear chronological order to categories anyway.

nikomatsakis (Jan 31 2019 at 14:26, on Zulip):

@Wesley Wiser are you currently capturing stack trace information btw?

nikomatsakis (Jan 31 2019 at 14:26, on Zulip):

(Just curious, this is awesome work regardless! :tada:)

nikomatsakis (Jan 31 2019 at 14:26, on Zulip):

I'm wondering if we can use this to gather some data on some benchmarks before all hands

blitzerr (Jan 31 2019 at 14:37, on Zulip):

This is really awesome @Wesley Wiser

Wesley Wiser (Jan 31 2019 at 14:37, on Zulip):

Thanks!

Wesley Wiser (Jan 31 2019 at 14:39, on Zulip):

@nikomatsakis Not currently but I don't think it would be difficult to add that. I'm not sure how to aggregate and display the stack trace data though.

Do you just want them dumped to a file or something? Or do we want to do some kind of automated deeper analysis?

blitzerr (Jan 31 2019 at 14:42, on Zulip):

@Wesley Wiser
Time not spent running queries what does that indicate ?

Wesley Wiser (Jan 31 2019 at 14:43, on Zulip):

@blitzerr So, for example, in Codegen, we have some queries which are tagged as being codegen related. But we also just a general timer that runs when we call into LLVM. The "Time not spent running queries" represents that time spent in LLVM which isn't covered by a query.

blitzerr (Jan 31 2019 at 14:44, on Zulip):

Is there a PR I can look at ?

blitzerr (Jan 31 2019 at 14:44, on Zulip):

Ahh! Makes sense. thank you

Wesley Wiser (Jan 31 2019 at 14:45, on Zulip):

Not yet. It's all changes I have locally that I haven't pushed yet because it regresses the cache hits/misses data and the json output.

Wesley Wiser (Jan 31 2019 at 14:45, on Zulip):

I'm hoping to have a PR up in the next day or two

blitzerr (Jan 31 2019 at 14:45, on Zulip):

So this profiling is for query based compilation ? I think, at this point not all compiler stages are query based. So we don't have profiling information for that ?

Wesley Wiser (Jan 31 2019 at 14:46, on Zulip):

We start a timer at the beginning of compilation in the "Other" category. So we should have coverage of 99% of time that rustc runs. It just won't be categorized super helpfully.

nikomatsakis (Jan 31 2019 at 14:51, on Zulip):

I'm not sure how to aggregate and display the stack trace data though.

I wouldn't try. I would want an option that dumps out all the data in some format for others to piece through.

nikomatsakis (Jan 31 2019 at 14:51, on Zulip):

Basically, we can provide the "flat data" in a convenient way, but if you want to do more advanced queries, we can provide a tool for that

nikomatsakis (Jan 31 2019 at 14:51, on Zulip):

e.g., I'd probably adapt something like perf-focus

nikomatsakis (Jan 31 2019 at 14:52, on Zulip):

(Also, depending how huge that data is, we could maybe even publish it from perf, which would be amazing -- we might need to adopt some compression techniques, though)

Wesley Wiser (Jan 31 2019 at 14:54, on Zulip):

(Also, depending how huge that data is, we could maybe even publish it from perf, which would be amazing -- we might need to adopt some compression techniques, though)

perf the Linux tool or perf.r-l.o?

nikomatsakis (Jan 31 2019 at 14:56, on Zulip):

I meant publish it from http://perf.rust-lang.org/

mw (Jan 31 2019 at 14:56, on Zulip):

perf.rlo I'd guess

mw (Jan 31 2019 at 14:57, on Zulip):

we'd have to make sure though that profiling doesn't add too much overhead

Wesley Wiser (Jan 31 2019 at 14:57, on Zulip):

Would plain text files in a folder structure be ok? Something like perf_data/{category_name}/{query_name}/stack_traces/{timestamp}.txt

Wesley Wiser (Jan 31 2019 at 14:58, on Zulip):

I guess at that point there could be a perf_data/log.txt at the root which has the raw data in it with timestamps like @mw wanted

mw (Jan 31 2019 at 15:01, on Zulip):

I think that there aren't many restrictions on how the data is written to disk because there'll be a post-processing step outside the compiler anyway

nikomatsakis (Jan 31 2019 at 15:01, on Zulip):

@Wesley Wiser I would prefer one big file I think

nikomatsakis (Jan 31 2019 at 15:01, on Zulip):

but I guess whatever is fine

nikomatsakis (Jan 31 2019 at 15:02, on Zulip):

maybe it's better to have a directory; seems a bit harder to load up

Wesley Wiser (Jan 31 2019 at 15:03, on Zulip):

I don't feel strongly either way. Just want to get it to be whatever you find useful :)

mw (Jan 31 2019 at 15:05, on Zulip):

maybe allowing for one file per thread would make the implementation simpler

nikomatsakis (Jan 31 2019 at 15:05, on Zulip):

yeah I mean ultimately anything is fine

nikomatsakis (Jan 31 2019 at 15:05, on Zulip):

so i'd sort of be guided by the impl

nikomatsakis (Jan 31 2019 at 15:05, on Zulip):

I'm semi-inclined not to do a JSON-representation

nikomatsakis (Jan 31 2019 at 15:06, on Zulip):

because that will require serde etc etc, and I found with perf-focus that time to construct the final data structure is kind of important

nikomatsakis (Jan 31 2019 at 15:06, on Zulip):

so something that let me just charge through line-by-line and push onto some vectors was good

nikomatsakis (Jan 31 2019 at 15:06, on Zulip):

but then against JSON is kind of nice :P and serde is of course fast, it's more that then you have this intermediate thing you have to convert from. There are probably ways to do this in a streaming fashion with serde, I don't really know

mw (Jan 31 2019 at 15:06, on Zulip):

yeah, having some like JSON for something simple like this is probably overkill

Wesley Wiser (Jan 31 2019 at 15:08, on Zulip):

Ok. I'll try the "one large file" approach for now and we can see how that feels

Wesley Wiser (Jan 31 2019 at 15:08, on Zulip):

I'm trying to a PR with these changes up for review before I leave on Saturday for All Hands

nikomatsakis (Jan 31 2019 at 15:09, on Zulip):

(random question, and definitely not for the first pass: I wonder if we can easily emit a gzip'd version in the end, and then easily gunzip when loading; I imagine there will be a lot of repetition. Presumably we have some rust bindings to some such compression library.)

mw (Jan 31 2019 at 15:10, on Zulip):

is that something the compiler would need to do?

Wesley Wiser (Jan 31 2019 at 15:11, on Zulip):

Seems possible https://crates.io/crates/flate2

QuietMisdreavus (Jan 31 2019 at 15:11, on Zulip):

worth noting, i dunno how many files you were thinking of creating if you didn't go with the "one huge file" approach, but emitting loads of files is itself the biggest performance bottleneck in rustdoc

QuietMisdreavus (Jan 31 2019 at 15:12, on Zulip):

so it may slow down the collection process to have a lot of little files

nikomatsakis (Jan 31 2019 at 15:13, on Zulip):

is that something the compiler would need to do?

not necessarily

Wesley Wiser (Jan 31 2019 at 15:14, on Zulip):

@QuietMisdreavus That's a good point. Right now this all runs at the very end of compilation so there shouldn't be IO performance issues during collection. Perhaps it won't be feasible to continue capturing everything into memory directly while getting stack traces though...

Last update: Nov 20 2019 at 01:30UTC