Stream: t-compiler/wg-rls-2.0

Topic: profiling rust-analyzer

michalt (Jan 03 2020 at 12:26, on Zulip):

@matklad I've had a look at profiles of rust-analyzer (when adding just whitespace) and I have a hypothesis wrt. the unaccounted time that we saw (or at least some of it). I looked at Module::from_file that regularly takes >60ms and it seems that this is coming from db.crate_def_map even though I haven't seen crate_def_map_query itself being called. I'm wondering if this time could be spent in salsa -- trying to figure out if something needs to be recomputed or not? (and then deciding that it doesn't, so in the end create_def_map_query is never called)
My experiment:
If my understanding is correct then it seems that this is a bit of a blind spot for how we currently use ra_prof -- we don't know if some unaccounted recomputation is taking long or if the time is spent in salsa. We could potentially add a bunch of more profile around calls to db, but it does feel that this should belong to salsa itself, which has all the information needed to account the time to the right place.

What do you think? Perhaps I'm misunderstanding something? (also do you know whether salsa has some similar profiling mechanism that we could use?)

matklad (Jan 03 2020 at 12:30, on Zulip):

Yeah, that does sound like a resaonable hypothesis. Specifically, salsa might spend quite some time validating that the answer has not changed, without actually recomputing the function. I think the best way to test this hypothesis is to point rust-analyzer to a custom build of salsa (using [patch] section in Cargo.toml) and adding profile calls to salsa itself.

Salsa checks happen here:

matklad (Jan 03 2020 at 12:33, on Zulip):

A further steps here would be:

michalt (Jan 03 2020 at 12:38, on Zulip):

Cool, sounds like a plan! Thanks for the pointers re using a custom salsa build!

matklad (Jan 03 2020 at 12:38, on Zulip):

Oh, one useful thing to do would be merge the changes in your gist. I think we can do this nicely in this way:

matklad (Jan 03 2020 at 12:39, on Zulip):

That way, we'll see somthing like

crate_def_map 9000
  compute_crate_def_map 9000


crate_def_map 600
// nothing here, we know it's validation!
michalt (Jan 03 2020 at 12:40, on Zulip):

Oh, nice! I was thinking how we could do this nicely, but didn't know about salsa::transparent.

Florian Diebold (Jan 03 2020 at 12:57, on Zulip):

crate_def_map probably does have a huge amount of dependencies, doesn't it

matklad (Jan 03 2020 at 13:04, on Zulip):

I wouldn't expect a huge amount: most deps should be const, so we should be left with a raw_items per file

michalt (Jan 03 2020 at 13:08, on Zulip):

Here's what I came up with: -- do let me know if I misunderstood your suggestion! :)

michalt (Jan 03 2020 at 18:32, on Zulip):

I've added some logging to salsa: and it does seem that it's mostly validation. But it's not trivial to understand if some particular thing is expensive or if it's just the sheer number of transitive dependencies. (LMK if you have any suggestions/suspicions) Perhaps salsa should also collect some profiling stats itself?
Anyway, I'll try to do some profiling with perf over the weekend.

matklad (Jan 03 2020 at 19:10, on Zulip):

or if it's just the sheer number of transitive dependencies.

A good thing would be to check how many dependencies are there. As I've mentioned above, I wouldn't expect that many dependencies for a crate_def_map query

Laurențiu Nicola (Jan 03 2020 at 19:26, on Zulip):

Would it be worth increasing RA_LRU_CAP to be sure that's not causing it?

matklad (Jan 03 2020 at 19:28, on Zulip):

We might try to do this to rule it out for sure, but I don't think its an issue: otherwise, we'd see a number of lru queries in the profile

michalt (Jan 05 2020 at 18:32, on Zulip):

I made some progress with understanding what is taking so much time and it seems that the graph is much larger than I expected. It seems that the single ComputeCrateDefMapQuery triggers checking about 62k memoized values in salsa:
We process about ~1.5k FileIds (even though rust-analyzer has less than 600 Rust files..?) and ~19k MacroFiles (!!!), which seem to be the main culprit here (we end up with roughly ~20k checks for each of AstIdMapQuery, RawItemsQuery and MacroArgQuery)
I have no idea why we have so manyFileIds compared to the actual Rust files in rust analyzer. Regarding MacroFile, it seems that we create a separate MacroFile per macro call -- is that correct? What is the reason for this?
Anyway, please do have a look if I missed/misunderstood something!

matklad (Jan 05 2020 at 18:42, on Zulip):

we create a separate MacroFile per macro call -- is that correct? What is the reason for this?

Yes, each macro call creates a virtual file. We use this setup to allow incremental macro exansion, so that in

top_level_macro_call! {
  fn foo() { body }

changes to the body do not invalidate the raw items for the containing file

matklad (Jan 05 2020 at 18:43, on Zulip):

And yes, I would expact more macro files than real files, but the numbers you show do seem excessive. Especially given that there are fewer real files at all :D

matklad (Jan 05 2020 at 18:43, on Zulip):

I guess with this logging the next logical step is to minize example and to see why we get such amplification

matklad (Jan 05 2020 at 18:44, on Zulip):

You can use featureFlags: { "withSysroot": false } flag to exclude standard library, which should help with minimization

michalt (Jan 05 2020 at 19:23, on Zulip):

Quick question: what happens for nested macros (a macro expands into another macro call) -- do we create 1 or more such virtual files?

matklad (Jan 05 2020 at 19:25, on Zulip):


matklad (Jan 05 2020 at 19:25, on Zulip):

So, a recursive macro can indeed create a deep tower of files

michalt (Jan 05 2020 at 19:28, on Zulip):

Oh interesting!

michalt (Jan 05 2020 at 19:30, on Zulip):

I'll play around with some small crate (as you suggested) and see what we can learn. Thanks!

matklad (Jan 06 2020 at 21:46, on Zulip):

For the record, the result of michalt 's investigation is this this epic win of a PR:

Last update: Jan 21 2020 at 09:10UTC