Stream: t-compiler/rust-analyzer

Topic: Is this correct?


kev (Dec 22 2020 at 15:57, on Zulip):

Hi,

Ignoring the use of macros, I was wondering if the following is a correct understanding of the repo.

CrateGraph structure

Creation : Created manually usually from cargo information. 



Usage: Shows the dependencies between crates and metadata such as crate edition, crate config flags, the root file for the crate (lib.rs/main.rs) usually


CrateDefMap structure


Creation: This is used quite a lot in r-a and to be created it needs to go over the whole source file.


Usage: Shows the definitions with a crate. For a given crate, this will essentially be the modules. There is also an extern prelude which I believe is for connecting external crates together. This is different to the prelude which is also a field in this structure.



Def Collector structure

Creation: This is created along with the CrateDefMap creation.



Usage: In order to populate the crate def map, we need to collect the definitions. This is what this data structure does with the collect_defs method.


collect_defs method

Usage : As mentioned above, this is used to create a DefCollector and collect definitions.

Notes: This method will first fetch the crate graph, and get the dependencies for the crate you are trying to fetch definitions for. For each dependency, you will create a CrateDefMap. This means that this is a recursive call. Luckily, the CrateGraph prevents cyclic dependencies between crates.

After collecting the CrateDefMap for the dependency, the dependency name and it’s root moduleID is added to the CrateDefMap of the crate in question. It is added to the extern prelude field. This is how this crate will have access to definitions in an external crate.

Of interest, at the end of the method, a DefCollector structure is instantiated with the def_map of the crate in question. This DefMap now has the DefMap of all of its dependencies indirectly.

def_collector.collect() is then called.


DefCollector::collect method

usage : As mentioned above, the main job of the DefCollector is to collect Definitions in a crate. We could not do this if we did not collect the definitions of it’s dependencies first. We have done this, so now we have all of the information needed.



Note:

The first thing we do is we create an ItemTree from the root_file_id. In most cases the root_file_id will be (lib.rs/main.rs).



Now since a crate is just a bunch of modules, we give the ItemTree to a ModCollector data structure, which collects all of the definitions within the Module using the item tree for that root file. If the root module declares another module ie mod foo, then we also recursively collect that module in this procedure



We then need to do import resolution, ie the use statements. We do not do Name Resolution here it seems.


ModCollector::collect method



usage: Collects all definitions within a module.



Note:

This holds a mut ref to the def collector, because whenever we see a use statement here, we simply add it to the list of unresolved imports and resolve those after. See DefCollector::collect where I mention import resolution

kev (Dec 22 2020 at 15:59, on Zulip):

not sure who to ping, maybe @matklad ?

kev (Dec 22 2020 at 16:12, on Zulip):

If any of this is correct, I'm wondering also:

let k = a + 1; // a is not defined

The part which gives the above error for example. Or are definitions in the CrateDefMap not necessarily resolved lexically yet?

matklad (Dec 22 2020 at 16:21, on Zulip):

This is correct in broad strokes, yeah @kev

matklad (Dec 22 2020 at 16:21, on Zulip):

Btw, if you intend to pursue this investigation furhter, it woudl be really helpful if you can write this all down and submit as a pull reqeust to the architercture.md document :-)

matklad (Dec 22 2020 at 16:22, on Zulip):

Couple of clarification:

DefCollector is only a temporary, ephemeral data structure, and its sole goal is producing the CrateDefMap

matklad (Dec 22 2020 at 16:23, on Zulip):

After DefMap is ready, the collector is gone.

matklad (Dec 22 2020 at 16:24, on Zulip):

Important point about CrateGraph is that it allows several logical crates share the same source file. Ie, you can have CrateId(0) and CrateId(1) both point to src/lib.rs, with the difference that 0 is --cfg=test, and 1 is not

matklad (Dec 22 2020 at 16:25, on Zulip):

Variable resolution happens here: https://github.com/rust-analyzer/rust-analyzer/blob/b28322eed6b977eb566444e8edbb229f4f610cd6/crates/hir_def/src/body/scope.rs

matklad (Dec 22 2020 at 16:26, on Zulip):

TYpe-dependent resolution x.meth() happens during the rest of the type inference, over here: https://github.com/rust-analyzer/rust-analyzer/blob/b28322eed6b977eb566444e8edbb229f4f610cd6/crates/hir_ty/src/infer/expr.rs#L835-L840

matklad (Dec 22 2020 at 16:26, on Zulip):

As for the ItemTree, in the broad strokes its sole purpose is performance optimization

kev (Dec 22 2020 at 16:27, on Zulip):

matklad said:

Btw, if you intend to pursue this investigation furhter, it woudl be really helpful if you can write this all down and submit as a pull reqeust to the architercture.md document :-)

Yep!

matklad (Dec 22 2020 at 16:28, on Zulip):

It would be totally possible to construct CrateDefMap directly from SyntaxNodes, bypassing the item tree.

We, however, insert an ItemTree as a mediator there. ItemTree is build per-file (so, it doesn't know about modules, cfg flags or crates). It's core property is that, if you edit code inside a function's body, the item tree for the corresponing file stays the same.

kev (Dec 22 2020 at 16:29, on Zulip):

matklad said:

Important point about CrateGraph is that it allows several logical crates share the same source file. Ie, you can have CrateId(0) and CrateId(1) both point to src/lib.rs, with the difference that 0 is --cfg=test, and 1 is not

Ah right, I noticed that from the tests, where the same FileID was used multiple times

matklad (Dec 22 2020 at 16:29, on Zulip):

That means that, when you edit inside functions (so, most of the time), we can avoid re-computing name resolution, because name resolution looks only at the ItemTrees, and not at the raw text/sytnax trees

kev (Dec 22 2020 at 16:31, on Zulip):

matklad said:

Variable resolution happens here: https://github.com/rust-analyzer/rust-analyzer/blob/b28322eed6b977eb566444e8edbb229f4f610cd6/crates/hir_def/src/body/scope.rs

If I have an item tree, can I be sure that variables have been resolved?

kev (Dec 22 2020 at 16:33, on Zulip):

matklad said:

It would be totally possible to construct CrateDefMap directly from SyntaxNodes, bypassing the item tree.

We, however, insert an ItemTree as a mediator there. ItemTree is build per-file (so, it doesn't know about modules, cfg flags or crates). It's core property is that, if you edit code inside a function's body, the item tree for the corresponing file stays the same.

Oh this makes sense to me, as you pass the ItemTree to the ModuleCollector which then iterates the items and resolves the module declarations

Jonas Schievink [he/him] (Dec 22 2020 at 16:36, on Zulip):

No names are resolved in the item tree

Jonas Schievink [he/him] (Dec 22 2020 at 16:36, on Zulip):

It's a simplified view of the syntax tree

matklad (Dec 22 2020 at 16:37, on Zulip):

If I have an item tree, can I be sure that variables have been resolved?

The opposite actually: item tree simply doesn't contain information about local variables. This is why it doesn't change when you add new local variable!

matklad (Dec 22 2020 at 16:38, on Zulip):

that's the whole trick behind bening incremental -- you try to actively ignore and avoid knowing more than you stricktly need :)

kev (Dec 22 2020 at 16:38, on Zulip):

Jonas Schievink said:

No names are resolved in the item tree

Not sure why I keep erroring on this fact :(

kev (Dec 22 2020 at 16:40, on Zulip):

matklad said:

If I have an item tree, can I be sure that variables have been resolved?

The opposite actually: item tree simply doesn't contain information about local variables. This is why it doesn't change when you add new local variable!

So If I have

mod foo;
mod bar;

const K : &str = "hello";

fn hello() {

let j = 1u64;
}

The ItemTree would only contain the information about the module declarations, the hello function, and the constant K ?

Jonas Schievink [he/him] (Dec 22 2020 at 16:41, on Zulip):

yeah, and it only knows about item signatures, not bodies

kev (Dec 22 2020 at 16:43, on Zulip):

Jonas Schievink said:

yeah, and it only knows about item signatures, not bodies

Oh so it only know that there is a constant K of type &Str.

If I were to connect what you and matklad are saying, when you say "it only knows about", do you mean it has a stable ID to the definition, but it does not know the contents?

kev (Dec 22 2020 at 16:45, on Zulip):

So even if

     const K : &str = "hello";

changes to

     const K : &str = "world";

This is fine since, what was being stored was:

Item {
   constant : K
  type : &str,
  body_id : 2
}

And body_id : 2 does not change between compilations

Jonas Schievink [he/him] (Dec 22 2020 at 16:46, on Zulip):

yes, see: https://rust-analyzer.github.io/rust-analyzer/hir_def/item_tree/struct.Const.html

kev (Dec 22 2020 at 16:47, on Zulip):

Jonas Schievink said:

yes, see: https://rust-analyzer.github.io/rust-analyzer/hir_def/item_tree/struct.Const.html

Oh thanks, it points to an AST_ID so if I were to move const K down the page, then it would recompile since it is a different AST Node number?

Jonas Schievink [he/him] (Dec 22 2020 at 16:48, on Zulip):

yeah, if you move it past another item

kev (Dec 22 2020 at 16:50, on Zulip):

Oh interesting, that the span changing does not cause it to recompile

kev (Dec 22 2020 at 16:51, on Zulip):

I guess since span is only really needed for error reporting, it is something you can put in a side table or recompute from the absolute width of each node starting from the top...

Thanks for correcting by the way :) matklad and Jonas

Laurențiu (Dec 29 2020 at 17:05, on Zulip):

So ItemTreeData has an exprs field that's not used, what is it supposed to do? Can I remove it?

Jonas Schievink [he/him] (Dec 29 2020 at 17:10, on Zulip):

For now, yeah

Jonas Schievink [he/him] (Dec 29 2020 at 17:10, on Zulip):

I think it was supposed to be used for array lengths

Jonas Schievink [he/him] (Dec 29 2020 at 17:11, on Zulip):

Are you looking into ItemTree memory usage?

Laurențiu (Dec 29 2020 at 17:15, on Zulip):

I tried to, but couldn't find much. I'm not even sure where the memory goes. The query shows 315MB on RA, but if I duplicate all of the ItemTree members (the method from https://rust-analyzer.github.io/blog/2020/12/04/measuring-memory-usage-in-rust.html#amdahls-estimator), it only grows to 445 MB. 430 MB when duplicating only ItemTreeData.

Jonas Schievink [he/him] (Dec 29 2020 at 17:17, on Zulip):

Are you cloning the entire thing?

Laurențiu (Dec 29 2020 at 17:19, on Zulip):

Yes, I added fields with copies of the original ones. But I'm sure I'm missing something.

Jonas Schievink [he/him] (Dec 29 2020 at 17:19, on Zulip):

Hmm, interesting, so that's 200 MB of unaccounted memory

bjorn3 (Dec 29 2020 at 17:19, on Zulip):

Does it contain an Arc or similar?

Jonas Schievink [he/him] (Dec 29 2020 at 17:20, on Zulip):

Shame that there's no good memory profiler, would remove a lot of the guesswork

Laurențiu (Dec 29 2020 at 17:21, on Zulip):

I'm tempted to dump the whole thing to disk, in case something catches my eye -- if I could only figure out how to.

Laurențiu (Dec 29 2020 at 17:27, on Zulip):

I also think we're not calling shirk_to_fit on those HashMaps, but adding them didn't work or didn't do anything

Jonas Schievink [he/him] (Dec 29 2020 at 19:26, on Zulip):

One thing that would be interesting to investigate is to skip the item tree if a macro-generated file consists only of a single macro invocation (something that happens often with self-invoking macros)

Jonas Schievink [he/him] (Dec 29 2020 at 19:27, on Zulip):

I've taken some measurements a while ago that indicated that roughly half of all itemtrees come from macros. Much fewer than I expected, but it could still be worth it.

Laurențiu (Dec 29 2020 at 19:28, on Zulip):

That's a lot of macros

Jonas Schievink [he/him] (Dec 29 2020 at 19:29, on Zulip):

Just tried the profiling thing again and neither Massif nor DHAT nor heaptrack are usable on r-a

Laurențiu (Dec 29 2020 at 19:30, on Zulip):

[Disabling MBE] seems to speed up parsing by some 45%

Forgot about that one :smile: But that's mostly unrelated to the memory usage.

Laurențiu (Dec 29 2020 at 19:31, on Zulip):

I wonder if profiling would work better on some no_std code

Dirkjan Ochtman (Dec 29 2020 at 19:54, on Zulip):

Maybe try nnethercote's Rusty DHAT thing?

Laurențiu (Dec 29 2020 at 19:56, on Zulip):

Dirkjan Ochtman said:

Maybe try nnethercote's Rusty DHAT thing?

https://github.com/rust-analyzer/rust-analyzer/pull/6831

Dirkjan Ochtman (Dec 29 2020 at 21:34, on Zulip):

fair enough

Jonas Schievink [he/him] (Jan 03 2021 at 19:23, on Zulip):

22975 ItemTrees, 20519 from macro expansion, 2456 from normal files

Oops, looks like I was grossly misremembering the numbers

Jonas Schievink [he/him] (Jan 03 2021 at 19:23, on Zulip):

Maybe last time I looked at memory usage, not count

matklad (Jan 04 2021 at 16:05, on Zulip):

oh wow

matklad (Jan 04 2021 at 16:06, on Zulip):

Hm, I wish we could do something about expansions...

matklad (Jan 04 2021 at 16:07, on Zulip):

Like, "there are few files" is an important optimization tool, and macros break that

Edwin Cheng (Jan 06 2021 at 13:54, on Zulip):

Some stats about mbe: https://github.com/rust-analyzer/rust-analyzer/issues/5549#issuecomment-755311449

Jonas Schievink [he/him] (Jan 18 2021 at 13:35, on Zulip):

On nrf-hal, with RSS of about 1.5 GB:

Per-query memory usage:
   471mb ItemTreeQuery
    76mb ImplDataQuery
    52mb HygieneFrameQuery

Yesterday I saw r-a using a total of 3.5 GB, even (but the memory usage reported there was broken)

Jonas Schievink [he/him] (Jan 18 2021 at 13:35, on Zulip):

Perhaps the most realistic way to reduce this is to start doing periodic GC again?

matklad (Jan 18 2021 at 13:38, on Zulip):

Seems reasonable!

matklad (Jan 18 2021 at 13:38, on Zulip):

But also ItemTree feels like it should be smaller perhaps...

Laurențiu (Jan 26 2021 at 09:18, on Zulip):

Do we ever rebuild the item tree?

matklad (Jan 26 2021 at 09:33, on Zulip):

when the file itself chages

Laurențiu (Jan 26 2021 at 09:36, on Zulip):

Oh, then we don't indefinitely accumulate old items

matklad (Jan 26 2021 at 09:38, on Zulip):

Might still be good to observe the counts...

Jonas Schievink [he/him] (Feb 04 2021 at 18:25, on Zulip):

I've stared at DHAT logs and saw a lot of path lowering, so this reduces memory usage of ItemTreeQuery by ~20% https://github.com/rust-analyzer/rust-analyzer/pull/7557

Laurențiu (Feb 04 2021 at 18:46, on Zulip):

Should we also add Counts to these?

Jonas Schievink [he/him] (Feb 04 2021 at 19:00, on Zulip):

we could, though that would make them even larger

Laurențiu (Feb 04 2021 at 19:01, on Zulip):

Count is a ZST, I think

Laurențiu (Feb 04 2021 at 19:01, on Zulip):

https://github.com/matklad/countme/blob/master/src/lib.rs#L70

Jonas Schievink [he/him] (Feb 04 2021 at 19:02, on Zulip):

oh, right, it's a global hashmap

Laurențiu (Feb 05 2021 at 12:37, on Zulip):
hir_def::item_tree::Const               12_225       12_225       12_225
hir_def::item_tree::Enum                 1_360        1_360        1_360
hir_def::item_tree::ExternCrate            804          456          455
hir_def::item_tree::Field               14_995       14_995       14_995
hir_def::item_tree::Function            64_000       64_000       64_000
hir_def::item_tree::Impl                35_714       35_714       35_714
hir_def::item_tree::Import              26_796       26_796       26_796
hir_def::item_tree::ItemTree            46_191       46_191       46_191
hir_def::item_tree::MacroCall           63_149       63_149       63_149
hir_def::item_tree::MacroDef                20           20           20
hir_def::item_tree::MacroRules           1_057        1_057        1_057
hir_def::item_tree::Mod                  3_710        3_710        3_710
hir_def::item_tree::Static                 864          864          864
hir_def::item_tree::Struct               5_252        5_252        5_252
hir_def::item_tree::Trait                  870          870          870
hir_def::item_tree::TypeAlias            7_746        7_746        7_746
hir_def::item_tree::Union                   11           11           11
hir_def::item_tree::Variant              5_666        5_666        5_666
Laurențiu (Feb 05 2021 at 12:39, on Zulip):

Well, that doesn't seem too useful

matklad (Feb 05 2021 at 12:41, on Zulip):

huh?

Laurențiu (Feb 05 2021 at 12:42, on Zulip):

I asked about adding counts to the item tree structs

matklad (Feb 05 2021 at 12:42, on Zulip):

why there are so many consts? I'd expect structs to be more frequent

Laurențiu (Feb 05 2021 at 12:42, on Zulip):

910 in libstd

Laurențiu (Feb 05 2021 at 12:42, on Zulip):
hir_def::item_tree::Const                  910          910          910
hir_def::item_tree::Enum                   211          211          211
hir_def::item_tree::ExternCrate             10            8            7
hir_def::item_tree::Field                1_327        1_327        1_327
hir_def::item_tree::Function            14_233       14_233       14_233
hir_def::item_tree::Impl                 9_390        9_390        9_390
hir_def::item_tree::Import               3_192        3_192        3_192
hir_def::item_tree::ItemTree             4_764        4_764        4_764
hir_def::item_tree::MacroCall            8_205        8_205        8_205
hir_def::item_tree::MacroDef                20           20           20
hir_def::item_tree::MacroRules             228          228          228
hir_def::item_tree::Mod                    606          606          606
hir_def::item_tree::Static                  66           66           66
hir_def::item_tree::Struct                 639          639          639
hir_def::item_tree::Trait                  203          203          203
hir_def::item_tree::TypeAlias            3_092        3_092        3_092
hir_def::item_tree::Union                    3            3            3
hir_def::item_tree::Variant                866          866          866
matklad (Feb 05 2021 at 12:43, on Zulip):

Be a rebel, use a Union XD

Last update: Jul 28 2021 at 03:00UTC