Stream: t-compiler/wg-rls-2.0

Topic: Memory Profiling


matklad (Jun 02 2019 at 09:38, on Zulip):

One of the major problems with rust-analyzer is high memory consumption (unlike rustc, rust-analyzer needs to work with the whole workspace).

I wonder if we can improve visibility into what takes memory?

Currently, analyzer status shows the total heap size (using jemalloc).
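For comparison with the jemalloc total, here is a minimal std-only sketch of how a process can track its own live heap size: a hypothetical `CountingAlloc` wrapper around `std::alloc::System` that adds and subtracts the size of each allocation. Like the jemalloc number, it yields only a total, not the per-type breakdown asked for here.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical counting allocator (not rust-analyzer code): forwards every request
/// to the system allocator and keeps a running total of live heap bytes.
pub struct CountingAlloc;

static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // The default `realloc` and `alloc_zeroed` are built on `alloc`/`dealloc`,
    // so they are counted as well.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Heap bytes currently allocated and not yet freed, across all threads.
pub fn live_heap_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}
```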

I think a breakdown by the type of heap object in memory would be extremely useful. However, I don't know how we can get those stats.

Does anyone have an idea?

Florian Diebold (Jun 02 2019 at 09:52, on Zulip):

I've been intending to try out https://github.com/nokia/memory-profiler with RA

matklad (Jun 02 2019 at 09:54, on Zulip):

Looks interesting! I've tried using heaptrack, but it wasn't too useful. I think both heaptrack and memory-profiler analyze allocations by call stack, and what I really want, I think, is a breakdown by object type.

matklad (Jun 02 2019 at 09:55, on Zulip):

Ideally, this should also work in the analysis-status page...

Florian Diebold (Jun 02 2019 at 09:55, on Zulip):

as for actually tracking counts of objects... just a random idea, but maybe it'd be possible to have a zero-sized AllocationTracker struct that increments a counter on creation/clone and decrements it on drop? which you'd then put into types you want to track :thinking:

Florian Diebold (Jun 02 2019 at 09:56, on Zulip):

the counter would have to be global per tracked type somehow
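A minimal sketch of this idea, assuming each tracked type gets its own static counter through a hypothetical `Tracked` trait (Rust has no generic statics, so a small macro stamps one out per type). `AllocationTracker`, `Tracked`, `track!` and `SyntaxNodeData` are illustrative names, not rust-analyzer code.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicIsize, Ordering};

/// Hypothetical trait: each tracked type supplies its own global live-instance counter.
pub trait Tracked {
    fn live_count() -> &'static AtomicIsize;
}

/// Zero-sized field: +1 on creation or clone, -1 on drop.
pub struct AllocationTracker<T: Tracked> {
    _marker: PhantomData<fn() -> T>,
}

impl<T: Tracked> AllocationTracker<T> {
    pub fn new() -> Self {
        T::live_count().fetch_add(1, Ordering::Relaxed);
        AllocationTracker { _marker: PhantomData }
    }
}

impl<T: Tracked> Clone for AllocationTracker<T> {
    fn clone(&self) -> Self {
        Self::new()
    }
}

impl<T: Tracked> Drop for AllocationTracker<T> {
    fn drop(&mut self) {
        T::live_count().fetch_sub(1, Ordering::Relaxed);
    }
}

/// Gives a type its own `static` counter (one per macro invocation).
macro_rules! track {
    ($ty:ty) => {
        impl Tracked for $ty {
            fn live_count() -> &'static AtomicIsize {
                static COUNT: AtomicIsize = AtomicIsize::new(0);
                &COUNT
            }
        }
    };
}

/// Hypothetical tracked type standing in for something like a syntax node.
#[derive(Clone)]
pub struct SyntaxNodeData {
    pub text: String,
    _tracker: AllocationTracker<SyntaxNodeData>,
}
track!(SyntaxNodeData);

impl SyntaxNodeData {
    pub fn new(text: &str) -> Self {
        SyntaxNodeData { text: text.to_string(), _tracker: AllocationTracker::new() }
    }
}

/// Number of live instances of `T` right now.
pub fn live<T: Tracked>() -> isize {
    T::live_count().load(Ordering::Relaxed)
}
```

This counts live instances, not bytes; multiplying by `size_of` covers only the inline part of each value, not what it owns on the heap.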

matklad (Jun 02 2019 at 09:56, on Zulip):

Yeah, might be a good tool to have!

matklad (Jun 02 2019 at 09:57, on Zulip):

I'll also try servo::heap_size_of. This seems like the right approach overall, but I am not sure if it works in practice

matklad (Jun 02 2019 at 10:04, on Zulip):

Perhaps we should just bite the bullet and add #[derive(DeepSizeOf)] to each type in ra https://docs.rs/deepsize/0.1.1/deepsize/index.html ?

Florian Diebold (Jun 02 2019 at 10:25, on Zulip):

it's worth a try

matklad (Jun 02 2019 at 15:27, on Zulip):

I think I've found a semi-effective strategy to figure out where the memory is

matklad (Jun 02 2019 at 15:27, on Zulip):

[pasted image]

matklad (Jun 02 2019 at 15:42, on Zulip):

So, looks like macro expansion is by far the largest contributor to the current memory usage of rust-analyzer

matklad (Jun 02 2019 at 15:42, on Zulip):
        self.query(hir::db::MacroDefQuery).sweep(sweep);
        self.query(hir::db::MacroArgQuery).sweep(sweep);
        self.query(hir::db::MacroExpandQuery).sweep(sweep);
        self.query(hir::db::ParseMacroQuery).sweep(sweep);

These four queries consume about 1GB for me, out of about 1.5 GB allocated in total

matklad (Jun 02 2019 at 15:47, on Zulip):

cc @Edwin Cheng I think the biggest contributor here is the syntax trees, and I am working on garbage-collecting them better.

However, other queries occupy quite a bit of space due to tt::Subtree as well. I wonder

Edwin Cheng (Jun 02 2019 at 15:52, on Zulip):

First, maybe it is because the current TokenBuffer struct is not optimized: it basically doubles the memory usage, but that should be temporary and not affect the salsa storage (we still use TokenTree to store it).

Edwin Cheng (Jun 02 2019 at 15:56, on Zulip):

> if we have some bug in macro expansion that produces expanded code much bigger than it should be (we have that limit on the maximum size of expanded code; could it be that we sometimes hit it?)

You can enable RUST_LOG=ra_hir=WARN to see whether there are any warnings there, but I doubt that is the case.

Edwin Cheng (Jun 02 2019 at 17:16, on Zulip):

> I wonder if, for storing macro args in salsa, we should use a more compact representation than a real tree of tokens? Perhaps we should allocate the tree into a flat array, or allocate the nodes in an arena?

Should we intern Ident and Literal tokens in general?

matklad (Jun 02 2019 at 17:17, on Zulip):

I don't think so: they store SmolStr inside, and it doesn't allocate for short strings
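To illustrate why short identifiers and keywords need no heap allocation, here is a hypothetical `SmallString` that stores up to 22 bytes inline (an arbitrary choice for this sketch) and falls back to a shared `Arc<str>` for longer text. It follows the same idea as SmolStr but is not SmolStr's actual layout or API.

```rust
use std::sync::Arc;

/// Inline capacity of this hypothetical type (not SmolStr's real constant).
const INLINE_CAP: usize = 22;

/// Short strings live inside the value itself; longer ones go to a shared heap buffer.
#[derive(Clone, Debug)]
pub enum SmallString {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(Arc<str>),
}

impl SmallString {
    pub fn new(text: &str) -> Self {
        if text.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..text.len()].copy_from_slice(text.as_bytes());
            SmallString::Inline { len: text.len() as u8, buf }
        } else {
            SmallString::Heap(Arc::from(text))
        }
    }

    pub fn as_str(&self) -> &str {
        match self {
            // The bytes were copied from a valid &str, so they are valid UTF-8.
            SmallString::Inline { len, buf } => std::str::from_utf8(&buf[..*len as usize]).unwrap(),
            SmallString::Heap(s) => &**s,
        }
    }

    pub fn is_heap_allocated(&self) -> bool {
        matches!(self, SmallString::Heap(_))
    }
}
```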

Edwin Cheng (Jun 02 2019 at 17:21, on Zulip):

But we are using Vec<TokenTree> to store the leaves of a Subtree, so they are still allocated, right? And I can imagine there are tons of duplicated tokens in token trees, namely keywords and identifiers.

matklad (Jun 02 2019 at 17:23, on Zulip):

Hm, I think, for something like struct Foo;, all three tokens would be in a single Vec?

matklad (Jun 02 2019 at 17:23, on Zulip):

That is, that Vec allocation is shared among all leaves, so it's not that bad

matklad (Jun 02 2019 at 17:23, on Zulip):

but getting rid of that Vec would definitely be sweet!

Edwin Cheng (Jun 02 2019 at 17:24, on Zulip):

Do you want to try a SmallVec version first?

matklad (Jun 02 2019 at 17:24, on Zulip):

Hm, yeah, smallvec could help here.

matklad (Jun 02 2019 at 17:25, on Zulip):

It would be interesting to get statistics about how long subtrees are in general

matklad (Jun 02 2019 at 17:25, on Zulip):

but, long term, arena-based solution seems best to me

matklad (Jun 02 2019 at 17:25, on Zulip):

Or maybe not :-)

matklad (Jun 02 2019 at 17:26, on Zulip):

I imagine, for macro-by-example situations where you can paste a part of input stream into the output stream, you could take advantage of structural sharing

matklad (Jun 02 2019 at 17:26, on Zulip):

in this case, an Arc-tree based structure would be more beneficial

matklad (Jun 02 2019 at 17:27, on Zulip):

It'll also work better for recursive macros, which process a bit of $($tt:tt)* at a time
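Structural sharing can be sketched with reference-counted subtrees: when a macro-by-example transcriber pastes a captured `$tt` into its output, it clones an `Arc` (a reference-count bump) instead of deep-copying the tokens. The `TokenTree` type and the `transcribe` function below are hypothetical simplifications, not rust-analyzer's `tt` crate.

```rust
use std::sync::Arc;

/// Hypothetical token tree whose subtrees are reference-counted.
#[derive(Clone, Debug)]
pub enum TokenTree {
    Leaf(String),
    Subtree(Arc<Vec<TokenTree>>),
}

impl TokenTree {
    pub fn subtree(&self) -> Option<&Arc<Vec<TokenTree>>> {
        match self {
            TokenTree::Subtree(children) => Some(children),
            TokenTree::Leaf(_) => None,
        }
    }
}

/// Sketch of one transcription step for a rule like `($tt:tt) => { foo $tt }`:
/// the captured tree is pasted by cloning its handle, so input and output share it.
pub fn transcribe(captured: &TokenTree) -> TokenTree {
    TokenTree::Subtree(Arc::new(vec![
        TokenTree::Leaf("foo".to_string()),
        captured.clone(),
    ]))
}
```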

Edwin Cheng (Jun 02 2019 at 17:32, on Zulip):

That's true. And if I understand correctly, rustc's libproc_macro token handling is handle-based:
https://github.com/rust-lang/rust/blob/dc6db14e1cd60012f25be4fd8d2eb96ea5b4bb68/src/libproc_macro/bridge/handle.rs

Edwin Cheng (Jun 02 2019 at 17:33, on Zulip):

a Group is defined as struct Group(Handle)

Edwin Cheng (Jun 02 2019 at 17:45, on Zulip):

I just tried to use smallvec instead, but it fails in our case because I forgot that it is recursively defined :)
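The reason SmallVec fails is that the token tree types are recursive. A simplified, hypothetical version of such types (not the exact `tt` definitions; delimiters omitted) shows where the required indirection comes from. It also builds `struct Foo;` from the discussion above, whose three leaves share a single `Vec` allocation.

```rust
/// Hypothetical, simplified token tree types.
#[derive(Clone, Debug)]
pub enum TokenTree {
    Leaf(Leaf),
    Subtree(Subtree),
}

#[derive(Clone, Debug)]
pub enum Leaf {
    Ident(String),
    Punct(char),
}

#[derive(Clone, Debug)]
pub struct Subtree {
    // `Vec` stores its elements behind a pointer, so `Subtree` has a fixed size even
    // though it (indirectly) contains more `TokenTree`s. An inline buffer such as
    // `SmallVec<[TokenTree; 4]>` would make the type contain itself by value, which
    // rustc rejects because the type would have infinite size.
    pub token_trees: Vec<TokenTree>,
}

impl Subtree {
    /// Number of token trees below this subtree, counting nested subtrees and their contents.
    pub fn count_tokens(&self) -> usize {
        self.token_trees
            .iter()
            .map(|tt| match tt {
                TokenTree::Leaf(_) => 1,
                TokenTree::Subtree(sub) => 1 + sub.count_tokens(),
            })
            .sum()
    }
}

/// `struct Foo;`: three leaves in one `Vec`.
pub fn struct_foo() -> Subtree {
    Subtree {
        token_trees: vec![
            TokenTree::Leaf(Leaf::Ident("struct".to_string())),
            TokenTree::Leaf(Leaf::Ident("Foo".to_string())),
            TokenTree::Leaf(Leaf::Punct(';')),
        ],
    }
}
```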

Laurențiu Nicola (Jun 03 2019 at 09:46, on Zulip):

> I'll also try servo::heap_size_of. This seems like the right approach overall, but I am not sure if it works in practice

I think it can work; it's the same approach that Firefox uses: https://blog.mozilla.org/nnethercote/2015/06/03/measuring-data-structure-sizes-firefox-c-vs-servo-rust/

Laurențiu Nicola (Jun 03 2019 at 09:46, on Zulip):

(there should be some older blog posts about that, too)

matklad (Jun 03 2019 at 17:03, on Zulip):

@Laurențiu Nicola thanks for that link! That makes me think that we need to ping @nnethercote :)

@nnethercote we'd love to implement something like about:memory for rust-analyzer. Unlike rustc, rust-analyzer is a long-lived process, so we need to get the current memory usage of various structs, and not just the usage after the process has exited.

I am somewhat hesitant to use the heapsize crate:

Do you think heapsize is a safe bet, long-term?

matklad (Jun 03 2019 at 17:03, on Zulip):

(re-ping @nnethercote b/c matklad can't zulip)

matklad (Jun 03 2019 at 17:09, on Zulip):

Ah, I think https://github.com/servo/servo/tree/master/components/malloc_size_of is what I probably want, but it's in servo :-(

Laurențiu Nicola (Jun 03 2019 at 17:10, on Zulip):

I suppose we'd have to instrument salsa as well?

matklad (Jun 03 2019 at 17:11, on Zulip):

@Laurențiu Nicola ideally, yes, but I think just measuring the keys/values themselves would be good enough, and we can do that without touching salsa

nnethercote (Jun 03 2019 at 23:36, on Zulip):

@matklad: I advise against using heapsize. I implemented it in Servo. It was kind of experimental. Some eager beaver (can't remember who) put it on crates.io. It has some major problems, such as not handling Arcs.

nnethercote (Jun 03 2019 at 23:39, on Zulip):

malloc_size_of is much better, and fixes most of the problems with heapsize. However, it relies on some mozjemalloc-specific features, such as the ability to get the size of an allocation when you have an interior pointer (i.e. a pointer that points somewhere in the allocation, but not necessarily at the start). I would also still class it as somewhat experimental.

nnethercote (Jun 03 2019 at 23:39, on Zulip):

These comments are worth reading: https://github.com/servo/servo/blob/master/components/malloc_size_of/lib.rs#L11-L47 and https://github.com/servo/servo/blob/master/components/malloc_size_of/lib.rs#L102-L105

nnethercote (Jun 03 2019 at 23:40, on Zulip):

@Matklad: looks like Webrender has a cut-down fork of malloc_size_of https://github.com/servo/webrender/blob/master/wr_malloc_size_of/lib.rs

nnethercote (Jun 03 2019 at 23:42, on Zulip):

One awkward thing about malloc_size_of is where it sits in the module hierarchy. Ideally it would be a built-in rust thing, and then data structures like HashSet could be built on top of it, rather than it being built on top of HashSet, if you see what I mean.

nnethercote (Jun 03 2019 at 23:46, on Zulip):

In general, live memory measurements are useful, and that's why somebody put heapsize on crates.io. But it's also something of a tricky problem given current language constraints. malloc_size_of is good enough for Servo, but not a truly general-purpose solution that's appropriate for putting on crates.io, IMO.

matklad (Jun 03 2019 at 23:48, on Zulip):

Thanks! I've also found another fork of malloc_size_of on crates.io: https://crates.io/crates/graphannis-malloc_size_of

nnethercote (Jun 04 2019 at 01:56, on Zulip):

Interesting. graphannis-malloc_size_of might be a good place to start.

matklad (Jun 04 2019 at 07:36, on Zulip):

Yeah, I guess, I would either use that, or publish ra_malloc_size_of to crates.io :-) Thanks again, this all was super useful!

Laurențiu Nicola (Jun 04 2019 at 09:08, on Zulip):

What's the issue with deepsize?

matklad (Jun 04 2019 at 09:27, on Zulip):

at the moment, it doesn't support enums

Laurențiu Nicola (Jun 04 2019 at 09:27, on Zulip):

enums?

matklad (Jun 04 2019 at 09:28, on Zulip):

yeah, it just panics on SmolStr, because that contains an enum inside

Laurențiu Nicola (Jun 04 2019 at 09:29, on Zulip):

is that a fundamental limitation, or is just not implemented?

matklad (Jun 04 2019 at 09:29, on Zulip):

just not implemented

matklad (Jun 04 2019 at 09:29, on Zulip):

Like, all of {deepsize,heapsizeof,malloc_size_of} work approximately the same and are not complex in theory
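To make "not complex in theory" concrete, here is a minimal sketch of the idea these crates share: a trait that reports the heap bytes a value owns, implemented recursively, with an enum handled by matching on its variants. `HeapSize` and `Name` are hypothetical and deliberately not the API of deepsize, heapsize, or malloc_size_of.

```rust
use std::mem::size_of;

/// Hypothetical trait: heap bytes owned by a value, excluding the value itself.
pub trait HeapSize {
    fn heap_size(&self) -> usize;
}

impl HeapSize for u32 {
    fn heap_size(&self) -> usize {
        0
    }
}

impl HeapSize for String {
    // Counts reserved capacity, ignoring any rounding done by the allocator.
    fn heap_size(&self) -> usize {
        self.capacity()
    }
}

impl<T: HeapSize> HeapSize for Vec<T> {
    fn heap_size(&self) -> usize {
        self.capacity() * size_of::<T>() + self.iter().map(|x| x.heap_size()).sum::<usize>()
    }
}

impl<T: HeapSize> HeapSize for Box<T> {
    fn heap_size(&self) -> usize {
        size_of::<T>() + (**self).heap_size()
    }
}

/// Enums are not a fundamental problem: match on the variant and sum what it owns.
pub enum Name {
    Inline(u32),
    Owned(String),
}

impl HeapSize for Name {
    fn heap_size(&self) -> usize {
        match self {
            Name::Inline(n) => n.heap_size(),
            Name::Owned(s) => s.heap_size(),
        }
    }
}
```

The hard parts are what this sketch skips: data shared through `Arc` has no obvious owner to charge it to, and `capacity()` ignores allocator rounding, which is where malloc_size_of relies on allocator-specific features.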

matklad (Jun 04 2019 at 09:30, on Zulip):

The question is, which one is closest to being production ready and safe to use across the crate graph

Edwin Cheng (Jun 06 2019 at 17:18, on Zulip):

I managed to get some stats on subtree counts by inserting dbg!(subtree.count) into the macro expansion code:

Edwin Cheng (Jun 06 2019 at 17:18, on Zulip):

total = 32234
mean = 122.89470745175902
median = 64.0
max = 34222

The 100 most frequent subtree.count values:

count => occurrences
58    => 2286
46    => 1535
69    => 1063
24    => 868
81    => 867
0     => 741
74    => 573
87    => 564
23    => 473
53    => 413
25    => 406
18    => 401
93    => 400
218   => 398
57    => 390
99    => 376
7     => 344
21    => 342
101   => 328
54    => 312
32    => 296
34    => 287
29    => 279
105   => 276
27    => 273
43    => 265
28    => 262
48    => 256
70    => 256
15    => 255
84    => 252
51    => 246
22    => 242
35    => 232
44    => 227
39    => 225
31    => 224
72    => 223
111   => 220
64    => 219
33    => 218
63    => 217
90    => 208
37    => 203
50    => 192
47    => 192
41    => 191
26    => 190
61    => 183
76    => 181
117   => 180
103   => 176
96    => 165
91    => 162
77    => 152
52    => 150
17    => 148
30    => 145
60    => 143
79    => 141
193   => 135
38    => 132
302   => 127
73    => 126
56    => 124
55    => 123
59    => 122
12    => 120
42    => 118
129   => 117
49    => 117
45    => 117
108   => 115
106   => 113
115   => 112
40    => 108
135   => 107
16    => 107
126   => 104
114   => 104
102   => 102
67    => 102
97    => 99
88    => 99
83    => 96
112   => 95
123   => 94
78    => 94
75    => 92
94    => 88
95    => 84
65    => 84
62    => 82
66    => 82
141   => 80
147   => 79
113   => 78
10    => 78
80    => 78
207   => 74

Edwin Cheng (Jun 06 2019 at 17:19, on Zulip):

By running ra-cli analysis-stats on the rust-analyzer codebase.

matklad (Jun 06 2019 at 17:25, on Zulip):

> 64

That looks like SmallVec wouldn't help much?

Edwin Cheng (Jun 06 2019 at 17:27, on Zulip):

I think so.

Edwin Cheng (Jun 06 2019 at 17:27, on Zulip):

How about TreeArc?

matklad (Jun 06 2019 at 17:29, on Zulip):

Hm, TreeArc works only with syntax trees, it's not a general-purpose smart pointer. Some kind of arena-based solution seems best here
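An arena-style layout could look like the following minimal sketch, assuming all nodes of one token tree live in a single `Vec` and each subtree refers to a contiguous range of child indices, so the whole tree costs one node allocation instead of one `Vec` per subtree. `FlatTree`, `FlatNode` and `Nested` are hypothetical names; leaf text is a plain `String` here, where a real implementation would more likely use SmolStr or interned names.

```rust
/// One node of the flat tree.
#[derive(Debug, Clone, PartialEq)]
pub enum FlatNode {
    Leaf(String),
    // Children are stored at nodes[start..start + len].
    Subtree { start: u32, len: u32 },
}

#[derive(Debug, Default)]
pub struct FlatTree {
    pub nodes: Vec<FlatNode>,
}

/// Nested input form, used only to build the flat form in this example.
pub enum Nested {
    Leaf(&'static str),
    Subtree(Vec<Nested>),
}

impl FlatTree {
    pub fn from_nested(root: &Nested) -> FlatTree {
        let mut tree = FlatTree::default();
        tree.nodes.push(FlatNode::Leaf(String::new())); // placeholder slot for the root
        tree.fill(0, root);
        tree
    }

    fn fill(&mut self, index: usize, node: &Nested) {
        match node {
            Nested::Leaf(text) => self.nodes[index] = FlatNode::Leaf(text.to_string()),
            Nested::Subtree(children) => {
                // Reserve a contiguous block for the children, then fill each slot;
                // grandchildren are appended after this block.
                let start = self.nodes.len();
                for _ in children {
                    self.nodes.push(FlatNode::Leaf(String::new()));
                }
                self.nodes[index] = FlatNode::Subtree { start: start as u32, len: children.len() as u32 };
                for (i, child) in children.iter().enumerate() {
                    self.fill(start + i, child);
                }
            }
        }
    }

    /// Children of the node at `index` (empty for leaves).
    pub fn children(&self, index: usize) -> &[FlatNode] {
        match self.nodes[index] {
            FlatNode::Subtree { start, len } => &self.nodes[start as usize..(start + len) as usize],
            FlatNode::Leaf(_) => &self.nodes[0..0],
        }
    }
}
```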

Edwin Cheng (Jun 06 2019 at 17:31, on Zulip):

And I forgot to write it down: the sum of all counts is 3961388

Edwin Cheng (Jun 06 2019 at 17:32, on Zulip):

And note that the dbg! is inserted in crate::ids::macro_expand_query, so this is basically the number of token trees stored in salsa
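Since the numbers above are per-subtree token counts, the summary can be reproduced from a plain list of counts. They are also consistent with each other: 3961388 / 32234 ≈ 122.89, the reported mean. A small sketch with hypothetical `summarize` and `histogram` helpers (the tie-breaking order in the histogram is an arbitrary choice and may differ from the table above):

```rust
use std::collections::HashMap;

/// Summary statistics over per-subtree token counts.
#[derive(Debug, PartialEq)]
pub struct Summary {
    pub total: usize,
    pub sum: usize,
    pub mean: f64,
    pub median: f64,
    pub max: usize,
}

pub fn summarize(counts: &[usize]) -> Option<Summary> {
    if counts.is_empty() {
        return None;
    }
    let mut sorted = counts.to_vec();
    sorted.sort_unstable();
    let total = sorted.len();
    let sum: usize = sorted.iter().sum();
    // For an even number of samples, average the two middle values.
    let median = if total % 2 == 1 {
        sorted[total / 2] as f64
    } else {
        (sorted[total / 2 - 1] + sorted[total / 2]) as f64 / 2.0
    };
    Some(Summary { total, sum, mean: sum as f64 / total as f64, median, max: sorted[total - 1] })
}

/// (count, occurrences) pairs, most frequent first; ties broken by smaller count.
pub fn histogram(counts: &[usize]) -> Vec<(usize, usize)> {
    let mut freq: HashMap<usize, usize> = HashMap::new();
    for &c in counts {
        *freq.entry(c).or_insert(0) += 1;
    }
    let mut rows: Vec<(usize, usize)> = freq.into_iter().collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    rows
}
```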

Laurențiu Nicola (Jun 07 2019 at 09:19, on Zulip):

probably a stupid question, but could we compress some of the data?

Laurențiu Nicola (Jun 07 2019 at 09:20, on Zulip):

I assume most space is taken by salsa values (not keys), and those are trees more often than not?

matklad (Jun 07 2019 at 09:29, on Zulip):

Yeah, most of the space is occupied by the parse trees

matklad (Jun 07 2019 at 09:30, on Zulip):

the recent PR adds an LRU for them, but I wonder if there's a more precise strategy
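A minimal LRU sketch, assuming the simplest policy: keep at most `capacity` entries and evict the least recently used one when a new entry arrives. This is a hypothetical `Lru` type, not salsa's implementation; the idea for parse trees is that an evicted value can be recomputed on demand.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical bounded cache with least-recently-used eviction.
pub struct Lru<K, V> {
    capacity: usize,
    map: HashMap<K, V>,
    order: VecDeque<K>, // front = least recently used
}

impl<K: Hash + Eq + Clone, V> Lru<K, V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Lru { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Marks `key` as most recently used. The O(n) scan keeps the sketch short;
    /// a real cache would use a linked hash map.
    fn touch(&mut self, key: &K) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let k = self.order.remove(pos).unwrap();
            self.order.push_back(k);
        }
    }

    pub fn get(&mut self, key: &K) -> Option<&V> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    pub fn insert(&mut self, key: K, value: V) {
        if self.map.contains_key(&key) {
            self.touch(&key);
        } else {
            if self.map.len() == self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.map.remove(&oldest);
                }
            }
            self.order.push_back(key.clone());
        }
        self.map.insert(key, value);
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```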

matklad (Jun 30 2019 at 11:50, on Zulip):

https://github.com/rust-analyzer/rust-analyzer/pull/1463 adds a --memory-usage flag to analysis-stats:

Analysis: 18.062757431s, 891mb allocated 1263mb resident
   206mb MacroExpandQuery
    96mb ParseQuery
    82mb MacroArgQuery
    70mb CrateDefMapQuery
    21mb RawItemsWithSourceMapQuery
     7mb MacroDefQuery
     7mb AstIdMapQuery
     4mb ImplsInModuleWithSourceMapQuery
  3285kb ImplDatumQuery
  2947kb GenericParamsQuery
  2348kb InferQuery
  1901kb BodyHirQuery
  1890kb ParseMacroQuery
  1797kb ImplsForTraitQuery
  1613kb BodyWithSourceMapQuery
   651kb FnDataQuery
   601kb ExprScopesQuery
   359kb ImplsInCrateQuery
   244kb GenericPredicatesQuery
   241kb TypeForDefQuery
   175kb TypeAliasDataQuery
   174kb GenericDefaultsQuery
   153kb CallableItemSignatureQuery
   144kb StructDatumQuery
   124kb EnumDataQuery
    46kb StructDataQuery
    38kb TraitDatumQuery
    20kb TraitItemsIndexQuery
    19kb ModuleLangItemsQuery
    17kb TypeForFieldQuery
    15kb LangItemsQuery
    11kb TraitDataQuery
     8kb NormalizeQuery
   4096b ImplementsQuery
      0b SourceRootCratesQuery
      0b RawItemsQuery
      0b ImplsInModuleQuery
      0b ConstDataQuery
      0b StaticDataQuery
      0b LangItemQuery
      0b DocumentationQuery
      0b AssociatedTyDataQuery

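The report above lists queries largest first and switches units at what looks like a 4096 threshold (4096b stays in bytes, 8kb and 3285kb are in kilobytes, 4mb is in megabytes). A hypothetical formatter reproducing that shape, with thresholds guessed from the output rather than taken from the PR; how each query's size is measured is not shown here:

```rust
use std::fmt;

/// Hypothetical byte formatter; thresholds inferred from the output above.
pub struct Bytes(pub usize);

impl fmt::Display for Bytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let mut value = self.0;
        let mut suffix = "b";
        if value > 4096 {
            value /= 1024;
            suffix = "kb";
            if value > 4096 {
                value /= 1024;
                suffix = "mb";
            }
        }
        // `pad` honours width/alignment flags such as `{:>8}`.
        f.pad(&format!("{}{}", value, suffix))
    }
}

/// One right-aligned line per query, largest first.
pub fn report(mut entries: Vec<(&str, usize)>) -> String {
    entries.sort_by(|a, b| b.1.cmp(&a.1));
    entries
        .iter()
        .map(|(name, bytes)| format!("{:>8} {}\n", Bytes(*bytes), name))
        .collect()
}
```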
Florian Diebold (Jun 30 2019 at 12:50, on Zulip):

the CrateDefMaps are pretty big, maybe interning the names would help there?

matklad (Jun 30 2019 at 13:30, on Zulip):

Yeah, I guess we can just intern all Names via salsa interning and see how it goes
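Interning replaces repeated name strings with small copyable ids, so each distinct name is stored once. A minimal hand-rolled sketch, assuming a single-threaded `Interner` (hypothetical; salsa's interning works differently and is not shown here):

```rust
use std::collections::HashMap;

/// Hypothetical interned-name handle: 4 bytes, cheap to copy and compare.
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub struct NameId(u32);

#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, NameId>,
    names: Vec<String>,
}

impl Interner {
    /// Returns the existing id for `name`, or assigns the next free one.
    pub fn intern(&mut self, name: &str) -> NameId {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = NameId(self.names.len() as u32);
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }

    pub fn lookup(&self, id: NameId) -> &str {
        &self.names[id.0 as usize]
    }
}
```

For brevity the sketch keeps two copies of each string (map key and lookup table); a tighter version would store each string once.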

Last update: Nov 12 2019 at 17:00UTC