Stream: t-compiler/rust-analyzer

Topic: How much Ram do we need?


matklad (Apr 20 2021 at 11:07, on Zulip):

I am again thinking that our ram consumption seems unreasonable

matklad (Apr 20 2021 at 11:07, on Zulip):

In the rust-analyzer directory, I did rm ./target -rf && cargo check && cargo vendor && cat vendor/**.rs > all.rs

matklad (Apr 20 2021 at 11:08, on Zulip):

So, the size of all our vendored deps is 37 megs

matklad (Apr 20 2021 at 11:08, on Zulip):

the size of the target directory after check is 500 megs

matklad (Apr 20 2021 at 11:10, on Zulip):

I think the size of the target should represent the upper bound on rust-analyzer's ram consumption -- cargo check processes everything, while we ignore function bodies (which should be the majority of code).

matklad (Apr 20 2021 at 11:10, on Zulip):

Yet, rust-analyzer occupies about 1GB of ram

matklad (Apr 20 2021 at 11:28, on Zulip):

The size of ~/.cache/JetBrains after cleaning and working with rust-analyzer a bit is 150 megs

matklad (Apr 20 2021 at 11:31, on Zulip):

So, either all those on-disk stores do compression, and it is 10x efficient (which is plausible), or we are using an unreasonable amount of RAM for something?

matklad (Apr 20 2021 at 11:50, on Zulip):

Finally stopped whining about comparing with CLion and actually compared with CLion: https://github.com/rust-analyzer/rust-analyzer/issues/7330

Jonas Schievink [he/him] (Apr 20 2021 at 11:57, on Zulip):

interesting result! looks like there's still more data to intern/deduplicate then?

Jonas Schievink [he/him] (Apr 20 2021 at 11:58, on Zulip):

ah, we still duplicate all the file contents, right?

Jonas Schievink [he/him] (Apr 20 2021 at 11:58, on Zulip):

that should cost us around 50 MB

Laurențiu (Apr 20 2021 at 11:59, on Zulip):

We could claw back about 20 MB with https://github.com/rust-analyzer/rust-analyzer/issues/869#issuecomment-695171218

matklad (Apr 20 2021 at 12:02, on Zulip):

Another interesting bit: CLion's address space compresses much worse than ours

Florian Diebold (Apr 20 2021 at 12:02, on Zulip):

ideally, we wouldn't keep library code in memory at all, I guess? just a bit complicated to do with salsa

Florian Diebold (Apr 20 2021 at 12:04, on Zulip):

while we're at it, we could save the 'HIR' for libraries on disk and load it back, which would also solve the startup time problem and be useful for WASM/playground :thinking:

matklad (Apr 20 2021 at 12:05, on Zulip):

Uhu

matklad (Apr 20 2021 at 12:05, on Zulip):

And, to do that, we probably want to make sure that hir data is a flat array, which we can just memmap

matklad (Apr 20 2021 at 12:06, on Zulip):

and that by itself might bring some wins

Laurențiu (Apr 20 2021 at 12:06, on Zulip):

Or dump it in SQLite :D

Florian Diebold (Apr 20 2021 at 12:07, on Zulip):

somewhat relatedly, I've also wondered whether we could load our HIR from rustdoc's json :grimacing:

matklad (Apr 20 2021 at 12:08, on Zulip):

I mean, making a normalized relational model out of hir is probably going too far, but using SQLite as a key-value store is probably one of the best ways to do on-disk storage

Laurențiu (Apr 20 2021 at 12:10, on Zulip):

(unrelated, https://github.com/rust-analyzer/rust-analyzer/issues/8599 might be a nice benchmark for memory and loading times)

Florian Diebold (Apr 20 2021 at 12:12, on Zulip):

I'm not sure we need something like sqlite when we basically just want to mmap a big data structure and dump it fully back sometimes

Florian Diebold (Apr 20 2021 at 12:15, on Zulip):

something like https://crates.io/crates/rkyv might be helpful? (somewhat new and experimental though)

Laurențiu (Apr 20 2021 at 12:18, on Zulip):

It probably doesn't matter that much, but SQLite is a pretty safe choice. Something like sled might be nicer when it matures, but you can't go wrong with SQLite

Florian Diebold (Apr 20 2021 at 12:21, on Zulip):

I would usually agree, but I think we need almost none of the features of sqlite (like indexing, or doing any kind of update except writing the full database)

Jonas Schievink [he/him] (Apr 20 2021 at 12:21, on Zulip):

it would also mean having a mandatory dependency on C code, which we currently try to avoid

Laurențiu (Apr 20 2021 at 12:21, on Zulip):

Yeah, but at least it's a way forward if we want/can move more stuff to disk

Florian Diebold (Apr 20 2021 at 12:22, on Zulip):

I kind of want to do a prototype of this now :grimacing: maybe I'll find some time this week

Laurențiu (Apr 20 2021 at 12:23, on Zulip):

Like what if salsa could use an on-disk store? (It would probably be too slow for us)

Jonas Schievink [he/him] (Apr 20 2021 at 12:26, on Zulip):

this probably does need some salsa integration in any case, right?

Jonas Schievink [he/him] (Apr 20 2021 at 12:26, on Zulip):

we could also just use this https://github.com/michaelwoerister/odht

Laurențiu (Apr 20 2021 at 12:26, on Zulip):

Yeah, not gonna happen too soon

Florian Diebold (Apr 20 2021 at 12:26, on Zulip):

not necessarily, I think. I think the general "salsa persistence" problem is quite a bit harder

Jonas Schievink [he/him] (Apr 20 2021 at 12:26, on Zulip):

it was written for use in rustc to store the incremental compilation cache

Florian Diebold (Apr 20 2021 at 12:28, on Zulip):

this could IMO be done by having some if let Some(persistent_store) = db.persistent_store_for_crate(crate) { return persistent_store.get_data(id) } in a bunch of queries

Florian Diebold (Apr 20 2021 at 12:29, on Zulip):

with some hackery to make the persistent_store_for_crate query only be invalidated if the store file changes (or never, for a start)
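
A minimal sketch of the "check the persistent store first" idea from the two messages above. Everything here is an assumption made for illustration: `Db`, `PersistentStore`, `CrateId`, `ItemId` and `item_data` are hypothetical stand-ins, not salsa or rust-analyzer types, and the store is an in-memory map rather than a file on disk.

```rust
use std::collections::HashMap;

// Hypothetical identifiers; the real database uses its own interned ids.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct CrateId(pub u32);

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct ItemId(pub u32);

/// Stand-in for precomputed per-crate data that would be loaded from a store file.
#[derive(Default)]
pub struct PersistentStore {
    data: HashMap<ItemId, String>,
}

impl PersistentStore {
    pub fn insert(&mut self, id: ItemId, value: String) {
        self.data.insert(id, value);
    }

    pub fn get_data(&self, id: ItemId) -> Option<String> {
        self.data.get(&id).cloned()
    }
}

/// Stand-in for the query database.
#[derive(Default)]
pub struct Db {
    stores: HashMap<CrateId, PersistentStore>,
    /// Counts how often the expensive path ran, to make the effect visible.
    pub computed: usize,
}

impl Db {
    pub fn add_store(&mut self, krate: CrateId, store: PersistentStore) {
        self.stores.insert(krate, store);
    }

    pub fn persistent_store_for_crate(&self, krate: CrateId) -> Option<&PersistentStore> {
        self.stores.get(&krate)
    }

    /// A "query": answer from the persistent store if one exists for the crate,
    /// otherwise fall back to computing the value.
    pub fn item_data(&mut self, krate: CrateId, id: ItemId) -> String {
        if let Some(store) = self.persistent_store_for_crate(krate) {
            if let Some(data) = store.get_data(id) {
                return data;
            }
        }
        self.computed += 1;
        format!("computed:{}:{}", krate.0, id.0)
    }
}
```

The sketch leaves out the invalidation question raised above (when the store file changes); it only shows the early return inside a query.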

Jonas Schievink [he/him] (Apr 20 2021 at 12:33, on Zulip):

that aside, there still seems to be significant duplication that we can address without persistence

Jonas Schievink [he/him] (Apr 20 2021 at 12:34, on Zulip):

unfortunately all the tooling for this sucks (case in point: you literally had to write the address space to disk and compress the file to get this information), so I'm not really sure how we're supposed to figure out what gets duplicated without auditing the entire codebase

Florian Diebold (Apr 20 2021 at 12:42, on Zulip):

yeah... makes me wonder why there isn't better tooling to analyze memory dumps. With debug info, that should be possible in principle, right?

Laurențiu (Apr 20 2021 at 12:43, on Zulip):

Not unless you record a stack trace on allocation (like DHAT)

Jonas Schievink [he/him] (Apr 20 2021 at 12:43, on Zulip):

DHAT even has all the info to help with this, it just doesn't, or it does but its UI is too convoluted

Jonas Schievink [he/him] (Apr 20 2021 at 12:44, on Zulip):

DHAT computes stuff like "on average only N bytes of these M byte allocations get used"

Jonas Schievink [he/him] (Apr 20 2021 at 12:44, on Zulip):

but not "hey this string is kept in memory 7 times" apparently
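
A rough sketch of the kind of report being asked for here, assuming the strings have already been extracted somehow. The function name `duplicate_report` is hypothetical, and finding the strings in a heap dump (the hard part) is not covered; this only counts identical contents and estimates the bytes that deduplication could save.

```rust
use std::collections::HashMap;

/// For each string content that occurs more than once, returns
/// (content, number of copies, bytes wasted by the extra copies),
/// sorted with the largest waste first.
pub fn duplicate_report<'a>(strings: &[&'a str]) -> Vec<(&'a str, usize, usize)> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for s in strings {
        *counts.entry(*s).or_insert(0) += 1;
    }

    let mut report: Vec<(&'a str, usize, usize)> = counts
        .into_iter()
        .filter(|&(_, n)| n > 1)
        // One copy is needed; every further copy is counted as waste.
        .map(|(s, n)| (s, n, s.len() * (n - 1)))
        .collect();

    // Largest waste first; ties are broken by content so the order is deterministic.
    report.sort_by(|a, b| b.2.cmp(&a.2).then(a.0.cmp(b.0)));
    report
}
```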

matklad (Apr 20 2021 at 12:51, on Zulip):

this could IMO be done by having some if let Some(persistent_store) = db.persistent_store_for_crate(crate) { return persistent_store.get_data(id) } in a bunch of queries

I'd probably do this a bit differently

matklad (Apr 20 2021 at 12:52, on Zulip):

I think we can just change values to cache stuff on disk. Like, an Item tree can contain a PathBuf and load its data from disk on the first access (and evict data to disk under memory pressure)

matklad (Apr 20 2021 at 12:53, on Zulip):

That is, that'll allow us to transparently unload some RAM to disk. This won't help us with avoiding computation during initial load.
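
A minimal sketch of a value that holds a PathBuf and loads its data from disk on first access, as described above. `DiskBacked` and its methods are hypothetical names, the payload is raw bytes rather than an item tree, and eviction is triggered by hand because the sketch has no way to measure memory pressure.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical wrapper: the data always exists on disk and is only
/// cached in memory while it is in use.
pub struct DiskBacked {
    path: PathBuf,
    cached: Option<Vec<u8>>,
}

impl DiskBacked {
    /// Writes `data` to `path` and keeps the in-memory copy as well.
    pub fn create(path: PathBuf, data: Vec<u8>) -> io::Result<Self> {
        fs::write(&path, &data)?;
        Ok(DiskBacked { path, cached: Some(data) })
    }

    /// Returns the data, reading it back from disk if it was evicted.
    pub fn get(&mut self) -> io::Result<&[u8]> {
        if self.cached.is_none() {
            self.cached = Some(fs::read(&self.path)?);
        }
        Ok(self.cached.as_deref().expect("cache was just filled"))
    }

    /// Drops the in-memory copy; the file on disk stays in place.
    pub fn evict(&mut self) {
        self.cached = None;
    }

    pub fn is_loaded(&self) -> bool {
        self.cached.is_some()
    }
}
```

As noted in the message above, this only moves memory to disk; the data still has to be computed once before it can be written out.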

Jonas Schievink [he/him] (Apr 20 2021 at 12:57, on Zulip):

we don't really have a good way to compute memory pressure without a custom allocator

Jonas Schievink [he/him] (Apr 20 2021 at 12:57, on Zulip):

maybe this can be hooked into salsa GC if the API for that is extended

Jonas Schievink [he/him] (Apr 20 2021 at 12:58, on Zulip):

so instead of outright deleting a value it just hands out a reference

Last update: Jul 28 2021 at 03:15UTC