Stream: t-compiler/wg-rls-2.0

Topic: Scaling Analyzer to Google-sized codebases


matklad (Feb 19 2019 at 12:02, on Zulip):

I've recently watched the talk about clangd (https://www.youtube.com/watch?v=5HIyAXj1YNQ), and it's significantly different from other IDE talks in that it places emphasis on working with behemoth codebases. I don't think we need to solve this problem right now (getting precise completions for a single file with traits is much more important), but it's something to keep in mind.

The trick they use is to split the IDE into "analyze the current translation unit" and "analyze the whole workspace" bits. Analyze TU "just" runs clang-frontend to get completions for methods, etc. Analyze workspace uses offline, potentially remote indices.

In Rust, I fear we might get into trouble with just the first bit. In C++, because of header files, it should be pretty easy to analyze only the "interface" of the dependencies, skipping cpp files altogether. In Rust, we don't have header files, so we kinda need to analyze the whole transitive closure of code.

It might be cool to introduce a notion of "crate interface file" (not expressible in the surface syntax) so that we can break the dependency chain. I imagine a reified "public interface of the crate" will be useful for public-private dependencies as well.

Florian Diebold (Feb 19 2019 at 14:43, on Zulip):

Yeah, I think saving & loading the item map etc. would actually make sense, if only to avoid parsing all of std on every startup. The compiled crates from rustc are something like that, aren't they? Though we'd need a more stable format, of course.

And I feel like any big project will be split into many small crates, so this might already be enough.

matklad (Feb 19 2019 at 14:52, on Zulip):

The problem with item-map is that it contains all of the private items as well. And splitting into crates does not fundamentally change the picture: you still have to process the transitive closure of reachable code.

In (non-templated) C++ headers the situation is different: you can use tricks like pimpl to make sure that you don't even have to look at the implementation code. Moreover, the impl can change freely without affecting the header.

I guess we can do similar with the item-map setup, if we add a public API projection query.
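A minimal sketch of what such a projection could look like. Everything here is hypothetical: `Item`, `ItemMap`, and `public_api` are illustration-only names, not the analyzer's actual data structures, and visibility is reduced to public vs. private (no `pub(crate)`, re-exports, or private types leaking through public signatures).

```rust
use std::collections::BTreeMap;

/// Visibility of an item, reduced to two cases for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Public,
    Private,
}

/// Hypothetical item-map entry: a signature only, no body.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Item {
    pub visibility: Visibility,
    pub signature: String,
}

/// Hypothetical item map: item path -> item.
pub type ItemMap = BTreeMap<String, Item>;

/// Projects the public interface out of a full item map.
/// Private items are dropped, so edits to them leave the result unchanged.
pub fn public_api(items: &ItemMap) -> ItemMap {
    items
        .iter()
        .filter(|(_, item)| item.visibility == Visibility::Public)
        .map(|(path, item)| (path.clone(), item.clone()))
        .collect()
}
```

If downstream crates depend only on the output of `public_api`, a change to a private item produces an equal projection, which is roughly the property a header file gives C++.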

Florian Diebold (Feb 19 2019 at 15:11, on Zulip):

we would need the private items to provide an "item is private" diagnostic instead of "item not found" though, especially if we want to have an assist that turns it public for local crates, right? :thinking:
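To make the trade-off concrete, here is a small sketch under assumed names (`Resolution`, `resolve`, and `diagnostic` are hypothetical, and the item map is reduced to "path -> is it public?"). With the full map a lookup can tell "private" apart from "missing"; once private items are pruned, both cases collapse into "not found".

```rust
use std::collections::BTreeMap;

/// Outcome of looking up a path in a (hypothetical) item map.
#[derive(Debug, PartialEq, Eq)]
pub enum Resolution {
    Found,
    Private,
    NotFound,
}

/// Resolves `path` against a map of item path -> "is public".
pub fn resolve(items: &BTreeMap<String, bool>, path: &str) -> Resolution {
    match items.get(path).copied() {
        Some(true) => Resolution::Found,
        Some(false) => Resolution::Private,
        None => Resolution::NotFound,
    }
}

/// Renders the diagnostic for a failed lookup, if there is one.
pub fn diagnostic(resolution: &Resolution, path: &str) -> Option<String> {
    match resolution {
        Resolution::Found => None,
        Resolution::Private => Some(format!("item `{}` is private", path)),
        Resolution::NotFound => Some(format!("item `{}` not found", path)),
    }
}
```

An assist that makes the item public would only be offered in the `Private` case, so it needs the unpruned map for local crates.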

matklad (Feb 19 2019 at 15:16, on Zulip):

Good point! I think it wouldn't be too bad to say "item not found" for completely external crates, but doing things differently for local and remote crates seems hard.

And, given that item-map lacks item bodies, perhaps I am trying to solve a non-existent problem? Like, if you want to link the resulting library/binary, then you need to be able to process everything (including bodies) on a single machine.

Florian Diebold (Feb 19 2019 at 15:24, on Zulip):

do we currently have statistics about the memory usage of the item maps? I don't think so, right?

matklad (Feb 19 2019 at 15:25, on Zulip):

Yeah, we don't have them, but they shouldn't be too hard to add.

matklad (Feb 19 2019 at 15:27, on Zulip):

https://github.com/Aeledfyr/deepsize or heap_size_of could be useful
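As a rough illustration of the kind of measurement those crates automate, here is a hand-rolled, std-only sketch. It does not use the deepsize or heap_size_of APIs; the `HeapSize` trait and the `ItemEntry`/`ItemMap` types are hypothetical stand-ins, and the numbers are estimates based on allocated capacity, ignoring allocator overhead.

```rust
use std::mem::size_of;

/// Hypothetical trait: bytes owned on the heap, excluding the value itself.
pub trait HeapSize {
    fn heap_size(&self) -> usize;
}

impl HeapSize for String {
    fn heap_size(&self) -> usize {
        // A String owns one buffer of `capacity` bytes.
        self.capacity()
    }
}

impl<T: HeapSize> HeapSize for Vec<T> {
    fn heap_size(&self) -> usize {
        // The Vec's own buffer, plus whatever each element owns.
        let buffer = self.capacity() * size_of::<T>();
        let children: usize = self.iter().map(|item| item.heap_size()).sum();
        buffer + children
    }
}

/// Stand-in for one item-map entry.
pub struct ItemEntry {
    pub path: String,
    pub signature: String,
}

impl HeapSize for ItemEntry {
    fn heap_size(&self) -> usize {
        self.path.heap_size() + self.signature.heap_size()
    }
}

/// Stand-in for an item map.
pub struct ItemMap {
    pub entries: Vec<ItemEntry>,
}

impl ItemMap {
    /// Estimated total footprint: the struct itself plus owned heap data.
    pub fn total_bytes(&self) -> usize {
        size_of::<ItemMap>() + self.entries.heap_size()
    }
}
```

Calling `total_bytes()` on each crate's map after analysis would give a first approximation of where memory goes.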

nikomatsakis (Feb 19 2019 at 18:15, on Zulip):

I feel like (a) this is often the role that nrc was citing for save analysis but (b) it seems like this would be achieved by having an "outer frontier" of query results that we can preserve from crates and things (without having to preserve all the detailed work needed to reproduce that frontier). In other words, I'd really like to express this idea in terms of queries.

Dale Wijnand (Feb 20 2019 at 10:29, on Zulip):

In Rust, I fear we might get into trouble with just the first bit. In C++, because of header files, it should be pretty easy to analyze only the "interface" of the dependencies, skipping cpp files altogether. In Rust, we don't have header files, so we kinda need to analyze the whole transitive closure of code.

It might be cool to introduce a notion of "crate interface file" (not expressible in the surface syntax) so that we can break the dependency chain. I imagine a reified "public interface of the crate" will be useful for public-private dependencies as well.

Another option is doing a two-phase compilation, first outlining the API and later compiling the bodies. See this presentation by Twitter where they're exploring this in an experimental compiler for Scala: https://www.youtube.com/watch?v=8SnIBkJXD8I.

nikomatsakis (Feb 20 2019 at 11:49, on Zulip):

This two-phase setup basically falls out from the query system, at least if we set things up correctly -- but it's a good thing for us to pay attention to!
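A toy sketch of how the two phases can fall out of memoized queries. This is a hand-rolled cache, not the API of any real query framework, and all names (`Function`, `Crate`, `interface`, `DownstreamAnalysis`) are hypothetical. The downstream analysis depends only on the upstream crate's interface, so a body-only change upstream reuses the cached result.

```rust
/// A function split into the part others can see and the part they cannot.
pub struct Function {
    pub signature: String,
    pub body: String,
}

pub struct Crate {
    pub functions: Vec<Function>,
}

/// Phase one ("outlining"): a query that reads signatures and never bodies.
pub fn interface(krate: &Crate) -> Vec<String> {
    krate.functions.iter().map(|f| f.signature.clone()).collect()
}

/// Phase two for a dependent crate, memoized on the upstream interface.
pub struct DownstreamAnalysis {
    /// The interface the cached result was computed against, and the result.
    cached: Option<(Vec<String>, usize)>,
    /// How many times the expensive analysis actually ran.
    pub runs: usize,
}

impl DownstreamAnalysis {
    pub fn new() -> Self {
        DownstreamAnalysis { cached: None, runs: 0 }
    }

    /// Returns the number of upstream signatures visible to the dependent
    /// crate (a stand-in for real analysis work).
    pub fn analyze(&mut self, upstream: &Crate) -> usize {
        let iface = interface(upstream);
        if let Some((cached_iface, result)) = &self.cached {
            if *cached_iface == iface {
                // Interface unchanged: reuse the result, skip the work.
                return *result;
            }
        }
        self.runs += 1;
        let result = iface.len();
        self.cached = Some((iface, result));
        result
    }
}
```

Editing only a `body` upstream leaves `runs` unchanged on the next `analyze` call, while editing a `signature` forces a recomputation.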

Dale Wijnand (Feb 20 2019 at 12:38, on Zulip):

OK, cool! :smile:

Last update: Nov 12 2019 at 17:10UTC