Stream: t-compiler/wg-rls-2.0

Topic: High-level architecture of rust-analyzer


matklad (Oct 14 2019 at 09:00, on Zulip):

I think it's time we split ra_hir into some pieces. It takes ages to compile, and is a single crate which does everything (with the notable exception of macro_by_example and parsing, which do live in separate crates). I wonder how we actually go about doing that :)

What I like about current situation:

What I dislike about this solution is that the OO interface forces everything to hang off of defs, which makes things harder to split. (like, if we split nameres and type inference, where would struct Function live? What about Function::signature?)

What is good for factoring is the db API: db.body(def), db.ty(def), where it seems like we can have a crate which only defines IDs, and then a bunch of analysis crates, which add various methods to the IDs via different DBs.

I wonder if we should split ra_hir

into

ra_hir -- OO facade
  hir_ids (defines defs and `Body`)
  hir_lower ( provides `DefDatabase` over hir_ids)
  hir_ty (provides `TyDatabase` over hir_ids by using DefDatabase)

Note that means that, outside of hir, we will be doing a_struct.fields(hir_db), while inside hir_ty we will be doing def_db.struct_fields(a_struct)
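
A minimal sketch of those two call styles, assuming hypothetical stand-ins (`StructId`, `Struct`, `DefDatabase`, `MockDb`) rather than the real rust-analyzer types, and a plain trait in place of a salsa database. For brevity the facade here takes a `DefDatabase` directly instead of a separate hir database:

```rust
// Hypothetical stand-ins; none of these are the real rust-analyzer types.

/// Would live in `hir_ids`: a plain ID with no behaviour of its own.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StructId(pub u32);

/// Would live in `hir_lower`: the query interface over those IDs.
pub trait DefDatabase {
    fn struct_fields(&self, id: StructId) -> Vec<String>;
}

/// Would live in `ra_hir`: the OO facade, wrapping an ID and
/// forwarding every method to a database query.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Struct {
    pub id: StructId,
}

impl Struct {
    pub fn fields(self, db: &dyn DefDatabase) -> Vec<String> {
        db.struct_fields(self.id)
    }
}

/// Toy in-memory database standing in for a salsa database.
pub struct MockDb;

impl DefDatabase for MockDb {
    fn struct_fields(&self, id: StructId) -> Vec<String> {
        match id.0 {
            0 => vec!["x".to_string(), "y".to_string()],
            _ => Vec::new(),
        }
    }
}
```

With this shape, IDE-side code would write `a_struct.fields(&db)`, while analysis crates would call `db.struct_fields(a_struct.id)`; both go through the same query.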

@nikomatsakis @Florian Diebold what do you think about this? Should we perhaps create some markdown docs to sketch ideas?

Florian Diebold (Oct 14 2019 at 10:51, on Zulip):

that probably makes sense... I don't know how much splitting the crate will help with compile times, but we should certainly split out the type inference

matklad (Oct 14 2019 at 12:21, on Zulip):

So, what would be the next step here? Clearly, we can't just move types into a ra_hir_ty crate, because we still need ra_hir which has types. So perhaps we should not split out type inference, but everything else instead? Arghs, this seems tricky. I guess, someone (probably me) should just look into this?

Jeremy Kolb (Oct 14 2019 at 12:29, on Zulip):

I don't see a problem other than the difference in API between crates could be confusing so heavy docs and consistent visibility are a must.

Jason Williams (Oct 14 2019 at 12:43, on Zulip):

Howdy

matklad (Oct 15 2019 at 13:21, on Zulip):

tried splitting the thing in two here: https://github.com/rust-analyzer/rust-analyzer/pull/2017

matklad (Oct 15 2019 at 13:21, on Zulip):

It's not too bad in that it's actually possible to just mechanically split the code with little changes. But it's bad in a sense that I don't see a clear boundary we want to split at.

Florian Diebold (Oct 15 2019 at 14:21, on Zulip):

hm... maybe we should actually have at least an idea how to handle expression-level items before we split this up?

matklad (Oct 15 2019 at 17:59, on Zulip):

I want to move ExprId / ExprKind and lowering code to this extracted piece as well, precisely because I think expr-level items will interact with local scopes

Florian Diebold (Oct 15 2019 at 18:20, on Zulip):

OTOH, it might actually be enough to get the tree of 'anonymous modules' from the body and have the Expr::Blocks link back to them -- the name resolution doesn't actually care about expressions, I think

Florian Diebold (Oct 15 2019 at 18:21, on Zulip):

the question would of course be how to get that without knowing about expressions ;)

matklad (Oct 15 2019 at 18:36, on Zulip):

the name resolution doesn't actually care about expressions, I think

Error reporting sometimes cares: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=2cd9c8e9bd6725e712f451cf1c11f33c

But yeah, in general, there are two dimensions across which we might cut: bodies/modules (roughly, the current split between DefDb and HirDb) and before types / after types.

matklad (Oct 15 2019 at 18:36, on Zulip):

I am not sure which split is better, but I lean towards the first one. Specifically because it'll let us keep Resolver separated from types

matklad (Oct 15 2019 at 18:37, on Zulip):

And perhaps the first-order bit is to split "something", just to get more experience with salsa database being split non-trivially across several crates

matklad (Oct 15 2019 at 18:38, on Zulip):

In particular, I feel that our current OO thinking of code model just falls apart if we start splitting code :)

matklad (Oct 15 2019 at 19:19, on Zulip):

Let me also write down some possible API flavors for splitting compiler databases over several crates.

OO-ish

This is how the current hir works. Every entity, such as a Struct, has a bunch of methods which take a &db arg and return some piece of data. The nice thing with this approach is that it is code-completion friendly and is easy to read. The main drawback is that inherent methods can be defined only in a single crate, so Struct::ty would have to be implemented via a trait, which might bring more complexity, especially for one-off private utility methods. The second drawback is that this approach is less efficient: projecting a field out of a StructData requires some sort of cloning, and getting two fields out of StructData requires two queries for StructData, because they are hidden behind accessors.
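
To make the first drawback concrete, here is a rough sketch in which modules stand in for crates. Every name (`hir_def`, `hir_ty`, `StructTyExt`, `MockDb`, and so on) is hypothetical and only illustrates the shape of the problem:

```rust
// Modules stand in for crates here; all names are hypothetical.
pub mod hir_def {
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct Struct(pub u32);

    pub trait DefDatabase {
        fn struct_name(&self, s: Struct) -> String;
    }

    impl Struct {
        // Inherent methods must live in the crate that defines `Struct`.
        pub fn name(self, db: &dyn DefDatabase) -> String {
            db.struct_name(self)
        }
    }
}

pub mod hir_ty {
    use super::hir_def::{DefDatabase, Struct};

    #[derive(Clone, Debug, PartialEq, Eq)]
    pub struct Ty(pub String);

    pub trait TyDatabase: DefDatabase {
        fn struct_ty(&self, s: Struct) -> Ty;
    }

    // A downstream crate cannot add an inherent `Struct::ty`,
    // so the method is attached through an extension trait instead.
    pub trait StructTyExt {
        fn ty(self, db: &dyn TyDatabase) -> Ty;
    }

    impl StructTyExt for Struct {
        fn ty(self, db: &dyn TyDatabase) -> Ty {
            db.struct_ty(self)
        }
    }
}

/// Toy database implementing both layers.
pub struct MockDb;

impl hir_def::DefDatabase for MockDb {
    fn struct_name(&self, s: hir_def::Struct) -> String {
        format!("S{}", s.0)
    }
}

impl hir_ty::TyDatabase for MockDb {
    fn struct_ty(&self, s: hir_def::Struct) -> hir_ty::Ty {
        hir_ty::Ty(format!("S{}", s.0))
    }
}
```

Callers of `s.ty(&db)` have to import `StructTyExt` first, which is the extra complexity mentioned above: `s.name(&db)` works everywhere, `s.ty(&db)` only where the trait is in scope.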

Raw Data

This is the simplest thing which can work in salsa, and how the code in the branch above works. Basically, every entity is just an ID without methods. To do anything, you need to db.struct_data(struct_id). StructData contains local IDs, so you need to massage those to get absolute IDs. The first benefit of this approach is that it's the simplest one that could work. The second benefit is that it works really well with splitting code across crates: the set of things you can do with StructId is different depending on whether you have a DefDatabase or HirDatabase.
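
A small sketch of what "just an ID plus a data query" could look like, including the local-to-absolute ID massaging. The types (`LocalFieldId`, `FieldId`, `MockDb`) and the `field_ids` helper are assumptions made up for illustration, and a `HashMap` stands in for salsa's query storage:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// All names below are hypothetical illustrations.

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct StructId(pub u32);

/// Identifies a field only relative to its parent struct.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LocalFieldId(pub u32);

/// "Absolute" field ID: the parent plus the local index.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FieldId {
    pub parent: StructId,
    pub local: LocalFieldId,
}

/// Plain data with public fields, no accessors.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct StructData {
    pub name: String,
    pub fields: Vec<(LocalFieldId, String)>,
}

pub trait DefDatabase {
    fn struct_data(&self, id: StructId) -> Arc<StructData>;
}

/// Toy database; panics on unknown IDs to keep the sketch short.
pub struct MockDb {
    pub structs: HashMap<StructId, Arc<StructData>>,
}

impl DefDatabase for MockDb {
    fn struct_data(&self, id: StructId) -> Arc<StructData> {
        Arc::clone(&self.structs[&id])
    }
}

/// The "massaging" step: turn local field IDs into absolute ones.
pub fn field_ids(db: &dyn DefDatabase, id: StructId) -> Vec<FieldId> {
    db.struct_data(id)
        .fields
        .iter()
        .map(|(local, _name)| FieldId { parent: id, local: *local })
        .collect()
}
```

Because `StructData` is handed out whole, a caller can read both `name` and `fields` from a single `struct_data` call.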

The two drawbacks are:

Bound Entities

That's somewhat of a merger between the two approaches, where the API is db.get(struct_id).fields(), where get returns a struct StructApi { db: &DB, id: StructId }. It seems like a brute-force way to combine the benefits of the previous two approaches, but feels pretty complex to me. I don't have a good feeling about how it'll work in practice.
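
One way this could be spelled out, as a sketch only: `StructApi` follows the shape given above, while the `Bind` trait that supplies `get`, the `DefDatabase` query, and `MockDb` are invented here for illustration.

```rust
// Hypothetical names throughout.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StructId(pub u32);

pub trait DefDatabase {
    fn struct_fields(&self, id: StructId) -> Vec<String>;
}

/// An ID bound to the database it should be interpreted in.
pub struct StructApi<'db, DB> {
    db: &'db DB,
    id: StructId,
}

impl<'db, DB: DefDatabase> StructApi<'db, DB> {
    /// Method-style access that no longer needs a `db` argument.
    pub fn fields(&self) -> Vec<String> {
        self.db.struct_fields(self.id)
    }
}

/// Gives every sized `DefDatabase` a `get` method.
pub trait Bind: DefDatabase + Sized {
    fn get(&self, id: StructId) -> StructApi<'_, Self> {
        StructApi { db: self, id }
    }
}

impl<DB: DefDatabase> Bind for DB {}

/// Toy database standing in for a salsa database.
pub struct MockDb;

impl DefDatabase for MockDb {
    fn struct_fields(&self, id: StructId) -> Vec<String> {
        match id.0 {
            0 => vec!["x".to_string(), "y".to_string()],
            _ => Vec::new(),
        }
    }
}
```

Usage is then `db.get(StructId(0)).fields()`. The extra lifetime and type parameter on every bound entity give a sense of where the complexity comes from.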

My preferred approach

I think I currently lean towards rewriting everything we have in hir in terms of the raw data approach, moving it into separate crates, and then implementing the OO approach in the umbrella hir crate, solely as an API for the IDE.

Jason Williams (Oct 17 2019 at 12:40, on Zulip):

(deleted)

nikomatsakis (Oct 25 2019 at 17:00, on Zulip):

@matklad -- is this what you wanted to discuss?

nikomatsakis (Oct 25 2019 at 17:00, on Zulip):

should I catch up on this topic first?

matklad (Oct 25 2019 at 17:02, on Zulip):

Yeah, this is related

nikomatsakis (Oct 25 2019 at 17:02, on Zulip):

I just skimmed the topic

nikomatsakis (Oct 25 2019 at 17:02, on Zulip):

but if you would rather chat in some other place (salsa zulip?) that's fine

matklad (Oct 25 2019 at 17:02, on Zulip):

I've also made a hackmd doc in the salsa zulip,

matklad (Oct 25 2019 at 17:03, on Zulip):

yeah, I feel like salsa's Zulip might be a better fit here

nikomatsakis (Oct 25 2019 at 17:03, on Zulip):

ok :)

matklad (Oct 25 2019 at 17:03, on Zulip):

https://salsa.zulipchat.com/#narrow/stream/145099-general/topic/Physical.20Architecture.20of.20Large.20Scale.20Salsa-based.20Compilers

Last update: Nov 12 2019 at 15:45UTC