Stream: t-compiler/wg-rls-2.0

Topic: Architecture quetion: to remember or to compute


matklad (Dec 10 2019 at 13:23, on Zulip):

This a thread to discuss a problem which I know exists, but which I don't know how to solve.

Consider this sceenshot:

pasted image

As you see, the ::Hash is not syntax highlighted properly, because IDE can't figure out that it is an associated type of the BlockT trait. The architectural problem here is that hir actually knows this correctly! We lose this info on the way from hir to ide, and I'd like to discuss this (pretty general I think problem)

matklad (Dec 10 2019 at 13:31, on Zulip):

Specifically, there's a code in hir which lowers Result<Vec<Block::Hash>> syntax to a type (Ty::from_hir (the name is bad)). Inside that method, we properly resolve ::Hash, but we then forget this bit of info: the rest of the compiler is interested only in the results of lowering as a whole, and in the subresult for ::Hash.

matklad (Dec 10 2019 at 13:34, on Zulip):

So, when in IDE we want to compute gotodef for ::Hash, we have two choices:

In the first approach, you synthesize a path like Block::Hash, and ask hir: "what would be the result of resolving this path, if it were used in this context", where "this context" is a somewhat fussy concept.

In the second approach, the Ty::from_hir would compute not only the final Ty value, but also an auxilary mapping, <PathSyntax, PathResolution>. This mapping won't be used by the rest of the compiler, but would be used by an IDE for goto def.

matklad (Dec 10 2019 at 13:37, on Zulip):

The current approach is obviously a broken mixture of both.

Specifically, we use "re-derive" for "top-level" paths. For paths inside expressions though, we use a side-table approach, because computing the correct lexical scope is tricky. In particular, in the above screenshot we hit the "re-derive" path, and fail, because we don't take into account the scope with generic parameters (which is a pretty complex beast, with assoc types and projections)

matklad (Dec 10 2019 at 13:40, on Zulip):

In general, I am starting to believe that trying to re-derive things is a bad idea: that's a hidden copy-paste of code, which is bound to diverge over time, especially in various corner cases.

Daniel Mcnab (Dec 10 2019 at 13:40, on Zulip):

What type is Result<Vec<Block::Hash>> lowered to, and would this be any different if it was just Block::Hash?

Daniel Mcnab (Dec 10 2019 at 13:42, on Zulip):

I'm not even certain that those questions make sense, but if they do I hope they can clarify the issue.

Florian Diebold (Dec 10 2019 at 13:43, on Zulip):

In general, I am starting to believe that trying to re-derive things is a bad idea: that's a hidden copy-paste of code, which is bound to diverge over time, especially in various corner cases.

yeah, macro expansion was another example until recently -- the SourceAnalyzer resolved macros again, with sometimes slightly different results

matklad (Dec 10 2019 at 13:43, on Zulip):

I would say that it is lowerResult<Vec<Block::Hash>>, but with IDs instead of identifiers, and where Block::Hash remebers from which trait the ::Hash bit comes. The question makes sense, but the answer is funny "it's the same after lowering, although the meaning is completley different"

Florian Diebold (Dec 10 2019 at 13:45, on Zulip):

I agree that long-term, anything that's not trivial to derive (i.e. just calling a query) should be remembered

Florian Diebold (Dec 10 2019 at 13:46, on Zulip):

this is actually kind of similar to the diagnostics problem, isn't it, in that it's some additional 'side channel' information we'd like to produce

matklad (Dec 10 2019 at 13:54, on Zulip):

So, let me finish the above write up...

The are several problem with "sidetable" approach as well, I think.

One thing I've tried to do is to basically make a function which returns a (Data, Map<SyntaxNode, Stuff>), which can either be called directly, or via a query which returns only Data. The funny thing with this approach is that we do compute everything twice.

matklad (Dec 10 2019 at 13:54, on Zulip):

@Florian Diebold yeah, diagnostics feel similar

matklad (Dec 10 2019 at 13:57, on Zulip):

And, in fact, in Kotlin they use the same binding trace thing for both:

  1. https://github.com/JetBrains/kotlin/blob/344bc33096f0a6786cfde8a85297be45fc6b9239/compiler/frontend/src/org/jetbrains/kotlin/resolve/lazy/descriptors/LazyTypeParameterDescriptor.java#L60
  2. https://github.com/JetBrains/kotlin/blob/344bc33096f0a6786cfde8a85297be45fc6b9239/compiler/frontend/src/org/jetbrains/kotlin/resolve/lazy/descriptors/LazyTypeParameterDescriptor.java#L67

1 records an association between syntax and Kotlin's HIR, 2 records an error

Florian Diebold (Dec 10 2019 at 13:58, on Zulip):

I feel like we 'just' need IDs for type refs similar to ExprIds and a source mapping for them... yeah, it'll be more work to get from a source location to the actual data, but it's the same thing we're already doing

matklad (Dec 10 2019 at 13:59, on Zulip):

@Florian Diebold I am not sure... Like, the rest of the compiler doesn't care about what bits a TypeRef consists of, it only cares what it lowers to, as a whole.

matklad (Dec 10 2019 at 15:04, on Zulip):

So, I think our options are:

I am somewhat leaning towards the latter option, because it avoids the need to store stuff which we don't really need anyway.

matklad (Dec 10 2019 at 15:05, on Zulip):

Also, cc @nikomatsakis , I feel this thread an interesting question for the overall RLS 2.0 architecture .

Florian Diebold (Dec 10 2019 at 15:06, on Zulip):

we could probably do simple path-like pointers for this, like root -> first type param -> second type param, that would be incremental-friendly and not require any changes to the TypeRef structure itself?

Florian Diebold (Dec 10 2019 at 15:07, on Zulip):

and as long as the AST->HIR lowering is trivial, it would even not require a source map, though probably that won't stay that way

matklad (Dec 10 2019 at 15:07, on Zulip):

@Florian Diebold for ast-stable pointers? Yeah, that's an option, though I think (parent_item_id, relative_range) might be a more-general, less code option.

Florian Diebold (Dec 10 2019 at 15:08, on Zulip):

ranges, even relative ones, aren't as stable though... but probably that's fine

matklad (Dec 10 2019 at 15:10, on Zulip):

Hm, actually I am confused here: if you are talkig about type parameters, that's already somewhat handled since couple of days ago. If you are talking about type arguments (and that are arguments in the initial screenshot), then the concept of root is slighly fuzzy.

Relative ranges should be ok -- you re-run lowering code anyway if the syntax node changes. A somewhat painful moment here is that just range is not enough, there could be macros, so you'll need something like InFile<TextRange>...

Florian Diebold (Dec 10 2019 at 15:10, on Zulip):

yeah, I mean type arguments

Florian Diebold (Dec 10 2019 at 15:11, on Zulip):

basically, to be able to address any part of a type ref like Result<Vec<Block::Hash>>

Florian Diebold (Dec 10 2019 at 15:11, on Zulip):

you re-run lowering code anyway if the syntax node changes

Florian Diebold (Dec 10 2019 at 15:11, on Zulip):

you re-run AST->HIR lowering, not necessarily HIR->type

matklad (Dec 10 2019 at 15:12, on Zulip):

If AST->HIR stays the same, it's probably the case that relative ranges are the same as well. You loose optimizations on typing whitesapce, but that doesn't seem important

Florian Diebold (Dec 10 2019 at 15:24, on Zulip):

yeah, that's what I was thinking of, but you're right, it doesn't seem important

Florian Diebold (Dec 10 2019 at 15:26, on Zulip):

unless people start having huge type references spanning multiple lines I guess :sweat_smile:

Florian Diebold (Dec 10 2019 at 15:29, on Zulip):

then the concept of root is slighly fuzzy.

I think there should always be a clear root from the query that actually does the lowering -- i.e. if you're lowering the return type in a function signature, that's where your root is, etc.

matklad (Dec 10 2019 at 15:46, on Zulip):

Aha, I understand now, that's basically our AstIdMap, but extended to the bodies of the items. So, the id is something like "the nth descendant of a specific type". That would work, but would require threading the relevant mapping of syntax nodes to positional ids everywhere.

I sort of feel the core question is "do we store such a map in salsa at all?". If we do, we should pick any id scheme and be done with it. If we don't we can use syntax nodes themselves, which seems like it'd reduce complexity by a large constant factor.

Florian Diebold (Dec 10 2019 at 16:10, on Zulip):

I'm not sure I understand how it would work without storing the map

matklad (Dec 10 2019 at 16:11, on Zulip):

Some of the newer hir source-maps work this way: https://github.com/rust-analyzer/rust-analyzer/blob/ee904594af174866cfa10474fd8fd3f23580b6af/crates/ra_hir_def/src/child_by_source.rs#L105-L124

matklad (Dec 10 2019 at 16:12, on Zulip):

Like, we basically have the same two queries (foo and foo_with_source_map), but foo_with_source_map is #[salsa::transparent], so it can return SyntaxNodes directly

matklad (Dec 10 2019 at 16:13, on Zulip):

basically == the actual implementation at the moment goes via a couple of abstractions which are probably overly-complex for the task at hand...

Florian Diebold (Dec 10 2019 at 21:46, on Zulip):

in this case, we have two steps, though (AST -> HIR and HIR -> resolved type). We can't have the second step use SyntaxNodes directly, right?

matklad (Dec 10 2019 at 22:33, on Zulip):

Hrm, you are correct...

matklad (Dec 10 2019 at 22:34, on Zulip):

this... actually reminds me of another thing was thinking about recently. Current unresolved TypeRef doesn’t seem too useful

matklad (Dec 10 2019 at 22:36, on Zulip):

the lowering code in Ty spends a lot of effor on path resolution, which ideally belongs to the hir_def crate. I wonder if we can get rid of TypeRef altogether? I am not exactly sure that it helps with incrementality

matklad (Dec 10 2019 at 22:50, on Zulip):

Even more generally: hir::Path is convenient, because it’s a value type. But, to implement side tables, we’ll have to change it from value-type to a type with identity (by assigning ids to segments, or by using ast::Path directly).

We want two exactly opposite things at the same time :-(

Florian Diebold (Dec 10 2019 at 22:51, on Zulip):

doing name resolution directly on the AST feels a bit weird ;) but maybe that's fine. I think type inference will need a way to refer to specific user-written types anyway though, so I don't think it gets us around that

Florian Diebold (Dec 10 2019 at 22:52, on Zulip):

I'm not convinced that we need to give paths etc. identities for that though

Florian Diebold (Dec 10 2019 at 22:54, on Zulip):

I guess in a sense, giving them an ID gives them an identity by definition, what I mean is it can stay a value type anyway

matklad (Dec 10 2019 at 23:06, on Zulip):

Yeah, in my terminology a path with id is not a value type. To be less ambitious, today we have a property that foo::bar is indistinguishable from any other foo::bar. This seems to be a theoretically useful property: with it, you guarantee that name resolution looks only at the essential information and cant’ make different choices depending on, say, position of path within the file. With ids, foo#1::bar#2 is not Eq to foo#3::bar#4, and we should manually ensure that resolver doesn’t care about specific ids....

matklad (Dec 10 2019 at 23:09, on Zulip):

Again, a sort of theoretical way out of the problem is to use positional ids: number all segments in increasing order and, when passing a sub path to some method, appropriately increment indexes in the return of this method.

matklad (Dec 11 2019 at 10:21, on Zulip):

Interesting!

Thinking about this more, the most interesting problem seems to be how do we implement goto def for imports (use std::collections::HashMap). This is the place where we want to avoid storing auxiliary info, if possible: CrateDefMap is already a costly data structure.

It's also interesting that Kotlin's imports optionally carry syntax with them, and side-table is recorded only for imports which carry syntax:

Florian Diebold (Dec 11 2019 at 10:59, on Zulip):

Again, a sort of theoretical way out of the problem is to use positional ids: number all segments in increasing order and, when passing a sub path to some method, appropriately increment indexes in the return of this method.

that seems like a (theoretically ;) ) nice approach to me

matklad (Dec 11 2019 at 17:38, on Zulip):

Note: this will require to add "size of sub tree" field to hir::Path and friends.

Florian Diebold (Dec 11 2019 at 19:01, on Zulip):

can't we just calculate that on the fly?

matklad (Dec 11 2019 at 20:13, on Zulip):

That will work in one direction, but not in the other. Finding nth node in the tree needs subtree sizes, or it’ll be linear (which maybe is ok?)

Florian Diebold (Dec 11 2019 at 20:20, on Zulip):

I'm assuming it'd be ok since paths and typerefs aren't going to be very big

Florian Diebold (Dec 12 2019 at 09:55, on Zulip):

... and would we actually need to ever find the nth node? I think we just need to have the current ID while walking the whole thing, and find the ID from the node

matklad (Dec 12 2019 at 10:02, on Zulip):

Maybe i am thinking about somehting different?

Consider something like VariantData, which stores a type ref for each field. I think we want to have "ids" of segments of paths in different fields be different, because we lower at variant granularity (ids might be the same between different structs though).

In this setup, when we zoom into a specific field, we need to know the largest ID int he preceeding fields

matklad (Dec 12 2019 at 10:13, on Zulip):

TBH, I can't imagine in full details how the scheme would work. I wonder if we should start trying things out? So far, I see these three alternatives (one of which isn't)

That is, the actual IDs should be exactly the same in both cases, the question is how do we store them.
Note that the last two approaches probably wont' work for paths in modules (for uses and top-level macros), because that would make crate def map more brittle and large. But I think it's ok to use re-derive approach for module paths, expecially if we separate "paths with ::<>" for use-style paths

matklad (Dec 17 2019 at 13:15, on Zulip):

Found an interesting case here. Currently the hir::Path doesn't handle nested supers, like super::super::super. A natural way to model that is

enum PathAnchor {
  Super(u8), // Super(0) == self::
}

That is, we lower a sequence of identifiers to a number... This seems like a case where implicit numbering of segments would be the most convenient

nikomatsakis (Dec 17 2019 at 15:30, on Zulip):

@matklad (I'm catching up here -- I've been feeling quite guilty about dropping salsa development on the floor, but I definitely had thoughts about "side channels" for diagnostics, I forget if we ever discussed them, I guess they might plausibly apply here too)

nikomatsakis (Dec 17 2019 at 15:31, on Zulip):

I'll try finish reading but maybe we could also have some sync time for a deeper discussion

matklad (Dec 17 2019 at 21:03, on Zulip):

@nikomatsakis sync time would be valuable, I think! So far, we have two questions:

And there's also the borrowed key work, which I anticipate a lot :)

matklad (Mar 02 2020 at 18:54, on Zulip):

Opened https://github.com/rust-analyzer/rust-analyzer/issues/3407 to not forget about this.

Laurențiu Nicola (Mar 02 2020 at 18:56, on Zulip):

With side tables it might be possible to store them on disk. Maybe that would be easier than implementing persistence in salsa?

Laurențiu Nicola (Mar 02 2020 at 18:57, on Zulip):

CC https://github.com/salsa-rs/salsa/issues/10, of course

matklad (Mar 02 2020 at 18:57, on Zulip):

I think these are mostly orthogonal issues.

matklad (Mar 02 2020 at 18:58, on Zulip):

I also wonder if we can somehow apply the tree identity trick of hir::Semantics here...

std::Veetaha (Mar 02 2020 at 19:06, on Zulip):

In one of your posts you said that being lazy and not storing hash-tables is what makes and IDE vs compiler. What changed?

matklad (Mar 02 2020 at 19:09, on Zulip):

not storing hash-tables

Not storing hash tables where you can use an array

matklad (Mar 02 2020 at 19:10, on Zulip):

In one of your posts you said that being lazy and not storing hash-tables is what makes and IDE vs compiler. What changed?

I must say I don't really understand this question, as i don't see any kind of contradiction between this discussion and sorbet discussion

std::Veetaha (Mar 02 2020 at 19:29, on Zulip):

Sorry, I might've missed the point, I was referring to this post

Last update: Sep 22 2020 at 02:45UTC