Stream: t-compiler/wg-rls-2.0

Topic: Design question: identity in syntax trees


matklad (Feb 13 2020 at 13:47, on Zulip):

Discussion for https://github.com/rust-analyzer/rust-analyzer/issues/3129

matklad (Feb 13 2020 at 13:48, on Zulip):

That's basically about the same core question all other again:

or

?

matklad (Feb 13 2020 at 13:59, on Zulip):

cc @Florian Diebold in case you have some brilliant idea how to deal with it.

Florian Diebold (Feb 13 2020 at 14:29, on Zulip):

no brilliant idea, but I feel like if we do generate mirrored APIs for InFile<AstNode>s (and maybe add further context and give it a better name than InFile), we can get an acceptable API

matklad (Feb 13 2020 at 14:29, on Zulip):

maybe add further context

Ctx<T> ? :)

matklad (Feb 13 2020 at 14:30, on Zulip):

Note that generating API would require to duplicate AST traits, like TypeParamsOwner as well

Florian Diebold (Feb 13 2020 at 14:33, on Zulip):

yeah, and duplicating hand-written extensions :(

Florian Diebold (Feb 13 2020 at 14:34, on Zulip):

maybe we could invert it to SyntaxNode<Context> :thinking:

Florian Diebold (Feb 13 2020 at 14:34, on Zulip):

so the current syntax nodes would be SyntaxNode<()>

matklad (Feb 13 2020 at 14:34, on Zulip):

Yeah, that's another option

matklad (Feb 13 2020 at 14:35, on Zulip):

I'd even say we can make it on-generic, by using SyntaxNode<u64>, because everybody should use indexes anyway

matklad (Feb 13 2020 at 14:36, on Zulip):

And I think that's how most compiles work: sytnax generally remmebers a file it originated from, which is usually good enough for identity

matklad (Feb 13 2020 at 14:36, on Zulip):

I think Swift's libsyntax is an expcetion: they don't store the original file, or any other bits of context

Florian Diebold (Feb 13 2020 at 14:37, on Zulip):

I feel like if we define type SyntaxNode = SyntaxNode<()> the parameter probably wouldn't be annoying

matklad (Feb 13 2020 at 14:37, on Zulip):

And i really, really love that design, abstractly. But I don't see we how that would square with API we want...

Florian Diebold (Feb 13 2020 at 14:38, on Zulip):

so what kind of API does swift have?

matklad (Feb 13 2020 at 14:39, on Zulip):

I belive that haven't move semantic analysis over libsyntax yet, so they don't have an API which requires identity at all

Florian Diebold (Feb 13 2020 at 14:39, on Zulip):

ah :grimacing:

matklad (Feb 13 2020 at 14:40, on Zulip):

I think they do have strong pointer identity of trees though, so, in theory, they can use pointer-keyed hash maps

matklad (Feb 13 2020 at 14:47, on Zulip):

It's also interesting how we hit this problem because of macros.

Before, you were generally working with syntax from a single file, so there was no need for fine-grained tweaking of the context.

Now, you can hold several trees simulatenously, which originate from different macros, and kind-a need to pair a tree with ctx...

Florian Diebold (Feb 13 2020 at 14:50, on Zulip):

btw,

It does not actually yield a true global identity, because the same file can be included twice into a module tree of a crate (Note how this makes syntax<->semantics not a bijection, which is painful to deal with).

should that maybe lead to different HirFileIds?

Florian Diebold (Feb 13 2020 at 14:51, on Zulip):

if we have multiple copies of a whole crate in the crate graph because of e.g. configurations, does that lead to different HirFileIds?

matklad (Feb 13 2020 at 14:52, on Zulip):

Yeah, I think we need to somehow fix it, and to make HirFileId unambgious. Last time I've looked into it I've hit a chicken and egg problem though

matklad (Feb 13 2020 at 14:53, on Zulip):

Basically, what we need is to include a ModuleId into HIrFileId. However, we can't get a ModuleId before we parse it's file into raw items.

And the query to get raw items needs a file id!

matklad (Feb 13 2020 at 15:35, on Zulip):

Looked at Roslyn, I think their magic is relying on trees identity: http://sourceroslyn.io/#Microsoft.CodeAnalysis.CSharp/Compilation/SyntaxTreeSemanticModel.cs,1710

matklad (Feb 13 2020 at 15:38, on Zulip):

And that's actually is interesting, because, unlike SyntaxNode<Context>, the context is meaningless. Ie, does this mean that we don't even need an index, and just SyntaxNode<impl Eq>?

matklad (Feb 13 2020 at 15:45, on Zulip):

Ok, I think I am onto something!

matklad (Feb 13 2020 at 16:42, on Zulip):

So, this is the proposal:

std::Veetaha (Feb 13 2020 at 17:15, on Zulip):

I remember you saying that green node doesn't have an identity and is easily interned, but red one has through the parent pointer. I guess these properties are preserved with your proposal, aren't they? And if Syntax Node already has an identity, making it global seems reasonable

matklad (Feb 13 2020 at 17:18, on Zulip):

Exactly! Green node still is identity less. SyntaxNode gets a real identity (currently, it has identity within the file, and comparison of nodes across files is just ill-defined and might return true)

std::Veetaha (Feb 13 2020 at 17:20, on Zulip):

Heh, looks like ub when you put it like that)

Florian Diebold (Feb 13 2020 at 17:20, on Zulip):

yeah, that might actually work out :thinking:

std::Veetaha (Feb 13 2020 at 17:22, on Zulip):

But, are there sensible caveats?

std::Veetaha (Feb 13 2020 at 17:24, on Zulip):

Maybe it is ridiculous, but what is an estimate of increased memory usage? I remember you being concerned about usize vs u32 for TextUnit

matklad (Feb 13 2020 at 17:27, on Zulip):

We don't increase the size of GreenNodes, the size of SyntaxNodes does not matter that much as there are few of them, and otherwise we add roughly one u32 + interning metadata per file/macro

matklad (Feb 18 2020 at 15:53, on Zulip):

Ok, I am experiemnting with it right now, and I have a problem: I need a thing, which is exactly like SourceBinder and SourceAnalyzer, but different from those, and I need a name for it :laughter_tears:

Florian Diebold (Feb 18 2020 at 15:56, on Zulip):

shouldn't we be able to get rid of SourceAnalyzer with this :thinking:

also, what does the new thing do?

matklad (Feb 18 2020 at 15:58, on Zulip):

The same, but without InFile<T>

matklad (Feb 18 2020 at 15:58, on Zulip):

well, it doesn't do anything yet, because it doesn't exist, but I think we should be able to cut own a lot of InFiles with it

matklad (Feb 18 2020 at 16:01, on Zulip):

pasted image

matklad (Feb 18 2020 at 16:02, on Zulip):

The above is what we have now, the bellow is what I want

Florian Diebold (Feb 18 2020 at 16:03, on Zulip):

why not token.descend_into_macros(&source_binder) / token.ancestors_with_macros?

matklad (Feb 18 2020 at 16:04, on Zulip):

That would require an extension trait

matklad (Feb 18 2020 at 16:05, on Zulip):

That's actually exactly how C# API looks like I think: https://github.com/dotnet/roslyn/wiki/Getting-Started-C%23-Semantic-Analysis

matklad (Feb 18 2020 at 16:05, on Zulip):

And yes, I hope this new thing subsumes bothe SourceBinder and SourceAnalyzer

matklad (Feb 18 2020 at 16:06, on Zulip):

Also, as a minor point, I think we should use caches like in SourceBinder, but we should hide them behind RefCell, so that we dont' need mut al over the place

Florian Diebold (Feb 18 2020 at 16:07, on Zulip):

how about calling it RustAnalyzer :big_smile:

Florian Diebold (Feb 18 2020 at 16:07, on Zulip):

more seriously, maybe just Analyzer

matklad (Feb 18 2020 at 16:08, on Zulip):

Note that we already have Analysis. I acutally think that SemanticModel / Semantics might be a good name.

matklad (Feb 18 2020 at 16:08, on Zulip):

Because that's what it is -- semantics ascribed to syntax

Florian Diebold (Feb 18 2020 at 16:10, on Zulip):

I'm not a fan of SemanticModel because it doesn't seem to contain the actual model

matklad (Feb 18 2020 at 16:11, on Zulip):

SemanticApi?

Emil Lauridsen (Feb 18 2020 at 16:18, on Zulip):

matklad said:

SemanticApi?

That doesn't seem too unreasonable

Florian Diebold (Feb 18 2020 at 16:19, on Zulip):

I like that more... though I think Semantics might also be fine

Florian Diebold (Feb 18 2020 at 16:20, on Zulip):

does it actually contain any context, apart from the source binder cache thing?

matklad (Feb 18 2020 at 16:20, on Zulip):

Yeah, i guess the main Q is whether this API flies at all. It might, and if it does, it should simplify everything a lot (also, I am scared by amount of refactorig this will require to complete)

matklad (Feb 18 2020 at 16:21, on Zulip):

does it actually contain any context, apart from the source binder cache thing?

Yeah, I imagine it would contain a ton of hash maps keyed by SyntaxNodes

Emil Lauridsen (Feb 18 2020 at 16:21, on Zulip):

I'm a big fan of designing APIs by writing the usage code first, and then iterating from there, so it seems like a good starting point

matklad (Feb 18 2020 at 16:21, on Zulip):

At least, it'll have something like HashMap<TreeId, HirFileId>

Florian Diebold (Feb 18 2020 at 16:21, on Zulip):

hmm I thought that would be in the db?

Florian Diebold (Feb 18 2020 at 16:22, on Zulip):

or the db contains the other direction, and this needs to reverse it?

Florian Diebold (Feb 18 2020 at 16:23, on Zulip):

I guess I'll just wait how it actually looks ;)

matklad (Feb 18 2020 at 16:23, on Zulip):

In general, I'd expect the other direction

matklad (Feb 18 2020 at 16:23, on Zulip):

well, acutally, I don't really know myself yet :)

matklad (Feb 18 2020 at 16:24, on Zulip):

But rowan branch with ids is https://github.com/rust-analyzer/rowan/tree/identity

matklad (Feb 18 2020 at 16:28, on Zulip):

well, acutally, I don't really know myself yet :)

So, I think the general idea is rougthly what we already do with SourceBinder.

We store things like

struct Parent {
   children: FxHashMap<Name, Child>
}

and, in general, we want to do look up by Child. We do this recursively -- we lookup parent, than inver and cache (in semantics) the children map and lookup the node by id it it

Florian Diebold (Feb 18 2020 at 16:29, on Zulip):

so it's still just a cache, right?

matklad (Feb 18 2020 at 16:29, on Zulip):

right

matklad (Feb 18 2020 at 16:30, on Zulip):

basically, the three lines starting from here sourceroslyn.io/#Microsoft.CodeAnalysis.CSharp/Compilation/SyntaxTreeSemanticModel.cs,1708, is the solution to all our API problems

matklad (Feb 18 2020 at 16:30, on Zulip):

hopefully :D

matklad (Feb 18 2020 at 16:30, on Zulip):

ah, nope, the core is actually the outer for

matklad (Feb 18 2020 at 16:30, on Zulip):

http://sourceroslyn.io/#Microsoft.CodeAnalysis.CSharp/Compilation/SyntaxTreeSemanticModel.cs,1695

matklad (Feb 18 2020 at 17:37, on Zulip):

Did a minimal thing here: https://github.com/rust-analyzer/rust-analyzer/pull/3222

matklad (Feb 18 2020 at 17:39, on Zulip):

The funniest thing is that I think we don't need a strong identity for this to work. I need to think through this tomorrow, but it looks like the fact that we store syntax nodes in the cache solves the potential problem when we get two different trees due to LRU.

matklad (Feb 18 2020 at 17:40, on Zulip):

And yes, semancits creates source binder which creates source analyzer =/

Jeremy Kolb (Feb 18 2020 at 19:03, on Zulip):

Would strong identity still be worth having?

Florian Diebold (Feb 18 2020 at 19:16, on Zulip):

well, this approach only works if we only obtain syntax nodes through the Semantics thing, right?

Jeremy Kolb (Feb 18 2020 at 20:12, on Zulip):

Right. I was thinking it could be useful for someone else

matklad (Feb 18 2020 at 21:48, on Zulip):

well, this approach only works if we only obtain syntax nodes through the Semantics thing, right?

I think this is true for any identity based approach. If all you have is an identity, the only way to get something out of it is to look it up in a hashmap which was created by an entity that returned you this thing with identity

matklad (Feb 25 2020 at 13:46, on Zulip):

Lol, turns out rowan has a bug, in that it tries to use pointer equality, but unfortunately there's one more level of indiretcion, and i was comparing pointers to pointers :)

std::Veetaha (Feb 25 2020 at 14:01, on Zulip):

You should add a test for that...

Laurențiu Nicola (Feb 25 2020 at 14:02, on Zulip):

Was it wrong or only slower?

matklad (Feb 25 2020 at 14:02, on Zulip):

wrong

std::Veetaha (Feb 25 2020 at 14:05, on Zulip):

But what changed? It still compares ptr() == ptr()...

matklad (Feb 25 2020 at 14:08, on Zulip):

it's now a *const NodeData, rather than *const *const NodeData

std::Veetaha (Feb 25 2020 at 14:19, on Zulip):

We can compare *const NodeData pointerwise because they are interned right?

Last update: Sep 22 2020 at 01:45UTC