Stream: t-compiler

Topic: frontend library-ification


centril (Sep 13 2019 at 15:04, on Zulip):

@matklad mainly I think if we do libast/libparser/liblowering refactorings now, then it's way nicer to hack on diagnostics without having to recompile things only depending on AST but not parser

matklad (Sep 13 2019 at 15:08, on Zulip):

Seems fine for me to do!

matklad (Sep 13 2019 at 15:08, on Zulip):

It's just that it won't be directly beneficial for rust-analyzer, as I don't think it can use rustc AST

centril (Sep 13 2019 at 15:09, on Zulip):

sure

matklad (Sep 13 2019 at 15:09, on Zulip):

My thoughts on extracting the parser is that there are two possible approaches:

matklad (Sep 13 2019 at 15:10, on Zulip):

Swift at the moment is transitioning to the CST based on the first idea

matklad (Sep 13 2019 at 15:10, on Zulip):

Basically, the parser builds two trees at once.

eddyb (Sep 13 2019 at 15:10, on Zulip):

hmm would syn be considered a CST?

eddyb (Sep 13 2019 at 15:11, on Zulip):

or is it more like a parse tree

matklad (Sep 13 2019 at 15:11, on Zulip):

Yeah, I should have said parse treee

centril (Sep 13 2019 at 15:11, on Zulip):

syn seems like a CST

matklad (Sep 13 2019 at 15:11, on Zulip):

(with comments and whitespace)

eddyb (Sep 13 2019 at 15:11, on Zulip):

syn tries to keep all the tokens AFAIK

eddyb (Sep 13 2019 at 15:11, on Zulip):

(that it gets from TokenStream)

centril (Sep 13 2019 at 15:11, on Zulip):

fair... seems like something in-between then

matklad (Sep 13 2019 at 15:12, on Zulip):

I think that's the definition of CST according to megamodel

matklad (Sep 13 2019 at 15:12, on Zulip):

and if you add ws and comments, you'll get a parse tree. Riht?

centril (Sep 13 2019 at 15:12, on Zulip):

parse tree and CST seem synonymous according to wiki

centril (Sep 13 2019 at 15:12, on Zulip):

A parse tree or parsing tree[1] or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term parse tree itself is used primarily in computational linguistics; in theoretical syntax, the term syntax tree is more common.

matklad (Sep 13 2019 at 15:13, on Zulip):

but not accodring to http://grammarware.github.io/parsing/

matklad (Sep 13 2019 at 15:14, on Zulip):

anyway, building both ast and parse tree seems nice

matklad (Sep 13 2019 at 15:14, on Zulip):

but it makes the parser dependent on the AST, which will make it harder to use it in rust-analyzer, which totally doesn't want rustc's AST

centril (Sep 13 2019 at 15:15, on Zulip):

@matklad can you tl;dr the reasons why rustc's AST is bad for RA?

matklad (Sep 13 2019 at 15:15, on Zulip):

An alternative is to make the parser just traverse a parse tree, without actually building it. Than rustc can produce AST out of it, and rust-analyzer will build a parsee tree

matklad (Sep 13 2019 at 15:16, on Zulip):

it has types, it has spans (more generally, identity), it doesn't have whiespace, it doesn't have comments, it is mutable

centril (Sep 13 2019 at 15:17, on Zulip):

@matklad sounds like you want another IR before ast:: then

matklad (Sep 13 2019 at 15:17, on Zulip):

has types -- different nodes are represented by different Rust types

eddyb (Sep 13 2019 at 15:17, on Zulip):

(we want to move away from mutability, at least I do :P)

matklad (Sep 13 2019 at 15:17, on Zulip):

Yeah, exactly, I want parse tree

centril (Sep 13 2019 at 15:18, on Zulip):

This sounds like tagless-final; using a trait for "callback"

matklad (Sep 13 2019 at 15:18, on Zulip):

Another big thing for me is indentity: I want a + b to always be just a + b, and not a + b in main.rs:42:10

matklad (Sep 13 2019 at 15:18, on Zulip):

How to marry that with hygiene and macro expansion is an open question though...

centril (Sep 13 2019 at 15:18, on Zulip):

@matklad i.e. you don't want spans

matklad (Sep 13 2019 at 15:19, on Zulip):

exactly

eddyb (Sep 13 2019 at 15:19, on Zulip):

hmm

eddyb (Sep 13 2019 at 15:20, on Zulip):

what if AST nodes were interned, with a single attached quantity

eddyb (Sep 13 2019 at 15:20, on Zulip):

a "size"

eddyb (Sep 13 2019 at 15:20, on Zulip):

which you could use to attach data to AST nodes in flat arrays

eddyb (Sep 13 2019 at 15:20, on Zulip):

wait no this is the "relative node ID" thing

eddyb (Sep 13 2019 at 15:20, on Zulip):

I just thought about it in a weird way

oli (Sep 13 2019 at 15:20, on Zulip):

:D

eddyb (Sep 13 2019 at 15:20, on Zulip):

(oops)

centril (Sep 13 2019 at 15:20, on Zulip):

If we want a zero cost solution then another IR in-between TokenStream and ast:: may be "expensive" for rustc in terms of compile-perf...

A tagless-final approach might be better

eddyb (Sep 13 2019 at 15:20, on Zulip):

nah the current AST is not good by any stretch of the imagination

eddyb (Sep 13 2019 at 15:21, on Zulip):

I'd rather replace it with something like this

eddyb (Sep 13 2019 at 15:21, on Zulip):

how did I forget about the relative thing, now I need to figure out how to use it more...

centril (Sep 13 2019 at 15:21, on Zulip):

nah the current AST is not good by any stretch of the imagination

Sure, but an added pass transform might make not-good worse?

eddyb (Sep 13 2019 at 15:21, on Zulip):

you'd not have a pass

eddyb (Sep 13 2019 at 15:22, on Zulip):

you'd use the superior representation directly :P

matklad (Sep 13 2019 at 15:22, on Zulip):

Yeah, I think we just don't need AST ideally. We can run nameres/macro expansion on the parse tree, and then lower the bodies into a proper simple IR (HIR or direclty MIR)

eddyb (Sep 13 2019 at 15:23, on Zulip):

(basically anything that is tree-shaped can have data attached to it in a similar tree shape or in a flat manner with just a single "total number of subnodes" number per node)

eddyb (Sep 13 2019 at 15:23, on Zulip):

I want to implement this so badly right now

matklad (Sep 13 2019 at 15:24, on Zulip):

@eddyb yeah, that's roughly the plan for rust-analyzer. We currently use in-file offsets and not sizes, but that's not a super big difference

centril (Sep 13 2019 at 15:24, on Zulip):

We can run nameres/macro expansion on the parse tree, and then lower the bodies into a proper simple IR (HIR or direclty MIR)

Sounds like it would complicate spec efforts tho? nameres on a more "untyped" structure sounds harder to specify... Also, idk how you get directly to MIR lol...

matklad (Sep 13 2019 at 15:25, on Zulip):

You can have typed views over untyped datastructure

centril (Sep 13 2019 at 15:28, on Zulip):

@matklad sounds like "typing it" on-demand?

matklad (Sep 13 2019 at 15:30, on Zulip):

I mean "do what Swift libsyntax does"

matklad (Sep 13 2019 at 15:31, on Zulip):

A nice benefit here is that the green layer of Swift's syntax is so simple, that I feel it could be safely put into a shared interface

matklad (Sep 13 2019 at 15:31, on Zulip):

Hm, :thinking:

centril (Sep 13 2019 at 15:32, on Zulip):

I mean "do what Swift libsyntax does"

Guess I have to look more at swift then =)

matklad (Sep 13 2019 at 15:34, on Zulip):

Yeah, I think if the parser builds a dumb green tree, that would be a rather simple an uncontroversial way forwad.

That is, at the first stage, parer produces a green tree, and then we build a currrent AST over it (this increases alloc pressure somewhat, but I believe we can temporary stomach it just fine)

At the next stage, we slowly replace processing of the AST with processing of the green tree (my understanding is that immutable green tree is basically what @eddyb wants)

At the next stage, if we find a need for this, we implement a red layer for convenience API on top

eddyb (Sep 13 2019 at 15:35, on Zulip):

We currently use in-file offsets and not sizes, but that's not a super big difference

I mean "node count" not "source length"

matklad (Sep 13 2019 at 15:35, on Zulip):

https://github.com/apple/swift/tree/master/lib/Syntax <- I consider this a required reading for discussing IDE syntax trees

eddyb (Sep 13 2019 at 15:35, on Zulip):

this allows attaching e.g. spans orthogonally

eddyb (Sep 13 2019 at 15:36, on Zulip):

and interning the AST!

eddyb (Sep 13 2019 at 15:36, on Zulip):

I wonder if you could have layers of interning :P

matklad (Sep 13 2019 at 15:36, on Zulip):

@eddyb source lenght (as opposed to offsets) allows interning as well. I actually do that in rowan

eddyb (Sep 13 2019 at 15:37, on Zulip):

so a + b twice would be interned at both levels, but a + b vs a+b would only share one interning level

matklad (Sep 13 2019 at 15:37, on Zulip):

https://github.com/rust-analyzer/rowan/blob/a00ccb60ea99eccbc7f24d31ee83e925e0d8258d/src/green.rs#L151-L156

matklad (Sep 13 2019 at 15:37, on Zulip):

^ interning

eddyb (Sep 13 2019 at 15:37, on Zulip):

yeah I know but code being written exactly the same twice is not necessarily as likely

matklad (Sep 13 2019 at 15:40, on Zulip):

That's true, but I am not sure we need more layers between "parse tree" and "name resolves AST"

Last update: Nov 22 2019 at 04:45UTC