Stream: t-compiler

Topic: frontend library-ification


centril (Sep 13 2019 at 15:04, on Zulip):

@matklad mainly I think if we do libast/libparser/liblowering refactorings now, then it's way nicer to hack on diagnostics without having to recompile things only depending on AST but not parser

matklad (Sep 13 2019 at 15:08, on Zulip):

Seems fine for me to do!

matklad (Sep 13 2019 at 15:08, on Zulip):

It's just that it won't be directly beneficial for rust-analyzer, as I don't think it can use rustc AST

centril (Sep 13 2019 at 15:09, on Zulip):

sure

matklad (Sep 13 2019 at 15:09, on Zulip):

My thoughts on extracting the parser is that there are two possible approaches:

matklad (Sep 13 2019 at 15:10, on Zulip):

Swift at the moment is transitioning to the CST based on the first idea

matklad (Sep 13 2019 at 15:10, on Zulip):

Basically, the parser builds two trees at once.

eddyb (Sep 13 2019 at 15:10, on Zulip):

hmm would syn be considered a CST?

eddyb (Sep 13 2019 at 15:11, on Zulip):

or is it more like a parse tree

matklad (Sep 13 2019 at 15:11, on Zulip):

Yeah, I should have said parse tree

centril (Sep 13 2019 at 15:11, on Zulip):

syn seems like a CST

matklad (Sep 13 2019 at 15:11, on Zulip):

(with comments and whitespace)

eddyb (Sep 13 2019 at 15:11, on Zulip):

syn tries to keep all the tokens AFAIK

eddyb (Sep 13 2019 at 15:11, on Zulip):

(that it gets from TokenStream)

centril (Sep 13 2019 at 15:11, on Zulip):

fair... seems like something in-between then

matklad (Sep 13 2019 at 15:12, on Zulip):

I think that's the definition of CST according to megamodel

matklad (Sep 13 2019 at 15:12, on Zulip):

and if you add ws and comments, you'll get a parse tree. Right?

centril (Sep 13 2019 at 15:12, on Zulip):

parse tree and CST seem synonymous according to wiki

centril (Sep 13 2019 at 15:12, on Zulip):

A parse tree or parsing tree[1] or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term parse tree itself is used primarily in computational linguistics; in theoretical syntax, the term syntax tree is more common.

matklad (Sep 13 2019 at 15:13, on Zulip):

but not according to http://grammarware.github.io/parsing/

matklad (Sep 13 2019 at 15:14, on Zulip):

anyway, building both ast and parse tree seems nice

matklad (Sep 13 2019 at 15:14, on Zulip):

but it makes the parser dependent on the AST, which will make it harder to use it in rust-analyzer, which totally doesn't want rustc's AST

centril (Sep 13 2019 at 15:15, on Zulip):

@matklad can you tl;dr the reasons why rustc's AST is bad for RA?

matklad (Sep 13 2019 at 15:15, on Zulip):

An alternative is to make the parser just traverse a parse tree, without actually building it. Then rustc can produce an AST out of it, and rust-analyzer will build a parse tree

matklad (Sep 13 2019 at 15:16, on Zulip):

it has types, it has spans (more generally, identity), it doesn't have whitespace, it doesn't have comments, it is mutable

centril (Sep 13 2019 at 15:17, on Zulip):

@matklad sounds like you want another IR before ast:: then

matklad (Sep 13 2019 at 15:17, on Zulip):

has types -- different nodes are represented by different Rust types

eddyb (Sep 13 2019 at 15:17, on Zulip):

(we want to move away from mutability, at least I do :P)

matklad (Sep 13 2019 at 15:17, on Zulip):

Yeah, exactly, I want parse tree

centril (Sep 13 2019 at 15:18, on Zulip):

This sounds like tagless-final; using a trait for "callback"
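
A minimal sketch of the "parser drives a callback trait" idea discussed above. Everything here is an assumption made for illustration: `TreeSink`, `parse_sum`, `AstSink` and `LosslessSink` are hypothetical names, not rustc or rust-analyzer APIs, and the "parser" is a toy that takes pre-split tokens. The point is only that one parser can feed two consumers: one that keeps what an AST needs, and one that keeps every token.

```rust
/// Hypothetical callback interface: the parser reports structure,
/// and the sink decides what (if anything) to build from it.
trait TreeSink {
    fn start_node(&mut self, kind: &'static str);
    fn token(&mut self, text: &str);
    fn finish_node(&mut self);
}

/// Toy "parser" for `name + name`. The input is already split into
/// tokens (whitespace included); a real parser would lex first.
fn parse_sum(tokens: &[&str], sink: &mut dyn TreeSink) {
    sink.start_node("SUM");
    for tok in tokens {
        sink.token(tok);
    }
    sink.finish_node();
}

/// Compiler-style consumer: keeps only the operand names and
/// drops whitespace and punctuation.
#[derive(Default)]
struct AstSink {
    operands: Vec<String>,
}

impl TreeSink for AstSink {
    fn start_node(&mut self, _kind: &'static str) {}
    fn token(&mut self, text: &str) {
        let is_name = !text.is_empty()
            && text.chars().all(|c| c.is_alphanumeric() || c == '_');
        if is_name {
            self.operands.push(text.to_string());
        }
    }
    fn finish_node(&mut self) {}
}

/// IDE-style consumer: keeps every token, so the original text
/// can be reproduced exactly.
#[derive(Default)]
struct LosslessSink {
    text: String,
    depth: usize,
}

impl TreeSink for LosslessSink {
    fn start_node(&mut self, _kind: &'static str) {
        self.depth += 1;
    }
    fn token(&mut self, text: &str) {
        self.text.push_str(text);
    }
    fn finish_node(&mut self) {
        self.depth -= 1;
    }
}
```

Running `parse_sum` once with each sink over the same tokens gives the operand list in one case and the full source text in the other, without the parser depending on either output type.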

matklad (Sep 13 2019 at 15:18, on Zulip):

Another big thing for me is identity: I want a + b to always be just a + b, and not a + b in main.rs:42:10

matklad (Sep 13 2019 at 15:18, on Zulip):

How to marry that with hygiene and macro expansion is an open question though...

centril (Sep 13 2019 at 15:18, on Zulip):

@matklad i.e. you don't want spans

matklad (Sep 13 2019 at 15:19, on Zulip):

exactly

eddyb (Sep 13 2019 at 15:19, on Zulip):

hmm

eddyb (Sep 13 2019 at 15:20, on Zulip):

what if AST nodes were interned, with a single attached quantity

eddyb (Sep 13 2019 at 15:20, on Zulip):

a "size"

eddyb (Sep 13 2019 at 15:20, on Zulip):

which you could use to attach data to AST nodes in flat arrays

eddyb (Sep 13 2019 at 15:20, on Zulip):

wait no this is the "relative node ID" thing

eddyb (Sep 13 2019 at 15:20, on Zulip):

I just thought about it in a weird way

oli (Sep 13 2019 at 15:20, on Zulip):

:D

eddyb (Sep 13 2019 at 15:20, on Zulip):

(oops)

centril (Sep 13 2019 at 15:20, on Zulip):

If we want a zero cost solution then another IR in-between TokenStream and ast:: may be "expensive" for rustc in terms of compile-perf...

A tagless-final approach might be better

eddyb (Sep 13 2019 at 15:20, on Zulip):

nah the current AST is not good by any stretch of the imagination

eddyb (Sep 13 2019 at 15:21, on Zulip):

I'd rather replace it with something like this

eddyb (Sep 13 2019 at 15:21, on Zulip):

how did I forget about the relative thing, now I need to figure out how to use it more...

centril (Sep 13 2019 at 15:21, on Zulip):

nah the current AST is not good by any stretch of the imagination

Sure, but an added pass transform might make not-good worse?

eddyb (Sep 13 2019 at 15:21, on Zulip):

you'd not have a pass

eddyb (Sep 13 2019 at 15:22, on Zulip):

you'd use the superior representation directly :P

matklad (Sep 13 2019 at 15:22, on Zulip):

Yeah, I think we just don't need AST ideally. We can run nameres/macro expansion on the parse tree, and then lower the bodies into a proper simple IR (HIR or directly MIR)

eddyb (Sep 13 2019 at 15:23, on Zulip):

(basically anything that is tree-shaped can have data attached to it in a similar tree shape or in a flat manner with just a single "total number of subnodes" number per node)
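
A rough sketch of the "total number of subnodes per node" idea, under the assumption that side data lives in a flat array indexed by pre-order position. `Node`, `for_each_indexed` and `kinds_in_table_order` are hypothetical names for illustration only. Each node stores just the size of its subtree, which is enough to compute the flat index of any child without the node knowing where it sits in the file.

```rust
/// Hypothetical position-free tree node. The only bookkeeping is
/// `size`: the number of nodes in this subtree, including itself.
struct Node {
    kind: &'static str,
    children: Vec<Node>,
    size: usize,
}

impl Node {
    fn new(kind: &'static str, children: Vec<Node>) -> Node {
        let size = 1 + children.iter().map(|c| c.size).sum::<usize>();
        Node { kind, children, size }
    }
}

/// Pre-order walk that hands each node its index into a flat side
/// table. A child's index is the parent's index, plus one, plus the
/// sizes of all earlier siblings.
fn for_each_indexed(node: &Node, index: usize, f: &mut dyn FnMut(usize, &Node)) {
    f(index, node);
    let mut child_index = index + 1;
    for child in &node.children {
        for_each_indexed(child, child_index, &mut *f);
        child_index += child.size; // skip the child's whole subtree
    }
}

/// Example side table: one slot per node, filled through the indices.
/// Spans or types could be stored the same way.
fn kinds_in_table_order(root: &Node) -> Vec<&'static str> {
    let mut table = vec![""; root.size];
    for_each_indexed(root, 0, &mut |i: usize, n: &Node| table[i] = n.kind);
    table
}
```

Because the tree itself carries no positions, two structurally equal subtrees are indistinguishable, and anything position-specific (such as spans) goes into a table of length `root.size`.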

eddyb (Sep 13 2019 at 15:23, on Zulip):

I want to implement this so badly right now

matklad (Sep 13 2019 at 15:24, on Zulip):

@eddyb yeah, that's roughly the plan for rust-analyzer. We currently use in-file offsets and not sizes, but that's not a super big difference

centril (Sep 13 2019 at 15:24, on Zulip):

We can run nameres/macro expansion on the parse tree, and then lower the bodies into a proper simple IR (HIR or directly MIR)

Sounds like it would complicate spec efforts tho? nameres on a more "untyped" structure sounds harder to specify... Also, idk how you get directly to MIR lol...

matklad (Sep 13 2019 at 15:25, on Zulip):

You can have typed views over untyped datastructure
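
One way to picture "typed views over an untyped data structure", as a sketch only. `SyntaxKind`, `SyntaxNode`, `BinExpr` and `sample_tree` below are hypothetical and much simpler than any real implementation. All nodes share one Rust type, and a typed wrapper is a checked cast that adds accessor methods.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyntaxKind {
    BinExpr,
    Name,
    Plus,
    Whitespace,
}

/// Untyped tree: every node has the same Rust type and is only
/// distinguished by its `kind` tag.
#[derive(Debug)]
struct SyntaxNode {
    kind: SyntaxKind,
    text: String, // non-empty only for tokens in this sketch
    children: Vec<SyntaxNode>,
}

/// Typed view: a borrowed wrapper that can only be obtained when
/// the underlying node has the right kind.
#[derive(Debug, Clone, Copy)]
struct BinExpr<'a>(&'a SyntaxNode);

impl<'a> BinExpr<'a> {
    fn cast(node: &'a SyntaxNode) -> Option<BinExpr<'a>> {
        if node.kind == SyntaxKind::BinExpr {
            Some(BinExpr(node))
        } else {
            None
        }
    }

    /// The n-th `Name` child, skipping whitespace and operators.
    fn operand(self, n: usize) -> Option<&'a SyntaxNode> {
        self.0
            .children
            .iter()
            .filter(|c| c.kind == SyntaxKind::Name)
            .nth(n)
    }

    fn lhs(self) -> Option<&'a SyntaxNode> {
        self.operand(0)
    }

    fn rhs(self) -> Option<&'a SyntaxNode> {
        self.operand(1)
    }
}

fn token(kind: SyntaxKind, text: &str) -> SyntaxNode {
    SyntaxNode { kind, text: text.to_string(), children: Vec::new() }
}

/// Untyped tree for the source text `a + b`.
fn sample_tree() -> SyntaxNode {
    SyntaxNode {
        kind: SyntaxKind::BinExpr,
        text: String::new(),
        children: vec![
            token(SyntaxKind::Name, "a"),
            token(SyntaxKind::Whitespace, " "),
            token(SyntaxKind::Plus, "+"),
            token(SyntaxKind::Whitespace, " "),
            token(SyntaxKind::Name, "b"),
        ],
    }
}
```

The accessors return `Option`, so a view over an incomplete or malformed tree still works; it just reports missing pieces as `None`.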

centril (Sep 13 2019 at 15:28, on Zulip):

@matklad sounds like "typing it" on-demand?

matklad (Sep 13 2019 at 15:30, on Zulip):

I mean "do what Swift libsyntax does"

matklad (Sep 13 2019 at 15:31, on Zulip):

A nice benefit here is that the green layer of Swift's syntax is so simple, that I feel it could be safely put into a shared interface

matklad (Sep 13 2019 at 15:31, on Zulip):

Hm, :thinking:

centril (Sep 13 2019 at 15:32, on Zulip):

I mean "do what Swift libsyntax does"

Guess I have to look more at swift then =)

matklad (Sep 13 2019 at 15:34, on Zulip):

Yeah, I think if the parser builds a dumb green tree, that would be a rather simple and uncontroversial way forward.

That is, at the first stage, the parser produces a green tree, and then we build the current AST over it (this increases alloc pressure somewhat, but I believe we can temporarily stomach it just fine)

At the next stage, we slowly replace processing of the AST with processing of the green tree (my understanding is that immutable green tree is basically what @eddyb wants)

At the next stage, if we find a need for this, we implement a red layer for convenience API on top
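
The green/red split in this plan can be sketched roughly as follows. This is an illustration under stated assumptions, not Swift's or rust-analyzer's actual code: `GreenNode`, `RedNode`, `leaf` and `node` are hypothetical names. The green tree is immutable and stores only lengths; the red layer is created on demand and adds absolute offsets.

```rust
use std::rc::Rc;

/// Green layer: immutable, position-free. A node knows how long its
/// text is, but not where it starts.
#[derive(Debug)]
struct GreenNode {
    kind: &'static str,
    text_len: usize,
    children: Vec<Rc<GreenNode>>,
}

fn leaf(kind: &'static str, text: &str) -> Rc<GreenNode> {
    Rc::new(GreenNode { kind, text_len: text.len(), children: Vec::new() })
}

fn node(kind: &'static str, children: Vec<Rc<GreenNode>>) -> Rc<GreenNode> {
    let text_len = children.iter().map(|c| c.text_len).sum();
    Rc::new(GreenNode { kind, text_len, children })
}

/// Red layer: a cheap cursor over a green node that remembers the
/// absolute offset at which the node starts.
#[derive(Debug, Clone)]
struct RedNode {
    green: Rc<GreenNode>,
    offset: usize,
}

impl RedNode {
    fn root(green: Rc<GreenNode>) -> RedNode {
        RedNode { green, offset: 0 }
    }

    /// Child offsets are computed from the parent's offset and the
    /// lengths of the preceding siblings.
    fn children(&self) -> Vec<RedNode> {
        let mut offset = self.offset;
        self.green
            .children
            .iter()
            .map(|child| {
                let red = RedNode { green: Rc::clone(child), offset };
                offset += child.text_len;
                red
            })
            .collect()
    }

    /// Half-open text range `(start, end)` covered by this node.
    fn range(&self) -> (usize, usize) {
        (self.offset, self.offset + self.green.text_len)
    }
}
```

For the text `a + b`, the last `NAME` token gets the range `(4, 5)` from the red layer even though its green node only records a length of 1.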

eddyb (Sep 13 2019 at 15:35, on Zulip):

We currently use in-file offsets and not sizes, but that's not a super big difference

I mean "node count" not "source length"

matklad (Sep 13 2019 at 15:35, on Zulip):

https://github.com/apple/swift/tree/master/lib/Syntax <- I consider this a required reading for discussing IDE syntax trees

eddyb (Sep 13 2019 at 15:35, on Zulip):

this allows attaching e.g. spans orthogonally

eddyb (Sep 13 2019 at 15:36, on Zulip):

and interning the AST!

eddyb (Sep 13 2019 at 15:36, on Zulip):

I wonder if you could have layers of interning :P

matklad (Sep 13 2019 at 15:36, on Zulip):

@eddyb source length (as opposed to offsets) allows interning as well. I actually do that in rowan

eddyb (Sep 13 2019 at 15:37, on Zulip):

so a + b twice would be interned at both levels, but a + b vs a+b would only share one interning level

matklad (Sep 13 2019 at 15:37, on Zulip):

https://github.com/rust-analyzer/rowan/blob/a00ccb60ea99eccbc7f24d31ee83e925e0d8258d/src/green.rs#L151-L156

matklad (Sep 13 2019 at 15:37, on Zulip):

^ interning
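
A small sketch of why storing lengths rather than offsets makes interning possible. This is not rowan's implementation; `Green`, `Interner` and `bin_expr` are hypothetical names. Since a node contains nothing position-dependent, two occurrences of the same text produce equal values and can share one allocation.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Position-free green element: tokens carry text, nodes carry only
/// the total length of their text plus their children.
#[derive(Debug, PartialEq, Eq, Hash)]
enum Green {
    Token { kind: &'static str, text: String },
    Node { kind: &'static str, text_len: usize, children: Vec<Rc<Green>> },
}

/// Hypothetical interner: structurally equal elements are stored once.
#[derive(Default)]
struct Interner {
    nodes: HashSet<Rc<Green>>,
}

impl Interner {
    fn intern(&mut self, green: Green) -> Rc<Green> {
        if let Some(existing) = self.nodes.get(&green) {
            return Rc::clone(existing);
        }
        let fresh = Rc::new(green);
        self.nodes.insert(Rc::clone(&fresh));
        fresh
    }
}

/// Builds an interned `BIN_EXPR` node from `(kind, text)` token pairs.
fn bin_expr(interner: &mut Interner, pieces: &[(&'static str, &str)]) -> Rc<Green> {
    let mut text_len = 0;
    let mut children = Vec::new();
    for &(kind, text) in pieces {
        text_len += text.len();
        children.push(interner.intern(Green::Token { kind, text: text.to_string() }));
    }
    interner.intern(Green::Node { kind: "BIN_EXPR", text_len, children })
}
```

In this sketch, building `a + b` twice yields the same `Rc`, while `a+b` is a different node that still shares its `a`, `+` and `b` tokens. Sharing across whitespace differences would need a further, trivia-free layer, which this sketch does not attempt.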

eddyb (Sep 13 2019 at 15:37, on Zulip):

yeah I know but code being written exactly the same twice is not necessarily as likely

matklad (Sep 13 2019 at 15:40, on Zulip):

That's true, but I am not sure we need more layers between "parse tree" and "name-resolved AST"

matklad (Dec 13 2019 at 16:16, on Zulip):

@qmx in terms of reading material, I'd say that https://github.com/apple/swift/tree/master/lib/Syntax is still the best concentrated source of info. The libsyntax2 RFC is also still mostly relevant

mark-i-m (Dec 13 2019 at 16:17, on Zulip):

@matklad @centril I landed here from the design meeting log... if either of you is interested in documenting the parser for the rustc-guide (after refactorings) that would be awesome!

matklad (Dec 13 2019 at 16:18, on Zulip):

There are no super good docs about how rust-analyzer's syntax tree and parser work right now. I think it will be useful for me to write such a document, as I think we've almost reached the fixed point of the design

matklad (Dec 13 2019 at 16:19, on Zulip):

@Esteban Küber you were interested in changing the number of people with shared knowledge of rustc + ra parsers/syntax trees.

I think I can ramp up anybody pretty quickly regarding the ra implementation. The only condition for success is that the person needs to implement some small feature for rust-analyzer, just to get the feeling of the real API

centril (Dec 13 2019 at 16:20, on Zulip):

@mark-i-m documenting librustc_parse would probably be something for me then since I've sorta become the expert on that crate now

centril (Dec 13 2019 at 16:21, on Zulip):

I'm not sure what granularity you're seeking

centril (Dec 13 2019 at 16:21, on Zulip):

The source code is getting fairly well documented

matklad (Dec 13 2019 at 16:21, on Zulip):

For existing reading material about rust-analyzer specifically, this API walkthrough test is a good starting point: https://github.com/rust-analyzer/rust-analyzer/blob/ebc95af2b5b91239fc1d8a5fc8344ded6f6ef3e4/crates/ra_syntax/src/lib.rs#L193

centril (Dec 13 2019 at 16:23, on Zulip):

@matklad also, now that we've split librustc_parse into its own crate, things should probably get a bit easier

matklad (Dec 13 2019 at 16:24, on Zulip):

I think it will be useful for me to write such a document,

Hm, @Esteban Küber , what about this deal: you write this document, but I am obliged to answer all your questions, give required screen-sharing sessions, etc? :)

centril (Dec 13 2019 at 16:25, on Zulip):

maybe I should join in on that

matklad (Dec 13 2019 at 16:25, on Zulip):

(the goal here is not to save my work, but to forcefully dissipate knowledge)

qmx (Dec 13 2019 at 16:25, on Zulip):

I want in too

Esteban Küber (Dec 13 2019 at 16:25, on Zulip):

That would be interesting, I just don't know what my availability will be ^_^

centril (Dec 13 2019 at 16:26, on Zulip):

@matklad right; and it would be good if e.g. @Vadim Petrochenkov also were in on it

centril (Dec 13 2019 at 16:26, on Zulip):

maybe not the writing, but also understanding RA

matklad (Dec 13 2019 at 16:26, on Zulip):

Once we have the doc, anyone should be able to understand how syntax works

centril (Dec 13 2019 at 16:27, on Zulip):

but I may have questions also, which is why I want in ^^

matklad (Dec 13 2019 at 16:31, on Zulip):

I think it makes sense if there's a single core person who is in charge of understanding everything and putting it down on paper. This can be me (as I've already done the first half), or somebody else. I estimate the amount of work here as 1 day.

@Esteban Küber , @qmx @centril who wants to make a commitment here? :)

Esteban Küber (Dec 13 2019 at 18:07, on Zulip):

I can try, but again, I won't be available until the upcoming year

matklad (Jan 09 2020 at 15:51, on Zulip):

I am starting with an old-school markdown document: https://hackmd.io/XoQrzR8GRLa64jpjylQ7Bw

matklad (Jan 09 2020 at 17:57, on Zulip):

I've run out of steam for today (and will be running to my German class), so, if you have time, it's a good time to comment on what's not clear so far (and point out/correct the uncountable number of typos :) )

qmx (Jan 09 2020 at 18:38, on Zulip):

cool, I'll do a review pass

qmx (Jan 09 2020 at 18:38, on Zulip):

thanks for the write-up, it's great even in draft form

matklad (Jan 10 2020 at 14:10, on Zulip):

Finished the second part about parsing.

matklad (Jan 10 2020 at 14:11, on Zulip):

I'll draft a concrete proposal for librari-ification, which I hope to dedicate a t-compiler design meeting to: https://hackmd.io/ifjST_Y4R-SQ0AWiEcU6FQ

qmx (Jan 10 2020 at 15:48, on Zulip):

I only fixed a typo on the first review pass

qmx (Jan 10 2020 at 15:48, on Zulip):

lemme read the second part

qmx (Jan 10 2020 at 16:19, on Zulip):

second part is good - it feels to me that the Parser section was a little bit more abstract than the rest - still reads good

qmx (Jan 10 2020 at 16:25, on Zulip):

I'll try to read the code around Sink and Source to see if things get a little bit clearer to me (might be my lack of exposure to the rust-analyzer code)

matklad (Jan 10 2020 at 19:01, on Zulip):

Finished proposal and submitted a meeting proposal

Last update: Jan 24 2020 at 02:05UTC