Stream: t-compiler

Topic: design meeting 2020.02.07


nikomatsakis (Feb 07 2020 at 13:31, on Zulip):

Hey @T-compiler/meeting -- reminder that we have a design meeting today in 1.5 hours. We'll be discussing compiler-team#237 ("parser librar-yification"). From the issue:

The goal of the meeting is twofold:

In particular, the hope is that we can discuss some of the complications that would arise and how they might be handled.

matklad (Feb 07 2020 at 15:01, on Zulip):

Hello @T-compiler/meeting , shall we start? Please say :wave: if you are here :-)

matklad (Feb 07 2020 at 15:02, on Zulip):

also, cc @WG-rls2.0

matklad (Feb 07 2020 at 15:04, on Zulip):

Let's maybe start with discussing what we have in rust-analyzer?

Did people have a chance to read the write up?

Are there any specific questions about that?

nikomatsakis (Feb 07 2020 at 15:06, on Zulip):

(Hey, sorry, was running a few minute late, thanks for picking up the ball @matklad =)

nikomatsakis (Feb 07 2020 at 15:06, on Zulip):

I read the write-up; I have one or two questions

centril (Feb 07 2020 at 15:06, on Zulip):

Some thoughts while reading the write-ups (adding more over time...):

nikomatsakis (Feb 07 2020 at 15:06, on Zulip):

First off, which parts of rust-analyzer work directly from the underlying, untagged "green" syntax trees?

centril (Feb 07 2020 at 15:07, on Zulip):

To deal with precedence in cases like $expr * 1, we use special invisible parenthesis, which are explicitelly handled by the parser

I believe that's the case today.

centril (Feb 07 2020 at 15:08, on Zulip):

Parsing is resilient (even if the input is invalid, parser tries to see as much syntax tree fragments in the input as it can).

It seems like the write-up is claiming that the current parser doesn't do this, but I don't think that's the case. The parser does a whole lot of recovery.

matklad (Feb 07 2020 at 15:08, on Zulip):

which parts of rust-analyzer work directly from the underlying, untagged "green" syntax trees?

So, green and untagged are not the same

matklad (Feb 07 2020 at 15:08, on Zulip):

green trees are untagged

matklad (Feb 07 2020 at 15:09, on Zulip):

however, there's also SyntaxNode layer above green trees, whcih is also untagged (but, unlike G, has parent pointers)

matklad (Feb 07 2020 at 15:10, on Zulip):

It's more or less true that nothing in rust-analyzer works directly with green trees (the only thing that touches green trees is the code that builds syntax tree in the first place)

matklad (Feb 07 2020 at 15:10, on Zulip):

Some code in rust-analyzer works with untagged representation with parent pointers. That is mostly the code close to the IDE

matklad (Feb 07 2020 at 15:11, on Zulip):

For example, "flip two things around the ," assist is implemented in terms of untagged SyntaxNodes

nikomatsakis (Feb 07 2020 at 15:11, on Zulip):

Yes, OK, that makes sense.

matklad (Feb 07 2020 at 15:11, on Zulip):

As well as a pretty important layer which takes a raw offset into a file and figures out what semantic element exists at that position

matklad (Feb 07 2020 at 15:12, on Zulip):

That code climbs untagged tree up from the cursor position to find an item, and than finds a semantic repr for that item

centril (Feb 07 2020 at 15:12, on Zulip):

Other notes:

If possible, errors are not reported during parsing and are postponed for a separate validation step. For example, parser accepts visibility modifiers on trait methods, but then a separate tree traversal flags all such visibilites as erroneous.

matklad (Feb 07 2020 at 15:14, on Zulip):

does it run before or after expansion?

Before expansion. From the point of view of highler-level layers, it is indivisible from the parse phase. That is, it is an impl detail of parsing

centril (Feb 07 2020 at 15:14, on Zulip):

Regarding:

pub trait TokenSource {
    fn current(&self) -> Token;
    fn lookahead_nth(&self, n: usize) -> Token;
    fn is_keyword(&self, kw: &str) -> bool;

    fn bump(&mut self);
}
matklad (Feb 07 2020 at 15:14, on Zulip):

How would we deal with that in the new model?

I think this also belongs to the validation phase

nikomatsakis (Feb 07 2020 at 15:15, on Zulip):

I'm feeling a bit lost, I have to admit, but I do think it's good to turn towards the parser integration

nikomatsakis (Feb 07 2020 at 15:16, on Zulip):

/me pulls up the document

nikomatsakis (Feb 07 2020 at 15:17, on Zulip):

I guess I'd love it if someone (@centril?) wanted to try and compare/contrast the "current" and "new" model

matklad (Feb 07 2020 at 15:17, on Zulip):

I assume we can (and will) encode higher level operations on this such as e.g. eat_keyword(..) and whatnot.

Correct, but not as a part of this interface. Rather, that would be utility methods on the parser.

can we replace kw: &str with kw: Symbol?

I'd rather not to: if we do, we'd have to make the set of keywords a part of the interface. But maybe I am not exactly understanding what is Symbol

centril (Feb 07 2020 at 15:18, on Zulip):

@matklad it's an interned string representation with some things declared upfront, https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Symbol.html

nikomatsakis (Feb 07 2020 at 15:18, on Zulip):

(I also think details about the specifics of the API seem a bit "lower level", unless they map to some higher-level distinctions -- e.g., I could see that &str vs Symbol represents something important)

centril (Feb 07 2020 at 15:18, on Zulip):

Correct, but not as a part of this interface. Rather, that would be utility methods on the parser.

That's all good.

matklad (Feb 07 2020 at 15:19, on Zulip):

Agree that we should probably aim at teasing apart the high-level constraints here.

In particular, one constraint I have in mind is that the parser should not care about interning

nikomatsakis (Feb 07 2020 at 15:19, on Zulip):

Maybe let's start with how existing parser and expansion intertwine? e.g., @centril noted that the precise set of things parser accepts "pre-expansion" is part of the language definition, maybe we can sketch that out a bit more

centril (Feb 07 2020 at 15:19, on Zulip):

I would be happy with e.g. const Const: &str = "const" instead

centril (Feb 07 2020 at 15:20, on Zulip):

@nikomatsakis so @Vadim Petrochenkov is the resident expansion expert

nikomatsakis (Feb 07 2020 at 15:20, on Zulip):

I think if we had a kind of "idealized" model of how parser-expansion etc are interacting, it might be clearer how to express matklad's proposal as a diff on that

centril (Feb 07 2020 at 15:20, on Zulip):

(sorry, didn't have time to do a write-up to compare, had the fever yesterday & today)

nikomatsakis (Feb 07 2020 at 15:20, on Zulip):

No doubt, but they're not present, still I don't think we need the details of name resolution at the moment

nikomatsakis (Feb 07 2020 at 15:21, on Zulip):

I could try to skech a view and you can correct it :)

centril (Feb 07 2020 at 15:21, on Zulip):

So to give a quick overview of pre-expansion gating...

matklad (Feb 07 2020 at 15:21, on Zulip):

from what I undestand, the biggest diff would be that the current model very much builds on top of specific AST

nikomatsakis (Feb 07 2020 at 15:21, on Zulip):

(I am taking notes in this hackmd, fyi)

centril (Feb 07 2020 at 15:22, on Zulip):

The parser collects a set of spans into a sink: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_session/parse/struct.GatedSpans.html

centril (Feb 07 2020 at 15:22, on Zulip):

mainly using https://doc.rust-lang.org/nightly/nightly-rustc/rustc_session/parse/struct.GatedSpans.html#method.gate

nikomatsakis (Feb 07 2020 at 15:22, on Zulip):

Let's back up one more step :)

nikomatsakis (Feb 07 2020 at 15:23, on Zulip):

The ida here is that we introduce feature-gated syntax

centril (Feb 07 2020 at 15:23, on Zulip):

when dealing with macros, which call into the black-box parsers, the sink is snapshotted, emptied, and then on a successful macro arm the gated spans are merged

nikomatsakis (Feb 07 2020 at 15:23, on Zulip):

e.g., async fn (at one point)

nikomatsakis (Feb 07 2020 at 15:23, on Zulip):

and the parser accepts it, but records the span where it occurred

centril (Feb 07 2020 at 15:23, on Zulip):

or take | or-patterns today

nikomatsakis (Feb 07 2020 at 15:23, on Zulip):

then -- after parsing -- we check the set of feature gates to make sure that the syntax was enabled

nikomatsakis (Feb 07 2020 at 15:24, on Zulip):

we do this in part because macro-rules has things like $e:expr -- and the idea is that $e if matches this syntax without the feature gate, but we then remove that syntactic form, then we have changed the behavior of stable programs

nikomatsakis (Feb 07 2020 at 15:24, on Zulip):

(correct?)

centril (Feb 07 2020 at 15:24, on Zulip):

yep

centril (Feb 07 2020 at 15:24, on Zulip):

(here's the or-pattern example, https://doc.rust-lang.org/nightly/nightly-rustc/src/rustc_parse/parser/pat.rs.html#120 )

centril (Feb 07 2020 at 15:25, on Zulip):

note that fragments feature gated may not be manifest in the AST itself

nikomatsakis (Feb 07 2020 at 15:25, on Zulip):

One implication -- and this is somewhat contentious -- of the current setup is that you can only use cfg to include "unstable" syntax by embedding that syntax into a separate file

matklad (Feb 07 2020 at 15:25, on Zulip):

Is collection of spans more powerful than building a concrete syntax tree, and then sanitizing the tree?

nikomatsakis (Feb 07 2020 at 15:25, on Zulip):

e.g., #[cfg(false)] mod foo; works no matter that foo.rs contains, because it never parses foo.rs -- but maybe it's worth talking about the flow of how that comes about?

nikomatsakis (Feb 07 2020 at 15:25, on Zulip):

it implies that the parsing of files is entwined with cfg expansion

centril (Feb 07 2020 at 15:26, on Zulip):

For example let | p = 0; gates | despite not being in the AST

centril (Feb 07 2020 at 15:26, on Zulip):

@nikomatsakis that's https://github.com/rust-lang/rust/issues/64197, a bug that should be fixed in either case

nikomatsakis (Feb 07 2020 at 15:27, on Zulip):

Is collection of spans more powerful than building a concrete syntax tree, and then sanitizing the tree?

this is a good question, @matklad, I don't think so

centril (Feb 07 2020 at 15:27, on Zulip):

it is if the CST does not have the thing gated manifest in the CST

nikomatsakis (Feb 07 2020 at 15:27, on Zulip):

So, @centril, the point of #64197 is that presently "cfg expansion" is duplicated between parser and expansion?

centril (Feb 07 2020 at 15:28, on Zulip):

@nikomatsakis what happens is that the parser drives conditional compilation when it comes to modules

pnkfelix (Feb 07 2020 at 15:28, on Zulip):

it is if the CST does not have the thing gated manifest in the CST

where CST is "concrete syntax tree" ? But shouldn't that by definition ...

centril (Feb 07 2020 at 15:28, on Zulip):

and config.rs lives in the parser

nikomatsakis (Feb 07 2020 at 15:28, on Zulip):

i.e., the parser has a loop that is kind of like

pnkfelix (Feb 07 2020 at 15:28, on Zulip):

(did you write CST in one instance where you meant to write AST?)

centril (Feb 07 2020 at 15:29, on Zulip):

@pnkfelix no; CST is somewhat ambiguous

centril (Feb 07 2020 at 15:29, on Zulip):

("the real AST is HIR")

pnkfelix (Feb 07 2020 at 15:29, on Zulip):

or are you referring indeed to cases like mod bar; where the concrete syntax itself does not have the contents of bar.rs ?

centril (Feb 07 2020 at 15:29, on Zulip):

@pnkfelix I'm referring to whether the leading | in let | p = 0; is in the tree or not

centril (Feb 07 2020 at 15:30, on Zulip):

it's not in the current "AST"

nikomatsakis (Feb 07 2020 at 15:30, on Zulip):

before we go too deep, can we confirm that my summary here is correct?

nikomatsakis (Feb 07 2020 at 15:30, on Zulip):

(after that, I think it'd clearly be good to define our terms like CST/AST/HIR and make sure we're all using them to refer to the same things)

pnkfelix (Feb 07 2020 at 15:30, on Zulip):

Okay. I am under impression that in matklad's proposal, it might be included (if only as part of "trivia" ?)

centril (Feb 07 2020 at 15:30, on Zulip):

@nikomatsakis high level idea is correct, but I think you're getting into the weeds

centril (Feb 07 2020 at 15:31, on Zulip):

I'm sure both old and new parser can deal with it

matklad (Feb 07 2020 at 15:31, on Zulip):

Correct, I specifically use CST as a tree that, when printed, is guaranteed to yield exactly the source text. So which contains all tokens, and all trivia

centril (Feb 07 2020 at 15:31, on Zulip):

More specifically, the parser would do less, and leave things to conditional compilation later

nikomatsakis (Feb 07 2020 at 15:32, on Zulip):

The model we hope to move towards is one where that loop moves to the expander

nikomatsakis (Feb 07 2020 at 15:32, on Zulip):

and the parser itself is basically just a function that parses a single file

nikomatsakis (Feb 07 2020 at 15:32, on Zulip):

(Confirm/deny?)

centril (Feb 07 2020 at 15:33, on Zulip):

confirm

matklad (Feb 07 2020 at 15:33, on Zulip):

Confirm

nikomatsakis (Feb 07 2020 at 15:33, on Zulip):

OK :+1:

centril (Feb 07 2020 at 15:33, on Zulip):

@nikomatsakis I don't think there's any disagreement here btw; it seems everyone agrees that this is a bug that should be fixed

centril (Feb 07 2020 at 15:33, on Zulip):

only engineering hours has prevented fixing it

nikomatsakis (Feb 07 2020 at 15:33, on Zulip):

The reason I'm kind of going through this is

nikomatsakis (Feb 07 2020 at 15:34, on Zulip):

I want to kind of create a "high level picture" of how this process works, and clearly identify which bits are the responsibility of which

nikomatsakis (Feb 07 2020 at 15:34, on Zulip):

Anyway, that's as deep as I want to go on that topic, but I wanted to try and add now the pre-expansion gating in this picture a bit ...

centril (Feb 07 2020 at 15:35, on Zulip):

@matklad but assuming we store details like leading | then I agree in principle there's no problem with preexp gating

nikomatsakis (Feb 07 2020 at 15:37, on Zulip):

I have some more questions about pre-expansion gating

nikomatsakis (Feb 07 2020 at 15:37, on Zulip):

It seems relevant, but I'm also happy to take some notes and pursue them later

matklad (Feb 07 2020 at 15:37, on Zulip):

I guess, on the highest-possible level the current state is (module mod foo; bug fixed) is this:

nikomatsakis (Feb 07 2020 at 15:37, on Zulip):

(One thing I'm curious to dig a bit more into is recovery and prediction, for example)

matklad (Feb 07 2020 at 15:38, on Zulip):

(One thing I'm curious to dig a bit more into is recovery and prediction, for example)

I acutaly think that this is a well understood and boring bit. Like, you just write that code, it works.

nikomatsakis (Feb 07 2020 at 15:38, on Zulip):

I was going to write some bullets very much like that @matklad; I think what you're proposing could be seen as splitting that "parser: tokens -> ast" interace so that there are more intermediate steps?

matklad (Feb 07 2020 at 15:38, on Zulip):

Oh, well, depending on which part of the recovery you are talking about

matklad (Feb 07 2020 at 15:38, on Zulip):

tweaking parser to recover is easy

centril (Feb 07 2020 at 15:38, on Zulip):

My understanding is that we move to something more like syn, except for spans, we don't use spans

centril (Feb 07 2020 at 15:39, on Zulip):

tweaking parser to recover is easy

And to be clear, I think the current parser does a shit tonne of this

matklad (Feb 07 2020 at 15:39, on Zulip):

tweaking syntax trees to incorporate incomplete syntax is hard, and is, in some sence, the core of teh proposal

matklad (Feb 07 2020 at 15:40, on Zulip):

And to be clear, I think the current parser does a shit tonne of this

I haven't looked into this recently, but the last time I did (around original libsytnax2 RFC) the recovery wasn't quite the one needed for IDEs

centril (Feb 07 2020 at 15:40, on Zulip):

@matklad speaking of recovery; I think it's important that we be able to retain e.g. higher level combinators like https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_delim_comma_seq

matklad (Feb 07 2020 at 15:40, on Zulip):

But I do belive that these are detains

centril (Feb 07 2020 at 15:41, on Zulip):

@matklad but could you briefly elaborate on what sort of recovery you think was missing?

centril (Feb 07 2020 at 15:41, on Zulip):

(I think most of the current parser code is actually recovery...)

nikomatsakis (Feb 07 2020 at 15:42, on Zulip):

can we come back to this core parser interface? it seems like the details kind of matter.

@matklad said earlier (I believe) that the current model for the parser is that it takes as input:

and produces as output

I think that the question is whether we can refactor the parser to have two phases:

Is this correct?

EDIT: To elaborate, part of why I brought up recovery, is I wanted to understand whether recovery would like in parse-to-events or events-to-AST, I think probably the former...

nikomatsakis (Feb 07 2020 at 15:43, on Zulip):

(Sorry, I was typing that while @centril was asking their question, feel free to answer that first :)

matklad (Feb 07 2020 at 15:43, on Zulip):

@centril here's example where rustc doesn't even produce an ast: https://gist.github.com/matklad/5725e97192363c973c985e3d765d7fbf

matklad (Feb 07 2020 at 15:43, on Zulip):

(not 100% sure about that, but -Z ast-json should show AST if rustc got to that phase?)

matklad (Feb 07 2020 at 15:44, on Zulip):

rust-analyzer produces the tree, and recognizes the main function

matklad (Feb 07 2020 at 15:44, on Zulip):

the example is

use

fn main() {}
matklad (Feb 07 2020 at 15:44, on Zulip):

And that is exactly the code the IDE sees when a user adds a new use

centril (Feb 07 2020 at 15:44, on Zulip):

@matklad correct, yes; the current parser would return Err(...) and fail unless there's more elaborate recovery

centril (Feb 07 2020 at 15:45, on Zulip):

what the current parser does a lot of is semantic recovery

centril (Feb 07 2020 at 15:45, on Zulip):

e.g. auto x = 0; we will recognize

matklad (Feb 07 2020 at 15:45, on Zulip):

Yeah, and this is not the kind of recovery that is needed for ide use case. I suggest tabling recovery disucssion at this :)

centril (Feb 07 2020 at 15:45, on Zulip):

sure

matklad (Feb 07 2020 at 15:46, on Zulip):

To elaborate, part of why I brought up recovery, is I wanted to understand whether recovery would like in parse-to-events or events-to-AST, I think probably the former...

I think ide recovery belongs to the former, and semantic recovery kind of belongs to both

nikomatsakis (Feb 07 2020 at 15:46, on Zulip):

Does that imply my summary of the split is correct?

matklad (Feb 07 2020 at 15:46, on Zulip):

Like, semantic recovery would insert explicit "missing expression" nodes into places where there isn't any code at all

matklad (Feb 07 2020 at 15:47, on Zulip):

Yes, I belive the summary is correct

nikomatsakis (Feb 07 2020 at 15:47, on Zulip):

(Although I think I remember us discussing, specifically, whether we really want to have a "generate events" step, or whether we want a parser that can produce both AST and events, depdnding on configuration)

centril (Feb 07 2020 at 15:47, on Zulip):

The main (only?) sticking point for me is:

The tree is untyped. Each node has a “type tag”, SyntaxKind.

Syntax trees are simple value type. It is possible to create trees for a syntax without any external context.

Making the parser untyped comes with all of the problems that untypedness entails. It's a great property of the current parser that I can simply make some change to the AST and then the compiler will yell at me regarding all the places that I need to change... and ofc typedness prevents bugs... So I'm not a fan of:

// test box_pat
// fn main() {
//     let box i = ();
//     let box Outer { box i, j: box Inner(box &x) } = ();
//     let box ref mut i = ();
// }
fn box_pat(p: &mut Parser) -> CompletedMarker {
    assert!(p.at(T![box]));
    let m = p.start();
    p.bump(T![box]);
    pattern(p);
    m.complete(p, BOX_PAT)
}
matklad (Feb 07 2020 at 15:47, on Zulip):

i want to add though, that I think an important part of "neutral"ity is being untagged

centril (Feb 07 2020 at 15:48, on Zulip):

I would much prefer to e.g. have an interface based on associated types, which RA can dispatch to untyped things whereas rustc dispatches to typed things

centril (Feb 07 2020 at 15:48, on Zulip):

e.g. type Pat;

centril (Feb 07 2020 at 15:48, on Zulip):

and type Expr;

matklad (Feb 07 2020 at 15:48, on Zulip):

Yeah, this is the core question!

pnkfelix (Feb 07 2020 at 15:49, on Zulip):

so what are the main benefits of being untagged?

matklad (Feb 07 2020 at 15:49, on Zulip):

I agree that typedness has benefits, but I also think that this is a tradeoff

nikomatsakis (Feb 07 2020 at 15:49, on Zulip):

@centril can I make sure I understand your point? Are you saying that you like having the parser produce a "typed AST" (i.e., a set of enums that are tied very closely to rust's syntax) because it helps you ensure that the parser is "in sync". i.e., if I edit the AST to add a new field to some kind of expression, then I will get errors until I adjust the parser to supply values for that field?

pnkfelix (Feb 07 2020 at 15:49, on Zulip):

so what are the main benefits of being untagged?

(or of supporting untaggedness)

centril (Feb 07 2020 at 15:50, on Zulip):

@nikomatsakis enums are what we have today

matklad (Feb 07 2020 at 15:50, on Zulip):

@pnkfelix in terms of untagged trees, they are much better at representing incomplete trees and triva nodes

centril (Feb 07 2020 at 15:50, on Zulip):

what I'm suggesting is that we move from an initial (tagged) encoding with enums to a final tagless encoding based on associated types

matklad (Feb 07 2020 at 15:50, on Zulip):

Typed trees by definition exclude certain impossible states, but with broken code any state is possible

pnkfelix (Feb 07 2020 at 15:51, on Zulip):

I can believe this; i.e. I can believe they handle the use fn main() { } case better.

matklad (Feb 07 2020 at 15:51, on Zulip):

In terms of untagged parser, it massively simplies the interface.

nikomatsakis (Feb 07 2020 at 15:51, on Zulip):

@centril you didn't really answer my question though, did I understand you correctly as to the benefits?

nikomatsakis (Feb 07 2020 at 15:51, on Zulip):

(I believe similar benefits could be obtained in other ways, as I think you are now suggesting)

matklad (Feb 07 2020 at 15:51, on Zulip):

Like, its a "trait with handful of methods" vs "trait with hundreds of associated types"

centril (Feb 07 2020 at 15:51, on Zulip):

@nikomatsakis yes, but also typedness generally prevents bugs. I cannot put something of the wrong type where it doesn't belong

nikomatsakis (Feb 07 2020 at 15:52, on Zulip):

It seems to me that there may be some interaction with pre-expansion gating, as well. In particular, if we have untyped trees, we might be able to represent things like "a function whose body is not yet parsed"

pnkfelix (Feb 07 2020 at 15:52, on Zulip):

In terms of untagged parser, it massively simplies the interface.

this is a distinct feature of untagged-ness, and unless I misunderstand, it is at direct odds with @centril's desire for static checking, right?

nikomatsakis (Feb 07 2020 at 15:52, on Zulip):

(and which is ultimately stripped before it becomes necessary to parse it)

centril (Feb 07 2020 at 15:52, on Zulip):

@matklad I think you are overestimating how many types there are in ast.rs

matklad (Feb 07 2020 at 15:53, on Zulip):

this is a distinct feature of untagged-ness, and unless I misunderstand, it is at direct odds with @centril's desire for static checking, right?

Right

nikomatsakis (Feb 07 2020 at 15:53, on Zulip):

(time check: 52 minutes; velocity check: I'm feeling a bit overwhelmed by how many threads of discussion are happening at once, apologies for adding to them :)

nikomatsakis (Feb 07 2020 at 15:53, on Zulip):

/me will try to capture some of what's been said in last few minutes into doc

matklad (Feb 07 2020 at 15:53, on Zulip):

There are 246 syntax kinds in rust-analyzer

pnkfelix (Feb 07 2020 at 15:54, on Zulip):

can you link to that list?

matklad (Feb 07 2020 at 15:54, on Zulip):

rustc's ast is simpler, because it is an AST and not a CST, so it can elide details

centril (Feb 07 2020 at 15:54, on Zulip):

ugh... that's a whole heck of a lot more than in ast.rs

matklad (Feb 07 2020 at 15:54, on Zulip):

https://github.com/rust-analyzer/rust-analyzer/blob/1996762b1f2b9cb196cc879f0ce26d28a3c450c8/crates/ra_parser/src/syntax_kind/generated.rs#L12-L245

centril (Feb 07 2020 at 15:55, on Zulip):

specific to parser, the typedness also facilitates knowing what the actual grammar is

matklad (Feb 07 2020 at 15:55, on Zulip):

It also includes tokens, so I think we may save a 100 by not making type disticntions between tokens

matklad (Feb 07 2020 at 15:55, on Zulip):

but then we lose some type safety

centril (Feb 07 2020 at 15:55, on Zulip):

I think the tradeoff from a rustc centric POV clearly speaks in favor of being typed

pnkfelix (Feb 07 2020 at 15:56, on Zulip):

I can imagine there is a middle ground here, where we use associated types solely for the types already identified by ast.rs

centril (Feb 07 2020 at 15:56, on Zulip):

For incomplete things we also have ExprKind::Err btw

matklad (Feb 07 2020 at 15:56, on Zulip):

I also don't think that typing parsing code is that important

matklad (Feb 07 2020 at 15:57, on Zulip):

What is important, is having a fuzzable formal grammar, which you can check against the parser

matklad (Feb 07 2020 at 15:57, on Zulip):

Like, parser bugs are extremely easy and cheap to fix

centril (Feb 07 2020 at 15:57, on Zulip):

I don't agree, especially not when we have stability to contend with

pnkfelix (Feb 07 2020 at 15:58, on Zulip):

well ... I don't know about that

pnkfelix (Feb 07 2020 at 15:58, on Zulip):

(about being easy and cheap to fix, that is. especially when it comes to how recovery is implemented)

centril (Feb 07 2020 at 15:58, on Zulip):

If RA accepts syntax it shouldn't, that doesn't have stability implications; for rustc it does

Esteban Küber (Feb 07 2020 at 15:58, on Zulip):

I guess the comment was meant for correct code, not recovered code

matklad (Feb 07 2020 at 15:59, on Zulip):

I don't think I had a lot of parser bugs in rust-anlayzer, which could have be prevented by typeing.

Like, the today's bug is that we mixed up the order of async and unsafe.

centril (Feb 07 2020 at 15:59, on Zulip):

What is important, is having a fuzzable formal grammar, which you can check against the parser

We also don't have this

matklad (Feb 07 2020 at 15:59, on Zulip):

My gut feeling is that most parser bugs are like that

nikomatsakis (Feb 07 2020 at 16:00, on Zulip):

this is hard to answer definitively but it does feel like there may be a structure that gives some of the benefits of both

nikomatsakis (Feb 07 2020 at 16:00, on Zulip):

that said, it's been one hour

pnkfelix (Feb 07 2020 at 16:00, on Zulip):

yeah I gotta go

pnkfelix (Feb 07 2020 at 16:00, on Zulip):

(despite being late in the first place)

centril (Feb 07 2020 at 16:00, on Zulip):

@matklad btw, did you consider the question re. https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_delim_comma_seq and whether we can do similar things in the new model?

centril (Feb 07 2020 at 16:00, on Zulip):

these higher order combinators were important for recent cleanups to the parser

centril (Feb 07 2020 at 16:01, on Zulip):

things were quite messy before, and I would like not to get back into that situation

matklad (Feb 07 2020 at 16:01, on Zulip):

And not that we don't have parser bugs in rustc: https://github.com/rust-lang/rust/pull/37278

matklad (Feb 07 2020 at 16:01, on Zulip):

@centril high order combinations are combatible with event-based untagged parser I belive

nikomatsakis (Feb 07 2020 at 16:01, on Zulip):

I've done my best to make notes here, I'd appreciate it if folks took a look and made changes/additions/corrections. I think it's great to keep discussing btw, but maybe try to include notes in that doc? I think I have to step away for a bit too.

matklad (Feb 07 2020 at 16:01, on Zulip):

rust-analyzer's parsers started that way

matklad (Feb 07 2020 at 16:02, on Zulip):

Thanks @nikomatsakis !

centril (Feb 07 2020 at 16:02, on Zulip):

@matklad yes sure, having fixed many bugs myself, I know that rustc also has parser bugs. They often come as a result of recovery, especially recovery encoded in ad-hoc ways

nikomatsakis (Feb 07 2020 at 16:02, on Zulip):

That said, before I go, I want to ask one question. I wrote this:

but I feel a bit of disconnect about this. Is "parse-to-events" accurate, or is it "parse-to-CST"?

matklad (Feb 07 2020 at 16:02, on Zulip):

We can do both

centril (Feb 07 2020 at 16:02, on Zulip):

those are not mutually exclusive?

matklad (Feb 07 2020 at 16:03, on Zulip):

parse-to-events is easier to agree upon, as events are simpler than CST

nikomatsakis (Feb 07 2020 at 16:03, on Zulip):

I just feel like we didn't talk much about what these events would look like

matklad (Feb 07 2020 at 16:03, on Zulip):

But we can also decide that a specific CST format is reasonable, and parse directly to that

nikomatsakis (Feb 07 2020 at 16:03, on Zulip):

But yes I agree not necessarily mutually exclusive. I'll follow-up later.

centril (Feb 07 2020 at 16:04, on Zulip):

@matklad btw, if you don't have to run, quick question... @Esteban Küber often fancies using backtracking (e.g. cloning the parser state and then swapping it back) for better recovery

centril (Feb 07 2020 at 16:04, on Zulip):

can RA deal with this?

matklad (Feb 07 2020 at 16:05, on Zulip):

It doesn't do backtracking, but I don't see why it can't be added

matklad (Feb 07 2020 at 16:05, on Zulip):

Especially given that the parser state is relatively simpler

matklad (Feb 07 2020 at 16:06, on Zulip):

I just feel like we didn't talk much about what these events would look like

The interface rust-analyzer is using is literally this:

pub trait TreeSink {
    fn token(&mut self, kind: SyntaxKind, n_tokens: u8);

    fn start_node(&mut self, kind: SyntaxKind);
    fn finish_node(&mut self);

    fn error(&mut self, error: ParseError);
}
matklad (Feb 07 2020 at 16:06, on Zulip):

That is, for untyped events case

centril (Feb 07 2020 at 16:10, on Zulip):

btw, @Vadim Petrochenkov wasn't here, but it would be good to hear from them before we make changes

matklad (Feb 07 2020 at 16:15, on Zulip):

But we can also decide that a specific CST format is reasonable

This is actually an interesting point. I do think that the Green node layer from rust-analyzer is actually a pretty good CST in a sense that I don't think one can have a simpler interface, and that the impl of the interface does not seem outrageously ineffective.

So, if we are OK with

Than forgoing events and producing a green tree might be anintersting alternative

centril (Feb 11 2020 at 10:33, on Zulip):

@matklad btw; how are we going to handle editions if we don't have spans in the parser?

matklad (Feb 11 2020 at 10:39, on Zulip):

Why do you need spans for that? We don't handle editions in ra yet, but I think they could be handled nicely by passing an edition argument to the parse function, such that the parser can appropriately parse/reject edition-dependent constructs.

For edition lints, like "use dyn Trait", I think the same infra validation infra (post processing of CST) would work nicely.

bjorn3 (Feb 11 2020 at 10:54, on Zulip):

Would passing an edition argument work with macro calls between different editions work?


//edition 2015
macro_rules! my_macro {
(@expr $a:expr) => { $a };
($($a:tt)) => { let try = (); my_macro!(@expr $($a)); };
}

//edition 2018
my_macro!(try { a });

should work I think, as the macro body is parsed as 2015 edition, while I think the `try { a }` is parsed as 2018 edition when passed to `my_macro!(@expr $($a)*));`.
centril (Feb 11 2020 at 10:55, on Zulip):

When deciding on what to parse when it comes to e.g. .await we check the span on the current token, e.g.:

        if self.token.span.rust_2018() && self.eat_keyword(kw::Await) {
            return self.mk_await_expr(self_arg, lo);
        }

Changing this to a system based on "pass in the edition in parse" I am not sure how that would work out with edition hygiene (interactions with macros).
I think you'll need to check with @Vadim Petrochenkov to see if we can change systems here.

centril (Feb 11 2020 at 10:55, on Zulip):

For edition lints, like "use dyn Trait", I think the same infra validation infra (post processing of CST) would work nicely.

(We don't make parsing decisions wrt. dyn Trait based on the edition; that's done later.)

centril (Feb 11 2020 at 10:56, on Zulip):

Hmm... actually; come to think of it, doesn't the current setup of the rustc parser wrt. editions make it context sensitive? -- actually probably not...

matklad (Feb 11 2020 at 10:59, on Zulip):

Wow, this does seem like an unfortunate interaction....

So, this is for cases where we pass a 2018 edition expression to 2015 edition macro?

Something like log::info!("{}", my_future.await), where the intermediate result can also go via $tt phase...

That makes invisible delimiters model harder to make nice and self-contained :-(

centril (Feb 11 2020 at 11:01, on Zulip):

I don't recall exactly, but there are some tests I'm sure you could look at

matklad (Feb 11 2020 at 11:03, on Zulip):

I guess this is surmountable: when the parser consumes an invisible delimiter, the TokenSource also informs the parser about the currently active edition. But it does add complexity to the interface.

centril (Feb 11 2020 at 11:05, on Zulip):

That might work. Seems preferable as a matter of specification as compared to spans + editions

bjorn3 (Feb 11 2020 at 11:08, on Zulip):

What about macro_rules! m { ($a:tt) => { future.$a } } in 2015 edition crate and then m!(await) in 2018 edition crate. I assume there is no invisible delimiter in that case.

Last update: Jun 07 2020 at 10:25UTC