Stream: t-compiler/wg-rls-2.0

Topic: MBE discussion


matklad (Apr 09 2019 at 18:04, on Zulip):

@Igor Matuszewski @Edwin Cheng you are both interested in MBE. I wonder if we should shcedule a sync-up time to talk through the high-level design?

Edwin Cheng (Apr 09 2019 at 18:32, on Zulip):

Sure, im in GMT + 8

matklad (Apr 09 2019 at 18:37, on Zulip):

Let's try to schedule some time. I am unfortunately pretty busy this week, only Thursday will work for me: https://doodle.com/poll/z9ka63mz3xcy9wzr

Edwin Cheng (Apr 09 2019 at 18:51, on Zulip):

Done ! I did not know there is this kind of app for scheduling :joy:

Igor Matuszewski (Apr 09 2019 at 20:54, on Zulip):

Done!

matklad (Apr 09 2019 at 21:00, on Zulip):

Good, added to the calendar: https://calendar.google.com/calendar/embed?src=6u5rrtce6lrtv07pfi3damgjus%40group.calendar.google.com

Igor Matuszewski (Apr 11 2019 at 09:04, on Zulip):

:wave:

matklad (Apr 11 2019 at 09:05, on Zulip):

:hi:

matklad (Apr 11 2019 at 09:05, on Zulip):

@Edwin Cheng are you around?

Edwin Cheng (Apr 11 2019 at 09:07, on Zulip):

:wave:🏻

matklad (Apr 11 2019 at 09:07, on Zulip):

So, the rough agenda for the meeting:

matklad (Apr 11 2019 at 09:09, on Zulip):

I must confess that I don't really know deeply how macros work inside rustc. Reading rustc's code will be a major part of work here :)

Igor Matuszewski (Apr 11 2019 at 09:09, on Zulip):

I'll have a question regarding a high-level goal

matklad (Apr 11 2019 at 09:09, on Zulip):

Sure!

Igor Matuszewski (Apr 11 2019 at 09:09, on Zulip):

do we mean for it to be a part of the "librarification" goal? e.g. do we want it to be as pure as possible?

Igor Matuszewski (Apr 11 2019 at 09:10, on Zulip):

I'm not sure how much existing MBE infra relies on rustc internals like interning and so on but I imagine a very long-term goal would be to create a nice abstract library that rustc can also use? or would we have to rely too much on the parser and whatnot?

matklad (Apr 11 2019 at 09:10, on Zulip):

Good one! I think we shouldn't target it for librarification explicitly: librariifing nameres and expansion itself at the same time is more capacity than we have

Igor Matuszewski (Apr 11 2019 at 09:11, on Zulip):

okay, makes sense

Edwin Cheng (Apr 11 2019 at 09:11, on Zulip):

And i have another question

matklad (Apr 11 2019 at 09:11, on Zulip):

That is, I feel we should make progress on chalk/nameres/lexer first, and then librarify

matklad (Apr 11 2019 at 09:11, on Zulip):

@Edwin Cheng go ahead

Edwin Cheng (Apr 11 2019 at 09:11, on Zulip):

Sure we referenced how rustc works?

Igor Matuszewski (Apr 11 2019 at 09:12, on Zulip):

like should we follow what rustc does 1-1?

Edwin Cheng (Apr 11 2019 at 09:12, on Zulip):

Yes

matklad (Apr 11 2019 at 09:12, on Zulip):

I'd say probably not

matklad (Apr 11 2019 at 09:12, on Zulip):

I'd like to see how far we can go, using the purest possible interface, etc

matklad (Apr 11 2019 at 09:13, on Zulip):

at least initially.

Edwin Cheng (Apr 11 2019 at 09:13, on Zulip):

Okay

matklad (Apr 11 2019 at 09:13, on Zulip):

Closing all the gaps should be done as a part of future librarification

matklad (Apr 11 2019 at 09:13, on Zulip):

On the highest level, what we need is a proof of concept that large-scale macro expansion works in IDEs

matklad (Apr 11 2019 at 09:14, on Zulip):

So we should support common macros with a relatively efficient impl

Edwin Cheng (Apr 11 2019 at 09:14, on Zulip):

Make sense

Igor Matuszewski (Apr 11 2019 at 09:14, on Zulip):

another high level question - I didn't delve deep into untying the nameres/expansion but I assume these are still highly interconnected

matklad (Apr 11 2019 at 09:15, on Zulip):

@Igor Matuszewski the running hypothesis is that they are not really

matklad (Apr 11 2019 at 09:15, on Zulip):

Like, nameres could have a callback expand_macro(MacroId) -> SetOfNames

matklad (Apr 11 2019 at 09:16, on Zulip):

That is, it doesn't care about actual token trees which are produced, only about the names

Igor Matuszewski (Apr 11 2019 at 09:16, on Zulip):

hm, that actually makes sense

matklad (Apr 11 2019 at 09:17, on Zulip):

The hygiene bit is somewhat in the middle between the two though

Igor Matuszewski (Apr 11 2019 at 09:17, on Zulip):

what was the case where we had feared it needs to go back and forth?

matklad (Apr 11 2019 at 09:17, on Zulip):

About not being incremental enough? Or whcih case?

Igor Matuszewski (Apr 11 2019 at 09:17, on Zulip):

when one expansion expands to a glob use and then another expansion uses a (now) shadowed identifier? can

Igor Matuszewski (Apr 11 2019 at 09:17, on Zulip):

can't remember

matklad (Apr 11 2019 at 09:17, on Zulip):

Ah, time-travel

matklad (Apr 11 2019 at 09:18, on Zulip):

Yeah, the SetOfNames should also include set of (glob) imports, but this seems like a strictly name-res concern

Igor Matuszewski (Apr 11 2019 at 09:18, on Zulip):

I can't remember the details now :( sorry about that

Igor Matuszewski (Apr 11 2019 at 09:19, on Zulip):

okay so I imagine we try to go as far as we can using the purest possible interface and don't think about nameres right now, correct?

matklad (Apr 11 2019 at 09:19, on Zulip):

Correct!

matklad (Apr 11 2019 at 09:19, on Zulip):

Purity of interface is one of my personal high-level design goals here

matklad (Apr 11 2019 at 09:20, on Zulip):

let me describe what I've came up with so-far

matklad (Apr 11 2019 at 09:20, on Zulip):

first, we have mbe and proc macros. I think it would be cool if both used exactly the same interface. Basically, making macro_rules a special-cased proc macro

matklad (Apr 11 2019 at 09:21, on Zulip):

The interface is the ra_tt crate

matklad (Apr 11 2019 at 09:21, on Zulip):

It defines a TokenTree structure, which is more or less copy-pased from proc_macro2

matklad (Apr 11 2019 at 09:22, on Zulip):

crucially, ra_tt does not depend on any other compiler crate

matklad (Apr 11 2019 at 09:22, on Zulip):

Another bit that is has is that its tokens have identifies (each identifier as an u32 associated with it)

matklad (Apr 11 2019 at 09:23, on Zulip):

I "hope" that TokenIds should be enough to associated arbitrary information (hygiene & spans) with tokens

matklad (Apr 11 2019 at 09:23, on Zulip):

That is all for the "pure interface" bit

matklad (Apr 11 2019 at 09:24, on Zulip):

That is, macros consume and produce TokenTrees. The question is, how do we integrate thouse back in the AST?

matklad (Apr 11 2019 at 09:25, on Zulip):

Specifically, we need to convert from AST to tokens, from tokens to AST, and we need to preserve hygiene and span information along the way

Igor Matuszewski (Apr 11 2019 at 09:25, on Zulip):

so each token has an associated u32 with it, which can hold info like hygiene and spans, right?

matklad (Apr 11 2019 at 09:25, on Zulip):

@Igor Matuszewski right

Igor Matuszewski (Apr 11 2019 at 09:25, on Zulip):

(to reiterate)

matklad (Apr 11 2019 at 09:26, on Zulip):

basically, you'll have a hygiene: FxHasMap<u32, ExpansionId>

matklad (Apr 11 2019 at 09:26, on Zulip):

that is, a hash-map on the side

Igor Matuszewski (Apr 11 2019 at 09:26, on Zulip):

so we explicitly convert AST to tokens for a macro call?

Igor Matuszewski (Apr 11 2019 at 09:27, on Zulip):

since we have the concrete syntax tree (full-fidelity?) I imagine transforming this to tokens shouldn't be hard

matklad (Apr 11 2019 at 09:27, on Zulip):

yes https://github.com/rust-analyzer/rust-analyzer/blob/master/crates/ra_mbe/src/syntax_bridge.rs#L18

Edwin Cheng (Apr 11 2019 at 09:28, on Zulip):

Yes we already have this

matklad (Apr 11 2019 at 09:28, on Zulip):

Specifically, we walk the tree, prodce tokens (assigning incrementing local ids) and also produce a TokenMap which should help with managing spans, but which is not actually used :(

Igor Matuszewski (Apr 11 2019 at 09:29, on Zulip):

and what is tt::TokenId specifically?

matklad (Apr 11 2019 at 09:29, on Zulip):

The bit about local ids is important: for each macro call, the id of the first token will be zero. This helps with incrementality, b/c the token tree is the same regardless of the position of the macro_call

matklad (Apr 11 2019 at 09:29, on Zulip):

pub struct TokenId(pub u32);

Igor Matuszewski (Apr 11 2019 at 09:29, on Zulip):

and that's the u32 that's used as the key for hygiene info?

matklad (Apr 11 2019 at 09:30, on Zulip):

yes

matklad (Apr 11 2019 at 09:30, on Zulip):

and also as a key in the TokenId -> TextRange map

Igor Matuszewski (Apr 11 2019 at 09:30, on Zulip):

ah, alrighty

matklad (Apr 11 2019 at 09:31, on Zulip):

Yeah, so, we get an TokenTree. We then run macro black-box, which gives us another token-tree, but with the same (identify-wise) tokens

matklad (Apr 11 2019 at 09:32, on Zulip):

Which we feed back into the parser in token_tree_to_ast_item_list, and get the SyntaxTree back

matklad (Apr 11 2019 at 09:33, on Zulip):

which is kinda fun, because SyntaxTree is full-fidelity, but with token-trees we don't have whitespace, so the tree is missing all ws nodes

Igor Matuszewski (Apr 11 2019 at 09:33, on Zulip):

there's another question of "parsability" of the resulting token trees

matklad (Apr 11 2019 at 09:34, on Zulip):

ANd this is what I am not entirely sure: converting tt to Abstract syntax tree directly seems more resonable, but we don't have an AST layer between CST and HIR, and introducing one just for macros seems and overkill

matklad (Apr 11 2019 at 09:34, on Zulip):

@Igor Matuszewski what question exactly?

Igor Matuszewski (Apr 11 2019 at 09:34, on Zulip):

there are some cases in rustc macros IIRC where we produce a $crate quasi-keyword which is special-cased in rustc but isn't directly parseable

Igor Matuszewski (Apr 11 2019 at 09:34, on Zulip):

I think Vadim has worked on that to push back on it

matklad (Apr 11 2019 at 09:35, on Zulip):

Oh, yeah... But, we totally can have $crate identifier in the syntax tree

Igor Matuszewski (Apr 11 2019 at 09:35, on Zulip):

but I assume we should always have expansion that, when "pretty-printed" should be easily parseable as regular Rust, right?

matklad (Apr 11 2019 at 09:35, on Zulip):

Not really I fear.

matklad (Apr 11 2019 at 09:35, on Zulip):

$crate is the first reason

matklad (Apr 11 2019 at 09:35, on Zulip):

the second reason is precedence

Edwin Cheng (Apr 11 2019 at 09:35, on Zulip):

Do we need the span about the interpolated item in final token tree ?

matklad (Apr 11 2019 at 09:36, on Zulip):

@Igor Matuszewski basically the tree for $expr * 2 could expand to 1 + 1 * 2, where the tree would be right, but pretty-pritnging would be wrong

Igor Matuszewski (Apr 11 2019 at 09:37, on Zulip):

shouldn't pretty-printing factor in the precedence and insert disambiguating braces?

Igor Matuszewski (Apr 11 2019 at 09:37, on Zulip):

so (1 + 1) * 2

Igor Matuszewski (Apr 11 2019 at 09:37, on Zulip):

(hope I'm not missing something obvious here)

matklad (Apr 11 2019 at 09:37, on Zulip):

@Edwin Cheng yeah, we need to correlate source spans with target spans, so that features like goto definition work correctly

matklad (Apr 11 2019 at 09:38, on Zulip):
macro_rules! s {
    ($i:ident) => { struct $i; }
}

s!(Foo);

fn foo(_: Foo) {

}
matklad (Apr 11 2019 at 09:38, on Zulip):

You can try this code in rust-analyzer today

matklad (Apr 11 2019 at 09:38, on Zulip):

goto definition for _: Foo works

matklad (Apr 11 2019 at 09:39, on Zulip):

And it even goes to the offset of Foo in struct Foo

matklad (Apr 11 2019 at 09:39, on Zulip):

But, it does so in the wrong file :D

matklad (Apr 11 2019 at 09:39, on Zulip):

So, you end up on _r in macro_rules

Igor Matuszewski (Apr 11 2019 at 09:40, on Zulip):

Coming back to the "easily parseable" bit, that'd allow us to bypass the "AST" if we can always produce a valid CST

matklad (Apr 11 2019 at 09:40, on Zulip):

threading this span-remapping logic all the way to the IDE is actually one task that I feel we should do right now

Igor Matuszewski (Apr 11 2019 at 09:40, on Zulip):

which also would be good, in case you'd want to syntax highlight the macro-expanded code

matklad (Apr 11 2019 at 09:40, on Zulip):

@Igor Matuszewski yeah, I think using CST would work

matklad (Apr 11 2019 at 09:41, on Zulip):

It's jsut that it contains "unnceseccary" bits

matklad (Apr 11 2019 at 09:41, on Zulip):

but yeah, I also think it'll be convenient for ide features like highlighting

matklad (Apr 11 2019 at 09:42, on Zulip):

I guess this is a good overview of what we got right now, so perhaps we shoudl proced to discussing action items?

Igor Matuszewski (Apr 11 2019 at 09:42, on Zulip):

sure!

Edwin Cheng (Apr 11 2019 at 09:42, on Zulip):

sure!

matklad (Apr 11 2019 at 09:42, on Zulip):

@Edwin Cheng do you have any questions/comments? I think that you are now more familiar with exisitng infram than me :)

Edwin Cheng (Apr 11 2019 at 09:43, on Zulip):

For the design part i don't have any comments

matklad (Apr 11 2019 at 09:43, on Zulip):

Good!

matklad (Apr 11 2019 at 09:43, on Zulip):

So, I think there are two areas:

Edwin Cheng (Apr 11 2019 at 09:43, on Zulip):

But i face some problems, should i talk right now, or later?

matklad (Apr 11 2019 at 09:44, on Zulip):

@Edwin Cheng right now is a good point probably

Edwin Cheng (Apr 11 2019 at 09:44, on Zulip):

We discussed the single char punct problem before.

Edwin Cheng (Apr 11 2019 at 09:45, on Zulip):

Currently our parser expect composed token, e.g "DOTDOTDOT" ,,

matklad (Apr 11 2019 at 09:45, on Zulip):

right! (@Igor Matuszewski in the TokenTree, we represent << as two < tokens which are Joint, like syn, and that is slightly different from what we do in the parser)

Edwin Cheng (Apr 11 2019 at 09:46, on Zulip):

But it make all stuffs became very complicated.

matklad (Apr 11 2019 at 09:46, on Zulip):

Hm, perhaps we need to change the TokenSource interface to say

fn is_at_composed_token(tt: TokenKind)

?

matklad (Apr 11 2019 at 09:46, on Zulip):

That way, we don't have to write current2 or current3 in the parser

matklad (Apr 11 2019 at 09:47, on Zulip):

instead, the token source would take care of this

Edwin Cheng (Apr 11 2019 at 09:47, on Zulip):

Or should we change the parser such that it only expect what single char punct?

matklad (Apr 11 2019 at 09:47, on Zulip):

I guess Is_composed_token is a better apporahc

matklad (Apr 11 2019 at 09:48, on Zulip):

than, we can change TokenSource to be an Iterator-like thing, instead of a slice-like thing

matklad (Apr 11 2019 at 09:48, on Zulip):

and that should remove the need tor token-tree flattening

Edwin Cheng (Apr 11 2019 at 09:48, on Zulip):

Will it affect proc-macro?

matklad (Apr 11 2019 at 09:49, on Zulip):

I don't think so?

Edwin Cheng (Apr 11 2019 at 09:49, on Zulip):

Another question, why we do not have a tokenstream ?

Igor Matuszewski (Apr 11 2019 at 09:50, on Zulip):

tokenstream would be the "iterator-like" approach, right?

matklad (Apr 11 2019 at 09:50, on Zulip):

THis is actually a good action item:

Trying something like this:

trait TOkenSOuece {
   fn current(&self) -> SyntaxKind;
   fn is_at_composite(&self, kind: SyntaxKind);
   fn bump(&mut self);
   fn bump_composite(&mut sekf, SyntaxKind);
}
matklad (Apr 11 2019 at 09:52, on Zulip):

SO, proc-macro has both a token tree and a token stream, where "stream" is the tree, but without delimiters. Vadim Petrochenkov in some issue written that this is not orthogonal, and that ideally only token tree should be enough

matklad (Apr 11 2019 at 09:53, on Zulip):

argh, can't find the comment right now

matklad (Apr 11 2019 at 09:54, on Zulip):

but the bottom like is that

Edwin Cheng (Apr 11 2019 at 09:54, on Zulip):

I like the new trait, but how it solve the flatten problem of TokenSource <-> TokenTree ?

matklad (Apr 11 2019 at 09:55, on Zulip):

@Edwin Cheng currently TokenSource wraps an [TtToken], because we need to handle arbitrary pos. In the new interface, you always ask questions about the current pos, so you could use the cursor directly

matklad (Apr 11 2019 at 09:56, on Zulip):

(@Igor Matuszewski , if you are lost, we are talking about this interface to the parser https://github.com/rust-analyzer/rust-analyzer/blob/e6e2571bdf780d304c792d4317bbaf1d6f5d7a0a/crates/ra_parser/src/lib.rs#L32-L39, which is random-access, while it should be iterator-ish)

Edwin Cheng (Apr 11 2019 at 09:57, on Zulip):

But do the token source only needs the top level TtToken of a TokenTree ?

matklad (Apr 11 2019 at 09:58, on Zulip):

@Edwin Cheng sort-of. when bump method would go over (, the token source will dive one lever deeper, by pushing a parent subtree tot he stack

matklad (Apr 11 2019 at 09:58, on Zulip):

when we bump over ), the subtree is popped, and we are back to traversing the original TokenTree

matklad (Apr 11 2019 at 09:59, on Zulip):

Ok, so we are almost out of time, let's try to summarize action-items we have

Edwin Cheng (Apr 11 2019 at 09:59, on Zulip):

So how about token-source look ahead ?

Edwin Cheng (Apr 11 2019 at 10:00, on Zulip):

I mean your parser maybe needs to look ahead some tokens ? So would we need to go backward of the cursor?

matklad (Apr 11 2019 at 10:00, on Zulip):

@Edwin Cheng we need a constant amount of lookaheard and we don't need to look inside (, so that should be doable with iterato interface

matklad (Apr 11 2019 at 10:01, on Zulip):

@Igor Matuszewski would you be interested in tacking spans? That should make you touch almost every part of the analyser :D

Igor Matuszewski (Apr 11 2019 at 10:01, on Zulip):

Yeah, sounds good!

Igor Matuszewski (Apr 11 2019 at 10:01, on Zulip):

it'd be good to get into the nitty-gritty now ;)

matklad (Apr 11 2019 at 10:02, on Zulip):
matklad (Apr 11 2019 at 10:02, on Zulip):
matklad (Apr 11 2019 at 10:02, on Zulip):

note that 1 and 2 are sort-of ground-work for 3

matklad (Apr 11 2019 at 10:03, on Zulip):

I actually expect that someone just throws away the expander I hacked together, and replaces it with something proper :)

matklad (Apr 11 2019 at 10:03, on Zulip):

And that would require reading rustc very closely

matklad (Apr 11 2019 at 10:04, on Zulip):

Should we wrap up the meeting then? (I do have some more time, so we can discuss some specific questions as well)

Edwin Cheng (Apr 11 2019 at 10:04, on Zulip):

@matklad yesterday i just read a lot of rustc mbe code...:(

matklad (Apr 11 2019 at 10:04, on Zulip):

I understand that :( smiley :D

Edwin Cheng (Apr 11 2019 at 10:06, on Zulip):

I do have some more time too, but its up to you

matklad (Apr 11 2019 at 10:06, on Zulip):

I'll also write meeting notes for this meetings

Igor Matuszewski (Apr 11 2019 at 10:07, on Zulip):

I think it'd be good to wrap up now

Igor Matuszewski (Apr 11 2019 at 10:07, on Zulip):

and come back and sync later as we go, if that makes sense

matklad (Apr 11 2019 at 10:07, on Zulip):

yeah. Let me preemptifly create streams for span-mapping and iterator token source

Igor Matuszewski (Apr 11 2019 at 10:07, on Zulip):

@matklad where will be the meeting notes?

matklad (Apr 11 2019 at 10:08, on Zulip):

in the compiler-team repo I guess?

Igor Matuszewski (Apr 11 2019 at 10:13, on Zulip):

@matklad okay! let me know if you need some help with that

matklad (Apr 11 2019 at 10:41, on Zulip):

also, random cc @Alexander Regueiro: I saw that wg-macros thing, so it might be interested to you to see what happens with macros in RLS-2.0. Sorry for not thinking about cc-ing before the meeting :)

matklad (Apr 11 2019 at 10:45, on Zulip):

@Igor Matuszewski here's that time-traveling problem thread: https://github.com/rust-lang/rust/pull/53778

Alexander Regueiro (Apr 11 2019 at 15:41, on Zulip):

No worries. I think the plan is to keep developing macros in the current compiler alongside this effort (especially eager expansion), but it would be good to stay abreast of that. What's the timeline like, do you have an idea? @matklad

matklad (Apr 11 2019 at 15:45, on Zulip):

I hope that we'll have more-or-less working subset of mbe withing months, but any further plans are unclear

Edwin Cheng (Apr 20 2019 at 08:55, on Zulip):

@matklad Do you have any idea how we handle $crate token in mbe expansion ?

matklad (Apr 20 2019 at 13:57, on Zulip):

I would think that just creating a $crate identifier would work?

matklad (Apr 20 2019 at 13:58, on Zulip):

so, when we expand code, and see $ crate token trees, we produce a single ident token with $crate as text

Edwin Cheng (Apr 20 2019 at 15:19, on Zulip):

Sure, thats what i think too. OTOH, i did manage to fix the stackoverflow bug, but i found that I don’t want to enable the tt matcher, it is because:

1. There are still some bugs in mbe which fail to expand. I will try to fix first.
2. The real problem is, i found that it make the whole startup time very slow. (At least in my ancient pc), i think it related to winapi, which brings tons of structs in (previously it is def behind macro)

matklad (Apr 20 2019 at 15:21, on Zulip):

yeah, I expect properly supporting macros will make perf significantly worse....

It might be a good idea to profile this stuff to check if its macro-expansion whihc is slow, or if it is the name resolution (which is now exercised much harder). https://github.com/AtheMathmo/cpuprofiler is a useful tool for profiling

Edwin Cheng (Apr 20 2019 at 16:09, on Zulip):

https://github.com/AtheMathmo/cpuprofiler is a useful tool for profiling

It does not support Windows :joy:
Anyway, i would find some tools to profile it first.
But my first priority is making no macro expanding errors in name resolution first. Personally i prefer a bunch of small PRs and will submit it one by one.

L.apz (Apr 21 2019 at 20:10, on Zulip):

what kinda of test is a smoke test ? and where would it belong

matklad (Apr 21 2019 at 21:10, on Zulip):

@L.apz smoke test is the most simple test that executes given functionlity

matklad (Apr 21 2019 at 21:10, on Zulip):

for macros, I'd add it to type-inference tests

matklad (Apr 21 2019 at 21:11, on Zulip):

something like this

macro_rules! expr {
    ($e:expr) => { $e }
}

struct S;
fn foo() {
    let x = expr!(S);
}

and check that type of x is S

matklad (Apr 21 2019 at 21:14, on Zulip):

ah, the issue with panic b/c of local syntax ptr is fun...

matklad (Apr 21 2019 at 21:16, on Zulip):

where exactly do you get the panic? I wonder if we can fix it without refactoring all the tests

L.apz (Apr 21 2019 at 21:59, on Zulip):

I get the error when I tried to add a test within the type inference tests. The painc is caused when we iterate the types that type inference returns specifically this line : https://github.com/rust-analyzer/rust-analyzer/blob/bbc5c1d24e1a641b134f634516828301e8cfc320/crates/ra_hir/src/ty/tests.rs#L2538

matklad (Apr 21 2019 at 22:04, on Zulip):

Hm, so, one fix here is to remove macro-generated exprs from SourceMap....

Not sure what's the best way to do that

matklad (Apr 21 2019 at 22:04, on Zulip):

and also not sure what's the proper way to handle this long-term :)

matklad (Apr 21 2019 at 22:05, on Zulip):

So, and absolute quick fix would be to add a bool flag is_in_macro to Expr collector, which causes us to skip source-map if it is enabled

matklad (Apr 21 2019 at 22:05, on Zulip):

long-term, I guess we need to replace local ptrs in source map with global ones?

L.apz (Apr 21 2019 at 22:20, on Zulip):

Thanks for the pointers. I'll look at it tomorrow.

Edwin Cheng (Apr 22 2019 at 10:22, on Zulip):

@L.apz For issue 1187, i wrote an unit test :

#[test]
    fn test_vec() {
        let rules = create_rules(
            r#"
         macro_rules! vec {
            ($($item:expr),*) => {
                {
                    let mut v = Vec::new();
                    $(
                        v.push($item);
                    )*
                    v
                }
            };
}
"#,
        );
        assert_expansion(&rules, r#"vec!();"#, r#"{let mut v = Vec :: new () ;  v}"#);
        assert_expansion(
            &rules,
            r#"vec![1u32,2]"#,
            r#"{let mut v = Vec :: new () ; v . push (1u32) ; v . push (2) ; v}"#,
        );

        assert_eq!(
            expand_to_stmts(&rules, r#"vec![1u32,2]"#).syntax().debug_dump().trim(),
            r#"MACRO_STMTS@[0; 45)
  EXPR_STMT@[0; 45)
    BLOCK_EXPR@[0; 45)
      BLOCK@[0; 45)
        L_CURLY@[0; 1) "{"
        LET_STMT@[1; 20)
          LET_KW@[1; 4) "let"
          BIND_PAT@[4; 8)
            MUT_KW@[4; 7) "mut"
            NAME@[7; 8)
              IDENT@[7; 8) "v"
          EQ@[8; 9) "="
          CALL_EXPR@[9; 19)
            PATH_EXPR@[9; 17)
              PATH@[9; 17)
                PATH@[9; 12)
                  PATH_SEGMENT@[9; 12)
                    NAME_REF@[9; 12)
                      IDENT@[9; 12) "Vec"
                COLONCOLON@[12; 14) "::"
                PATH_SEGMENT@[14; 17)
                  NAME_REF@[14; 17)
                    IDENT@[14; 17) "new"
            ARG_LIST@[17; 19)
              L_PAREN@[17; 18) "("
              R_PAREN@[18; 19) ")"
          SEMI@[19; 20) ";"
        EXPR_STMT@[20; 33)
          METHOD_CALL_EXPR@[20; 32)
            PATH_EXPR@[20; 21)
              PATH@[20; 21)
                PATH_SEGMENT@[20; 21)
                  NAME_REF@[20; 21)
                    IDENT@[20; 21) "v"
            DOT@[21; 22) "."
            NAME_REF@[22; 26)
              IDENT@[22; 26) "push"
            ARG_LIST@[26; 32)
              L_PAREN@[26; 27) "("
              LITERAL@[27; 31)
                INT_NUMBER@[27; 31) "1u32"
              R_PAREN@[31; 32) ")"
          SEMI@[32; 33) ";"
        EXPR_STMT@[33; 43)
          METHOD_CALL_EXPR@[33; 42)
            PATH_EXPR@[33; 34)
              PATH@[33; 34)
                PATH_SEGMENT@[33; 34)
                  NAME_REF@[33; 34)
                    IDENT@[33; 34) "v"
            DOT@[34; 35) "."
            NAME_REF@[35; 39)
              IDENT@[35; 39) "push"
            ARG_LIST@[39; 42)
              L_PAREN@[39; 40) "("
              LITERAL@[40; 41)
                INT_NUMBER@[40; 41) "2"
              R_PAREN@[41; 42) ")"
          SEMI@[42; 43) ";"
        PATH_EXPR@[43; 44)
          PATH@[43; 44)
            PATH_SEGMENT@[43; 44)
              NAME_REF@[43; 44)
                IDENT@[43; 44) "v"
        R_CURLY@[44; 45) "}""#
        );
    }
Edwin Cheng (Apr 22 2019 at 10:25, on Zulip):

And its pass, so..... I would want to know how do you expand your macro ? Maybe related to be used an incorrect token_tree_to_xx ?

L.apz (Apr 22 2019 at 11:06, on Zulip):

@Edwin Cheng The method is used was mbe::token_tree_to_expr.The error also only shows up when you debug the extension and check the debug output otherwise it works fine.

Edwin Cheng (Apr 22 2019 at 12:13, on Zulip):

I just tried mbe::token_tree_to_expr one, and it works too. Seem like if it is a parsing error, it should be appeared in other places. Let see if it appear again and i will fix it. :)

Edwin Cheng (Apr 23 2019 at 15:43, on Zulip):

I have a stupid question:

in https://github.com/rust-lang/rust/blob/5d20ff4d2718c820632b38c1e49d4de648a9810b/src/libcore/internal_macros.rs#L81
It defined a mbe :

macro_rules! impl_fn_for_zst {
    ($(
        $( #[$attr: meta] )*
        struct $Name: ident impl$( <$( $lifetime : lifetime ),+> )? Fn =
            |$( $arg: ident: $ArgTy: ty ),*| -> $ReturnTy: ty
            $body: block;
    )+) => {
       ....
    }
}

and in https://github.com/rust-lang/rust/blob/316a391dcb7d66dc25f1f9a4ec9d368ef7615005/src/libcore/str/mod.rs#L4114
It invoke this macro by:

impl_fn_for_zst! {
    #[derive(Clone)]
    struct CharEscapeDebugContinue impl Fn = |c: char| -> char::EscapeDebug {
        c.escape_debug_ext(false)
    };

    #[derive(Clone)]
    struct CharEscapeUnicode impl Fn = |c: char| -> char::EscapeUnicode {
        c.escape_unicode()
    };
    #[derive(Clone)]
    struct CharEscapeDefault impl Fn = |c: char| -> char::EscapeDefault {
        c.escape_default()
    };
}

A part of macro rules is |$( $arg: ident: $ArgTy: ty ),*| -> $ReturnTy: ty and it try to match |c: char| -> char::EscapeUnicode,
but noted that the comma have to be presented in the macro invocation, but it is missing here.
Do i miss missing something here???

matklad (Apr 23 2019 at 15:44, on Zulip):

* is a special repetition syntax in mbe

matklad (Apr 23 2019 at 15:45, on Zulip):

$(foo),* means separated by ,, so, trailing comma is forbidden

matklad (Apr 23 2019 at 15:45, on Zulip):

$(foo,)* is terminated by ,, ie, trailing comma is required

matklad (Apr 23 2019 at 15:46, on Zulip):

$(foo),*,? I think is more or less "separated by ,, with an optional trailing comma"

matklad (Apr 23 2019 at 15:46, on Zulip):

though, iiuc, the last one can parse a comma by itself

Edwin Cheng (Apr 23 2019 at 15:47, on Zulip):

So it is optional ?? Do you have any document/ material i can read about that?

matklad (Apr 23 2019 at 15:48, on Zulip):

The trailing one is optional,

matklad (Apr 23 2019 at 15:50, on Zulip):

I guess the docs are https://doc.rust-lang.org/reference/macros-by-example.html

matklad (Apr 23 2019 at 15:50, on Zulip):

https://danielkeep.github.io/tlborm/book/README.html used to be a good resource, but it hasn't keep up with language changes

Edwin Cheng (Apr 23 2019 at 15:55, on Zulip):

I did read both actually, but no one mention the seperator itself in pattern is optional. :(

matklad (Apr 23 2019 at 15:56, on Zulip):

there's a bit about this here: https://danielkeep.github.io/tlborm/book/pat-trailing-separators.html

Edwin Cheng (Apr 23 2019 at 15:58, on Zulip):

Seem to be i am missing that part. Thanks alot.

Edwin Cheng (Apr 23 2019 at 15:58, on Zulip):

"Note that this cannot be used in all contexts. If the compiler rejects this, you will likely need to use multiple arms and/or incremental matching" :(

matklad (Apr 23 2019 at 16:00, on Zulip):

Yeah, I personally see ,* thing as one of the (lesser) warts of the existing macro systems. Everywhere else we are careful to allow trailing separators, but in macros they are forbidden!

It used to be the case that most stdlib macros forbade trailing , :D

Edwin Cheng (Apr 23 2019 at 16:00, on Zulip):

But actually , the link you sent me is actually talking about the question i mention. Thats why in the link they needs to use ($($exprs:expr),* $(,)*) => {...}; to overcome that.

Edwin Cheng (Apr 23 2019 at 16:02, on Zulip):

So back to origin, WHY it is optional ? I still don't get it

matklad (Apr 23 2019 at 16:05, on Zulip):

It is not really optional

matklad (Apr 23 2019 at 16:05, on Zulip):

let me give you a couple of examples

matklad (Apr 23 2019 at 16:05, on Zulip):

let's say we have pattern $(a),*

matklad (Apr 23 2019 at 16:06, on Zulip):

here are some cases where it matches succesfully

matklad (Apr 23 2019 at 16:06, on Zulip):

1.
2. a
3. a, a
4. a, a, a

matklad (Apr 23 2019 at 16:06, on Zulip):

And here are some failed cases

matklad (Apr 23 2019 at 16:07, on Zulip):

1. a, // tariling separator
2. , // only separator
3. a a // no separator

matklad (Apr 23 2019 at 16:07, on Zulip):

does this make sense?

Edwin Cheng (Apr 23 2019 at 16:08, on Zulip):

So what you mean is, the kleene operator is applied on the comma, not the whole pattern, right ?

matklad (Apr 23 2019 at 16:10, on Zulip):

no, it is applied to both pattern and comma

kennytm (Apr 23 2019 at 16:10, on Zulip):

$(a),* means matching a $(,a)* or matching empty

matklad (Apr 23 2019 at 16:10, on Zulip):

the important thing is the diffrence between $(a,)* and $(a),*

Edwin Cheng (Apr 23 2019 at 16:14, on Zulip):

Okay, i seem to get it, but why the book mentions $($exprs:expr),* $(,)*) => {...}; ??

kennytm (Apr 23 2019 at 16:14, on Zulip):

or, read it this way,

kennytm (Apr 23 2019 at 16:15, on Zulip):

so $($exprs:expr),* means matching Zero Or More expressions, with a , between each expression

kennytm (Apr 23 2019 at 16:15, on Zulip):

but this can't match the trailing comma in

1,2,3,4,
//     ^
kennytm (Apr 23 2019 at 16:17, on Zulip):

so a $(,)* is added after $($exprs:expr),* to match that trailing comma as well (should be written as $(,)? nowadays but it's a relatively new feature).

Edwin Cheng (Apr 23 2019 at 16:17, on Zulip):

Oh i see, that's because it is allowed for the trailing comma .. I just misunderstand that in the opposite way .. I thought it is because it allow no trailing comma.

Edwin Cheng (Apr 23 2019 at 16:19, on Zulip):

Thank you so much you guys , :heart:

L.apz (Apr 23 2019 at 20:01, on Zulip):

@matklad I'm looking at adding goto for macros but I keep on a getting some type inference errors

no method named `parse` found for type `&db::RootDatabase` in the current scope

Do you know what causes this or have any idea on something that I'm missing

matklad (Apr 23 2019 at 20:02, on Zulip):

sigh....

That's a rustc bug with incremental compilation, rm target/debug/.incremental -rf should fix it

matklad (Apr 23 2019 at 20:02, on Zulip):

temporarly :)

L.apz (Apr 23 2019 at 20:03, on Zulip):

thanks

Jeremy Kolb (Apr 23 2019 at 20:37, on Zulip):

I usually encounter this inra_ide_api so you only need to delete that one

Edwin Cheng (Apr 25 2019 at 04:35, on Zulip):

Just want to share with you gus, while i add tests to mbe by dogfooding all macro in ra, i found these two "amazing" macros:

I don't know how to handle these cases, should we just ignore it or still parse it ?

matklad (Apr 25 2019 at 05:59, on Zulip):

Oh, hellow winapi, my old friend! (https://github.com/intellij-rust/intellij-rust/blob/53b67c89eae86969bf244683aa5b5ccb2639d6b0/src/test/kotlin/org/rustSlowTests/CargoProjectResolveTest.kt#L216)

matklad (Apr 25 2019 at 05:59, on Zulip):

Long term, we definitely should just expand them

matklad (Apr 25 2019 at 06:01, on Zulip):

Medium term, it’s okay if we fail to expand them, as long as the analyzer doesn’t hang

Edwin Cheng (Apr 25 2019 at 17:24, on Zulip):

@matklad I have another mbe design problem want to discuss :

Edwin Cheng (Apr 25 2019 at 17:24, on Zulip):

(deleted)

Edwin Cheng (Apr 25 2019 at 17:29, on Zulip):

## Problems

Who do will handle multiple punct in mbe ?

### Lexer

Currently our lexer produces 2 kinds of Punct :

### How current mbe macro invocation works

1. When parser see a macro call, it will parse all its argument by ast::token_tree, which basically just use what lexer produced to make a tree. All whitespace information are preserved in this phase and only type I and type II punct will show up in ast::token_tree.
2. When hir expands a macro, it calls mbe::ast_to_token_tree to convert it to a tt::TokenTree. Although tt::TokenTree only store type I punct, while we extract type II punct to type I , we will store is_joint_next to preserve the whitespace information. But, this method cannot preverse white space information of type III . We have to look ahead to see if whether we need whitespace:

// Handle type II white space
for char in token.text().chars() {
    if let Some(char) = prev {
        token_trees.push(
            tt::Leaf::from(tt::Punct { char, spacing: tt::Spacing::Joint })
                .into(),
        );
    }
    prev = Some(char)
}
// Handle type III white space
if let Some(char) = prev {
    let spacing = match child_iter.peek() {
        Some(SyntaxElement::Token(token)) => {
            if token.kind().is_punct() {
                tt::Spacing::Joint
            } else {
                tt::Spacing::Alone
            }
        }
        _ => tt::Spacing::Alone,
    };

    token_trees.push(tt::Leaf::from(tt::Punct { char, spacing }).into());
}

3. While mbe match rule matchers, (e.g. $path:path), it will call parse_xxx from ra_syntax crate as a blackbox parser, So that's why the type III white space information above will be needed.
4. After macro expansion is finished, a tt::Subtree will be returned. And hir will call mbe::token_tree_to_xxx to convert it back to SyntaxNode tree. And all whitespace information will be lost while conversion. There are two cases:
a. tt::TokenTree contains no other macro calls. That's fine as we do not need these information any more. (in mbe point of view)
b. tt::TokenTree contains other macro calls. hir will try to exapnd it again. (Goto step 1). However, all whitespace information are lost, so we have a trouble.

Imagine following case:

macro_rules! foo {
    () => {
        bar![
        struct Foo {
            problem: ::other_crate::type,
        }]
    }
}

macro_rules! bar {
    (($tt:tt)*) => {
        // Something..
        ......
    }
}

Note that after bar macro expansion, the SyntaxNode of this line:
problem: ::other_crate::type
is [IDENT, COLON, COLONCOLON, ...].

However, when we try to expand bar using algorithm in step 2, the tt::TokenTree will become:
[IDENT, COLON(joint_to_next=true),COLON(joint_to_next=true), COLON(joint_to_next=true)]
And it is simply wrong.

matklad (Apr 25 2019 at 17:33, on Zulip):

I think completely loosing whitespace info should be ok for macro expansion

matklad (Apr 25 2019 at 17:33, on Zulip):

However, the fact that the lexer produces both joint and non-joint tokens is not

matklad (Apr 25 2019 at 17:34, on Zulip):

I think we should fix the lexer to always produce single-character tokesn

Edwin Cheng (Apr 25 2019 at 17:37, on Zulip):

But the last example still cannot resolve, right ?
even the lexer is only produce single char token, the SyntaxNode will be [IDENT, COLON, COLON COLON]...

matklad (Apr 25 2019 at 17:40, on Zulip):

Hm, riiight....

matklad (Apr 25 2019 at 17:41, on Zulip):

The brute-force solution would be to insert whitespace when we go from tt to ast

matklad (Apr 25 2019 at 17:42, on Zulip):

Like, initially, for foo, we have syntax tree from ws typed by the user

matklad (Apr 25 2019 at 17:42, on Zulip):

when we cook tt out of it, we get :, join_to_next=false

matklad (Apr 25 2019 at 17:43, on Zulip):

whn we then expand foo, we preserve this token and get join_to_next=false in the expantion

matklad (Apr 25 2019 at 17:43, on Zulip):

which is ok

matklad (Apr 25 2019 at 17:43, on Zulip):

what is not ok is that, when we try to parse that token tree as an item list, we render it into a tree without a space (whihc is what you are saying)

matklad (Apr 25 2019 at 17:43, on Zulip):

I think we can tweak that last step to insert a dummy space in that case?

matklad (Apr 25 2019 at 17:44, on Zulip):

that is, we don't need to preserve vs exactly as the user typed them

matklad (Apr 25 2019 at 17:44, on Zulip):

we just need some spacing which maintains jointness

Edwin Cheng (Apr 25 2019 at 17:45, on Zulip):

Yes we can :

impl<'a, Q: Querier> TreeSink for TtTreeSink<'a, Q> {
    fn token(&mut self, kind: SyntaxKind, n_tokens: u8) {
        if kind == L_DOLLAR || kind == R_DOLLAR {
            self.token_pos += n_tokens as usize;
            return;
        }

        for _ in 0..n_tokens {
            self.buf += &self.src_querier.token(self.token_pos).1;
            self.token_pos += 1;
        }
        self.text_pos += TextUnit::of_str(&self.buf);
        let text = SmolStr::new(self.buf.as_str());
        self.buf.clear();
        self.inner.token(kind, text);

        // Add a white space to token
        let (last_kind, _, last_joint_to_next ) = self.src_querier.token(self.token_pos-n_tokens as usize);
        if !last_joint_to_next && last_kind.is_punct() {
            let (cur_kind, _, _ ) = self.src_querier.token(self.token_pos);
            if cur_kind.is_punct() {
                self.inner.token(WHITESPACE, " ".into());
            }
        }
    }
Edwin Cheng (Apr 25 2019 at 17:46, on Zulip):

It is the code we transform tt to syntax node by SyntaxNodeBuilder

Edwin Cheng (Apr 25 2019 at 18:00, on Zulip):

However, as you said, we should start from making the lexer only product single char. Let me push a PR for the bugs fixes i did in last few days first. And make the lexer PR afterward.

matklad (Apr 25 2019 at 18:02, on Zulip):

I also wonder what we shoud do first:

matklad (Apr 25 2019 at 18:02, on Zulip):

I have a theory that the first one would be easier if we do the second one first

Edwin Cheng (Apr 25 2019 at 18:08, on Zulip):

The first one is quite straight forward. But the second one we still need to think about the design. The iterator will need to be able peek(3), and i still need to travesal a tree by index. Hm....

matklad (Apr 25 2019 at 18:09, on Zulip):

yeah, I haven't looked closely at that code for a while, so I trust your judgment here!

Edwin Cheng (Apr 29 2019 at 05:56, on Zulip):

Just found that it is a valid macro:

macro_rules! foo {
    ($a:ident, $b:ident, $c:tt) => {

        macro_rules! bar {
            ($bi:ident) => {
                fn $bi() -> u8 {$c}
            }
        }

        bar!($a);
        fn $b() -> u8 {$c}
    }
}

foo!(x,y, 1);

fn main() {
    println!("{}", x() + y());
}
matklad (Apr 29 2019 at 08:41, on Zulip):

oh yeah, macros can totally declare macros

Edwin Cheng (Apr 29 2019 at 12:22, on Zulip):

Finally I fixed all dogfooding bugs in mbe, except doc comment => $meta .
Here is the log of errror : https://gist.github.com/edwin0cheng/3f3f17bba11b8c262bf375895379a60e

Just curious, how would we handle DocComment ?

matklad (Apr 29 2019 at 13:08, on Zulip):

I think we should deshugar doc comments to #[doc = "..."] attributes when we convert syntax tree to tt

matklad (Apr 29 2019 at 13:08, on Zulip):

conversely, when we parse tt as syntax tree, there would be no doc comments, only attributes

Edwin Cheng (Apr 29 2019 at 13:24, on Zulip):

This is a solution I have never not thought of... nice

matklad (Apr 29 2019 at 13:27, on Zulip):

that is basically what rustc's parser is doig

Edwin Cheng (May 04 2019 at 06:04, on Zulip):

Do anyone know a crate /regex to escape a double quote string and unescape a double quote string but leave other escape char intact?
e.g. I said: "she said\" I said \" "\n <-> I said: \"she said\\\" I said \\\" \"\n

matklad (May 04 2019 at 08:10, on Zulip):

Perhaps https://doc.rust-lang.org/std/primitive.str.html#method.escape_default culd help?

Edwin Cheng (May 04 2019 at 08:24, on Zulip):

That one will escape all things, including \n etc.. i was thinking whether we only want to escape double quotes in doc comment de sugaring.

matklad (May 04 2019 at 08:28, on Zulip):

Hm, I think we need to escape all things?

matklad (May 04 2019 at 08:28, on Zulip):

Like, doc comments are not guranteeed to be valid literals

matklad (May 04 2019 at 08:29, on Zulip):
/// invalid escape: \u{zzzzz}
fn foo()
Edwin Cheng (May 04 2019 at 08:35, on Zulip):

Good point!

matklad (May 05 2019 at 17:09, on Zulip):

I am hacking on salsa now a bit, and I am pleasantly surprised by how far our support for macros has gone!

matklad (May 05 2019 at 17:09, on Zulip):

https://youtu.be/akdZefuvBAQ

matklad (May 05 2019 at 17:10, on Zulip):

I blindly goto definition of a field, and it seems to find one inside a real-world syn-macro! (the text range is wrong, but that shouldn't be terribly hard to fix)

matklad (May 05 2019 at 17:10, on Zulip):

and goto definition works for macros by example...

matklad (May 05 2019 at 17:10, on Zulip):

wow!

matklad (May 05 2019 at 18:18, on Zulip):

really, working with syn is so much more pleasant now

Edwin Cheng (May 06 2019 at 17:12, on Zulip):

Hey, finally i would start implement the new iterator approach of the TokenSource trait. (But i will be vacation this week, but let discuss an issues i don't know how to solve first. )
One of the goals of the new api is to simplify tree traversal of subtree (The current SubtreeSource and SubtreeWalker). For bump, we can just push the current level and move to deeper subtree. The hard part is how do we handle lookahead.

The trivial solution is we just look-ahead current level tokens, which is how rustc works. And It should work even multiple byte punct and deeper subtree. If we find that it is a subtree, we just return the delimiter. It is because normally the usage of lookahead is finding what structure is next to current token. On the other hand, this method is very efficient as we only need to do subtree.token_trees[n] .

However, this solution won't work in current implementation of ra because of the DOLLARtokens.
Imagine the following codes:

foo $               $
     1 + $     $ + 1
          a + b

Let say the current token is foo, if we want to lookahead(3), we have to walk the tree step by step. (check it is a subtree, go next level, and again). And it violates the goal to simplify the tree traversal of subtree algorithm.

@matklad Do you have any idea about this problem?

matklad (May 06 2019 at 17:13, on Zulip):

Hm, interesting!

matklad (May 06 2019 at 17:14, on Zulip):

One thing we should do is to take a look at how syn handles parsing

matklad (May 06 2019 at 17:14, on Zulip):

our token-tree is very much a mirror of proc_macro2 tt

matklad (May 06 2019 at 17:14, on Zulip):

and syn does have some nice API for bounded lookahead

matklad (May 06 2019 at 17:14, on Zulip):

I think we need to bound lookahead by some amount

matklad (May 06 2019 at 17:15, on Zulip):

that is, get rid of general lookahead(n: usize)

matklad (May 06 2019 at 17:15, on Zulip):

and have separate lookahead1, lookahead2, lookahead3

matklad (May 06 2019 at 17:15, on Zulip):

We indeed shouldn't descent into trees with non-empty delimiters in lookahead

matklad (May 06 2019 at 17:15, on Zulip):

but for dollars it seems like we should do this?

matklad (May 06 2019 at 17:16, on Zulip):

and that means that yeah, lookahdead for dollars will need to go deeper

Edwin Cheng (May 06 2019 at 17:18, on Zulip):

Then let me study how proc_macro2 tt works in this week first :)

matklad (May 06 2019 at 17:19, on Zulip):

Yeah, I feel like i might need to take a closer look at syn as well

matklad (May 06 2019 at 17:19, on Zulip):

the API is just awesome

Edwin Cheng (May 06 2019 at 17:22, on Zulip):

The API is just awesome

Yes, and quote too. Sometimes while workings on ra_mbe, i wish i could use quote! to build a tt::subtree :)

Edwin Cheng (May 06 2019 at 17:24, on Zulip):

And FYI, i added macro expansion support in my wasm toy : https://web-ra-syntax-node.herokuapp.com/

matklad (May 06 2019 at 17:26, on Zulip):

Hm, why don't we add it to rust-analyzer proper?

matklad (May 06 2019 at 17:27, on Zulip):

I mean we can totally have "expand macro" action/intention

Edwin Cheng (May 06 2019 at 17:30, on Zulip):

Is is possible to create another buffer (like Show SyntaxNode) and display the expanded selected macro invocation?

matklad (May 06 2019 at 17:30, on Zulip):

We do this for "show syntax tree" command

Edwin Cheng (May 06 2019 at 17:32, on Zulip):

So it is possible. Okay, let me add this to my TODO list :)

Jeremy Kolb (May 06 2019 at 17:35, on Zulip):

And FYI, i added macro expansion support in my wasm toy : https://web-ra-syntax-node.herokuapp.com/

Oh that is neat!

Edwin Cheng (May 21 2019 at 04:14, on Zulip):

Here is my study about how syn crate handle the tree-traversal problem:

matklad (May 21 2019 at 07:28, on Zulip):

Hm, TokenStream is basically a TokenTree::Subtree with delimeters = None?

matklad (May 21 2019 at 07:29, on Zulip):

There's a feeling that TokenStream is a historical accident, which is not really required: https://github.com/rust-lang/rust/pull/49597#issuecomment-378409251

Edwin Cheng (May 21 2019 at 07:39, on Zulip):

Hm, TokenStream is basically a TokenTree::Subtree with delimeters = None?

I mean the proc_macro2 TokenStream, which you can think of a stream of tokens.

Edwin Cheng (May 21 2019 at 07:41, on Zulip):

Or we can name it as TokenBuffer

matklad (May 21 2019 at 07:43, on Zulip):

Yeah, this is what I am talking about

matklad (May 21 2019 at 07:44, on Zulip):

TokenStream is just a Vec of tokens, or Group without delimiters: https://github.com/alexcrichton/proc-macro2/blob/373fab08aefccdd76afa29798def239b67783881/src/fallback.rs#L19-L21

matklad (May 21 2019 at 07:44, on Zulip):

so, it's not an essential type for the API

Edwin Cheng (May 21 2019 at 08:04, on Zulip):

Sorry, it is my fault to express my idea not very well (My English is not good). What i really mean is, we use TokenBuffer to replace the tt:TokenTree stored in hir and named it as TokenStream. Although it is not an essential type for the API, but it allow us to prevent the conversion back and forth.

The original reason we have SubtreeWalker , is to prevent the whole tree conversion but i start to believe we should do the conversion anyway.

matklad (May 21 2019 at 08:13, on Zulip):

Ah, yeah, that seems resonable!

matklad (May 21 2019 at 08:13, on Zulip):

Though, my understanding is that TokenBuffer uses unsafe heavily, for performance

matklad (May 21 2019 at 08:14, on Zulip):

I wonder if we should start with slower, but safe impl instead?

matklad (May 21 2019 at 08:14, on Zulip):

That is, can we replicate API of TokenBuffer in safe code?

Edwin Cheng (May 21 2019 at 09:51, on Zulip):

I don't know :) , but maybe i will try to make some prototype first.

OTOH, what is your opinion about the priorities of following things about macros ?
1. Improve name resolutions of mbe, mainly introduce proper macro scopes for local macros (support macro_use)
2. Proper handling different kinds of macro invocations (stmts, expr, pat)
3. NavigationTarget api
4. TokenSource api improvement
5. Start hacking proc-macro

matklad (May 21 2019 at 09:59, on Zulip):

I would like the following priority:

Edwin Cheng (May 21 2019 at 10:25, on Zulip):

Okay :)

matklad (May 21 2019 at 10:27, on Zulip):

To give a broader picture, the current goal of RLS-2.0 is to see what things are possible and not to make a perfect IDE. Experimenting with stuff like hygiene is better aligned with this goal than just making everything correct 100% :-)

Edwin Cheng (May 23 2019 at 16:33, on Zulip):
/// `TokenSource` abstracts the source of the tokens parser operates one.
///
/// Hopefully this will allow us to treat text and token trees in the same way!
pub trait TokenSource {
    type Cursor: TokenCursor;

    fn current(&self) -> TokenCursor;

    /// Lookahead 1 token
    fn Lookahead1(&self) -> TokenCursor;
    /// Lookahead 2 tokens
    fn Lookahead2(&self) -> TokenCursor;
    /// Lookahead 3 tokens
    fn Lookahead3(&self) -> TokenCursor;

    /// bump cursor to next token
    fn bump(&mut self);
}

/// `TokenCursor` abstracts the cursor of `TokenSource` operates one.
pub trait TokenCursor {
    /// What is the current token?
    fn token_kind(&self) -> SyntaxKind;

    /// Is the current token joined to the next one (`> >` vs `>>`).
    fn is_token_joint_to_next(&self) -> bool;

    /// Is the current token a specified keyword?
    fn is_keyword(&self, kw: &str) -> bool;
}

Do you think this new interface reasonable ? @matklad
I remember you said you want to use is_composite instead of is_token_joint_to_next. However, it will push back the responsibility of recognizing composite tokens to mbe and text stream. I don't know whether it is good approach.

matklad (May 23 2019 at 16:36, on Zulip):

Seems good to me!

matklad (May 23 2019 at 16:37, on Zulip):

TokenCursor might be named better, it's not really a courrsor, because it doesn't move. Perhaps just Token?

matklad (May 23 2019 at 16:38, on Zulip):

I think TokenCursor might contain references, so it should be paramterized over a lifetime

Edwin Cheng (May 23 2019 at 16:41, on Zulip):

Yeah, Token is much better name and that should be paramteriaed too. :+1:

Jeremy Kolb (May 23 2019 at 18:17, on Zulip):

maybe is_token_joined_to_next?

Edwin Cheng (May 24 2019 at 16:49, on Zulip):

@Jeremy Kolb Sure

Edwin Cheng (May 24 2019 at 16:50, on Zulip):

@matklad Do you think i should change the Parser to use lookahead_1 instead of nth too ?

matklad (May 24 2019 at 16:51, on Zulip):

Actually, I think it's better to just live nth, but panic if n is too large

Edwin Cheng (May 24 2019 at 16:57, on Zulip):

Okay, and i faced another problem to change the TokenSource interface. The associated item of new TokenSource trait make every parser code to have to include the generic type argument. For example:

fn lhs(p: &mut Parser)

became:

fn lhs<TT>( p: &mut Parser<TT>) where T : Token

I think i will be change it back to return (At least temp):

struct Token {
  token_kind : SyntaxKind,
  is_token_jointed_to_next: bool,
}
matklad (May 24 2019 at 17:10, on Zulip):

yeah, it seems like we want concrete type there!

Edwin Cheng (Jun 02 2019 at 05:10, on Zulip):

I want to work on the following item:

making goto defeinition use the correct ranges (I feel this could be the first step towards supporting hygiene)

If i understand correctly, we should start from refactoring "NavTarget" from the following comment in github, right ?

review NavigationTarget and other stuff in display to make sure it doesn't try to do semantic analysis (this is not really macro-related)

But i am a little bit lost, what do you mean about semantic analysis ? Would you mind to guide me what should i start with ?

matklad (Jun 02 2019 at 09:22, on Zulip):

sure, let me take a look...

matklad (Jun 02 2019 at 09:24, on Zulip):

The problem are node, docs and description methods

matklad (Jun 02 2019 at 09:25, on Zulip):

NavTarget intended to be a simple POD type to communicate with LSP, so it shouldn't really have any semantic meaning

matklad (Jun 02 2019 at 09:25, on Zulip):

The problem with these three methods is that they take &self, db and do some computation

matklad (Jun 02 2019 at 09:26, on Zulip):

But they can't really relibly to extract doc strings, etc

matklad (Jun 02 2019 at 09:27, on Zulip):

what we should do is to add docs and descripiton as fields to NavTarget, and fill them in during construction, when we have much better understand what is the hir entity we construct a nav target for

matklad (Jun 02 2019 at 09:31, on Zulip):

And the reason is why this is related to macros is that NavTarget::node simply doesn't work if there are macros: it purposefully knows only range in the original file, it can't featch the node from macro expansion

Edwin Cheng (Jun 11 2019 at 06:03, on Zulip):

As #1388 is landed, what is the next step ? Remove HirId::as_original_file ?

matklad (Jun 11 2019 at 08:11, on Zulip):

I think that would be the last step

matklad (Jun 11 2019 at 08:12, on Zulip):

first, we need to provide an alternative, and only then remove the old way (migrating all ide_api to it, not only goto def)

matklad (Jun 11 2019 at 08:12, on Zulip):

Let's talk through what needs to be done here!

matklad (Jun 11 2019 at 08:15, on Zulip):

Let's say we invoke goto_defeinition and get a Def back, here: https://github.com/rust-analyzer/rust-analyzer/blob/3f5f9f0560fa662e770b607f05ec4881e4d011c5/crates/ra_ide_api/src/goto_definition.rs#L66

matklad (Jun 11 2019 at 08:16, on Zulip):

That Def might originate in macro expansion, so we can't literally naviagate to it. Rather, we should navigate to the defining macro

matklad (Jun 11 2019 at 08:17, on Zulip):

That means the trick is setting full_range, focus_range and file_id correctly: https://github.com/rust-analyzer/rust-analyzer/blob/3f5f9f0560fa662e770b607f05ec4881e4d011c5/crates/ra_ide_api/src/display/navigation_target.rs#L23-L24

matklad (Jun 11 2019 at 08:22, on Zulip):

The behavior we want here is that we should try to set ranges correctly, falling back to the range of the whole macro invocation if that fails

matklad (Jun 11 2019 at 08:22, on Zulip):

let me give an example

matklad (Jun 11 2019 at 08:23, on Zulip):

Let's say, we have

macro_rules! define_fn {
   ($name:ident) => (fn $name() {})
}

define_fn!(
    foo
)

fn main() {
   foo() // <- user presses goto definition here
}
matklad (Jun 11 2019 at 08:24, on Zulip):

In this case, we should arrive at

define_fn!(
    <*>foo<*> // <- this range
)
matklad (Jun 11 2019 at 08:42, on Zulip):

However, for

macro_rules! define_fn {
   ($name:ident) => (fn $name() {})
}

define_fn!();

fn main() {
   foo()
}

we should navigate to the whole of define_fn!(), because the identifier is created inside the macro

Edwin Cheng (Jun 11 2019 at 08:45, on Zulip):

So the first step should be add a test for that, and implement fn HirFileId::original_file_and_range(&self, db: &RootDatabase, range: TextRange) -> (HirFileId, TextRange), right ?

matklad (Jun 11 2019 at 08:45, on Zulip):

I think the logic for this should be as follows.

When we expand a macro, we remember a mapping of ranges from macro expansion to a macro call (Vec<(TextRange, TextRange)>)

matklad (Jun 11 2019 at 08:45, on Zulip):

yeah, starting with a test is a good idea! Integration test in ra_ide_api would be best

matklad (Jun 11 2019 at 08:46, on Zulip):

as for API I am not exactly sure what we need it

matklad (Jun 11 2019 at 08:46, on Zulip):

original_file_and_range is maybe ok as a high-level API, but I'd also like to think about how low-level API should work

Edwin Cheng (Jun 11 2019 at 08:47, on Zulip):

Sure

Edwin Cheng (Jun 11 2019 at 08:47, on Zulip):

When we expand a macro, we remember a mapping of ranges from macro expansion to a macro call (Vec<(TextRange, TextRange)>)

What is the relationship of it with TokenId ?

matklad (Jun 11 2019 at 08:53, on Zulip):

I think we need HirFileId::parent_expansion() -> Option<(HirFileId, ExpansionInfo)>, as one-step expansion, will be better

matklad (Jun 11 2019 at 08:53, on Zulip):

the ExpansionInfo will have methods to map ranges in both directions

matklad (Jun 11 2019 at 08:54, on Zulip):

What is the relationship of it with TokenId ?

So, TokenIds should be a way implement the mapping yeah

matklad (Jun 11 2019 at 08:55, on Zulip):

when we expand macro, we get a sort of &Token -> &Token map, and we need to somehow reconstruct the ranges map our of that info

Edwin Cheng (Jun 11 2019 at 08:55, on Zulip):

Seem like we already have the TextRange mapping ?
https://github.com/rust-analyzer/rust-analyzer/blob/b79e6294a68fd41f0a3dbd9eb907dfe99646d77e/crates/ra_mbe/src/syntax_bridge.rs#L12

matklad (Jun 11 2019 at 08:55, on Zulip):

It might be the case that we don't need to reconstruct the actual Vec<TextRange, TextRange> mapping

matklad (Jun 11 2019 at 08:55, on Zulip):

Yeah, TokenMap is a bit of that infra

matklad (Jun 11 2019 at 08:56, on Zulip):

but it never was finished :-)

Edwin Cheng (Jun 11 2019 at 08:57, on Zulip):

How about that: Let me combined all information here, and TRY to make an implementation to pass that test I mentioned before, and then discuss it in that PR ?

matklad (Jun 11 2019 at 08:58, on Zulip):

SGTM!

matklad (Jun 11 2019 at 08:58, on Zulip):

Hm, now that I think about it, we used to have this working some time ago

matklad (Jun 11 2019 at 08:58, on Zulip):

but then I've removed it, when we went to a proper macro expansion... Let me dig out some old codes...

matklad (Jun 11 2019 at 09:08, on Zulip):

https://github.com/rust-analyzer/rust-analyzer/commit/5299a35e3dc484ea2e7d42cfeed89aee806425d3

matklad (Jun 11 2019 at 09:08, on Zulip):

so this is a bit from that old implementation

matklad (Jun 11 2019 at 09:09, on Zulip):

it handles extend selection

matklad (Jun 11 2019 at 09:09, on Zulip):

and is probably the most interesting case for macro expanstion

matklad (Jun 11 2019 at 09:10, on Zulip):

it first takes original ranges, transforms it into a range inside the macro, runs expand selection in the macro, and maps the resulting range back

matklad (Jun 11 2019 at 09:10, on Zulip):

so, that's why you need bidirectional mapping

Edwin Cheng (Jun 11 2019 at 09:18, on Zulip):

Um.. interesting. So we need something like this which replace TokenMap ?
https://github.com/rust-analyzer/rust-analyzer/blob/792899587647f5aa0293c2588173677682187c0a/crates/ra_analysis/src/macros.rs#L44

Edwin Cheng (Jun 11 2019 at 09:18, on Zulip):

Or extend TokenMap.

matklad (Jun 11 2019 at 09:25, on Zulip):

yeah, i think we need to extend token map. Or maybe build a ranges map from token map

Last update: Nov 19 2019 at 17:40UTC