Stream: t-compiler/rust-analyzer

Topic: parsing in r-a


kev (Dec 11 2020 at 14:37, on Zulip):

I've just made a summary of the first two phases that matklad drew. Please let me know if you see any mistakes or if it is just wrong.

text (base_db)

Stores the raw text (lexemes and tokens) in a database. This is the result of the lexer.

Capabilities: You can search for tokens and reconstruct the raw source file that you parsed with this.



Note: Unlike some parsers, it stores the whitespace too. The position/span of a node is not explicit in this representation. So with rast, we see WHITESPACE@120..121. That span can be derived by iterating through all of the nodes on this layer and summing their lengths.
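
A rough sketch of the "derive the span from the lengths" idea, in case it helps. The `Token` type and the function names here are made up for illustration and are not rust-analyzer's or rowan's actual API; the only assumption is that each token keeps its kind and its text (whitespace included) and nothing else.

```rust
/// A lexed token that stores its kind and text, but no position.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Token {
    kind: &'static str,
    text: String,
}

/// Derive absolute `(start, end)` byte ranges by summing token lengths in order.
fn absolute_ranges(tokens: &[Token]) -> Vec<(usize, usize)> {
    let mut offset = 0;
    tokens
        .iter()
        .map(|t| {
            let start = offset;
            offset += t.text.len();
            (start, offset)
        })
        .collect()
}

/// Because whitespace is kept, concatenating the token texts gives back the source.
fn reconstruct(tokens: &[Token]) -> String {
    tokens.iter().map(|t| t.text.as_str()).collect()
}

/// Render each token in a `KIND@start..end` style, similar to a rast dump.
fn dump(tokens: &[Token]) -> Vec<String> {
    tokens
        .iter()
        .zip(absolute_ranges(tokens))
        .map(|(t, (start, end))| format!("{}@{}..{}", t.kind, start, end))
        .collect()
}
```

For the three tokens `fn`, a single space, and `f`, `dump` would give `FN_KW@0..2`, `WHITESPACE@2..3`, `IDENT@3..4`, and `reconstruct` would return `fn f`.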



CST (syntax)



This is the result of the parser. The span information here is relative. So for example WHITESPACE@120..121 is the span of a whitespace node, but it is relative to all of the other nodes. Therefore, if we insert a whitespace into the file, the CST will be invalidated, but the base_db will not, since in the base_db the position is implicit, based on the length of each node.
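
To make the invalidation argument concrete, here is a minimal sketch that assumes tokens are represented only by their byte lengths. The function names are hypothetical and this is not how rust-analyzer is implemented; it only shows that inserting one token changes the start offset of every token after it, while the stored lengths of the existing tokens stay the same.

```rust
/// Compute absolute start offsets from a list of token lengths.
fn starts(lengths: &[usize]) -> Vec<usize> {
    let mut offset = 0;
    lengths
        .iter()
        .map(|len| {
            let start = offset;
            offset += *len;
            start
        })
        .collect()
}

/// Insert a token of `len` bytes at position `index` (must be <= lengths.len())
/// and count how many of the original tokens end up with a different start offset.
fn shifted_after_insert(lengths: &[usize], index: usize, len: usize) -> usize {
    let before = starts(lengths);
    let mut edited = lengths.to_vec();
    edited.insert(index, len);
    let after = starts(&edited);
    (0..lengths.len())
        .filter(|&i| {
            // Original token `i` sits one slot later if it came at or after the insertion point.
            let j = if i < index { i } else { i + 1 };
            before[i] != after[j]
        })
        .count()
}
```

With lengths `[2, 1, 1, 1, 1]`, inserting a one-byte whitespace token at index 1 shifts the four tokens that follow it, so anything that cached their absolute offsets is stale, even though none of their lengths changed.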

Capabilities: There are no new capabilities with this AFAICT. You can search for Expressions and do all of the regular operations you can do on an AST.



Note: Since this gets invalidated on every keystroke, it should be very quick to build. It is more or less a convenience structure that we only need to build for the file that is open.

Jonas Schievink [he/him] (Dec 11 2020 at 14:40, on Zulip):

The text query is really just the raw file text, unparsed and unlexed

kev (Dec 11 2020 at 14:45, on Zulip):

Jonas Schievink said:

The text query is really just the raw file text, unparsed and unlexed

Oh I see, thanks for correcting. I thought that this section was about GreenNodes and not salsa

Last update: Jul 26 2021 at 14:15UTC