Stream: t-compiler/wg-rls-2.0

Topic: Comparison of rust-analyzer's parser with tree-sitter?


osa1 (Jul 27 2020 at 06:53, on Zulip):

I'm curious how rust-analyzer's parser compares with tree-sitter in terms of error recovery and incremental parsing. For example, what would be better or worse if I replaced rust-analyzer's parser with tree-sitter-rust?

matklad (Jul 27 2020 at 07:38, on Zulip):

In terms of error recovery, rust-analyzer's parser is better (though this might be an artifact of how a specific grammar is specified in tree-sitter, I haven't looked very closely into this). In terms of incrementality, tree-sitter is better.

Replacing our parser with tree-sitter would be much worse, but not because of these two metrics.

matklad (Jul 27 2020 at 07:39, on Zulip):

Pulling in a parser generator brings a lot of accidental complexity. That is generally justified when you need to parse a lot of languages. But we only need to parse one, so, in terms of overall cost, I expect a hand-written parser to be much more efficient.

osa1 (Jul 27 2020 at 07:53, on Zulip):

Thanks! Why is error recovery better in rust-analyzer? Is it because we can make language-specific decisions on how to recover (which tree-sitter cannot do, as far as I understand from the documentation)? Is there a summary of how error recovery in parsing is currently done in rust-analyzer? If not, where should I look in the source code?

matklad (Jul 27 2020 at 07:57, on Zulip):

https://matklad.github.io/2018/06/06/modern-parser-generator.html#abandoning-cfg (from "The most important feature of hand-written parsers is a great support for error recovery and partial parses. It")

matklad (Jul 27 2020 at 07:57, on Zulip):

For the example, rust-analyzer gives

SOURCE_FILE@0..22597
  FN_DEF@0..7
    FN_KW@0..2 "fn"
    WHITESPACE@2..3 " "
    NAME@3..6
      IDENT@3..6 "foo"
    PARAM_LIST@6..7
      L_PAREN@6..7 "("
  WHITESPACE@7..9 "\n\n"
  STRUCT_DEF@9..31
    STRUCT_KW@9..15 "struct"
    WHITESPACE@15..16 " "
    NAME@16..17
      IDENT@16..17 "S"
    WHITESPACE@17..18 " "
    RECORD_FIELD_DEF_LIST@18..31
      L_CURLY@18..19 "{"
      WHITESPACE@19..23 "\n   "
      RECORD_FIELD_DEF@23..29
        NAME@23..24
          IDENT@23..24 "f"
        COLON@24..25 ":"
        WHITESPACE@25..26 " "
        PATH_TYPE@26..29
          PATH@26..29
            PATH_SEGMENT@26..29
              NAME_REF@26..29
                IDENT@26..29 "u32"
      WHITESPACE@29..30 "\n"
      R_CURLY@30..31 "}"

matklad (Jul 27 2020 at 07:57, on Zulip):

tree-sitter gives

source_file [0, 0] - [5, 0])
  ERROR [0, 0] - [4, 1])
    identifier [0, 3] - [0, 6])
    struct_pattern [2, 0] - [4, 1])
      type: type_identifier [2, 0] - [2, 6])
      ERROR [2, 7] - [2, 8])
        identifier [2, 7] - [2, 8])
      field_pattern [3, 3] - [3, 9])
        name: field_identifier [3, 3] - [3, 4])
        pattern: identifier [3, 6] - [3, 9])

osa1 (Jul 27 2020 at 07:58, on Zulip):

Interesting! Thanks for the concrete example and the blog post, I'll read it. What's the source that you're parsing in these examples?

matklad (Jul 27 2020 at 07:58, on Zulip):

The example in the post

osa1 (Jul 27 2020 at 10:46, on Zulip):

Hey @matklad -- thanks for the answers, they're really helpful. I want to look at rust-analyzer source for the error recovery code used in the example above, where should I look?

matklad (Jul 27 2020 at 11:00, on Zulip):

https://github.com/rust-analyzer/rust-analyzer/blob/bc9fab156596d05ddb6b3fa57acd0fbd0755f2a0/crates/ra_parser/src/grammar/items.rs#L322-L352
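
For readers who do not want to follow the link, here is a minimal sketch of the recovery idea behind the rust-analyzer tree shown earlier: when a list parser sees a token that can only start a new item, it stops, reports the missing delimiter, and lets the item parser carry on. This is not the code at the linked lines. The `Tok` enum, the `Parser` struct, and every function name below are hypothetical, and the toy grammar only knows `fn` and `struct`.

```rust
// Hypothetical token kinds for a toy grammar (not rust-analyzer's SyntaxKind).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok { Fn, Struct, Ident, LParen, RParen, LCurly, RCurly, Colon, Eof }

struct Parser {
    toks: Vec<Tok>,
    pos: usize,
    depth: usize,
    tree: Vec<String>,   // indented dump of nodes and tokens
    errors: Vec<String>, // errors are collected; parsing never aborts
}

impl Parser {
    fn current(&self) -> Tok { self.toks.get(self.pos).copied().unwrap_or(Tok::Eof) }
    fn at(&self, t: Tok) -> bool { self.current() == t }
    fn line(&mut self, text: String) {
        let indent = "  ".repeat(self.depth);
        self.tree.push(format!("{}{}", indent, text));
    }
    fn start(&mut self, node: &str) { self.line(node.to_string()); self.depth += 1; }
    fn finish(&mut self) { self.depth -= 1; }
    fn bump(&mut self) {
        let t = self.current();
        self.line(format!("{:?}", t));
        self.pos += 1;
    }
    // Consume `t` if present; otherwise record an error and consume nothing.
    fn expect(&mut self, t: Tok) {
        if self.at(t) { self.bump() } else { self.errors.push(format!("expected {:?}", t)) }
    }
}

// Language-specific knowledge: these tokens can only begin a new item.
fn at_item_start(p: &Parser) -> bool { p.at(Tok::Fn) || p.at(Tok::Struct) }

// `name: Type`, shared by parameters and struct fields in this toy grammar.
fn name_and_type(p: &mut Parser, node: &str) {
    p.start(node);
    p.expect(Tok::Ident);
    p.expect(Tok::Colon);
    p.expect(Tok::Ident);
    p.finish();
}

fn list(p: &mut Parser, node: &str, entry: &str, close: Tok) {
    p.start(node);
    p.bump(); // opening delimiter, already checked by the caller
    // Recovery: stop at the start of the next item instead of swallowing it.
    while !p.at(close) && !p.at(Tok::Eof) && !at_item_start(p) {
        if p.at(Tok::Ident) {
            name_and_type(p, entry)
        } else {
            p.errors.push(format!("unexpected {:?}", p.current()));
            p.bump()
        }
    }
    p.expect(close);
    p.finish();
}

fn fn_def(p: &mut Parser) {
    p.start("FN_DEF");
    p.bump(); // `fn`
    p.expect(Tok::Ident);
    if p.at(Tok::LParen) { list(p, "PARAM_LIST", "PARAM", Tok::RParen) }
    else { p.errors.push("expected LParen".to_string()) }
    p.finish();
}

fn struct_def(p: &mut Parser) {
    p.start("STRUCT_DEF");
    p.bump(); // `struct`
    p.expect(Tok::Ident);
    if p.at(Tok::LCurly) { list(p, "RECORD_FIELD_DEF_LIST", "RECORD_FIELD_DEF", Tok::RCurly) }
    else { p.errors.push("expected LCurly".to_string()) }
    p.finish();
}

// Returns (tree dump, errors) for a whole token stream.
fn parse(toks: Vec<Tok>) -> (Vec<String>, Vec<String>) {
    let mut p = Parser { toks, pos: 0, depth: 0, tree: Vec::new(), errors: Vec::new() };
    p.start("SOURCE_FILE");
    while !p.at(Tok::Eof) {
        if p.at(Tok::Fn) { fn_def(&mut p) }
        else if p.at(Tok::Struct) { struct_def(&mut p) }
        else { p.errors.push("expected an item".to_string()); p.bump() }
    }
    p.finish();
    (p.tree, p.errors)
}
```

Given the tokens for `fn foo(` followed by `struct S { f: u32 }`, `parse` records a single "expected RParen" error and still emits a complete STRUCT_DEF node after the unfinished PARAM_LIST, which is the same shape as the rust-analyzer dump above.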

Laurențiu Nicola (Jul 27 2020 at 11:04, on Zulip):

Unrelated, but there's this pattern here: https://github.com/rust-analyzer/rust-analyzer/blob/bc9fab156596d05ddb6b3fa57acd0fbd0755f2a0/crates/ra_parser/src/grammar/items.rs#L347-L351

Laurențiu Nicola (Jul 27 2020 at 11:04, on Zulip):

eat already checks the current token and returns false if they don't match
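
A small sketch of the redundancy being pointed out, assuming the pattern in question is an `at` check guarding an `eat` call (the message implies this but does not spell it out). The `Parser` and `Tok` types here are hypothetical stand-ins, not the ones in `ra_parser`.

```rust
// Hypothetical token kinds and parser, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok { Semi, Ident, Eof }

struct Parser { toks: Vec<Tok>, pos: usize }

impl Parser {
    fn current(&self) -> Tok { self.toks.get(self.pos).copied().unwrap_or(Tok::Eof) }

    // True if the current token is `t`; consumes nothing.
    fn at(&self, t: Tok) -> bool { self.current() == t }

    // Consumes the current token if it is `t` and reports whether it did.
    fn eat(&mut self, t: Tok) -> bool {
        if !self.at(t) {
            return false;
        }
        self.pos += 1;
        true
    }
}

// Guarded form: the outer `at` repeats the check that `eat` performs anyway.
fn skip_semi_guarded(p: &mut Parser) {
    if p.at(Tok::Semi) {
        p.eat(Tok::Semi);
    }
}

// Equivalent shorter form: `eat` is a no-op when the token does not match.
fn skip_semi(p: &mut Parser) {
    p.eat(Tok::Semi);
}
```

Both functions leave the parser in the same position for any input, so the guard can be dropped without changing behavior.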

Last update: Sep 27 2020 at 14:30UTC