Stream: t-compiler/rust-analyzer

Topic: Comparison of rust-analyzer's parser with tree-sitter?

osa1 (Jul 27 2020 at 06:53, on Zulip):

I'm curious how rust-analyzer's parser compares with tree-sitter in terms of error recovery and incremental parsing. For example, what would be better or worse if I replaced rust-analyzer's parser with tree-sitter-rust?

matklad (Jul 27 2020 at 07:38, on Zulip):

In terms of error recovery, rust-analyzer's parser is better (though this might be an artifact of how a specific grammar is specified in tree-sitter, I haven't looked very closely into this). In terms of incrementality, tree-sitter is better.

Replacing our parser with tree-sitter would be much worse, but not because of these two metrics.

matklad (Jul 27 2020 at 07:39, on Zulip):

The amount of accidental complexity to pull in a parser generator is big. It is generally justified when you need to parse a lot of languages. But we only need to parse one, so, in terms of overall cost, I expect a hand-written parser to be much more efficient.

osa1 (Jul 27 2020 at 07:53, on Zulip):

Thanks! Why is error recovery better in rust-analyzer? Is it because we can make language-specific decisions about how to recover (which tree-sitter cannot do, as far as I understand from the documentation)? Is there a summary of how error recovery in parsing is currently done in rust-analyzer? If not, where should I look in the source code?

matklad (Jul 27 2020 at 07:57, on Zulip): (from "The most important feature of hand-written parsers is a great support for error recovery and partial parses. It")

matklad (Jul 27 2020 at 07:57, on Zulip):

For the example, rust-analyzer gives

    FN_KW@0..2 "fn"
    WHITESPACE@2..3 " "
      IDENT@3..6 "foo"
      L_PAREN@6..7 "("
  WHITESPACE@7..9 "\n\n"
    STRUCT_KW@9..15 "struct"
    WHITESPACE@15..16 " "
      IDENT@16..17 "S"
    WHITESPACE@17..18 " "
      L_CURLY@18..19 "{"
      WHITESPACE@19..23 "\n   "
          IDENT@23..24 "f"
        COLON@24..25 ":"
        WHITESPACE@25..26 " "
                IDENT@26..29 "u32"
      WHITESPACE@29..30 "\n"
      R_CURLY@30..31 "}"

matklad (Jul 27 2020 at 07:57, on Zulip):

tree-sitter gives

source_file [0, 0] - [5, 0]
  ERROR [0, 0] - [4, 1]
    identifier [0, 3] - [0, 6]
    struct_pattern [2, 0] - [4, 1]
      type: type_identifier [2, 0] - [2, 6]
      ERROR [2, 7] - [2, 8]
        identifier [2, 7] - [2, 8]
      field_pattern [3, 3] - [3, 9]
        name: field_identifier [3, 3] - [3, 4]
        pattern: identifier [3, 6] - [3, 9]
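
The difference between the two dumps comes down to where the unclosed `fn foo(` stops. rust-analyzer ends the function at `(` and still parses `struct S { f: u32 }` as a separate item, while tree-sitter folds everything into one `ERROR` node. The sketch below illustrates one common way a hand-written recursive-descent parser gets the first behavior: loops that parse a parameter or field list stop, without consuming, at any token in a recovery set of item-starting keywords, so the top-level loop can begin a new item there. It uses a toy grammar and hypothetical names (`Kind`, `Parser`, `ITEM_RECOVERY`); it is a sketch of the general technique under those assumptions, not rust-analyzer's actual code.

```rust
// Toy grammar for this sketch only: `fn name(params) {}` and `struct Name { fields }`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind { FnKw, StructKw, Ident, LParen, RParen, LCurly, RCurly, Colon, Comma, Eof }

/// Hypothetical recovery set: tokens that can start an item.
const ITEM_RECOVERY: &[Kind] = &[Kind::FnKw, Kind::StructKw];

struct Parser {
    tokens: Vec<(Kind, &'static str)>,
    pos: usize,
    recovery: &'static [Kind],
    items: Vec<String>,
    errors: Vec<String>,
}

impl Parser {
    fn new(mut tokens: Vec<(Kind, &'static str)>, recovery: &'static [Kind]) -> Self {
        tokens.push((Kind::Eof, "")); // sentinel so `current` never indexes past the end
        Parser { tokens, pos: 0, recovery, items: Vec::new(), errors: Vec::new() }
    }
    fn current(&self) -> Kind { self.tokens[self.pos].0 }
    fn at(&self, k: Kind) -> bool { self.current() == k }
    fn error(&mut self, msg: &str) { self.errors.push(msg.to_string()); }
    /// Consumes the current token (never moving past Eof) and returns its text.
    fn bump(&mut self) -> &'static str {
        let text = self.tokens[self.pos].1;
        if !self.at(Kind::Eof) { self.pos += 1; }
        text
    }
    /// On a mismatch, records an error but consumes nothing.
    fn expect(&mut self, k: Kind) {
        if self.at(k) { self.bump(); } else { self.error(&format!("expected {:?}", k)); }
    }
    fn name(&mut self) -> &'static str {
        if self.at(Kind::Ident) { return self.bump(); }
        self.error("expected a name");
        "<missing>"
    }
    fn file(&mut self) {
        while !self.at(Kind::Eof) {
            match self.current() {
                Kind::FnKw => self.fn_item(),
                Kind::StructKw => self.struct_item(),
                _ => { self.error("expected an item"); self.bump(); }
            }
        }
    }
    fn fn_item(&mut self) {
        self.bump(); // `fn`
        let name = self.name();
        if self.at(Kind::LParen) { self.list(Kind::RParen); } else { self.error("expected `(`"); }
        // Simplification: the only accepted body is the empty block `{}`.
        if self.at(Kind::LCurly) { self.bump(); self.expect(Kind::RCurly); } else { self.error("expected a block"); }
        self.items.push(format!("fn {}", name));
    }
    fn struct_item(&mut self) {
        self.bump(); // `struct`
        let name = self.name();
        if self.at(Kind::LCurly) { self.list(Kind::RCurly); } else { self.error("expected `{`"); }
        self.items.push(format!("struct {}", name));
    }
    /// Parses `name: Type, ...` (one-word types only) from the opening delimiter up to `close`.
    fn list(&mut self, close: Kind) {
        self.bump(); // opening `(` or `{`
        while !self.at(close) && !self.at(Kind::Eof) {
            if self.recovery.contains(&self.current()) {
                break; // leave `fn`/`struct` unconsumed so `file` starts a new item there
            }
            if self.at(Kind::Ident) {
                self.bump();
                self.expect(Kind::Colon);
                self.expect(Kind::Ident);
                if !self.at(close) { self.expect(Kind::Comma); }
            } else {
                self.error("expected `name: Type`");
                self.bump(); // skip the stray token so the loop always makes progress
            }
        }
        self.expect(close);
    }
}

/// The example input with whitespace dropped: `fn foo(` followed by `struct S { f: u32 }`.
fn example_tokens() -> Vec<(Kind, &'static str)> {
    use Kind::*;
    vec![(FnKw, "fn"), (Ident, "foo"), (LParen, "("), (StructKw, "struct"), (Ident, "S"),
         (LCurly, "{"), (Ident, "f"), (Colon, ":"), (Ident, "u32"), (RCurly, "}")]
}

fn main() {
    let mut p = Parser::new(example_tokens(), ITEM_RECOVERY);
    p.file();
    println!("with recovery: {:?}, errors: {:?}", p.items, p.errors);
    let mut q = Parser::new(example_tokens(), &[]);
    q.file();
    println!("without recovery: {:?}", q.items);
}
```

With the recovery set, the toy parser yields both `fn foo` and `struct S`, plus two errors (a missing `)` and a missing body). With an empty recovery set, the parameter loop swallows `struct S { f: u32 }` as malformed parameters and only `fn foo` is produced.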

osa1 (Jul 27 2020 at 07:58, on Zulip):

Interesting! Thanks for the concrete example and the blog post; I'll read it. What's the source you're parsing in these examples?

matklad (Jul 27 2020 at 07:58, on Zulip):

The example in the post
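
Because rust-analyzer's syntax tree is lossless (whitespace is kept as WHITESPACE tokens), the parsed text can also be read straight off its dump by concatenating the leaf tokens. The sketch below does that with the token list copied from the rust-analyzer dump above, and adds a hypothetical `point` helper that converts rust-analyzer's byte offsets into zero-based [row, column] pairs like the ones tree-sitter prints, so the two outputs can be lined up. Columns are counted in bytes; the input is ASCII, so bytes and characters coincide.

```rust
/// Leaf tokens copied from the rust-analyzer dump: (kind, start, end, text).
const TOKENS: &[(&str, usize, usize, &str)] = &[
    ("FN_KW", 0, 2, "fn"),
    ("WHITESPACE", 2, 3, " "),
    ("IDENT", 3, 6, "foo"),
    ("L_PAREN", 6, 7, "("),
    ("WHITESPACE", 7, 9, "\n\n"),
    ("STRUCT_KW", 9, 15, "struct"),
    ("WHITESPACE", 15, 16, " "),
    ("IDENT", 16, 17, "S"),
    ("WHITESPACE", 17, 18, " "),
    ("L_CURLY", 18, 19, "{"),
    ("WHITESPACE", 19, 23, "\n   "),
    ("IDENT", 23, 24, "f"),
    ("COLON", 24, 25, ":"),
    ("WHITESPACE", 25, 26, " "),
    ("IDENT", 26, 29, "u32"),
    ("WHITESPACE", 29, 30, "\n"),
    ("R_CURLY", 30, 31, "}"),
];

/// Rebuilds the source by concatenating the leaves, checking that the ranges tile it.
fn rebuild() -> String {
    let mut src = String::new();
    for &(_, start, end, text) in TOKENS {
        assert_eq!(start, src.len(), "ranges must be contiguous");
        src.push_str(text);
        assert_eq!(end, src.len());
    }
    src
}

/// Hypothetical helper: maps a byte offset to a zero-based (row, column) pair.
fn point(src: &str, offset: usize) -> (usize, usize) {
    let before = &src[..offset];
    let row = before.matches('\n').count();
    let col = match before.rfind('\n') {
        Some(nl) => offset - nl - 1, // bytes since the last newline
        None => offset,              // still on the first line
    };
    (row, col)
}

fn main() {
    let src = rebuild();
    println!("{:?}", src); // "fn foo(\n\nstruct S {\n   f: u32\n}"
    for &(kind, start, end, text) in TOKENS {
        if kind == "IDENT" || kind == "STRUCT_KW" {
            let (a, b) = (point(&src, start), point(&src, end));
            println!("{}@{}..{} {:?} -> {:?} - {:?}", kind, start, end, text, a, b);
        }
    }
}
```

For instance, `S` at `IDENT@16..17` maps to `(2, 7) - (2, 8)`, the span tree-sitter reports as `ERROR [2, 7] - [2, 8]`, and `u32` at `26..29` maps to `(3, 6) - (3, 9)`, the span of tree-sitter's `pattern: identifier`.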

osa1 (Jul 27 2020 at 10:46, on Zulip):

Hey @matklad, thanks for the answers; they're really helpful. I want to look at the rust-analyzer source for the error recovery code used in the example above. Where should I look?

matklad (Jul 27 2020 at 11:00, on Zulip):

Laurențiu (Jul 27 2020 at 11:04, on Zulip):

Unrelated, but there's this pattern here:

Laurențiu (Jul 27 2020 at 11:04, on Zulip):

eat already checks the current token and returns false if it doesn't match. EDIT: ah, it's bump, not eat, and I guess the pattern isn't that common, so it's probably fine.
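
For context on the two helpers: in hand-written parsers of this style, `eat(kind)` is typically a conditional consume that reports whether the token matched, while `bump(kind)` consumes a token the caller has already checked. The sketch below uses a hypothetical `Parser` over plain string tokens (rust-analyzer's real parser works on `SyntaxKind` values and does more); in this sketch `bump` asserts the expected token. It illustrates the two helpers only, not the snippet being discussed.

```rust
/// Hypothetical stand-in for a hand-written parser's cursor; tokens are plain strings here.
struct Parser {
    tokens: Vec<&'static str>,
    pos: usize,
}

impl Parser {
    /// Peeks: is the current token `kind`? Never consumes.
    fn at(&self, kind: &str) -> bool {
        self.tokens.get(self.pos).map_or(false, |&t| t == kind)
    }
    /// Conditional consume: advances and returns true only if the current token is `kind`.
    fn eat(&mut self, kind: &str) -> bool {
        if !self.at(kind) {
            return false;
        }
        self.pos += 1;
        true
    }
    /// Unconditional consume: panics if the caller's assumption about the token is wrong.
    fn bump(&mut self, kind: &str) {
        assert!(self.eat(kind), "bump: expected {}", kind);
    }
}

fn main() {
    let tokens = vec!["pub", "struct", "S"];
    // Explicit check followed by `bump`.
    let mut a = Parser { tokens: tokens.clone(), pos: 0 };
    if a.at("pub") {
        a.bump("pub");
    }
    // The same effect with a single `eat`.
    let mut b = Parser { tokens, pos: 0 };
    b.eat("pub");
    println!("{} {}", a.pos, b.pos); // both cursors now point at `struct`
}
```

Both cursors end up at `struct`, which is the redundancy being pointed out: when the branch does nothing but consume the token it just checked, `eat` says the same thing in one call.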

Last update: Jul 26 2021 at 12:45 UTC