Stream: t-compiler

Topic: parser/lexer structure


Russell Cohen (May 07 2020 at 14:55, on Zulip):

I was working on a lexer PR and realized that the parser uses the lexer API in a fairly strange way -- the lexer maintains a cursor structure that's intended to read through the entire input, but no usage ever reads more than one token out of it. (The callers reinitialize it every time.)

I assume it's a relic of a different refactoring. I suspect it could even be a mild performance issue since we _reallocate_ the Cursor for every single token.

Is this on purpose? Should this be cleaned up?

eddyb (May 07 2020 at 15:01, on Zulip):

cc @matklad

matklad (May 07 2020 at 15:03, on Zulip):

Are you talking about this Cursor: https://github.com/rust-lang/rust/blob/4802f097c86452cd2e09d44e88dbcb8e08266552/src/librustc_lexer/src/cursor.rs#L7-L12 ?

Russell Cohen (May 07 2020 at 15:04, on Zulip):

Yeah -- that cursor, and this usage: https://github.com/rust-lang/rust/blob/master/src/librustc_parse/lexer/mod.rs#L121

matklad (May 07 2020 at 15:04, on Zulip):

It is intended to work only for a single token. The benefit of the current interface is that it's easily restartable from any point

matklad (May 07 2020 at 15:05, on Zulip):

(as opposed to an interface that just gives you a stateful iterator of all tokens in the input)

Russell Cohen (May 07 2020 at 15:05, on Zulip):

:+1: makes sense

matklad (May 07 2020 at 15:05, on Zulip):

Mostly, Cursor is just an impl detail of rustc_lexer, the real interface is just "give me the first token for this input".
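To make the "first token for this input" shape concrete, here is a minimal toy sketch. It is not the actual rustc_lexer code: `TokenKind`, `Token`, the private `Cursor`, and the `tokenize_from` helper are all simplified, hypothetical stand-ins written for illustration. The point it tries to show is that the cursor lives for exactly one token, and the caller restarts at any offset simply by slicing the input.

```rust
use std::str::Chars;

/// Toy token kinds (hypothetical; a real lexer has many more).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Ident,
    Number,
    Whitespace,
    OpenParen,
    CloseParen,
    Unknown,
}

/// A token is just a kind plus its length in bytes; it stores no position.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub len: usize,
}

/// Implementation detail: a thin wrapper around a char iterator.
/// It is created fresh for every token and never escapes `first_token`.
struct Cursor<'a> {
    initial_len: usize,
    chars: Chars<'a>,
}

impl<'a> Cursor<'a> {
    fn new(input: &'a str) -> Self {
        Cursor { initial_len: input.len(), chars: input.chars() }
    }

    /// Look at the next char without consuming it.
    fn peek(&self) -> Option<char> {
        self.chars.clone().next()
    }

    fn bump(&mut self) -> Option<char> {
        self.chars.next()
    }

    fn eat_while(&mut self, pred: impl Fn(char) -> bool) {
        while let Some(c) = self.peek() {
            if !pred(c) {
                break;
            }
            self.bump();
        }
    }

    /// Number of bytes consumed since the cursor was created.
    fn consumed(&self) -> usize {
        self.initial_len - self.chars.as_str().len()
    }
}

/// The whole public interface: lex one token from the start of `input`.
/// Returns `None` on empty input.
pub fn first_token(input: &str) -> Option<Token> {
    let mut cursor = Cursor::new(input);
    let first = cursor.bump()?;
    let kind = match first {
        c if c.is_whitespace() => {
            cursor.eat_while(char::is_whitespace);
            TokenKind::Whitespace
        }
        c if c.is_alphabetic() || c == '_' => {
            cursor.eat_while(|c| c.is_alphanumeric() || c == '_');
            TokenKind::Ident
        }
        c if c.is_ascii_digit() => {
            cursor.eat_while(|c| c.is_ascii_digit());
            TokenKind::Number
        }
        '(' => TokenKind::OpenParen,
        ')' => TokenKind::CloseParen,
        _ => TokenKind::Unknown,
    };
    Some(Token { kind, len: cursor.consumed() })
}

/// Caller-side loop (hypothetical helper): the caller owns the position and
/// may start at any byte offset that falls on a char boundary.
pub fn tokenize_from(input: &str, start: usize) -> Vec<(usize, Token)> {
    let mut pos = start;
    let mut out = Vec::new();
    while let Some(tok) = first_token(&input[pos..]) {
        out.push((pos, tok));
        pos += tok.len;
    }
    out
}
```

In this sketch, `tokenize_from(src, 0)` lexes everything, while `tokenize_from(src, 4)` picks up at byte 4 with no other setup. The per-token `Cursor` is a small stack value (a length plus a `Chars` iterator), so constructing it does not involve a heap allocation here.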

matklad (May 07 2020 at 15:06, on Zulip):

This interface is meaningfully more restricted than "give me an iterator of tokens", because you can't, for example, count parentheses in the lexer, and that is a good thing if, for example, you want to incrementally re-lex a substring
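A rough illustration of why the restriction helps incremental re-lexing, again as a toy sketch rather than anything rustc or rust-analyzer actually does. `Kind`, `Token`, `lex_from`, and `relex` are hypothetical names. The sketch assumes the lexer needs at most one character of lookahead past a token's end, so any old token that ends strictly before the edit can be reused as-is. Because `first_token` carries no state such as a parenthesis depth, lexing can resume at that boundary without replaying anything that came earlier.

```rust
/// Toy token kinds (hypothetical): runs of word chars, runs of whitespace,
/// and single punctuation characters.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Word,
    Space,
    Punct,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub kind: Kind,
    pub len: usize, // length in bytes
}

fn classify(c: char) -> Kind {
    if c.is_whitespace() {
        Kind::Space
    } else if c.is_alphanumeric() || c == '_' {
        Kind::Word
    } else {
        Kind::Punct
    }
}

/// Stateless: the result depends only on the text of `input`.
pub fn first_token(input: &str) -> Option<Token> {
    let mut chars = input.chars();
    let first = chars.next()?;
    let kind = classify(first);
    let mut len = first.len_utf8();
    if kind != Kind::Punct {
        // Words and whitespace extend while the class stays the same.
        for c in chars {
            if classify(c) != kind {
                break;
            }
            len += c.len_utf8();
        }
    }
    Some(Token { kind, len })
}

/// Lex from byte offset `start` (must be a char boundary) to the end.
pub fn lex_from(src: &str, start: usize) -> Vec<(usize, Token)> {
    let mut pos = start;
    let mut out = Vec::new();
    while let Some(tok) = first_token(&src[pos..]) {
        out.push((pos, tok));
        pos += tok.len;
    }
    out
}

/// Re-lex after an edit that begins at byte `edit_start` in `new_src`.
/// Old tokens ending strictly before the edit are kept; lexing restarts at
/// the end of the last kept token. No paren depth or other lexer state has
/// to be reconstructed for the restart point.
pub fn relex(
    old: &[(usize, Token)],
    new_src: &str,
    edit_start: usize,
) -> Vec<(usize, Token)> {
    let mut out: Vec<(usize, Token)> = old
        .iter()
        .copied()
        .take_while(|(pos, tok)| pos + tok.len < edit_start)
        .collect();
    let restart = out.last().map_or(0, |(pos, tok)| pos + tok.len);
    out.extend(lex_from(new_src, restart));
    out
}
```

For example, going from `let x = (a);` to `let xy = (a);` with the edit at byte 5, `relex` reuses the first two tokens and produces the same list as lexing the new text from scratch. A more complete version would also stop early once the new tokens line up with old ones after the edit; that part is omitted here for brevity.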

Last update: May 29 2020 at 18:05UTC