Stream: t-compiler/major changes

Topic: Major change proposal: Support collecting compiler-team#276


Charles Lew (Apr 19 2020 at 02:13, on Zulip):

Frankly i don't really know much about compiler internals yet, let alone figuring out how to fit this piece architecturally into the existing rustc architecture. So i opened this major change proposal mostly seeking for mentoring on this.

I implemented the a few earlier bullet points of the RFC: performing normalization at the parser site and proc-macro-server site; linting uncommon codepoints for the name of each item within the lint architecture; this piece seems harder.

Charles Lew (Apr 19 2020 at 03:50, on Zulip):

nikomatsakis: Well, not necessarily, you can certainly have a query that works the entire AST or something
nikomatsakis: which we would usually write as one that walks each function and invokes a sub-query, to actually get some caching

simulacrum: plausible that this could be embedded into resolution though. We have a pretty global view of all idents there I think

Thanks for the pointers! With your hints I digged into the code a little this morning (in my timezone) and began understanding how we might want to make this work.

Charles Lew (Apr 19 2020 at 03:57, on Zulip):

Actually i found that there is a LintBuffer that persisted during the whole early linting stage. This was introduced in
https://github.com/rust-lang/rust/commit/0374e6aab7ef60b523872556ae4aca33c59fbfc9#diff-771d6215d1fb5e7f5503514f9895f7bdR371 and stored as part of the EarlyContext of linting.

Charles Lew (Apr 19 2020 at 04:10, on Zulip):

So i can either finish this linting within librustc_resolve and add any possible outputs to this lint_buffer, or i can keep a big table of symbols in it and leave the checking logic to librustc_lint.

Which is better?

Charles Lew (Apr 19 2020 at 04:15, on Zulip):

I will try to read the code of librustc_resolve next, it seems librustc_resolve is already emitting a few lints like UNUSED_IMPORTS, UNUSED_LABELS, UNUSED_MACROS, etc.

simulacrum (Apr 19 2020 at 12:12, on Zulip):

@Charles Lew we should be free to use the early linting stuff

simulacrum (Apr 19 2020 at 12:13, on Zulip):

one question I have though is whether we expect the list to be cross-crate

simulacrum (Apr 19 2020 at 12:13, on Zulip):

in that case it might be a bit more difficult

Charles Lew (Apr 19 2020 at 14:08, on Zulip):

The RFC did not elaborate on whether the list should be cross-crate. Personally i think the contents of the crate itself should be enough, but it would be nice if we can include the wildcard use imported identifiers and macro expanded identifiers, as a bonus point, i think...

simulacrum (Apr 19 2020 at 15:03, on Zulip):

Yeah, if it's "things in scope" I imagine that would be simpler than "all identifiers anywhere". OTOH, if you don't get all idents, I guess some of the usefulness of the lint is lost

simulacrum (Apr 19 2020 at 15:03, on Zulip):

(since afaict it only warns if you actually have two things -- so if you don't import the right thing because of the unicode confusables then you'd not get the lint)

nikomatsakis (Apr 20 2020 at 20:00, on Zulip):

I'd cc @Vadim Petrochenkov on this :)

Vadim Petrochenkov (Apr 21 2020 at 16:58, on Zulip):

@Charles Lew

Vadim Petrochenkov (Apr 21 2020 at 17:01, on Zulip):

Search for nfc_normalize in rustc code.
The locations with nfc_normalize are exactly the locations where you need to insert the identifier skeletons into some HashMap<Symbol, SomeDataForDiagnostics> kept in ParseSession.

Vadim Petrochenkov (Apr 21 2020 at 17:05, on Zulip):

One more location where we can create arbitrary new identifier tokens is src\librustc_builtin_macros\concat_idents.rs, but it doesn't use nfc_normalize because if you concatenate two normalized tokens you get a normalized token as well (if that's not true, then we have bug). So we need to populate the confusable identifier table in that place as well (if we want to warn on macro-produced identifiers at all).

Vadim Petrochenkov (Apr 21 2020 at 17:05, on Zulip):

The table can then be processed in an early lint pass that goes through it and reports everything suspicious.

Vadim Petrochenkov (Apr 21 2020 at 17:08, on Zulip):

Oh, and I don't think this is a major change that require any special process.

Charles Lew (Apr 22 2020 at 01:25, on Zulip):

@Vadim Petrochenkov Thank you for your instructions!

Charles Lew (Apr 25 2020 at 02:07, on Zulip):

Have put up an initial implementation at https://github.com/rust-lang/rust/pull/71542 modulo the concat_idents part.

Last update: May 07 2021 at 07:45UTC