Frankly i don't really know much about compiler internals yet, let alone figuring out how to fit this piece architecturally into the existing rustc architecture. So i opened this major change proposal mostly seeking for mentoring on this.
I implemented the a few earlier bullet points of the RFC: performing normalization at the parser site and proc-macro-server site; linting uncommon codepoints for the name of each item within the lint architecture; this piece seems harder.
nikomatsakis: Well, not necessarily, you can certainly have a query that works the entire AST or something
nikomatsakis: which we would usually write as one that walks each function and invokes a sub-query, to actually get some caching
simulacrum: plausible that this could be embedded into resolution though. We have a pretty global view of all idents there I think
Thanks for the pointers! With your hints I digged into the code a little this morning (in my timezone) and began understanding how we might want to make this work.
Actually i found that there is a
LintBuffer that persisted during the whole early linting stage. This was introduced in
https://github.com/rust-lang/rust/commit/0374e6aab7ef60b523872556ae4aca33c59fbfc9#diff-771d6215d1fb5e7f5503514f9895f7bdR371 and stored as part of the
EarlyContext of linting.
So i can either finish this linting within
librustc_resolve and add any possible outputs to this lint_buffer, or i can keep a big table of symbols in it and leave the checking logic to
Which is better?
I will try to read the code of
librustc_resolve next, it seems
librustc_resolve is already emitting a few lints like
@Charles Lew we should be free to use the early linting stuff
one question I have though is whether we expect the list to be cross-crate
in that case it might be a bit more difficult
The RFC did not elaborate on whether the list should be cross-crate. Personally i think the contents of the crate itself should be enough, but it would be nice if we can include the wildcard
use imported identifiers and macro expanded identifiers, as a bonus point, i think...
Yeah, if it's "things in scope" I imagine that would be simpler than "all identifiers anywhere". OTOH, if you don't get all idents, I guess some of the usefulness of the lint is lost
(since afaict it only warns if you actually have two things -- so if you don't import the right thing because of the unicode confusables then you'd not get the lint)
I'd cc @Vadim Petrochenkov on this :)
nfc_normalize in rustc code.
The locations with
nfc_normalize are exactly the locations where you need to insert the identifier skeletons into some
HashMap<Symbol, SomeDataForDiagnostics> kept in
One more location where we can create arbitrary new identifier tokens is
src\librustc_builtin_macros\concat_idents.rs, but it doesn't use
nfc_normalize because if you concatenate two normalized tokens you get a normalized token as well (if that's not true, then we have bug). So we need to populate the confusable identifier table in that place as well (if we want to warn on macro-produced identifiers at all).
The table can then be processed in an early lint pass that goes through it and reports everything suspicious.
Oh, and I don't think this is a major change that require any special process.
@Vadim Petrochenkov Thank you for your instructions!
Have put up an initial implementation at https://github.com/rust-lang/rust/pull/71542 modulo the