Stream: t-compiler

Topic: pretty-printing comments


matklad (May 06 2019 at 18:18, on Zulip):

@Vadim Petrochenkov I am continuing my work on extracting the lexer, and the next major road-block seems to be this little gem:

https://github.com/rust-lang/rust/blob/c3b8ab5199af4a3c11d14b0cbdb17a641e8eee71/src/libsyntax/parse/lexer/comments.rs#L350

My understanding is that this function exists to make pretty-printing of existing rust code more sane, by trying hard to maintain comments and exact form of literals. It is pretty hacky and I wonder if we can somehow refactor it away...

Where exactly do we use pretty printing? My understanding is that macro expansion goes via pprinting, and it probably doesn't need comments. Perhaps it doesn't need exact forms of literals as well? Is there anything else which needs comments?

matklad (May 06 2019 at 18:19, on Zulip):

As to why this is a roadblock: this function heavily relies on lexer's internal state, which I would like to get rid of. I can hack around it, of course, but I wonder if there's a clender solution, given that this functions smells of technical debt pretty heavily...

Vadim Petrochenkov (May 06 2019 at 21:58, on Zulip):

Ah, I remember I've seen that thing once when fixing some issues in pretty-printing.

Vadim Petrochenkov (May 06 2019 at 22:02, on Zulip):

Remaining uses of pretty-printing are
1) print/re-tokenize fallback for proc macros when the tokens are not available directly
2) printing snippets for error reporting, but AFAIK it's usually (and better) done by fetching the snippet from the codemap by span now.
3) literally pretty-printing (--pretty=expanded, etc), and this one looks really bad without comments

Vadim Petrochenkov (May 06 2019 at 22:04, on Zulip):

Are Literals collected by gather_comments_and_literals really used?

Vadim Petrochenkov (May 06 2019 at 22:06, on Zulip):

pprust certainly has logic for pretty-printing literals from AST which is good enough.
So the literal part would be the first candidate for removal.

Vadim Petrochenkov (May 06 2019 at 22:15, on Zulip):

Also, if we want to pass precise tokens to proc macros (https://github.com/rust-lang/rust/issues/43081) we'll need to keep both the original and escaped versions of literals somehow, and the simplest way is to keep both versions in AST `Lit.

Vadim Petrochenkov (May 06 2019 at 22:16, on Zulip):

This would mostly fix https://github.com/rust-lang/rust/issues/60495 as well (no need to re-escape, since we have the originals).

Vadim Petrochenkov (May 06 2019 at 22:30, on Zulip):

Yeah, keep the original literals in AST and kill the comments::Literal code and escaping code.

Vadim Petrochenkov (May 06 2019 at 22:32, on Zulip):

Perhaps I'll do it myself tomorrow, should be easy and fun.

Vadim Petrochenkov (May 06 2019 at 22:32, on Zulip):

Not sure about comments though.

varkor (May 06 2019 at 22:49, on Zulip):

speaking of pretty-printing comments, if you're refactoring, there's this ancient bug that's related: https://github.com/rust-lang/rust/issues/6678

matklad (May 07 2019 at 12:41, on Zulip):

Good, cc @matklad if you get to fixing the literals. I think handing only comments after that would be much simpler. And also don't see an easy and better way to handle comments for --pretty-expanded. Switching wholesale to libsyntax2 is def not easy :-)

Might be a good idea to check how rustfmt handles comments though

Vadim Petrochenkov (May 07 2019 at 23:26, on Zulip):

Some status update:
I tried to replace "semantic" AST literals (ast::Lit) with their token versions (token::Lit), rather than keep them both together in AST and the result is pretty good so far.
It should be possible to use literals as tokens at least until lowering to HIR with few minor issues:
- A fake AST-only Lit::Bool is needed
- The result of include_bytes!() is not precisely representable as a token (can be non UTF-8), has to be escaped, then unescaped again later during lowering to its semantic form.
- Literals in built-in attributes apparently should to be compared semantically (#[inline = "\x61lways"] == #[inline = "always"]) (at least for compatibility), and attributes are used through the whole compilation, but are represented as AST. Annoying.

Vadim Petrochenkov (May 07 2019 at 23:30, on Zulip):

So it may be reasonable to add the token representations to AST first, then optimize by gradually refactoring away all the uses of non-token representation (need to check perf though), I'm going to do that tomorrow or on the holidays.

matklad (May 08 2019 at 11:05, on Zulip):

It should be possible to use literals as tokens at least until lowering to HIR

Love this!

Vadim Petrochenkov (May 17 2019 at 00:09, on Zulip):

@matklad
So, I tried to address https://github.com/rust-lang/rust/pull/60679#discussion_r283094395 (the lack of parse session in attr/mod.rs) and it mostly works (https://github.com/petrochenkov/rust/commits/apa).

Vadim Petrochenkov (May 17 2019 at 00:09, on Zulip):

But at what cost!

Vadim Petrochenkov (May 17 2019 at 00:13, on Zulip):

You have to pass the session around a lot.
There's a partial remedy though - parse attributes once during lowering to HIR and use only their lowered semantic forms after that.

Vadim Petrochenkov (May 17 2019 at 00:15, on Zulip):

(Still need to figure out what to do with attributes parsed during rustdoc rendering though, it doesn't have a parse session, perhaps I need to create one.)

matklad (May 17 2019 at 04:49, on Zulip):

Cool! I wonder if we really need to pass in the whole ParseSession?
Shouldn’t just diagnosis handler be enough? The parser API currently
requires the session, but if previous API worked only with handler, perhaps
the Parser API for attrs could be tweaked?

Vadim Petrochenkov (May 17 2019 at 09:11, on Zulip):

I wanted to eliminate (Attribute/MetaItem/NestedMetaItem)::from_tokens which duplicate equivalent parser methods.

Vadim Petrochenkov (May 17 2019 at 09:12, on Zulip):

Duplicate equivalent parser methods, but without access to anything - no parser, no parse session, no diagnostic handler.

Vadim Petrochenkov (May 17 2019 at 09:15, on Zulip):

To use the equivalent parser methods we need a Parser which we create using the passed ParseSession.

Vadim Petrochenkov (May 17 2019 at 09:16, on Zulip):

We can create a fresh ParseSession for each attribute as well instead of passing the existing one, but that would be more expensive (need to measure).

Vadim Petrochenkov (May 17 2019 at 09:22, on Zulip):

To resolve the original issue (no diagnostic handler to pass to Lit::from_token), we can indeed pass the handler only (no more convenient than passing parse session) or create a fresh handler, but that wouldn't allow to get rid of the duplicate parsing code in ...::from_tokens.

matklad (May 17 2019 at 12:52, on Zulip):

Yeah, but I wonder does the code inside Parser that handle attributes really needsParseSession?

That is, can we:

I don't really understand all of the dependencies here, so this might not work for some good reasons

matklad (May 17 2019 at 12:53, on Zulip):

In other words, can we make bothParser and attrs/mod.rs to just call into separate meta-item parsing routins, which require only a subset of parser's functionality?

Vadim Petrochenkov (May 17 2019 at 15:07, on Zulip):

Attributes include paths, and path grammar, in its turn, includes everything in generic arguments.

Vadim Petrochenkov (May 17 2019 at 15:12, on Zulip):

And new parse_attr_path ignoring generics in addition to the normal parse_path would look pretty similar to Attribute::from_tokens.

Vadim Petrochenkov (May 17 2019 at 15:13, on Zulip):

As I understand it, making the handler mandatory in attrs/mod is not easier than making parse session available, so we can go for latter basically.

Vadim Petrochenkov (May 17 2019 at 15:16, on Zulip):

(I didn't yet figure out what happens if you create a fresh diagnostics handler and use it instead of the primary one in session.parse_sess.diagnostic_handler.)

matklad (May 18 2019 at 09:30, on Zulip):

Attributes include paths, and path grammar, in its turn, includes everything in generic arguments.

aha, that makes sense!

Last update: Nov 22 2019 at 04:40UTC