Stream: t-compiler/wg-rls-2.0

Topic: rust-analyzer#923: producing nice code for refactorings


matklad (Mar 03 2019 at 20:29, on Zulip):

This is the stream to discuss rust-analyzer#923. This issue would be the beginning for the code formatting subsystem, so if you are into designing code formatters, feel free to suggest solutions!

pnkfelix (Mar 04 2019 at 13:03, on Zulip):

Hmm. Would it be bad to try to generalize rustfmt to be a library that could handle this problem?

pnkfelix (Mar 04 2019 at 13:05, on Zulip):

(It seems like the most interesting part, from an API design POV, is determining how to describe the context of the snippet of code you want to insert. Obviously you could just feed in the entire surrounding source code, but I would hope one could avoid it.)

matklad (Mar 04 2019 at 13:12, on Zulip):

Yeah, this is about generalizing rustfmt essentially. That is, rustfmt2 should have an API for "reformat this syntax node"

matklad (Mar 04 2019 at 13:13, on Zulip):

I don't think that feeding surrounding code is a problem per-se. Rather, it's modifications of surrounding code that we need to avoid

matklad (Mar 04 2019 at 13:14, on Zulip):

But it shouldn't be too hard to condense surrounding code to a description like "the indent level is N, the immediate predecessor has this type, the immediate successor has this type"

matklad (Mar 04 2019 at 13:14, on Zulip):

(you need predecessor/successor information to add correct amount of whitespace around the newly inserted node)

matklad (Mar 04 2019 at 13:15, on Zulip):

And yeah, I think the red-green style syntax trees would be a good fit for rustfmt, so I game for rewriting rustfmt long/medium term as well =)

vipentti (Mar 04 2019 at 13:15, on Zulip):

How much do we want to specialize formatting for rust-analyzer ? Do we want to fix the simple common cases such as auto indenting but leave more advanced formatting for rustfmt 2.0 API ?

matklad (Mar 04 2019 at 13:16, on Zulip):

Specifically, currently rustfmt fails if there are syntax errors in the file, but, with rust-analyzer style trees, this should be easy to fix.

matklad (Mar 04 2019 at 13:16, on Zulip):

@vipentti I think we should start with just auto-indenting

matklad (Mar 04 2019 at 13:16, on Zulip):

When we finalize the syntax-tree data structure, I think we should rewrite rustmft

vipentti (Mar 04 2019 at 13:18, on Zulip):

Hmm, could the formatting benefit from implementing some structured editing API in rust-analyzer#779 ?

matklad (Mar 04 2019 at 13:19, on Zulip):

@vipentti I doubt it actually: formatting is mostly about modifying trivia, so structured editing does not help much

matklad (Mar 04 2019 at 13:20, on Zulip):

There are cases like "adding trailing comma", where structured editing would be helpful, but there aren't too many such cases.

matklad (Mar 04 2019 at 13:20, on Zulip):

indentation and line breaking are much more difficult parts of code formatting.

vipentti (Mar 04 2019 at 13:23, on Zulip):

Would we want to "fix" wrong indentation as well ? e.g.

fn foo() {
struct Bar;

                         impl Bar {}
}
vipentti (Mar 04 2019 at 13:24, on Zulip):

(when using add-impl for example)

matklad (Mar 04 2019 at 13:24, on Zulip):

So, if struct Bar is pre-existing code, and impl Bar is the newly added code, we should not touch struct Bar, but we should properly indent impl Bar

vipentti (Mar 04 2019 at 13:28, on Zulip):

I imagine this requires us to traverse up the tree to determine the actual indentation for the impl Bar unless we want to just indent it "properly" regards the fn foo() in this case. Which may or may not be properly indented itself

vipentti (Mar 04 2019 at 13:30, on Zulip):
mod bar {
        fn foo() {
        struct Bar; // using add impl
            impl Bar {} // Gets indented here
        }
}
matklad (Mar 04 2019 at 13:30, on Zulip):

I think indentation should be relative to the parent item.

matklad (Mar 04 2019 at 13:31, on Zulip):

That is, we traverse not to the root, but to the "owner" node (ast::Block in this case), and figure out the indentation relative to that

Cole Lawrence (Apr 02 2019 at 16:29, on Zulip):

I'm interested in taking a look into this. I'm going to see where I get with replicating the tests linked in the issue and then report back.

matklad (Apr 02 2019 at 16:31, on Zulip):

Cool! just yesterday I was looking at various assits, whihc have formatting code, like this interspersed with the logic. It would be awesome to get rid of this!

Cole Lawrence (Apr 02 2019 at 16:33, on Zulip):

Thanks for the heads up, I'm going to start by looking for instances of that I think about how that could be reused. Then, I'll come up with something and seek feedback on my approach.

matklad (Apr 02 2019 at 16:37, on Zulip):

That is, something like "introduce a variable" should produce a poorly-formatted syntax tree with variable inserted, ask the formatting subsystem to fix this single node, and use the fixed result as the final result.

Cole Lawrence (Apr 02 2019 at 16:49, on Zulip):

So, in this case,

mod g {
    struct A<|> {

    }
}

Add derive

mod g {
    #[derive(<|>)]
struct A {

    }
}

I should think about it more like:
1. Insert the necessary symbols (produces what is seen above) and then
2. Look at formatting just the #[derive(<|>)] to } lines?

Cole Lawrence (Apr 02 2019 at 16:50, on Zulip):

And perhaps look into a way to input the parent node as a parameter to gain indentation information from?

matklad (Apr 02 2019 at 16:50, on Zulip):

yeah, perhaps at just reformatting the derived node

matklad (Apr 02 2019 at 16:51, on Zulip):

That is, the assist should produce

mod g {
       #[derive()] struct A {
       }
}

and formatter should add a newline

matklad (Apr 02 2019 at 16:52, on Zulip):

Nodes know their parents, so it is always possible to take a look at the context

Cole Lawrence (Apr 02 2019 at 16:52, on Zulip):

In the end, we could simply remove the whitespace counting code (like the one you referenced just a moment ago), and format.

Cole Lawrence (Apr 02 2019 at 16:53, on Zulip):

Do you think I'd be able to simply leverage the existing fmt code in RA? Or would some modifications need to be made?

matklad (Apr 02 2019 at 16:54, on Zulip):

I am afraid pretty substantial modifications would be needed, if we want ot reuse rustfmt

matklad (Apr 02 2019 at 16:55, on Zulip):

feature wise,

matklad (Apr 02 2019 at 16:55, on Zulip):

These two points imply that we should use our syntax tree for formatting

matklad (Apr 02 2019 at 16:56, on Zulip):

and changing rustfmt's syntax tree would be a huge task

matklad (Apr 02 2019 at 16:56, on Zulip):

So, I think it makes sense to start experimenting with from-scratch formatter which does only subset of things

matklad (Apr 02 2019 at 16:56, on Zulip):

to nail down the API, etc.

matklad (Apr 02 2019 at 16:57, on Zulip):

after we are comfortable with that, I'd game for just rewriting rustfmt as a library on top of ra_syntax

Cole Lawrence (Apr 02 2019 at 17:00, on Zulip):

Okay, that sounds like a good long-term plan. For now, perhaps I'll start looking at creating a simple function which takes in the node to be formatted. For now, this function will correct whitespace issues, but it will be designed to eventually support large parts of rustfmt's capabilities

matklad (Apr 02 2019 at 17:01, on Zulip):

sgtm!

Cole Lawrence (Apr 02 2019 at 17:01, on Zulip):

In RA, would it be more common to mutate the node, return back a clone with changes, or return a diff to be applied?

Cole Lawrence (Apr 02 2019 at 17:02, on Zulip):

I haven't started looking deeply into it, yet. But if we can retrieve the parent node from a child node, it seems like it could be a simple fn format(&config, &node) -> Node
Or maybe for now, fn format(&node) -> Node

Cole Lawrence (Apr 02 2019 at 17:04, on Zulip):

I'll leave the configuration passing choices up for a later discussion...

matklad (Apr 02 2019 at 17:04, on Zulip):

That is a good question, which I feel would be the core of the design.

Syntax trees in Rust-Analyzer are an immutable data structure, so you can't mutate them in place. You can, however, cheaply create a copy of a file with some subtree substituted.

Cole Lawrence (Apr 02 2019 at 17:05, on Zulip):

Can I create a copy of a node with a child node replaced cheaply?

matklad (Apr 02 2019 at 17:06, on Zulip):

I feel formatting should be split into two bits:

For adjusting ws, I think a nice API would be to walk a tree and compute, for each whitespace token, and edit whihc should be applied to each token

matklad (Apr 02 2019 at 17:06, on Zulip):

yeah, you can replace children cheaply

matklad (Apr 02 2019 at 17:06, on Zulip):

Though, there's really no good high-level API for it

matklad (Apr 02 2019 at 17:07, on Zulip):

you could take a look at the reparsing module, which does tree modification for incremental reparsing

matklad (Apr 02 2019 at 17:09, on Zulip):

I also feel that for formatting specifically, a mutable tree might be a good fit.

Formatting than could be cast as applyng a series of transformations until a fixed point. That is, you can, for example, traverse the tree, apply the required indent as you go, and looking at the parent indent, without maintaining an explicit diff

matklad (Apr 02 2019 at 17:09, on Zulip):

It's interesting that IntellIJ, for formatting, builds a separate so-called formatting block tree, and runs on it

Cole Lawrence (Apr 02 2019 at 17:10, on Zulip):

Is it trivial to track cursor location in the existing node structures?

matklad (Apr 02 2019 at 17:10, on Zulip):

Is it trivial to track cursor location in the existing node structures?

now

matklad (Apr 02 2019 at 17:10, on Zulip):

In general, tracking anything after edit is hard

matklad (Apr 02 2019 at 17:11, on Zulip):

The problem is, when you replace a node, you get back a file, and you need to find the "clone" of the node in it yourself

Cole Lawrence (Apr 02 2019 at 17:12, on Zulip):

I could definitely see this mutable tree structure (similar to IntelliJ's) also being able to keep track of the cursor.
Especially since assists provide cursor locations after application, and if you reformat your file, it'd be nice to not lose your cursor location.

matklad (Apr 02 2019 at 17:12, on Zulip):

I think it would be a good idea to take a closer look at IntelliJ's formatted: https://www.jetbrains.org/intellij/sdk/docs/tutorials/custom_language_support/formatter.html. Not that it has good docs :)

Cole Lawrence (Apr 02 2019 at 17:12, on Zulip):

I don't quite sure I have a full picture of what is available, but that's my first thought.

Cole Lawrence (Apr 02 2019 at 17:13, on Zulip):

I'll look into that formatter reference. I'm sure I'll get a few ideas

matklad (Apr 02 2019 at 17:13, on Zulip):

To be fair, I also don't quite understand how this formatting API should look like. I guess we'll have to churn through a couple of different approaches before we find a good one!

Cole Lawrence (Apr 02 2019 at 17:14, on Zulip):

I'm completely on board with trial-and-error

matklad (Apr 02 2019 at 17:18, on Zulip):

The more I think about it, the more I like the idea of creating a mutable formatting tree on top of immutable syntax tree, and working with that...

That should give us a pretty flexible API.

matklad (Apr 02 2019 at 17:21, on Zulip):

So formatting works like this:

matklad (Apr 02 2019 at 17:22, on Zulip):

A good thing would be if trailing whitespaces are unrepresentable in the format tree :D

Cole Lawrence (Apr 02 2019 at 17:23, on Zulip):

"computes the edit for the diff" would the diff be a vec of AtomicTextEdits?

matklad (Apr 02 2019 at 17:23, on Zulip):

yeah

matklad (Apr 02 2019 at 17:24, on Zulip):

fundamtelly, the result of any operation should be a diff, b/c that's what we send to the client

matklad (Apr 02 2019 at 17:24, on Zulip):

but I think it's a good idea to work with a stateful trees internally

Cole Lawrence (Apr 02 2019 at 17:26, on Zulip):

Why not track ws in the shadow formatting tree? Then, we could simply update the ws nodes as we walk through the tree...

Cole Lawrence (Apr 02 2019 at 17:27, on Zulip):

I'm thinking that the whitespace could potentially influence the resulting format.

Cole Lawrence (Apr 02 2019 at 17:28, on Zulip):

A good example of this is dartfmt which gives the author a lot of control over the formatting by preserving some whitespacing and correcting others based on that.

Cole Lawrence (Apr 02 2019 at 17:28, on Zulip):

https://github.com/dart-lang/dart_style#readme

Cole Lawrence (Apr 02 2019 at 17:31, on Zulip):

Anyway, just a trail of thoughts. We won't know much until I've banged around in the code for a little bit ;-)

matklad (Apr 02 2019 at 17:31, on Zulip):

We should track ws in shadow tree, but we should track it in a more structured way

matklad (Apr 02 2019 at 17:32, on Zulip):

or maybe not, I don't know.

What I know is that the repr in syntax tree, where ws is just a string, is annoying to work with, because computing indent is a pain.

Cole Lawrence (Apr 02 2019 at 17:37, on Zulip):

That makes sense to me

matklad (Apr 02 2019 at 17:39, on Zulip):

@Cole Lawrence I guess a concrete first step here is to think hard about the problem, identify a small slice we can tackle and open a GitHub issue with the design/PR. Oh, and INtelliJ Rust formatter is here, if I haven't linked it before: https://github.com/intellij-rust/intellij-rust/tree/master/src/main/kotlin/org/rust/ide/formatter

Cole Lawrence (Apr 02 2019 at 18:43, on Zulip):

Does it make sense to consider macros at all at this point?

Cole Lawrence (Apr 02 2019 at 18:50, on Zulip):

Are there goals for others to extend rls? For example, to provide formatting for their macro libraries, or provide assistance for generating Serde serializers based on a JSON Schema?

Cole Lawrence (Apr 02 2019 at 20:42, on Zulip):

What does compute_ws do exactly? I'm having trouble understanding what purpose this has.

matklad (Apr 02 2019 at 21:02, on Zulip):

Now, macros are definitely out of scope

matklad (Apr 02 2019 at 21:03, on Zulip):

Compute_ws returns the space between two tokens. Like “no space before closing }

matklad (Apr 02 2019 at 21:04, on Zulip):

As for the extension, that should be possible, but in a very far future:)

Cole Lawrence (Apr 02 2019 at 21:27, on Zulip):

I'm having a hard time understanding the relationship between HIR, AST, SyntaxTokens, and the "Source Text" (String) levels of representation.

My understanding is that our refactoring code currently (mostly) operates at the "Source Text" level, by looking at the derived HIR data (e.g. implementations, references, resolved type, etc.) AND THEN sending back up the changes to the raw "Source Text" directly.
So...
Would it be desirable to instead commit our refactorings at the HIR level, and then derive the necessary "Source Text" changes from the HIR tree changes?
My thought process is that if we could refactor at the HIR level, we could theoretically know that the result will be "correct" (maybe incomplete).
For example, "introduce as local variable" should be able to be applied to any expression, but identifying what are expressions at the "Source Text" level forces us to reimplement some of our parsing logic.
This could also start to close the gap between compiler suggestions and the operations if compiler suggestions are communicated as HIR changes.

Cole Lawrence (Apr 02 2019 at 22:13, on Zulip):

For example, the way TypeScript handles the communication between codefixes and the compiler https://github.com/Microsoft/TypeScript/tree/master/src/services/codefixes

matklad (Apr 03 2019 at 06:06, on Zulip):

I would say that we have only two principally different levels of representation: Source and HIR.

For Source, we have several different views into the data: text (string), untyped SyntaxNode tree, typed tree of ast::FooBar values. However these are all views: there's one to one bidirectional correspondence between text and both trees (and typed tree uses literally the same representation as untyped tree). The core property of this layer is that it preserves all information from the source code exactly: all the comments and whitespace are in the tree.

HIR, in contrast, is by design a lossy implementation: different source might result in the same HIR. For this reason, it is important that the final transformation to source is computed at the level of the source code: if you transform HIR directly and map it back to source, you'll lose all the comments.

What we can and should do, though, is to use HIR to compute all the info, required to apply refactoring. See https://github.com/rust-analyzer/rust-analyzer/issues/1095 for an example of this. The struct fields example highlights the difference between HIR and source: when we add missing fields, we ideally should add them in the same order as the one in the struct declaration. However, HIR does not care about the order of fields, only syntax tree does. So, they need to work together to produce the desired result.

but identifying what are expressions at the "Source Text" level forces us to reimplement some of our parsing logic.

Note that this is not exactly what happens there: we should identify a specific expression at the cursor position, and that is very source-level specific. If the cursor is exactly between two nodes, you need to check both the left node and the right node, for example.

Last update: Nov 12 2019 at 17:05UTC