Stream: t-compiler/rust-analyzer

Topic: Completion refactoring


Kirill Bulatov (Oct 17 2020 at 11:22, on Zulip):

I have a recuring feeling that something is odd with the way we try to apply the completions and the recent related MR https://github.com/rust-analyzer/rust-analyzer/pull/6262 had given me this feeling again, so I want to discuss it a bit and hear others' thoughts.

Currently, we deal with the completions relatively simple: https://github.com/rust-analyzer/rust-analyzer/blob/master/crates/ide/src/completion.rs#L120
First, we create two objects: a CompletionContext with all the data the completions need and a mut Completions object to insert the completion options into.
Then both objects get passed to each completion, that contains an internal logic for everything, related to a certain completion.

In general, it works fine before we start to add exceptions and cover the corner cases.
What's more interesting, some corner cases cases are still not covered, and when another appears others' code bloats.

Example: attribute text completion. Before that, nobody cared what was completed after #[ and we did propose to complete a lot there, even though it was not correct.
When I've added this completion, I've had to do the following:

IMO, this all scales pretty bad: CompletionContext now has 48 fields, the majority of them are used either in a single particular completion module (to derive the completion data out of it) or in every module, but just to check that it's appropriate to complete here. Also, some completion modules now have 5 such checks and will get some more, most probably.

My proposal: add a trait similar to

trait Completion {
    type CompletionData;
    fn derive_completion_data(ctx: &CompletionContext) -> Option<Self::CompletionData>;
    fn complete(acc: &mut Completions, ctx: &CompletionContext) -> Option<()>;
}

so that we can move the majority of the CompletionContext fields into CompletionData + remove all the checks there too.
This way we can allow no completion-specific data in the context: let it all to be derived during the CompletionData and leave only the shared things like Semantics, RootDatabase and similar.

This approach does not seem to solve the checks issue entirely, we still have to check for some extra things inside derive_completion_data, but IMO a better api can solve this: since the derive_completion_data method is focused on getting the data, it should be possible.

Any thoughts or ideas on the topic? (maybe @matklad ?)

Kirill Bulatov (Oct 17 2020 at 11:26, on Zulip):

For a context, I'm slowly working on the automatic use insertion during the completion and this particular task is very good at showing the places we have to restrict from having unnecessary completions:

Screenshot-2020-10-17-at-14.23.40.png

Screenshot-2020-10-17-at-14.23.59.png

But with the current approach this means plenty of new fields added into the CompletionContext + plenty of extra checks to add into the existing completions, if we want to make it proper.

Florian Diebold (Oct 17 2020 at 12:00, on Zulip):
  1. the no_completion_required flag added in that PR doesn't seem like a good idea to me. IMO the particular completions should be more careful about when they show up instead. For example if we apparently complete unqualified paths in the for i i<|> case, that seems to me like a bug in the code setting is_trivial_path or is_pat_binding_or_const, since neither of these should be true in that case.
  2. IMO the many fields of CompletionContext aren't necessarily a problem, though some of them could maybe be combined into enums instead. I don't really see how the trait would work; the CompletionData has to be kept somewhere, doesn't it? If this is just about grouping the data, we could just use structs and functions without a trait. I'm not that convinced that it would make it clearer though.
  3. why is this more of a problem with automatic use insertion than before? shouldn't autoimport just add to the existing dot and unqualified path completions? E.g. why would we suddenly complete a static function like handle_alloc_error on a dot receiver?
Florian Diebold (Oct 17 2020 at 12:10, on Zulip):

some of them could maybe be combined into enums instead

e.g. I could imagine having just one field that says what kind of identifier we are on (unqualified path, segment of a qualified path, method/field name, item name, attribute, etc.)

popzxc (Oct 17 2020 at 12:49, on Zulip):

I have no strong opinion on the design decisions of the rust analyzer, so my thoughts are just thoughts of the project user who sporadically submits pull request to add a required feature / fix a problem I personally met.

Regarding the no_completion_required flag I partially agree that it isn't a correct way to deal with the problem. The correct way is to not provide a suggestion in these cases at all.
However, the correct way has two significant issues:

  1. In the current state, I found the infrastructure to be centered on the "what can we suggest to user?" rather than "what should we not?". Suggestions are spawned from different places and in the current form implementing the same fix would require checking the same thing in the multiple places, which is much less convenient and would have resulted in a bigger amount of code.
  2. The problem already exists and affects the UX. Discussing and implementing a fully correct way to deal with the issue could have resulted in longer time to fix the problem.

What I've tried to do in my pull request is to achieve the required functionality without integrating the solution deeply into the code base. It can be quickly removed as soon as proper solution is found, and it should work fine for the nearest future.

popzxc (Oct 17 2020 at 12:49, on Zulip):

Regarding the problem itself, my vision is that the current approach indeed lacks a few abstractions. I think currently CompletionContext attempts to do two jobs at once: understand which types of hints are applicable in the current position, and which hints can be produced. It could be a simpler problem, if spawning suggestions was done in several subsequent steps, like:

  1. Check whether types of completions are applicable in the current context (types / keywords / variables / etc).
  2. Create a list of completions for the required types only.
  3. Provide an individual score for each suggestion, depending on both completion itself and context in which completion is going to be applied. First point estimates how close suggestion is to what user typed so far (which later may evolve in a sort of fuzzy search algorithm), while the second point thinks how good the suggestion would work in the code, being placed there.

With each step isolated, and it probably would be less error-prone and easier to scale, as the amount of data required for each individual step would be lesser.

Kirill Bulatov (Oct 17 2020 at 12:50, on Zulip):

// My answer to initial Florian's reply, looks like it took me too long to type the reply :)

  1. I will look for the bug later, thank you for checking.
  2. The trait is an attempt to make the addition of the new completion module simpler, closer to the way we do it for assists and diagnostics: it takes to add a new module and a one-line change to register the module in the parent module.
    For completions it takes a new module, sometimes changes to other modules, because you see that they complete in that place too; usually changes in the completion context and lastly the registering one-liner.

Writing the new assist also does not involve scrolling 40-something fields in search of a proper combination to extract the data/check the applicability.

Feels a bit more cumbersome, so I look for ways to improve the situation with a common trait, similar to our diagnostics approach.

The data stays in the actual types for CompletionData, the struct can be defined in the same new module.

  1. Maybe a bad example, my bad.
matklad (Oct 17 2020 at 21:32, on Zulip):

Haven't caught up with the discussion, but I am noticing that completion is slow to compile if you add debut prins to type infernece and such

matklad (Oct 17 2020 at 21:32, on Zulip):

in particular, it uses compiles assists and ssr

matklad (Oct 17 2020 at 21:33, on Zulip):

perhaps its time to move completion to a separate crate

popzxc (Oct 18 2020 at 07:48, on Zulip):

I am potentially interested in working on the completions module. If there will be agreement on the proposed design, I can nominate myself to bring it to the life.

matklad (Oct 18 2020 at 08:13, on Zulip):

Cool! Still hanvn't caught up in the discussion, but I think, regardless of what design we choose, it would be better to split completions into a crate before that

matklad (Oct 18 2020 at 08:14, on Zulip):

That is, there's some obvious work to be done even before having consensus on design :)

popzxc (Oct 18 2020 at 10:13, on Zulip):

Done, I guess. https://github.com/rust-analyzer/rust-analyzer/pull/6276

matklad (Oct 18 2020 at 13:20, on Zulip):

Finally got the time to catch up with the discussion :0)

100% agree that current architecture of completions is a bit of a mess. I also think that the current state of completion might be a mess. Up until relatively recently, completions from rust-analyzer were a sort of fragile miracle, so we tried to keep as many variants as possible. Today, when coverage of the base language is pretty thorough, it makes sense to focus on better filtering and sorting -- not providing erroneous completions, putting good stuff on top. This might require architectural rethinking.

What I like about current architecture is how all completion variants are independent. I like how we "compose" them together with simple function calls here:

https://github.com/rust-analyzer/rust-analyzer/blob/886cfd68212bb0b4487d6a822476c350a6eb114f/crates/completion/src/lib.rs#L121-L135

Each specific completion kind also doesn't look terrible to me:

https://github.com/rust-analyzer/rust-analyzer/blob/886cfd68212bb0b4487d6a822476c350a6eb114f/crates/completion/src/complete_unqualified_path.rs#L9-L40

(well, attributes and keywords might be exceptions, they do feel messy).

What I don't like is the initial filling of completion context, there's definitely some room for improvement here:

https://github.com/rust-analyzer/rust-analyzer/blob/886cfd68212bb0b4487d6a822476c350a6eb114f/crates/completion/src/completion_context.rs#L350-L499

Finally, I love how low-tech the current infra is -- it's just a bunch of function calls. I think we should strive to preserve low-techness and avoid traits :-) Assists infra might be a good example here. I am pretty sure that whatever overall code shape we design, it would be implementable without traits :-)

One plausible issue with Ctx is that it stores one-shot fields, fields used only in a single completion kind. I think we should move this fields to be just local variables in corresponding completion variants. This probably means that file_with_fake_ident should become a field of Ctx such that completions can inspect it. This seems like a piece of work which can be done in isolation. I wonder how much thinner can we make CompletionCtx after that?

The second issue is that a lot of completion functions have clauses like

if in_edge_case1 || in_edge_case2 || in_edge_case3 {
   return None
}

Intuitively, such cutting off of cases that won't work seems fragile. I'd much rather prefer to see

if !(in_applicable_case) {
   return None
}

Ie, instead of listing all of the cases that won't work, we list only specific case that will work. It's worth investigating why this approach doesn't work well. I have a hypothesis that this is because our parser does too good error recovery.

For example, for for _ intelliJRulezzz, we get a tree which looks like this

FOR_EXPR
  PLACEHOLDER_PAT "_"
  PATH_EXPR
    IDENT         "intelliJRulezzzz"

Kotlin's parser gets this instead:

FOR_EXPR
  PLACEHOLDER_PAT "_"
  SYNTAX_ERROR
    IDENT         "intelliJRulezzzz"

We should change the parser to be more like IntelliJ. I don't think there's a general rule here, but I also feel that there maybe many specific rules, fixing all of which would get rid of most of the bad ifs.

So, action items would be:

These are very tactical small steps, but I'd be curious where the full series of such steps leads to.

popzxc (Oct 20 2020 at 04:39, on Zulip):

@matklad OK, now with the completions crate moved out, I guess the next step is what you mentioned in the comment: https://github.com/rust-analyzer/rust-analyzer/pull/6276#issuecomment-711148963

I'm not sure if I'll have the time to work on it till Saturday, but hopefully will do it this weekend.

matklad (Oct 20 2020 at 09:09, on Zulip):

Yup, that would be a good point! If you feel like refactoring the physical architecture more, shortening that compilation critical path would be useful too!

Josh Mcguigan (Mar 08 2021 at 04:15, on Zulip):

I found this thread when trying to understand some things about the structure of the completions code. Specifically I was working on adding scoring for function calls.

https://github.com/JoshMcguigan/rust-analyzer/compare/master...JoshMcguigan:complete-fn-score

I think this additional scoring would be useful, as it ensure functions with the desired return type are sorted to the top of the completions. But I found what feels like the cases being discussed above regarding the fragility and over-connectedness of the various completion types. Specifically, I had to add an active_name method, even though there is already an active_name_and_type method, because sometimes the active_name_and_type method returns None in places that my completion cares about. So the completion I was working on uses this new active_name method and self.completion.expected_type.

I've spent some time trying to understand why the type stored in the expected_type field on CompletionContext would be different than the type returned by active_name_and_type, but I haven't been able to figure out if that difference is intentional. It seems like this is a case of some completions want slightly different information than others.

Just to clear up my own understanding, should it be possible for all completions to work against a single expected_type? Does active_type mean something different than expected_type?

matklad (Mar 09 2021 at 20:21, on Zulip):

@Josh Mcguigan urgh, yeah, you are correct, expected_name_and_type doesn't make sense, we should just use the one recorded on the ctx I think

Last update: Jul 27 2021 at 21:30UTC