I've finally wrapped up my course about Rust, so I have more time to hack on the analyzer.
I am doing some refactorings now, and observing that it is pretty slow. Will hopefully look into that. I have three hypotheses:
The diagnostics thing is definitely real. I've also noticed that sometimes things get so bad that it holds up newline insertion.
And this is the third hypothesis: slow diagnostics shouldn't block interactive features...
I wonder what's the best way to design for that? I guess, in the main loop, we should have two thread pools:
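For the record, a minimal sketch of the two-pool idea (all names here are hypothetical, not the actual main-loop code): slow diagnostics get their own worker, so they can never queue in front of an interactive request like completion.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

type Task = Box<dyn FnOnce() + Send>;

/// Spawn a single-threaded worker draining a task queue.
fn spawn_worker() -> mpsc::Sender<Task> {
    let (tx, rx) = mpsc::channel::<Task>();
    thread::spawn(move || {
        for task in rx {
            task();
        }
    });
    tx
}

/// Two pools: slow diagnostics on one, interactive requests on the other.
/// Returns results in completion order.
fn run_demo() -> Vec<&'static str> {
    let background = spawn_worker();
    let interactive = spawn_worker();
    let (done_tx, done_rx) = mpsc::channel();

    // A slow "diagnostics" task goes to the background pool...
    let tx = done_tx.clone();
    background
        .send(Box::new(move || {
            thread::sleep(Duration::from_millis(50));
            tx.send("diagnostics done").unwrap();
        }))
        .unwrap();

    // ...while a cheap "completion" request on the interactive pool
    // finishes first, even though it was submitted second.
    interactive
        .send(Box::new(move || {
            done_tx.send("completion done").unwrap();
        }))
        .unwrap();

    done_rx.iter().take(2).collect()
}

fn main() {
    println!("{:?}", run_demo());
}
```

With a single shared pool, the diagnostics task submitted first would have blocked the completion request behind it.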
I think Chalk integration is at least a big part of it, and I think I'll switch it to my fuel branch
Yeah, but I am also pretty sure that we don't do a good job of prioritizing queries. For example, when you open a new project and ask for document symbols, you have to wait until the whole project is processed, but, iirc, this needs only syntax.
I guess, in addition to fuel, we should have "weights", which would allow us to selectively slow down particular subsystems, to make sure that only the relevant functionality is degraded, and not the whole server
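A toy sketch of what fuel-with-weights could look like (the `Fuel` type and the weights are made up for illustration): each subsystem burns the shared budget at its own rate, and weight-0 work is never throttled.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical global fuel budget. A heavy weight (e.g. trait solving)
/// exhausts it quickly and degrades early; syntax-only features pass
/// weight 0 and are never throttled.
struct Fuel {
    remaining: AtomicU64,
}

impl Fuel {
    fn new(budget: u64) -> Fuel {
        Fuel { remaining: AtomicU64::new(budget) }
    }

    /// Burn `weight` units; returns false once the budget is exhausted,
    /// signalling the caller to bail out gracefully.
    fn tick(&self, weight: u64) -> bool {
        if weight == 0 {
            return true; // syntax-only work is never throttled
        }
        self.remaining
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
                cur.checked_sub(weight)
            })
            .is_ok()
    }
}

fn main() {
    let fuel = Fuel::new(10);
    // Trait solving burns 5 units per step: only 2 steps fit the budget.
    let solver_steps = (0..100).filter(|_| fuel.tick(5)).count();
    println!("solver steps before giving up: {}", solver_steps);
    // Syntax-only work (weight 0) still runs with the budget exhausted.
    println!("syntax still ok: {}", fuel.tick(0));
}
```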
Am I right in remembering that we do not support the cancellation request? Is it possible that cancellation requests are coming through and we aren't processing them so things are getting backed up?
We don't support client-initiated cancellation, but I think it shouldn't be relevant
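For reference, client-initiated cancellation in LSP is just the `$/cancelRequest` notification carrying the id of an in-flight request. A minimal sketch of tracking it (the `CancelTracker` type is hypothetical):

```rust
use std::collections::HashSet;

/// Hypothetical bookkeeping for LSP `$/cancelRequest`: remember the ids
/// the client cancelled, and drop matching queued requests before they
/// run instead of letting them back up.
#[derive(Default)]
struct CancelTracker {
    cancelled: HashSet<u64>,
}

impl CancelTracker {
    /// Handle an incoming `$/cancelRequest` notification.
    fn on_cancel_request(&mut self, id: u64) {
        self.cancelled.insert(id);
    }

    /// Called right before actually handling a queued request.
    fn should_run(&mut self, id: u64) -> bool {
        !self.cancelled.remove(&id)
    }
}

fn main() {
    let mut tracker = CancelTracker::default();
    tracker.on_cancel_request(2);
    println!("run request 1: {}", tracker.should_run(1));
    println!("run request 2: {}", tracker.should_run(2));
}
```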
I've done some quick profiling and the picture I see is that chalk is sometimes quite slow
2937ms - diagnostics
2243ms - implements_query
There are also quite a few queries which take a long time themselves, but don't have long deps listed
I think this is because those queries are actually blocked on that one slow chalk query
8568ms - crate_def_map_query
is this normal?
For the initial load -- yeah
11447ms - diagnostics
yeah, some Chalk queries can basically take arbitrarily long as long as we don't limit them
We probably want to add cancellation support to chalk at some point as well
... maybe we should just drive the chalk solver loop ourselves :thinking:
if we did, cancellation and fuel would be easy
would this be enough?
Like, even if the loop yields the next solution, there can be arbitrary computation inside (nested loops, for example)
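To make that caveat concrete, a toy sketch of driving a solver loop ourselves (`ToySolver` is a stand-in, not Chalk's actual API): cancellation is cheap to honour between solutions, but a single `next_solution` step can still contain arbitrary nested work.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Stand-in for a solver whose stepping loop we drive ourselves.
struct ToySolver {
    solutions: Vec<u32>,
}

impl ToySolver {
    fn next_solution(&mut self) -> Option<u32> {
        // In reality, this one step may hide arbitrary nested work.
        self.solutions.pop()
    }
}

/// Drive the loop, checking a cancellation flag between steps.
fn solve_cancellable(solver: &mut ToySolver, cancelled: &AtomicBool) -> Vec<u32> {
    let mut out = Vec::new();
    loop {
        if cancelled.load(Ordering::Relaxed) {
            break; // cheap to check between solutions...
        }
        match solver.next_solution() {
            Some(s) => out.push(s),
            None => break,
        }
        // ...but the duration of one next_solution() call is unbounded
        // unless the solver itself also checks cancellation/fuel.
    }
    out
}

fn main() {
    let mut solver = ToySolver { solutions: vec![3, 2, 1] };
    let cancelled = AtomicBool::new(false);
    println!("{:?}", solve_cancellable(&mut solver, &cancelled));
}
```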
@Florian Diebold could you also create a "chalk is slow" issue in the chalk repo? :-) You've looked into it way more than I did, so you have a better idea of what "slow" means. I hope that if we put this on the radar of the chalk developers, they'll come up with some nice optimizations :-)
they are usually fast, though, I think. I'm not very familiar with the details, but when it needs to solve a nested goal, it just does one step there and then returns a 'no results yet, might have more' error (https://github.com/rust-lang/chalk/blob/3238c6d0bc7d09b6eebe069ebd47dfabae787082/chalk-engine/src/logic.rs#L51)
hm yeah, maybe I should try to reproduce some slow cases in the chalk test harness
Just to clarify, the slowness is due to chalk exploring some exponential search space, and not due to the fact that some simple optimizations (aka accidentally quadratic) are missing?
My guess would be that Chalk is really inefficient in searching some goals, instead of it getting too much input
Yeah, the non-enumerable goals problem is one where Chalk is looking at `Sized(?T)` first, which is a really bad search order
the cases I've seen are mostly that you have something like `where ?0: Foo, ?0: Send`, and it tries enumerating every single type that implements `Send`, even though there might be just a few that implement `Foo`; or variations of that
which is exacerbated by us having lots of cases where there is no solution, so in our case `?0: Foo` might actually have no solution and it really tries to enumerate all types
Not sure if changing the ordering would help in that case
if we know that a certain clause has no solutions at all, we can immediately stop, right?
Ah yes, it'll know whether `Foo` is implemented, and only search through the `Send` part if some of those impls depend on that
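That heuristic can be sketched as a tiny "join ordering" toy (the impl tables below are made up): enumerate candidates for the trait with the fewest impls first, then filter that small set by the broad bound, instead of the other way round.

```rust
use std::collections::HashMap;

/// Toy solver for `?0: A, ?0: B` style goals: order the bounds by
/// estimated cardinality (number of impls), start from the smallest
/// candidate set, and intersect with the rest.
fn solve(impls: &HashMap<&str, Vec<&str>>, goals: &[&str]) -> Vec<String> {
    // Order goals by estimated cardinality: fewest impls first.
    let mut ordered: Vec<&str> = goals.to_vec();
    ordered.sort_by_key(|t| impls.get(t).map_or(0, |v| v.len()));

    // Seed candidates from the narrowest trait, then filter.
    let mut candidates: Vec<String> =
        impls[ordered[0]].iter().map(|s| s.to_string()).collect();
    for goal in &ordered[1..] {
        let set = &impls[*goal];
        candidates.retain(|ty| set.contains(&ty.as_str()));
    }
    candidates
}

fn main() {
    let mut impls = HashMap::new();
    impls.insert("Foo", vec!["A"]);
    impls.insert("Send", vec!["A", "B", "C", "D", "E"]);
    // For `?0: Foo, ?0: Send`, start from Foo's single impl instead of
    // enumerating everything that is Send.
    println!("{:?}", solve(&impls, &["Send", "Foo"]));
}
```

This is of course exactly the cardinality-estimation trick from RDBMS join planning mentioned below; the interesting part for Chalk would be getting cheap, good-enough impl counts per trait.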
This seems similar to the join-ordering problem in an RDBMS; can we solve it in the same fashion, i.e. by estimating the number of impls per trait?
that's the direction that https://github.com/rust-lang/chalk/issues/80 goes in, I think
https://github.com/frankmcsherry/blog/blob/master/posts/2018-05-19.md#addendum-2018-05-21-treefrog-leapjoin is an interesting read on the topic
ugh, so slow
You're right: I just flipped trace on, and cancellation requests are pretty rare
Does `cargo check` block typing (e.g. adding a newline) in RA?
No, I don't believe that it does. I've seen this happen with `cargo watch` disabled, as well as no issue when running `cargo check` while typing.
I just saw RA thinking that `cargo check` was running while it wasn't
Is rust-analysis a good way to bench parts of ra?
I ran valgrind massif and callgrind on top of it as it analyzed ra's repo an hour ago. It looks like it captured a good bit of data. I won't be available to look closer until this afternoon, though.
Attaching a zip containing the data and two screenshots (of kcachegrind and massif-visualizer) in case anyone can make better use of them in the meantime: ra-massif-callgrind.zip
@Laurențiu Nicola Currently the toggling of the animation for `cargo check` is just a simple string comparison against `'[Finished running'`, so maybe it fails to parse that. However, it should not be related to this issue.
@matklad did you intend to disable Chalk in https://github.com/rust-analyzer/rust-analyzer/commit/26463f189ff75e92990375ee5ae08d3903d39e66?
that one time when you decide that it's fine to send a trivial PR without bors...
I ran `valgrind --tool=callgrind ~/projects/contribute/rust-analyzer/target/release/ra_cli analysis-stats` on a handful of crates, such as RA itself, cranelift, crossterm, and regex.
For each project, `ra_parser::parser::Parser::nth` calls a handful of functions from `ra_mbe::subtree_source` that make up the majority of instruction reads/calls, drastically outnumbering everything else rust-analysis is doing.
I also ran rust-analysis on a fresh `cargo new <crate>` without any changes. It reported 696965 calls to `WalkCursor::current`, 9689599 calls in `WalkerOwner::get` from within `is_token_joint_to_next`, and 15248879 calls in `WalkerOwner::get` from within `<ra_mbe::subtree_source::SubtreeTokenSource as ra_parser::TokenSource>::token_kind`. Running `perf record` also shows `ra_mbe::subtree_source::WalkerOwner::get` at the top of the list of overhead.
disclaimer: today is the first time I've used callgrind, so although the above stood out, I might be missing something :)
I think callgrind gives you precise statistics regarding the number of calls, but is not very precise when it comes to real-time measurements
I assume that includes `rust-src`? Is the standard library code mbe-heavy?
Makes sense; the manual said to focus more on the instruction reads (Ir) in the source view, but the above functions stood out in that regard too.
`rust-src` sounds like a possibility. I need to look into what all `analysis-stats` as a command includes. I noticed it used in another PR for a bench and rolled with it.