So @matklad if I wanted to do some benchmarking here, what would you suggest? I saw the regression test you pointed at for adding a newline, should I just try to convert that to a benchmark?
yeah, that would be a good start!
I am not sure we are really ready to add "true" benchmarks to rust-analyzer's CI, but we certainly can write benches for one-off profiling
it is an integration test, which creates a real instance of rust-analyzer, which reads real files from the hard drive
In particular, it reads sources of standard library from the sysroot, so it can be used for load testing
server is basically a channel to which you can send LSP requests and read responses
for benchmarking salsa, I think what we should do is:
`DidOpenTextDocument` request. This should trigger syntax highlighting and diagnostics
`DidChangeTextDocument` notification, which changes whitespace in the document
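On the wire, the `didChange` step above is a JSON-RPC body prefixed with a `Content-Length` header. A minimal hand-rolled sketch (rust-analyzer's integration tests use their own helpers; the URI, version, and document text here are made-up placeholders):

```rust
// Frame an LSP message: JSON-RPC body prefixed with a Content-Length header.
fn frame_lsp_message(body: &str) -> String {
    format!("Content-Length: {}\r\n\r\n{}", body.len(), body)
}

// Build a hypothetical `textDocument/didChange` notification using
// full-document sync: the whole new text (here, with an added trailing
// space as the whitespace edit) instead of an incremental range edit.
fn did_change_whitespace(uri: &str, version: u64) -> String {
    let body = format!(
        r#"{{"jsonrpc":"2.0","method":"textDocument/didChange","params":{{"textDocument":{{"uri":"{uri}","version":{version}}},"contentChanges":[{{"text":"fn main() {{}} \n"}}]}}}}"#
    );
    frame_lsp_message(&body)
}

fn main() {
    println!("{}", did_change_whitespace("file:///tmp/main.rs", 2));
}
```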
The subtle bit here is that diagnostics computation is a side-effect of a change, it's not a direct response to some request. The logic which kicks diagnostic computation is here: https://github.com/rust-analyzer/rust-analyzer/blob/6b88735fe6cd3b259816c7c90a2675ee057c9e4c/crates/ra_lsp_server/src/main_loop.rs#L293-L301
You also would probably want to look at the logs/profiling info. The relevant env-vars look like this:
`env RA_PROFILE='*>16' RUST_LOG=ra_lsp_server=info`
(more info about the cryptic `*>16` syntax in the
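For reference, a sketch of how those env-vars could be set before launching the server (the commented `cargo run` invocation is a placeholder for however you actually start it):

```shell
# Single-quote RA_PROFILE so the shell does not glob-expand the `*`.
# '*>16' means: print any profiled scope that took longer than 16 ms.
export RA_PROFILE='*>16'
# Scope info-level logs to the ra_lsp_server crate.
export RUST_LOG=ra_lsp_server=info
# These would prefix the actual server invocation, e.g.:
#   cargo run --release -p ra_lsp_server
echo "$RA_PROFILE"
```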
You can set the logs in the tests:
Oh, and a good way to benchmark from-scratch analysis is this command: `cargo run --release -p ra_cli -- analysis-stats ../chalk/`. But yeah, for salsa we're specifically interested in incremental analysis, and that's harder to set up
that makes me wonder if we should add a `--diff` argument to the `analysis-stats` command, which allows one to check re-analysis as well?
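The measurement shape such a mode would presumably have: time a cold run, apply a small edit, then time the warm re-run. A std-only stand-in sketch (the closures below are placeholder work, not rust-analyzer's real analysis):

```rust
use std::time::Instant;

// Time a closure and print how long it took; returns the closure's result.
fn time_it<T>(label: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = f();
    println!("{label}: {:?}", start.elapsed());
    result
}

fn main() {
    let mut source = String::from("fn main() {}\n");
    // Cold "analysis" (placeholder: just counts chars).
    time_it("cold analysis", || source.chars().count());
    // Apply the whitespace "diff", then measure re-analysis.
    source.push(' ');
    time_it("re-analysis after edit", || source.chars().count());
}
```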
seems like a good idea, @matklad
I didn't get a chance to do much yesterday but i'm tinkering with this a bit now
I could imagine trying to add a `--diff` to that command -- but I'm first trying it out to see what it does =)
Have you seen the benchmark setup we have for rustc?
(In particular, the patches)
Might be nice to be able to be compatible with that at some point
No, I haven't seen that. I am somewhat skeptical that we can come up with really good benchmarks for latency of IDE features.
testing rustc with a patch is easy: you just need to check that full compilation is fast.
testing ra would be tricky, because you also have to select the subset of operations you want to do. The common scenario is something like "user types something, it kicks diagnostic processing, user asks for completion, and diagnostics and completion now race against each other". Setting up such scenarios is not just "invoke all of the queries".
What we can and certainly should do is check that diagnostics and highlights for a single file after an update have reasonable performance (as diagnostics and highlights exercise most of the queries). However, while this is the easiest thing to test, it is also the least useful one, as diagnostics can be computed in the background totally fine.
What we're really interested in are real-time features, like completion or that dreaded on-enter handler. For those, I think investing in observability of the live server is more important than synthetic benchmarks (by the way, I love how `RA_PROFILE` shows real-time profiles, that's super useful)
Also, it should probably be called `--patch` and not `--diff`
well I spent the morning just kind of reading into RA code and seeing how the pieces fit together
I didn't actually do any measuring yet
I certainly take your point that capturing the "real world" experience is complex
Though it still seems to me that you could measure a lot of individual things, the total of which are important -- e.g., how fast can we do completions with base X and edit Y
(and perhaps a stream of edits)
It should be possible to do a 'typing' benchmark where we e.g. add a single letter and then request diagnostics, code actions and a 'real-time' request like onEnter or completions, right? and e.g. simulate typing a whole function that way
Yeah, that should be possible.
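The typing scenario above could start from something like this: generate the stream of document states one keystroke at a time, where each state corresponds to one `didChange` notification, after which the benchmark would request diagnostics plus a latency-sensitive request such as completion or onEnter. A std-only sketch (names are made up, not rust-analyzer's test API):

```rust
// Simulate typing `typed` on top of `base`, one character per keystroke.
// Each returned string is the full document after one more keystroke.
fn typing_states(base: &str, typed: &str) -> Vec<String> {
    let mut doc = String::from(base);
    typed
        .chars()
        .map(|ch| {
            doc.push(ch);
            doc.clone()
        })
        .collect()
}

fn main() {
    let states = typing_states("", "fn add(a: i32, b: i32) -> i32 { a + b }");
    println!("{} simulated keystrokes", states.len());
}
```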
I also wonder if we could capture a trace of real interaction, and have a benchmark for "how fast can we replay the trace"
Yeah, perhaps I should stop complaining that it's hard and .. just add a benchmark? Seems totally doable
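The trace-replay idea could be sketched like this: record each message with its offset from session start, then replay by sleeping until each offset is due. Everything below is a placeholder (the `send` callback stands in for the real LSP transport):

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

// Replay a recorded trace with its original timing. Each entry is
// (offset from session start, raw message); `send` is a stand-in for
// actually writing the message to the server.
fn replay(trace: &[(Duration, &str)], mut send: impl FnMut(&str)) {
    let start = Instant::now();
    for (offset, msg) in trace {
        // Sleep until this message's offset is due (no sleep if we're late).
        if let Some(wait) = offset.checked_sub(start.elapsed()) {
            sleep(wait);
        }
        send(msg);
    }
}

fn main() {
    let trace = [
        (Duration::from_millis(0), "didOpen"),
        (Duration::from_millis(10), "didChange"),
        (Duration::from_millis(20), "completion"),
    ];
    let mut sent = 0;
    replay(&trace, |_msg| sent += 1);
    println!("replayed {sent} messages");
}
```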
I guess my experience with IntelliJ is harming me in this case: I haven't solved benchmarking of IntelliJ Rust at all, but a significant part of that was that
Btw, @Florian Diebold I must say that `analysis-stats` is an awesome tool!