Stream: t-compiler/wg-rls-2.0

Topic: benchmark for rust-analyzer#1331


nikomatsakis (May 30 2019 at 12:35, on Zulip):

So @matklad if I wanted to do some benchmarking here, what would you suggest? I saw the regression test you pointed at for adding a newline, should I just try to convert that to a benchmark?

matklad (May 30 2019 at 12:35, on Zulip):

yeah, that would be a good start!

matklad (May 30 2019 at 12:36, on Zulip):

I am not sure we are really ready to add "true" benchmarks to rust-analyzer's CI, but we can certainly write benches for one-off profiling

matklad (May 30 2019 at 12:37, on Zulip):

Let me briefly explain how https://github.com/rust-analyzer/rust-analyzer/blob/6b88735fe6cd3b259816c7c90a2675ee057c9e4c/crates/ra_lsp_server/tests/heavy_tests/main.rs#L349-L411 works.

matklad (May 30 2019 at 12:37, on Zulip):

it is an integration test that creates a real instance of rust-analyzer and reads real files from the hard drive

matklad (May 30 2019 at 12:38, on Zulip):

In particular, it reads the sources of the standard library from the sysroot, so it can be used for load testing

matklad (May 30 2019 at 12:38, on Zulip):

the server is basically a channel to which you can send LSP requests and from which you can read responses
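
A minimal sketch of that channel idea, with plain std::sync::mpsc standing in for the real LSP transport (the names here are illustrative, not the actual test-harness API):

use std::sync::mpsc;
use std::thread;

// Stand-in for the real server: requests go in on one channel and
// responses come out on another. Strings stand in for JSON-RPC messages.
fn spawn_fake_server() -> (mpsc::Sender<String>, mpsc::Receiver<String>) {
    let (req_tx, req_rx) = mpsc::channel::<String>();
    let (resp_tx, resp_rx) = mpsc::channel::<String>();
    thread::spawn(move || {
        for req in req_rx {
            // A real server would parse the request and run salsa queries.
            resp_tx.send(format!("response to {}", req)).unwrap();
        }
    });
    (req_tx, resp_rx)
}

fn main() {
    let (requests, responses) = spawn_fake_server();
    requests.send("textDocument/completion".to_string()).unwrap();
    println!("{}", responses.recv().unwrap());
}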

matklad (May 30 2019 at 12:41, on Zulip):

for benchmarking salsa, I think what we should do is:

matklad (May 30 2019 at 12:42, on Zulip):

The subtle bit here is that diagnostics computation is a side-effect of a change, it's not a direct response to some request. The logic which kicks off diagnostic computation is here: https://github.com/rust-analyzer/rust-analyzer/blob/6b88735fe6cd3b259816c7c90a2675ee057c9e4c/crates/ra_lsp_server/src/main_loop.rs#L293-L301
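
So a benchmark here can't just time a request/response pair; it has to time from the change notification until the matching publishDiagnostics arrives. A self-contained sketch of that shape (the Server stub and its methods are placeholders, not the real harness API):

use std::time::{Duration, Instant};

// Placeholder for the heavy-test server handle; the real harness has
// different methods, this only shows the shape of the measurement.
struct Server;

impl Server {
    fn notify(&self, _method: &str, _params: &str) { /* send over the channel */ }
    fn wait_for_notification(&self, _method: &str) { /* block on the channel */ }
}

// Time from "edit sent" to "diagnostics published": that whole span is
// what the user experiences, because diagnostics are pushed, not pulled.
fn time_diagnostics(server: &Server) -> Duration {
    let start = Instant::now();
    server.notify("textDocument/didChange", "insert a newline at eof");
    server.wait_for_notification("textDocument/publishDiagnostics");
    start.elapsed()
}

fn main() {
    println!("{:?}", time_diagnostics(&Server));
}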

matklad (May 30 2019 at 12:43, on Zulip):

You would also probably want to look at the logs/profiling info. The relevant env vars look like this:

env RA_PROFILE='*>16' RUST_LOG=ra_lsp_server=info

(more info about the cryptic *>16 syntax is in docs/dev/README.md)

matklad (May 30 2019 at 12:44, on Zulip):

You can set up the logging in the tests:

https://github.com/rust-analyzer/rust-analyzer/blob/6b88735fe6cd3b259816c7c90a2675ee057c9e4c/crates/ra_lsp_server/tests/heavy_tests/main.rs#L20-L22
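
A minimal sketch of such an init, assuming the env_logger crate (the linked test may use a different logger; RUST_LOG is still read from the environment either way):

// One-time logger init at the top of a test; `is_test(true)` routes
// output through the test harness so it interleaves with test output.
// Assumes the env_logger crate; it honours RUST_LOG from the environment.
fn init_logging() {
    let _ = env_logger::builder().is_test(true).try_init();
}

#[test]
fn diagnostics_benchmark() {
    init_logging();
    // ... drive the server and time the responses as above ...
}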

matklad (May 30 2019 at 13:02, on Zulip):

Oh, and a good way to benchmark from-scratch analysis is this command: cargo run --release -p ra_cli -- analysis-stats ../chalk/. But yeah, for salsa we are specifically interested in incremental analysis, and that's harder to set up

matklad (May 30 2019 at 13:03, on Zulip):

that makes me wonder if we should add a --diff argument to the analysis-stats command, which allows one to check re-analysis as well?

nikomatsakis (May 31 2019 at 09:06, on Zulip):

that makes me wonder if we should add a --diff argument to the analysis-stats command, which allows one to check re-analysis as well?

seems like a good idea, @matklad
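
The shape such a --diff mode could take: run the full analysis once, apply a small change through salsa, and time the re-analysis. A sketch with stubbed types (the real ra_cli would drive AnalysisHost; these names are placeholders):

use std::time::Instant;

// Placeholder for ra_ide_api::AnalysisHost; only the shape matters here.
struct Host {
    text: String,
}

impl Host {
    fn analyze_all(&self) { /* force all queries, like analysis-stats does */ }
    fn apply_change(&mut self, patch: &str) {
        self.text.push_str(patch); // the real thing goes through salsa inputs
    }
}

fn main() {
    let mut host = Host { text: String::from("fn main() {}") };

    let t = Instant::now();
    host.analyze_all();
    println!("from scratch: {:?}", t.elapsed());

    // The hypothetical --diff flag: apply a patch, then measure how much
    // of the analysis salsa actually has to recompute.
    host.apply_change("\n// a trailing comment\n");
    let t = Instant::now();
    host.analyze_all();
    println!("after change: {:?}", t.elapsed());
}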

nikomatsakis (May 31 2019 at 09:07, on Zulip):

I didn't get a chance to do much yesterday but I'm tinkering with this a bit now

nikomatsakis (May 31 2019 at 09:07, on Zulip):

I could imagine trying to add a --diff to that command -- but I'm first trying it out to see what it does =)

nikomatsakis (May 31 2019 at 09:07, on Zulip):

Have you seen the benchmark setup we have for perf.rust-lang.org?

nikomatsakis (May 31 2019 at 09:07, on Zulip):

(In particular, the patches)

nikomatsakis (May 31 2019 at 09:07, on Zulip):

Might be nice to be compatible with that at some point

matklad (May 31 2019 at 10:31, on Zulip):

No, I haven't seen that. I am somewhat skeptical that we can come up with really good benchmarks for the latency of IDE features.

Testing rustc with a patch is easy: you need to check that the full compilation is fast.

Testing rust-analyzer would be tricky, because you also have to select the subset of operations you want to perform. The common scenario is something like "the user types something, which kicks off diagnostic processing; the user asks for completion, and now diagnostics and completion race against each other". Setting up such scenarios is more than just "invoke all of the queries".

What we certainly can and should do is check that diagnostics and highlights for a single file after an update have reasonable performance (as diagnostics and highlights exercise most of the queries). However, while this is the easiest thing to test, it is also the least useful one, as diagnostics can be computed in the background just fine.

What we are really interested in are the real-time features, like completion or that dreaded on-enter handler. For those, I think investing in the observability of the live server is more important than synthetic benchmarks (by the way, I love how RA_PROFILE shows real-time profiles; that's super useful)

matklad (May 31 2019 at 10:57, on Zulip):

Also, it should probably be called --patch and not --diff

nikomatsakis (May 31 2019 at 12:34, on Zulip):

well I spent the morning just kind of reading through the RA code and seeing how the pieces fit together

nikomatsakis (May 31 2019 at 12:35, on Zulip):

I didn't actually do any measuring yet

nikomatsakis (May 31 2019 at 12:35, on Zulip):

I certainly take your point that capturing the "real world" experience is complex

nikomatsakis (May 31 2019 at 12:35, on Zulip):

Though it still seems to me that you could measure a lot of individual things, the total of which is important -- e.g., how fast can we do completions with base X and edit Y

nikomatsakis (May 31 2019 at 12:35, on Zulip):

(and perhaps a stream of edits)

Florian Diebold (May 31 2019 at 12:41, on Zulip):

It should be possible to do a 'typing' benchmark where we e.g. add a single letter and then request diagnostics, code actions, and a 'real-time' request like onEnter or completions, right? And e.g. simulate typing a whole function that way

matklad (May 31 2019 at 12:42, on Zulip):

Yeah, that should be possible.
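
A sketch of that typing loop, again with a stubbed server handle (the method names are illustrative, not the real harness API):

use std::time::Instant;

// Stub standing in for the test-harness server; real method names differ.
struct Server;

impl Server {
    fn insert_char(&self, _c: char) { /* didChange with a one-character edit */ }
    fn request_completions(&self) { /* textDocument/completion */ }
    fn wait_for_diagnostics(&self) { /* block until publishDiagnostics */ }
}

// Simulate typing a whole function one keystroke at a time; after each
// keystroke, fire the same requests an editor would fire.
fn typing_benchmark(server: &Server) {
    let body = "fn frobnicate(x: i32) -> i32 { x + 1 }";
    let start = Instant::now();
    for c in body.chars() {
        server.insert_char(c);
        server.request_completions(); // latency-sensitive, races diagnostics
        server.wait_for_diagnostics();
    }
    println!("typed {} chars in {:?}", body.len(), start.elapsed());
}

fn main() {
    typing_benchmark(&Server);
}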

matklad (May 31 2019 at 12:42, on Zulip):

I also wonder if we could capture a trace of real interaction, and have a benchmark for "how fast can we replay the trace"
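
Such a trace could be as simple as a list of timestamped LSP messages replayed against the server; a sketch with placeholder types:

use std::time::{Duration, Instant};

// One recorded interaction: the offset from session start plus the raw
// message (a String here; a real trace would store the JSON-RPC payload).
struct TraceEntry {
    offset: Duration,
    message: String,
}

struct Server;

impl Server {
    fn send(&self, _msg: &str) { /* feed the message to the server */ }
}

// Replay as fast as the server allows and report the total time; comparing
// that against the original wall-clock span of the trace is the benchmark.
fn replay(server: &Server, trace: &[TraceEntry]) -> Duration {
    let start = Instant::now();
    for entry in trace {
        // One could also sleep until `entry.offset` to reproduce the pacing.
        let _ = entry.offset;
        server.send(&entry.message);
    }
    start.elapsed()
}

fn main() {
    let trace = vec![TraceEntry {
        offset: Duration::from_secs(0),
        message: "textDocument/didOpen".to_string(),
    }];
    println!("{:?}", replay(&Server, &trace));
}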

matklad (May 31 2019 at 12:44, on Zulip):

Yeah, perhaps I should stop complaining that it's hard and ... just add a benchmark? Seems totally doable

matklad (May 31 2019 at 12:45, on Zulip):

I guess my experience with IntelliJ is harming me in this case: I haven't solved benchmarking of IntelliJ Rust at all, but a significant part of that was that

matklad (May 31 2019 at 12:46, on Zulip):

Btw, @Florian Diebold I must say that analysis-stats is an awesome tool!

matklad (May 31 2019 at 12:46, on Zulip):

Thanks!
