Not sure whether this is the right place to ask—if not please do point me elsewhere! Just wondering whether there are any efforts underway (or if not, whether it is on any roadmap) to implement something akin to "live unit testing" for Rust (where the IDE continuously updates visual indicators of unit test results AS ONE CODES: see for example https://wallabyjs.com/ and https://docs.microsoft.com/en-us/visualstudio/test/live-unit-testing )? Compilation time is obviously a hindrance, but perhaps this is moving toward a realistic possibility with https://github.com/bjorn3/rustc_codegen_cranelift/issues/1087 ... cc @bjorn3
Seems like this would require running the compiler on not-yet-saved files, aka VFS support
and that, yeah :)
@Jonas Schievink Perhaps. Or else mirroring the source files in an on-disk cache to which modifications (not yet saved to the working tree) are stored and compiling that (I believe this is how Wallaby does it, via IDE plugins).
yeah, but that's pretty error-prone
This is a nice idea! When using nightly rustc, VFS support is easy to implement: all you have to do is implement `FileLoader` and pass it to `rustc_driver` in a custom driver. This is pretty much what RLS does, as far as I know. As for the referenced cg_clif issue, I made some progress on lazy compilation. The same Cranelift changes necessary for this would enable hot code swapping from the Cranelift side. I don't know how easy it will be to abuse incremental compilation for this purpose, but it is certainly high on my todo list.
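To make the overlay idea concrete, here is a minimal, self-contained sketch (trait and names here are simplified stand-ins, not rustc's actual nightly `FileLoader` API): unsaved editor buffers shadow the on-disk sources, and everything else falls through to the real filesystem.

```rust
use std::collections::HashMap;
use std::io;
use std::path::{Path, PathBuf};

/// Simplified stand-in for a compiler-side file loader: the compiler asks
/// this for source text instead of touching the filesystem directly.
pub trait SourceLoader {
    fn read_file(&self, path: &Path) -> io::Result<String>;
}

/// Overlay loader: unsaved editor buffers shadow the on-disk files.
pub struct OverlayLoader {
    /// path -> unsaved buffer contents supplied by the IDE
    pub overlays: HashMap<PathBuf, String>,
}

impl SourceLoader for OverlayLoader {
    fn read_file(&self, path: &Path) -> io::Result<String> {
        if let Some(buf) = self.overlays.get(path) {
            return Ok(buf.clone()); // serve the dirty buffer
        }
        std::fs::read_to_string(path) // fall back to disk
    }
}
```

The real trait has a couple more methods, but the shape is the same: one indirection point where the IDE can inject not-yet-saved state.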
we don't call rustc in-process, so it's not so easy :grimacing:
If only rust-analyzer were to require nightly...
Maybe someone could take https://github.com/rust-analyzer/rust-analyzer/pull/5765 and push it over the edge
I don't see pinning r-a to nightly Rust versions as too big of a problem, as long as we only update the version when really necessary.
(I feel like we update dependencies way too frequently, my target dir is over 30 GB already)
@Jeremy Kolb the test explorers are great but do they update live? I think this requires instrumentation in order to determine which tests have been affected by any given modifications, or else one will need to rerun the entire suite after every edit. Perhaps @Rich Kadel's work on https://github.com/rust-lang/rust/issues/79121 could help here
That would require pretty far-reaching code analysis
@Jonas Schievink Or just run tests (with instrumentation) and record what code regions each has exercised/covered
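To make that record-and-select idea concrete, a minimal sketch (all names hypothetical): the instrumentation records, per test, which regions it exercised; after an edit, only the tests covering a touched region need rerunning.

```rust
use std::collections::{HashMap, HashSet};

/// Maps each instrumented region (here just a function name) to the set
/// of tests that executed it during the last full run.
#[derive(Default)]
pub struct CoverageMap {
    region_to_tests: HashMap<String, HashSet<String>>,
}

impl CoverageMap {
    /// Called from the instrumentation hook while `test` is running.
    pub fn record(&mut self, test: &str, region: &str) {
        self.region_to_tests
            .entry(region.to_string())
            .or_default()
            .insert(test.to_string());
    }

    /// After an edit, only tests covering a changed region need rerunning.
    pub fn affected_tests(&self, changed_regions: &[&str]) -> HashSet<String> {
        changed_regions
            .iter()
            .filter_map(|r| self.region_to_tests.get(*r))
            .flatten()
            .cloned()
            .collect()
    }
}
```

This is also why determinism matters: a test whose coverage varies between runs can silently drop out of the map.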
I'm not sure but my guess is that it's probably not live.
I'd say the pieces for this are coming together, but we're probably still a few years away from being able to do this
Sounds like a github issue sketching out the design and dependencies could be helpful?
heh, I suggested something like this a while back: https://rust-lang.zulipchat.com/#narrow/stream/242791-t-infra/topic/Speculative.20CI.3F/near/200533333
ugh, the official discord hid the #infra channel for some reason so you can't look at it anymore
I had found a project doing this in Python
@Joshua Nelson I think in the context of CI this doesn't really help. For running unit tests live while editing code, it's plausible because we can compile everything, run all tests and collect coverage, and then know which tests cover which code (though even that relies on the tests being deterministic). But I don't think you can reliably determine statically which tests depend on which code, so this wouldn't help you reduce CI times.
right, this is dynamic - you need state between CI runs
to know which tests depend on which code, and also which files have been modified
hmm, I didn't think about non-determinism
even then -- if your IDE live-testing doesn't run some test it should, it's a slight annoyance. If the CI doesn't run some test it should, that's a huge problem
With lazy compilation instrumenting would be trivial. It wouldn't even be necessary to add instrumentation calls in the jitted code. Just record the function call when you lazily compile a function.
:thinking: how would that work if you're running multiple tests -- you wouldn't know that the second test also called the same function, would you?
After every test, you could reset the GOT (the table used for swapping calls from the lazy-compilation stub over to the jitted code) so that all calls go back through the lazy-compilation stubs, and then as an optimization keep a pointer to the already-jitted code in a side table.
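A toy model of that reset scheme (all names hypothetical; a real implementation would patch actual GOT entries and jump to machine code, not match on an enum): each slot starts as a stub, the first call per reset records the function and "patches" the slot, and resetting before each test makes that test's first calls observable again.

```rust
use std::collections::HashSet;

/// One slot per function in the (simulated) GOT: either still pointing at
/// the lazy-compilation stub, or already patched to the "compiled" code.
#[derive(Clone, PartialEq)]
enum Slot {
    Stub,
    Compiled,
}

pub struct LazyTable {
    slots: Vec<Slot>,
    /// Functions whose stub fired since the last reset, i.e. functions
    /// the current test actually reached.
    pub touched: HashSet<usize>,
}

impl LazyTable {
    pub fn new(n: usize) -> Self {
        LazyTable { slots: vec![Slot::Stub; n], touched: HashSet::new() }
    }

    /// Simulated indirect call through slot `idx`. The first call per
    /// reset goes through the stub, which records the hit and patches
    /// the slot; later calls skip the stub entirely.
    pub fn call(&mut self, idx: usize) {
        if self.slots[idx] == Slot::Stub {
            self.touched.insert(idx); // record the function for this test
            self.slots[idx] = Slot::Compiled; // patch to the cached code
        }
        // ...the real (or cached) jitted code would run here...
    }

    /// Point every slot back at its stub before the next test.
    pub fn reset(&mut self) {
        for s in &mut self.slots {
            *s = Slot::Stub;
        }
        self.touched.clear();
    }
}
```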
Opened bjorn3/rustc_codegen_cranelift#1113 for the instrumentation.
@bjorn3 I guess you can do it at the function level, but instrumenting code paths within each function provides much richer feedback: e.g. test coverage and failure paths
@eggyal True, but it also has much higher overhead. When trying to use cg_clif as a frontend for yorick a while ago, I had to instrument the start of each basic block. Even when the instrumentation function immediately returned without doing anything, the overhead was 15%, spent in the instrumentation function—and that doesn't account for any register-allocation pessimization. When the instrumentation was disabled at runtime using a global flag, the overhead was 30%, even after writing inline asm to prevent Cranelift from spilling the registers clobbered in the enabled case. Maybe the overhead could be improved by being smarter about where to add the instrumentation calls. One instrumentation call at the top of each function may be more doable.
Or maybe only instrument user code and keep all code in dependencies uninstrumented?
Thanks @bjorn3, that's definitely significant. However I'd observe from using Wallaby that, after the initial run of the full test suite, one rarely sees more than a few unit tests invoked by any given edit... and, being unit tests, execution typically is of the order of milliseconds—so even a 30% uplift may not be very material in practice?
@eggyal The overhead may not be very important in this case. Also thinking about it some more, the overhead can be significantly reduced in this case by simply having a global for every instrumentation point and writing directly to that global without doing a call each time.
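A sketch of what that direct-write instrumentation could look like at the source level (hypothetical names; cg_clif would emit the store directly into the generated code rather than call a helper): each instrumentation point owns one counter slot, and "firing" the point is a single add to a known location, with no call and no branches.

```rust
/// One counter per instrumentation point; instrumented code writes to its
/// slot directly instead of calling into a recording function.
pub struct Counters {
    pub hits: Vec<u64>,
}

impl Counters {
    pub fn new(points: usize) -> Self {
        Counters { hits: vec![0; points] }
    }
}

/// What the codegen would emit at instrumentation point `i`: a single
/// increment of a known global slot.
#[inline(always)]
pub fn hit(c: &mut Counters, i: usize) {
    c.hits[i] += 1;
}

/// Example instrumented function with two points: entry and the `if` arm.
pub fn instrumented_abs(c: &mut Counters, x: i64) -> i64 {
    hit(c, 0); // point 0: function entry
    if x < 0 {
        hit(c, 1); // point 1: negative branch
        return -x;
    }
    x
}
```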
I actually don't think we need any fancy techniques here
We can just re-use our `checkOnSave` infra, but using `cargo test` instead of `cargo check`
I think the most work here is getting the UI bits (which are not part of standard LSP, and as such would require a bunch of custom code)
@matklad that would rerun all the tests, right? We've been discussing how to only rerun changed tests.
But I agree an opt-in way to rerun everything is good in the meantime, since incremental retesting is a bit of a research project
The discussion converged to that, but we don't need that to implement the original feature request
Just revisiting this a bit... @matklad you mentioned most of the work would be on UI, as reporting test results is not in standard LSP, but VS does have Live Unit Testing for other languages, so I guess MS has an API for it even if it’s not public/open? Is there any way we could find out more, rather than (as you say) reinvent the wheel (which appears to be Wallaby’s approach)?
I think just thoroughly googling around and reading the sources of other extensions would do the trick
(lurking). If I understand this discussion right, VSCode has an issue for standardizing the test interface: https://github.com/microsoft/vscode/issues/107467
(it's part of the January 2021 iteration plan: https://github.com/microsoft/vscode/issues/112419)
My mind was just wandering over this once again, and on re-reading the thread I think it may be worth adding to what bjorn3 said:
> simply having a global for every instrumentation point and writing directly to that global without doing a call each time.
I believe this is exactly how `-Zinstrument-coverage` works, albeit that the counters are incremented via LLVM intrinsics that obviously aren't available in cg_clif; furthermore, that approach does not instrument every block, as the counts for many can be calculated from those of others (e.g. in `if ... else`, only one branch need be counted, as the other is simply the difference between the parent count and the counted branch). Might it be worth adding similar instrumentation intrinsics to cg_clif? That feels like something I could take a crack at.
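A tiny model of that counter-elision trick (hypothetical names; the real instrumentation builds counter expressions over a whole control-flow graph, not just one branch): only the function entry and the `then` arm carry physical counters, and the `else` arm's count is derived as their difference.

```rust
/// Two physical counters—one at function entry, one in the `then` arm.
/// The `else` arm carries no counter; its count is reconstructed as
/// entry - then, mirroring coverage counter-expression elision.
pub struct BranchCounters {
    pub entry: u64,
    pub then: u64,
}

impl BranchCounters {
    /// An instrumented `abs`-like function with one `if ... else`.
    pub fn run(&mut self, x: i32) -> i32 {
        self.entry += 1; // counter 0: function entry
        if x >= 0 {
            self.then += 1; // counter 1: then-arm
            x
        } else {
            // no counter here: this arm's count is derivable
            -x
        }
    }

    /// Derived count for the uninstrumented else arm.
    pub fn else_count(&self) -> u64 {
        self.entry - self.then
    }
}
```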
@eggyal For determining which functions are used, only per-function instrumentation is necessary, not per-block instrumentation. I do see value in full `-Zinstrument-coverage` support (maybe even compatible with LLVM), but for now I think per-function coverage should be easier. I am happy with a PR for either option.
Created https://github.com/rust-analyzer/rust-analyzer/issues/8420 to track this