Hi everyone, I hope you are all doing well. I am a computer science student interested in contributing to the Rust community. Specifically, my mentor (Will Crichton) and I would like to implement telemetry for
Similar work was done at Google, where they found that dependency-related bugs were both the largest share of all errors developers found (more than 40%) and the ones that took the longest to fix. Here is a link to the study if you would like to take a look: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42184.pdf
We are very interested in conducting a similar study for Rust. What are the most common errors? How long does it take to solve them? Etc.
I saw that telemetry was removed from rustup last year as part of a simplification effort (https://github.com/rust-lang/rustup/pull/1642/commits). The feature was never made visible for users, it didn't function correctly on Windows, and it only performed local analysis. I would love to bring it to rustc (fully-functional!) as we think this analysis would be very useful for language developers.
What do you think about this functionality? Where do you think it would be more appropriate to deploy this tool? I am considering adding it to rustc. What would you suggest?
I don't think we would want to include HTTP (or w/e) communication in the compiler itself, as even if it's "benign" people would likely not be too happy with that
We were also considering adding it to RLS, though there are rumors that it will be deprecated altogether
@simulacrum where do you think it would be more appropriate to add it?
rust-analyzer, the pseudo-successor, or RLS might be a better fit, but would be very much not a representative sampling, I think.
Do we have a sense of where the study by Google drew its data from?
I personally think that sticking it inside rustc as an optional thing that spits out data into something like .rustup, which we could ask users to upload as part of a survey, could work. E.g., land it today, let it accumulate over time, and then ask in the next survey that people who are using Rust give us the blob
They keep persistent logs of each build. "The results of all builds in Google’s cloud-based build system are saved in persistent logs describing the result of each build (succeeded or failed) and the errors produced by all compiles during the build. These logs are the main source of data for this study. We analyzed build logs to extract information about build sessions, success/failure ratios, and error messages."
hm, yeah, so they have a unique advantage in sampling internal data
That's what we had initially thought about! Adding it to rustc and asking users to opt in if they want to
Thanks for your comments, @simulacrum. I would love to hear the opinion of others as well! :-)
I think rustc might not be quite the right place -- e.g., rustup might be a bit better maybe -- mostly because it's better suited to this sort of end-user analysis I think
rustc would likely not want the baggage
on the other hand, we would likely want to omit the raw content of the spans -- I imagine people may be wary if they're working in a non-open source environment
that might be easier inside rustc
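To make the "omit the raw content of the spans" idea concrete, here is a minimal sketch of what a sanitising step could look like. It is only an illustration under stated assumptions: `Diagnostic`, `SanitizedRecord`, and `sanitize` are hypothetical names, not rustc's internal types or any existing API.

```rust
/// Hypothetical, simplified view of a compiler diagnostic.
/// This is NOT rustc's internal type; it only illustrates the idea.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Diagnostic {
    code: Option<String>,   // e.g. "E0308"
    level: String,          // e.g. "error" or "warning"
    message: String,        // may mention identifiers from user code
    span_text: Vec<String>, // raw source lines covered by the spans
}

/// What would be kept for analysis: no message text and no source text.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SanitizedRecord {
    code: Option<String>,
    level: String,
    span_count: usize,
}

/// Drop everything that could leak user code; keep only coarse metadata.
fn sanitize(diag: &Diagnostic) -> SanitizedRecord {
    SanitizedRecord {
        code: diag.code.clone(),
        level: diag.level.clone(),
        span_count: diag.span_text.len(),
    }
}

fn main() {
    let diag = Diagnostic {
        code: Some("E0308".to_string()),
        level: "error".to_string(),
        message: "mismatched types".to_string(),
        span_text: vec!["let x: u32 = secret_fn();".to_string()],
    };
    // The printed record contains the error code and level only.
    println!("{:?}", sanitize(&diag));
}
```

Keeping only the error code, the level, and a span count is one possible cut; whether even that much is acceptable in a non-open-source environment would still need to be decided.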
Can you explain a bit more what you mean when you say that rustc would likely not want the baggage?
I think that it'd require cargo and perhaps rustup to work together for this kind of work. E.g. rustc will have to, ultimately, produce the sanitised information necessary for analysis. cargo probably has to aggregate it, manage opt-in, etc. on a crate level; and then either rustup will end up responsible for uploading that to an analysis service
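The crate-level aggregation and opt-in step described above could be sketched roughly as follows. This is a sketch under assumptions, not how cargo works today: `CrateTelemetry`, `record`, and `export` are hypothetical names, and the line-based export format is made up for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-crate aggregate of sanitised error codes.
#[derive(Debug, Default)]
struct CrateTelemetry {
    opted_in: bool,
    counts: BTreeMap<String, u64>, // error code -> number of occurrences
}

impl CrateTelemetry {
    fn new(opted_in: bool) -> Self {
        CrateTelemetry { opted_in, counts: BTreeMap::new() }
    }

    /// Count one occurrence of an error code; a no-op without opt-in.
    fn record(&mut self, code: &str) {
        if !self.opted_in {
            return;
        }
        *self.counts.entry(code.to_string()).or_insert(0) += 1;
    }

    /// Produce the blob an uploader could send, or None when there is
    /// nothing to share (no opt-in, or no data recorded yet).
    fn export(&self) -> Option<String> {
        if !self.opted_in || self.counts.is_empty() {
            return None;
        }
        let lines: Vec<String> = self
            .counts
            .iter()
            .map(|(code, n)| format!("{} {}", code, n))
            .collect();
        Some(lines.join("\n"))
    }
}

fn main() {
    let mut telemetry = CrateTelemetry::new(true);
    telemetry.record("E0308");
    telemetry.record("E0308");
    telemetry.record("E0432");
    if let Some(blob) = telemetry.export() {
        println!("{}", blob);
    }
}
```

Nothing is collected or exported unless the crate has opted in, so the uploading side only ever sees data the user agreed to share.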