Stream: t-compiler/wg-diagnostics

Topic: Telemetry


Georgia Sampaio (Feb 20 2020 at 00:21, on Zulip):

Hi everyone, I hope you are all doing well. I am a computer science student interested in contributing to the Rust community. Specifically, my mentor (Will Crichton) and I would like to implement telemetry for rustc.

Similar work was done at Google, where they found that dependency-related errors were both the most common category of errors developers encountered (more than 40%) and the ones that took the longest to fix. Here is a link to the study if you would like to take a look: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42184.pdf

We are very interested in conducting a similar study for Rust. What are the most common errors? How long does it take to solve them? Etc.

I saw that telemetry was removed from rustup last year as part of a simplification effort (https://github.com/rust-lang/rustup/pull/1642/commits). The feature was never made visible to users, it didn't function correctly on Windows, and it only performed local analysis. I would love to bring it to rustc (fully functional!) as we think this analysis would be very useful for language developers.

What do you think about this functionality? Where do you think it would be more appropriate to deploy this tool? I am considering adding it to rustc. What would you suggest?

simulacrum (Feb 20 2020 at 00:24, on Zulip):

I don't think we would want to include HTTP (or w/e) communication in the compiler itself, as even if it's "benign" people would likely not be too happy with that

Georgia Sampaio (Feb 20 2020 at 00:26, on Zulip):

We were also considering adding it to RLS, though there seem to be rumors that it will be deprecated altogether

Georgia Sampaio (Feb 20 2020 at 00:28, on Zulip):

@simulacrum where do you think it would be more appropriate to add it?

simulacrum (Feb 20 2020 at 00:28, on Zulip):

rust-analyzer, the pseudo-successor, or RLS might be a better fit, but would be very much not a representative sampling, I think.

simulacrum (Feb 20 2020 at 00:29, on Zulip):

Do we have a sense of where the study by Google drew its data from?

simulacrum (Feb 20 2020 at 00:30, on Zulip):

I personally think that perhaps sticking it inside rustc as an optional thing to spit out into something like .rustup which we could ask users to upload as part of a survey could work. e.g., land it today, it accumulates over time, and then we ask in the next survey that people who are using Rust give us the blob
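
To make the "accumulates over time" idea concrete, here is a minimal sketch of an append-only local log, one record per line. The function name `append_record` and the line format are assumptions made for illustration; nothing like this exists in rustc or rustup today.

```rust
use std::io::{self, Write};

/// Append one telemetry record as a single line: "<unix_seconds> <error_code>".
/// Hypothetical format, chosen only to keep the sketch small.
pub fn append_record<W: Write>(
    out: &mut W,
    unix_seconds: u64,
    error_code: &str,
) -> io::Result<()> {
    // Only the timestamp and the error code are written; no source text.
    writeln!(out, "{} {}", unix_seconds, error_code)
}
```

In practice the writer could be a file opened with `std::fs::OpenOptions::new().create(true).append(true)`, so that each compiler run adds lines to the same blob until the user chooses to share it.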

Georgia Sampaio (Feb 20 2020 at 00:31, on Zulip):

They keep persistent logs of each build. "The results of all builds in Google’s cloud-based build system are saved in persistent logs describing the result of each build (succeeded or failed) and the errors produced by all compiles during the build. These logs are the main source of data for this study. We analyzed build logs to extract information about build sessions, success/failure ratios, and error messages."

simulacrum (Feb 20 2020 at 00:31, on Zulip):

hm, yeah, so they have a unique advantage in sampling internal data

Georgia Sampaio (Feb 20 2020 at 00:32, on Zulip):

That's what we had initially thought about! Adding it to rustc and asking users to opt-in if they want to

Georgia Sampaio (Feb 20 2020 at 00:34, on Zulip):

Thanks for your comments, @simulacrum. I would love to hear the opinion of others as well! :-)

simulacrum (Feb 20 2020 at 00:34, on Zulip):

I think rustc might not be quite the right place -- e.g., rustup might be a bit better maybe -- mostly because it's better suited to this sort of end-user analysis I think

simulacrum (Feb 20 2020 at 00:34, on Zulip):

rustc would likely not want the baggage

simulacrum (Feb 20 2020 at 00:40, on Zulip):

on the other hand, we would likely want to omit the raw content of the spans -- I imagine people may be wary if they're working in a non-open source environment

simulacrum (Feb 20 2020 at 00:41, on Zulip):

that might be easier inside rustc
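
A rough sketch of what omitting the raw span content could look like: the record keeps the error code, the level, and the size of the span, and drops the message, file path, and source snippet. All type and field names here are hypothetical and are not rustc's actual internals.

```rust
/// Hypothetical raw diagnostic, loosely modelled on what a compiler reports.
#[derive(Debug, Clone)]
pub struct RawDiagnostic {
    pub code: Option<String>, // e.g. "E0308"
    pub level: String,        // e.g. "error" or "warning"
    pub message: String,      // may mention user-defined identifiers
    pub file: String,         // path, possibly sensitive
    pub snippet: String,      // raw source text under the span
    pub line_start: u32,
    pub line_end: u32,
}

/// What would be kept for analysis: nothing derived from the user's source text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SanitizedDiagnostic {
    pub code: Option<String>,
    pub level: String,
    pub span_lines: u32,
}

/// Drop the message, path, and snippet; keep only coarse, non-identifying facts.
pub fn sanitize(d: &RawDiagnostic) -> SanitizedDiagnostic {
    SanitizedDiagnostic {
        code: d.code.clone(),
        level: d.level.clone(),
        // Number of lines the span covers, inclusive of both ends.
        span_lines: d.line_end.saturating_sub(d.line_start) + 1,
    }
}
```

Doing this step where the diagnostic is produced means the sensitive fields never have to leave the compiler process, which is one reading of why it might be easier inside rustc.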

Georgia Sampaio (Feb 20 2020 at 00:43, on Zulip):

Can you explain a bit more what you mean when you say that rustc would likely not want the baggage?

Daniel Silverstone (Feb 20 2020 at 07:54, on Zulip):

I think that it'd require rustc, cargo, and perhaps rustup to work together for this kind of work. E.g. rustc will have to, ultimately, produce the sanitised information necessary for analysis. cargo probably has to aggregate it, manage opt-in, etc. on a crate level; and then either cargo or rustup will end up responsible for uploading that to an analysis service
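
The division of labour described above could be sketched roughly as follows, assuming a per-crate aggregator that honours an opt-in flag and produces a blob for a later upload step. `CrateTelemetry` and its methods are hypothetical; neither cargo nor rustup exposes anything like this.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-crate aggregator, standing in for the role cargo might play.
#[derive(Debug, Default)]
pub struct CrateTelemetry {
    opted_in: bool,
    counts: BTreeMap<String, u64>, // error code -> number of occurrences
}

impl CrateTelemetry {
    pub fn new(opted_in: bool) -> Self {
        CrateTelemetry { opted_in, counts: BTreeMap::new() }
    }

    /// Record one sanitised error code; ignored entirely without opt-in.
    pub fn record(&mut self, code: &str) {
        if !self.opted_in {
            return;
        }
        *self.counts.entry(code.to_string()).or_insert(0) += 1;
    }

    /// Build the blob an uploader would send: one "<code> <count>" per line,
    /// sorted by code. Returns None if there is nothing to share.
    pub fn export(&self) -> Option<String> {
        if !self.opted_in || self.counts.is_empty() {
            return None;
        }
        let lines: Vec<String> = self
            .counts
            .iter()
            .map(|(code, n)| format!("{} {}", code, n))
            .collect();
        Some(lines.join("\n"))
    }
}
```

The upload itself is left out on purpose: whichever tool ends up responsible for it would only ever see the string returned by `export`.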

Last update: Apr 03 2020 at 17:20UTC