Stream: t-compiler

Topic: proposal: add cryptographic hashes to debuginfo #69718


nikomatsakis (Mar 19 2020 at 15:39, on Zulip):

Hey so @Ryan Levick mentioned to me that PR #69718 would be really useful for Microsoft folks trying to get Rust building internally. I don't know all the details but I'm keen to enable Microsoft to do more Rust experimentation.

The PR itself adds a compiler flag (presently a -Z flag, but presumably it would eventually become a -C flag) to include a cryptographic hash of the source files in debug info. This seems like a pretty useful feature -- I'm assuming it's done in some way that's compatible with various C compilers etc but I don't really know (see thoughts on a quick writeup below...)./

I meant to nominate this for the meeting for a brief discussion, but let me raise it here. We don't have a very clear process around "small ad-hoc features" of this kind, though we often add them with an FCP (I just opened compiler-team#261 to track #t-compiler/wg-meta with drafting a quick policy here).

In any case, the general procedure I think we should do -- and in specific what I want to do here -- is:

Any thoughts on the abve, particularly with respect to #69718?

nikomatsakis (Mar 19 2020 at 15:40, on Zulip):

( cc @Arlo Siemsen, the PR author )

eddyb (Mar 19 2020 at 15:44, on Zulip):

o/

nikomatsakis (Mar 19 2020 at 15:44, on Zulip):

(The PR itself seems fairly small.)

eddyb (Mar 19 2020 at 15:44, on Zulip):

I only saw this thread accidentally

eddyb (Mar 19 2020 at 15:44, on Zulip):

please ping me on all debuginfo stuff :P

nikomatsakis (Mar 19 2020 at 15:45, on Zulip):

Sorry! You'd have gotten pinged for the FCP, at least :)

eddyb (Mar 19 2020 at 15:45, on Zulip):

see #68980

centril (Mar 19 2020 at 15:46, on Zulip):

Questions:

simulacrum (Mar 19 2020 at 15:46, on Zulip):

One question I might have is whether it makes sense to add this as a flag vs. doing it by default -- I can't imagine a SHA256 hash or w/e would actually take that long (on source code), so probably wouldn't add too much to compile times, even.

eddyb (Mar 19 2020 at 15:47, on Zulip):

@centril it's just giving debuggers something we already have

centril (Mar 19 2020 at 15:47, on Zulip):

simulacrum said:

One question I might have is whether it makes sense to add this as a flag vs. doing it by default -- I can't imagine a SHA256 hash or w/e would actually take that long (on source code), so probably wouldn't add too much to compile times, even.

Would be good to measure though

eddyb (Mar 19 2020 at 15:47, on Zulip):

which is the ability to ignore a file if it has changed

eddyb (Mar 19 2020 at 15:47, on Zulip):

we do the same for cross-crate file loading

bjorn3 (Mar 19 2020 at 15:47, on Zulip):

For cg_clif this is an easy addition. I use gimli to emit arbitrary DWARF.

Wesley Wiser (Mar 19 2020 at 15:47, on Zulip):

:thumbs_up: from me. That process makes sense to me. The only thing I would add is that we should really be filling out the Rust unstable book as well especially for flags that we expect interested users besides the compiler team to be using.

On a side note, .Net compilers do this with their PDBs and it's very nice because if you forget to build and then try to debug a process, the debugger will immediately highlight that things are weird because the source file doesn't match the code running in the process.

centril (Mar 19 2020 at 15:48, on Zulip):

I also think the process itself seems sensible

Wesley Wiser (Mar 19 2020 at 15:49, on Zulip):

I think the Microsoft symbol server stuff might also rely on these signatures but I'm not sure.

centril (Mar 19 2020 at 15:49, on Zulip):

@eddyb ok so there's no exposure wrt. stability and lang evolution in other words; if so, then I have no concerns

nikomatsakis (Mar 19 2020 at 15:53, on Zulip):

(One question -- this write-up I describe is really a kind of mini-RFC. I wonder where they should live. Probably asking folks to open a real RFC is bit heavyweight, although I think in principle it would be nice. But storing the write-up in a gist that is linked from the tracking issue feels unsatisfying, as the gist could be deleted etc. They could be stored into the compiler-team repo, or inline on tracking issue, or -- perhaps -- in a section at the end of the rustc-dev-guide? This isn't really blocking the PR per se, but I think it'd be nice to have more of a record of "why was this flag added" that we can find later)

centril (Mar 19 2020 at 15:54, on Zulip):

there's also the unstable book as a possible candidate

nikomatsakis (Mar 19 2020 at 15:55, on Zulip):

I view the unstable book as "end user facing",

nikomatsakis (Mar 19 2020 at 15:55, on Zulip):

so I expect it to have docs on how to use the feature more than design notes,

nikomatsakis (Mar 19 2020 at 15:55, on Zulip):

but also when things are stabilized it gets removed...

nikomatsakis (Mar 19 2020 at 15:55, on Zulip):

but yeah, might be a place to collect them I guess

nikomatsakis (Mar 19 2020 at 15:56, on Zulip):

@eddyb I assigned the PR to you since you seemed "on it"

centril (Mar 19 2020 at 15:56, on Zulip):

@nikomatsakis rustc guide is probably the place for internal design notes then

nikomatsakis (Mar 19 2020 at 15:56, on Zulip):

Yeah, I think so too

nikomatsakis (Mar 19 2020 at 15:56, on Zulip):

I think we can have a section

eddyb (Mar 19 2020 at 15:56, on Zulip):

sounds good, left comments on the parts I care about

centril (Mar 19 2020 at 15:57, on Zulip):

though the line is blurry; some of that info would probably be useful for end users too

eddyb (Mar 19 2020 at 15:57, on Zulip):

i.e. the redundancy with src_hash

eddyb (Mar 19 2020 at 15:57, on Zulip):

I'd rather us not hash source files more than once

nikomatsakis (Mar 19 2020 at 15:57, on Zulip):

centril said:

though the line is blurry; some of that info would probably be useful for end users too

my assumption is we'll extract that out into the unstable book

eddyb (Mar 19 2020 at 15:58, on Zulip):

tbh I don't even know if we need a flag

centril (Mar 19 2020 at 15:58, on Zulip):

:+1: -- makes sense

eddyb (Mar 19 2020 at 15:58, on Zulip):

we can just switch src_hash to MD5 and do all of this without an opt-in

nikomatsakis (Mar 19 2020 at 15:58, on Zulip):

eddyb said:

tbh I don't even know if we need a flag

(I'm ok with making it the default if you think that's the right call)

eddyb (Mar 19 2020 at 15:58, on Zulip):

DWARF is very extension-friendly so no reasonable tools would break from the extra info being there

nikomatsakis (Mar 19 2020 at 15:59, on Zulip):

though I think it doesn't change much process-wise, apart from it being "insta stable"

eddyb (Mar 19 2020 at 15:59, on Zulip):

(as in, if you don't recognize a property, you just ignore it)

centril (Mar 19 2020 at 15:59, on Zulip):

@eddyb maybe give it a bit of time in nightly before making it the default?

centril (Mar 19 2020 at 15:59, on Zulip):

just to let it bake a bit

nikomatsakis (Mar 19 2020 at 15:59, on Zulip):

iow I would still want a short write-up

eddyb (Mar 19 2020 at 15:59, on Zulip):

like, IMO this is a bugfix

nikomatsakis (Mar 19 2020 at 15:59, on Zulip):

but yeah that'st he other option, add a flag, and then decide later to make it the default

eddyb (Mar 19 2020 at 16:00, on Zulip):

@centril hmm we could have a flag that toggles whether it's put into the debuginfo but still compute src_hash via MD5

nikomatsakis (Mar 19 2020 at 16:00, on Zulip):

eddyb said:

like, IMO this is a bugfix

say a bit more?

eddyb (Mar 19 2020 at 16:00, on Zulip):

I'll suggest that on the PR

centril (Mar 19 2020 at 16:00, on Zulip):

@nikomatsakis agree; I think this is key; atm it feels like flags are being added a bit left and center and there's no "order"

eddyb (Mar 19 2020 at 16:00, on Zulip):

@nikomatsakis as in we could say that it's a bug we weren't emitting hashes

nikomatsakis (Mar 19 2020 at 16:00, on Zulip):

I'd like to know why you view it as a bugfix

nikomatsakis (Mar 19 2020 at 16:01, on Zulip):

OK. It's not like it's required in some sense

nikomatsakis (Mar 19 2020 at 16:01, on Zulip):

but it's a relatively minor bit of extra info to include, you're saying

eddyb (Mar 19 2020 at 16:01, on Zulip):

not having it can cause debuggers to show the wrong snippet

nikomatsakis (Mar 19 2020 at 16:01, on Zulip):

how so?

nikomatsakis (Mar 19 2020 at 16:01, on Zulip):

because they don't realize source file changed?

eddyb (Mar 19 2020 at 16:02, on Zulip):

for the reason reason we already hash the source file and don't show snippets in diagnostics if it changed

nikomatsakis (Mar 19 2020 at 16:02, on Zulip):

(from when it was compiled, I mean)

eddyb (Mar 19 2020 at 16:02, on Zulip):

yeah, there is no way to check short of a hash

nikomatsakis (Mar 19 2020 at 16:05, on Zulip):

OK. I'm starting to see your perspective.

nikomatsakis (Mar 19 2020 at 16:05, on Zulip):

It does feel like an "externally visible commitment" though

eddyb (Mar 19 2020 at 16:05, on Zulip):

eh we don't guarantee what's in our debuginfo

eddyb (Mar 19 2020 at 16:06, on Zulip):

it's always been a best-effort type of thing

nikomatsakis (Mar 19 2020 at 16:06, on Zulip):

That's true.

nikomatsakis (Mar 19 2020 at 16:06, on Zulip):

Regardless, I'm fairly persuaded we don't need a flag for it

nikomatsakis (Mar 19 2020 at 16:07, on Zulip):

The reason I said it's a "kind of" commitment is that I think it's important to MS tooling -- so we wouldn't want to remove it. But that's ok, that's all the more reason to do it of course.

nikomatsakis (Mar 19 2020 at 16:07, on Zulip):

(I don't really know any details, could be wrong)

eddyb (Mar 19 2020 at 16:07, on Zulip):

anyway left a comment with the simplified version I had in mind

nikomatsakis (Mar 19 2020 at 16:09, on Zulip):

I guess the only purpose for the flag would be to help people choose between MD5/SHA1, and you're (I think) saying "MD5 is good enough"

eddyb (Mar 19 2020 at 16:10, on Zulip):

but also if DWARF doesn't support SHA1... that's kind of bad

eddyb (Mar 19 2020 at 16:11, on Zulip):

worst case we always compute both and use the best one supported, unless it's a perf drawback

eddyb (Mar 19 2020 at 16:11, on Zulip):

or we make the target spec choose

pnkfelix (Mar 19 2020 at 16:13, on Zulip):

hey @nikomatsakis , why wouldn't we use the major change proposal for this process?

pnkfelix (Mar 19 2020 at 16:13, on Zulip):

is that too heavyweight?

nikomatsakis (Mar 19 2020 at 16:13, on Zulip):

I was wondering that

nikomatsakis (Mar 19 2020 at 16:13, on Zulip):

that reminds me that @Santiago Pastorino and I kind of sketched out some details and started an RFC for that

nikomatsakis (Mar 19 2020 at 16:13, on Zulip):

but we didn't quite finish it

centril (Mar 19 2020 at 16:13, on Zulip):

isn't major change primarily about well... a major change to the codebase?

nikomatsakis (Mar 19 2020 at 16:13, on Zulip):

it might be a good choice indeed

nikomatsakis (Mar 19 2020 at 16:13, on Zulip):

it is in some sense a bit lightweight

centril (Mar 19 2020 at 16:13, on Zulip):

as opposed to e.g., a feature that isn't very invasive codebase wise

nikomatsakis (Mar 19 2020 at 16:13, on Zulip):

in that we didn't require a full FCP

pnkfelix (Mar 19 2020 at 16:14, on Zulip):

I think in some ways, having a smaller number of processes, even if some are not perfect fits, is better than a plethora of processes to choose from

nikomatsakis (Mar 19 2020 at 16:14, on Zulip):

but then again

nikomatsakis (Mar 19 2020 at 16:14, on Zulip):

as long as the compiler flag is untsable, that's probably fine

nikomatsakis (Mar 19 2020 at 16:14, on Zulip):

I would only want a full FCP for something irreversible

nikomatsakis (Mar 19 2020 at 16:14, on Zulip):

pnkfelix said:

I think in some ways, having a smaller number of processes, even if some are not perfect fits, is better than a plethora of processes to choose from

agreed

centril (Mar 19 2020 at 16:14, on Zulip):

at least when I think of major change proposals I think of "changes to architecture"

nikomatsakis (Mar 19 2020 at 16:14, on Zulip):

it just requires extending the definition of "major change" to include "adding a new compiler flag" :)

centril (Mar 19 2020 at 16:15, on Zulip):

heh :slight_smile:

Santiago Pastorino (Mar 19 2020 at 16:24, on Zulip):

nikomatsakis said:

that reminds me that Santiago Pastorino and I kind of sketched out some details and started an RFC for that

did I? :P

Santiago Pastorino (Mar 19 2020 at 16:24, on Zulip):

ahh now read it, about the major changes proposal, ok ok

Arlo Siemsen (Mar 19 2020 at 18:48, on Zulip):

I have a short write up about the feature. https://gist.github.com/arlosi/27162cfbd4bfc9bfabc25bfbe4777160
Is the next step for this to create a PR to add it to the rustc guide? Should be in the rustc guide before it is implemented?

Wesley Wiser (Mar 19 2020 at 18:55, on Zulip):

I'd say to go ahead and open the PR for the rustc guide even if the implementation hasn't landed yet. We can wait to merge the rustc guide pr until the implementation is merged.

Arlo Siemsen (Mar 19 2020 at 19:57, on Zulip):

PR is up https://github.com/rust-lang/rustc-dev-guide/pull/623

nikomatsakis (Mar 24 2020 at 16:34, on Zulip):

@Arlo Siemsen thanks for the write-up! That's great

Arlo Siemsen (Mar 24 2020 at 16:35, on Zulip):

Glad it helps. Let me know if there's more we need to add. Since we're looking at making it a target option (rather than a compiler option), should I still open a tracking issue?

nikomatsakis (Mar 24 2020 at 16:36, on Zulip):

@eddyb convinced me that this is a relatively small change,

nikomatsakis (Mar 24 2020 at 16:36, on Zulip):

but reading your write-up did make me wonder exactly how it should be exposed

nikomatsakis (Mar 24 2020 at 16:36, on Zulip):

in particular, just always doing MD5 (like clang) is I guess not recommended

nikomatsakis (Mar 24 2020 at 16:37, on Zulip):

I guess a target option seems fine, are those "insta stable"?

nikomatsakis (Mar 24 2020 at 16:37, on Zulip):

I guess they probably are, I'm not sure that we have a notion of a "nightly only" target option.

Arlo Siemsen (Mar 24 2020 at 16:37, on Zulip):

Yes, I think that would make it insta stable.

nikomatsakis (Mar 24 2020 at 16:37, on Zulip):

I guess I think "target option" makes sense if -- on the various targets -- there is an obvious "correct choice" based on the surrounding tooling

nikomatsakis (Mar 24 2020 at 16:38, on Zulip):

it seems like most people just won't care what kind of hash is used

nikomatsakis (Mar 24 2020 at 16:38, on Zulip):

as long as their debugger supports it

Arlo Siemsen (Mar 24 2020 at 16:38, on Zulip):

We really want to use SHA256 for PDBs, however the support for SHA256 isn't available until LLVM 11.

Arlo Siemsen (Mar 24 2020 at 16:38, on Zulip):

I agree, most people are going to be fine with anything supported by the debugger.

Arlo Siemsen (Mar 24 2020 at 16:39, on Zulip):

The thought with making it a target option was to choose the best hash available for the given target.

nikomatsakis (Mar 24 2020 at 16:41, on Zulip):

OK. I could see an argument for including both, even. i.e., have the target info supply the default, but give the ability to override.

nikomatsakis (Mar 24 2020 at 16:42, on Zulip):

That seems like it'd be easy enough to do?

nikomatsakis (Mar 24 2020 at 16:42, on Zulip):

But I guess the question is whether people would ever actually want to override. I'm fine with just doing target info to start.

Arlo Siemsen (Mar 24 2020 at 16:42, on Zulip):

Yes, I think that would be easy enough. I'm not sure people would want to override it.

nikomatsakis (Mar 24 2020 at 16:43, on Zulip):

That's an irreversible decision, though, so I think it'd be great to add just a note or two about in the write-up, and we can do a quick FCP.

nikomatsakis (Mar 24 2020 at 16:43, on Zulip):

I'll nominate it also for discussion in the meeting.

Arlo Siemsen (Mar 24 2020 at 16:43, on Zulip):

Sounds good. I can add it to the write up PR.

nikomatsakis (Mar 24 2020 at 16:44, on Zulip):

nikomatsakis said:

That's an irreversible decision, though, so I think it'd be great to add just a note or two about in the write-up, and we can do a quick FCP.

is it? I could be wrong, but I'm assuming people can make their own target options.

nikomatsakis (Mar 24 2020 at 16:44, on Zulip):

Thanks!

nikomatsakis (Mar 24 2020 at 16:44, on Zulip):

Ping me when it's done and I'll go kick off the FCP

Arlo Siemsen (Mar 24 2020 at 16:44, on Zulip):

Yes, I think you can create your own target file via JSON.

Wesley Wiser (Mar 24 2020 at 16:48, on Zulip):

https://doc.rust-lang.org/rustc/targets/custom.html

eddyb (Mar 24 2020 at 17:11, on Zulip):

can we avoid stabilization by making it not parse from JSON?

eddyb (Mar 24 2020 at 17:11, on Zulip):

so it's internal-only?

eddyb (Mar 24 2020 at 17:11, on Zulip):

(I guess that could mean we can just have a method that returns the best supported hash format)

eddyb (Mar 24 2020 at 17:11, on Zulip):

(and not let the target spec control it)

Arlo Siemsen (Mar 24 2020 at 17:16, on Zulip):

If we make it a real target option, it has to be parsed through JSON, since even the hard-coded targets round-trip through JSON.

eddyb (Mar 24 2020 at 17:16, on Zulip):

I guess that's an insurance policy against target spec bugs

Arlo Siemsen (Mar 24 2020 at 17:18, on Zulip):

To avoid stabilization, we could key off is_like_msvcin the target I suppose.

eddyb (Mar 24 2020 at 17:18, on Zulip):

that's the only situation in which we emit PDBs, right?

eddyb (Mar 24 2020 at 17:18, on Zulip):

what does LLVM even do if you give it e.g. SHA1 for a DWARF target?

Arlo Siemsen (Mar 24 2020 at 17:18, on Zulip):

It drops it.

Arlo Siemsen (Mar 24 2020 at 17:19, on Zulip):

And you get no hash.

eddyb (Mar 24 2020 at 17:19, on Zulip):

yikes

Arlo Siemsen (Mar 24 2020 at 17:19, on Zulip):

I think MSVC is the only place we emit PDBs.

eddyb (Mar 24 2020 at 17:19, on Zulip):

I bet there's no way to check through LLVM's API whether a hash is supported on a target

Arlo Siemsen (Mar 24 2020 at 17:20, on Zulip):

Not that I know of.

eddyb (Mar 24 2020 at 17:20, on Zulip):

anyway, yeah, is_like_msvc could be okay for now if we don't want to add an insta-stable option name

Arlo Siemsen (Mar 24 2020 at 17:20, on Zulip):

@nikomatsakis I updated https://github.com/rust-lang/rustc-dev-guide/pull/623

nikomatsakis (Mar 24 2020 at 17:53, on Zulip):

Heh so I think it'd be fine to just start emitting hashes always using our own "best judgement" about when to do so

nikomatsakis (Mar 24 2020 at 18:00, on Zulip):

OK, so, current plan is to just pick a default and add a -Z flag?

nikomatsakis (Mar 24 2020 at 18:00, on Zulip):

seems ok to me

nikomatsakis (Mar 24 2020 at 18:01, on Zulip):

I'd prob go w/ the original plan of making a stabilization issue and note down the options we considered. I don't care so much whether we use MCP or FCP, although technically there is a "user facing" change (we're adding the basic feature of including hashing in debuginfo), but I think it's a pretty minor thing and if people are really concerned we have plenty of time before stable.

nikomatsakis (Mar 24 2020 at 18:01, on Zulip):

So just mentioning it in the meeting probably suffices

nikomatsakis (Mar 24 2020 at 18:02, on Zulip):

I imagine sometime later if we ever get requests to customize per platform or to expose the flag we can figure out later the best way to do it

nikomatsakis (Mar 24 2020 at 18:02, on Zulip):

probably we never will

Arlo Siemsen (Mar 24 2020 at 18:24, on Zulip):

Ok, to avoid insta-stabilizing anything, I'll update the PR to:

eddyb (Mar 30 2020 at 17:56, on Zulip):

@Arlo Siemsen sorry I didn't take a look at the PR earlier, I put it off for a day or two and then it ended up on the second page of GH notifications :(

eddyb (Mar 30 2020 at 17:57, on Zulip):

I need to start unwatching PRs that I don't need to be involved in, just so it's easier to keep track of the ones where I should be involved

Arlo Siemsen (Mar 31 2020 at 17:59, on Zulip):

No problem. I re-merged the PR and squashed all the commits. Let me know if there's anything else we need.

eddyb (Apr 05 2020 at 15:58, on Zulip):

(I had the hubris to try and rename the topic but Zulip is fighting me, this is an attempt to move the two messages before my first message in the topic)

eddyb (Apr 05 2020 at 16:00, on Zulip):

@nikomatsakis so it looks like "move previous and following messages to this topic" doesn't move other people's messages, this means you'd have to move your own messages to "proposal: add source file hashes to debuginfo #69718" :(

Last update: Jun 04 2020 at 16:15UTC