Stream: t-compiler/wg-incr-comp

Topic: Traceability of bugs


view this post on Zulip Félix Fischer (Jul 14 2020 at 16:27):

So it seems so far that we need some better way to spot bugs in incr-comp. Let's talk about ideas here?

view this post on Zulip Jonas Schievink [he/him] (Jul 14 2020 at 16:28):

Somehow preserving the cache on ICE is IMO the highest-impact change. People often wipe it accidentally (which sometimes just needs another edit).

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:29):

Idea: if ICE was triggered, store a copy of the cache so that it can be sent for inspection.
Drawback: space usage
Possible mitigations: discarding MIR and llvm bitcode before storing. Compressing the cache (in gzip or similar)

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:32):

Idea: store the history of changes together with the cache.
Justification: it's basically a cross-platform way of tracking the incr-comp work.
Drawbacks: you'd need to store the project's codebase somehow.
Mitigations: ??

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:32):

(I put ?? There because I don't remember if we mentioned drawbacks for that idea)

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:33):

@Jonas Schievink yeah, I agree. Backing up the cache that broke incr-comp seems the most straightforward way of adding traceability to the process.

view this post on Zulip Jonas Schievink [he/him] (Jul 14 2020 at 16:33):

We only need a single diff

view this post on Zulip Jonas Schievink [he/him] (Jul 14 2020 at 16:36):

Or rather, we need the original code, and the diff, which I don't think we can obtain right now, unless we store the whole crate source code in the cache

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:36):

Yes

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:37):

I forgot that @pnkfelix's work is still in progress. They're looking to answer the question "could incr-comp benefit from changes that might make it history-dependent or not?"

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:38):

Or rather, we need the original code, and the diff, which I don't think we can obtain right now, unless we store the whole crate source code in the cache

You're right!

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:39):

I've added it to the entry

view this post on Zulip Jonas Schievink [he/him] (Jul 14 2020 at 16:43):

What's currently making the cache platform-dependent? More than just endianness and pointer size (which is little and 64-bit in almost all cases)?

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:44):

@Wesley Wiser @davidtwco ^

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:45):

(deleted)

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:45):

I ask them because I have no idea

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:45):

It seems to be a bit of a rabbit hole, from what I've heard. But I don't know for certain.

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:45):

(deleted)

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:45):

(deleted)

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:46):

(Wait did Zulip actually ping you three times, or is it just me who's seeing this?)

view this post on Zulip Jonas Schievink [he/him] (Jul 14 2020 at 16:46):

The message was duplicated 3 times

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:46):

Fuck

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:47):

(There, better... damn, this app needs some work. Okay, sorry for that. Zulip mobile seems to stumble when there's bad internet)

view this post on Zulip davidtwco (Jul 14 2020 at 16:48):

I'm not sure what else would contribute to incr-comp data being platform-dependent.

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:49):

Same (at least in terms of being able to read the data on a different platform)

view this post on Zulip davidtwco (Jul 14 2020 at 16:50):

I wonder if we could mark platform-dependent queries as such and have the triple be an implicit argument of some sort so that regular invalidation is all that is necessary - might be enough to debug some incr-comp bugs from other platforms?

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:50):

Oh, cool! Then I misunderstood what was exactly not cross-platform. Maybe the post-MIR steps?

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:50):

But I would be happy with just having the data, platform dependent or not.

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:50):

Well

view this post on Zulip davidtwco (Jul 14 2020 at 16:50):

That said, maybe it already works something like that - I should understand more about it before speculating.

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:51):

What's the goal of making it platform independent?

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:51):

Is this just so we can investigate bugs or so we can actually use the data in the cache?

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:51):

The former seems useful the latter not so much IMO

view this post on Zulip davidtwco (Jul 14 2020 at 16:51):

I assume the former.

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:51):

I think the former

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:52):

Ok, just making sure we're all talking about the same things

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:52):

In that case, I'm not sure it matters that the results of some queries are platform dependent as long as the on-disk format isn't (or at least, has well defined semantics for conversion).

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:53):

Yas

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:53):

But even then, I'd prefer to steer the direction of this project toward just getting the data as it is now first.

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:53):

As long as you can convert it to something one can debug later, I'm happy with it being platform-dependent

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:54):

We often have a tendency to come up with the best long term solution and then block incremental improvements on that long term plan. When there's a lot of benefit to be had to the flawed but better thing right now

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:54):

But even then, I'd prefer to steer the direction of this project toward just getting the data as it is now first.

I've got no complaints with that either ^^

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:54):

Just having this data would improve things substantially IMO

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:55):

It's also not that hard to get a Windows VM on Linux or vice versa.

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:55):

Yup. If I understand it correctly, we basically got no way to get data on our hands rn regarding these kinds of bugs right?

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:55):

And that's probably covers 75% of the bugs reports we've seen.

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:56):

That would be amazing

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:56):

I think occasionally we do get a copy of the user's incr-cache if they knew to upload it to the bug.

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:56):

But I'm not sure we know what to do with that.

view this post on Zulip Félix Fischer (Jul 14 2020 at 16:58):

Right, right. No, that makes sense. If we haven't gone through the process of answering what to do with the evidence, then how would we use it when we have it?

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:58):

Perhaps someone else has a plan with what to do with it but I'm not aware of anything

view this post on Zulip Wesley Wiser (Jul 14 2020 at 16:59):

It seems like there could perhaps be some sort of validation process we could just run on the data to see if some obvious things are wrong. Ie: this slice claims to have a length of 6billion elements. That's probably wrong.

view this post on Zulip Wesley Wiser (Jul 14 2020 at 17:00):

But then if have such an algorithm, it seems like we could run this at ICE time (what I suggested above).

view this post on Zulip Wesley Wiser (Jul 14 2020 at 17:00):

I realize I'm steering the conversation one way so if anyone wants to talk about something else, please feel free to interrupt me

view this post on Zulip Félix Fischer (Jul 14 2020 at 17:01):

It's okay

view this post on Zulip Jonas Schievink [he/him] (Jul 14 2020 at 17:01):

So to reproduce bugs, we don't just need the incremental cache, but also the source code that was attempted to be compiled, and even that is hard to come by sometimes

view this post on Zulip Wesley Wiser (Jul 14 2020 at 17:01):

The number one question on my mind is: Is the cache getting corrupted somehow or is the cache valid and we have bug when we use that valid data?

view this post on Zulip Jonas Schievink [he/him] (Jul 14 2020 at 17:01):

But I guess easy to save when the ICE handler also saves the cache

view this post on Zulip Félix Fischer (Jul 14 2020 at 17:01):

This is a very green field, I feel. There might be many directions to pursue which could give big benefits

view this post on Zulip pnkfelix (Jul 14 2020 at 17:01):

Jonas Schievink said:

What's currently making the cache platform-dependent? More than just endianness and pointer size (which is little and 64-bit in almost all cases)?

there might be file paths that would need. to be normalized/relativized. Not sure.

view this post on Zulip Félix Fischer (Jul 14 2020 at 17:03):

Well, a checker of the cache could be really useful not only just at the event of an ICE. It could also be used to debug incr-cache. Imagine giving it as a compiler flag (-check-incr-cache) and run it through some tests to see if it breaks the cache.

view this post on Zulip davidtwco (Jul 14 2020 at 17:03):

It just occured to me that MCVEs aren't particularly practical to synthesise for incr-comp bugs - we'd probably be working with full projects (and tons more log output) rather than individual files..

view this post on Zulip Jonas Schievink [he/him] (Jul 14 2020 at 17:04):

(I'm not 100% convinced that having ICE-free incr. comp. is much more important than having effective incr. comp. – when an ICE happens the workaround is always trivial, but when incr. comp. results in unexpectedly long compile times there is no workaround)

view this post on Zulip Félix Fischer (Jul 14 2020 at 17:05):

Well, more than ICE free, you want it to be correct

view this post on Zulip Félix Fischer (Jul 14 2020 at 17:05):

ICE is just an indicator of some incorrectness (that word feels weird)

view this post on Zulip Jonas Schievink [he/him] (Jul 14 2020 at 17:05):

Yeah, sure, that is important, but I'm not sure if we have any correctness bugs?

view this post on Zulip Félix Fischer (Jul 14 2020 at 17:06):

How would we know?

view this post on Zulip pnkfelix (Jul 14 2020 at 17:06):

more that they're really hard to detect

view this post on Zulip Wesley Wiser (Jul 14 2020 at 17:06):

Is the cache getting corrupted somehow or is the cache valid and we have bug when we use that valid data?

If the cache is corrupt, then we're either:

If the cache is valid, then maybe the bug has something to do with how we restore cached data into the current compilation session's in-memory query cache.

view this post on Zulip Jonas Schievink [he/him] (Jul 14 2020 at 17:06):

Ah, not soundness bug

view this post on Zulip Jonas Schievink [he/him] (Jul 14 2020 at 17:07):

There isn't any kind of checksum on the cache data, right?

view this post on Zulip Wesley Wiser (Jul 14 2020 at 17:07):

That we're seeing these ICEs trying to use the incr-cache data after it's been loaded suggests we have correctness bugs because the ICE's aren't from asserts about the incr-cache, they're from general asserts in std lib (ie, index out of range).

view this post on Zulip Wesley Wiser (Jul 14 2020 at 17:07):

I thought there was

view this post on Zulip Wesley Wiser (Jul 14 2020 at 17:07):

Which is why it's perplexing to me that some of these issues get reported after a new release is shipped and clearing the cache goes away

view this post on Zulip Wesley Wiser (Jul 14 2020 at 17:07):

That's what the checksum was supposed to prevent.

view this post on Zulip Wesley Wiser (Jul 14 2020 at 17:08):

Probably related: I see frequent complaints from compiler devs that their stage1 cache isn't getting invalidated correctly.

view this post on Zulip Wesley Wiser (Jul 14 2020 at 17:08):

And then you get weird linking errors

view this post on Zulip Jonas Schievink [he/him] (Jul 14 2020 at 17:08):

You mean having the compiler version in the cache?

view this post on Zulip Wesley Wiser (Jul 14 2020 at 17:08):

Yeah I thought that was part of the checksum

view this post on Zulip Jonas Schievink [he/him] (Jul 14 2020 at 17:08):

I think that is separate

view this post on Zulip Wesley Wiser (Jul 14 2020 at 17:08):

Ah

view this post on Zulip Félix Fischer (Jul 14 2020 at 17:09):

That we're seeing these ICEs trying to use the incr-cache data after it's been loaded suggests we have correctness bugs because the ICE's aren't from asserts about the incr-cache, they're from general asserts in std lib (ie, index out of range).

This is why these issues worry me, mostly. If incr-comp is breaking basic invariants ("hey NO bringing usizes below zero!") then what the heck might be happening there.

view this post on Zulip Félix Fischer (Jul 14 2020 at 17:10):

I think we want to... just make sure incr-comp sits on steady foundations, mostly

view this post on Zulip Félix Fischer (Jul 14 2020 at 17:10):

Just in case

view this post on Zulip Félix Fischer (Jul 14 2020 at 17:12):

I don't want to have incr-comp be fast only to later realize it ain't compiling correctly. Like some -O3 optimizations in clang breaking your code

view this post on Zulip Félix Fischer (Jul 14 2020 at 17:13):

I don't know if these worries make sense. Can you break the compilation artifacts if incr-comp is not working correctly? I'm assuming you can, but maybe you can't?

view this post on Zulip pnkfelix (Jul 14 2020 at 17:13):

things like that are an argument, in a way, against adding history-dependence

view this post on Zulip pnkfelix (Jul 14 2020 at 17:13):

because bugs are more shallow when results are history independent

view this post on Zulip pnkfelix (Jul 14 2020 at 17:14):

(because the search for bugs is more effectively distributed across the audience of Rust programmers)

view this post on Zulip Santiago Pastorino (Jul 15 2020 at 22:55):

something kind of related to the discussion, I have seen some incr comp bug reports where we don't have the needed information to reproduce the issue but we don't provide the right information for people that report bugs to provide such information, so ...

view this post on Zulip Santiago Pastorino (Jul 15 2020 at 22:56):

I wonder if we can document something better or if we can have some kind of ICE breaker group that we can ping and a bot would ping some of us but also would provide some information to reporters

view this post on Zulip Santiago Pastorino (Jul 15 2020 at 22:57):

the problem I see also is that some reports are old and they may not be easy to reproduce because reporters may have move on


Last updated: Oct 21 2021 at 20:03 UTC