Stream: t-compiler

Topic: fuzzing the compiler


nikomatsakis (Oct 30 2019 at 13:03, on Zulip):

So @Siavosh Zarrasvand opened compiler-team#212, a proposal to fuzz the rust compiler and stdlib:

Creating a OSS Fuzz project for Rust Compiler and stdlib compiler-team#212

I'm opening this topic to discuss the idea a bit. Some of the questions I wanted to ask:

pnkfelix (Oct 30 2019 at 13:05, on Zulip):

Hmm. I hadn't thought of this; it does seem like a potential headache.

pnkfelix (Oct 30 2019 at 13:05, on Zulip):

we don't really have any great methodology in place for identifying duplicate bugs

pnkfelix (Oct 30 2019 at 13:05, on Zulip):

(and to be fair, I don't even know what the state of the art is in that area; as far as I know it's still open research to correlate such things...)

simulacrum (Oct 30 2019 at 13:07, on Zulip):

It seems non-useful, i.e., if we expected soundness bugs to be uncovered, then that seems useful

simulacrum (Oct 30 2019 at 13:08, on Zulip):

but if it's just finding ICEs -- well, we have 268 of those filed right now :)

simulacrum (Oct 30 2019 at 13:08, on Zulip):

so finding more doesn't seem too useful

pnkfelix (Oct 30 2019 at 13:08, on Zulip):

we can certainly categorize which kinds of bugs are worth filing

pnkfelix (Oct 30 2019 at 13:10, on Zulip):

also: an ICE exposed solely by incremental, that has a reproducible test case, is useful

pnkfelix (Oct 30 2019 at 13:10, on Zulip):

that is, I think we have a number of ICEs filed that are associated with incremental, but the bug filers often do not have a reproducible test case on hand, because they are not certain why it started failing, and cargo clean makes the bug go away

pnkfelix (Oct 30 2019 at 13:11, on Zulip):

and those bugs are often essentially non-actionable.

pnkfelix (Oct 30 2019 at 13:11, on Zulip):

So my claim is that fuzzing might provide a way to get easy to reproduce bugs in incremental-compilation

simulacrum (Oct 30 2019 at 13:12, on Zulip):

oh, yes, definitely true

pnkfelix (Oct 30 2019 at 13:13, on Zulip):

(also, it would just give me more faith that we are exercising more corner cases for incremental compilation testing. Right now I am scared to close the bugs that do not have reproducible test cases, because I don't want to lose the record that a latent bug is still believed to be present ...)

simulacrum (Oct 30 2019 at 13:13, on Zulip):

my impression was that fuzzing is not testing incremental? If we can do that, then amazing!

pnkfelix (Oct 30 2019 at 13:13, on Zulip):

My idea that I was saying to Niko was to use fuzzing as a way to test incremental

centril (Oct 30 2019 at 13:13, on Zulip):

Fuzzing is nice, but it's a fairly blunt instrument; sometime we should start using proper property based testing as well

simulacrum (Oct 30 2019 at 13:13, on Zulip):

one crazy thought: run the ui test suite in incremental mode with a single directory

pnkfelix (Oct 30 2019 at 13:13, on Zulip):

i.e. compile, change a bit, compile incrementally, then clean, then compile from scratch

pnkfelix (Oct 30 2019 at 13:14, on Zulip):

and compare the latter two compilation sessions

pnkfelix (Oct 30 2019 at 13:14, on Zulip):

because, ideally, they would have the same diagnostic output, if any
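
The compile / change / recompile / clean / recompile loop described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Mode`, `Verdict`, `check_incremental`, and the `compile` callback are hypothetical names invented here, and `compile` stands in for "invoke the compiler and capture its diagnostics", which is deliberately left abstract.

```rust
/// How the (hypothetical) compiler driver is asked to run.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    /// Reuse whatever is already in the incremental cache.
    Incremental,
    /// Discard the cache first, then compile from scratch.
    Clean,
}

/// Result of comparing the incremental session with the from-scratch one.
#[derive(Debug, PartialEq)]
enum Verdict {
    Consistent,
    Mismatch { incremental: String, clean: String },
}

/// Compiles `original`, then `mutated` incrementally, then `mutated` from
/// scratch, and compares the diagnostics of the last two sessions.
///
/// `compile` is an assumption of this sketch: it takes a source text and a
/// mode and returns the captured diagnostic output.
fn check_incremental<F>(original: &str, mutated: &str, mut compile: F) -> Verdict
where
    F: FnMut(&str, Mode) -> String,
{
    // Session 1: populate the incremental cache; its output is not compared.
    let _ = compile(original, Mode::Clean);
    // Session 2: compile the changed source on top of that cache.
    let incremental = compile(mutated, Mode::Incremental);
    // Session 3: throw the cache away and compile the same source again.
    let clean = compile(mutated, Mode::Clean);

    if incremental == clean {
        Verdict::Consistent
    } else {
        Verdict::Mismatch { incremental, clean }
    }
}
```

Any `Mismatch` comes with both inputs in hand, so it is a reproducible report by construction; a fuzzer would only need to supply the `original`/`mutated` pairs.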

simulacrum (Oct 30 2019 at 13:14, on Zulip):

yes, that's true

simulacrum (Oct 30 2019 at 13:14, on Zulip):

Fuzzing is nice, but it's a fairly blunt instrument; sometime we should start using proper property based testing as well

I am a bit confused -- these seem pretty equivalent to me? In the sense that our property is "equivalent whether using incremental or not"

pnkfelix (Oct 30 2019 at 13:14, on Zulip):

one crazy thought: run the ui test suite in incremental mode with a single directory

can you elaborate on this? Like, every test input would use the same shared incremental store?

simulacrum (Oct 30 2019 at 13:15, on Zulip):

Right, yeah

simulacrum (Oct 30 2019 at 13:15, on Zulip):

obviously this would mean -j1 or so

simulacrum (Oct 30 2019 at 13:15, on Zulip):

but we would expect that incremental does not "hide" anything

pnkfelix (Oct 30 2019 at 13:15, on Zulip):

sure. Is part of the idea that many of our tests have similar filenames etc ?

pnkfelix (Oct 30 2019 at 13:15, on Zulip):

(i'm just trying to understand what corner case this is expecting to tickle)

simulacrum (Oct 30 2019 at 13:16, on Zulip):

oh, I was thinking we rename them to foo.rs or w/e, not different names

simulacrum (Oct 30 2019 at 13:16, on Zulip):

this is just an easy corpus for which we have pre-existing output

pnkfelix (Oct 30 2019 at 13:16, on Zulip):

gotcha

centril (Oct 30 2019 at 13:17, on Zulip):

@simulacrum With property based testing in terms of quickcheck and proptest you define how to generate data types and how to shrink them yourself

simulacrum (Oct 30 2019 at 13:17, on Zulip):

hm well okay, I guess "better" in some sense, but in our case I have no _good_ idea how one would do that. I guess maybe taking wg-grammar's grammar and generate output based on that

centril (Oct 30 2019 at 13:18, on Zulip):

basically yes, for some syntax -- but you can also property test some smaller parts of the compiler or just the standard library

pnkfelix (Oct 30 2019 at 13:19, on Zulip):

I agree that proptesting is good

centril (Oct 30 2019 at 13:19, on Zulip):

one can also generate ASTs directly and not take a syntax detour

pnkfelix (Oct 30 2019 at 13:19, on Zulip):

but the simplicity of fuzzing does make it an attractive option, just in terms of having its infrastructure be super super simple

pnkfelix (Oct 30 2019 at 13:19, on Zulip):

(and thus easy for contributors to onboard)

centril (Oct 30 2019 at 13:20, on Zulip):

@pnkfelix I think they can be complementary

pnkfelix (Oct 30 2019 at 13:20, on Zulip):

I agree with the complementary nature

centril (Oct 30 2019 at 13:20, on Zulip):

and they need not stand in each other's way

nikomatsakis (Oct 30 2019 at 13:22, on Zulip):

(it's worth noting that the proposal also describes libstd; although that's not our "bailiwick", it could definitely be useful)

centril (Oct 30 2019 at 13:23, on Zulip):

(I think t-compiler should be empowered to improve libstd when it doesn't change the spec; much like t-compiler can improve the language when it doesn't change the spec)

pnkfelix (Oct 30 2019 at 13:25, on Zulip):

... anyone's welcome to contribute changes (to libstd). (or at least propose PRs thereof.) If you're talking about having power to approve changes, I don't see the need nor want the responsibility

centril (Oct 30 2019 at 13:27, on Zulip):

@pnkfelix if you have r+ you can even add new unstable APIs (with T-libs blessing) if you think it's reasonable and small

centril (Oct 30 2019 at 13:27, on Zulip):

(anyways... different subject)

nikomatsakis (Oct 30 2019 at 13:28, on Zulip):

(I think t-compiler should be empowered to improve libstd when it doesn't change the spec; much like t-compiler can improve the language when it doesn't change the spec)

obviously. I'm just saying that maybe part of the answer is: let's try this out, but focused on libs

nikomatsakis (Oct 30 2019 at 13:28, on Zulip):

though I personally think the incremental fuzzing is a good idea

nikomatsakis (Oct 30 2019 at 13:29, on Zulip):

I'm not sure, maybe @Siavosh Zarrasvand can clarify, what sort of fuzzing they had in mind -- i.e., we don't have to necessarily just mutate random bytes, we can do more structured changes

nikomatsakis (Oct 30 2019 at 13:29, on Zulip):

for example, we might try renaming variables

centril (Oct 30 2019 at 13:29, on Zulip):

@nikomatsakis I think that's a great idea; running fuzzing & PBT on the standard library should be way easier

nikomatsakis (Oct 30 2019 at 13:29, on Zulip):

or other things that we expect to be "perfect" in some sense

centril (Oct 30 2019 at 13:30, on Zulip):

especially for PBT, you can define some small properties for some algorithmic parts like iterators
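
To make "define how to generate and how to shrink" concrete, here is a hand-rolled sketch of a property check over an iterator property. It intentionally does not use the quickcheck or proptest APIs; `Rng`, `gen_vec`, `shrink`, and `check` are hypothetical helpers written only for this illustration, and the generator and shrinker are far cruder than what those crates provide.

```rust
/// Tiny deterministic PRNG (xorshift64) so the sketch needs no external crate.
/// The seed must be non-zero, otherwise the state stays at zero.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Generator: a vector of 0 to 15 integers in the range -100..=100.
fn gen_vec(rng: &mut Rng) -> Vec<i32> {
    let len = (rng.next_u64() % 16) as usize;
    (0..len).map(|_| (rng.next_u64() % 201) as i32 - 100).collect()
}

/// Shrinker: every vector obtained by removing exactly one element.
fn shrink(v: &[i32]) -> Vec<Vec<i32>> {
    (0..v.len())
        .map(|i| {
            let mut smaller = v.to_vec();
            smaller.remove(i);
            smaller
        })
        .collect()
}

/// Runs `prop` on up to `cases` generated inputs and returns a shrunk
/// counterexample for the first failure, or `None` if every case passes.
fn check<P: Fn(&[i32]) -> bool>(seed: u64, cases: usize, prop: P) -> Option<Vec<i32>> {
    let mut rng = Rng(seed);
    for _ in 0..cases {
        let mut failing = gen_vec(&mut rng);
        if prop(failing.as_slice()) {
            continue;
        }
        // Greedily move to a smaller input for as long as one still fails.
        while let Some(smaller) = shrink(&failing)
            .into_iter()
            .find(|c| !prop(c.as_slice()))
        {
            failing = smaller;
        }
        return Some(failing);
    }
    None
}

/// Property: reversing an iterator twice yields the original sequence.
fn rev_rev_is_identity(v: &[i32]) -> bool {
    v.iter().rev().rev().eq(v.iter())
}

/// Deliberately false property, used only to show shrinking at work.
fn all_non_negative(v: &[i32]) -> bool {
    v.iter().all(|&x| x >= 0)
}
```

Calling `check(seed, 200, rev_rev_is_identity)` is expected to return `None`, while the deliberately false `all_non_negative` property shrinks down to a vector holding a single negative number.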

centril (Oct 30 2019 at 13:30, on Zulip):

Maybe worth asking GHC what they do -- I have a feeling they use a lot of QuickCheck for stuff

Siavosh Zarrasvand (Oct 30 2019 at 13:34, on Zulip):

@nikomatsakis Yes, especially letting the fuzzer alter bits on a working seed input file will probably never allow it to get past the lexer.
Fuzzers can usually be programmed to alter only specific parts of the input file using string matching. We could start with very basic main.rs files and fuzz the syntax initially (see this approach: https://github.com/vegard/prog-fuzz) and increase coverage to std library modules long term. I was hoping to start very small and build incrementally.

Regarding bug concerns, the initial bugs on the compiler will have near 0 severity. They will crash rustc with input that it doesn't know how to lex. Later on in the std libraries, issues could have some severity...
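
As a rough illustration of a structured change such as the variable renaming suggested earlier, as opposed to flipping random bytes, the sketch below rewrites one identifier throughout a source text. `rename_ident` and its helpers are hypothetical names for this example, and the approach is purely lexical: it assumes ASCII identifiers and knows nothing about string literals, comments, scopes, or shadowing, so a real mutator would need a proper lexer or the AST.

```rust
/// Returns true for characters that can appear inside an ASCII identifier.
fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Appends the pending word to `out`, substituting `to` when the whole word
/// equals `from`, and then clears the word buffer.
fn flush(out: &mut String, word: &mut String, from: &str, to: &str) {
    if !word.is_empty() && word.as_str() == from {
        out.push_str(to);
    } else {
        out.push_str(word.as_str());
    }
    word.clear();
}

/// Replaces every whole-word occurrence of `from` with `to`.
/// Longer identifiers that merely contain `from` are left untouched.
fn rename_ident(source: &str, from: &str, to: &str) -> String {
    let mut out = String::with_capacity(source.len());
    let mut word = String::new();
    for c in source.chars() {
        if is_ident_char(c) {
            // Still inside a word: keep accumulating.
            word.push(c);
        } else {
            // Word boundary: emit the finished word, then the separator.
            flush(&mut out, &mut word, from, to);
            out.push(c);
        }
    }
    // Emit a trailing word, if the source ends inside one.
    flush(&mut out, &mut word, from, to);
    out
}
```

For instance, `rename_ident("let x = x + xs[0];", "x", "renamed")` yields `let renamed = renamed + xs[0];`. A mutation like this should leave the program's meaning intact when the new name is fresh, which makes any change in compiler behaviour between the two versions suspicious.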

Last update: Nov 16 2019 at 01:50 UTC