Stream: t-lang/wg-unsafe-code-guidelines

Topic: Unsafe code reviews?


Lokathor (Jun 21 2019 at 19:59, on Zulip):

Is this a place where people should post unsafe code that they want to have inspected for validity? Or is there some other place that's better?

Jake Goulding (Jun 21 2019 at 20:21, on Zulip):

I can’t say that this is the goal of this stream, but I do use it a bit like that. Generally my hope is to provide some interesting “real world” test cases to the UCG group. They would then take them and make actual guidelines and produce a beautiful book like thing, and then I just read that and run Miri and never bother anyone.

We aren’t quite at the last step yet… :wink:

RalfJ (Jun 21 2019 at 20:23, on Zulip):

Yes, if you have some question that you can reduce to a small example of unsafe code, this stream is a good place to ask.
That's not the same thing as code review though. Speaking just for myself, I cannot provide review-as-a-service, that's just too time consuming.

Shnatsel (Jun 23 2019 at 17:30, on Zulip):

I'm trying to review very popular crates with unsafe with them right now, so cannot help here. If I were you I'd post to http://reddit.com/r/rust/ - that usually does the trick when I need input on something

Shnatsel (Jun 23 2019 at 17:45, on Zulip):

I've been looking at DEFLATE decoders and the situation there is quite sad. In libflate(7500 downloads/day) almost every single unsafe block I looked at was unsound, and I've inspected less than half of the crate. Issues that still aren't fixed:
https://github.com/sile/libflate/issues/31
https://github.com/sile/libflate/issues/33
Then there is miniz_oxide(6000 downloads/day) which I've just started reviewing, but the second unsafe block I looked at had buffer overflow on write in it: https://github.com/Frommi/miniz_oxide/pull/47 (fix still unreleased)
Plus there is something much more interesting going on with a segfault on panic (https://github.com/Frommi/miniz_oxide/issues/14) and also some type confusion issues that are beyond me (https://github.com/Frommi/miniz_oxide/pull/36)
And this is the code that's exposed to untrusted input from the network through reqwest :disappointed:

Shnatsel (Jun 23 2019 at 18:37, on Zulip):

The only silver lining is that there is a DEFLATE decoding crate called inflate that is slower and less popular, but has only one unsafe block, and we've already found and fixed the vulnerability in that one. Plus I have an RFC open that will make its only bespoke unsafe block redundant:
https://github.com/rust-lang/rfcs/pull/2714

Lokathor (Jun 23 2019 at 19:08, on Zulip):

@Shnatsel I was actually gearing up to make a core-only Deflate lib (which I need for my core-only PNG parser lib), so perhaps you can provide guidance in that area once I read the spec closely enough and have something to show. Unfortunately I've never done Huffman stuff before so it's a slow start at the moment, but Soon(tm).

RalfJ (Jun 23 2019 at 19:10, on Zulip):

@Shnatsel oh wow that's pretty sad :(

RalfJ (Jun 23 2019 at 19:12, on Zulip):

do you have any ideas for how to avoid unsoundness from creeping back in after the blocks you looked at got fixed? You seem to push for being able to express more things in safe code without a performance hit, but what about cases where that is not possible?

Shnatsel (Jun 23 2019 at 19:14, on Zulip):

Actually, the C implementation of miniz only ever used the stack, Rust impl then put some things in Box<&[u8]> because the compiler was not great at optimizing out stack copies. Perhaps you will be able to adapt miniz_oxide, it's easier than starting from scratch.
I don't think I can offer much help on DEFLATE specifically because I was only ever doing the following:
1. Fuzz the crate through its public API
2. Once all the bugs found by fuzzing are fixed, grep for unsafe and try to check if it's sound.
3. If nothing obviously wrong turns up, try to refactor it into safe Rust without losing performance.
So I have only a very cursory idea of what DEFLATE does. I know it has RLE in it which is impossible to implement safely and efficiently right now (hence my RFC to fix that), but that's about it.

Shnatsel (Jun 23 2019 at 19:25, on Zulip):

@RalfJ open Rust RFCs to make it possible.
What other options do we have... well, fuzzing with ASAN or running fuzz corpus through MIRI on CI helps, but is not a silver bullet because those are dynamic analyzers, they need an input that would trigger the bug.
Having a sound security-oriented static analyzer would be rad, but that's what borrow checker is. Also Prusti, but that also only works with safe code. I doubt people would bother with something like KLEE, and I don't think such a tool even exists for Rust at this point.
Heuristic static analyzers could help but are also not a silver bullet.
So... I guess open Rust RFCs to make it possible is actually the best option. Followed by putting the code in a crate if it's too niche and then reusing that one crate, and hoping that somewhere, somebody has audited it. cargo-crev might help in that regard.

Shnatsel (Jun 23 2019 at 19:26, on Zulip):

So basically my approach is "write and audit once" over "everyone rolls their own unsafe", and I have a hard time imagining a case where it's flat out impossible. Can you provide some examples?

RalfJ (Jun 23 2019 at 19:35, on Zulip):

I was mostly wondering if you had encountered such an example :) You have certainly seen much more unsafe code in the wild than I did.

RalfJ (Jun 23 2019 at 19:36, on Zulip):

(and btw, if you could leave some advise/notes/lessons learned at https://github.com/rust-lang/unsafe-code-guidelines/issues/146, that would be great)

RalfJ (Jun 23 2019 at 19:37, on Zulip):

In terms of what to do, beyond tooling I was also thinking of better docs/books/tutorials/whatever. Do you feel the issues arise because people don't understand what they are supposed to do? That's a point I am worried about in particular for the more subtle things, like how to handle uninitialized memory. Though the first PR you linked actually had a TODO: audit this in there, so in that case documentation probably was not the problem. ;)

Shnatsel (Jun 23 2019 at 19:47, on Zulip):

OOooh hey look, a use-after-free in smallvec: https://github.com/servo/rust-smallvec/issues/148

Shnatsel (Jun 23 2019 at 19:48, on Zulip):

/facepalm

Shnatsel (Jun 23 2019 at 19:57, on Zulip):

Aaand another one: https://github.com/servo/rust-smallvec/issues/149
This one looks non-Rust-specific, just general faulty logic.

Shnatsel (Jun 23 2019 at 20:15, on Zulip):

Sorry, had an "Ooh, shiny!" moment. Back to your question.
So far the majority of bugs that I've seen are pretty basic logic bugs in unsafe code, like the smallvec issues above. See also https://rustsec.org/advisories/ - that too lists mostly basic stuff like logic errors in unsafe code.
The only one I can think of that is trickier is https://github.com/servo/rust-smallvec/issues/96 - this is an interaction between unsafe code, iterators and panics. But it is pretty hard to trigger in practice.
However, I am incapable of detecting such bugs, so no wonder I've never found any them. It seems likely other people also have a much better understanding of manipulating the heap with raw pointers in a loop than of the invariants that iterators need to uphold and how that might cause a double-free for non-Copy types. So I'd wager that such bugs are simply undetected instead of nonexistent.
Although I see a lot less unsafe code dealing with iterators than with plain old memory regions.

Shnatsel (Jun 23 2019 at 20:17, on Zulip):

I feel that a guide on interacting with iterators from unsafe code in a sound fashion would be helpful, along with a testsuite of "evil" iterators that trigger the non-obvious edge-cases. For example, I have only learned about the set-len-on-drop trick from the SmallVec bug; I don't think there are any docs on that, although I admit I didn't read the Nomicon too closely.

Shnatsel (Jun 23 2019 at 20:19, on Zulip):

A testsuite of evil iterators actually sounds like a great idea. Say, a panicking iterator over an non-Copy type (potential double-free), an iterator that lies about its length (potential out-of-bounds), and probably a lot more evil stuff that I'm not aware of.

Shnatsel (Jun 23 2019 at 20:19, on Zulip):

Also, it's not common knowledge that passing a buffer of uninitialized memory to read(&mut self, buf: &mut [u8])provided by an arbitrary Read implementation is actually UB. But this is a pretty clearly missing abstraction - Vec-like view of a fixed-capacity buffer. There is even a crate to solve that, see buffer, but nobody knows about it and so nobody uses it.

Shnatsel (Jun 23 2019 at 20:20, on Zulip):

So in more general terms, I think the clarity on what invariants various non-trivial Rust facilities are supposed to uphold would be great, especially if they come with a testsuite exercising unexpected but valid edge cases.

RalfJ (Jun 23 2019 at 21:20, on Zulip):

Thanks a lot, that's useful feedback!

RalfJ (Jun 23 2019 at 21:20, on Zulip):

it's not common knowledge that passing a buffer of uninitialized memory to read(&mut self, buf: &mut [u8])provided by an arbitrary Read implementation is actually UB. But this is a pretty clearly missing abstraction - Vec-like view of a fixed-capacity buffer. There is even a crate to solve that, see buffer, but nobody knows about it and so nobody uses it.

Isn't the missing abstraction here basically out pointers, related to placement-new? That has a loooong history though...

Shnatsel (Jun 23 2019 at 21:23, on Zulip):

https://crates.io/crates/buffer seems good enough to me. This is like a Vec over a fixed-size backing allocation.

RalfJ (Jun 23 2019 at 21:29, on Zulip):

doesn't look very maintained though...

RalfJ (Jun 23 2019 at 21:31, on Zulip):

it also seems to do a lot of things like create references to uninitialized data and the like that is not actually allowed...

Shnatsel (Jun 23 2019 at 21:31, on Zulip):

Yeah frankly I expected something like that when somebody said "hey look I made a data structure", but never had the time to look

Shnatsel (Jun 23 2019 at 21:33, on Zulip):

But! We have a (hopefully) sound prototype that works for Vec only: https://github.com/WanzenBug/rust-fixed-capacity-vec

Shnatsel (Jun 23 2019 at 21:35, on Zulip):

This lets you temporarily view Vec with a frozen capacity. If you try to trigger reallocation it panics. This allows two interesting things:
1. Appending to Vec while parts of it are borrowed
2. Solves the "output reference" problem by being a fixed-capacity buffer that does not expose uninitialized data for reading

Shnatsel (Jun 23 2019 at 21:35, on Zulip):

I like the API for buffer crate a lot more though. Mostly because it's not restricted to Vec

Shnatsel (Jun 23 2019 at 21:35, on Zulip):

Oh and please report a bug on buffer if it does something illegal

Shnatsel (Jun 23 2019 at 21:36, on Zulip):

maybe I should start using cargo-crev because I have so much "don't use this!" feedback to share with people lately

Shnatsel (Jun 23 2019 at 21:41, on Zulip):

Not sure what kind of unsafe code I can contribute to https://github.com/rust-lang/unsafe-code-guidelines/issues/146 that you wouldn't find by grepping crates.io, so replied to it with some sample bugs in unsafe code instead.

Shnatsel (Jun 23 2019 at 22:05, on Zulip):

Oh. I got confused by the "Canvas" term. I get it now - you want to know why people choose to use unsafe. Yeah, I've been wondering about that too - and actually contemplated starting a "libs blitz"-like effort to map it, where possible convert it to safe, where impossible understand why. But now that I've looked at DEFLATE decoders I've realized that safe abstractions for almost everything are already present in stdlib, but were added so recently that for crates with any Rust version compatibility guarantees at all it's impossible to use them. For example, to_le_bytes requires 1.32 while using from_le_bytes on a slice also needs TryFrom which requires is 1.34
So my current thinking is to let things settle down, wait for debian stable to ship 1.34 and then incite The Great Unsafe Purge. This should be happening sometime this year.

Lokathor (Jun 24 2019 at 00:06, on Zulip):

I would say that my main uses of unsafe are: (1) FFI, recently specifically SDL2 bindings that I wrote myself because the commonly used sdl2 crate isn't good enough (definitely has leaks, might be unsound?), (2) GBA programming where Rust just doesn't offer safe abstraction (because it honestly can't, no fault here), (3) building an abstraction in Rust for a normal computing situation because the core/std lib just doesn't offer what I want.

Case 3 can't be avoided because the standard lib specifically chooses to keep out code and let crates try it out until a clear winning style is hit upon.

RalfJ (Jun 24 2019 at 07:52, on Zulip):

Oh and please report a bug on buffer if it does something illegal

well it's one of those "we haven't specified it as legal yet and would like to keep the option of declaring it illegal -- but it won't cause any issues in current compiler versions"

Shnatsel (Jun 24 2019 at 20:01, on Zulip):

@Lokathor https://github.com/image-rs/inflate this actually has core-only mode, and it's already used as the DEFLATE implementation in the png crate. This implementation is also safe to the best of my knowledge - we've already found and fixed the bug in its only unsafe block. Its only problem is that it's slow, but perhaps you'll be able to optimize it. It's definitely a better use of your time than writing the fourth DEFLATE decompressor.

RalfJ (Jun 24 2019 at 20:02, on Zulip):

we've already found and fixed the bug in its only unsafe block

so we have a limit of one bug per block? ;)

Lokathor (Jun 24 2019 at 20:05, on Zulip):

I don't see a way to make it no_std in that crate. Either way, I'm going to do it myself at some point and if it's only for educational purposes that's fine.

Shnatsel (Jun 24 2019 at 20:15, on Zulip):

Ugh, "the" is wrong here. Sorry, English is not my native tongue. "A bug" would be more appropriate. But yeah, there could be more vulns lurking in there.

Shnatsel (Jun 24 2019 at 20:16, on Zulip):

It says:

Finally, if you need even more flexibility, or if you only want to depend on core, you can use the inflate::InflateStream API. The below example decodes an array of DEFLATE encoded bytes:

but IDK, maybe they need alloc too.

Lokathor (Jun 24 2019 at 20:20, on Zulip):

separate from that, there's no feature for the crate to be no_std, so it'll pull in std just by that.

RalfJ (Jun 24 2019 at 21:00, on Zulip):

@Shnatsel you mentioned seeing several occurrences of read being called in a generic way with an uninitialized buffer... here's my attempt to improve the docs to call this out as illegal: https://github.com/rust-lang/rust/pull/62102

Shnatsel (Jun 24 2019 at 21:03, on Zulip):

At a glance it is not obvious to me who the "user of a trait" is. How about "it is responsibility of the code calling this function" ... to initialize the buffer?

Shnatsel (Jun 24 2019 at 21:09, on Zulip):

Actually, I've just gone ahead and suggested a replacement for that entire paragraph. But I'm slightly sleep-deprived so don't 100% trust my judgment.

Shnatsel (Jun 24 2019 at 21:48, on Zulip):

I've added zero-initialization to growing a Vec and that made the entire decoding process 5% faster! What kind of sorcery is this?! https://github.com/sile/libflate/pull/34/files

Shnatsel (Jun 24 2019 at 21:50, on Zulip):

I know vec![len; 0]; actually requests zeroed memory from the OS, but in here initializing the vector with zeroes is NOT supposed to be faster than simply writing to a slice of uninitialized memory!

simulacrum (Jun 24 2019 at 21:52, on Zulip):

er, vec![0; len] :)

Shnatsel (Jun 24 2019 at 21:54, on Zulip):

Tests still pass, I've checked

Shnatsel (Jun 24 2019 at 22:02, on Zulip):

Yup, it produces the exact same output, but 5% faster. I'm glad, but... WHY?!

nagisa (Jun 24 2019 at 22:07, on Zulip):

LLVM may be able to optimise better with knowledge that the buffer is 0-ed rather than undef by e.g. not being too careful about reading uninitialized data when doing certain operations and simd-ing everything together

Last update: Nov 19 2019 at 18:55UTC