Is this a place where people should post unsafe code that they want to have inspected for validity? Or is there some other place that's better?
I can’t say that this is the goal of this stream, but I do use it a bit like that. Generally my hope is to provide some interesting “real world” test cases to the UCG group. They would then take them and make actual guidelines and produce a beautiful book like thing, and then I just read that and run Miri and never bother anyone.
We aren’t quite at the last step yet… :wink:
Yes, if you have some question that you can reduce to a small example of unsafe code, this stream is a good place to ask.
That's not the same thing as code review though. Speaking just for myself, I cannot provide review-as-a-service, that's just too time consuming.
I'm trying to review very popular crates with
unsafe with them right now, so cannot help here. If I were you I'd post to http://reddit.com/r/rust/ - that usually does the trick when I need input on something
I've been looking at DEFLATE decoders and the situation there is quite sad. In
libflate(7500 downloads/day) almost every single unsafe block I looked at was unsound, and I've inspected less than half of the crate. Issues that still aren't fixed:
Then there is
miniz_oxide(6000 downloads/day) which I've just started reviewing, but the second unsafe block I looked at had buffer overflow on write in it: https://github.com/Frommi/miniz_oxide/pull/47 (fix still unreleased)
Plus there is something much more interesting going on with a segfault on panic (https://github.com/Frommi/miniz_oxide/issues/14) and also some type confusion issues that are beyond me (https://github.com/Frommi/miniz_oxide/pull/36)
And this is the code that's exposed to untrusted input from the network through
The only silver lining is that there is a DEFLATE decoding crate called
inflate that is slower and less popular, but has only one unsafe block, and we've already found and fixed the vulnerability in that one. Plus I have an RFC open that will make its only bespoke unsafe block redundant:
@Shnatsel I was actually gearing up to make a core-only Deflate lib (which I need for my core-only PNG parser lib), so perhaps you can provide guidance in that area once I read the spec closely enough and have something to show. Unfortunately I've never done Huffman stuff before so it's a slow start at the moment, but Soon(tm).
@Shnatsel oh wow that's pretty sad :(
do you have any ideas for how to avoid unsoundness from creeping back in after the blocks you looked at got fixed? You seem to push for being able to express more things in safe code without a performance hit, but what about cases where that is not possible?
Actually, the C implementation of miniz only ever used the stack, Rust impl then put some things in
Box<&[u8]> because the compiler was not great at optimizing out stack copies. Perhaps you will be able to adapt miniz_oxide, it's easier than starting from scratch.
I don't think I can offer much help on DEFLATE specifically because I was only ever doing the following:
1. Fuzz the crate through its public API
2. Once all the bugs found by fuzzing are fixed, grep for
unsafe and try to check if it's sound.
3. If nothing obviously wrong turns up, try to refactor it into safe Rust without losing performance.
So I have only a very cursory idea of what DEFLATE does. I know it has RLE in it which is impossible to implement safely and efficiently right now (hence my RFC to fix that), but that's about it.
@RalfJ open Rust RFCs to make it possible.
What other options do we have... well, fuzzing with ASAN or running fuzz corpus through MIRI on CI helps, but is not a silver bullet because those are dynamic analyzers, they need an input that would trigger the bug.
Having a sound security-oriented static analyzer would be rad, but that's what borrow checker is. Also Prusti, but that also only works with safe code. I doubt people would bother with something like KLEE, and I don't think such a tool even exists for Rust at this point.
Heuristic static analyzers could help but are also not a silver bullet.
So... I guess open Rust RFCs to make it possible is actually the best option. Followed by putting the code in a crate if it's too niche and then reusing that one crate, and hoping that somewhere, somebody has audited it.
cargo-crev might help in that regard.
So basically my approach is "write and audit once" over "everyone rolls their own
unsafe", and I have a hard time imagining a case where it's flat out impossible. Can you provide some examples?
I was mostly wondering if you had encountered such an example :) You have certainly seen much more unsafe code in the wild than I did.
(and btw, if you could leave some advise/notes/lessons learned at https://github.com/rust-lang/unsafe-code-guidelines/issues/146, that would be great)
In terms of what to do, beyond tooling I was also thinking of better docs/books/tutorials/whatever. Do you feel the issues arise because people don't understand what they are supposed to do? That's a point I am worried about in particular for the more subtle things, like how to handle uninitialized memory. Though the first PR you linked actually had a
TODO: audit this in there, so in that case documentation probably was not the problem. ;)
OOooh hey look, a use-after-free in
Aaand another one: https://github.com/servo/rust-smallvec/issues/149
This one looks non-Rust-specific, just general faulty logic.
Sorry, had an "Ooh, shiny!" moment. Back to your question.
So far the majority of bugs that I've seen are pretty basic logic bugs in unsafe code, like the smallvec issues above. See also https://rustsec.org/advisories/ - that too lists mostly basic stuff like logic errors in unsafe code.
The only one I can think of that is trickier is https://github.com/servo/rust-smallvec/issues/96 - this is an interaction between unsafe code, iterators and panics. But it is pretty hard to trigger in practice.
However, I am incapable of detecting such bugs, so no wonder I've never found any them. It seems likely other people also have a much better understanding of manipulating the heap with raw pointers in a loop than of the invariants that iterators need to uphold and how that might cause a double-free for non-Copy types. So I'd wager that such bugs are simply undetected instead of nonexistent.
Although I see a lot less unsafe code dealing with iterators than with plain old memory regions.
I feel that a guide on interacting with iterators from unsafe code in a sound fashion would be helpful, along with a testsuite of "evil" iterators that trigger the non-obvious edge-cases. For example, I have only learned about the set-len-on-drop trick from the SmallVec bug; I don't think there are any docs on that, although I admit I didn't read the Nomicon too closely.
A testsuite of evil iterators actually sounds like a great idea. Say, a panicking iterator over an non-Copy type (potential double-free), an iterator that lies about its length (potential out-of-bounds), and probably a lot more evil stuff that I'm not aware of.
Also, it's not common knowledge that passing a buffer of uninitialized memory to
read(&mut self, buf: &mut [u8])provided by an arbitrary
Read implementation is actually UB. But this is a pretty clearly missing abstraction - Vec-like view of a fixed-capacity buffer. There is even a crate to solve that, see
buffer, but nobody knows about it and so nobody uses it.
So in more general terms, I think the clarity on what invariants various non-trivial Rust facilities are supposed to uphold would be great, especially if they come with a testsuite exercising unexpected but valid edge cases.
Thanks a lot, that's useful feedback!
it's not common knowledge that passing a buffer of uninitialized memory to read(&mut self, buf: &mut [u8])provided by an arbitrary Read implementation is actually UB. But this is a pretty clearly missing abstraction - Vec-like view of a fixed-capacity buffer. There is even a crate to solve that, see buffer, but nobody knows about it and so nobody uses it.
Isn't the missing abstraction here basically out pointers, related to placement-new? That has a loooong history though...
https://crates.io/crates/buffer seems good enough to me. This is like a Vec over a fixed-size backing allocation.
doesn't look very maintained though...
it also seems to do a lot of things like create references to uninitialized data and the like that is not actually allowed...
Yeah frankly I expected something like that when somebody said "hey look I made a data structure", but never had the time to look
But! We have a (hopefully) sound prototype that works for Vec only: https://github.com/WanzenBug/rust-fixed-capacity-vec
This lets you temporarily view Vec with a frozen capacity. If you try to trigger reallocation it panics. This allows two interesting things:
1. Appending to Vec while parts of it are borrowed
2. Solves the "output reference" problem by being a fixed-capacity buffer that does not expose uninitialized data for reading
I like the API for
buffer crate a lot more though. Mostly because it's not restricted to Vec
Oh and please report a bug on
buffer if it does something illegal
maybe I should start using
cargo-crev because I have so much "don't use this!" feedback to share with people lately
Not sure what kind of unsafe code I can contribute to https://github.com/rust-lang/unsafe-code-guidelines/issues/146 that you wouldn't find by grepping crates.io, so replied to it with some sample bugs in unsafe code instead.
Oh. I got confused by the "Canvas" term. I get it now - you want to know why people choose to use
unsafe. Yeah, I've been wondering about that too - and actually contemplated starting a "libs blitz"-like effort to map it, where possible convert it to safe, where impossible understand why. But now that I've looked at DEFLATE decoders I've realized that safe abstractions for almost everything are already present in stdlib, but were added so recently that for crates with any Rust version compatibility guarantees at all it's impossible to use them. For example,
to_le_bytes requires 1.32 while using
from_le_bytes on a slice also needs
TryFrom which requires is 1.34
So my current thinking is to let things settle down, wait for debian stable to ship 1.34 and then incite The Great Unsafe Purge. This should be happening sometime this year.
I would say that my main uses of unsafe are: (1) FFI, recently specifically SDL2 bindings that I wrote myself because the commonly used
sdl2 crate isn't good enough (definitely has leaks, might be unsound?), (2) GBA programming where Rust just doesn't offer safe abstraction (because it honestly can't, no fault here), (3) building an abstraction in Rust for a normal computing situation because the core/std lib just doesn't offer what I want.
Case 3 can't be avoided because the standard lib specifically chooses to keep out code and let crates try it out until a clear winning style is hit upon.
Oh and please report a bug on
bufferif it does something illegal
well it's one of those "we haven't specified it as legal yet and would like to keep the option of declaring it illegal -- but it won't cause any issues in current compiler versions"
@Lokathor https://github.com/image-rs/inflate this actually has core-only mode, and it's already used as the DEFLATE implementation in the
png crate. This implementation is also safe to the best of my knowledge - we've already found and fixed the bug in its only unsafe block. Its only problem is that it's slow, but perhaps you'll be able to optimize it. It's definitely a better use of your time than writing the fourth DEFLATE decompressor.
we've already found and fixed the bug in its only unsafe block
so we have a limit of one bug per block? ;)
I don't see a way to make it no_std in that crate. Either way, I'm going to do it myself at some point and if it's only for educational purposes that's fine.
Ugh, "the" is wrong here. Sorry, English is not my native tongue. "A bug" would be more appropriate. But yeah, there could be more vulns lurking in there.
Finally, if you need even more flexibility, or if you only want to depend on core, you can use the inflate::InflateStream API. The below example decodes an array of DEFLATE encoded bytes:
but IDK, maybe they need
separate from that, there's no feature for the crate to be
no_std, so it'll pull in std just by that.
@Shnatsel you mentioned seeing several occurrences of
read being called in a generic way with an uninitialized buffer... here's my attempt to improve the docs to call this out as illegal: https://github.com/rust-lang/rust/pull/62102
At a glance it is not obvious to me who the "user of a trait" is. How about "it is responsibility of the code calling this function" ... to initialize the buffer?
Actually, I've just gone ahead and suggested a replacement for that entire paragraph. But I'm slightly sleep-deprived so don't 100% trust my judgment.
I've added zero-initialization to growing a Vec and that made the entire decoding process 5% faster! What kind of sorcery is this?! https://github.com/sile/libflate/pull/34/files
vec![len; 0]; actually requests zeroed memory from the OS, but in here initializing the vector with zeroes is NOT supposed to be faster than simply writing to a slice of uninitialized memory!
vec![0; len] :)
Tests still pass, I've checked
Yup, it produces the exact same output, but 5% faster. I'm glad, but... WHY?!
LLVM may be able to optimise better with knowledge that the buffer is 0-ed rather than undef by e.g. not being too careful about reading uninitialized data when doing certain operations and simd-ing everything together