Stream: wg-secure-code

Topic: use of unsafe


Shnatsel (Jan 04 2019 at 21:29, on Zulip):

Ropey just got released as 1.0 and I got an overview of the use of unsafe code in it from the author. Haven't dug into the code yet, but this may be an interesting case for trying to exploit compiler optimizations without resorting to unsafe code.

snf (Mar 02 2019 at 13:30, on Zulip):

Reviving this thread. Anyone made stats on how unsafe is used? I'm thinking in FFI, Sync/Send (or other traits) or others.

snf (Mar 02 2019 at 13:32, on Zulip):

Afaik, https://github.com/anderejd/cargo-geiger is the best tool for analyzing unsafe code but it's missing FFI in its classification. Probably because it doesn't matter as long as it is unsafe code

Shnatsel (Mar 02 2019 at 14:13, on Zulip):

I have seen a number of common patterns of unsafe code usage in binary format decoders. We've even got a clippy warning for a (precursor to) one of them. If I don't get around to replying today, poke me tomorrow and I'll elaborate.

DevQps (Mar 14 2019 at 15:39, on Zulip):

Today I saw this in the libflate crate

let mut buf = Vec::new();
    loop {
        let b = reader.read_u8()?;
        if b == 0 {
            return Ok(unsafe { CString::from_vec_unchecked(buf) });
        }
        buf.push(b);
    }

If you look closely you can see that this is actually safe, since there could be no NULL characters in between. I guess they probably did this for performance reasons, such that they don't have to check the whole Vec for intermediary NULL characters.

Now I am not a pro, but I think it should not be too hard to create a method for this. I wouldn't be surprised if this occurs more often.
The options that pop up in my mind:

1. Implement Read::read_cstring() -> CString (this could be implemented by the trait itself, using Read::read() -> u8)
2. Implement CString::read<T: impl Read>(readable: T) -> CString
3. Add some utility methods to a crate like byteorder that could read from a Read trait.
Forgive me the names of the functions, though, it's about the general idea :)

Curious to what you guys think!

Daniel Carosone (Mar 14 2019 at 18:29, on Zulip):

sounds good, but note that read_u8doesn't seem to be part of the Read trait either, must be elsewhere in that crate or its deps. If it weren't for that, I think Option 2 seems most appropriate

Shnatsel (Mar 14 2019 at 20:34, on Zulip):

I think this can be rewritten using read_until(): https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_until

Shnatsel (Mar 14 2019 at 20:36, on Zulip):

Could you provide some context for that code? It's strange to see a pure-Rust library use a CString at all.

Shnatsel (Mar 14 2019 at 20:43, on Zulip):

Not directly relevant, but I just wanted to jot down some notes while we're discussing libflate: https://github.com/sile/libflate/blob/74c2b8bd5072cd9ca676d5b92ee326e64a01827f/src/lz77/default.rs#L108-L117
This function has caused trouble for the second time in a row. The function declaration is definitely wrong, the code as-is should be annotated unsafe fn because it doesn't validate that the length of the input is greater than 3.
But we actually want that function to be safe, so we want to validate that. The cheap way to do it is either an assert at the beginning, or subslice it [0..3] and look up values from that. This creates only one bounds check.
Since the function is marked #[inline(always)] rustc should elide the bounds check entirely where appropriate. The next step is to find if there are any hot loops which are using this function and check bounds for the entire loop just once, if possible. You can do that without explicit unsafe if you do the checks up front and the optimizer figures out that they can be elided in the loop.

DevQps (Mar 14 2019 at 21:14, on Zulip):

sounds good, but note that read_u8doesn't seem to be part of the Read trait either, must be elsewhere in that crate or its deps. If it weren't for that, I think Option 2 seems most appropriate

Ah my bad... I thought it provided such a method but apparently it does not..

DevQps (Mar 14 2019 at 21:27, on Zulip):

I think this can be rewritten using read_until(): https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_until

The code is from here!: https://github.com/sile/libflate/blob/a7cd7995458ace28ed388ba0d1374bfdc4b495e5/src/gzip.rs
I see that they use the byteorder crate (with ReadBytesExt) to do the read_u8() call.

I must say read_until() looks promising, however when I smell my mustache I feel like something's not quite right.

Let's take for example libflate. I can image that someone wants their Decoder to be used with anything that implement Read right? Regardless of when it implements BufRead or not. I can imagine someone implementing some kind of structure that implements Read but not BufRead, without having to perform expensive system calls. Or does this make no sense?

I thought repeatedly reading a single byte with Read::read would be really slow, but I see that the byteorder ReadBytesExt::read_u8 does exactly this. Therefore we could technically create a Read::read_until method that uses Read::read without performance overhead (as long as the object implementing Read::read is efficient)

DevQps (Mar 14 2019 at 21:30, on Zulip):

Not directly relevant, but I just wanted to jot down some notes while we're discussing libflate: https://github.com/sile/libflate/blob/74c2b8bd5072cd9ca676d5b92ee326e64a01827f/src/lz77/default.rs#L108-L117
This function has caused trouble for the second time in a row. The function declaration is definitely wrong, the code as-is should be annotated unsafe fn because it doesn't validate that the length of the input is greater than 3.
But we actually want that function to be safe, so we want to validate that. The cheap way to do it is either an assert at the beginning, or subslice it [0..3] and look up values from that. This creates only one bounds check.
Since the function is marked #[inline(always)] rustc should elide the bounds check entirely where appropriate. The next step is to find if there are any hot loops which are using this function and check bounds for the entire loop just once, if possible. You can do that without explicit unsafe if you do the checks up front and the optimizer figures out that they can be elided in the loop.

As for this!: I saw this as well! It looked like it could easily go wrong if the slice is not exactly 3 bytes. I am not sure however how easy that is to happen. Personally I kind of feel like a single bound check should not degrade performance by that much and I'd rather have safe code then something that squeezes out a few extra micro/milliseconds. I will at least take a look at this! Hopefully I have some time tomorrow!

Shnatsel (Mar 14 2019 at 21:31, on Zulip):

I think the better question to ask would be why does it use CString at all. It's not the read part that's unsafe, it's the CString part.

Shnatsel (Mar 14 2019 at 21:33, on Zulip):

Build a benchmark with Criterion and you'll know instead of guessing: https://bheisler.github.io/criterion.rs/book/index.html
I'm pretty sure I've already made one for inflate crate, so you can mostly reuse that for libflate: https://github.com/Shnatsel/inflate/tree/benchmarking
Bonus points for contributing it upstream.

DevQps (Mar 14 2019 at 21:43, on Zulip):

I think the better question to ask would be why does it use CString at all. It's not the read part that's unsafe, it's the CString part.

I figured out that it's used for reading the GZIP header which stores the filename and potential comments as a null-terminated String. It seems like the unsafe function is used for both the filename and the comment field.

Shnatsel (Mar 14 2019 at 21:47, on Zulip):

Any reason why a plain old Vec doesn't cut it?

DevQps (Mar 14 2019 at 21:47, on Zulip):

I think the better question to ask would be why does it use CString at all. It's not the read part that's unsafe, it's the CString part.

Mmm I agree that the function call of CString is what makes it unsafe, but wouldn't you say that there should be a safe method for reading a CString without having to do loop over a buffer twice? Like reading and constructing a CString such that one can safely construct a CString while knowing that there are no null characters in between? I wouldn't be surprised if other projects would use the unchecked version as well just for performance reasons, while shouldn't be strictly necessary I think :)

DevQps (Mar 14 2019 at 21:49, on Zulip):

Wait... I suddenly feel like I am beginning to understand things haha :)

We can probably just replace the loop and fill the buffer without the null character and then convert it to a String or str somehow right?

Shnatsel (Mar 14 2019 at 21:50, on Zulip):

String or &str must be valid UTF-8, these are just bytes. It should be a vector.

DevQps (Mar 14 2019 at 21:50, on Zulip):

String or &str must be valid UTF-8, these are just bytes. It should be a vector.

That sounds good. I see that they also have calls like: get_filename and get_comment. Do you think these should return Vectors as well?

Shnatsel (Mar 14 2019 at 21:51, on Zulip):

Unless the function is passing the CString to some C code, the same guarantee (that the sequence is null-terminated) can be achieved by introducing a NullTerminatedVec newtype, with constructor that accepts a Vec and implements Deref into Vec

Alex Gaynor (Mar 14 2019 at 21:51, on Zulip):

If they have CString in the public API, you're probably stuck with them though

Shnatsel (Mar 14 2019 at 21:52, on Zulip):

Well, you could convert them into slices I guess

DevQps (Mar 14 2019 at 21:52, on Zulip):

They do have this kind of code:

/// Returns the file name.
    pub fn filename(&self) -> Option<&CString> {
        self.filename.as_ref()
    }

    /// Returns the comment.
    pub fn comment(&self) -> Option<&CString> {
        self.comment.as_ref()
    }

but they do not pass it anywhere to C code.

DevQps (Mar 14 2019 at 21:53, on Zulip):

So I guess we're stuck then?

snf (Mar 14 2019 at 21:53, on Zulip):

It's probably to return the information as it is in the file

Shnatsel (Mar 14 2019 at 21:53, on Zulip):

Actually, could Vector maybe implement .into(CString)?

Shnatsel (Mar 14 2019 at 21:54, on Zulip):

Not slice, but vector

Shnatsel (Mar 14 2019 at 21:54, on Zulip):

That could be implemented safely and efficiently

snf (Mar 14 2019 at 21:54, on Zulip):

I think there ir CString::from(Vec<u8>), it will probably require iterating it though

Alex Gaynor (Mar 14 2019 at 21:54, on Zulip):

How could it be done efficiently? It'd need a linear scan right?

DevQps (Mar 14 2019 at 21:54, on Zulip):

That would probably work, however I guess that would require checking if there are any null terminated characters in the content as well.

Alex Gaynor (Mar 14 2019 at 21:55, on Zulip):

CString::new uses memchr, so it's probably pretty fast, though obviously still O(n)

DevQps (Mar 14 2019 at 21:56, on Zulip):

The thing is this I guess:

CString supports a safe way to read the CString namely CString::new. But that method checks the whole buffer again for intermediary null characters. So for performance reasons CString::from_vec_unchecked is used. If we would do a Vec to CString I guess in order to be safe we would also have to perform the check again right? Or am I wrong about this?

Alex Gaynor (Mar 14 2019 at 21:57, on Zulip):

I think you're right -- we really need a CString::from_reader or something like that

DevQps (Mar 14 2019 at 21:57, on Zulip):

I think you're right -- we really need a CString::from_reader or something like that

I personally think this would solve many problems as well. You don't have to sacrifice performance and it would still be safe.

Shnatsel (Mar 14 2019 at 21:59, on Zulip):

Yeah CString::from_reader sounds like a good idea. However, CString in public API still sounds like a really bad idea.

Alex Gaynor (Mar 14 2019 at 22:00, on Zulip):

Yeah, that also seems like a mistake

DevQps (Mar 14 2019 at 22:01, on Zulip):

I guess we all agree on that! :) If you guys would like it I can create an issue and submit a pull request?

Shnatsel (Mar 14 2019 at 22:02, on Zulip):

Adding CString::from_reader or changing libflate API?

DevQps (Mar 14 2019 at 22:03, on Zulip):

First adding CString::from_reader and then changing libflate as well! It will probably take a while till the CString PR would get through, so I could look at other problems with libflate as well.

DevQps (Mar 14 2019 at 22:05, on Zulip):

Or do you think I am running too fast with this?

Shnatsel (Mar 14 2019 at 22:10, on Zulip):

Nah, go right ahead! I think prototyping it locally would be a good start, then you can create an RFC describing the behavior and a PR against standard library.

Alex Gaynor (Mar 14 2019 at 22:10, on Zulip):

I think just adding a single new method to CString wouldn't require an RFC, but I'm not positive.

Shnatsel (Mar 14 2019 at 22:11, on Zulip):

Start with a PR and they'll tell you if an RFC is needed or not :)

Shnatsel (Mar 14 2019 at 22:11, on Zulip):

Also we should start collecting info on the patterns we find that are missing. I have put a lot of work into one back in the day, along with a proposal to fix it, although my implementation turned out to be too fragile for my liking: https://internals.rust-lang.org/t/pre-rfc-fixed-capacity-view-of-vec/8413/1

simulacrum (Mar 14 2019 at 22:13, on Zulip):

@Alex Gaynor Unstable methods, especially on non-traits, definitely don't need RFCs for the most part

simulacrum (Mar 14 2019 at 22:13, on Zulip):

OTOH, stabilizing one might -- depending on the amount of thorny questions around the API. Probably not though.

DevQps (Mar 14 2019 at 22:16, on Zulip):

I will first make a small draft of the function tomorrow and share it with you guys, so you can review it first. Since I am not a very experienced Rust programmer that might be the best option! I'll go catch some sleep now and I'll tune in tomorrow again! Thanks for the nice discussion (Y)!

Shnatsel (Mar 14 2019 at 22:17, on Zulip):

Is there anything we want to cover other than BufReader types? That would make implementation trivial, and would cover most use cases since Cursor implements BufReader and if you're doing external I/O you probably want to wrap that in a BufReader as well.

Shnatsel (Mar 14 2019 at 22:18, on Zulip):

OTOH I'm not sure how that meshes with the byteorder use case, which might be a deal-breaker

Shnatsel (Mar 14 2019 at 22:24, on Zulip):

Perhaps Byteorder could expose a u8_reader that implements BufReader if the underlying reader does as well

DevQps (Mar 15 2019 at 12:52, on Zulip):

Alright guys! I made my initial version:

pub fn CString::from_reader(mut reader: impl Read) -> Result<CString, std::io::Error>
{
    let mut buffer = Vec::new();
    loop {
        // Read a single byte, same way as byteorder::read_u8 does it (using std::slice however).
        // If the specific implementation of Read::read is buffered and inlined
        // I am quite sure this can be optimized well.
        let mut character: u8 = 0;
        let slice = std::slice::from_mut(&mut character);
        reader.read_exact(slice)?;

        // Push a new character.
        buffer.push(character);

        // Check if a null character has been found, if so return the Vec as CString.
        if character == 0 {
            return Ok(CString { inner: buffer.into_boxed_slice() });
        }
    }
}

Here is a playground version of it but since I cannot construct CString directly I modified the code slightly for it to work (moving the buffer line and replacing it with from_vec_unchecked). But the general idea is the same.

Playground version: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=59f4f97198d6d5a3310064770be2f4e1

DevQps (Mar 15 2019 at 12:55, on Zulip):

Let me know what you guys think!

The only thing that kind of itches me is the initial capacity of a Vec. It feels like it would be a bit of a shame if Vec has to allocate multiple times because the to-be-read string is longer then expected. I am not sure if it would be overkill, but maybe we should create another method that allows a user to specify the default capacity?

Alex Gaynor (Mar 15 2019 at 19:05, on Zulip):

So another idea occurred to me; should CString support a safe from method that takes Vec<NonZeroU8>? Because it's repr(transparent) I think you can cheat and convert it into a &[u8] with some magic unsafe.

Shnatsel (Mar 15 2019 at 20:00, on Zulip):

Why not unsafe {CString::from_vec_unchecked(buffer)}? That sounds like the obvious thing from performance standpoint. People will just roll their own version of this if the stdlib function doesn't work like that.

Alex Gaynor (Mar 15 2019 at 20:00, on Zulip):

Why not what? That's what people seem to be doing now, the question was how do we reduce the need to write unsafe?

Shnatsel (Mar 15 2019 at 20:02, on Zulip):

Oh, I meant to ask why is that not in the function code posted above. My thinking was that we write the function with that unsafe block once and expose a safe interface, and everybody else can use it to get speed without rolling their own unsafe.

Alex Gaynor (Mar 15 2019 at 22:17, on Zulip):

Oh, sure, that works. My suggestion about something taking Vec<NonZeroU8> was mostly because that has the interesting property that it can be safe and fast.

DevQps (Mar 15 2019 at 23:08, on Zulip):

About the unsafe CString::from_vec_unchecked!: I guess it is because it uses reserve_exact right? Otherwise the code is exactly identical. But I agree that might be better! We would introduce calling an unsafe method to the from_reader method, but in terms of behavior it would be just as unsafe as constructing it directly. If the from_vec_unchecked call is inlined we have the same (optimal) performance!

Are there any other things you'd suggest me to change? Otherwise I can open a PR for this! I wonder however about the procedure. Should I first create an issue on the Rust repo for this? Or can I directly make a pull request?

Alex Gaynor (Mar 15 2019 at 23:10, on Zulip):

you can send a PR without an issue

DevQps (Mar 15 2019 at 23:13, on Zulip):

Okay thanks for that info! It's already 0:12 here, so I will create one and link it back here tomorrow.

DevQps (Mar 15 2019 at 23:13, on Zulip):

Good night guys :)

DevQps (Mar 15 2019 at 23:19, on Zulip):

And btw Alex! I think its a cool idea! However I was wondering: How can one construct a Vec<NonZeroU8> safely without suffering the performance check of having to check if it is non-zero? Or do you think the compiler can optimise this? I could check if it does tomorrow if you're not sure

Alex Gaynor (Mar 16 2019 at 04:06, on Zulip):

Check where? In the sample from libflate, you're already checking if each byte is 0, this would simply change that check into constructing a new NonZeroU8

DevQps (Mar 16 2019 at 08:21, on Zulip):

Ahhh in that sense! I think that would be a great approach as well! What about implementing the From<Vec<NonZeroU8>> for CString? That way we can consume the buffer and use it for the CString.

DevQps (Mar 16 2019 at 08:25, on Zulip):

Maybe that is even better actually. I always felt like CString::from_reader is a bit akward... I think using the From trait would introduce less clutter but it would mean that people have to code reading byte by byte just as I did before themselves, but I that doesn't have to be a problem perse.

DevQps (Mar 16 2019 at 10:07, on Zulip):

Alright, I guess we have two methods of doing this right now:

Method 1: Using CString::from_reader
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=49cecd94f4cc0e1c09dc83deca5c91ee

Method 2: Using From<Vec<NonZeroU8>>
https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=3a4a90b19e7e7d3fdd80f9f1f48edeb9

Additional Question: Maybe we should provide both methods?

Personally I feel like the second option just feels more right... The only thing I am worried about is that people find it too complex and will use do something unsafe because they have no knowledge on how to do it right.

DevQps (Mar 16 2019 at 10:35, on Zulip):

I decided to create an issue: https://github.com/rust-lang/rust/issues/59229 such that we can continue the discussion there. I am also wondering if there are other people of the Rust community that have any solutions on their minds. To be really honest, I think implementing both methods we have come up with so far does not sound so bad.

Daniel Carosone (Mar 19 2019 at 03:22, on Zulip):

NonZeroU8 seems like a great way to arrange for the null check to happen once, and to communicate safely using types that it's already happened

Daniel Carosone (Mar 19 2019 at 03:24, on Zulip):

but it won't carry the NULL terminator, so some copying is needed to get a contiguous 0u8 on the end in the CString's storage.

Alex Gaynor (Mar 19 2019 at 03:25, on Zulip):

You don't need to copy, since you're passing ownership of the Vec to CString you can just append to it -- after stealing the storage and converting it to a Vec<u8> of course.

Daniel Carosone (Mar 19 2019 at 03:26, on Zulip):

yeah, so long as the Vec has capacity +1

DevQps (Mar 19 2019 at 06:28, on Zulip):

I will implement the CString::from_reader function tomorrow hopefully!

DevQps (Mar 19 2019 at 06:28, on Zulip):

We can probably do the From<NonZeroU8> conversion in a separate PR

DevQps (Mar 20 2019 at 14:41, on Zulip):

The pull request for CString::from_reader is here!: https://github.com/rust-lang/rust/pull/59314

DevQps (Mar 20 2019 at 14:41, on Zulip):

If you have any comments, please let me know!

DevQps (Mar 30 2019 at 09:33, on Zulip):

I was reading through some old T-Doc issues and I found this one:
https://github.com/rust-lang/rust/issues/54542

Maybe we can try to create an abstraction for this?

Tony Arcieri (Jul 20 2019 at 17:04, on Zulip):

not sure where else to ask this or who might know (perhaps @RalfJ ) but I'm wondering if there's a safe way to do AsRef-style reference conversions for wrapper newtypes for other references now, possibly with something like #[repr(transparent)]

Tony Arcieri (Jul 20 2019 at 17:05, on Zulip):

the use case is something like std::path::Path (but my own custom path type for different purposes), where I also have an owned PathBuf type, want for it to be able to impl AsRef<Path>, and then use AsRef<Path> as a bound so you can interchangeably pass either a &PathBuf or a &Path

Tony Arcieri (Jul 20 2019 at 17:06, on Zulip):

and how to impl AsRef<Path> for PathBuf without using unsafe

Tony Arcieri (Jul 20 2019 at 17:06, on Zulip):

I believe std uses transmute :grimacing:

Andreas Molzer (Jul 20 2019 at 17:09, on Zulip):

Related discussion on special (safe) syntax for #[repr(transparent)] reference conversion on internals: https://internals.rust-lang.org/t/pre-rfc-patterns-allowing-transparent-wrapper-types/10229

Tony Arcieri (Jul 20 2019 at 17:10, on Zulip):

@Andreas Molzer exactly what I'm looking for! thank you

Shnatsel (Jul 20 2019 at 17:19, on Zulip):

HeroicKatora? I know this guy, he took over maintenance of png, inflate and a bunch of other formerly Piston projects, and getting safety fixed merged got way easier since. Always very helpful on PRs. Kudos to that guy.

Tony Arcieri (Jul 20 2019 at 17:33, on Zulip):

would love to see something like this. in the meantime I'm using Into instead of AsRef as the bound, in order to create an owned wrapper rather than a reference. that mostly works except I'd love to have my PathBuf also impl Deref<Target=Path> but that's not possible without a reference conversion :cry:

Tony Arcieri (Jul 20 2019 at 17:34, on Zulip):

and in absence of being able to impl Deref I'm instead writing wrapper functions on PathBuf that delegate to Path

Tony Arcieri (Jul 20 2019 at 17:35, on Zulip):

I imagine if this sort of feature lands, there's all sorts of places in std that could use it too...

RalfJ (Jul 20 2019 at 19:05, on Zulip):

so the question is how to do this without writing unsafe? Sorry, I don't know. :/

Tony Arcieri (Jul 20 2019 at 19:09, on Zulip):

@RalfJ yes, exactly. the pre-RFC @Andreas Molzer linked has a suggested casting syntax

Tony Arcieri (Jul 20 2019 at 19:09, on Zulip):

which would do exactly what I want

Tony Arcieri (Jul 20 2019 at 19:10, on Zulip):

I just want to cast the inner &'a T into a &'a MyNewtype which is a #[repr(transparent)] 1-tuple newtype for &'a T

Nick12 (Aug 23 2019 at 19:52, on Zulip):

Is there a way to assert/static_assert that a type is transparent?
I feel like if there was, a simple macro could probably do this

Shnatsel (Aug 27 2019 at 19:08, on Zulip):

@Nick12 try asking in #black_magic in Rust Community Discord

Last update: Nov 11 2019 at 23:00UTC