Stream: t-lang/wg-unsafe-code-guidelines

Topic: FFI structs ending in variable sized arrays.


Peter Rabbit (May 11 2020 at 18:28, on Zulip):

Suppose I have a struct defined like so:

#[repr(C)] struct Foo {
    length: u32,
    data: [u8; 1],
}

I now allocate some memory and pass it to a function which fills it in or maybe it allocates some memory for me and returns a raw pointer that I return to another function later to free.
Either way I now have a foo: *mut Foo where the length of foo.data is determined by foo.length. However I feel like doing slice::from_raw_parts(foo.data.as_ptr(), foo.length as usize) is not sound as I'm using a pointer to a smaller data to access a larger chunk of data. Is this sound, and if not what is the correct way to get a slice to the full data?

Elichai Turkel (May 11 2020 at 19:12, on Zulip):

Yep doing that is probably unsound.
Why not do this:?

#[repr(C)] struct Foo {
    length: u32,
    data: *const u8,
}
Elichai Turkel (May 11 2020 at 19:16, on Zulip):

FWIW bindgen generates the following:

#[repr(C)]
pub struct __IncompleteArrayField<T>(PhantomData<T>, [T; 0]);
#[repr(C)]
pub struct my_struct {
    pub size: size_t,
    pub data: __IncompleteArrayField<u8>,
}

with this impl:
impl<T> __IncompleteArrayField<T> {
    #[inline]
    pub const fn new() -> Self {
        __IncompleteArrayField(PhantomData, [])
    }
    #[inline]
    pub fn as_ptr(&self) -> *const T {
        self as *const _ as *const T
    }
    #[inline]
    pub fn as_mut_ptr(&mut self) -> *mut T {
        self as *mut _ as *mut T
    }
    #[inline]
    pub unsafe fn as_slice(&self, len: usize) -> &[T] {
        slice::from_raw_parts(self.as_ptr(), len)
    }
    #[inline]
    pub unsafe fn as_mut_slice(&mut self, len: usize) -> &mut [T] {
        slice::from_raw_parts_mut(self.as_mut_ptr(), len)
    }
}

Which I'm also not sure is actually sound

lcnr (May 11 2020 at 19:34, on Zulip):

Afaik it is fine to access the complete allocation of a pointer even if it is outside of ptr..(ptr + size_of). Aliasing rules still apply though

Elichai Turkel (May 11 2020 at 19:39, on Zulip):

lcnr said:

Afaik it is fine to access the complete allocation of a pointer even if it is outside of ptr..(ptr + size_of). Aliasing rules still apply though

I really remember a rule saying that (&arr[0] as *const _) != (&arr as * const _)

Elichai Turkel (May 11 2020 at 19:46, on Zulip):

Miri is also not fine with the first example:
https://play.rust-lang.org/?gist=fc85ea4dc06a472c1ba8092c0765dcdf

lcnr (May 11 2020 at 19:48, on Zulip):

:thinking:

cc @RalfJ, who knows more about this stuff.

Peter Rabbit (May 11 2020 at 20:45, on Zulip):

Elichai Turkel said:

Yep doing that is probably unsound.
Why not do this:?

#[repr(C)] struct Foo {
    length: u32,
    data: *const u8,
}

I cannot change the layout of Foo as I am working with system APIs.

Elichai Turkel (May 11 2020 at 20:53, on Zulip):

Could you point at the exact API? I assumed it's a C API of the following:

struct my_struct {
   unsigned long size;
   unsigned char data[];
};
Peter Rabbit (May 11 2020 at 20:54, on Zulip):

There's a bajillion structs in winapi that end in variable sized arrays like for example:

STRUCT!{struct USB_NODE_CONNECTION_DRIVERKEY_NAME {
    ConnectionIndex: ULONG,
    ActualLength: ULONG,
    DriverKeyName: [WCHAR; 1],
}}
Peter Rabbit (May 11 2020 at 20:58, on Zulip):

I just need some sort of reasonably ergonomic solution that can get an official stamp of approval.

Elichai Turkel (May 11 2020 at 20:59, on Zulip):

Let's wait for @RalfJ he'll have the best answers :)

Jake Goulding (May 11 2020 at 21:03, on Zulip):

FWIW, I've heard this called "flexible array member", not "variable sized array"

Jake Goulding (May 11 2020 at 21:04, on Zulip):

See also https://doc.rust-lang.org/nightly/unstable-book/language-features/unsized-locals.html?highlight=unsized#variable-length-arrays

Lokathor (May 11 2020 at 21:07, on Zulip):

I'm finding it a little hard to believe that you couldn't have a struct like this and then go from &T to &T_slice_data just because the slice data isn't all "in" the memory span of the T. After all, the data of a Vec's slice isn't "in" the memory span of the Vec itself.

Peter Rabbit (May 11 2020 at 22:07, on Zulip):

I created this alternative which seems to pass miri. https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=34c4e0107736b167d99bee081e468682

Elichai Turkel (May 12 2020 at 07:42, on Zulip):

Lokathor said:

I'm finding it a little hard to believe that you couldn't have a struct like this and then go from &T to &T_slice_data just because the slice data isn't all "in" the memory span of the T. After all, the data of a Vec's slice isn't "in" the memory span of the Vec itself.

The problem is taking a reference to an array, the moment you do that that reference is only valid for that array, and not for out of bounds, this is generalized also for single elements in the arrays.
in Vec we never use a array/slice to generate pointers, we keep a raw pointer from the allocator that is valid for the whole allocation.

Elichai Turkel (May 12 2020 at 07:46, on Zulip):

Peter Rabbit said:

I created this alternative which seems to pass miri. https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=34c4e0107736b167d99bee081e468682

That's because you actually created a base_ptr that points to the full array

Lokathor (May 12 2020 at 07:53, on Zulip):

Ah, yeah, i should have posted a follow up to that. Yandros explained in Discord that references are to blame. If you were to do this purely with pointers it would work fine.

Elichai Turkel (May 12 2020 at 07:55, on Zulip):

Lokathor said:

Ah, yeah, i should have posted a follow up to that. Yandros explained in Discord that references are to blame. If you were to do this purely with pointers it would work fine.

Yep, that's why I suggested to replace data: [u8; 1] with data: *const u8

Connor Horman (May 12 2020 at 17:42, on Zulip):

Wouldn't that require a C level api change as well?

Peter Rabbit (May 12 2020 at 18:03, on Zulip):

So I modified my example that passes miri so that it does the original naive from_raw_parts directly with the array pointer and none of the offset manipulation, and it still passes miri?!

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=4524cb616cf9c4d57689693e372b256e

Lokathor (May 12 2020 at 19:13, on Zulip):

@Connor Horman yes, and in this case that's not an allowed solution because the struct is defined in the windows SDK. All we get to do is decide how best to interact with it on the rust side.

RalfJ (May 12 2020 at 19:51, on Zulip):

Indeed when you have a reference it and all its "descendants" may only be used to access size_of_val(r) many bytes (Cc https://github.com/rust-lang/unsafe-code-guidelines/issues/134)

RalfJ (May 12 2020 at 19:52, on Zulip):

so for the original question @Peter Rabbit , you'd want to go from a *mut Foo to a *mut u8 (to the data field) without intermediate references. the best but unstable way to do that is &raw; the only table way I think is manual offset computations.

RalfJ (May 12 2020 at 19:53, on Zulip):

your other variant is probably passing in Miri because Miri is currently not trying to tell multiple raw ptrs to the same data apart -- but that is a Miri deficiency, we'll likely have to "close that hole" if we want to actually match what LLVM does

Peter Rabbit (May 12 2020 at 20:52, on Zulip):

@RalfJ So just to clarify and be absolutely certain...

This is sound? https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=a0faca71829d3af39da4910ac9a11e30

And this is unsound? https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=4524cb616cf9c4d57689693e372b256e

Peter Rabbit (May 12 2020 at 21:13, on Zulip):

ahahahaha, this even gets hit in the backtrace crate!
https://github.com/rust-lang/backtrace-rs/blob/16682c76eb25df517e2cc220e56baf4f8a616f72/src/symbolize/dbghelp.rs#L165

RalfJ (May 13 2020 at 08:40, on Zulip):

This is sound? https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=a0faca71829d3af39da4910ac9a11e30

the shared reference version looks good but the mutable reference version is suspicious... I am actually surprised Miri doesn't complain there. You are creating an &mut [T] return value which overlaps with the &mut self... oh, Self is a ZST, hence no overlap! Okay this is subtle.
So yes this seems right but please add a comment saying it relies on FlexibleArray being a ZST.

RalfJ (May 13 2020 at 08:40, on Zulip):

And this is unsound? https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=4524cb616cf9c4d57689693e372b256e

Yes, for this reason.

RalfJ (May 13 2020 at 08:41, on Zulip):

@Peter Rabbit I should add the usual disclaimer that Stacked Borrows is an experiment and not RFC'd in any way.

RalfJ (May 13 2020 at 08:41, on Zulip):

Peter Rabbit said:

ahahahaha, this even gets hit in the backtrace crate!
https://github.com/rust-lang/backtrace-rs/blob/16682c76eb25df517e2cc220e56baf4f8a616f72/src/symbolize/dbghelp.rs#L165

nice catch!

Peter Rabbit (May 14 2020 at 05:54, on Zulip):

Jake Goulding said:

FWIW, I've heard this called "flexible array member", not "variable sized array"

Windows calls them variable sized arrays. To quote an official header.

    // Variable length array of info about hub ports
    USB_HUB_PORT_INFORMATION PortInfo[1];
rpjohnst (May 14 2020 at 17:34, on Zulip):

That comment says "variable size field," though I think what jake was getting at is that the C standard has something entirely unrelated called "variable length arrays" that people like to avoid confusion with, by using the C standard term "flexible array members"

Jake Goulding (May 14 2020 at 18:45, on Zulip):

Yeah, the Windows API predates FAMs, which are expressed as [] instead of Windows' [1]. From my understanding they are the same though.

RalfJ (May 16 2020 at 08:53, on Zulip):

So if Windows uses [1] then the &mut version of what @Peter Rabbit posted above likely doesn't work -- that example code used [0] though

Peter Rabbit (Jun 02 2020 at 19:49, on Zulip):

RalfJ said:

So if Windows uses [1] then the &mut version of what Peter Rabbit posted above likely doesn't work -- that example code used [0] though

Uhhhh, according to one MSDN article

If this parameter is specified, the caller must set DeviceInterfaceDetailData.cbSize to sizeof(SP_DEVICE_INTERFACE_DETAIL_DATA) before calling this function. The cbSize member always contains the size of the fixed part of the data structure, not a size reflecting the variable-length string at the end.

So this means that if the array uses [T; 0] instead of [T; 1], it changes the result of size_of() which in turn breaks usage of Windows API functions that require setting the cbSize member in this manner. What is your recommendation?

Chris Denton (Jun 02 2020 at 20:41, on Zulip):

For what it's worth, I've been using a new type struct that wraps a *mut u8. I use the "real" Win32 struct only to calculate the right offsets. The Win32 struct looks something like this. I cleaned up the types a bit so as not to scare off non-Windows users.

Peter Rabbit (Jun 02 2020 at 22:07, on Zulip):

Chris Denton said:

For what it's worth, I've been using a new type struct that wraps a *mut u8. I use the "real" Win32 struct only to calculate the right offsets. The Win32 struct looks something like this. I cleaned up the types a bit so as not to scare off non-Windows users.

That is horrifying and also impractical on a large scale.

Chris Denton (Jun 02 2020 at 22:18, on Zulip):

Tell me about it! I could use the Win32 struct for the other fields but, as far as I understand, the last one has to come from an actual pointer and that pointer can't come via a slice at any point. I mean, unless I only wanted one character...

Last update: Jun 05 2020 at 22:00UTC