Stream: general

Topic: intrinsics / uninitialized


Jethro (Jul 05 2019 at 15:39, on Zulip):

So I'm looking at https://github.com/rust-lang/rust/issues/62397

Jethro (Jul 05 2019 at 15:39, on Zulip):

I believe the only remaining use of uninitialized in the SGX target is this:

fn rdrand64() -> u64 {
    unsafe {
        let mut ret: u64 = crate::mem::uninitialized();
        for _ in 0..10 {
            if crate::arch::x86_64::_rdrand64_step(&mut ret) == 1 {
                return ret;
            }
        }
        rtabort!("Failed to obtain random data");
    }
}
Jethro (Jul 05 2019 at 15:40, on Zulip):

now this is interesting because if I understand correctly, just having a &mut to uninitialized memory is UB

Jethro (Jul 05 2019 at 15:40, on Zulip):

but the intrinsic requires a &mut

centril (Jul 05 2019 at 15:40, on Zulip):

just having a &mut to uninitialized memory is UB

correct

Jethro (Jul 05 2019 at 15:40, on Zulip):

@RalfJ @gnzlbg

simulacrum (Jul 05 2019 at 15:41, on Zulip):
src/libstd/sys/sgx/mutex.rs
71:    pub const fn uninitialized() -> ReentrantMutex {

src/libstd/sys/sgx/mod.rs
145:            let mut ret: u64 = crate::mem::uninitialized();

src/libstd/sys/sgx/abi/usercalls/mod.rs
19:        let mut userbuf = alloc::User::<[u8]>::uninitialized(total_len);
55:        let mut userbuf = alloc::User::<[u8]>::uninitialized(total_len);
92:        let mut local = alloc::User::<ByteBuffer>::uninitialized();
107:        let mut bufs = alloc::User::<[ByteBuffer; 2]>::uninitialized();
127:        let mut bufs = alloc::User::<[ByteBuffer; 2]>::uninitialized();

src/libstd/sys/sgx/abi/usercalls/alloc.rs
187:    // This function returns memory that is practically uninitialized, but is
249:    pub fn uninitialized() -> Self {
257:    pub fn uninitialized(n: usize) -> Self {
gnzlbg (Jul 05 2019 at 15:41, on Zulip):

correct

not necessarily

gnzlbg (Jul 05 2019 at 15:41, on Zulip):

It hasn't been decided yet.

centril (Jul 05 2019 at 15:41, on Zulip):

@gnzlbg it is UB until decided otherwise.

centril (Jul 05 2019 at 15:42, on Zulip):

or "unspecified"; in any case it means "don't do it".

Jethro (Jul 05 2019 at 15:42, on Zulip):

@simulacrum only one of those is std::mem::uninitialized

simulacrum (Jul 05 2019 at 15:42, on Zulip):

Yeah, looking at it the other bits are "interesting" and I want to know more but I don't have the time right now :)

gnzlbg (Jul 05 2019 at 15:43, on Zulip):

I don't know if churning from u64 to MaybeUninit<u64> is worth it until its decided

gnzlbg (Jul 05 2019 at 15:44, on Zulip):

like in @Jethro they are creating a reference and coercing it to a raw pointer directly

gnzlbg (Jul 05 2019 at 15:44, on Zulip):

so depending on what happens with &raw, their code might not be creating a reference at all

Jethro (Jul 05 2019 at 15:44, on Zulip):

@gnzlbg ah but am I? because as I said the intrinsic takes a &mut, not a *mut

gnzlbg (Jul 05 2019 at 15:44, on Zulip):

that intrinsic takes a &mut ?

Jethro (Jul 05 2019 at 15:44, on Zulip):

https://doc.rust-lang.org/core/arch/x86_64/fn._rdrand64_step.html

centril (Jul 05 2019 at 15:45, on Zulip):

https://doc.rust-lang.org/nightly/src/core/up/stdsimd/crates/core_arch/src/x86_64/rdrand.rs.html#26-30

centril (Jul 05 2019 at 15:45, on Zulip):
pub unsafe fn _rdrand64_step(val: &mut u64) -> i32 {
    let (v, flag) = x86_rdrand64_step();
    *val = v;
    flag
}
gnzlbg (Jul 05 2019 at 15:45, on Zulip):

yup, you are right

gnzlbg (Jul 05 2019 at 15:46, on Zulip):

so the question is whether the reference is invalid, which depends on whether the u64 is invalid, which we don't know

gnzlbg (Jul 05 2019 at 15:46, on Zulip):

the reference is dereferenceable, aligned, noalias, etc.

centril (Jul 05 2019 at 15:46, on Zulip):

@Jethro would there be any problem with doing let mut ret: u64 = 0; ?

gnzlbg (Jul 05 2019 at 15:46, on Zulip):

independently of whether the u64 is uninitialized or not, so all of that is ok from LLVM's pov

Jethro (Jul 05 2019 at 15:47, on Zulip):

@centril not really, you're just relying on the optimizer to get rid of it then

centril (Jul 05 2019 at 15:47, on Zulip):

that's what I'd do then

centril (Jul 05 2019 at 15:49, on Zulip):

and possibly change the signature of that intrinsic to take a raw pointer if feasible

centril (Jul 05 2019 at 15:50, on Zulip):

@gnzlbg Why doesn't _rdrand64_step just return -> (u64, i32)?

gnzlbg (Jul 05 2019 at 15:50, on Zulip):

because C doesn't have multiple return arguments

Jethro (Jul 05 2019 at 15:50, on Zulip):

because it needs to match the Intel-defined intrinsic

nagisa (Jul 05 2019 at 15:50, on Zulip):

@centril because it follows the C signature.

centril (Jul 05 2019 at 15:51, on Zulip):

but this function is to be called from Rust, not C... so why do we need to follow C's poor design here?

gnzlbg (Jul 05 2019 at 15:52, on Zulip):

Because you didn't offered to write an RFC designing 10k intrinsics, with rationale for each, etc.

gnzlbg (Jul 05 2019 at 15:52, on Zulip):

;)

simulacrum (Jul 05 2019 at 15:52, on Zulip):

all arch intrinsics match C

centril (Jul 05 2019 at 15:52, on Zulip):

I know they do; but to me this feels like "reasons"

gnzlbg (Jul 05 2019 at 15:52, on Zulip):

the only reason is the one i mentioned, 10k intrinsics

gnzlbg (Jul 05 2019 at 15:53, on Zulip):

proposing an API for a single std lib function takes months, do the math

simulacrum (Jul 05 2019 at 15:53, on Zulip):

I think over time we can definitely expose APIs over them that are better for widely used functions

gnzlbg (Jul 05 2019 at 15:53, on Zulip):

it was either do them the Rust way, and never have any intrinsics, or shipping what C does, two years ago

simulacrum (Jul 05 2019 at 15:53, on Zulip):

(and there may already be work doing that, not sure)

centril (Jul 05 2019 at 15:53, on Zulip):

@gnzlbg but the code for this intrinsic was not machine generated?

Jethro (Jul 05 2019 at 15:53, on Zulip):

the signature is machine-verified to match the C version

gnzlbg (Jul 05 2019 at 15:54, on Zulip):

none of them are

gnzlbg (Jul 05 2019 at 15:54, on Zulip):

too many corner cases for machine generation

gnzlbg (Jul 05 2019 at 15:54, on Zulip):

we used to do that, it didn't work

centril (Jul 05 2019 at 15:54, on Zulip):

aha; seems strange for &mut to be acceptable in the eyes of the machine verifier

Jethro (Jul 05 2019 at 15:55, on Zulip):

anyway, the C version here uses a pointer (obviously)

gnzlbg (Jul 05 2019 at 15:55, on Zulip):

its accepted on FFI where pointers are

gnzlbg (Jul 05 2019 at 15:55, on Zulip):

its ABi compatible with C

Jethro (Jul 05 2019 at 15:55, on Zulip):

so I agree it seems strange that this takes a reference

gnzlbg (Jul 05 2019 at 15:55, on Zulip):

a pointer can be null, a reference can't

gnzlbg (Jul 05 2019 at 15:55, on Zulip):

you can't pass this API a null pointer

centril (Jul 05 2019 at 15:55, on Zulip):

Maybe we didn't have NonNull<T> back then

gnzlbg (Jul 05 2019 at 15:56, on Zulip):

AFAICT, &mut u64 is ok even if u64 is uninitialized

Jethro (Jul 05 2019 at 15:56, on Zulip):

captain hindsight: should MaybeUninit::as_mut_ptr have returned NonNull<T>?

gnzlbg (Jul 05 2019 at 15:57, on Zulip):

that's another wrapper you need to unwrap, what's the win ?

gnzlbg (Jul 05 2019 at 15:57, on Zulip):

you can always put it in a NonNull<T> yourself, if you need to store it somewhere where you need a niche

Jethro (Jul 05 2019 at 15:58, on Zulip):

well say the rdrand intrinsic would've taken NonNull<u64>, then you have to jump through hoops to call it on uninitialized memory

centril (Jul 05 2019 at 15:58, on Zulip):

@gnzlbg not if references require that the referent is allocated and valid?

gnzlbg (Jul 05 2019 at 15:59, on Zulip):

even if referefent is allocated and valid

Jethro (Jul 05 2019 at 15:59, on Zulip):

unwrapping NonNull is easy, creating one is "hard"

gnzlbg (Jul 05 2019 at 15:59, on Zulip):

depends on what it means for u64 to be valid

centril (Jul 05 2019 at 15:59, on Zulip):

@gnzlbg so you are assuming instead that integers may be uninit?

gnzlbg (Jul 05 2019 at 16:00, on Zulip):

its an assumption a lot of code, like the above, relies on

gnzlbg (Jul 05 2019 at 16:00, on Zulip):

otherwise all that code would be UB

gnzlbg (Jul 05 2019 at 16:00, on Zulip):

we could do that, but nobody has managed to find a good argument for doing that

gnzlbg (Jul 05 2019 at 16:00, on Zulip):

we don't gain any optmization potential from doing so

gnzlbg (Jul 05 2019 at 16:01, on Zulip):

and we don't get any better tools for checking UB

nagisa (Jul 05 2019 at 16:01, on Zulip):

@Jethro I have confirmed that tmp = 0 has no negative effect on generated code.

nagisa (Jul 05 2019 at 16:02, on Zulip):

Even at opt-level=0

RalfJ (Jul 05 2019 at 17:53, on Zulip):

we could do that, but nobody has managed to find a good argument for doing that

wait, wasn't it you who argued *against* uninitialized integers...?

RalfJ (Jul 05 2019 at 17:53, on Zulip):

@gnzlbg I am very surprised to see you on that side of the fence now^^

RalfJ (Jul 05 2019 at 17:53, on Zulip):

also notice that for the SGX snippet above, the reference question is irrelevant. it starts by creating an uninitialized integer. if that's okay by-value, it's certainly okay by-ref.

RalfJ (Jul 05 2019 at 17:54, on Zulip):

but indeed I am irritated by the use of &mut as an out-pointer here

Jethro (Jul 05 2019 at 17:55, on Zulip):

also notice that for the SGX snippet above, the reference question is irrelevant. it starts by creating an uninitialized integer. if that's okay by-value, it's certainly okay by-ref.

but it doesn't need to be by value

RalfJ (Jul 05 2019 at 17:55, on Zulip):

generally in Rust fn(x: &mut T) is basically the same as fn(x: T) -> T but more efficient -- in-place modification instead of pass-through. So from that perspective, the intrinsic definitely should take a raw pointer.

Jethro (Jul 05 2019 at 17:55, on Zulip):

I can use MaybeUninit

Jethro (Jul 05 2019 at 17:55, on Zulip):

but not without creating a ref

RalfJ (Jul 05 2019 at 17:55, on Zulip):

I can use MaybeUninit

sure

RalfJ (Jul 05 2019 at 17:56, on Zulip):

but not without creating a ref

because of the type of the intrinsic. gotcha.
I thought you were discussing the original code. if unintiialized integers are okay, that code is okay.

Jethro (Jul 05 2019 at 17:56, on Zulip):

no that's not my issue

Jethro (Jul 05 2019 at 17:56, on Zulip):

someone else in this thread might have been discussing that issue, but not me ;)

gnzlbg (Jul 05 2019 at 23:00, on Zulip):

generally in Rust fn(x: &mut T) is basically the same as fn(x: T) -> T but more efficient -- in-place modification instead of pass-through. So from that perspective, the intrinsic definitely should take a raw pointer.

in general it isn't more efficient - LLVM is better at optimizing values, than pointer accesses across functions

gnzlbg (Jul 05 2019 at 23:01, on Zulip):

in rust just always prefer fn(T) -> T over fn(&mut T)

gnzlbg (Jul 05 2019 at 23:01, on Zulip):

its not only simpler, it helps with optimizations

simulacrum (Jul 05 2019 at 23:01, on Zulip):

(and readability often, IMO)

gnzlbg (Jul 05 2019 at 23:01, on Zulip):

for the intrinsics, being able to use them to initialize invalid memory might have been useful when they were introduced to C in the early 90s

gnzlbg (Jul 05 2019 at 23:02, on Zulip):

right now, it kind of doesn't really matter, LLVM lowers it all to the same thing

gnzlbg (Jul 05 2019 at 23:02, on Zulip):

if the intrinsic can't be used to initialize invalid memory, then we'd need to change its signature

gnzlbg (Jul 05 2019 at 23:03, on Zulip):

ideally we'd like something in between pointer and reference

gnzlbg (Jul 05 2019 at 23:04, on Zulip):

not requiring validity, but guaranteeing alignment, and dereferenceable

RalfJ (Jul 11 2019 at 15:23, on Zulip):

(and readability often, IMO)

so you are saying a version of push that passes ownership through would be mroe readable? or is that one of the exceptions?

RalfJ (Jul 11 2019 at 15:23, on Zulip):

I would actually say that &mut T is frequently a more readable version of T -> T ;)

simulacrum (Jul 11 2019 at 15:24, on Zulip):

hm, you mean Vec::push being defined as Vec::push(Self, T) -> Self basically?

RalfJ (Jul 11 2019 at 15:34, on Zulip):

that's what you basically said is more readable

RalfJ (Jul 11 2019 at 15:34, on Zulip):

fn(T) -> T instead of fn(&mut T)

simulacrum (Jul 11 2019 at 15:36, on Zulip):

hm, so I disagree a little

simulacrum (Jul 11 2019 at 15:37, on Zulip):

I guess it depends

RalfJ (Jul 11 2019 at 15:39, on Zulip):

you disagree with yourself? ;)
Or you disagree with my statement that this is a consequence of your statement?

Jake Goulding (Jul 11 2019 at 15:40, on Zulip):

I don't know about readable, but having a &mut to a value is more common than ownership

simulacrum (Jul 11 2019 at 15:41, on Zulip):

@RalfJ I think I sort of feel that it's different -- fn(A, B) -> A vs. fn(&mut A, B) and fn(A) -> A vs. fn(&mut A)

RalfJ (Jul 11 2019 at 15:42, on Zulip):

@simulacrum interesting

centril (Jul 11 2019 at 15:57, on Zulip):

I do feel vec.push(0).push(1) is more readable personally

centril (Jul 11 2019 at 15:58, on Zulip):

I really fancy the fluent APIs; sadly push doesn't return &mut Self

simulacrum (Jul 11 2019 at 15:58, on Zulip):

I think things change when you "builder" pattern (not quite the right name)

centril (Jul 11 2019 at 15:58, on Zulip):

The session types idiom is particularly nice

simulacrum (Jul 11 2019 at 15:59, on Zulip):

sure -- &mut T is in some sense less powerful in that way

simulacrum (Jul 11 2019 at 15:59, on Zulip):

well, for some T anyway

simulacrum (Jul 11 2019 at 16:00, on Zulip):

I guess A -> A is equivalent but A<T> -> A<U> feels where T, U are zero-sized session types feels very similar

simulacrum (Jul 11 2019 at 16:00, on Zulip):

(though very different from a type perspective)

Last update: Nov 21 2019 at 23:35UTC