Stream: t-lang/wg-unsafe-code-guidelines

Topic: data-type-layout


nikomatsakis (Aug 09 2018 at 10:09, on Zulip):

Spinning off from @RalfJ's comments in data-invariants, I do want to follow-up on @alercah's question here:

is it okay to store a repr(rust) value on disk and read it back out to memory? do we offer guarantees that this works across identically defined types? across different compiler versions? can the compiler randomly reassign discriminants for enums every time the program runs?

I don't have an answer specifically to it, but I think it would be really great to try and make some progress on trying to document what things users can rely on when it comes to memory layout. I think a good start would be to try and enumerate the questions even and write them down -- ideally in a commit to the rust-lang-rfcs repo. (Maybe along with some notes about the pros/cons)

@alercah are you interested in maybe taking point on that?

I'd like to organize an initial "focused meeting" around this topic but I'm not sure how that should work best.

(Another related question: "if two repr(rust) structs have the same fields of the same type, do they have the same layout? repr(C)?")

alercah (Aug 12 2018 at 19:15, on Zulip):

I'm not sure I have the ability to make commitments right now, but as long as I'm interested I can keep thinking on this. ;) repr(C) should have an explicitly defined ABI, so that it can be safely used with FFI in both directions. This may in some cases just be a reference to the platform ABI, but we want to ensure that e.g. struct padding is guaranteed to work as expected. It would also make it possible for users to understand when they can safely read bytes into repr(C), valuable for any binary protocol.
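As a sketch of what an explicitly defined repr(C) layout buys: fields are placed in declaration order, each padded up to its alignment, and the total size is rounded up to the struct's alignment. This is what lets a reader predict offsets and padding when reading bytes from a binary protocol. `Header` is a hypothetical type; the numbers assume a target where `u32` is 4-byte aligned and `u16` is 2-byte aligned (true on common targets such as x86_64), and `offset_of!` needs Rust 1.77 or later.

```rust
use std::mem::{align_of, offset_of, size_of};

// Hypothetical wire-format header. With #[repr(C)] the C layout rules apply:
// declaration order is kept and padding is inserted to satisfy alignment.
#[repr(C)]
#[allow(dead_code)]
struct Header {
    tag: u8,     // offset 0, then 3 bytes of padding
    length: u32, // offset 4
    flags: u16,  // offset 8, then 2 bytes of trailing padding
}

fn main() {
    println!(
        "size={} align={} tag@{} length@{} flags@{}",
        size_of::<Header>(),
        align_of::<Header>(),
        offset_of!(Header, tag),
        offset_of!(Header, length),
        offset_of!(Header, flags),
    );
}
```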

alercah (Aug 12 2018 at 19:23, on Zulip):

For repr(rust), on the other hand, I can see reasons not to promise identical layout even for identically defined types in the same program. Especially for types with private fields, the compiler could conceivably try to optimize the most frequently accessed field into offset 0, and this may vary from type to type.
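To make the contrast concrete, here is a sketch of two structs with identical fields that differ only in representation. Under repr(C) the size follows from declaration order; under the default representation, current rustc happens to reorder fields to reduce padding (it typically reports 8 bytes here), but that is exactly the kind of detail being proposed as unspecified, so code should not depend on the printed value. Both type names are hypothetical, and the repr(C) figure assumes `u32` is 4-byte aligned.

```rust
use std::mem::size_of;

// repr(C): declaration order is kept, so padding follows from it.
#[repr(C)]
#[allow(dead_code)]
struct InOrder {
    a: u8,
    b: u32,
    c: u8,
}

// Default (Rust) representation: field order and padding are unspecified,
// so the compiler may reorder fields, e.g. to shrink padding.
#[allow(dead_code)]
struct Unspecified {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    // 1 + 3 (pad) + 4 + 1 + 3 (trailing pad) = 12 under the C rules.
    println!("repr(C):    {} bytes", size_of::<InOrder>());
    // An observation about the current compiler, not a guarantee.
    println!("repr(Rust): {} bytes", size_of::<Unspecified>());
}
```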

alercah (Aug 12 2018 at 19:28, on Zulip):

That means giving the fewest guarantees, so I'm happy to propose this as our guideline for now: if you want reliable layout, you can always use repr(C). We could also consider adding a repr(stable) that is guaranteed to be the same for identical definitions, even across platforms where the C ABI may vary.

alercah (Aug 12 2018 at 19:31, on Zulip):

(Since repr(C) enum layout is currently fully specified in a way that should be platform-invariant I think, perhaps we should check that it is actually compatible with C ABI enums on all/most platforms and see if it makes sense to retract that guarantee slightly? Unsure here)

RalfJ (Aug 22 2018 at 19:42, on Zulip):

Fully agreed with all of that, @alercah :D

nikomatsakis (Aug 23 2018 at 11:57, on Zulip):

So I want to talk about the mechanics of running this conversation.

@RalfJ had the idea, which I like, of opening up various issues on https://github.com/rust-rfcs/unsafe-code-guidelines/, devoted to various type constructors.

wycats had the idea, which I also like, of writing out a lot of things that we believe to be uniformly agreed upon, and using that to try to find the contours of the rules. e.g., I believe that Option<&T> should always be represented as a pointer (thin or fat). The idea here is that there are individual questions which have (maybe?) known answers, but if we go from those to general principles things get murky.

What both of these say to me is that we basically need some useful "seeds" for discussion.

I think there are a few questions I would like to be able to answer:

alercah (Aug 23 2018 at 16:37, on Zulip):

Related question: Is bool guaranteed to be laid out as bit-patterns 0 and 1 for true and false? Is its size guaranteed?

alercah (Aug 23 2018 at 16:39, on Zulip):

I think my personal inclination is to say that no-repr and repr(Rust) are always equivalent and always unspecified. Any additional guarantees (e.g. Option<&T>) should be promised by adding an attribute which guarantees a specific layout.

alercah (Aug 23 2018 at 16:42, on Zulip):

This means that someone can write an isomorphic enum and opt into the promised layout, but we don't have to make any general promises. This would in turn imply to me that we should probably at least start by minimizing commitment. I think we should promise that 1-element tuples, including newtype structs but not including enum variants, are laid out as the underlying type.

kennytm (Aug 23 2018 at 16:43, on Zulip):

Is bool guaranteed to be laid out as bit-patterns 0 and 1 for true and false? Is its size guaranteed?

oh no not #46156 / #46176 again

alercah (Aug 23 2018 at 16:51, on Zulip):

Ok, so looks like we do want to promise 0 and 1. :) It seems to me that C allows you to put a 2 into a _Bool, whereas Rust does not?
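In today's Rust terms, the promise being discussed is the one the Rust Reference makes for `bool`: it is one byte, `false` is 0x00, `true` is 0x01, and any other byte is not a valid `bool`. A minimal sketch follows; `bool_from_byte` is a hypothetical helper (not a standard-library function) showing the checked direction, since transmuting a byte such as 2 into a `bool` would be undefined behavior.

```rust
use std::mem::size_of;

// Hypothetical helper: converts a raw byte to bool, rejecting any bit
// pattern other than 0x00 and 0x01 instead of transmuting it.
fn bool_from_byte(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

fn main() {
    // bool -> u8 is always fine: false is 0 and true is 1.
    assert_eq!(size_of::<bool>(), 1);
    assert_eq!(false as u8, 0);
    assert_eq!(true as u8, 1);

    for byte in [0u8, 1, 2] {
        println!("{byte} -> {:?}", bool_from_byte(byte));
    }
}
```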

alercah (Aug 23 2018 at 16:54, on Zulip):

Ahhh. Looks like it's even better. It's unspecified whether or not it's UB.

alercah (Aug 23 2018 at 16:55, on Zulip):

It might not be.

alercah (Aug 23 2018 at 16:55, on Zulip):

But the compiler isn't required to document whether or not it's UB. :rofl:

Nicole Mazzuca (Aug 23 2018 at 17:15, on Zulip):

That's not how it works, to be clear

Nicole Mazzuca (Aug 23 2018 at 17:16, on Zulip):

There are two values of type _Bool; 0 and 1. An object of type _Bool must have one of these values, or else it is UB. It is unspecified what the bit patterns of these values are, however common implementations use 0b00000000 and 0b00000001.

Nicole Mazzuca (Aug 23 2018 at 17:18, on Zulip):

@alercah it is guaranteed that enum { V1(reference), V2 } and enum { V1, V2(reference) } are ABI-compatible with pointers
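For the standard library's `Option` specifically, the standard library documentation states that `Option<&T>`, `Option<Box<T>>` and `Option<NonNull<T>>` have the same size as the pointer, with `None` stored as null. The sketch below sticks to `Option` itself; whether the same holds for a user-defined enum of the same shape is the question the thread returns to later. `none_as_raw` is a hypothetical helper, and its `transmute` is sound only because of that documented guarantee.

```rust
use std::mem::{size_of, transmute};
use std::ptr::NonNull;

// Hypothetical helper: reinterprets Option::<&u32>::None as a raw pointer.
fn none_as_raw() -> *const u32 {
    let none: Option<&u32> = None;
    // SAFETY: Option<&u32> has the same size as &u32, and None is
    // represented by the null (all-zero) pointer.
    unsafe { transmute::<Option<&u32>, *const u32>(none) }
}

fn main() {
    // No extra discriminant word: None reuses the invalid null value.
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
    assert_eq!(size_of::<Option<Box<u32>>>(), size_of::<Box<u32>>());
    assert_eq!(size_of::<Option<NonNull<u32>>>(), size_of::<NonNull<u32>>());
    println!("None is null: {}", none_as_raw().is_null());
}
```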

Nicole Mazzuca (Aug 23 2018 at 17:19, on Zulip):

I don't believe it's guaranteed that struct { T } is ABI-compatible with T; that's what #[repr(transparent)] is for
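A sketch of what `#[repr(transparent)]` adds: a single-field wrapper marked this way has the same layout and ABI as its field, so it can be passed across an `extern "C"` boundary as if it were the field type. `Handle` and `next_handle` are hypothetical names; the function is only called from Rust here, and the C signature in the comment is an assumption about how it would look if it were exported.

```rust
use std::mem::{align_of, size_of};

// repr(transparent): guaranteed to have the layout and ABI of its single
// field, so C code would see a plain uint64_t.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Handle(u64);

// Hypothetical C-ABI function; if exported, a C caller could declare it as
// `uint64_t next_handle(uint64_t)`.
extern "C" fn next_handle(h: Handle) -> Handle {
    Handle(h.0 + 1)
}

fn main() {
    assert_eq!(size_of::<Handle>(), size_of::<u64>());
    assert_eq!(align_of::<Handle>(), align_of::<u64>());
    println!("{:?}", next_handle(Handle(41)));
}
```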

alercah (Aug 23 2018 at 17:19, on Zulip):

The C standard requires that 0 integer values have the all-0s bit pattern. But it says that using an out-of-range integer value is UB only if it's a trap representation, and the existence of trap representations is unspecified.

And yeah, I forgot about repr(transparent).

alercah (Aug 23 2018 at 17:21, on Zulip):

So 0b00000000 is guaranteed to be _False; 0b00000001 may be a trap, may be _True, or may just cause other unspecified weirdness (e.g. it might be true in conditionals but not behave like _True does in bitwise operations).

alercah (Aug 23 2018 at 17:24, on Zulip):

Oh wait. hm. that's not quite right, now with the new 2s-complement rules.

Nicole Mazzuca (Aug 23 2018 at 17:25, on Zulip):

that's not true - by 6.2.6/5, "Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined."

alercah (Aug 23 2018 at 17:26, on Zulip):

Yes, but that doesn't imply uniqueness of values. In particular, see 6.2.6.2/1 and 6.2.6.2/5.

Nicole Mazzuca (Aug 23 2018 at 17:26, on Zulip):

basically, if you store a bitpattern that is not either _Bool(0) or _Bool(1) into memory, and then read it out as a _Bool, you get undefined behavior

alercah (Aug 23 2018 at 17:27, on Zulip):

So _Bool is guaranteed to have one bit, somewhere, which determines if it is 0 or 1. It's not guaranteed to be the low bit. It is unspecified whether any combination of the other bits forms a trap representation, but if they do not, only the governing bit matters.

alercah (Aug 23 2018 at 17:28, on Zulip):

Additionally, all-zeroes is guaranteed to be a valid 0 representation. But e.g. the only valid 1 representation might be all-1s.

Nicole Mazzuca (Aug 23 2018 at 17:28, on Zulip):

ah, I see what you're saying. Yeah

alercah (Aug 23 2018 at 17:29, on Zulip):

hm the draft I'm looking at predates the 2s complement rule though. Let me see if I can find that one.

Nicole Mazzuca (Aug 23 2018 at 17:29, on Zulip):

However, I'd rather have bool be implementation defined, because compat with _Bool is more important, imo, than compat with u8

Nicole Mazzuca (Aug 23 2018 at 17:30, on Zulip):

and if you assume compat with u8, you should just stick to normal platforms

nikomatsakis (Aug 23 2018 at 17:31, on Zulip):

I think my personal inclination is to say that no-repr and repr(Rust) are always equivalent and always unspecified. Any additional guarantees (e.g. Option<&T>) should be promised by adding an attribute which guarantees a specific layout.

I feel this is too strong and impractical, but we can have it out "on the issue tracker" =) It comes down to similar reasoning from bool land, though.

Nicole Mazzuca (Aug 23 2018 at 17:32, on Zulip):

What's too strong and impractical about it? I feel like repr(Rust) and no-repr are naturally the same thing, i.e., "ABI doesn't matter"

Nicole Mazzuca (Aug 23 2018 at 17:33, on Zulip):

we've already promised Option<&T>, tho, so that definitely can't be rolled back (and why would we want to?)

nikomatsakis (Aug 23 2018 at 17:49, on Zulip):

I was specifically referring to things like "how &T is represented"

nikomatsakis (Aug 23 2018 at 17:49, on Zulip):

and Option<&T>

nikomatsakis (Aug 23 2018 at 17:50, on Zulip):

not the idea of "say nothing" and #[repr(rust)] being equivalent, which I totally agree with

nikomatsakis (Aug 23 2018 at 17:50, on Zulip):

but basically I don't think we can say "#[repr(rust)] means totally impl defined"

alercah (Aug 23 2018 at 17:50, on Zulip):

Do we actually guarantee that all isomorphic-to-Option enums are laid out that way? Or is it only Option itself?

nikomatsakis (Aug 23 2018 at 17:50, on Zulip):

that is an interesting question that I wanted to highlight for later :)

nikomatsakis (Aug 23 2018 at 17:51, on Zulip):

(however, I'd probably be willing to make the "isomorphic" guarantee)

nikomatsakis (Aug 23 2018 at 17:51, on Zulip):

(but I think saying Option might suffice)

nikomatsakis (Aug 23 2018 at 17:51, on Zulip):

(at least to start)

alercah (Aug 23 2018 at 17:51, on Zulip):

I think making some guarantees for primitives is ok, because we don't currently have a mechanism to define "new" ones. For user-defined types, including Option, it may make more sense to move the guarantees to repr.

nikomatsakis (Aug 23 2018 at 17:51, on Zulip):

yes, I could see that perhaps one declares a specific repr (and option declares that)

Matthew Jasper (Aug 23 2018 at 17:53, on Zulip):

I thought #[repr(rust)] doesn't exist.

nikomatsakis (Aug 23 2018 at 17:55, on Zulip):

heh I don't remember but it seems like it would mean whatever "the default is" if it did ;)

Jake Goulding (Aug 23 2018 at 17:56, on Zulip):

"The repr formerly known as rust"

alercah (Aug 23 2018 at 17:56, on Zulip):

repr(rust) would be useful if we let you tag an entire module/block with repr(C) to use as a default

Jake Goulding (Aug 23 2018 at 17:57, on Zulip):

(but I think saying Option might suffice)

As a casual observer, I think it's a lot nicer if it is the isomorphic enums. I know nothing of the tradeoffs, so I'm mostly thinking from the teaching side of it.

Nicole Mazzuca (Aug 23 2018 at 18:01, on Zulip):

the nomicon definitely specifies "isomorphic to option"

at least, I'm pretty sure it did at one time...

Nicole Mazzuca (Aug 23 2018 at 18:05, on Zulip):

ah, here we go:

As a special case, an enum is eligible for the "nullable pointer optimization" if it contains exactly two variants, one of which contains no data and the other contains a field of one of the non-nullable types listed above. This means no extra space is required for a discriminant; rather, the empty variant is represented by putting a null value into the non-nullable field. This is called an "optimization", but unlike other optimizations it is guaranteed to apply to eligible types.
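A sketch of two user-defined enums with the shape the quoted nomicon text describes: exactly two variants, one empty and one holding a non-nullable pointer. By that wording both should be pointer-sized, with the empty variant encoded as null; whether this is a language guarantee for user-defined enums (rather than only for `Option`) is what RalfJ questions later in the thread. `Maybe` and `Slot` are hypothetical names.

```rust
use std::mem::{size_of};

// Empty variant first, pointer variant second.
#[allow(dead_code)]
enum Maybe<'a, T> {
    Nothing,
    Just(&'a T),
}

// Pointer variant first, empty variant second.
#[allow(dead_code)]
enum Slot<'a, T> {
    Filled(&'a T),
    Vacant,
}

fn main() {
    // If the nullable pointer optimization applies, neither enum needs a
    // separate discriminant, so both match the size of a bare reference.
    println!("&u64:       {} bytes", size_of::<&u64>());
    println!("Maybe<u64>: {} bytes", size_of::<Maybe<'static, u64>>());
    println!("Slot<u64>:  {} bytes", size_of::<Slot<'static, u64>>());
}
```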

simulacrum (Aug 23 2018 at 18:08, on Zulip):

FWIW, I think we currently do more than that, but I might be wrong -- e.g. using extra space in bool etc

Nicole Mazzuca (Aug 23 2018 at 18:10, on Zulip):

yeah, those are optimizations; NPO is a guarantee

nikomatsakis (Aug 23 2018 at 18:13, on Zulip):

so basically I think these things can be separated:

nikomatsakis (Aug 23 2018 at 18:14, on Zulip):

there is the concept of a niche, and then there is the question of which types "offer" those niches

nikomatsakis (Aug 23 2018 at 18:14, on Zulip):

(I think that's the terminology @eddyb used)

Nicole Mazzuca (Aug 23 2018 at 18:16, on Zulip):

@nikomatsakis what do you mean?

nikomatsakis (Aug 23 2018 at 18:23, on Zulip):

a "niche" is a place to "store" a 1-bit discriminant; some types have them, some types don't

nikomatsakis (Aug 23 2018 at 18:23, on Zulip):

essentially, it means "some invalid bit pattern"

nikomatsakis (Aug 23 2018 at 18:23, on Zulip):

so we might guarantee that &T has a niche, but not say one way or the other about bool, etc

nikomatsakis (Aug 23 2018 at 18:23, on Zulip):

I guess it's no different than what you were saying

nikomatsakis (Aug 23 2018 at 18:23, on Zulip):

just saying you can kind of break it down into a property of types

nikomatsakis (Aug 23 2018 at 18:24, on Zulip):

(obviously one can also generalize the concept beyond 1 bit; I don't believe rustc does today)
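The niche idea can be illustrated with a sketch in current Rust terms: references and `NonZeroU32` have an invalid all-zero bit pattern that the standard library documents `Option` as reusing; `u32` has no invalid bit patterns at all, so `Option<u32>` cannot fit in 4 bytes; and `bool` has spare byte values that current rustc happens to use, which the thread suggests leaving unpromised. Only the first three facts are asserted; the `bool` case is printed as an observation.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

fn main() {
    // Documented niches: the all-zero pattern is invalid, so None uses it.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());

    // No niche: all 2^32 bit patterns of u32 are valid values, so the
    // extra None value needs additional space for a discriminant.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());

    // bool only uses 0x00 and 0x01; current rustc may store None in one of
    // the other byte values. This is an observation, not a promise.
    println!("Option<bool>: {} byte(s)", size_of::<Option<bool>>());
}
```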

Nicole Mazzuca (Aug 23 2018 at 18:24, on Zulip):

ah, sure! I'm just speaking from a "standardization"/ABI-guarantee standpoint for FFI, basically.

RalfJ (Aug 23 2018 at 20:45, on Zulip):

we've already promised Option<&T>,

we have promised this for Option, but we have not promised the same for an Option-like enum you define yourself

RalfJ (Aug 23 2018 at 20:45, on Zulip):

hm seems the nomicon did oops^^

RalfJ (Aug 23 2018 at 20:46, on Zulip):

people have brought up that different isomorphic structs might get different layout based on PGO

RalfJ (Aug 23 2018 at 20:46, on Zulip):

so we might not want to commit too strongly on that^^

RalfJ (Aug 23 2018 at 20:46, on Zulip):

if we say "isomorphic to option", does that include type MyOption<T> = Result<T, ()>?

alercah (Aug 23 2018 at 21:26, on Zulip):

Maybe! :P

Nicole Mazzuca (Aug 24 2018 at 16:18, on Zulip):

no - it says specifically enum { V1, V2(non-nullable-pointer) } or enum { V1(non-nullable-pointer), V2 }
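RalfJ's alias can be checked against the current compiler with a small sketch. On current rustc, `Result<&u8, ()>` typically ends up the same size as `&u8`, because `()` carries no data and the reference supplies a null niche; but as Nicole points out, it is not one of the two enum shapes in the quoted nomicon wording, so whether this layout is promised is exactly the open question here. The printed sizes are observations, not guarantees.

```rust
use std::mem::size_of;

// RalfJ's example: Option-shaped, but spelled via Result.
type MyOption<T> = Result<T, ()>;

fn main() {
    let ptr = size_of::<&'static u8>();
    let opt = size_of::<Option<&'static u8>>();
    let alias = size_of::<MyOption<&'static u8>>();
    // Whether `alias` is promised to equal `ptr` is not settled by the
    // nomicon text quoted above; this only reports what rustc does today.
    println!("&u8: {ptr}, Option<&u8>: {opt}, MyOption<&u8>: {alias}");
}
```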

Last update: Nov 19 2019 at 17:40UTC