Stream: t-lang/wg-unsafe-code-guidelines

Topic: representation of references


gnzlbg (Nov 29 2018 at 15:38, on Zulip):

@Alan Jeffrey I don't understand the latest comments in the representation of references

gnzlbg (Nov 29 2018 at 15:39, on Zulip):

Alignment is part of representation, but whether Option::<&T>::None is "equal" to ptr::null::<T>() is part of validity

Alan Jeffrey (Nov 29 2018 at 15:39, on Zulip):

You mean https://github.com/rust-rfcs/unsafe-code-guidelines/issues/16#issuecomment-442877094?

gnzlbg (Nov 29 2018 at 15:39, on Zulip):

I don't know if there is any document defining these terms yet

gnzlbg (Nov 29 2018 at 15:40, on Zulip):

I'm unsure that I understood them correctly

Alan Jeffrey (Nov 29 2018 at 15:40, on Zulip):

Indeed, we're hitting issues about basic definitions :/

gnzlbg (Nov 29 2018 at 15:40, on Zulip):

AFAIK there are at least three terms we use here: representation, validity, and safety

gnzlbg (Nov 29 2018 at 15:41, on Zulip):

representation answers the question: what shape do the bits that the type occupies in memory have?

Alan Jeffrey (Nov 29 2018 at 15:41, on Zulip):

Certainly being non-zero is part of representation, but I'm not sure about alignment.

RalfJ (Nov 29 2018 at 15:41, on Zulip):

alignment of the type is part of representation, but whether the value of type &T must be aligned is part of validity

gnzlbg (Nov 29 2018 at 15:41, on Zulip):

that's basically, what's the type's size, what's its alignment

RalfJ (Nov 29 2018 at 15:41, on Zulip):

Certainly being non-zero is part of representation, but I'm not sure about alignment.

no, non-zero is also validity

RalfJ (Nov 29 2018 at 15:42, on Zulip):

repr is: align+size of the type, and then the offsets of its fields. and ABI stuff.
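To make "align+size of the type, and then the offsets of its fields" concrete, here is a minimal sketch. The struct `Pair` and the helper `offset_of_b` are hypothetical names chosen for illustration, and `#[repr(C)]` is assumed so that the field order is fixed.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct, used only for illustration.
// `repr(C)` fixes the field order so the offsets are predictable.
#[repr(C)]
struct Pair {
    a: u8,
    b: u32,
}

/// Byte offset of field `b` inside a `Pair`, computed from addresses.
fn offset_of_b() -> usize {
    let p = Pair { a: 0, b: 0 };
    let _ = p.a; // read `a` so the field is not reported as unused
    let base = &p as *const Pair as usize;
    let field = &p.b as *const u32 as usize;
    field - base
}

fn main() {
    // These three numbers are "representation" in the sense used above.
    println!("size        = {}", size_of::<Pair>());
    println!("align       = {}", align_of::<Pair>());
    println!("offset of b = {}", offset_of_b());
}
```

None of these numbers says anything about which bit patterns the fields may hold; that is the validity question.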

gnzlbg (Nov 29 2018 at 15:42, on Zulip):

for DSTs representation is more complicated, and attributes like repr(C, packed, align, simd, ...) alter all of this

Alan Jeffrey (Nov 29 2018 at 15:42, on Zulip):

@RalfJ but non-zero impacts repr, due to Option<T> optimization.

gnzlbg (Nov 29 2018 at 15:42, on Zulip):

ah yes, ABI / calling convention stuff is probably also part of representation

gnzlbg (Nov 29 2018 at 15:42, on Zulip):

@Alan Jeffrey no it does not, that impacts validity

RalfJ (Nov 29 2018 at 15:42, on Zulip):

@Alan Jeffrey enums exploit the validity invariant in how their layout is computed, yes

RalfJ (Nov 29 2018 at 15:42, on Zulip):

the discussions are not entirely separate

gnzlbg (Nov 29 2018 at 15:43, on Zulip):

The size, alignment, of Option<&T> does not depend on which value ptr::null has

RalfJ (Nov 29 2018 at 15:43, on Zulip):

but there was so much to talk about in terms of field offsets for structs and transmuting arrays to homogeneous tuples and such things that we decided to split them up

Alan Jeffrey (Nov 29 2018 at 15:43, on Zulip):

@RalfJ so for you, repr can be based on validity?

RalfJ (Nov 29 2018 at 15:43, on Zulip):

@Alan Jeffrey it's a cyclic dependency

gnzlbg (Nov 29 2018 at 15:43, on Zulip):

validity: which values are the bits of the representation allowed to take

Alan Jeffrey (Nov 29 2018 at 15:43, on Zulip):

@RalfJ ugh

RalfJ (Nov 29 2018 at 15:43, on Zulip):

validity of a struct depends on repr, i.e., on the offsets of the fields

Alan Jeffrey (Nov 29 2018 at 15:43, on Zulip):

I thought validity depended on repr, but not vice versa.

gnzlbg (Nov 29 2018 at 15:44, on Zulip):

repr: what's the shape of the bits, validity: which values can the bits take

RalfJ (Nov 29 2018 at 15:44, on Zulip):

yeah, no: enum optimizations make these form a cycle

gnzlbg (Nov 29 2018 at 15:44, on Zulip):

depending on padding, etc. , repr interacts with validity

gnzlbg (Nov 29 2018 at 15:44, on Zulip):

but most of the time one can treat these separately

gnzlbg (Nov 29 2018 at 15:45, on Zulip):

so if we say: the bit values of ptr::null() are not a valid bit pattern for &T, that's validity, and because Option is an enum, and enums can exploit invalid bit patterns (called niches), that Option<&T>::None has the same bit pattern as ptr::null() follows from all of that
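The niche argument above can be observed directly. This is a small sketch, not a definition; the helper name `option_ref_as_ptr` is made up for this example. It relies on `Option<&T>` having the same size as `&T`, with `None` represented as the null pointer.

```rust
use std::mem::{size_of, transmute};

/// Reinterprets an `Option<&u8>` as a raw pointer (hypothetical helper).
fn option_ref_as_ptr(x: Option<&u8>) -> *const u8 {
    // Both types are pointer-sized, and every bit pattern of
    // `Option<&u8>` is an acceptable `*const u8`.
    unsafe { transmute::<Option<&u8>, *const u8>(x) }
}

fn main() {
    // The enum needs no separate discriminant: it reuses the invalid
    // (null) bit pattern of `&u8` to encode `None`.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    assert!(option_ref_as_ptr(None).is_null());

    let v = 7u8;
    assert_eq!(option_ref_as_ptr(Some(&v)), &v as *const u8);
}
```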

Alan Jeffrey (Nov 29 2018 at 15:46, on Zulip):

I'd been interpreting them as repr includes which bitstrings are valid just looking at the bitstring, no looking at the rest of memory.

gnzlbg (Nov 29 2018 at 15:46, on Zulip):

@RalfJ I think we need a half a page definition of these terms somewhere in the reference

Alan Jeffrey (Nov 29 2018 at 15:46, on Zulip):

Validity is the invariant including the state of the rest of memory.

gnzlbg (Nov 29 2018 at 15:47, on Zulip):

I am not sure, but I think you might be talking about safety

gnzlbg (Nov 29 2018 at 15:48, on Zulip):

validity is a property of the type, it does not interact with memory beyond the type AFAICT

Alan Jeffrey (Nov 29 2018 at 15:49, on Zulip):

@RalfJ is this the same as your use of safety vs validity?

gnzlbg (Nov 29 2018 at 15:49, on Zulip):

safety would be, e.g., pre-conditions on unsafe methods such that safe Rust calling safe methods remains safe, or something like that

gnzlbg (Nov 29 2018 at 15:50, on Zulip):

e.g. constructing a &str using from_utf8_unchecked requires that the byte string passed as argument actually is valid UTF-8 for safety

rkruppe (Nov 29 2018 at 15:51, on Zulip):

@Alan Jeffrey it's definitely not the same approach to defining the terms, and if they happen to coincide it's by accident. but i think the way things are going, not even that seems true -- the validity invariants mostly/likely do not look at other memory. (e.g. reference validity might not look at the memory the reference points at)

Alan Jeffrey (Nov 29 2018 at 15:51, on Zulip):

/me is rereading https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html

RalfJ (Nov 29 2018 at 15:51, on Zulip):

@RalfJ I think we need a half a page definition of these terms somewhere in the reference

I was actually thinking that we merge these two discussions in the text after we are done with both

RalfJ (Nov 29 2018 at 15:52, on Zulip):

but yes we also need definitions^^

Alan Jeffrey (Nov 29 2018 at 15:52, on Zulip):

@RalfJ sorry, which two discussions?

gnzlbg (Nov 29 2018 at 15:52, on Zulip):

having to go through the couple of blog posts to figure out all the definitions is maybe too high a barrier to entry

RalfJ (Nov 29 2018 at 15:52, on Zulip):

repr and validity

RalfJ (Nov 29 2018 at 15:53, on Zulip):

the distinction between validity and safety for me is that validity is what must always hold, even in unsafe code, while safety is what safe code can rely on (but unsafe code is allowed to temporarily violate)

Alan Jeffrey (Nov 29 2018 at 15:53, on Zulip):

@RalfJ do we have a name for "correctness of a bitstring without looking at the rest of memory"?

RalfJ (Nov 29 2018 at 15:53, on Zulip):

references must always be dereferencable, but validity actually kind-of can depend on memory.

Alan Jeffrey (Nov 29 2018 at 15:54, on Zulip):

(e.g. &T being T-aligned, bool being just 0 and 1 etc.)

gnzlbg (Nov 29 2018 at 15:54, on Zulip):

@Alan Jeffrey what do you mean by "correctness" ?

RalfJ (Nov 29 2018 at 15:54, on Zulip):

@Alan Jeffrey I don't think we have that concept. there is no situation in which a reference must be aligned but may be dangling.

gnzlbg (Nov 29 2018 at 15:54, on Zulip):

If I have a struct PrimeInt(i32); would that include a bit representation that does not represent a prime number ?

RalfJ (Nov 29 2018 at 15:55, on Zulip):

If I have a struct PrimeInt(i32); would that include a bit representation that does not represent a prime number ?

validity is compiler-defined, you cannot just pick it

RalfJ (Nov 29 2018 at 15:55, on Zulip):

safety OTOH can be user-defined if you have custom invariants
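A sketch of the PrimeInt example under that reading: the compiler-defined validity invariant of the wrapper is just that of `i32`, while "is prime" is a user-defined safety invariant enforced through privacy. The module name `prime` and the methods `new` and `get` are assumptions made for this illustration.

```rust
mod prime {
    /// Safety invariant (user-defined): the wrapped value is prime.
    /// Validity invariant (compiler-defined): same as `i32`.
    pub struct PrimeInt(i32);

    fn is_prime(n: i32) -> bool {
        if n < 2 {
            return false;
        }
        // Use i64 so that `d * d` cannot overflow for large `n`.
        let n = i64::from(n);
        let mut d = 2i64;
        while d * d <= n {
            if n % d == 0 {
                return false;
            }
            d += 1;
        }
        true
    }

    impl PrimeInt {
        /// The only way for safe code to build a `PrimeInt`.
        pub fn new(n: i32) -> Option<PrimeInt> {
            if is_prime(n) { Some(PrimeInt(n)) } else { None }
        }

        pub fn get(&self) -> i32 {
            self.0
        }
    }
}

use prime::PrimeInt;

fn main() {
    assert_eq!(PrimeInt::new(7).map(|p| p.get()), Some(7));
    assert!(PrimeInt::new(8).is_none());
}
```

Code inside the `prime` module could store a non-prime temporarily without undefined behavior, because the compiler knows nothing about primality.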

gnzlbg (Nov 29 2018 at 15:55, on Zulip):

that's why i'm asking @Alan Jeffrey what they mean by "correctness"

gnzlbg (Nov 29 2018 at 15:55, on Zulip):

depending on what they mean, they might be talking about validity, or safety, or both =/

Alan Jeffrey (Nov 29 2018 at 15:56, on Zulip):

Good question, I guess (as with safety) there's the invariants enforced by the language, and then possibly stricter ones that are user-defined.

RalfJ (Nov 29 2018 at 15:57, on Zulip):

@Alan Jeffrey it is possible to phrase things a bit differently such that validity does not depend on memory (or rather, it only depends in ways that can never be invalidated once something is valid: &dyn Trait must have a proper vtable, that is validity, but once true it can never be non-true because vtables live in constant memory). then the fact that references are dereferencable follows from "retagging" as part of Stacked Borrows. I am not sure if that helps, though...

Alan Jeffrey (Nov 29 2018 at 15:58, on Zulip):

As a concrete instance, the statement that bool contains only 0 and 1, is that validity?

RalfJ (Nov 29 2018 at 15:58, on Zulip):

so, I don't think that it is useful to have a notion of "there exists a memory for which this is valid, whether or not that's the current memory".

RalfJ (Nov 29 2018 at 15:58, on Zulip):

As a concrete instance, the statement that bool contains only 0 and 1, is that validity?

yes.

gnzlbg (Nov 29 2018 at 15:58, on Zulip):

Which bits a type is allowed to take without being undefined behavior is specified by validity. Which bits a type is allowed to take such that safe code remains safe is specified by safety - unsafe code can break safety temporarily as long as safe Rust code cannot exploit it in the meantime to run into undefined behavior - unsafe code cannot break validity, doing so is undefined behavior

RalfJ (Nov 29 2018 at 15:59, on Zulip):

bool's validity happens to not depend on memory

RalfJ (Nov 29 2018 at 15:59, on Zulip):

&T's validity, however, does depend on memory

gnzlbg (Nov 29 2018 at 15:59, on Zulip):

you cannot create a bool with a value of 3 ever, not even in unsafe code - that's UB
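One way to respect both kinds of invariant is to check before converting. The sketch below uses hypothetical helper names (`bool_from_byte`, `str_from_bytes`); the first guards a validity invariant, the second guards the UTF-8 requirement of `&str` that `from_utf8_unchecked` leaves to the caller.

```rust
/// Converts a byte to `bool` only when the result would be valid.
fn bool_from_byte(b: u8) -> Option<bool> {
    match b {
        0 => Some(false),
        1 => Some(true),
        // Any other byte is not a valid `bool`; transmuting it would be UB.
        _ => None,
    }
}

/// Builds a `&str` only from bytes that really are UTF-8.
fn str_from_bytes(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    assert_eq!(bool_from_byte(1), Some(true));
    assert_eq!(bool_from_byte(3), None);
    assert_eq!(str_from_bytes(b"ok"), Some("ok"));
    assert_eq!(str_from_bytes(&[0xff]), None);
}
```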

Alan Jeffrey (Nov 29 2018 at 15:59, on Zulip):

@RalfJ because a valid &T is required to be dereferencable?

RalfJ (Nov 29 2018 at 16:00, on Zulip):

what @gnzlbg said. moreover, both validity and safety may talk about memory, but they don't have to

RalfJ (Nov 29 2018 at 16:00, on Zulip):

@RalfJ because a valid &T is required to be dereferencable?

yes

gnzlbg (Nov 29 2018 at 16:00, on Zulip):

It also depends on where the object of type T is, if the object is at an unaligned address, then &T bit pattern would contain an unaligned memory address, which is not valid

gnzlbg (Nov 29 2018 at 16:00, on Zulip):

because &T is not allowed to point to unaligned memory

gnzlbg (Nov 29 2018 at 16:00, on Zulip):

that's something that not even unsafe Rust code can temporarily violate

Alan Jeffrey (Nov 29 2018 at 16:01, on Zulip):

Hmm, so unsafe code isn't allowed to create an undereferencable &T temporarily?

rkruppe (Nov 29 2018 at 16:01, on Zulip):

and to be clear, the alignment requirement is not really "talking about memory" -- it just inspects the address bits in the pointer
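The alignment part of the check can be written without touching the pointee at all. This is a minimal sketch; `is_aligned_for` is a name invented for this example.

```rust
use std::mem::align_of;

/// True if the address stored in `ptr` is a multiple of `T`'s alignment.
/// Only the address bits are inspected; nothing is dereferenced.
fn is_aligned_for<T>(ptr: *const T) -> bool {
    (ptr as usize) % align_of::<T>() == 0
}

fn main() {
    let words = [0u32; 2];
    let good = words.as_ptr();
    // Shift the address by one byte; the pointer is never dereferenced.
    let bad = (good as *const u8).wrapping_add(1) as *const u32;

    assert!(is_aligned_for(good));
    assert!(!is_aligned_for(bad));
}
```

Whether the address is also backed by allocated memory is a separate question that this check cannot answer.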

RalfJ (Nov 29 2018 at 16:01, on Zulip):

right, so validity for x: &T makes the following requirements (and maybe more, we haven't discussed that yet^^)

gnzlbg (Nov 29 2018 at 16:01, on Zulip):

exactly, we don't have to dereference a &T to check whether it's valid, we can always just tell by looking at &T

Alan Jeffrey (Nov 29 2018 at 16:02, on Zulip):

So unsafe code can't transmute a &[T] to a &T?

RalfJ (Nov 29 2018 at 16:02, on Zulip):

exactly, we don't have to dereference a &T to check whether it's valid, we can always just tell by looking at &T

well, we have to figure out if memory is allocated. which you cannot actually do inside the language.

Alan Jeffrey (Nov 29 2018 at 16:02, on Zulip):

because the &[T] might be empty.

RalfJ (Nov 29 2018 at 16:02, on Zulip):

So unsafe code can't transmute a &[T] to a &T?

correct. if the slice is empty, that's UB.

RalfJ (Nov 29 2018 at 16:03, on Zulip):

also they have different size so you cannot immediately transmute. I assume you mean some cast through a raw ptr.

Alan Jeffrey (Nov 29 2018 at 16:03, on Zulip):

@RalfJ that's a stronger requirement than C's UB IIRC.

RalfJ (Nov 29 2018 at 16:03, on Zulip):

@Alan Jeffrey it is

RalfJ (Nov 29 2018 at 16:04, on Zulip):

it's similar to int&

gnzlbg (Nov 29 2018 at 16:04, on Zulip):

Note that C++ references have similar requirements

RalfJ (Nov 29 2018 at 16:04, on Zulip):

but even stronger

RalfJ (Nov 29 2018 at 16:04, on Zulip):

references just have to be dereferencable when created and when used

gnzlbg (Nov 29 2018 at 16:04, on Zulip):

they can't be null

RalfJ (Nov 29 2018 at 16:04, on Zulip):

in rust, even just assigning a variable asserts validity of the being-assigned data

gnzlbg (Nov 29 2018 at 16:05, on Zulip):

I think that's UB in C++ too

gnzlbg (Nov 29 2018 at 16:05, on Zulip):

C++ references are not pointers, they are not even objects, they can't be null, they have to always point to a valid T, which due to strict aliasing, has to be properly aligned, etc. etc.

rkruppe (Nov 29 2018 at 16:06, on Zulip):

Is this valid C++? void foo() { int *p = new int; int &r = *p; delete p; }

rkruppe (Nov 29 2018 at 16:06, on Zulip):

the equivalent Rust would be UB because r is no longer dereferenceable at the last line

gnzlbg (Nov 29 2018 at 16:07, on Zulip):

it is, yes, C++ references do not have to be dereferenceable, as long as you don't dereference them obviously, they can dangle

gnzlbg (Nov 29 2018 at 16:08, on Zulip):

so that snippet just contains a dangling reference, which is ok in C++, but this reference is not null, and its memory address is appropriately aligned for a T

gnzlbg (Nov 29 2018 at 16:13, on Zulip):

I always wondered why C++ compilers do not error when dangling references are returned from functions. Maybe this is why: dangling references are ok.

Alan Jeffrey (Nov 29 2018 at 16:14, on Zulip):

So in Rust, the following code can produce UB:

fn foo(x: &[u8]) -> Option<&u8> {
    unsafe {
        let result = &*x.as_ptr();
        if x.len() == 0 { None } else { Some(result) }
    }
}
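For contrast, a variant that never creates the reference for an empty slice avoids the problem. This is a sketch with a hypothetical name, `first_byte`; in real code the standard `<[u8]>::first` method does the same job without `unsafe`.

```rust
/// Returns a reference to the first byte, or `None` for an empty slice.
fn first_byte(x: &[u8]) -> Option<&u8> {
    if x.is_empty() {
        None
    } else {
        // The length check comes first, so the pointer is known to point
        // at a live, initialized `u8` before the reference is created.
        Some(unsafe { &*x.as_ptr() })
    }
}

fn main() {
    assert_eq!(first_byte(&[]), None);
    assert_eq!(first_byte(&[5, 6]), Some(&5));
    // Same results as the safe standard-library method.
    assert_eq!(first_byte(&[5, 6]), [5u8, 6].first());
}
```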

RalfJ (Nov 29 2018 at 16:14, on Zulip):

the equivalent Rust would be UB because r is no longer dereferenceable at the last line

that's not true, r is not used again

gnzlbg (Nov 29 2018 at 16:14, on Zulip):

i think that depends on what does dereferenceable mean

rkruppe (Nov 29 2018 at 16:16, on Zulip):

@RalfJ if that is what you're proposing, I don't see how it justifies dereferenceable? Because that attribute licenses introducing loads anywhere in the function, even after its last use. That is the core of why Clang's use of dereferenceable is unsound, too.

RalfJ (Nov 29 2018 at 16:16, on Zulip):

this would be UB though:

fn foo() {
    let mut b = Box::new(0);
    let r = &mut *b;
    drop(b);
    let r2 = r; // UB: copying a dangling ref
}

RalfJ (Nov 29 2018 at 16:17, on Zulip):

Because that attribute licenses introducing loads anywhere in the function, even after its last use. That is the core of why Clang's use of dereferenceable is unsound, too.

it was my understanding that this is the case when the attribute is on a parameter?

RalfJ (Nov 29 2018 at 16:17, on Zulip):

there must be some limit to the scope of the attribute

RalfJ (Nov 29 2018 at 16:17, on Zulip):

I have no idea how it works when used inside a function. I mean, it can't just be "until the end of the current function" because then inlining would be broken

gnzlbg (Nov 29 2018 at 16:17, on Zulip):

this would be UB as well:

fn foo() { let mut b = Box::new(0); let r = &mut *b; drop(b); /* r is dropped afterwards */ }

RalfJ (Nov 29 2018 at 16:18, on Zulip):

for parameters, Stacked Borrows has a special thing ensuring you don't deallocate them while the function runs ("barriers")

RalfJ (Nov 29 2018 at 16:18, on Zulip):

this would be UB as well:

fn foo() { let b = Box::new(0); let r = &mut *b; drop(b); /* r is dropped afterwards */ }

no, r has no drop glue

RalfJ (Nov 29 2018 at 16:18, on Zulip):

and NLL lets us witness that that is the case

gnzlbg (Nov 29 2018 at 16:18, on Zulip):

but it is still alive, even if there is nothing to drop ?

gnzlbg (Nov 29 2018 at 16:18, on Zulip):

or is r "dead" before drop(b) ?

RalfJ (Nov 29 2018 at 16:18, on Zulip):

it does not require r to be live at the end

RalfJ (Nov 29 2018 at 16:18, on Zulip):

or is r "dead" before drop(b) ?

yes

gnzlbg (Nov 29 2018 at 16:19, on Zulip):

so values of types without drop glue are dead "as soon as possible" ?

rkruppe (Nov 29 2018 at 16:19, on Zulip):

for parameters, Stacked Borrows has a special thing ensuring you don't deallocate them while the function runs ("barriers")

OK I've not kept up with stacked borrows as much as I wanted to so I'm going to take this on faith and then it should be fine. I was just confused because it sounded a bit like the validity invariant "points to allocated memory of right size" was supposed to justify the attribute on its own.

RalfJ (Nov 29 2018 at 16:19, on Zulip):

no. it's the other way around: values of types with drop glue have a use when they are dropped

gnzlbg (Nov 29 2018 at 16:20, on Zulip):

so values of types without drop glue die on their last use ?

RalfJ (Nov 29 2018 at 16:20, on Zulip):

and then yes stuff with function scope is complicated, and that's where we cannot really tease apart validity and Stacked Borrows any more. (or, we can, but now "dereferencable" becomes part of stacked borrows, not of validity.)

RalfJ (Nov 29 2018 at 16:21, on Zulip):

not sure what you mean by "die"

RalfJ (Nov 29 2018 at 16:21, on Zulip):

they have to be valid at all uses

rkruppe (Nov 29 2018 at 16:21, on Zulip):

so values of types without drop glue die on their last use ?

Values don't die, they cease to be live because there are no more uses. The reason types with drop glue don't "die early" is because the drop is a use very late in the function.
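The liveness point can be seen in what the borrow checker accepts. In this sketch (the function name `last_value` is made up), `r` has no drop glue, so its last use is the write through it, and `b` can be read and dropped afterwards.

```rust
fn last_value() -> i32 {
    let mut b = Box::new(0);
    let r = &mut *b;
    *r = 1; // last use of `r`; a reference has no drop glue

    // `r` is no longer live here, so `b` may be used again...
    let v = *b;
    // ...and even dropped, although `r` is still lexically in scope.
    drop(b);
    v
}

fn main() {
    assert_eq!(last_value(), 1);
}
```

If `r` were a value of a type with drop glue that borrowed from `b`, its implicit drop at the end of the function would count as a use, and the call to `drop(b)` would be rejected.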

RalfJ (Nov 29 2018 at 16:21, on Zulip):

end of story. "dereferencable" is a bit special in that it is entangled with Stacked Borrows because of its function scope

gnzlbg (Nov 29 2018 at 16:22, on Zulip):

TIL - thanks - I think I have to read about "valid at all uses", that wasn't really part of my mental model

Nicole Mazzuca (Nov 29 2018 at 16:40, on Zulip):

@RalfJ it's not really cyclic - it's more like a dag

RalfJ (Nov 29 2018 at 16:42, on Zulip):

@Nicole Mazzuca it's mutually recursive but well-founded, as repr of an enum only depends on validity of its fields.

Nicole Mazzuca (Nov 29 2018 at 16:42, on Zulip):

for any given type T, validity depends on the representation of T, and the validity and representation of all T's sub-objects
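A small observation of the "repr of an enum depends on validity of its fields" direction. The exact sizes are what current rustc produces and are an assumption here, apart from `Option<u8>` needing more than one byte, which follows from `u8` having no invalid bit patterns.

```rust
use std::mem::size_of;

fn main() {
    // `bool` has invalid bit patterns (anything other than 0 and 1),
    // so the enum layout can store `None` in one of them.
    println!("Option<bool>: {} byte(s)", size_of::<Option<bool>>());

    // Every bit pattern of `u8` is valid, so `None` needs extra space.
    println!("Option<u8>:   {} byte(s)", size_of::<Option<u8>>());
    assert!(size_of::<Option<u8>>() > size_of::<u8>());
}
```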

Nicole Mazzuca (Nov 29 2018 at 16:43, on Zulip):

mmh

Nicole Mazzuca (Nov 29 2018 at 16:43, on Zulip):

yeah, okay

RalfJ (Nov 29 2018 at 16:43, on Zulip):

I think you said the same thing in different words^^

Nicole Mazzuca (Nov 29 2018 at 16:43, on Zulip):

yep!

Nicole Mazzuca (Nov 29 2018 at 16:43, on Zulip):

"well-founded" is a good term

RalfJ (Nov 29 2018 at 16:44, on Zulip):

sometimes, research is good for something ;)

Nicole Mazzuca (Nov 29 2018 at 16:45, on Zulip):

I am literally a mathematics major :P

RalfJ (Nov 29 2018 at 16:45, on Zulip):

I know, I'm joking :D

Nicole Mazzuca (Nov 29 2018 at 16:47, on Zulip):

I'm taking an abstract algebra course next quarter

Nicole Mazzuca (Nov 29 2018 at 16:47, on Zulip):

I am _very_ excited

RalfJ (Nov 29 2018 at 16:49, on Zulip):

:heart: algebra

Alan Jeffrey (Nov 29 2018 at 17:07, on Zulip):

Submitted https://github.com/rust-rfcs/unsafe-code-guidelines/pull/51

Alan Jeffrey (Dec 10 2018 at 20:03, on Zulip):

OK, silly naming question. Does "reference" in Rust include trait objects or not?

Alan Jeffrey (Dec 10 2018 at 20:04, on Zulip):

I think the answer is "yes", in which case how do I refer to "&T or &mut T where T is a type"?

Alan Jeffrey (Dec 10 2018 at 20:06, on Zulip):

This is coming up in the context of @rkruppe's comment at https://github.com/rust-rfcs/unsafe-code-guidelines/pull/51#pullrequestreview-181964318

Alan Jeffrey (Dec 10 2018 at 20:08, on Zulip):

for which the fix is to say something like "The alignment of &T and &mut T, where T is a type, is the word size" which is quite clunky.

QuietMisdreavus (Dec 10 2018 at 20:09, on Zulip):

in code you'd say where T: Sized, iirc, but i'm not totally sure how to word that for docs that succinctly

Alan Jeffrey (Dec 10 2018 at 20:11, on Zulip):

@QuietMisdreavus trait objects are the issue here, the problem is that the alignment of &dyn Trait might be more than the word size, sigh.

Alan Jeffrey (Dec 10 2018 at 20:13, on Zulip):

oh, you're right, custom DSTs too.

RalfJ (Dec 10 2018 at 21:56, on Zulip):

@Alan Jeffrey I am a bit confused, because your first two questions do not connect.^^ yes, references include &dyn Trait. in &T, T is always a type. we have some old syntax where without context we can confuse types and traits, but we also have dyn Trait as new syntax to avoid that ambiguity.

RalfJ (Dec 10 2018 at 21:58, on Zulip):

also when you are talking about the alignment of &T, do you mean the alignment of this type itself (the same way that i32 has alignment 4), or the required alignment of values of the given type (which is just the alignment of T)?

Alan Jeffrey (Dec 10 2018 at 22:26, on Zulip):

@RalfJ I was meaning the alignment of &T itself, not the valid values (which are multiples of T's alignment).
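The two meanings of "alignment of &T" can be separated in a short sketch. The type `Big` and the helper `address_of` are hypothetical; a sized `T` is assumed, for which a reference is a single pointer.

```rust
use std::mem::align_of;

// Hypothetical over-aligned type, used only for illustration.
#[repr(align(64))]
struct Big(u8);

/// The address stored in a reference, as an integer.
fn address_of(r: &Big) -> usize {
    r as *const Big as usize
}

fn main() {
    let big = Big(0);

    // Alignment of the reference type itself: that of a pointer.
    assert_eq!(align_of::<&Big>(), align_of::<usize>());

    // Required alignment of the values a `&Big` may hold: that of `Big`.
    assert_eq!(align_of::<Big>(), 64);
    assert_eq!(address_of(&big) % align_of::<Big>(), 0);
}
```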

Alan Jeffrey (Dec 10 2018 at 22:27, on Zulip):

The problem being that if T is a custom DST, then &T might have alignment greater than the word, sigh.

Alan Jeffrey (Dec 10 2018 at 22:28, on Zulip):

So in order to not box ourselves in for DSTs, we need to allow &T to have alignment that's possibly greater than a word.

Alan Jeffrey (Dec 10 2018 at 22:28, on Zulip):

Sigh.

RalfJ (Dec 11 2018 at 07:50, on Zulip):

ah I see

RalfJ (Dec 11 2018 at 07:50, on Zulip):

I don't see what's so bad about that though^^ you can always determine the current alignment with align_of, after all

Alan Jeffrey (Dec 11 2018 at 15:31, on Zulip):

@RalfJ it makes the definition wordier, rather than just "&T is word-aligned" we have "&T is at least word-aligned, and is word-aligned in the following cases..."

RalfJ (Dec 11 2018 at 15:32, on Zulip):

seems like the definition should be in terms of a desugaring to a struct

RalfJ (Dec 11 2018 at 15:32, on Zulip):

there are no new rules here, right? the fact that u64 metadata can increase alignment is just like (*const (), u64) having a larger-than-word alignment
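The comparison with `(*const (), u64)` can be checked on today's compiler. The trait `Shape` is a placeholder name, and the "two words" sizes for slice and trait-object references reflect current rustc behavior; they are an observation, not a statement about what the guidelines will guarantee.

```rust
use std::mem::{align_of, size_of};

// Placeholder trait, used only to form a trait-object type.
trait Shape {}

fn main() {
    // A tuple of a pointer and u64 metadata is at least pointer-aligned,
    // and more than that wherever u64 is more strictly aligned.
    assert!(align_of::<(*const (), u64)>() >= align_of::<*const ()>());

    // Today's wide references carry word-sized metadata (a length or a
    // vtable pointer), so they are two words and word-aligned.
    assert_eq!(size_of::<&[u8]>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<&dyn Shape>(), 2 * size_of::<usize>());
    assert_eq!(align_of::<&[u8]>(), align_of::<usize>());
}
```

A custom DST with wider metadata would behave like the tuple in the first assertion, which is why the wording "at least word-aligned" comes up.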

gnzlbg (Dec 11 2018 at 15:41, on Zulip):

that's why I preferred the generic definition for DSTs, but the equivalent without that is to just say that "&T has the same layout as ..."

Last update: Nov 20 2019 at 12:20UTC