Stream: t-lang/wg-unsafe-code-guidelines

Topic: transitive validity of references


gnzlbg (Mar 15 2019 at 12:50, on Zulip):

@RalfJ in the lang meeting there were some questions about checking the validity of references transitively, and arguments about whether they should be shallow or deep. I don't understand how we could check the validity of references if their validity would depend on the validity of the pointee, e.g., &[T] would need us to check that all Ts in the slice are valid for &[T] to be valid

gnzlbg (Mar 15 2019 at 12:51, on Zulip):

e.g. @Taylor Cramer argued that the deepness that we check is at most N (e.g. 5)

gnzlbg (Mar 15 2019 at 12:52, on Zulip):

I suppose this did not mean that we would only check the first N elements of &[T], but that we could check at most &[&[&[&[&[T]]]]]

gnzlbg (Mar 15 2019 at 12:53, on Zulip):

so if each slice has M elements, we'd need to check M^N values for slices

gnzlbg (Mar 15 2019 at 12:54, on Zulip):

even with N=1 this sounds unrealistic to me

gnzlbg (Mar 15 2019 at 12:55, on Zulip):

the only realistic option is N=0 which means reference validity is shallow checked, e.g., they are properly aligned, non-zero, but not that their pointee must be valid

gnzlbg (Mar 15 2019 at 12:55, on Zulip):

maybe we could check that the pointee has the appropriate layout - e.g. that the memory bounds of the pointee are ok, but that's it.

gnzlbg (Mar 15 2019 at 12:57, on Zulip):

(deleted)

gnzlbg (Mar 15 2019 at 12:58, on Zulip):

so when casting &[u8] as &[bool]miri wouldn't check if the bools are invalid, it would only check it when the bools are actually accessed

gnzlbg (Mar 15 2019 at 13:03, on Zulip):

Maybe for code that uses no slices we can check more, but a lot of code uses Vec or String which get coerced to slices often.

gnzlbg (Mar 15 2019 at 13:05, on Zulip):

so I don't know, if we require validity of references to be transitive, but we can't check it in practice, i wonder if it wouldn't be better to adjust the validity rules of references to not require transitivity

RalfJ (Mar 15 2019 at 15:31, on Zulip):

yes I think we meant "5 levels of references"

RalfJ (Mar 15 2019 at 15:31, on Zulip):

and yes that could still be expensive

gnzlbg (Mar 15 2019 at 15:38, on Zulip):

do we gain any optimization power from requiring the pointee of a valid reference to be valid ?

RalfJ (Mar 15 2019 at 15:39, on Zulip):

yes. we can optimize away a function taking &! as input (this was requested by @Taylor Cramer )

RalfJ (Mar 15 2019 at 15:39, on Zulip):

I have some thoughts for other ways to achieve this, but they are a bit hacky.

gnzlbg (Mar 15 2019 at 15:41, on Zulip):

I think this also helps reasoning about unsafe code (e.g. if you see a reference you know that the pointee is valid), otherwise use pointers.

gnzlbg (Mar 15 2019 at 15:42, on Zulip):

this means that you can always dereference a reference in unsafe code

gnzlbg (Mar 15 2019 at 15:43, on Zulip):

otherwise, you can only dereference one if the pointee is valid, which might not be easy or cheap to check

gnzlbg (Mar 15 2019 at 15:43, on Zulip):

@RalfJ what thoughts?

gnzlbg (Mar 15 2019 at 15:43, on Zulip):

we could try to cache these checks

gnzlbg (Mar 15 2019 at 15:44, on Zulip):

but beyond that..

gnzlbg (Mar 15 2019 at 15:45, on Zulip):

making caching efficient would probably involve a lot of work

RalfJ (Mar 15 2019 at 15:46, on Zulip):

otherwise, you can only dereference one if the pointee is valid, which might not be easy or cheap to check

true. OTOH, it's not like the stronger validity makes this any easier. it just moves the same work to the point when you create a reference: when turning a raw ptr to a ref, you now must know the pointee is valid, which might not be easy or cheap to check

RalfJ (Mar 15 2019 at 15:48, on Zulip):

we could try to cache these checks

that's actually made much harder by recursive validity, because now changing something in memory could make any random reference invalid if you can reach that memory from it

RalfJ (Mar 15 2019 at 15:52, on Zulip):

RalfJ what thoughts?

basically: (1) validity is "pure", does not depend on memory at all; references are valid if they are non-null and aligned; (2) any further property is enforced by Stacked Borrows, which (as part of reborrowing) already "dereferences" references (to update the tags of the memory they point to). Stacked Borrows depends on the layout of the pointee -- it has to, to determine the size. we could add a simple but hacky extra condition saying that moreove, if that layout is of Abi type "uninhabited", then we have UB. that'd get us @Taylor Cramer's optimization.

gnzlbg (Mar 15 2019 at 16:06, on Zulip):

I see. So in a nutshell validity does not guarantee that we can do @Taylor Cramer optimization, stacked borrows does.

RalfJ (Mar 15 2019 at 16:11, on Zulip):

and not just the &! thing, even the dereferencable attribute would source from Stacked Borrows, as opposed to validity.

RalfJ (Mar 15 2019 at 16:11, on Zulip):

this would mean validity does not depend on memory and can be fully cached -- validity and "bitlist validity" coincide

RalfJ (Mar 15 2019 at 16:12, on Zulip):

however it also makes me wonder if the distinction between validity and Stacked Borrows is even meaningful^^

gnzlbg (Mar 15 2019 at 16:49, on Zulip):

would stacked borrows require that references are always dereferenceable ?

E.g. would let unused: &[bool] = transmute(&[3_u8, 4, 5]); "check" under stacked borrows if unused is never used ?

gnzlbg (Mar 15 2019 at 16:50, on Zulip):

Would re-borrrowing there dereference unused ?

RalfJ (Mar 16 2019 at 09:33, on Zulip):

would let unused: &[bool] = transmute(&[3_u8, 4, 5]); "check" under stacked borrows if unused is never used ?

yes. after every assignment of reference types (and currently more, but I am thinking maybe that was overeager?), we do retagging. retagging asserts that the reference is dereferencable.

RalfJ (Mar 16 2019 at 09:33, on Zulip):

oh wait.

RalfJ (Mar 16 2019 at 09:33, on Zulip):

so, it'd check that the reference is derefencable (though I'd prefer ifw e could get unsized types out of the equation here, they might change things)

RalfJ (Mar 16 2019 at 09:34, on Zulip):

but it'll not check whether these are valid bool because it doesnt check that the pointee is valid. it just checks that the pointee is allocated.

gnzlbg (Mar 16 2019 at 10:14, on Zulip):

@RalfJ so by dereferenceable we just check that the allocation satisfies the layout (e.g. that it is large enough to be dereferenced), but not that the memory is valid, right?

RalfJ (Mar 16 2019 at 10:15, on Zulip):

correct

RalfJ (Mar 16 2019 at 10:15, on Zulip):

dereferncable as LLVM attribute means that there's memory there that can be read from (without incurring data races or so)

RalfJ (Mar 16 2019 at 10:15, on Zulip):

at least that's my interpretation -- based on what it is used for: inserting spurious loads

gnzlbg (Mar 16 2019 at 10:15, on Zulip):

so I don't think there are any issues with making this part of validity

RalfJ (Mar 16 2019 at 10:15, on Zulip):

egh, sorry. LLVM doesnt consider read-write races UB. so I guess the racy part doesnt matter.

gnzlbg (Mar 16 2019 at 10:16, on Zulip):

as in, for a reference to be valid it must point to an allocation that's sufficiently large to fit the layout

RalfJ (Mar 16 2019 at 10:16, on Zulip):

well, it's somewhat a matter of redundancy -- if stacked borrows checks this anyway (as a side-effect), then why also check it in validity?

RalfJ (Mar 16 2019 at 10:16, on Zulip):

in particular when there are benefits from making validity a property that looks at the value only, not at the rest of memory

gnzlbg (Mar 16 2019 at 10:17, on Zulip):

in the implementation it wouldn't be checked twice, but conceptually, it might be worth to teach this as part of validity

gnzlbg (Mar 16 2019 at 10:17, on Zulip):

we should layer what kind of unsafe code people can write with different amounts of knowledge

RalfJ (Mar 16 2019 at 10:17, on Zulip):

well I was also speaking conceptually

RalfJ (Mar 16 2019 at 10:18, on Zulip):

for the spec

RalfJ (Mar 16 2019 at 10:18, on Zulip):

teaching can diverge from the spec though

gnzlbg (Mar 16 2019 at 10:18, on Zulip):

validity is something that pretty much everybody using zeroed / uninitialized / maybeuninit will need to know, but stacked borrows would be a level higher

RalfJ (Mar 16 2019 at 10:18, on Zulip):

right but that's not an argument to specify things this way

RalfJ (Mar 16 2019 at 10:18, on Zulip):

only to teach them this way

gnzlbg (Mar 16 2019 at 10:18, on Zulip):

right

RalfJ (Mar 16 2019 at 10:18, on Zulip):

and we're talking about the spec here, I think. that was my assumption anyway.

RalfJ (Mar 16 2019 at 10:19, on Zulip):

let's defer the teaching part until we have a spec^^

gnzlbg (Mar 16 2019 at 10:19, on Zulip):

thinking about this, while teaching, we don't have to use the word validity

RalfJ (Mar 16 2019 at 10:19, on Zulip):

well people want us to change that word anyway

gnzlbg (Mar 16 2019 at 10:19, on Zulip):

so @centril issue isn't necessary to rename validity in the spec

RalfJ (Mar 16 2019 at 10:19, on Zulip):

"initialization invariant" (or "initialziedness" or something like that... is there a less awkward equivalent?) is my current favourite

gnzlbg (Mar 16 2019 at 10:19, on Zulip):

they want a different name that they can use informally, while programming or teaching, but the spec doesn't need to match that necessarily

gnzlbg (Mar 16 2019 at 10:20, on Zulip):

it's nice if the names match, but it isn't a must, e.g., C programmers don't say x has a trap representation / indeterminate value, they just say x contains garbage or "is not initialized"

rkruppe (Mar 16 2019 at 10:20, on Zulip):

Using one term with users and another one in the spec will confuse everyone who goes from being taught about this to reading the spec themselves, which should hopefully be many people

rkruppe (Mar 16 2019 at 10:20, on Zulip):

The C standard's level of difficulty is not a good bar to aspire to imo :)

RalfJ (Mar 16 2019 at 10:20, on Zulip):

it's nice if the names match, but it isn't a must, e.g., C programmers don't say x has a trap representation / indeterminate value, they just say x contains garbage or "is not initialized"

well also in C nobody knows how these terms really relate

gnzlbg (Mar 16 2019 at 10:20, on Zulip):

@RalfJ initialization invariant sounds good to me

RalfJ (Mar 16 2019 at 10:21, on Zulip):

so maybe let's not use C as the example for this...

gnzlbg (Mar 16 2019 at 10:21, on Zulip):

i was just arguing that the terms are not required to match

RalfJ (Mar 16 2019 at 10:21, on Zulip):

well you gave an example for a case where they dont match and things are rather messy

RalfJ (Mar 16 2019 at 10:21, on Zulip):

seems like an argument to make them match :P

gnzlbg (Mar 16 2019 at 10:22, on Zulip):

MaybeDoesNotSatisfyTheInitializationInvariant<T> :D

RalfJ (Mar 16 2019 at 10:22, on Zulip):

we're not talking about type names again are we?^^

RalfJ (Mar 16 2019 at 10:23, on Zulip):

which reminds me I should push stabilizing that thing...

gnzlbg (Mar 16 2019 at 10:24, on Zulip):

and push the place value etc. PR!

Last update: Nov 19 2019 at 17:35UTC