Stream: t-lang/wg-unsafe-code-guidelines

Topic: unaligned memory access


RalfJ (Feb 21 2019 at 12:05, on Zulip):

@gnzlbg memory accesses happen on a place, and a place consists of a pointer, a size, and an alignment. when you use a pointer to access memory, a (pointer)-value-to-place conversion occurs. at that point, the size and alignment of the place are determined with size_of_val and align_of_val. when the access later happens, and the pointer does not actually match the alignment given in the place, Miri detects UB.

gnzlbg (Feb 21 2019 at 12:07, on Zulip):

So when one creates a reference from a pointer, we just check the reference validity, and that checks that the place it points to has the right alignment?

RalfJ (Feb 21 2019 at 12:09, on Zulip):

when one creates a reference from a pointer, we just check the reference validity

yes

that checks that the place it points to has the right alignment?

no, no place is involved there. we can just compare the pointer value with align_of_val.
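A minimal Rust sketch of the alignment comparison described above (an illustration only, not part of the original conversation and not Miri's code). `is_aligned_for` and `load_u64_unaligned` are hypothetical helpers, and the example assumes a target where `u64` has an alignment greater than 1.

```rust
use std::mem::{align_of, align_of_val, size_of_val};

/// Hypothetical helper: does the address of `ptr` satisfy the static alignment of `T`?
fn is_aligned_for<T>(ptr: *const T) -> bool {
    (ptr as usize) % align_of::<T>() == 0
}

/// Hypothetical helper: load a `u64` from an arbitrary (possibly unaligned) byte offset.
fn load_u64_unaligned(bytes: &[u8], offset: usize) -> u64 {
    assert!(offset <= bytes.len() && bytes.len() - offset >= 8, "out of bounds");
    // `read_unaligned` is the defined way to load from a misaligned address.
    unsafe { (bytes.as_ptr().add(offset) as *const u64).read_unaligned() }
}

fn main() {
    let x: u64 = 7;
    // The size and alignment that a place for `x` would carry.
    println!("size = {}, align = {}", size_of_val(&x), align_of_val(&x));

    let aligned = &x as *const u64;
    // One byte past the start of `x`: the address is no longer a multiple of the alignment.
    let misaligned = (aligned as *const u8).wrapping_add(1) as *const u64;
    assert!(is_aligned_for(aligned));
    assert!(!is_aligned_for(misaligned));
    // Dereferencing `misaligned` with `*misaligned` would be UB, so it is never read here.

    let bytes = [1u8, 2, 3, 4, 5, 6, 7, 8, 9];
    let v = load_u64_unaligned(&bytes, 1);
    assert_eq!(v, u64::from_ne_bytes([2, 3, 4, 5, 6, 7, 8, 9]));
}
```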

gnzlbg (Feb 21 2019 at 12:10, on Zulip):

I think I found a feature request for miri here. When dereferencing an unaligned pointer the error tells me the actual alignment of the place, and the value expected: "tried to access memory with alignment 8, but alignment 16 is required" but when I create an invalid reference it "just" tells me "type validation failed: encountered unaligned reference"

gnzlbg (Feb 21 2019 at 12:10, on Zulip):

would be cool if that could also tell me the alignment required, and the alignment "provided"

RalfJ (Feb 21 2019 at 12:10, on Zulip):

re: what I originally said, actually currently we don't use align_of_val for ref-to-place conversions, we use the static alignment reflected in the type

gnzlbg (Feb 21 2019 at 12:10, on Zulip):

we can just compare the pointer value with align_of_val.

Ah duh, yes, that makes sense.

Daniel Carosone (Feb 21 2019 at 12:11, on Zulip):

I recall hearing about some system (maybe Java) that stashed flags in the low-order bits of >byte-aligned pointers.

RalfJ (Feb 21 2019 at 12:11, on Zulip):

which is probably more reasonable because we can determine it statically. ultimately the expected alignment of a load must be known statically, after all.

RalfJ (Feb 21 2019 at 12:12, on Zulip):

would be cool if that could also tell me the alignment required, and the alignment "provided"

sure, make a bug report (this would be against rustc, that's part of the validation code that CTFE also uses)

gnzlbg (Feb 21 2019 at 12:12, on Zulip):

Yes, but when allocating memory, some of the methods pass an alignment manually. E.g. when allocating a [u8], I can choose the alignment

RalfJ (Feb 21 2019 at 12:12, on Zulip):

yes? what does that have to do with our discussion?

RalfJ (Feb 21 2019 at 12:13, on Zulip):

when you allocate a [u16] and you use alignment 1, you have UB by the rules I stated above

RalfJ (Feb 21 2019 at 12:13, on Zulip):

align_of_val wouldnt know how you allocated that value anyway
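To make the allocation point concrete, here is a small Rust sketch (not from the original conversation) that requests memory with an explicit alignment through `std::alloc`. `alloc_addr_with_align` is a hypothetical helper that only reports the address it was given.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::align_of;

/// Hypothetical helper: allocate `size` bytes with alignment `align`,
/// record the address, free the memory again, and return the address.
fn alloc_addr_with_align(size: usize, align: usize) -> usize {
    let layout = Layout::from_size_align(size, align).expect("invalid layout");
    // The global allocator must not be called with a zero-sized layout.
    assert!(layout.size() > 0);
    unsafe {
        let p = alloc(layout);
        assert!(!p.is_null(), "allocation failed");
        let addr = p as usize;
        dealloc(p, layout);
        addr
    }
}

fn main() {
    // Storage for 16 bytes, over-aligned to 16: the address is a multiple of 16.
    let addr = alloc_addr_with_align(16, 16);
    assert_eq!(addr % 16, 0);

    // With alignment 1 the allocator promises nothing beyond byte alignment,
    // so using this memory as a `[u16]` is only fine if the address happens to be even.
    let addr = alloc_addr_with_align(16, 1);
    println!("usable as [u16]: {}", addr % align_of::<u16>() == 0);
}
```

The alignment passed at allocation time is not recorded in the pointer or in the type, which is why `align_of_val` cannot recover it later.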

gnzlbg (Feb 21 2019 at 12:14, on Zulip):

All of this is obvious if one knows that miri has a linear memory region, and when one allocates with some alignment, then one gets a pointer whose address is aligned properly. I haven't thought about how that would be implemented till now, but it probably does not make sense to implement that in any other way.

gnzlbg (Feb 21 2019 at 12:14, on Zulip):

I just thought that maybe, miri allocations were "disjoint" memory regions

gnzlbg (Feb 21 2019 at 12:15, on Zulip):

so that if I allocate twice a [u8; 16], that I can't use a pointer to the first one to access the second region or something

gnzlbg (Feb 21 2019 at 12:18, on Zulip):

but it suffices to remember which allocation each pointer was created from, and then check on dereference that it points to somewhere inside that allocation

RalfJ (Feb 21 2019 at 12:21, on Zulip):

miri has a linear memory region

what do you mean by this?

RalfJ (Feb 21 2019 at 12:21, on Zulip):

miri allocations were "disjoint" memory regions

they are

gnzlbg (Feb 21 2019 at 12:21, on Zulip):

@RalfJ this information, which allocation is associated to which pointer / reference, is not part of Stacked Borrows, right ?

RalfJ (Feb 21 2019 at 12:22, on Zulip):

@RalfJ this information, which allocation is associated to which pointer / reference, is not part of Stacked Borrows, right ?

correct, that's just miri's basic memory model

RalfJ (Feb 21 2019 at 12:22, on Zulip):

which is not necessarily the same as Rust's

gnzlbg (Feb 21 2019 at 12:22, on Zulip):

so how is that checked ?

gnzlbg (Feb 21 2019 at 12:22, on Zulip):

does every pointer get an associated allocation, and that's propagated to all derived pointers /references ?

gnzlbg (Feb 21 2019 at 12:24, on Zulip):

maybe not even an allocation, maybe just some "bounds", e.g., pointer value + range of values that it can take, which is why the error happens when the pointer is offsetted, and not on dereference

RalfJ (Feb 21 2019 at 14:01, on Zulip):

so how is that checked ?

it's less checked and more by construction. sorry I dont have the time to explain the memory model right now.
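One way to picture "by construction" is a toy model in which a pointer is an allocation identifier plus an offset. The sketch below is an assumption for illustration, not from the original conversation and not Miri's actual implementation. All names (`AllocId`, `Pointer`, `Memory`) are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for one allocation.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct AllocId(u64);

/// A toy pointer: which allocation it belongs to, and an offset into it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pointer {
    alloc: AllocId,
    offset: usize,
}

impl Pointer {
    /// Pointer arithmetic only changes the offset, never the allocation.
    fn add(self, by: usize) -> Pointer {
        Pointer { alloc: self.alloc, offset: self.offset.wrapping_add(by) }
    }
}

/// Toy memory: each allocation is its own separate byte vector.
#[derive(Default)]
struct Memory {
    allocs: HashMap<AllocId, Vec<u8>>,
    next_id: u64,
}

impl Memory {
    fn allocate(&mut self, size: usize) -> Pointer {
        let id = AllocId(self.next_id);
        self.next_id += 1;
        self.allocs.insert(id, vec![0; size]);
        Pointer { alloc: id, offset: 0 }
    }

    /// Read `len` bytes; fails if the range leaves the pointer's own allocation.
    fn read(&self, ptr: Pointer, len: usize) -> Result<&[u8], String> {
        let bytes = self
            .allocs
            .get(&ptr.alloc)
            .ok_or_else(|| "dangling pointer".to_string())?;
        let end = ptr
            .offset
            .checked_add(len)
            .ok_or_else(|| "offset overflow".to_string())?;
        bytes.get(ptr.offset..end).ok_or_else(|| {
            format!("out of bounds: {}..{} in allocation of size {}", ptr.offset, end, bytes.len())
        })
    }
}

fn main() {
    let mut mem = Memory::default();
    let a = mem.allocate(16);
    let b = mem.allocate(16);
    assert!(mem.read(a, 16).is_ok());
    assert!(mem.read(b, 16).is_ok());
    // No amount of offsetting turns a pointer into `a` into a pointer into `b`.
    assert!(mem.read(a.add(16), 1).is_err());
    assert_ne!(a.add(16), b);
}
```

In this toy, a pointer derived from one allocation can never reach another one, because the allocation identifier travels with every derived pointer.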

RalfJ (Feb 21 2019 at 14:01, on Zulip):

ideally I should explain it into a .md file anyway and put that somewhere. @oli what would be a good place for a document explaining miri's memory model? the rustc guide? it is also used by CTFE, after all.

oli (Feb 21 2019 at 14:44, on Zulip):

yes, the guide is the ideal place for this

oli (Feb 21 2019 at 14:44, on Zulip):

@RalfJ maybe pointing to the exact page from the miri front page might be helpful for interested parties, too

gnzlbg (Feb 21 2019 at 14:48, on Zulip):

In which aspect is this memory model relevant for Rust programmers ?

RalfJ (Feb 21 2019 at 14:49, on Zulip):

to learn about memory models? :D and to understand how CTFE works

gnzlbg (Feb 21 2019 at 14:49, on Zulip):

As in, it might be better to document it in the miri repo, and link it / include it, in the rustc-guide and other documents

gnzlbg (Feb 21 2019 at 14:49, on Zulip):

@RalfJ if my code is correct according to miri's memory model, is it also correct in x86 ?

gnzlbg (Feb 21 2019 at 14:50, on Zulip):

or in other words, is the intent to make miri's memory model the memory model that all rust programs should adhere to ?

RalfJ (Feb 21 2019 at 14:51, on Zulip):

note that Rust's memory model is even further from x86 than from Miri's

RalfJ (Feb 21 2019 at 14:52, on Zulip):

but the intent is for Miri's model to be a sound approximation of Rust's model, yes

RalfJ (Feb 21 2019 at 14:52, on Zulip):

as in, every program defined in Miri's model should be defined in Rust's model

gnzlbg (Feb 21 2019 at 14:52, on Zulip):

So I think that should be its own document, with the intent that it will be RFC'ed at some point

RalfJ (Feb 21 2019 at 14:52, on Zulip):

there are some hard questions there about transmuting pointers to integers, but let's ignore those

RalfJ (Feb 21 2019 at 14:53, on Zulip):

I dont think it needs RFC'ing, it's a compiler implementation detail

gnzlbg (Feb 21 2019 at 14:53, on Zulip):

maybe even in the UCG repo, just like stacked borrows, as documentation for now of what miri does

RalfJ (Feb 21 2019 at 14:53, on Zulip):

the Rust memory model needs RFC'ing

RalfJ (Feb 21 2019 at 14:53, on Zulip):

but it first needs a paper, because it's that far out there^^

gnzlbg (Feb 21 2019 at 14:54, on Zulip):

so the programs that miri's model will accept is a subset of the one that rust memory model accepts ?

RalfJ (Feb 21 2019 at 14:54, on Zulip):

hopefully

RalfJ (Feb 21 2019 at 14:55, on Zulip):

and moreover, if a program gets accepted by both it has the same behavior (that might seem obvious but it isnt^^)

gnzlbg (Feb 21 2019 at 14:55, on Zulip):

so I think that would be a language feature worth guaranteeing, but anyways, I'm digressing, any place is fine for me as long as it gets written down somewhere at some point.

RalfJ (Feb 21 2019 at 15:00, on Zulip):

it is certainly worth guaranteeing, but it may be really, really hard

RalfJ (Feb 21 2019 at 15:00, on Zulip):

and the Miri model should also be relatively simple

RalfJ (Feb 21 2019 at 15:01, on Zulip):

like, one example of a case where we currently don't achieve this is that when you compare two pointers into different objects, they will never be equal. but in reality, if two objects are allocated right next to each other, a pointer to the end of one object might compare equal to a pointer to the beginning of another.

RalfJ (Feb 21 2019 at 15:01, on Zulip):

we could restrict the comparisons we allow in Miri's model, but that means programs will stop working that really should work

gnzlbg (Feb 21 2019 at 15:02, on Zulip):

like, one example of a case where we currently don't achieve this is that when you compare two pointers into different objects, they will never be equal. but in reality, if two objects are allocated right next to each other, a pointer to the end of one object might compare equal to a pointer to the beginning of another.

I thought that this was undefined behavior.

RalfJ (Feb 21 2019 at 15:03, on Zulip):

no, both C and C++ allow this (but C++ is vague about what the result is)

RalfJ (Feb 21 2019 at 15:04, on Zulip):

this is related to the special exception for the pointer "right at the end" (often called one-past-the-end) of an allocation

gnzlbg (Feb 21 2019 at 15:04, on Zulip):

i don't recall it that way, i recall that the behavior of comparing pointers originating from different allocations is undefined

RalfJ (Feb 21 2019 at 15:06, on Zulip):

nope, you are mistaken

RalfJ (Feb 21 2019 at 15:06, on Zulip):

(I am talking about == and != comparison)

gnzlbg (Feb 21 2019 at 15:06, on Zulip):

that is, if you have a pointer to an [u8; N] array, you can use that pointer, and pointers derived from it, to refer to elements in the array, and one past the end, but if you allocate two [u8; N] arrays, you can't use a pointer to an element of the first array to access an element of the second. If the "one-past-the-end" pointer of the first array, happens to have the same address as the first element of the second array, those two pointers are allowed to compare differently, even though they have the same address

RalfJ (Feb 21 2019 at 15:06, on Zulip):

you can't use the pointer to access the other side

RalfJ (Feb 21 2019 at 15:06, on Zulip):

but we talked about comparing pointers

RalfJ (Feb 21 2019 at 15:06, on Zulip):

different thing

RalfJ (Feb 21 2019 at 15:07, on Zulip):

the access is UB, the comparison is not

gnzlbg (Feb 21 2019 at 15:07, on Zulip):

i thought the comparison was also UB

RalfJ (Feb 21 2019 at 15:07, on Zulip):

in C, the comparison must "compare the physical representations of the pointer" (but really compilers ignore that), in C++ they say something about the result being unspecified or so

gnzlbg (Feb 21 2019 at 15:08, on Zulip):

hmm

RalfJ (Feb 21 2019 at 15:08, on Zulip):

so the comparison is defined behavior but they may either compare equal or not
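The comparison case can be written down in Rust as follows (a sketch, not from the original conversation). `one_past_end_equals_start` is a hypothetical helper. It only compares addresses and never reads through the one-past-the-end pointer.

```rust
/// Hypothetical helper: compare the one-past-the-end pointer of `a`
/// with the start pointer of `b`. No memory is accessed.
fn one_past_end_equals_start(a: &[u8; 16], b: &[u8; 16]) -> bool {
    let end_of_a = a.as_ptr().wrapping_add(a.len());
    let start_of_b = b.as_ptr();
    end_of_a == start_of_b
}

fn main() {
    let a = [0u8; 16];
    let b = [0u8; 16];
    // Whether this prints true or false depends on where the two arrays end up
    // in memory. The comparison itself is fine either way.
    println!("end of a == start of b: {}", one_past_end_equals_start(&a, &b));
    // Reading through `end_of_a` to get at `b` would be UB, so the sketch never does that.
}
```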

RalfJ (Feb 21 2019 at 15:08, on Zulip):

whether comparing them multiple times must yield consistent result is unknown (as usual in C/C++...)

RalfJ (Feb 21 2019 at 15:09, on Zulip):

well actually in C it must yield consistent results, the standard permits no non-determinism

RalfJ (Feb 21 2019 at 15:09, on Zulip):

in C++ it's unclear

RalfJ (Feb 21 2019 at 15:09, on Zulip):

and anyway LLVM implements the C++ version even for C

gnzlbg (Feb 21 2019 at 15:12, on Zulip):

http://eel.is/c++draft/expr.eq#3.1

gnzlbg (Feb 21 2019 at 15:12, on Zulip):

So the result is unspecified.

gnzlbg (Feb 21 2019 at 15:14, on Zulip):

That is, the comparison is allowed to happen, but the result can be anything.

RalfJ (Feb 21 2019 at 15:16, on Zulip):

well, it can be true or false

RalfJ (Feb 21 2019 at 15:16, on Zulip):

but I read this as requiring a valid boolean, i.e., using this in a conditional is allowed

gnzlbg (Feb 21 2019 at 15:17, on Zulip):

yes, but what miri does (returning always false) is ok according to this

RalfJ (Feb 21 2019 at 15:17, on Zulip):

@RalfJ maybe pointing to the exact page from the miri front page might be helpful for interested parties, too

for now I opened an issue: https://github.com/rust-lang/rust/issues/58618

RalfJ (Feb 21 2019 at 15:18, on Zulip):

I have this on my list, but it's quite far down (I think a basic spec of MIR without the memory model is more important), so that shouldnt block anyone else from giving it a shot

RalfJ (Feb 21 2019 at 15:18, on Zulip):

true, but I deliberately said the behavior is the same in both models

gnzlbg (Feb 21 2019 at 15:18, on Zulip):

reading this part of the C++ standard, i cannot but think that none of this makes much sense as is if you have ZSTs in the language (which C and C++ don't have), then you have many objects at the same address

RalfJ (Feb 21 2019 at 15:18, on Zulip):

and if one model allows true/false, and the other always makes it true, that's not the same behavior

RalfJ (Feb 21 2019 at 15:19, on Zulip):

well, with ZSTs the pointer is always one-past-the-end

gnzlbg (Feb 21 2019 at 15:19, on Zulip):

@RalfJ to make the behavior the same in both models, we can just make it undefined

RalfJ (Feb 21 2019 at 15:19, on Zulip):

so they all may compare arbitrarily

gnzlbg (Feb 21 2019 at 15:20, on Zulip):

so the comparison itself wouldn't return false on miri, but be an error, and when targeting something else, anything can happen

RalfJ (Feb 21 2019 at 15:20, on Zulip):

@RalfJ to make the behavior the same in both models, we can just make it undefined

so you want to make ptr::eq on ZSTs UB? I dont think that'll make people happy^^

RalfJ (Feb 21 2019 at 15:20, on Zulip):

oh you mean undefined in Miri only

RalfJ (Feb 21 2019 at 15:20, on Zulip):

yes we could. but Miri is also supposed to be useful...

gnzlbg (Feb 21 2019 at 15:20, on Zulip):

i think that if we are going to adopt similar semantics to C++, we need to handle ZSTs differently anyways

RalfJ (Feb 21 2019 at 15:21, on Zulip):

ptr comparison has nothing to do with types, but I assume you mean zero-sized allocations

RalfJ (Feb 21 2019 at 15:21, on Zulip):

(which we do have, e.g. from ZST statics)

gnzlbg (Feb 21 2019 at 15:21, on Zulip):

like two pointers with the same address to ZSTs, are they pointing their objects? their objects _and_ one past the end? their object and all other objects in that address? etc.

RalfJ (Feb 21 2019 at 15:21, on Zulip):

there's no object^^

RalfJ (Feb 21 2019 at 15:21, on Zulip):

4 is a valid pointer to a [u32; 0]
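A small Rust sketch of this claim (not from the original conversation). It uses `NonNull::dangling()` to obtain a non-null, well-aligned pointer instead of hard-coding the address 4, and assumes a target where `u32` has alignment 4. `read_empty_array` is a hypothetical helper.

```rust
use std::mem::{align_of, size_of};
use std::ptr::NonNull;

/// Hypothetical helper: read a `[u32; 0]` through a dangling but aligned pointer.
fn read_empty_array() -> [u32; 0] {
    let p: NonNull<[u32; 0]> = NonNull::dangling();
    // The pointer refers to no allocation, but it is non-null and aligned.
    assert_eq!(p.as_ptr() as usize % align_of::<[u32; 0]>(), 0);
    // A zero-sized read does not touch memory, so no allocation is needed.
    unsafe { p.as_ptr().read() }
}

fn main() {
    assert_eq!(size_of::<[u32; 0]>(), 0);
    // Assumption: `u32` is 4-byte aligned on this target.
    assert_eq!(align_of::<[u32; 0]>(), 4);
    let empty = read_empty_array();
    assert!(empty.is_empty());
}
```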

gnzlbg (Feb 21 2019 at 15:22, on Zulip):

there is a value

RalfJ (Feb 21 2019 at 15:22, on Zulip):

there's a pointer value

RalfJ (Feb 21 2019 at 15:22, on Zulip):

but no pointed-to value

RalfJ (Feb 21 2019 at 15:22, on Zulip):

in the language/machine

RalfJ (Feb 21 2019 at 15:22, on Zulip):

ZSTs are a fiction

RalfJ (Feb 21 2019 at 15:23, on Zulip):

I think trying to make them actually exist in the memory model will make that model a lot more complicated, for no gain at all

gnzlbg (Feb 21 2019 at 15:24, on Zulip):

i'm not suggesting that, just saying that the memory model will need to explain what happens with zero-sized allocations, ZST values, etc. somewhere

gnzlbg (Feb 21 2019 at 15:25, on Zulip):

even if that is, a ZST value can have any address, addresses to ZST values of the same type always compare equally / different / are not comparable at all, etc.

RalfJ (Feb 21 2019 at 15:25, on Zulip):

zero-sized allocations, yes. they are just allocations with size 0 though and should not get any special treatment.
ZST, no. they just dont show up any more. the memory model doesnt care about types.

RalfJ (Feb 21 2019 at 15:25, on Zulip):

there's no such thing as a ZST value, at the language semantics level

gnzlbg (Feb 21 2019 at 15:26, on Zulip):

i thought that Box<ZST> performed a zero-sized allocation, returning a pointer, and that pointer was the address of the value of that type

RalfJ (Feb 21 2019 at 15:26, on Zulip):

Box<ZST> returns align_of::<ZST>

RalfJ (Feb 21 2019 at 15:26, on Zulip):

no allocation happens

gnzlbg (Feb 21 2019 at 15:26, on Zulip):

that's how it is implemented, is that the operational semantics for allocating a ZST ?

RalfJ (Feb 21 2019 at 15:26, on Zulip):

but there are ways to get an actual zero-sized allocation, such as static FOO: ZST
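The two cases mentioned here, a boxed ZST and a ZST static, can be observed from Rust like this (a sketch, not from the original conversation). `Zst`, `FOO`, and `boxed_zst_addr` are illustrative names. The sketch asserts only that the addresses are non-null and aligned, since the exact address a `Box` uses for a ZST is an implementation detail.

```rust
use std::mem::{align_of, size_of};

/// An illustrative zero-sized type.
struct Zst;

/// A static of zero size, the kind of actual zero-sized allocation mentioned above.
static FOO: Zst = Zst;

/// Hypothetical helper: the address stored in a `Box<Zst>`.
fn boxed_zst_addr() -> usize {
    let b = Box::new(Zst);
    &*b as *const Zst as usize
}

fn main() {
    assert_eq!(size_of::<Zst>(), 0);

    // The box holds a non-null, aligned pointer even though nothing was allocated.
    let addr = boxed_zst_addr();
    assert_ne!(addr, 0);
    assert_eq!(addr % align_of::<Zst>(), 0);

    // Moving the value out of the box is a zero-sized access.
    let _x: Zst = *Box::new(Zst);

    // The static also has an address that can be taken and compared.
    let static_addr = &FOO as *const Zst as usize;
    assert_ne!(static_addr, 0);
    println!("box: {:#x}, static: {:#x}", addr, static_addr);
}
```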

RalfJ (Feb 21 2019 at 15:27, on Zulip):

from the memory model perspective you dont allocate a type but a size/alignment. and sure, the size can be 0. so what?

gnzlbg (Feb 21 2019 at 15:27, on Zulip):

If I do GlobalAlloc::alloc(Layout<ZST>) -> *mut u8, can I do *ptr = ZST() ?

RalfJ (Feb 21 2019 at 15:27, on Zulip):

the GlobalAlloc contract says you are not allowed to use it with size 0

RalfJ (Feb 21 2019 at 15:28, on Zulip):

but you are also jumping around between lots of different topics here^^

RalfJ (Feb 21 2019 at 15:29, on Zulip):

*ptr = ZST is a zero-sized memory access, which does not actually do anything in memory (but it should check alignment, which Miri currently does not implement correctly)

gnzlbg (Feb 21 2019 at 15:30, on Zulip):

I would prefer a programming language in which ZST don't need to be handled exceptionally

gnzlbg (Feb 21 2019 at 15:30, on Zulip):

I know this is how things are currently implemented, but I always assumed that these were implementation details.

RalfJ (Feb 21 2019 at 15:30, on Zulip):

well, just go look at the slice iterator or Box source code. in Rust, they do need special treatment.

RalfJ (Feb 21 2019 at 15:31, on Zulip):

and in the model, zero-sized accesses (the types dont matter) are special in the sense that they dont actually require there to be a valid allocation anywhere

gnzlbg (Feb 21 2019 at 15:31, on Zulip):

In this particular implementation of the standard library they need special treatment, but whether it is necessary to leak that special treatment to users is a different issue

RalfJ (Feb 21 2019 at 15:32, on Zulip):

we could say that they do, but then a whole lot of existing code has UB

RalfJ (Feb 21 2019 at 15:32, on Zulip):

so we likely don't want that

RalfJ (Feb 21 2019 at 15:32, on Zulip):

well, we can either have a special clause for empty accesses, or make a lot of existing code (in libstd and most likely elsewhere) UB.

RalfJ (Feb 21 2019 at 15:32, on Zulip):

I dont see a third way.

RalfJ (Feb 21 2019 at 15:33, on Zulip):

also note that the special case is rather tiny, it's one early return in the code for reads and writes
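The "one early return" can be pictured with a toy access check. This is a hypothetical sketch, not from the original conversation and not Miri's code. It models a single live allocation as an address range. `check_access` and `AccessError` are made-up names.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
enum AccessError {
    Unaligned,
    OutOfBounds,
}

/// Toy check for an access of `size` bytes at `addr` requiring alignment `align`,
/// against one live allocation covering the address range `alloc`.
fn check_access(
    addr: usize,
    size: usize,
    align: usize,
    alloc: &Range<usize>,
) -> Result<(), AccessError> {
    // Alignment is checked for every access, including zero-sized ones.
    if addr % align != 0 {
        return Err(AccessError::Unaligned);
    }
    // The special clause: an empty access needs no allocation behind it.
    if size == 0 {
        return Ok(());
    }
    let end = addr.checked_add(size).ok_or(AccessError::OutOfBounds)?;
    if addr >= alloc.start && end <= alloc.end {
        Ok(())
    } else {
        Err(AccessError::OutOfBounds)
    }
}

fn main() {
    let alloc = 0x1000..0x1010;
    // A zero-sized access at address 1 is fine even though nothing lives there.
    assert_eq!(check_access(1, 0, 1, &alloc), Ok(()));
    // A one-byte access at the same address is not.
    assert_eq!(check_access(1, 1, 1, &alloc), Err(AccessError::OutOfBounds));
    // Zero-sized accesses still have to be aligned.
    assert_eq!(check_access(0x1002, 0, 4, &alloc), Err(AccessError::Unaligned));
    // An ordinary in-bounds access.
    assert_eq!(check_access(0x1008, 8, 8, &alloc), Ok(()));
}
```

Without the `size == 0` branch, the first assertion would fail with `OutOfBounds`, which corresponds to the alternative where existing zero-sized accesses become UB.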

gnzlbg (Feb 21 2019 at 15:33, on Zulip):

The third way would be ZSTs are not special, and code that users can write today that's guaranteed to work still does not have UB

RalfJ (Feb 21 2019 at 15:34, on Zulip):

you cant just axiomatize that, you have to make it happen in the model

RalfJ (Feb 21 2019 at 15:34, on Zulip):

I don't think that's possible

gnzlbg (Feb 21 2019 at 15:34, on Zulip):

if a user is avoiding Box<ZST> and is just using *mut ptr = align_of<ZST>(), and relying that the addresses match, then that code might have UB, but that's not something that we are guaranteeing

RalfJ (Feb 21 2019 at 15:34, on Zulip):

and again, it's not the zero-sized types that are special, it's the accesses

RalfJ (Feb 21 2019 at 15:35, on Zulip):

if a user is avoiding Box<ZST> and is just using *mut ptr = align_of<ZST>(), and relying that the addresses match, then that code might have UB, but that's not something that we are guaranteeing

addresses match? we are talking about accesses, not ptr comparison!

gnzlbg (Feb 21 2019 at 15:35, on Zulip):

I thought that we were also talking about pointers to ZSTs and whether comparing the pointers is allowed or not

RalfJ (Feb 21 2019 at 15:35, on Zulip):

let x = *Box::new(ZST) is UB without that special clause

RalfJ (Feb 21 2019 at 15:36, on Zulip):

because you are doing a memory access to address 1

RalfJ (Feb 21 2019 at 15:36, on Zulip):

and why would there be an allocation there

RalfJ (Feb 21 2019 at 15:36, on Zulip):

comparing pointers does not care about the type (I said that above, hence the emphasis...)

gnzlbg (Feb 21 2019 at 15:37, on Zulip):

that is not doing a memory access at address 1, that's reading 0 memory from address 1, which is a nop

RalfJ (Feb 21 2019 at 15:37, on Zulip):

you just declared zero-sized accesses a special case

RalfJ (Feb 21 2019 at 15:37, on Zulip):

by making them a NOP

RalfJ (Feb 21 2019 at 15:37, on Zulip):

you said above you didnt want that

RalfJ (Feb 21 2019 at 15:37, on Zulip):

please make up your mind

gnzlbg (Feb 21 2019 at 15:38, on Zulip):

i don't know why this is turning violent, but this is not what I said

RalfJ (Feb 21 2019 at 15:38, on Zulip):

I am just getting frustrated because I feel like you keep ignoring what I am saying.

gnzlbg (Feb 21 2019 at 15:38, on Zulip):

so lets fall back

RalfJ (Feb 21 2019 at 15:38, on Zulip):

also I should get some work done today, and this already took too much time -- sorry

gnzlbg (Feb 21 2019 at 15:39, on Zulip):

ok, better to do this some other time

RalfJ (Feb 21 2019 at 15:39, on Zulip):

once we have some actual models written down, I can point at the special casing in accesses and the absence of special casing in ptr comparison

RalfJ (Feb 21 2019 at 15:39, on Zulip):

and then we have a concrete ground for this discussion

RalfJ (Feb 21 2019 at 15:39, on Zulip):

currently we likely are working under very different assumptions, hence the frustration

gnzlbg (Feb 21 2019 at 15:40, on Zulip):

i feel that when you talk about access and i talk about access we are referring to different things

gnzlbg (Feb 21 2019 at 15:40, on Zulip):

so that sounds like a good idea

RalfJ (Feb 21 2019 at 15:41, on Zulip):

gosh I wish I could just have a pipe from my brain to a markdown file to be able to communicate the framework I am working in here^^

RalfJ (Feb 21 2019 at 15:41, on Zulip):

typing all of that up is so slow

RalfJ (Feb 21 2019 at 15:42, on Zulip):

or just do a brain transfer to you :P

gnzlbg (Feb 21 2019 at 15:42, on Zulip):

if that happens some day, it will be powered by Rust

RalfJ (Feb 21 2019 at 15:42, on Zulip):

:D

gnzlbg (Feb 21 2019 at 15:42, on Zulip):

and all these things will make sure that you don't pipe the wrong things out of your brain :P

Last update: Nov 19 2019 at 17:35UTC