Stream: t-lang/wg-unsafe-code-guidelines

Topic: padding bytes def #183


gnzlbg (Jul 31 2019 at 11:38, on Zulip):

@RalfJ I am not sure how your typed copy proposal can be reconciled with how memcpy works right now

gnzlbg (Jul 31 2019 at 11:41, on Zulip):
let from: T = valid;
let mut to: T = uninitialized;
// memcpy(dst, src, n): the destination comes first and must be mutable
memcpy(&mut to as *mut T as *mut u8, &from as *const T as *const u8, size_of::<T>());

Here memcpy only works on a [u8] so IIUC per your typed copy proposal, all bytes of T including padding bytes should be copied. Yet right now this is not the case.
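A minimal sketch of the byte-wise copy under discussion, written with `ptr::copy_nonoverlapping` on `u8` pointers. The `Padded` type, the `byte_copy` helper and the `0xAA` fill pattern are made up for illustration; the byte at the padding offset is initialised explicitly so that reading it back is not a read of uninitialised memory.

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

// Hypothetical example type: `a` at offset 0, one padding byte at offset 1,
// `b` at offsets 2..4 (size 4 under repr(C)).
#[repr(C)]
struct Padded {
    a: u8,
    b: u16,
}

/// Copies a `Padded` as a run of `u8`s and returns the bytes that arrived
/// at the destination.
fn byte_copy() -> [u8; 4] {
    let mut from = MaybeUninit::<Padded>::uninit();
    let mut to = MaybeUninit::<Padded>::uninit();
    unsafe {
        // Initialise every byte, including the one at the padding offset.
        ptr::write_bytes(from.as_mut_ptr() as *mut u8, 0xAA, size_of::<Padded>());
        // Write the fields individually; these writes never touch offset 1.
        ptr::addr_of_mut!((*from.as_mut_ptr()).a).write(1);
        ptr::addr_of_mut!((*from.as_mut_ptr()).b).write(2);
        // Untyped copy: both pointers are `u8` pointers, so all four bytes
        // are copied as plain `u8`s.
        ptr::copy_nonoverlapping(
            from.as_ptr() as *const u8,
            to.as_mut_ptr() as *mut u8,
            size_of::<Padded>(),
        );
        // Read the destination back as plain bytes.
        ptr::read(to.as_ptr() as *const [u8; 4])
    }
}
```

With a `u8` copy the byte at offset 1 arrives unchanged; the question in this thread is whether a copy at type `Padded` would have to preserve it as well.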

gnzlbg (Jul 31 2019 at 11:44, on Zulip):

To make memcpy work in your proposal, it would need to operate on *T and not on *u8

gnzlbg (Jul 31 2019 at 11:45, on Zulip):

The implementation of our memcpy is here: https://github.com/rust-lang-nursery/compiler-builtins/blob/6178e2c61105a9ff7fa1c4fc974b142b0c07ae3d/src/mem.rs#L9

And as you see the code there is written to copy all bytes, so it must copy padding bytes, but it can be optimized not to do that, and it is currently optimized as such.

rkruppe (Jul 31 2019 at 11:57, on Zulip):

I don't not believe it, but could you please demonstrate a concrete program using memcpy which is optimized to not copy padding?

gnzlbg (Jul 31 2019 at 12:04, on Zulip):

uh

gnzlbg (Jul 31 2019 at 12:04, on Zulip):

https://rust.godbolt.org/z/_Oha1o

gnzlbg (Jul 31 2019 at 12:04, on Zulip):

we don't, but then, it is a compiler bug

gnzlbg (Jul 31 2019 at 12:06, on Zulip):

@rkruppe https://rust.godbolt.org/z/SDb-pj those two operations should only copy one byte, and not 128 bytes

rkruppe (Jul 31 2019 at 12:10, on Zulip):

Is that a compiler bug? Really? It seems to respect the quite reasonable semantics Ralf proposed, that's a good thing in my book.

gnzlbg (Jul 31 2019 at 12:11, on Zulip):

https://github.com/rust-lang/rust/issues/63159

gnzlbg (Jul 31 2019 at 12:11, on Zulip):

I consider it a missed optimization

rkruppe (Jul 31 2019 at 12:12, on Zulip):

The memcpy one I mean. ptr::copy_nonoverlapping could be considered a typed copy, so it should be optimizable. But plain memcpy being just that and not magically knowing about padding is a good thing IMO

gnzlbg (Jul 31 2019 at 12:12, on Zulip):

The semantics ralf proposed do say that when doing a typed copy the padding bytes do not need to be copied

gnzlbg (Jul 31 2019 at 12:13, on Zulip):

@rkruppe ptr::copy_nonoverlapping is the API of memcpy in Rust: https://rust.godbolt.org/z/rEG68z

gnzlbg (Jul 31 2019 at 12:13, on Zulip):

(or at least is the only way I know to call the definition given in compiler-builtins)

rkruppe (Jul 31 2019 at 12:14, on Zulip):

I know what it lowers to, but it's a typed API and an intrinsic, so we can give it "typed copy" semantics while clearly making raw memcpy know about padding would require some ugly magic.

rkruppe (Jul 31 2019 at 12:14, on Zulip):

btw, LLVM will recognize calls to extern "C" { fn memcpy(...); } as The Memcpy Function and convert it to its intrinsics

rkruppe (Jul 31 2019 at 12:17, on Zulip):

Both of your examples are (in the first case) or could be (in the second case) typed copies so I don't really see the contradiction to Ralf's semantics. That's all I'm saying.

gnzlbg (Jul 31 2019 at 12:17, on Zulip):

In C: https://rust.godbolt.org/z/Ka2VLG

rkruppe (Jul 31 2019 at 12:18, on Zulip):

lol

gnzlbg (Jul 31 2019 at 12:18, on Zulip):

in Rust, a call through extern "C" { fn memcpy(...); } is not recognized as the memcpy function, so I'd guess that's another bug

rkruppe (Jul 31 2019 at 12:18, on Zulip):

huh

gnzlbg (Jul 31 2019 at 12:18, on Zulip):

(note that in C both Clang and GCC do the optimization)

gnzlbg (Jul 31 2019 at 12:18, on Zulip):

In Rust with extern "C" another missed optimization: https://rust.godbolt.org/z/O9Lrqw

gnzlbg (Jul 31 2019 at 12:20, on Zulip):

Actually, I'm not sure if the extern "C" call in Rust is a missed optimization, it is definitely an optimization that C and C++ do, but since we have ptr::copy_nonoverlapping and C and C++ do not have it, I do not really care that much about extern "C"

gnzlbg (Jul 31 2019 at 12:20, on Zulip):

although it would at least be nice to have an attribute that the libc crate could use on the memcpy and memmove intrinsics that it exposes

gnzlbg (Jul 31 2019 at 12:20, on Zulip):

but that's a problem that the libc crate has, and can be solved using unstable attributes

rkruppe (Jul 31 2019 at 12:23, on Zulip):

RalfJ I am not sure how your typed copy proposal can be reconciled with how memcpy works today:

Anyway, do you agree that this issue :up: is not an issue after all?

gnzlbg (Jul 31 2019 at 12:51, on Zulip):

yes, thanks, talking with you helped

RalfJ (Jul 31 2019 at 17:24, on Zulip):

great, issue resolved before I even arrived in this thread :)

gnzlbg (Aug 12 2019 at 12:10, on Zulip):

@RalfJ a different way to define padding could be

gnzlbg (Aug 12 2019 at 12:13, on Zulip):

to just say that they have a particular type, like MaybeUninit<u8> ?

gnzlbg (Aug 12 2019 at 12:13, on Zulip):

e.g. (u8, u16) would just be (u8, MaybeUninit<u8>, u16)

gnzlbg (Aug 12 2019 at 12:13, on Zulip):

and that the rules are the same ?

gnzlbg (Aug 12 2019 at 12:14, on Zulip):

for initialization, we probably want to say that they are initialized to MaybeUninit::uninit

rkruppe (Aug 12 2019 at 12:14, on Zulip):

The rules can't be the same. A MaybeUninit<u8> field always has to be copied, but by other (more conventional) definitions padding bytes don't have to be copied.

gnzlbg (Aug 12 2019 at 12:14, on Zulip):

yep

gnzlbg (Aug 12 2019 at 12:14, on Zulip):

so what we need is Uninit<u8> instead, which is always uninitialized

gnzlbg (Aug 12 2019 at 12:15, on Zulip):

and therefore never needs to be copied

gnzlbg (Aug 12 2019 at 12:15, on Zulip):

(or some other type like that)

rkruppe (Aug 12 2019 at 12:15, on Zulip):

"always uninitialized" is still a very weird notion as discussed elsewhere previously

gnzlbg (Aug 12 2019 at 12:16, on Zulip):

these are always initialized, but they only have one valid representation, and that's 0xUU

rkruppe (Aug 12 2019 at 12:16, on Zulip):

That can't work out, it's allowed to write to padding, it just doesn't get preserved on typed copies. If 0xUU was the only valid bit string then e.g. memset followed by a typed copy would be UB.
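A small sketch of the "memset followed by a typed copy" case mentioned here, assuming a hypothetical `#[repr(C)]` type `Padded` with one padding byte. The memset writes to the padding byte too, and the typed copy that follows is still expected to be fine.

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Hypothetical example type with one padding byte at offset 1.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Padded {
    a: u8,
    b: u16,
}

fn memset_then_typed_copy() -> Padded {
    let mut slot = MaybeUninit::<Padded>::uninit();
    unsafe {
        // memset: zero all bytes of the object, the padding byte included.
        ptr::write_bytes(slot.as_mut_ptr(), 0, 1);
        // Typed copy at type `Padded`: only `a` and `b` carry information.
        // All-zero bytes are valid for both `u8` and `u16`.
        slot.assume_init()
    }
}
```

If the padding byte were only allowed to hold 0xUU, the `write_bytes` call would already have put the object into an invalid state.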

gnzlbg (Aug 12 2019 at 12:16, on Zulip):

indeed

gnzlbg (Aug 12 2019 at 12:18, on Zulip):

bad idea - if we make it a normal type, then typed copies would need to copy it

rkruppe (Aug 12 2019 at 12:18, on Zulip):

I also don't see the motivation for trying to explicitly define padding this way (or in any other explicit way) when it can fall out nicely as a side effect of other definitions (repr relation)

gnzlbg (Aug 12 2019 at 12:19, on Zulip):

i'm not sure how to differentiate padding from niche in the repr relation

gnzlbg (Aug 12 2019 at 12:20, on Zulip):

for padding everything is valid, while niches are a sub-set of invalid relations

rkruppe (Aug 12 2019 at 12:21, on Zulip):

Niches aren't ignored by the repr relation, padding is. e.g. if a byte can only be 0x00 or 0x01 for the byte list to represent a value of T, then there's a (potential) niche there. If the value of the byte is completely irrelevant for the value being represented, then it's padding.

gnzlbg (Aug 12 2019 at 14:56, on Zulip):

@rkruppe so a u16 that can only be 0 or 1, has a niche

gnzlbg (Aug 12 2019 at 14:56, on Zulip):

does it also have a byte of padding ?

gnzlbg (Aug 12 2019 at 14:58, on Zulip):

i'd say no, because the upper byte must always be zero

gnzlbg (Aug 12 2019 at 15:02, on Zulip):

but then one can't put that type at an alignment of 1

gnzlbg (Aug 12 2019 at 15:04, on Zulip):

AFAICT one can't easily construct such a value with an alignment of 1

gnzlbg (Aug 12 2019 at 15:06, on Zulip):

so I was wondering if there was a way to have a Padding type that one could use to explicitly add padding, and enable that

gnzlbg (Aug 12 2019 at 15:06, on Zulip):

Such a type would be useful for repr(C)

gnzlbg (Aug 12 2019 at 15:09, on Zulip):

The problem I see is that for typed copies of the type to not copy anything it would need to have size 0, but then it cannot increase the size of the type

gnzlbg (Aug 12 2019 at 15:09, on Zulip):

We currently use, e.g., [...; 0] types to insert padding to raise the alignment of the next field

gnzlbg (Aug 12 2019 at 15:10, on Zulip):

but that's more implicit and hard to discover than just inserting a Padding field
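A sketch of the zero-length-array trick described above, assuming `#[repr(C)]` layout. `Plain`, `Raised` and the helper functions are hypothetical names; `[u32; 0]` occupies no bytes but still has the alignment of `u32`, so the field after it is pushed to the next `u32` boundary.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
struct Plain {
    a: u8,
    b: u8,
}

#[repr(C)]
struct Raised {
    a: u8,
    // Zero-sized, but aligned like `u32`: implicit padding follows `a`.
    _align: [u32; 0],
    b: u8,
}

fn offset_of_b_in_plain() -> usize {
    let v = Plain { a: 0, b: 0 };
    (&v.b as *const u8 as usize) - (&v as *const Plain as usize)
}

fn offset_of_b_in_raised() -> usize {
    let v = Raised { a: 0, _align: [], b: 0 };
    (&v.b as *const u8 as usize) - (&v as *const Raised as usize)
}

fn raised_size_and_align() -> (usize, usize) {
    (size_of::<Raised>(), align_of::<Raised>())
}
```

Nothing in the declaration of `Raised` says how many padding bytes were inserted, which is the discoverability problem being pointed out.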

gnzlbg (Aug 12 2019 at 15:12, on Zulip):

such a Padding type would have non-zero size, but none of its bytes would be part of the value it represents

gnzlbg (Aug 12 2019 at 15:12, on Zulip):

it represents no value

gnzlbg (Aug 12 2019 at 17:00, on Zulip):

(more like it would only represent a single value, like ())

RalfJ (Aug 12 2019 at 17:56, on Zulip):

i'm not sure how to differentiate padding from niche in the repr relation

what is the problem?
niche: byte lists that are not valid for any value
padding: bytes that you can change arbitrarily without affecting the value for which the overall byte list is valid
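Both notions can be observed indirectly through sizes. This is only a sketch: `Padded` is a hypothetical `#[repr(C)]` type, and `NonZeroU8` is used because the standard library documents that `Option<NonZeroU8>` has the same size as `u8`.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

// Hypothetical example type: 3 bytes of field data, 1 byte of padding.
#[repr(C)]
struct Padded {
    a: u8,
    b: u16,
}

fn sizes() -> (usize, usize, usize) {
    (
        // Niche: the byte list [0x00] is not valid for any `NonZeroU8`...
        size_of::<NonZeroU8>(),
        // ...so `Option` can use it for `None` without growing.
        size_of::<Option<NonZeroU8>>(),
        // Padding: the byte at offset 1 can change freely without
        // affecting the value of `a` or `b`.
        size_of::<Padded>(),
    )
}
```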

gnzlbg (Aug 12 2019 at 18:26, on Zulip):

@RalfJ is padding the fundamental primitive ?

gnzlbg (Aug 12 2019 at 18:28, on Zulip):

i mean, we have bytes, and then we have padding bytes, which are not like normal bytes

RalfJ (Aug 12 2019 at 18:29, on Zulip):

no, they are not fundamental in any way

gnzlbg (Aug 12 2019 at 18:30, on Zulip):

whats fundamental is the value relation

RalfJ (Aug 12 2019 at 18:30, on Zulip):

this is all following what I laid down in https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/value-domain.md

gnzlbg (Aug 12 2019 at 18:30, on Zulip):

and bytes that do not affect it follow from it

RalfJ (Aug 12 2019 at 18:30, on Zulip):

the notion of typed copy I describe there handles padding correctly, I think

gnzlbg (Aug 12 2019 at 18:30, on Zulip):

and we just call them padding bytes

RalfJ (Aug 12 2019 at 18:31, on Zulip):

specifically, for a type like (u8, u16), with abstract value say Tuple([42, 1337])

RalfJ (Aug 12 2019 at 18:31, on Zulip):

it will, when writing, pick (non-deterministically) any value for the padding byte

RalfJ (Aug 12 2019 at 18:31, on Zulip):

so it's not actually saying they become 0xUU, it says they could become anything

RalfJ (Aug 12 2019 at 18:31, on Zulip):

but that is indistinguishable in the program from saying they become 0xUU

gnzlbg (Aug 12 2019 at 18:32, on Zulip):

that is, it is not that only 0xUU is valid for padding

RalfJ (Aug 12 2019 at 18:32, on Zulip):

it will, when writing, pick (non-deterministically) any value for the padding byte

the reason for this is that no matter what the padding byte says, the resulting byte list is related to the abstract value we are writing

RalfJ (Aug 12 2019 at 18:33, on Zulip):

Tuple([0, 0]) is easier

RalfJ (Aug 12 2019 at 18:33, on Zulip):

the set of 4-byte lists related to that is the set consisting of all [Raw(0), X, Raw(0), Raw(0)] for any X

RalfJ (Aug 12 2019 at 18:34, on Zulip):

well technically the way I wrote it, it could also pick pointer bytes that, when cast to an int, produce 0... hm, interesting.^^ that is an orthogonal issue though.
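The relation for (u8, u16) can be sketched as a decoding function over abstract bytes. This is a simplified model, not the definition from the WIP document: the `Byte` enum below leaves out pointer fragments, and the layout (field at offset 0, padding at offset 1, little-endian u16 at offsets 2..4) is an assumption made for the example.

```rust
// Simplified abstract byte: uninitialised (0xUU) or a concrete value.
// Pointer fragments with provenance are deliberately left out.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Byte {
    Uninit,
    Raw(u8),
}

/// Returns the abstract value that a 4-byte list represents at the assumed
/// layout of `(u8, u16)`, or `None` if the list represents no value.
fn decode_u8_u16(bytes: [Byte; 4]) -> Option<(u8, u16)> {
    match bytes {
        // The byte at index 1 is never inspected: that is what makes it padding.
        [Byte::Raw(a), _, Byte::Raw(lo), Byte::Raw(hi)] => {
            Some((a, u16::from_le_bytes([lo, hi])))
        }
        // A field byte that is uninitialised represents no value.
        _ => None,
    }
}
```

Every list of the form `[Raw(0), X, Raw(0), Raw(0)]` decodes to `(0, 0)`, whatever `X` is.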

gnzlbg (Aug 12 2019 at 19:06, on Zulip):

Is it important to be able to talk about X as if they are a special value that a byte can take ?

gnzlbg (Aug 12 2019 at 19:07, on Zulip):

we say today that a byte can be {0..256, UU}, but could we extend that to also say {0..256, UU, X} ?

gnzlbg (Aug 12 2019 at 19:08, on Zulip):

where X is "doesn't matter"

RalfJ (Aug 12 2019 at 19:16, on Zulip):

Is it important to be able to talk about X as if they are a special value that a byte can take ?

X is a mathematical variable here, quantifying over all values of my Byte type

RalfJ (Aug 12 2019 at 19:17, on Zulip):

where X is "doesn't matter"

that seems like an unnecessary complication to me. I dont even see how it would behave differently from UU

RalfJ (Aug 12 2019 at 19:17, on Zulip):

we say today that a byte can be {0..256, UU}

that's incomplete, it can also be a pointer fragment (carrying provenance)

gnzlbg (Aug 13 2019 at 13:58, on Zulip):

that seems like an unnecessary complication to me. I dont even see how it would behave differently from UU

I don't either

gnzlbg (Aug 13 2019 at 13:58, on Zulip):

yet we don't have a way to use UU in Rust like that

gnzlbg (Aug 13 2019 at 13:58, on Zulip):

right?

gnzlbg (Aug 13 2019 at 13:59, on Zulip):

e.g. MaybeUninit<u8> wouldn't be it, because it requires the byte to be copied

RalfJ (Aug 13 2019 at 20:31, on Zulip):

yet we don't have a way to use UU in Rust like that

I don't follow. As I said above I believe I have fully described padding in my existing frame, the way I defined Byte and Value in my WIP documents. What do you think is missing from that?

gnzlbg (Aug 14 2019 at 12:42, on Zulip):

The ability for a user to say "at offset y there are N bytes of padding"

gnzlbg (Aug 14 2019 at 12:42, on Zulip):

Or at least, in your model, I wouldn't know how to do that

gnzlbg (Aug 14 2019 at 12:43, on Zulip):

E.g. If I have a #[repr(C)] struct S(u16, u16) and I want to insert 3 bytes of padding between both u16s, how would I do that ?

RalfJ (Aug 14 2019 at 15:56, on Zulip):

The ability for a user to say "at offset y there are N bytes of padding"

I think that is a meaningless statement

RalfJ (Aug 14 2019 at 15:56, on Zulip):

padding isnt a thing that exists in memory

RalfJ (Aug 14 2019 at 15:56, on Zulip):

just like you cant say "there's a bool here" (in memory)

RalfJ (Aug 14 2019 at 15:56, on Zulip):

or do you mean "at offset y in type T, ..." -- so, stating a property of the type, not some piece of data? in that case I proposed a definition that makes that work in some UCG issue

gnzlbg (Aug 14 2019 at 19:41, on Zulip):

or do you mean "at offset y in type T, ..."

that's what I meant

gnzlbg (Aug 14 2019 at 19:43, on Zulip):

ideally, to state that property I'd write #[repr(C)] struct S(u16, Pad, Pad, Pad, u16) but that would mean that I need a Pad type that only contains padding

gnzlbg (Aug 14 2019 at 19:44, on Zulip):

that would be a type that has only one value, but has, e.g., size 1
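Since no `Pad` type exists in Rust, the closest thing that compiles today is a filler field. The sketch below uses `MaybeUninit<u8>` as a stand-in, with the caveat already raised in this thread that such a field has to be copied on a typed copy, which is what `Pad` is meant to avoid. The helper names are hypothetical.

```rust
use std::mem::{size_of, MaybeUninit};

// Stand-in for `struct S(u16, Pad, Pad, Pad, u16)`.
#[repr(C)]
struct S(u16, MaybeUninit<u8>, MaybeUninit<u8>, MaybeUninit<u8>, u16);

fn offset_of_last_field() -> usize {
    let s = S(
        1,
        MaybeUninit::uninit(),
        MaybeUninit::uninit(),
        MaybeUninit::uninit(),
        2,
    );
    (&s.4 as *const u16 as usize) - (&s as *const S as usize)
}

fn size_of_s() -> usize {
    size_of::<S>()
}
```

The three filler bytes sit at offsets 2, 3 and 4; because the second u16 needs 2-byte alignment, repr(C) adds one more implicit padding byte at offset 5.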

RalfJ (Aug 14 2019 at 19:45, on Zulip):

"contains only padding" is an ill-defined concept

RalfJ (Aug 14 2019 at 19:45, on Zulip):

Pad is the same as MaybeUninit<u8>

RalfJ (Aug 14 2019 at 19:45, on Zulip):

it is a byte that can have any value

gnzlbg (Aug 14 2019 at 19:45, on Zulip):

not really

gnzlbg (Aug 14 2019 at 19:45, on Zulip):

one must copy all bytes of a MaybeUninit<u8> on a typed copy

RalfJ (Aug 14 2019 at 19:45, on Zulip):

there is no reason to reify the concept of padding into the abstract machine

gnzlbg (Aug 14 2019 at 19:45, on Zulip):

the whole point of pad would be not doing that

RalfJ (Aug 14 2019 at 19:45, on Zulip):

one must copy all bytes of a MaybeUninit<u8> on a typed copy

ah I see. but that is different from "contains only padding".

RalfJ (Aug 14 2019 at 19:46, on Zulip):

for that you want a type which has a trivial value representation but accepts any byte list

gnzlbg (Aug 14 2019 at 19:46, on Zulip):

the libc crate is literally full of _padding: u8

RalfJ (Aug 14 2019 at 19:46, on Zulip):

like, () of size > 0

gnzlbg (Aug 14 2019 at 19:46, on Zulip):

yeah

RalfJ (Aug 14 2019 at 19:46, on Zulip):

yes that is also easy to define in my framework

RalfJ (Aug 14 2019 at 19:47, on Zulip):

the type Pad is defined as:
Value Tuple([]) (the empty tuple) is related to any byte-list of length 1.
end of definition.

RalfJ (Aug 14 2019 at 19:47, on Zulip):

this is a type that accepts any value (validity invariant is trivial) and where a typed copy transports no information (because it goes through Tuple([]), a singleton)
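This definition can be modelled in a few lines. It is a sketch only: the `Byte` enum omits pointer fragments, the function names are made up, and where the definition allows any byte to be picked when writing, this model always picks `Uninit`.

```rust
// Simplified abstract byte (pointer fragments omitted).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Byte {
    Uninit,
    Raw(u8),
}

/// Every byte list of length 1 is related to the single value `()`.
fn decode_pad(_bytes: [Byte; 1]) -> Option<()> {
    Some(())
}

/// Writing `()` may pick any byte; this sketch picks `Uninit`.
fn encode_pad(_value: ()) -> [Byte; 1] {
    [Byte::Uninit]
}

/// A typed copy goes through the abstract value: decode, then encode.
fn typed_copy_pad(src: [Byte; 1]) -> Option<[Byte; 1]> {
    decode_pad(src).map(encode_pad)
}
```

Because the copy passes through `()`, the destination byte does not depend on the source byte at all.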

RalfJ (Aug 14 2019 at 19:48, on Zulip):

I finally understood what you mean by "contains only padding". it's not about the values it accepts / the validity invariant. it is about the behavior on a typed copy.

gnzlbg (Aug 14 2019 at 19:48, on Zulip):

we currently don't really have a way to specify that in the language

RalfJ (Aug 14 2019 at 19:48, on Zulip):

Pad and MaybeUninit<u8> have the same validity invariant

RalfJ (Aug 14 2019 at 19:48, on Zulip):

we currently don't really have a way to specify that in the language

well we do in the meta-language that we are speaking right now

RalfJ (Aug 14 2019 at 19:49, on Zulip):

but we dont in Rust, right

gnzlbg (Aug 14 2019 at 19:49, on Zulip):

i guess that's what i meant with "a MaybeUninit<u8> that's always uninitialized"

RalfJ (Aug 14 2019 at 19:49, on Zulip):

yeah. that wording tripped me because "is always X" sounds a lot like you want to change the validity invariant.

RalfJ (Aug 14 2019 at 19:49, on Zulip):

"a bool is always true or false"

gnzlbg (Aug 14 2019 at 19:49, on Zulip):

so with such a Pad type, maybe we could improve some of the examples

RalfJ (Aug 14 2019 at 19:49, on Zulip):

but here this is not at all what you want so that wording was misleading

RalfJ (Aug 14 2019 at 19:50, on Zulip):

but now that we cleared this, we should write this down somewhere :D

gnzlbg (Aug 14 2019 at 19:50, on Zulip):

e.g. if such a Pad type were present in libcore, we can write some examples as ... is equivalent to ... struct using Pad

RalfJ (Aug 14 2019 at 19:50, on Zulip):

yes

gnzlbg (Aug 14 2019 at 19:50, on Zulip):

and then we don't talk about "padding" anymore

gnzlbg (Aug 14 2019 at 19:50, on Zulip):

but about Pad

RalfJ (Aug 14 2019 at 19:50, on Zulip):

we could add a section "padding" to the glossary, which does nothing but define that type?

gnzlbg (Aug 14 2019 at 19:51, on Zulip):

that sounds like a good idea

RalfJ (Aug 14 2019 at 19:51, on Zulip):

and then we can basically use padding as synonym for [Pad; N]

gnzlbg (Aug 14 2019 at 19:51, on Zulip):

a union then always has a variant of type [Pad; N]

RalfJ (Aug 14 2019 at 19:52, on Zulip):

why that?

RalfJ (Aug 14 2019 at 19:52, on Zulip):

seems like this doesnt change anything

gnzlbg (Aug 14 2019 at 19:52, on Zulip):

ah no, it is not necessary

RalfJ (Aug 14 2019 at 19:52, on Zulip):

its more like, a union uses two [Pad; N] (one before, one after the field) to fill each variant to the full size

gnzlbg (Aug 14 2019 at 19:52, on Zulip):

we can just say that typed copies of union do not copy bytes at offsets where all variants have a Pad

RalfJ (Aug 14 2019 at 19:52, on Zulip):

i.e., what I had in my "picture"^^
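A tiny illustration of a variant being filled up to the full size of the union, using a hypothetical `#[repr(C)]` union `U`. Here the filler before each field is empty, because repr(C) places every union field at offset 0.

```rust
use std::mem::size_of;

#[repr(C)]
union U {
    small: u8,
    big: u32,
}

/// Bytes of `U` that lie after the `small` field.
fn trailing_bytes_after_small() -> usize {
    size_of::<U>() - size_of::<u8>()
}
```

In the picture described above, `small` would be followed by a `[Pad; 3]` while `big` needs no filler.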

gnzlbg (Aug 14 2019 at 19:53, on Zulip):

yep

gnzlbg (Aug 14 2019 at 19:53, on Zulip):

rules are the same

RalfJ (Aug 14 2019 at 19:53, on Zulip):

we can just say that typed copies of union do not copy bytes at offsets where all variants have a Pad

well here I'd slow down a bit, as that's a syntactic definition

RalfJ (Aug 14 2019 at 19:53, on Zulip):

"where in this type is padding" would become a part of the ABI

gnzlbg (Aug 14 2019 at 19:53, on Zulip):

isn't it equivalent to the one we have ?

RalfJ (Aug 14 2019 at 19:53, on Zulip):

and this is "interesting" for enums

RalfJ (Aug 14 2019 at 19:54, on Zulip):

like, Result<(u8, u16), (u16, u8)> has different padding depending on the active variant...

RalfJ (Aug 14 2019 at 19:54, on Zulip):

at least if you follow my typed copy rules
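The variant-dependent padding can be made visible with repr(C) stand-ins for the two payloads. The layout of `Result` itself is unspecified, so this sketch only looks at the payload types; `OkPayload`, `ErrPayload` and the helpers are hypothetical names.

```rust
use std::mem::size_of;

#[repr(C)]
struct OkPayload(u8, u16);

#[repr(C)]
struct ErrPayload(u16, u8);

/// Byte offsets inside a `size`-byte object that no `(offset, length)`
/// field in `fields` covers.
fn uncovered(size: usize, fields: &[(usize, usize)]) -> Vec<usize> {
    (0..size)
        .filter(|i| !fields.iter().any(|&(off, len)| (off..off + len).contains(i)))
        .collect()
}

fn ok_padding() -> Vec<usize> {
    let v = OkPayload(0, 0);
    let base = &v as *const OkPayload as usize;
    let f0 = &v.0 as *const u8 as usize - base;
    let f1 = &v.1 as *const u16 as usize - base;
    uncovered(size_of::<OkPayload>(), &[(f0, 1), (f1, 2)])
}

fn err_padding() -> Vec<usize> {
    let v = ErrPayload(0, 0);
    let base = &v as *const ErrPayload as usize;
    let f0 = &v.0 as *const u16 as usize - base;
    let f1 = &v.1 as *const u8 as usize - base;
    uncovered(size_of::<ErrPayload>(), &[(f0, 2), (f1, 1)])
}
```

The first payload has its padding byte at offset 1, the second at offset 3, so a single per-type list of padding offsets cannot describe both variants.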

gnzlbg (Aug 14 2019 at 19:54, on Zulip):

indeed

gnzlbg (Aug 14 2019 at 19:55, on Zulip):

"where in this type is padding" is kind of already part of the type ABI

gnzlbg (Aug 14 2019 at 19:55, on Zulip):

even if nothing there mentions padding

RalfJ (Aug 14 2019 at 19:55, on Zulip):

well... no I dont agree. it's part of the value representation relation. but so far we didnt want to make that part of the ABI.

gnzlbg (Aug 14 2019 at 19:58, on Zulip):

so i think the value relation is part of the ABI of the type

gnzlbg (Aug 14 2019 at 19:58, on Zulip):

you can observe it

RalfJ (Aug 14 2019 at 19:58, on Zulip):

I dont think so. the representation relation is extremely language-specific. we don't want to have to sync that between Rust and C.

gnzlbg (Aug 14 2019 at 19:58, on Zulip):

e.g. get a value of a type, inspect its bytes, move it around, and see which bytes don't change

gnzlbg (Aug 14 2019 at 19:59, on Zulip):

so you can at least infer that some bytes are not part of the value relation of a type

RalfJ (Aug 14 2019 at 19:59, on Zulip):

but ultimately this is a rather arbitrary choice of terminology

gnzlbg (Aug 14 2019 at 19:59, on Zulip):

due to how typed copies work on the type

RalfJ (Aug 14 2019 at 19:59, on Zulip):

what worries me more is how to define unions appropriately. it'll be really ugly and really sad :(

gnzlbg (Aug 14 2019 at 20:00, on Zulip):

time to sleep :P

RalfJ (Aug 14 2019 at 20:00, on Zulip):

;)

Last update: Nov 19 2019 at 17:35UTC