Stream: t-lang/wg-unsafe-code-guidelines

Topic: validity of booleans


gnzlbg (Jan 13 2019 at 10:38, on Zulip):

@rkruppe i think we agree

gnzlbg (Jan 13 2019 at 10:39, on Zulip):

when calling C functions, Rust functions would just need to adjust bools

gnzlbg (Jan 13 2019 at 10:41, on Zulip):

my point is that because calling convention compatibility is intended, we have to provide that as well, and that prevents us from saying anything about the values of true or false

rkruppe (Jan 13 2019 at 10:42, on Zulip):

How booleans are encoded when passed in the calling convention is not observable for programs

gnzlbg (Jan 13 2019 at 10:44, on Zulip):

we can't use that information to say anything about the values of true and false, since C code can write _Bools via a pointer into Rust memory expecting bools

rkruppe (Jan 13 2019 at 10:45, on Zulip):

When Rust and C (or C and C, or Rust and Rust) communicate through memory, then the memory representation is relevant and calling convention isn't
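A minimal Rust sketch of the "communicate through memory" case described above. The `Flags` struct and the `c_like_store` helper are hypothetical stand-ins (real C code would be reached through `extern "C"`), and the sketch assumes a platform where `_Bool` occupies one byte holding 0 or 1, which matches what rustc does today:

```rust
use std::ptr::addr_of_mut;

/// Hypothetical struct shared with C; the C side would declare `ready` as `_Bool`.
#[repr(C)]
struct Flags {
    ready: bool,
    count: u32,
}

/// Hypothetical stand-in for C code that stores a `_Bool` through a pointer.
/// It only touches bytes in memory; no calling convention for `bool` is involved.
fn c_like_store(dst: *mut u8, byte: u8) {
    // SAFETY: callers in this sketch pass a pointer to a live, writable byte.
    unsafe { dst.write(byte) }
}

/// Lets the "C side" write `byte` into `flags.ready`.
/// Returns false (and writes nothing) for bytes that would not be a valid Rust `bool`.
fn store_from_c(flags: &mut Flags, byte: u8) -> bool {
    if byte > 1 {
        return false;
    }
    c_like_store(addr_of_mut!(flags.ready).cast::<u8>(), byte);
    true
}

fn main() {
    let mut flags = Flags { ready: false, count: 7 };
    let stored = store_from_c(&mut flags, 1);
    println!("stored={stored} ready={} count={}", flags.ready, flags.count);
}
```

What matters here is only which byte ends up in memory; how a `bool` argument would be passed in registers never enters the picture.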

gnzlbg (Jan 13 2019 at 10:45, on Zulip):

and the memory representation of _Bool is unspecified

gnzlbg (Jan 13 2019 at 10:46, on Zulip):

while the calling convention is (I was hoping that we could use the calling convention to make the values of true and false implementation defined)

rkruppe (Jan 13 2019 at 10:46, on Zulip):

Ok I see why you brought up calling conventions

gnzlbg (Jan 13 2019 at 10:46, on Zulip):

clang and gcc are slightly against specifying anything about the memory representation of _Bool

gnzlbg (Jan 13 2019 at 10:47, on Zulip):

while independently of whether that changes, they have to adhere to the calling convention

rkruppe (Jan 13 2019 at 10:47, on Zulip):

But for the same reason you can have memory layout compatibility without calling convention compatibility, knowing how bools are passed around wouldn't give you memory layout compatibility

rkruppe (Jan 13 2019 at 10:47, on Zulip):

And also, while the C standard is vague on matters of memory layout, it says even less (nothing at all, I believe) about calling conventions

gnzlbg (Jan 13 2019 at 10:48, on Zulip):

the C standard says nothing at all about calling conventions (the platform ABIs do say a lot though)

rkruppe (Jan 13 2019 at 10:48, on Zulip):

And the platform ABIs have to describe memory layout as well, don't they?

gnzlbg (Jan 13 2019 at 10:48, on Zulip):

the problem is that we'd like to have bools with two valid values, true and false - i don't see how that can work given the definition that bool == _Bool

gnzlbg (Jan 13 2019 at 10:49, on Zulip):

@rkruppe i haven't checked, would make sense for IPC

rkruppe (Jan 13 2019 at 10:49, on Zulip):

Forget IPC, you need to be able to share data structures in the same process as well

rkruppe (Jan 13 2019 at 10:49, on Zulip):

Certainly struct layout computation is part of the ABIs, so I don't see why primitive layout wouldn't be either

rkruppe (Jan 13 2019 at 10:50, on Zulip):

the problem is that we'd like to have bools with two valid values, true and false - i don't see how that can work given the definition that bool == _Bool

yeah this is a problem but one unrelated to calling conventions, do you agree?

gnzlbg (Jan 13 2019 at 10:50, on Zulip):

I can't find that for _Bool in the SysV AMD64 ABI spec

gnzlbg (Jan 13 2019 at 10:50, on Zulip):

i only find there the specification of bool for function arguments and return values

nagisa (Jan 13 2019 at 10:51, on Zulip):

@gnzlbg C-the-standard has fairly strict definition for _Bool only allowing it to contain two distinct values.

rkruppe (Jan 13 2019 at 10:52, on Zulip):

@gnzlbg page 14 of https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf

Booleans, when stored in a memory object, are stored as single byte objects the value of which is always 0 (false) or 1 (true).
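Under the rule quoted above (one byte, always 0 or 1), a byte handed over by foreign code can be checked before it is treated as a Rust `bool`. A minimal sketch; the function name `bool_from_abi_byte` is hypothetical and not part of any library:

```rust
/// Converts a byte received from foreign code into a `bool`,
/// accepting only the two encodings the quoted ABI text allows.
fn bool_from_abi_byte(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        // Any other byte is not a boolean under that ABI rule.
        _ => None,
    }
}

fn main() {
    for b in [0u8, 1, 2, 255] {
        println!("{b:#04x} -> {:?}", bool_from_abi_byte(b));
    }
}
```

Going through `u8` and validating avoids ever materializing a `bool` from an unexpected byte.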

nagisa (Jan 13 2019 at 10:53, on Zulip):

I remember looking into making C compiler store something non-1 or non-0 into a _Bool. It was extremely difficult

nagisa (Jan 13 2019 at 10:53, on Zulip):

and ended up being UB in C as well.

gnzlbg (Jan 13 2019 at 10:53, on Zulip):

The C standard definition of _Bool is pretty loose, a lot of it is unspecified.

nagisa (Jan 13 2019 at 10:54, on Zulip):

a lot is, but what is specified is fairly compatible with our own bool. Adding specifications from the ABI documents you get the complete picture.

gnzlbg (Jan 13 2019 at 10:54, on Zulip):

@nagisa with the ABI documents you get the complete picture, which is that bool is pretty much implementation-defined

rkruppe (Jan 13 2019 at 10:55, on Zulip):

Knowledge of specific C implementations is all well and good and I think it's quite likely that all the ones we care about in practice are compatible with what rustc does today; the real problem is hypothetical standard-conforming implementations that aren't

rkruppe (Jan 13 2019 at 10:55, on Zulip):

But since these are very hypothetical I care very little so I'm going to peace out if/when this discussion turns to language lawyering again

nagisa (Jan 13 2019 at 10:56, on Zulip):

I disagree. _Bool is ABI-specific, but not implementation specific. Multiple implementors of the same ABI will implement _Bool the same way.

rkruppe (Jan 13 2019 at 10:56, on Zulip):

I don't really see that as a meaningful distinction

gnzlbg (Jan 13 2019 at 10:56, on Zulip):

The issue is that saying that bool has two valid values, true and false, is incompatible with saying that bool == _Bool.

rkruppe (Jan 13 2019 at 10:56, on Zulip):

Multiple implementations can and do agree with each other to get interoperability

gnzlbg (Jan 13 2019 at 10:57, on Zulip):

I'd be fine with saying that Rust does not support platforms in which _Bool has more than two valid values

nagisa (Jan 13 2019 at 10:57, on Zulip):

The C standard specifies that _Bool may contain two bit patterns, one for true and another for false. It is hidden in a fairly obscure manner, but it is there. (Oh and the patterns are not exactly specified IIRC, but that’s not too much of an issue)

gnzlbg (Jan 13 2019 at 10:58, on Zulip):

@nagisa because of padding bits that's not true

gnzlbg (Jan 13 2019 at 10:58, on Zulip):

https://github.com/rust-rfcs/unsafe-code-guidelines/pull/63#issue-241483785

nagisa (Jan 13 2019 at 11:00, on Zulip):

@nagisa because of padding bits that's not true

are you saying an equivalent of uint16_t has more than 2¹⁶ bit patterns because of padding bits it may have coming after it?

nagisa (Jan 13 2019 at 11:01, on Zulip):

/me drops the topic.

gnzlbg (Jan 13 2019 at 11:02, on Zulip):

Are the fixed-width integer types allowed to have padding bits?

nagisa (Jan 13 2019 at 11:04, on Zulip):

Sure, you can have data layout to specify alignment for integer larger than its native size.

nagisa (Jan 13 2019 at 11:04, on Zulip):

I believe there is at least one architecture where 8-bit integer is actually a 16-bit one, with 8 padding bits.

gnzlbg (Jan 13 2019 at 11:06, on Zulip):

The whole point of padding bits is that they are ignored, so a uint16_t with ignored padding bits would have more than 2¹⁶ bit patterns because it would need to be larger than 16 bits

gnzlbg (Jan 13 2019 at 11:06, on Zulip):

the question is whether it has more than 2¹⁶ values, which it does not AFAICT - since the padding bits are ignored, different bit-patterns represent the same value

gnzlbg (Jan 13 2019 at 11:07, on Zulip):

that's the problem with a bool where the bit 0 encodes true and false, and e.g., bits 1-7 are ignored. It has 256 bit-patterns, but only two valid values, the problem is that multiple bit-patterns represent the same value
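The encoding described in the last message can be modelled in a few lines of Rust. `PaddedBool` is a purely hypothetical type (no platform Rust supports is known to encode booleans this way); it treats bit 0 as the value and bits 1-7 as ignored padding, and counts how many byte patterns land on each value:

```rust
/// Hypothetical boolean encoding: bit 0 carries the value,
/// bits 1-7 are padding and are ignored when decoding.
#[derive(Clone, Copy)]
struct PaddedBool(u8);

impl PaddedBool {
    /// Decodes the value by looking at bit 0 only.
    fn value(self) -> bool {
        self.0 & 1 == 1
    }
}

/// Counts how many of the 256 byte patterns decode to `false` and to `true`.
fn pattern_counts() -> (usize, usize) {
    let trues = (0..=u8::MAX).filter(|&b| PaddedBool(b).value()).count();
    (256 - trues, trues)
}

fn main() {
    let (falses, trues) = pattern_counts();
    println!("patterns decoding to false: {falses}, to true: {trues}");
}
```

Every one of the 256 patterns decodes to one of just two values, which is the distinction between bit-patterns and values drawn above.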

Nicole Mazzuca (Jan 13 2019 at 19:18, on Zulip):

@gnzlbg why do we keep bringing this up

Nicole Mazzuca (Jan 13 2019 at 19:18, on Zulip):

zero platforms do anything but the sane thing

Nicole Mazzuca (Jan 13 2019 at 19:18, on Zulip):

until someone comes up with a platform, let's not care about it

gnzlbg (Jan 14 2019 at 07:41, on Zulip):

@Nicole Mazzuca because @RalfJ would like to guarantee that bool has two valid values: true and false, but AFAICT that's incompatible with our definition that bool == _Bool, so we can't write the spec to be as simple as that

gnzlbg (Jan 14 2019 at 07:43, on Zulip):

We have to write that bool has as many valid values as _Bool, and add a "note: on all platforms that Rust currently supports, bool only has two valid values, true (0x1) and false (0x0)", or similar
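A note like the one proposed above could be backed by compile-time checks, so that a build for a target where it does not hold fails instead of silently misbehaving. A minimal sketch, assuming a toolchain recent enough to allow `assert!` in const contexts; the helper `bool_layout` is hypothetical:

```rust
use std::mem::{align_of, size_of, transmute};

// Compile-time checks: the build fails on any target where one of these is false.
const _: () = assert!(size_of::<bool>() == 1);
const _: () = assert!(align_of::<bool>() == 1);
// `transmute` inspects the stored byte itself, not the result of an `as` cast.
const _: () = assert!(unsafe { transmute::<bool, u8>(false) } == 0x0);
const _: () = assert!(unsafe { transmute::<bool, u8>(true) } == 0x1);

/// Returns (size in bytes, byte stored for false, byte stored for true).
fn bool_layout() -> (usize, u8, u8) {
    // SAFETY: `bool` and `u8` have the same size, and every `bool` byte is a valid `u8`.
    let (f, t) = unsafe { (transmute::<bool, u8>(false), transmute::<bool, u8>(true)) };
    (size_of::<bool>(), f, t)
}

fn main() {
    let (size, f, t) = bool_layout();
    println!("size={size} false={f:#x} true={t:#x}");
}
```

These checks only describe what rustc does with its own `bool`; they say nothing about what a C compiler on the same target stores for `_Bool`.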

Nicole Mazzuca (Jan 14 2019 at 07:47, on Zulip):

_Bool has only two valid values

Nicole Mazzuca (Jan 14 2019 at 07:47, on Zulip):

that is guaranteed by the C standard

Nicole Mazzuca (Jan 14 2019 at 07:48, on Zulip):

the representation is not guaranteed

Nicole Mazzuca (Jan 14 2019 at 07:48, on Zulip):

all zeroes is guaranteed to be 0 (aka false)

gnzlbg (Jan 14 2019 at 08:26, on Zulip):

@Nicole Mazzuca AFAICT _Bool has only one bit of valid values, and whether all other bits are trap representations or padding bits is unspecified. That is, a _Bool where one bit denotes true or false, and all other bits are ignored, is standard compliant AFAICT. That results in > 2 bit-patterns that denote either true or false.

Nicole Mazzuca (Jan 14 2019 at 08:28, on Zulip):

that is not my understanding - however, I also think it's unreasonable to attempt to be compatible with anything that breaks this very basic assumption, since no system that Rust will work on does

Nicole Mazzuca (Jan 14 2019 at 08:28, on Zulip):

and no system which will be created in the future will either

gnzlbg (Jan 14 2019 at 08:29, on Zulip):

FWIW I filed a bug in both gcc and clang, and one possible solution for GCC is to offer an intrinsic that lets you query whether _Bool has padding bits

gnzlbg (Jan 14 2019 at 08:29, on Zulip):

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88662

gnzlbg (Jan 14 2019 at 14:28, on Zulip):

@RalfJ C does have a notion of a validity invariant, it is just not the same one as Rust

gnzlbg (Jan 14 2019 at 14:29, on Zulip):

In C, an lvalue expression involving a value that is not "valid" is UB

gnzlbg (Jan 14 2019 at 14:29, on Zulip):

So while one can materialize a _Bool with an invalid bit-pattern, that _Bool cannot participate in an lvalue expression (e.g. an assignment)

RalfJ (Jan 22 2019 at 12:15, on Zulip):

@gnzlbg

In C, an lvalue expression involving a value that is not "valid" is UB

can you point me at the part of the standard specifying this?

RalfJ (Jan 22 2019 at 12:16, on Zulip):

I guess aside from bool there's little C would consider "invalid" so the question doesn't come up very often

gnzlbg (Jan 22 2019 at 13:39, on Zulip):

It comes from the definition of trap representation - http://port70.net/~nsz/c/c11/n1570.html#6.2.6.1p5 (emphasis mine)

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. (footnote 50) Such a representation is called a trap representation.

And the footnote: http://port70.net/~nsz/c/c11/n1570.html#note50

Thus, an automatic variable can be initialized to a trap representation without causing undefined behavior, but the value of the variable cannot be used until a proper value is stored in it.

That is, one can construct an invalid value, but there is very little that one can do with that value without invoking undefined behavior (e.g. an assignment from an invalid value bool a = invalid_bool is UB - but reading and writing bytes to it via a char* is ok).

Note that trap representations are different from _indeterminate_ representations (which might represent a value of the object type - it is just unknown which).
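A rough Rust analogue of the C rule quoted above, offered as a sketch and not as a statement that the two languages have the same semantics: storage for a `bool` can hold an arbitrary byte inside `MaybeUninit<bool>`, that byte can be read and written as a `u8` (much like going through `char*` in C), but turning it into a `bool` is only allowed for 0 and 1. All three function names are hypothetical:

```rust
use std::mem::MaybeUninit;

/// Stores an arbitrary byte into storage meant for a `bool`
/// without ever creating a `bool` value from it.
fn slot_with_byte(byte: u8) -> MaybeUninit<bool> {
    let mut slot = MaybeUninit::<bool>::uninit();
    // SAFETY: the pointer is valid for a one-byte write, and writing a `u8`
    // does not assert that the byte is a valid `bool`.
    unsafe { slot.as_mut_ptr().cast::<u8>().write(byte) };
    slot
}

/// Byte-wise read of the storage.
/// Assumption of this sketch: the slot came from `slot_with_byte`,
/// so its single byte has been initialized.
fn raw_byte(slot: &MaybeUninit<bool>) -> u8 {
    // SAFETY: see the assumption above; reading the byte as `u8` is always valid.
    unsafe { slot.as_ptr().cast::<u8>().read() }
}

/// Produces a `bool` only when the stored byte is one of the two valid ones.
fn checked_bool(slot: MaybeUninit<bool>) -> Option<bool> {
    match raw_byte(&slot) {
        // SAFETY: 0 and 1 are the valid byte values of a Rust `bool`.
        0 | 1 => Some(unsafe { slot.assume_init() }),
        // Calling `assume_init` on any other byte would be undefined behavior.
        _ => None,
    }
}

fn main() {
    let odd = slot_with_byte(2);
    println!("raw byte: {}", raw_byte(&odd));
    println!("as bool: {:?}", checked_bool(odd));
    println!("valid: {:?}", checked_bool(slot_with_byte(1)));
}
```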

Last update: Nov 19 2019 at 18:15UTC