Stream: t-lang/wg-unsafe-code-guidelines

Topic: MaybeUninit and FFI


DutchGhost (Jul 30 2019 at 15:40, on Zulip):

Out of interest, I took a look at https://github.com/seanmonstar/num_cpus/blob/master/src/lib.rs .
I noticed a couple of mem::uninitialized() and mem::zeroed() calls, and wondered why MaybeUninit isn't used instead.

One answer I could come up with is that MaybeUninit isn't FFI - safe, but Im not sure. Could anyone clarify if it is FFI - safe, and if there are any niches like "Its just as FFI safe as the T it holds inside"?

(also, are the uses of mem::uninitialized() and mem::zeroed() sound in num_cpu's? or should they switch asap?)

gnzlbg (Jul 30 2019 at 16:42, on Zulip):

I noticed a couple of mem::uninitialized() and mem::zeroed() calls, and wondered why MaybeUninit isn't used instead.

Is that code older than 2 weeks ? If so, then the answer is probably that MaybeUninit was not available in stable or did not even exist when that code was written.

gnzlbg (Jul 30 2019 at 16:43, on Zulip):

The code appears to be 2 years old.

gnzlbg (Jul 30 2019 at 16:43, on Zulip):

So the answer is probably that MaybeUninit did not exist when that code was written.

DutchGhost (Jul 30 2019 at 17:12, on Zulip):

It could be, although there was a commit last month.
Anyway, I made a pr for it.

RalfJ (Jul 30 2019 at 18:25, on Zulip):

also, are the uses of mem::uninitialized() and mem::zeroed() sound in num_cpu's?

the ones for zeroed are likely fine, I guess cpu_set_t is just an integer type.
the ones for uninitialized are not fine under the current strict rules.

gnzlbg (Jul 31 2019 at 07:55, on Zulip):

It could be, although there was a commit last month.

You can right click on the line on github and click on "blame", it shows that some of the lines you mention where last modified 2 years ago.

Anyway, I made a pr for it.

Using MaybeUninit will bump the minimum supported Rust version to the latest one, which is a major semver breaking change and means that the package won't be buildable on Debian, Ubuntu, etc. when using the system toolchain

gnzlbg (Jul 31 2019 at 07:57, on Zulip):

num_cpus is a super popular package, I would recommend being careful when rolling these kind of breaking changes

RalfJ (Jul 31 2019 at 08:04, on Zulip):

which is a major semver breaking change

I don't think there is consensus on whether bumping the MSRV is semver-breaking or not

gnzlbg (Jul 31 2019 at 08:38, on Zulip):

Facts do not need consensus.

gnzlbg (Jul 31 2019 at 08:40, on Zulip):

I haven't seen anybody claim that bumping the MSRV does not break Rust code, and it is a claim that is trivial to refute.

gnzlbg (Jul 31 2019 at 08:42, on Zulip):

What there is no consensus about is whether bumping the MSRV is a change that should require a major semver version bump. But that has nothing to do whether the change is breaking.

Some people were complaining on reddit last month that their code broke because a new method was added to the standard library.

gnzlbg (Jul 31 2019 at 08:43, on Zulip):

A lot of people were trying to explain to them that those changes are not breaking, yet most of them did not get through to those users because they were pointing out the fact that their code is correct and was compiling yesterday but does not compile today anymore.

gnzlbg (Jul 31 2019 at 08:44, on Zulip):

Firefox was holding the latest Debian release back because it did not compile with the toolchain it was using.

gnzlbg (Jul 31 2019 at 08:47, on Zulip):

Our non-breaking changes policies aren't really policies about what are breaking changes and what are not. They are policies about which code we are ok with breaking. That does not mean that changes break no code, and bumping num_cpus to the latest released Rust version to fix undefined behavior would mean that Ubuntu, Debian, etc. cannot package that version of the crate because it does not compile with their toolchains. So if you wanted to make a big impact by fixing undefined behavior, doing it in such a way that cannot be distributed achieves very little short term.

RalfJ (Jul 31 2019 at 08:48, on Zulip):

Facts do not need consensus.

you are making a particular interpretation of semver here. there's nothing factual about that.

RalfJ (Jul 31 2019 at 08:48, on Zulip):

I think it is a perfectly reasonable stanza to say "my crate supports the latest 2 stable releases of Rust, and if a change only breaks with older compilers, that's not semver-breaking"

RalfJ (Jul 31 2019 at 08:49, on Zulip):

I have seen such discussions come up several times. Not everyone shares your interpretation of semver, and pretending that it is the only interpretation won't change that.

gnzlbg (Jul 31 2019 at 08:49, on Zulip):

I didn't suggest to bump the major semver version anywhere

gnzlbg (Jul 31 2019 at 08:50, on Zulip):

I mentioned the fact that bumping the MSRV version breaks code

RalfJ (Jul 31 2019 at 08:50, on Zulip):

"semver breaking" is defined as what triggers a major semver bump

RalfJ (Jul 31 2019 at 08:50, on Zulip):

saying something is semver breaking but doesnt cause a bump is self-contradicting

gnzlbg (Jul 31 2019 at 08:50, on Zulip):

then I retract that, you are right that there are many interpretations about what semver means

RalfJ (Jul 31 2019 at 08:51, on Zulip):

I specifically said semver breaking, not breaking. Causing inference failure by implementing a new trait is also definitely "breaking" code. it is not semver breaking though.

gnzlbg (Jul 31 2019 at 08:51, on Zulip):

what I meant to say is that bumping the MSRV often breaks a lot of code

gnzlbg (Jul 31 2019 at 08:51, on Zulip):

I tried bumping the MSRV of libc crate in march to Rust 1.20

gnzlbg (Jul 31 2019 at 08:51, on Zulip):

I had to revert the change

gnzlbg (Jul 31 2019 at 08:51, on Zulip):

because a lot of projects broke

gnzlbg (Jul 31 2019 at 08:51, on Zulip):

we are in 2019, and libc targets Rust 1.13.0

RalfJ (Jul 31 2019 at 08:52, on Zulip):

it breaks code for users of old compilers. not sure if that means the code is broken. but I agree with the gist.

gnzlbg (Jul 31 2019 at 08:52, on Zulip):

it makes code that used to compile not compile anymore

RalfJ (Jul 31 2019 at 08:52, on Zulip):

with old compilers

gnzlbg (Jul 31 2019 at 08:52, on Zulip):

you might not consider this as breaking, because old compilers

gnzlbg (Jul 31 2019 at 08:52, on Zulip):

other people say it broke my bash script, therefore is breaking

RalfJ (Jul 31 2019 at 08:53, on Zulip):

I guess all I am saying is "broken" is a compiler-version-relative notion here. but in the end for a user stuck on an old compiler that doesn't help, which is why I agree with your sentiment, just not with all of your terminology.

gnzlbg (Jul 31 2019 at 08:53, on Zulip):

calling it "semver breaking" is controversial, depends on who is doing the releases and who is consuming them, and what there opinions are

gnzlbg (Jul 31 2019 at 08:54, on Zulip):

we do automatically check the libc crate on CI for all API semver breaking changes, and we do break semver every release

gnzlbg (Jul 31 2019 at 08:54, on Zulip):

because it is our opinion that some changes do not break as much code as others

gnzlbg (Jul 31 2019 at 08:55, on Zulip):

we have "technical semver breaking changes" for all changes that should technically require a semver major bump because they break some code

RalfJ (Jul 31 2019 at 08:55, on Zulip):

we do automatically check the libc crate on CI for all API semver breaking changes, and we do break semver every release

ever minor release, that is? or even patch release? (not sure if libc is 1.0 yet so the difference would matter)

gnzlbg (Jul 31 2019 at 08:55, on Zulip):

and we have "non-technical breaking changes" for ones that are breaking according to the "API guidelines"

gnzlbg (Jul 31 2019 at 08:55, on Zulip):

every patch release

gnzlbg (Jul 31 2019 at 08:55, on Zulip):

libc does not do minor or major releases

gnzlbg (Jul 31 2019 at 08:56, on Zulip):

(it can't without breaking the world)

gnzlbg (Jul 31 2019 at 08:57, on Zulip):

there is a tool that puts two version of your crate into a third crate, and compares the typed AST of both crate versions, and produces a diff

gnzlbg (Jul 31 2019 at 08:58, on Zulip):

the diff includes whether the change is non-breaking, or whether it can break code, whether the breakage requires a patch version, minor version, major version bump, etc.

gnzlbg (Jul 31 2019 at 08:58, on Zulip):

pretty much all changes that add new items can break code for people doing glob imports, for example

gnzlbg (Jul 31 2019 at 08:59, on Zulip):

https://github.com/rust-dev-tools/rust-semverver/ - that's the tool

DutchGhost (Jul 31 2019 at 09:40, on Zulip):

I didn't think about if it was a breaking change or not. I'm not the author of num_cpu's, so he can decide what to do with the pr I made.
But, isn't fixing UB somewhat allowed to be a breaking change?
And people who don't have a compiler that supports MaybeUninit on stable can always use a previous version, no?

RalfJ (Jul 31 2019 at 10:11, on Zulip):

@DutchGhost another way to fix the UB there might be to replace all undefined by zeroed. (I didn't check though if that is sufficient.)

centril (Jul 31 2019 at 17:16, on Zulip):

we are in 2019, and libc targets Rust 1.13.0

An unreasonable situation if you ask me... the number of people who still use 1.13.0 are how many?...

gnzlbg (Jul 31 2019 at 18:28, on Zulip):

As I said, I tried to bump libc to 1.20 or so in march, and a couple of people complained

gnzlbg (Jul 31 2019 at 18:29, on Zulip):

it isn't worth it for us to bump it to Rust 1.15 or something like that, because there isn't anything there that helps

gnzlbg (Jul 31 2019 at 18:30, on Zulip):

like, IIRC, the regex crate supports old Rust versions, because it is required to build ripgrep, which is shipped by linux distros fixed with old rust toolchains, and might still ship patches

gnzlbg (Jul 31 2019 at 18:30, on Zulip):

that crate depends on libc, so we can't bump libc beyond what it supports

gnzlbg (Jul 31 2019 at 18:32, on Zulip):

all of this might be much easier if libc could just release a breaking version

Lokathor (Aug 01 2019 at 03:50, on Zulip):

I'd like the rust community to be a lot more aware of minimum rust version and that bumping it should always go with a semver version break bump as well (0.y+1 for pre-1.0, or x+1 for post 1.0, etc)

unfortunately, the RFC for that seems to be stalled a bit :/

RalfJ (Aug 01 2019 at 07:28, on Zulip):

I think that would be a desaster

RalfJ (Aug 01 2019 at 07:29, on Zulip):

it would basically grind the adoption of new features to a halt -- libraries tend to avoid major version bumps, for good reasons

gnzlbg (Aug 01 2019 at 07:30, on Zulip):

But, isn't fixing UB somewhat allowed to be a breaking change?

Do you know that the behavior is undefined?

gnzlbg (Aug 01 2019 at 07:31, on Zulip):

AFAICT the behavior is unspecified for mem::uninitialized(), and the behavior of that code is defined for mem::zeroed(). Those are repr(C) types using integers and pointers, and all bytes zeroed is a valid representaiton for them.

gnzlbg (Aug 01 2019 at 07:31, on Zulip):

Using mem::uninitialized() for those types _might_ be ok, it definetely is ok with the current implementation.

RalfJ (Aug 01 2019 at 07:31, on Zulip):

Using mem::uninitialized() for those types _might_ be ok, it definetely is ok with the current implementation.

we consider it UB though

gnzlbg (Aug 01 2019 at 07:31, on Zulip):

Where?

RalfJ (Aug 01 2019 at 07:32, on Zulip):

and call it out as such in e.g. the MaybeUninit docs

gnzlbg (Aug 01 2019 at 07:32, on Zulip):

So the MaybeUninit docs makes it undefined behavior?

RalfJ (Aug 01 2019 at 07:32, on Zulip):

I dont really see what "unspecified" means here

RalfJ (Aug 01 2019 at 07:32, on Zulip):

not having decided that it is allowed makes it UB

gnzlbg (Aug 01 2019 at 07:32, on Zulip):

Either we explicitly say that it is UB, or it is unspecified.

RalfJ (Aug 01 2019 at 07:33, on Zulip):

"unspecified" works for things like layout where the compiler makes a choice but doesnt say which. but for steps of the abstract machine, if "unspecified" means "the machine can take any step", that is equivalent to UB.

gnzlbg (Aug 01 2019 at 07:33, on Zulip):

Whether uninitialized is a valid representation of an i8

gnzlbg (Aug 01 2019 at 07:34, on Zulip):

Is that a step in the abstract machine?

RalfJ (Aug 01 2019 at 07:34, on Zulip):

no the step(s) is "execute let x: u8 = mem::uninitialized()"

gnzlbg (Aug 01 2019 at 07:34, on Zulip):

that was an example of how to use mem::uninitialized() in their docs for a long time

RalfJ (Aug 01 2019 at 07:34, on Zulip):

Whether uninitialized is a valid representation of an i8

I see, that's how you see this. interesting.

gnzlbg (Aug 01 2019 at 07:35, on Zulip):

that code is 2 years old

RalfJ (Aug 01 2019 at 07:35, on Zulip):

this might work. so far the wording we (a few people including @centril and me) have been using is that these things are UB until we decided that they are not

RalfJ (Aug 01 2019 at 07:35, on Zulip):

and you did not object to me using that wording for https://doc.rust-lang.org/nightly/nomicon/

gnzlbg (Aug 01 2019 at 07:35, on Zulip):

I tend to think that everything is unspecified, and that if something is UB, it should be UB by design

gnzlbg (Aug 01 2019 at 07:36, on Zulip):

and have wording and rationale for it

gnzlbg (Aug 01 2019 at 07:36, on Zulip):

even if the wording is "we don't know what to do here, so this is UB for now"

RalfJ (Aug 01 2019 at 07:36, on Zulip):

that doesnt work, unless unspecified = UB

RalfJ (Aug 01 2019 at 07:36, on Zulip):

because then it would be a breaking change to make it UB later

RalfJ (Aug 01 2019 at 07:36, on Zulip):

for integers we have enough of an idea of the spec that we boiled it down to two options, so there we have the option of saying "A or B"

gnzlbg (Aug 01 2019 at 07:36, on Zulip):

well then let x: u8 = mem::uninitialized(); can't be UB without introducing a breaking change

RalfJ (Aug 01 2019 at 07:37, on Zulip):

but in other cases that we explored less, we are nowhere close so that

gnzlbg (Aug 01 2019 at 07:37, on Zulip):

because that's the minimum the mem::uninitialized docs showed for it

RalfJ (Aug 01 2019 at 07:37, on Zulip):

you are basically saying because Rust 1.0 didnt come with a full UB spec, we cannot make anything UB as it was all just unspecified?

RalfJ (Aug 01 2019 at 07:37, on Zulip):

this is not C, where we claim any kind of exhaustive standard or so

RalfJ (Aug 01 2019 at 07:37, on Zulip):

and AFAIK in C, the "default" when the standard imposes no requirement actually is UB

gnzlbg (Aug 01 2019 at 07:37, on Zulip):

It did come with docs

gnzlbg (Aug 01 2019 at 07:38, on Zulip):

and those docs do show that some things work

RalfJ (Aug 01 2019 at 07:38, on Zulip):

yes for mem::uninitialized mistakes were made

gnzlbg (Aug 01 2019 at 07:38, on Zulip):

breaking those things are breaking changes, I do think that breaking changes are OK

gnzlbg (Aug 01 2019 at 07:38, on Zulip):

its an engineering tradeoff whether they are worth it

gnzlbg (Aug 01 2019 at 07:39, on Zulip):

but given that we haven't really formulated anywhere what let x: u8 = mem::uninitialized(); does, and that in the UCG there seems to be consensus that this should work or otherwise too much code would break

RalfJ (Aug 01 2019 at 07:39, on Zulip):

but there's tons of other open questions around deref/offset of dangling pointers, int-ptr-casts, ptr comparison, and then all the aliasing stuff of course

gnzlbg (Aug 01 2019 at 07:39, on Zulip):

sending PRs to crates fixing that is unecessary churn

gnzlbg (Aug 01 2019 at 07:39, on Zulip):

and it creates "myths" that those things are UB, that might survive a future RFC that says they are not

gnzlbg (Aug 01 2019 at 07:41, on Zulip):

so sure, if let x: u8 = mem::uninitialized(); is UB, then we should fix those now, but we should argue why that is UB, what that UB buys us, and we should not in 3 months pass an RFC that says that's ok because people from the outside would rightfully think that we have no clue what we are doing

gnzlbg (Aug 01 2019 at 07:41, on Zulip):

if we are changing our minds about such "basic" things every 3 months

RalfJ (Aug 01 2019 at 07:49, on Zulip):

(sorry had to switch buses)

RalfJ (Aug 01 2019 at 07:50, on Zulip):

sending PRs to crates fixing that is unecessary churn

you mean specifically for uninitialized integers?

RalfJ (Aug 01 2019 at 07:50, on Zulip):

I'd rather have a myth that makes people not use mem::uninitialized than a myth that makes them use it

RalfJ (Aug 01 2019 at 07:50, on Zulip):

but I see your point

RalfJ (Aug 01 2019 at 07:51, on Zulip):

my thinking was: if it can become UB at any point in the future, that's equivalent to already being UB from a "I want my code to keep working" perspective

RalfJ (Aug 01 2019 at 07:51, on Zulip):

we should certainly encourage people to write new code using MaybeUninit

gnzlbg (Aug 01 2019 at 08:12, on Zulip):

Sure, for new code I agree, but there is a lot of code already, and changing it is not only a cost, it is also an opportunity to teach people something. I fear that we waste this opportunity if we are not sure of what the semantics we want to teach are.

gnzlbg (Aug 01 2019 at 08:13, on Zulip):

If these PR would say "Hey, let x: u8 = mem::uninitialized(); is "ok", but using MaybeUninit is safer and more idiomatic" I would be ok with that

gnzlbg (Aug 01 2019 at 08:14, on Zulip):

or even "Hey, we don't know whether let x: u8 = uninitialized(); is going to be "ok" or not right now, it might be, but either way, using MaybeUninit is safer and more idiomatic anyways, so prefer that".

gnzlbg (Aug 01 2019 at 08:14, on Zulip):

and I'm not only thinking about PRs here, but also about clippy lints, warnings, etc.

gnzlbg (Aug 01 2019 at 08:15, on Zulip):

we could be liniting mem::uninitialized and mem::zeroed already, but we should do so with care of avoiding confusing users

RalfJ (Aug 01 2019 at 08:47, on Zulip):

or even "Hey, we don't know whether let x: u8 = uninitialized(); is going to be "ok" or not right now, it might be, but either way, using MaybeUninit is safer and more idiomatic anyways, so prefer that".

that's the message I am usually going for

RalfJ (Aug 01 2019 at 08:48, on Zulip):

also see my question above, we are specifically talking about uninitialized integers here, right?

RalfJ (Aug 01 2019 at 08:49, on Zulip):

we could be liniting mem::uninitialized and mem::zeroed already, but we should do so with care of avoiding confusing users

I think we should start by linting (for both uninitialized and zeroed) for types that don't allow the all-0 bit pattern -- i.e., references, NonNull, NonZero*, fn. That's definitely UB, and it has already repeatedly caused issues in practice.

RalfJ (Aug 01 2019 at 08:49, on Zulip):

I wouldnt lint for things where the rules are still open, that seems odd

DutchGhost (Aug 01 2019 at 15:08, on Zulip):

But, isn't fixing UB somewhat allowed to be a breaking change?

Do you know that the behavior is undefined?

From what I understand from ralf's blog https://www.ralfj.de/blog/2019/07/14/uninit.html , is when you write mem::uninitialized(), and use the value as is, the compiler has the right to shoot you in the foot whenever it wants. (unless the type you create is fine being unitialized, but such types are zst's, and MaybeUninit itself currently). Correct me if I'm wrong, but that seem pretty undefined to me.
<br>

If these PR would say "Hey, let x: u8 = mem::uninitialized(); is "ok", but using MaybeUninit is safer and more idiomatic" I would be ok with that

I did exactly this: https://github.com/seanmonstar/num_cpus/pull/82

Also, mem::uninitialized() will be depricated in 1.39, so eventually there will be a a kinda soft-force to switch over, sooner or later.

RalfJ (Aug 01 2019 at 15:14, on Zulip):

the compiler has the right to shoot you in the foot whenever it wants. (unless the type you create is fine being unitialized, but such types are zst's, and MaybeUninit itself currently).
Correct me if I'm wrong, but that seem pretty undefined to me

the thing is, the compiler curently won't shoot you in the foot if it is an integer type

RalfJ (Aug 01 2019 at 15:14, on Zulip):

and there's a lot of code relying on that. old code from when there was just no other way to work with uninitialized memory.

RalfJ (Aug 01 2019 at 15:15, on Zulip):

now one could say that code is wrong and shouldnt have been written (like some people argued for code that has panics passing through FFI), but well, the code exists and we have to do the best we can to deal with that.

RalfJ (Aug 01 2019 at 15:15, on Zulip):

Whether let x: u8 = mem::uninitialized() should be UB is being discussed at https://github.com/rust-lang/unsafe-code-guidelines/issues/71

RalfJ (Aug 01 2019 at 15:16, on Zulip):

and my personal position is that no, it should not be UB. (this has changed since that discussion started more than half a year ago.)

RalfJ (Aug 01 2019 at 15:16, on Zulip):

so this is basically "UB for forwards-compatibility"

RalfJ (Aug 01 2019 at 15:16, on Zulip):

which is unlike e.g. let x: bool = mem::uninitialized() which is UB "for real" and does already cause problems and there is no intention to make this defined.

gnzlbg (Aug 01 2019 at 15:26, on Zulip):

I think that when sending PRs to libraries, you better make sure that the things are "hard" UB if they are not super clearly specified yet - for num_cpus, one needs to inspect the types involved, and see if they use bool, or fn or &T, and based on that make an assesment

gnzlbg (Aug 01 2019 at 15:26, on Zulip):

at least until the specification advances a bit

centril (Aug 01 2019 at 21:28, on Zulip):

I think we should start by linting (for both uninitialized and zeroed) for types that don't allow the all-0 bit pattern -- i.e., references, NonNull, NonZero*, fn. That's definitely UB, and it has already repeatedly caused issues in practice.

PRs welcome :slight_smile:

RalfJ (Aug 01 2019 at 21:36, on Zulip):

time machines welcome ;)

centril (Aug 01 2019 at 21:38, on Zulip):

@RalfJ I don't mean from you necessarily ;)

Last update: Nov 20 2019 at 12:45UTC