Stream: general

Topic: Library level Undefined Behavior


Elichai Turkel (Apr 10 2020 at 14:10, on Zulip):

Hi,
Let's say I have a library that provides MyBool https://play.rust-lang.org/?gist=04a65599a5fe5df5ed56264384015fe6

Is this a correct/idiomatic use of unsafe? there will be no language UB or memory unsafety if you use it, but it will violate the library API contract.

centril (Apr 10 2020 at 14:26, on Zulip):

@Elichai Turkel I would do enum Something { Yes, No }

Elichai Turkel (Apr 10 2020 at 14:27, on Zulip):

@centril You're right. I was trying to show a simple example, but I'm sure you can imagine a library that assumes some struct has only certain values, and my question is if the unchecked version of the constructor should be unsafe or not

centril (Apr 10 2020 at 14:31, on Zulip):

@Elichai Turkel only if the invariants that need to be upheld can cause UB if not

Elichai Turkel (Apr 10 2020 at 14:31, on Zulip):

UB at a language level or also at a library level?

centril (Apr 10 2020 at 14:42, on Zulip):

@Elichai Turkel UB only exists at the "language level"; it occurs if you violate the operational semantics of the language causing the abstract machine to throw an Err(_) (in the Miri sense)

Mark Drobnak (Apr 10 2020 at 14:58, on Zulip):

It might be better to panic if the setter receives a value which is unexpected (not 0 or 1 in this case). Panicking is idiomatic when the issue is "unrecoverable", such as indexing out of bounds (array[100], array.get(100) still returns an Option).
https://doc.rust-lang.org/book/ch09-03-to-panic-or-not-to-panic.html

Hanna Kruppe (Apr 10 2020 at 15:33, on Zulip):

A library can certainly consider some of its invariants relevant for safety (and thus reserve the right to cause UB if the invariant is broken) even if it does not currently do anything that could lead to UB if the invariant is broken. However, unsafe should not be "diluted" by wantonly using it for things that aren't and never will be relevant to the prevention of UB. So unless you have a specific reason to expect you'll need an invariant like this, it's probably best to be cautious and not introduce unsafe just for fun.

RalfJ (Apr 10 2020 at 16:58, on Zulip):

centril said:

Elichai Turkel UB only exists at the "language level"; it occurs if you violate the operational semantics of the language causing the abstract machine to throw an Err(_) (in the Miri sense)

to be fair, even Rust libstd is not entirely precise about this use of UB though. And UB is a common enough term in library APIs that I am not sure if it is worth to fight the idea of "library UB". think, e.g., of our allocator traits.

RalfJ (Apr 10 2020 at 16:59, on Zulip):

centril said:

Elichai Turkel I would do enum Something { Yes, No }

@Elichai Turkel I would say if you plan to have some code, elsewhere, rely on your invariant for safety/UB-freedom, then this is totally idiomatic.

RalfJ (Apr 10 2020 at 17:00, on Zulip):

generally, if your type's safety invariant could be violated by a method, I'd use unsafe for that

centril (Apr 10 2020 at 17:11, on Zulip):

@RalfJ well "library UI" meaning "Misuse of this library can result in R-AM UB right now or later" is something I'm behind, but using it for other things feels confusing

simulacrum (Apr 10 2020 at 17:12, on Zulip):

I think it is fine to use it for non-Rust abstract machine UB, too, though, right? Like this is the principle behind things like FFI being unsafe

centril (Apr 10 2020 at 17:12, on Zulip):

@simulacrum I believe we also accepted the notion of "target UB" in UCG

centril (Apr 10 2020 at 17:12, on Zulip):

which covers that aspect

simulacrum (Apr 10 2020 at 17:13, on Zulip):

ah, not sure I follow that -- I meant specifically that the set of non-confusing UB being larger than just that of R-AM is what I'd expect

centril (Apr 10 2020 at 17:13, on Zulip):

@simulacrum yeah I read your message again and I think I read it wrong

centril (Apr 10 2020 at 17:14, on Zulip):

some other abstract machine integrated with Rust also seems reasonable for "UB"

centril (Apr 10 2020 at 17:14, on Zulip):

E.g. the combined program using C++ and Rust causing either of the abstract machines to have UB

centril (Apr 10 2020 at 17:14, on Zulip):

probably LLVM UB of some form

simulacrum (Apr 10 2020 at 17:15, on Zulip):

sure, but I would say that limiting in such a way seems odd

RalfJ (Apr 10 2020 at 17:15, on Zulip):

centril said:

RalfJ well "library UI" meaning "Misuse of this library can result in R-AM UB right now or later" is something I'm behind, but using it for other things feels confusing

I would say that is exactly what "violating a library's safety invariant" is. That's what safety invariants are for.

centril (Apr 10 2020 at 17:15, on Zulip):

@RalfJ yeah that's how I'd phrase it as well :slight_smile:

simulacrum (Apr 10 2020 at 17:15, on Zulip):

yeah, e.g. it includes putting non-UTF8 data inside string (presuming that String never actually causes R-AM UB due to that)

RalfJ (Apr 10 2020 at 17:16, on Zulip):

simulacrum said:

yeah, e.g. it includes putting non-UTF8 data inside string (presuming that String never actually causes R-AM UB due to that)

exactly. (right now UTF-8 str is a R-AM condition but I think we should get rid of that.)

centril (Apr 10 2020 at 17:16, on Zulip):

That is:

/// # Safety

basically

centril (Apr 10 2020 at 17:16, on Zulip):

@RalfJ I thought everyone was agreed it's not a R-AM condition

RalfJ (Apr 10 2020 at 17:16, on Zulip):

well the reference says it's one^^

centril (Apr 10 2020 at 17:16, on Zulip):

oh wow

RalfJ (Apr 10 2020 at 17:16, on Zulip):

so I figured we'd need an RFC to change that

centril (Apr 10 2020 at 17:17, on Zulip):

that's a pretty weird and unusable form of UB it seems to me

centril (Apr 10 2020 at 17:17, on Zulip):

what is LLVM going to use that for... heh?

RalfJ (Apr 10 2020 at 17:17, on Zulip):

shrug IIRC I just grandfathered it when editing that page

RalfJ (Apr 10 2020 at 17:17, on Zulip):

centril said:

what is LLVM going to use that for... heh?

I dont know why you bring up LLVM here. MIR is an IR with semantics on its own right. ;)

centril (Apr 10 2020 at 17:18, on Zulip):

@RalfJ sure but many of the UB conditions we have are for some good reason (e.g. LLVM exploiting it)

centril (Apr 10 2020 at 17:18, on Zulip):

is this UB condition just a result of confusing safety invariant with UB?

simulacrum (Apr 10 2020 at 17:18, on Zulip):

I mean we could give LLVM, in theory, the right to optimize the search in UTF-8 data knowing that

RalfJ (Apr 10 2020 at 17:18, on Zulip):

I think this one exists because "someone added it to the reference when initially writing that page" :rofl:

RalfJ (Apr 10 2020 at 17:18, on Zulip):

centril said:

is this UB condition just a result of confusing safety invariant with UB?

I think so

centril (Apr 10 2020 at 17:18, on Zulip):

i.e. we really meant "safety invariant" but used the wrong term "accidentally"?

simulacrum (Apr 10 2020 at 17:19, on Zulip):

but yeah, I think it is just a safety invariant

RalfJ (Apr 10 2020 at 17:19, on Zulip):

it's older than that term

simulacrum (Apr 10 2020 at 17:19, on Zulip):

(ah, should be)

centril (Apr 10 2020 at 17:19, on Zulip):

I think that seems like a reasonable theory about what happened here

RalfJ (Apr 10 2020 at 17:20, on Zulip):

@centril it goes back aaaall the way: https://doc.rust-lang.org/1.0.0/reference.html#behavior-considered-undefined

centril (Apr 10 2020 at 17:20, on Zulip):

@RalfJ hmm... do you want to open a PR to the reference to remove that? If you want an FCP let's do it on an issue or so

RalfJ (Apr 10 2020 at 17:20, on Zulip):

centril said:

RalfJ hmm... do you want to open a PR to the reference to remove that? If you want an FCP let's do it on an issue or so

so you are saying no RFC needed? you are the process expert, fine for me ;)

RalfJ (Apr 10 2020 at 17:20, on Zulip):

I'll add doing that to my list

centril (Apr 10 2020 at 17:21, on Zulip):

That would be a pretty short RFC heh :D

centril (Apr 10 2020 at 17:21, on Zulip):

if you prefer that format over FCP that's fine by me -- I can't imagine anyone on the team actually thinking it's a part of the validity invariant

centril (Apr 10 2020 at 17:22, on Zulip):

(of course my imagination is sometimes limited ^^)

RalfJ (Apr 10 2020 at 17:22, on Zulip):

shrug I'm in for both, if FCP works it's less work for me ;)

simulacrum (Apr 10 2020 at 17:22, on Zulip):

it also seems like it's at least presumably not important for outside consumers?

centril (Apr 10 2020 at 17:22, on Zulip):

sgtm then ^^

RalfJ (Apr 10 2020 at 17:22, on Zulip):

in https://github.com/rust-lang/unsafe-code-guidelines/issues/78 everyone also agreed it should be safety, not validity

RalfJ (Apr 10 2020 at 17:23, on Zulip):

simulacrum said:

it also seems like it's at least presumably not important for outside consumers?

it's relevant in the sense that if you create a non-UTF-8 str and dont use it, this makes it UB

RalfJ (Apr 10 2020 at 17:23, on Zulip):

but if we only have "library UB" then that's actually fine

simulacrum (Apr 10 2020 at 17:23, on Zulip):

sure, yeah

RalfJ (Apr 10 2020 at 17:23, on Zulip):

you only start having to worry about this once you call str methods

simulacrum (Apr 10 2020 at 17:23, on Zulip):

though "using it" is a bit up in the air

RalfJ (Apr 10 2020 at 17:24, on Zulip):

ah yes

RalfJ (Apr 10 2020 at 17:24, on Zulip):

so if its a validity invariant then it has to hold even when the value is just constructed

RalfJ (Apr 10 2020 at 17:24, on Zulip):

but if its just a safety invariant then as long as you dont pass it to a foreign method as &str, you are definitely good

RalfJ (Apr 10 2020 at 17:24, on Zulip):

like, if you only pass it around in your own code at that type

centril (Apr 10 2020 at 17:24, on Zulip):

@RalfJ is the standard library not violating this validity invariant in a bunch of places?

centril (Apr 10 2020 at 17:25, on Zulip):

maybe it always works with [u8] internally in those APIs

simulacrum (Apr 10 2020 at 17:25, on Zulip):

hm, I think std is pretty good about it

centril (Apr 10 2020 at 17:27, on Zulip):

simulacrum said:

I mean we could give LLVM, in theory, the right to optimize the search in UTF-8 data knowing that

Out of curiosity, is there a plausible compiler where this could actually give some noticeable benefits?

centril (Apr 10 2020 at 17:27, on Zulip):

That's a very speculative question... but well, there it is :slight_smile:

centril (Apr 10 2020 at 17:29, on Zulip):

the UCG discussion makes it a fairly clear "no" I think

RalfJ (Apr 10 2020 at 17:36, on Zulip):

centril said:

RalfJ is the standard library not violating this validity invariant in a bunch of places?

no idea...

RalfJ (Apr 10 2020 at 17:36, on Zulip):

centril said:

simulacrum said:

I mean we could give LLVM, in theory, the right to optimize the search in UTF-8 data knowing that

Out of curiosity, is there a plausible compiler where this could actually give some noticeable benefits?

that search is in a method though, so this method could conceivably still do unsafe_assume(this_is_utf8) and use the safety invariant to justify the assume

simulacrum (Apr 10 2020 at 17:41, on Zulip):

true

Elichai Turkel (Apr 10 2020 at 20:25, on Zulip):

Hehe, I was thinking of utf8 but didn't want to bring it up because lately it comes up a lot when I talk with @RalfJ

RalfJ (Apr 11 2020 at 14:42, on Zulip):

@centril here you go: https://github.com/rust-lang/reference/pull/792

centril (Apr 11 2020 at 18:18, on Zulip):

@RalfJ :slight_smile: FCP as promised: https://github.com/rust-lang/rust/issues/71033#issuecomment-612481973

Lokathor (Apr 11 2020 at 20:17, on Zulip):

The Rustonomicon has long put "also anything else your library chooses to call unsafe" as part of what's unsafe.

marmeladema (Apr 11 2020 at 20:48, on Zulip):

(deleted)

Last update: May 29 2020 at 17:40UTC