Stream: t-lang

Topic: Unsafe and UB


RalfJ (Apr 16 2020 at 21:23, on Zulip):

@josh wrote.

it should not be possible to have undefined behavior without the unsafe keyword.

Is it "undefined behavior" you want to flag here? That statement is true, but it seems somewhat separate from the primary things unsafe exists to address.

I feel like that's related to what unsafe means, but not directly what unsafe means.

that thread went elsewhere so I am forking here

RalfJ (Apr 16 2020 at 21:24, on Zulip):

@Josh Triplett I would have defined exactly that to be the primary concern of unsafe -- to avoid UB.
What would you have said is "directly what unsafe means"?

Josh Triplett (Apr 16 2020 at 21:27, on Zulip):

At the risk of handwaving on the fly: I feel like unsafe is more closely related to invariants or requirements than to UB specifically.

Josh Triplett (Apr 16 2020 at 21:27, on Zulip):

An unsafe method on a type isn't necessarily going to produce UB if you call it with insufficient care.

Josh Triplett (Apr 16 2020 at 21:28, on Zulip):

It might just violate an invariant of the type, or produce an assert or panic, for instance. The existence of a check and panic might prevent any actual UB from occurring, but that doesn't mean the method should be marked safe.

Lokathor (Apr 16 2020 at 21:37, on Zulip):

well you just marked all of indexing unsafe with that rule :P

mark-i-m (Apr 17 2020 at 15:50, on Zulip):

I think I know what Josh is getting at: from an end-user's perspective, I don't really care about UB for its own sake. Rather, I care that my program is correct, fast, and not a pain to write. We define UB in rust to achieve rust's balance of these goals.

Josh Triplett (Apr 17 2020 at 15:53, on Zulip):

Thinking about it further, I think it's reasonable to describe unsafe as marking regions where the compiler doesn't prevent you from invoking UB (including library UB) and you need to take care not to do so. (Though I'd phrase it somewhat differently when explaining it to a new Rust user than when documenting it for a language definition.) I think there's a certain degree to which I may just be shying away from using "UB" as part of a description of a useful concept, because I've mostly seen "UB" used as a hammer to hit people over the head with and tell them that their programs were never right to begin with so it's OK for the compiler to break them.

Josh Triplett (Apr 17 2020 at 15:53, on Zulip):

I need to learn that not everyone uses the term "UB" that way.

RalfJ (Apr 17 2020 at 16:09, on Zulip):

Josh Triplett said:

At the risk of handwaving on the fly: I feel like unsafe is more closely related to invariants or requirements than to UB specifically.

hm, interesting. to me, the purpose of these invariants is to prove absence of UB.

RalfJ (Apr 17 2020 at 16:10, on Zulip):

but of course I come from the perspective of a PL researcher, always on the look for a theorem to prove

RalfJ (Apr 17 2020 at 16:10, on Zulip):

"without unsafe there is no UB" is a really juicy theorem :D

RalfJ (Apr 17 2020 at 16:11, on Zulip):

do you have interesting examples of invariants people are guarding with unsafe that are unrelated to UB?

Josh Triplett (Apr 17 2020 at 16:12, on Zulip):

Offhand, I can think of data structures involving multiple coordinated things and using indexes as "pointers" into a Vec (because you can't have references and the thing they reference in the same struct), where unsafe methods are the ones that don't guarantee every index has a corresponding Vec element.

RalfJ (Apr 17 2020 at 16:12, on Zulip):

one example I could think of (stealing from PL literature here, people did stuff like that e.g. for Haskell) is having a type that enforces information flow properties, so that no safe program can leak confidential data on untrusted channels. that is another theorem, rather independent from UB.

Josh Triplett (Apr 17 2020 at 16:13, on Zulip):

Effectively, data-structure invariants rather than language-level UB. If it were std, we'd call it "library UB".

RalfJ (Apr 17 2020 at 16:13, on Zulip):

Josh Triplett said:

Effectively, data-structure invariants rather than language-level UB. If it were std, we'd call it "library UB".

I think user crates can also have "library UB", std is not special here.

Josh Triplett (Apr 17 2020 at 16:14, on Zulip):

OK, I can't help but meme here:
"You can't just call everything undefined behavior!"
"haha unsafe go brrr"

RalfJ (Apr 17 2020 at 16:14, on Zulip):

actually this was recently discussed here ;)

RalfJ (Apr 17 2020 at 16:14, on Zulip):

Josh Triplett said:

OK, I can't help but meme here:
"You can't just call everything undefined behavior!"
"haha unsafe go brrr"

I got to admit this one is lost on me^^

RalfJ (Apr 17 2020 at 16:15, on Zulip):

in std I think a long time ago a policy was set not to use unsafe to guard against non-UB "dangerous" API usage. that was when mem::forget became safe.

Josh Triplett (Apr 17 2020 at 16:16, on Zulip):

Right. I think it's worth observing that not every library follows the same policy as std.

Josh Triplett (Apr 17 2020 at 16:16, on Zulip):

And in particular, I think if I were writing a library crate and there were a method that could leak memory if misused, I'd mark it unsafe.

Josh Triplett (Apr 17 2020 at 16:16, on Zulip):

As in "you need to carefully read the requirements of this method before using it, they may not be enforced by the interface".

RalfJ (Apr 17 2020 at 16:16, on Zulip):

hm, interesting.

RalfJ (Apr 17 2020 at 16:17, on Zulip):

but I am not sure how common that sentiment is

RalfJ (Apr 17 2020 at 16:17, on Zulip):

so "unsafe is to prevent UB" might still be the "least common denominator", the one thing we have to all agree on to make the ecosystem work

Josh Triplett (Apr 17 2020 at 16:17, on Zulip):

I don't know either. But given that people can define any method as unsafe to call, they can attach any semantic they want to it.

RalfJ (Apr 17 2020 at 16:18, on Zulip):

that doesn't exclude people from using stronger invariants locally.

Josh Triplett (Apr 17 2020 at 16:18, on Zulip):

RalfJ said:

so "unsafe is to prevent UB" might still be the "least common denominator", the one thing we have to all agree on to make the ecosystem work

Right. If something could cause UB, it must be marked as unsafe. But something being marked unsafe doesn't require that it relates to UB.

Shnatsel (Apr 17 2020 at 16:19, on Zulip):

Some crates don't actually uphold the guarantees laid down in the Nomicon - "you cannot cause UB without unsafe" - and expose some functions as safe that are possible, but not easy to misuse, because guarding against all misuse is too costly in terms of performance. libc and glium come to mind

Josh Triplett (Apr 17 2020 at 16:20, on Zulip):

I'm curious, what kinds of methods in libc potentially allow UB without being unsafe?

RalfJ (Apr 17 2020 at 16:20, on Zulip):

@Josh Triplett that's fair. but even with that I think the relation to UB is so supremely important that it should be part of whatever one-sentence "pitch" about unsafe ends up in a "values" doument.

RalfJ (Apr 17 2020 at 16:21, on Zulip):

Josh Triplett said:

I'm curious, what kinds of methods in libc potentially allow UB without being unsafe?

fork comes to my mind... I think there's still some crates out there that expose it safely, might not be libc.

Shnatsel (Apr 17 2020 at 16:21, on Zulip):

https://github.com/rust-lang/libc/issues/1501 for example

Josh Triplett (Apr 17 2020 at 16:21, on Zulip):

@RalfJ If that's the case, then it needs explanation for the benefit of people who don't know what that document may mean by "UB", or who have different associations with the concept than programming language researchers do. :)

Josh Triplett (Apr 17 2020 at 16:23, on Zulip):

@RalfJ Before I came to Rust, I understood in general what undefined behavior was, but my association with it very much included a critical subset of "what standards refuse to define despite it having a practical definition in real-world compilers, and which we have to keep passing more options to those compilers to remind them to stop breaking it".

RalfJ (Apr 17 2020 at 16:23, on Zulip):

oh I see, you are saying the term is burnt for some people

RalfJ (Apr 17 2020 at 16:24, on Zulip):

yeah I can see that

Josh Triplett (Apr 17 2020 at 16:24, on Zulip):

Right.

Josh Triplett (Apr 17 2020 at 16:24, on Zulip):

I mean, until some of the most recent C standards, it was UB to assume that integers are two's complement.

RalfJ (Apr 17 2020 at 16:24, on Zulip):

I agree UB in C is... sad :(

RalfJ (Apr 17 2020 at 16:24, on Zulip):

which is why I am working hard to make it better in Rust^^

Josh Triplett (Apr 17 2020 at 16:24, on Zulip):

And I think it's still UB to cast a pointer to a different type and dereference it, despite that happening all the time in systems programming contexts.

RalfJ (Apr 17 2020 at 16:24, on Zulip):

maybe we shouldnt have re-use the term

RalfJ (Apr 17 2020 at 16:25, on Zulip):

the underlying principle is the same but people can see UB in C and put out the baby with the bathwater

Josh Triplett (Apr 17 2020 at 16:25, on Zulip):

Yeah.

Josh Triplett (Apr 17 2020 at 16:26, on Zulip):

There are a lot of people whose fundamental reasoning is "I know what code this should turn into, and I know what that code does on the machine". Which is sometimes reasonable, and sometimes not.

RalfJ (Apr 17 2020 at 16:26, on Zulip):

what I mostly dont like about the term is that it's a negative one

RalfJ (Apr 17 2020 at 16:26, on Zulip):

I'd much rather say "this program is $safe" ($safe := "cannot cause UB"/"has well-defined semantics"/"does not hit an error state in the abstract machine") than "this program is UB-free"

Josh Triplett (Apr 17 2020 at 16:27, on Zulip):

:+1:

Josh Triplett (Apr 17 2020 at 16:27, on Zulip):

I agree with preferring positive statements to negative ones.

RalfJ (Apr 17 2020 at 16:28, on Zulip):

in PL research we sometimes use "safe" for that, might be confusing though with "safe code"... we'd basically be saying "even your unsafe code must be safe but then it is your responsibility; safe code is checked to be safe by the compiler"^^

Josh Triplett (Apr 17 2020 at 16:30, on Zulip):

Yeah. I wonder sometimes if there was a term like "unchecked" we could have used instead of "unsafe".

simulacrum (Apr 17 2020 at 16:30, on Zulip):

I do think it may be confusing, but can also be helpful. I recall finding it illuminating that "all code must be safe", i.e. that unsafe is not an escape hatch into UB is fine.

Josh Triplett (Apr 17 2020 at 16:30, on Zulip):

"Abandon all compiler aid, all ye who press enter here"

RalfJ (Apr 17 2020 at 16:31, on Zulip):

simulacrum said:

I do think it may be confusing, but can also be helpful. I recall finding it illuminating that "all code must be safe", i.e. that unsafe is not an escape hatch into UB is fine.

"statically" and "dynamically" safe woild be possible disambiguations here

simulacrum (Apr 17 2020 at 16:33, on Zulip):

Hm, that's an interesting way of looking at it. I think I may like it :)

Josh Triplett (Apr 17 2020 at 16:35, on Zulip):

Interesting. How does that interact with things like "dynamic" borrow checking, such as Rc?

RalfJ (Apr 17 2020 at 16:38, on Zulip):

hm, if someone said "dynamic borrow checking" I'd think they mean Stacked Borrows as that's how I have sometimes described it^^

RalfJ (Apr 17 2020 at 16:38, on Zulip):

and then it is quite parallel with statically/dynamically "safe" code

RalfJ (Apr 17 2020 at 16:39, on Zulip):

Rc has no equivalent to &mut, so I cannot really see it as a "dynamic borrow checker". if anything, that would be RefCell.

Josh Triplett (Apr 17 2020 at 16:39, on Zulip):

I very much think of stacked borrows as static borrow checking, just smarter static borrow checking.

RalfJ (Apr 17 2020 at 16:39, on Zulip):

but stacked borrows is dynamic

RalfJ (Apr 17 2020 at 16:39, on Zulip):

as in, it runs at "program run-time"

RalfJ (Apr 17 2020 at 16:40, on Zulip):

it's not a static analysis that you can run on code without executing said code

RalfJ (Apr 17 2020 at 16:40, on Zulip):

we have a static analysis that approximates stacked borrows, and it is called borrow checking

Josh Triplett (Apr 17 2020 at 16:44, on Zulip):

I wasn't aware of the dynamic version. I thought that stacked borrows was a model that was being used to analyze and reason about the borrow checker.

RalfJ (Apr 17 2020 at 16:44, on Zulip):

there only is a dynamic version

RalfJ (Apr 17 2020 at 16:45, on Zulip):

it is a model used to analyze the borrow checker, well, one could say that, but analysis happens by "run this code in miri"

Josh Triplett (Apr 17 2020 at 16:45, on Zulip):

/me is confused.

Josh Triplett (Apr 17 2020 at 16:46, on Zulip):

I thought that the model was a formal reasoning model, as in a set of written rules and theorems for human readers, not a computer implementation.

RalfJ (Apr 17 2020 at 16:46, on Zulip):

we can then prove theorems about programs running with stacked borrows, and those theorems are "static" in some sense

Josh Triplett (Apr 17 2020 at 16:46, on Zulip):

It's a code implementation, first and foremost?

RalfJ (Apr 17 2020 at 16:47, on Zulip):

Stacked Borrows is...

these two are basically equivalent, just one is written in Rust and the other in math/Coq

Josh Triplett (Apr 17 2020 at 16:47, on Zulip):

Ah, that makes more sense, thank you. I was only aware of the latter, not the former.

RalfJ (Apr 17 2020 at 16:48, on Zulip):

and both are "dynamic" in nature

RalfJ (Apr 17 2020 at 16:48, on Zulip):

in the sense that thy describe program executions

RalfJ (Apr 17 2020 at 16:48, on Zulip):

you wouldn't call Rust "static" just because there exists a "static" piece of code (Miri) that can run Rust code

Josh Triplett (Apr 17 2020 at 16:49, on Zulip):

No, certainly not. I was thinking of the formal model, and then of the borrow-checker rules in Rust that that model models.

RalfJ (Apr 17 2020 at 16:49, on Zulip):

so likewise, these formal "operational" rules, while being written for human and mathematical consumption, are as "dynamic" as an interpreter

RalfJ (Apr 17 2020 at 16:49, on Zulip):

they have the same information content

Josh Triplett (Apr 17 2020 at 16:49, on Zulip):

Thanks for the explanation and clarification. :)

RalfJ (Apr 17 2020 at 16:49, on Zulip):

sure. :) I realize I am using a lot of jargon and don't always know which part of it is confusing.

RalfJ (Apr 17 2020 at 16:50, on Zulip):

Josh Triplett said:

It's a code implementation, first and foremost?

historical trivia, I did the code implementation first, and then a colleague put everything into math ;)

Josh Triplett (Apr 17 2020 at 16:50, on Zulip):

In this case, it wasn't the jargon, just information I was missing about the nature of stacked borrows.

RalfJ (Apr 17 2020 at 16:51, on Zulip):

but in my head both are just different ways to express the same concepts

RalfJ (Apr 17 2020 at 16:52, on Zulip):

that's what I like so much about Miri -- it is an artifact that lets me very clearly communicate details of semantics with people that have no/not much formal background but understand interpreters, and at the same time it is a very precise mathematical specification of Rust behavior, and one that we can even test

RalfJ (Apr 17 2020 at 16:53, on Zulip):

so with my PhD advisor I'd discuss rust semantics via formal mathematical rules, and here on Zulip I'd discuss the same things by pointing at Rust code implementing steps in an interpreter ;)

scottmcm (Apr 17 2020 at 22:03, on Zulip):

One thing I've found valuable about std keeping a fairly hard line about unsafe meaning that misuse can be unsound is that it reduces the tendency for people to "how bad could it be?"

scottmcm (Apr 17 2020 at 22:05, on Zulip):

More more APIs that are unsafe for being easy to misuse, the more likely people are to say "I'm pretty sure that this won't be zero, so I'll just use NonZeroU32::new_unchecked; it's just a correctness hint".

simulacrum (Apr 17 2020 at 22:08, on Zulip):

I have wanted a separate lint, though likely deny by default, that is basically that middle ground of "be very careful but not UB"

Lokathor (Apr 18 2020 at 06:13, on Zulip):

I've never marked code as unsafe if i wasn't worried about UB happening, but I've also often wanted "hey watch it with this" signs

Lokathor (Apr 18 2020 at 06:15, on Zulip):

An attribute on functions that rustdoc knows about and that signals a warning by default seems an appropriate solution to this sort of thing

Lokathor (Apr 18 2020 at 06:15, on Zulip):

like the reverse of must_use

nikomatsakis (Apr 18 2020 at 11:23, on Zulip):

Catching up on this thread:

Amanieu (Apr 18 2020 at 15:12, on Zulip):

I'm against extending unsafe beyond "you may create UB". That's how we ended up with things like AssertUnwindSafe which I feel is a huge API wart.

simulacrum (Apr 18 2020 at 15:24, on Zulip):

Yeah, I would strongly agree about not extending unsafe. I think there may be room for creating something else (different keyword, etc) that is separate, specifically to provide more room for saying "unsafe is just about UB"

mark-i-m (Apr 18 2020 at 16:39, on Zulip):

ugh me and my fat fingers

mark-i-m (Apr 18 2020 at 16:40, on Zulip):

I like Ralf's description: "safe" = "has well-defined semantics"

mark-i-m (Apr 18 2020 at 16:40, on Zulip):

^^^ that's what I meant to type

mark-i-m (Apr 18 2020 at 16:45, on Zulip):

I've also really wondered about the practical edge cases in kernel development: in particular, it seems like rust's abstract machine kind of assumes a sane traditional process model in which processes have a normal address space that doesn't change observably during the program. This is absolutely not true for an operating system where parts of the address space may appear or disappear or be modified by hardware (e.g. consider memory-mapped io where a ld/st instruction doesn't mean what it normally means) or paging or switch from 32-bit mode to 64-bit mode... it means that crates like x86_64 are very hard to make ergonomic because nearly everything can break some assumption about the underlying state of the machine

Amanieu (Apr 18 2020 at 16:48, on Zulip):

I don't think that Rust's memory model is incompatible with this: such changes are simply concurrent modifications by another "thread" (in this case the MMU). All you need is an appropriate fence (either a compiler_fence or an asm! with the "memory" clobber) to avoid data races.

mark-i-m (Apr 18 2020 at 16:55, on Zulip):

@Amanieu I think I agree that it's not incompatible with rust's memory model, but I guess I'm not fully confident that the solution you mentioned is a general solution -- it seems like there could be corner cases. For example, switching modes means that the definition of a pointer changes (64-bit pointers may no longer be valid) and making paging changes may mean that two pointers start aliasing that weren't before. Somehow you need to inform the compiler about that or else use unsafe everywhere... doing it ergonomically is rather hard.

RalfJ (Apr 19 2020 at 08:03, on Zulip):

Amanieu said:

I'm against extending unsafe beyond "you may create UB". That's how we ended up with things like AssertUnwindSafe which I feel is a huge API wart.

AssertUnwindSafe is not unsafe though, so it is explicitly not an example for extending unsafe beyond UB.

RalfJ (Apr 19 2020 at 08:03, on Zulip):

@mark-i-m those are some valid and seriously hard questions you are raising there

RalfJ (Apr 19 2020 at 08:05, on Zulip):

in terms of academic PL research, most researchers run away screaming when you ask them about page table changes, so there's not a ton of work in that area. ;) even just precisely specifying hardware behavior of these things, or other things like "what when executing an instructions races with other code that modifies the encoded instruction", is a niche. I know of no work that tries to then push such low-level shenanigans up through a high-level abstract machine.

RalfJ (Apr 19 2020 at 08:06, on Zulip):

do people really switch address mode between 32 and 64 bit in Rust? like, during the execution of a single rust program? that's... wow.^^

Josh Triplett (Apr 19 2020 at 14:24, on Zulip):

If you're writing an operating system in Rust, sure. You have to.

RalfJ (Apr 19 2020 at 14:40, on Zulip):

I thought this would happen sometime in some assembly-written preamble early in the bootloader or so

RalfJ (Apr 19 2020 at 14:41, on Zulip):

but my thoughts in this area are clearly very naive

RalfJ (Apr 19 2020 at 14:41, on Zulip):

also I forgot about running 32bit programs on a 64bit host...

Josh Triplett (Apr 19 2020 at 14:46, on Zulip):

That's one case; another is booting a 64-bit OS on 32-bit firmware.

Josh Triplett (Apr 19 2020 at 14:47, on Zulip):

There will definitely be assembly involved, though.

Josh Triplett (Apr 19 2020 at 14:47, on Zulip):

For the actual mode switch.

Josh Triplett (Apr 19 2020 at 14:49, on Zulip):

Entering 64-bit mode requires page tables, and I certainly don't want to construct page tables in assembly. I would rather do that in Rust...

Josh Triplett (Apr 19 2020 at 14:50, on Zulip):

But at that point I think those might just be separate Rust programs. One compiled as 32-bit and one compiled as 64-bit.

mark-i-m (Apr 19 2020 at 18:32, on Zulip):

@RalfJ Hmm... that's good to know.

I am aware of a lot of ongoing work about verifying kernels, file systems, secure enclaves, hypervisors, and hardware (in fact, most recent SOSPs and OSDIs have had at least one track about this). I know they are different problems from specifying an abstract machine (like for a PL), but it seems plausible that work in those areas could be used to inspire an abstract machine specification. Here is one example of such work: https://dl.acm.org/doi/10.1145/3132747.3132782. It comes to mind because it talks a bit about some of the crazy things they have to deal with when modeling hardware, and it also shows how they make their proofs scale.

But at that point I think those might just be separate Rust programs. One compiled as 32-bit and one compiled as 64-bit.

Actually, this is how I have tried to deal with it (e.g. context-switching). I've tried to set things up so that at the point where weird things happen, it should be "equivalent" to a program ending and another starting. Of course, I don't know if I'm doing it right, which is what prompted my question...

nikomatsakis (Apr 20 2020 at 20:01, on Zulip):

Amanieu said:

I'm against extending unsafe beyond "you may create UB". That's how we ended up with things like AssertUnwindSafe which I feel is a huge API wart.

I don't think this is a good example -- there is no unsafe required to use that API. I see @RalfJ made this point already, sorry.

Last update: Jun 05 2020 at 22:55UTC