Stream: wg-secure-code

Topic: panics


Shnatsel (Jul 20 2019 at 18:04, on Zulip):

So I've been looking at advisories like https://rustsec.org/advisories/RUSTSEC-2019-0010.html or https://rustsec.org/advisories/RUSTSEC-2018-0003.html plus yet-unreported bugs such as https://github.com/Frommi/miniz_oxide/issues/14 and it stood out to me that these issues only occur on panic.
So I've been wondering, is there a reason something that doesn't actually catch panics would still want to have them enabled? At this point it looks like panic=abort is actually more secure and should be the default in the vast majority of cases.

briansmith (Jul 21 2019 at 01:59, on Zulip):

I think most security people would agree panic=abort is safer. Performance optimization people will agree it is more efficient (especially if using xargo to elminate all the unwinding logic in libstd). But people who ask me to integrate Rust into non-Rust applications have never wanted to abort on panic. Panics are just too likely in Rust. A language subset without slice operator[], and libraries built to that subset, would be needed to make those people comfortable with panic=abort.

Zach Reizner (Jul 21 2019 at 04:31, on Zulip):

There is always the no_panic crate used to check that a function can't panic.

Zach Reizner (Jul 21 2019 at 04:31, on Zulip):

But I think it doesn't work if panic=abort.

Zach Reizner (Jul 21 2019 at 04:32, on Zulip):

It would be nice if there was better first class support for checking at compile time that a block can't panic.

Tony Arcieri (Jul 21 2019 at 04:39, on Zulip):

there was... something presented at one of the OxidizeConf impl days... I forget its name :cry:

Tony Arcieri (Jul 21 2019 at 05:15, on Zulip):

I don't think it was no_panic, it was some academic project and looked like it did a pretty rigorous analysis

Tony Arcieri (Jul 21 2019 at 05:16, on Zulip):

it was primarily targeting high assurance embedded use cases

Tony Arcieri (Jul 21 2019 at 05:16, on Zulip):

it continues to amaze me the kind of static analysis you can do on single-core embedded Rust programs which isn't possible on the language as a whole

Tony Arcieri (Jul 21 2019 at 05:17, on Zulip):

see also: cortex-m-rtfm asserting your program is deadlock-free

Tony Arcieri (Jul 21 2019 at 05:20, on Zulip):

@Shnatsel the main reason you want panic = 'unwind' in release targets is so when a program crashes you get a reasonable error message. we go one step further than that and collect backtraces and other information on panic, log them to a crash reporter, then exit

Tony Arcieri (Jul 21 2019 at 05:21, on Zulip):

if you just abort, all of that is lost

rkruppe (Jul 21 2019 at 08:55, on Zulip):

I don't think that's true? panic=abort still prints panic message and backtrace (if enabled), and you can set up a hook that runs on panics even if they abort: https://doc.rust-lang.org/std/panic/fn.set_hook.html

RalfJ (Jul 21 2019 at 09:05, on Zulip):

yeah, the key part about panic=abort is that you dont unwind. whether you abort immediately or first do some diagnostic printing does not matter.

Shnatsel (Jul 21 2019 at 09:51, on Zulip):

But people who ask me to integrate Rust into non-Rust applications have never wanted to abort on panic.

Actually, no, unwinding across the FFI boundary is undefined behaviour, and you also cannot rely on any shared data structures being in a consistent state after Rust panicked, so aborting immediately is actually the only reasonable thing Rust can do when plugged into other languages

simulacrum (Jul 21 2019 at 14:29, on Zulip):

@Shnatsel You can definitely do things after Rust panics, I don't see why that would be a problem. Even across FFI -- you just need to pass the panic through FFI as a pointer to the "panic object" via catch_unwind

simulacrum (Jul 21 2019 at 14:30, on Zulip):

Now, that's not always possible, and you definitely need to be careful about data structures and such, but it is by no means unsound etc

Shnatsel (Jul 21 2019 at 14:40, on Zulip):

It's not automatically unsound as long nobody does unsafe, but it's usually an unexpected condition with no reasonable response but to terminate the program. Internal data structures may be left in a state that makes no sense from the application logic perspective even if it's still technically memory safe.

Alex Gaynor (Jul 21 2019 at 14:41, on Zulip):

In applications like web servers, you're basically always going to want the panic to only kill one request's worth of execution :shrug:

Shnatsel (Jul 21 2019 at 14:41, on Zulip):

I have no clue how panics interact with async/await btw

Shnatsel (Jul 21 2019 at 14:42, on Zulip):

And also, killing one request's worth of execution requires extra effort to structure the code in a way that makes it so. It's great that there is such an opportunity, but that doesn't mean it should be the default

simulacrum (Jul 21 2019 at 14:43, on Zulip):

Not really in any interesting way; most runtimes will kill only that one thread and restart it quickly and you'll get an Err in the spawn() call

simulacrum (Jul 21 2019 at 14:43, on Zulip):

I don't think I agree that "not automatically unsound as long nobody does unsafe" -- even in unsafe, you should be unwind-safe

simulacrum (Jul 21 2019 at 14:44, on Zulip):

i.e., it must be sound to unwind even through unsafe code

Shnatsel (Jul 21 2019 at 14:45, on Zulip):

It's the programmer's responsibility to make it so, and we're seeing people mess that up repeatedly - see bugs linked at the beginning of this thread. So my point is that if you're not deliberately catching panics, you should be using panic=abort because that protects you from such unsoundness

simulacrum (Jul 21 2019 at 14:46, on Zulip):

Hm, well, in some sense, yes. I agree that if you have no plan/need to catch panics as a binary crate, panic=abort makes a lot of sense. But, libraries should be written with the assumption of panic=unwind

simulacrum (Jul 21 2019 at 14:46, on Zulip):

e.g. hyper uses catch_unwind I believe per-request to avoid bringing down the whole server

Shnatsel (Jul 21 2019 at 14:54, on Zulip):

Or rather, libraries should be written with the assumption of either being used as a possibility. I've seen a crate that used catch_unwind to mask panics on malformed inputs that were not handled correctly, and that would be very easy to crash with panic=abort

simulacrum (Jul 21 2019 at 14:54, on Zulip):

Oh, sure, yeah, I guess that's a better phrasing :)

Shnatsel (Jul 21 2019 at 14:56, on Zulip):

Okay, another TODO entry: write an article calling to use panic=abort in binary crates that don't catch panics themselves

simulacrum (Jul 21 2019 at 14:59, on Zulip):

(we should make sure that panic=abort is "good enough" though -- backtraces, etc.) I think it might need some configuration today, but maybe that can be alleviated -- a crate or something that does it for you

Shnatsel (Jul 21 2019 at 15:00, on Zulip):

Can you even implement that as a crate? And "add extra dependency for more security" does not sound convincing to me

Shnatsel (Jul 21 2019 at 15:01, on Zulip):

https://doc.rust-lang.org/std/panic/struct.AssertUnwindSafe.html - why does this not require an unsafe block to use?

simulacrum (Jul 21 2019 at 15:01, on Zulip):

Because UnwindSafe is a roadblock, it's not anything more than that

simulacrum (Jul 21 2019 at 15:01, on Zulip):

code must be unwind safe, but e.g. Mutex use after unwinding is _probably_ not actually what you wanted

simulacrum (Jul 21 2019 at 15:02, on Zulip):

I think to an extent there's not a good understanding of whether/how to deal with UnwindSafe + AssertUnwindSafe

Shnatsel (Jul 21 2019 at 15:02, on Zulip):

bummer

simulacrum (Jul 21 2019 at 15:03, on Zulip):

I got the impression last time I looked at this that basically ~all uses of catch_unwind probably get wrapped in AssertUnwindSafe because it's just too painful to do otherwise

simulacrum (Jul 21 2019 at 15:04, on Zulip):

because most of the time you're not actually _doing_ anything after that unwind happens, e.g., you're killing the thread of execution or something along those lines

Tony Arcieri (Jul 22 2019 at 13:57, on Zulip):

somehow I've managed to do things in ways where I don't need AssertUnwindSafe. I think it might've been a combination of making all state either Send + Sync + 'static and/or owned by the thread on its stack?

simulacrum (Jul 22 2019 at 14:19, on Zulip):

perhaps, yeah

simulacrum (Jul 22 2019 at 14:20, on Zulip):

I don't quite recall but I think my use cases related to non-thread type stuff -- but I don't recall what exactly

Tony Arcieri (Jul 22 2019 at 14:42, on Zulip):

I remember rustc asking me to AssertUnwindSafe at one point, but after refactoring some I no-longer needed it. I think it's all about keeping the other side of the unwind boundary (i.e. the "catcher") stateless

Tony Arcieri (Jul 22 2019 at 14:43, on Zulip):

I have toplevel panic handlers on all worker threads

Tony Arcieri (Jul 22 2019 at 14:43, on Zulip):

I also use catch_unwind as a failsafe in places where there's nontrivial things happening in a Drop handler

Tony Arcieri (Jul 22 2019 at 14:44, on Zulip):

mostly to get a good error message. I exit the program in the event they ever happen

Tony Arcieri (Jul 22 2019 at 14:44, on Zulip):

otherwise panicking in a Drop handler is not a good time

Tony Arcieri (Jul 22 2019 at 14:45, on Zulip):

iirc it used to print "You’ve met with a terrible fate, haven’t you?"

simulacrum (Jul 22 2019 at 15:10, on Zulip):

sure, yeah -- however, sometimes you can't keep it stateless

simulacrum (Jul 22 2019 at 15:11, on Zulip):

e.g. you're passing panics over FFI boundary and then re-panicking

Tony Arcieri (Jul 22 2019 at 15:22, on Zulip):

haha yeah, fortunately I'm not doing anything like that

Tony Arcieri (Jul 22 2019 at 15:22, on Zulip):

just toplevel exception handlers

briansmith (Jul 22 2019 at 19:13, on Zulip):

Actually, no, unwinding across the FFI boundary is undefined behaviour, and you also cannot rely on any shared data structures being in a consistent state after Rust panicked, so aborting immediately is actually the only reasonable thing Rust can do when plugged into other languages

We use catch_unwind at every FFI boundary and return an error indicator (usually our ol' friend NULL).

Shnatsel (Jul 22 2019 at 19:13, on Zulip):

and you only ever feed it transient state?

briansmith (Jul 22 2019 at 19:16, on Zulip):

Yes, in fact we serialize everything into a string for input and output.

briansmith (Jul 22 2019 at 19:18, on Zulip):

Regardless, I agree that many crates aren't panic safe and that's a bad thing. I'm just saying that panic=abort doesn't sound palatable (sometimes, not even feasible) to many people.

Thom Chiovoloni (Jul 22 2019 at 19:22, on Zulip):

Yeah, we do the same: catch_unwind at FFI boundary, turn into an error on the other end. This still ends up poisioning a mutex, which we keep mainly because of a lack of confidence that our dependencies are still memory safe in the case of a panic.

Shnatsel (Jul 22 2019 at 19:40, on Zulip):

I'm just saying "I know what I'm doing" should be opt-in and the safe option the default. But at this point it's too late to change the language (unless it's an edition change?) so we'll have to promote it for binaries I guess

briansmith (Jul 22 2019 at 19:52, on Zulip):

Again, I'd most like to see a way to opt into a panic-free subset of the language, like [no_std]. I would definitely use it myself.

Thom Chiovoloni (Jul 22 2019 at 19:54, on Zulip):

I guess in the panic free subset you'd just have slice.get()/get_mut() and not index operators?

briansmith (Jul 22 2019 at 19:55, on Zulip):

Right.

Thom Chiovoloni (Jul 22 2019 at 19:58, on Zulip):

If it could be used in small sections that would make sense. For an entire program it sounds rather difficult to work with, TBH.

Thom Chiovoloni (Jul 22 2019 at 20:02, on Zulip):

Anyway, re: suggesting the use for binaries, in practice panic != abort matters much more in cases where exiting isn't what was going to happen anyway. If you don't catch panics, unwrap() on thread joining, etc. then the amount of safety that this would gain seems rather small

briansmith (Jul 22 2019 at 20:09, on Zulip):

I think that would encourage people to write code that assumes panics abort. Then they'll run into trouble when they try to factor that code out from their binaries into libraries.

briansmith (Jul 22 2019 at 20:11, on Zulip):

I think offering more building blocks for secure, no-unsafe programming is a good path forward. In the start of this thread, panics were identified as a source of vulnerabilities, but ultimately those vulnerabilities were also due to the use of unsafe. Arguably the use of unsafe is the primary cause.

Shnatsel (Jul 22 2019 at 20:15, on Zulip):

https://github.com/rust-secure-code/safety-dance :wink:

Thom Chiovoloni (Jul 22 2019 at 20:26, on Zulip):

ultimately those vulnerabilities were also due to the use of unsafe. Arguably the use of unsafe is the primary cause.

Yeah, this is more-or-less how I feel too. Removing the need for unsafe seems like the fix for this, more than trying to urge people to abort on panic.

Tony Arcieri (Jul 22 2019 at 23:44, on Zulip):

today for me: people talking about making Erlangy supervisors for Rust threads to restart them when they crash, and also people who think the best option for panicking is to abort :wink:

Tony Arcieri (Jul 22 2019 at 23:45, on Zulip):

to which I say correspondingly: mutable state and PoisonError, and contrarily crash reporters, and also the aforementioned people who want to try to make programs that "let it crash"

briansmith (Jul 24 2019 at 20:57, on Zulip):

I was thinking about this today, and I remembered previous discussions with the libs team, there's a lot of momentum towards using panics to indicate "you are using this API wrongly" and these APIs are ones that often need to be used to avoid unsafe. There was a discussion about this for copy_within, for example, where there is no version that returns an error on out-of-bounds. In other words, the standard library is sometimes imposing a dichotomy of "avoid unsafe" vs "avoid panics." Later we sometimes end up adding non-panicking versions of the APIs, resulting in API bloat. Perhaps we need a new idiom for communicating API misuse that avoids panics but is more ergonomic than using Result currently is.

briansmith (Jul 24 2019 at 20:58, on Zulip):

In ring we take the opposite approach and try to avoid panics, returning Result an annoying amount. The CSPRNG API is an example where people hate this. (In contrast, BoringSSL uses abort() deep in its internals on CSPRNG failures.)

Shnatsel (Aug 11 2019 at 10:23, on Zulip):

Could you provide some examples of non-panicking APIs being added later, other than indexing ([x] vs .get(x))?

RalfJ (Aug 11 2019 at 12:39, on Zulip):

was .get added later? I thought it always existed?
Conversely, there are all the checked_* arithmetic operations, but I am not sure if they were added "later".

simulacrum (Aug 11 2019 at 14:35, on Zulip):

try_reserve and friends, kinda, though the previous APIs aborted I believe

Last update: Nov 11 2019 at 22:50UTC