Stream: t-lang/wg-unsafe-code-guidelines

Topic: pthread_cancel looks unsound


gnzlbg (Oct 15 2019 at 15:22, on Zulip):

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=aa730479e323f7efe10c4d5acd590190

gnzlbg (Oct 15 2019 at 15:23, on Zulip):

notice how the destructor of DropGuard is never called

gnzlbg (Oct 15 2019 at 15:23, on Zulip):

Yet the DropGuard variable is deallocated

gnzlbg (Oct 15 2019 at 15:24, on Zulip):

We do guarantee that destructors of variables are called before they are deallocated (required for Pin being sound)

gnzlbg (Oct 15 2019 at 15:42, on Zulip):

@Amanieu I recall an issue about this somewhere but I can't find it

gnzlbg (Oct 15 2019 at 15:43, on Zulip):

I also recall that actually something like this should cause the other thread to unwind from a cancellation point

rkruppe (Oct 15 2019 at 15:45, on Zulip):

Good thing it's unsafe then ;)

gnzlbg (Oct 15 2019 at 15:47, on Zulip):

https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=596dc7790e74de1da07fbbceb29e7854

gnzlbg (Oct 15 2019 at 15:47, on Zulip):

It sometimes unwinds, and sometimes just gets deallocated without unwinding

gnzlbg (Oct 15 2019 at 15:49, on Zulip):

it depends on whether I run it in debug or release

gnzlbg (Oct 15 2019 at 15:50, on Zulip):

and the timings

gnzlbg (Oct 15 2019 at 15:53, on Zulip):

So I kind of expected that this would unwind

gnzlbg (Oct 15 2019 at 15:53, on Zulip):

That's bad in another whole dimension (all the syscalls or libc APIs are nounwind)

gnzlbg (Oct 15 2019 at 15:53, on Zulip):

But what I did not expect is that this would also terminate the thread without unwinding

gnzlbg (Oct 15 2019 at 15:54, on Zulip):

I'm not sure how that could even happen, a signal maybe ?

gnzlbg (Oct 15 2019 at 15:55, on Zulip):

But I don't think std::threads catch any signals by default

Amanieu (Oct 15 2019 at 15:58, on Zulip):

https://github.com/rust-lang/rust-memory-model/issues/46

Amanieu (Oct 15 2019 at 16:00, on Zulip):

@gnzlbg The difference between debug and release is probably because the nounwind marking on the FFI sleep call is wrong.

gnzlbg (Oct 15 2019 at 16:00, on Zulip):

I don't think we misoptimize this case due to nounwind, but I can check that

gnzlbg (Oct 15 2019 at 16:00, on Zulip):

(the only known misoptimization requires noreturn + nounwind)

Amanieu (Oct 15 2019 at 16:02, on Zulip):

If LLVM can see that there is no way for the unwind path to be reached then it will simply eliminate the unwind path.

Amanieu (Oct 15 2019 at 16:03, on Zulip):

Unwinding is still happening, it's just that the destructor isn't getting called because LLVM "knows" that it is unreachable.

Amanieu (Oct 15 2019 at 16:04, on Zulip):

One possible fix is to open up "man 7 pthreads" and add #[unwind] to all functions listed as cancellation points.

gnzlbg (Oct 15 2019 at 16:04, on Zulip):

@Amanieu but the issue is that in debug builds destructors don't run, but they do run on release

gnzlbg (Oct 15 2019 at 16:04, on Zulip):

that's the opposite of what I would expect if your hypothesis was true (LLVM optimizing debug builds, but not release builds)

Amanieu (Oct 15 2019 at 16:06, on Zulip):

In your second example you have a race. Add handle.join() at the end.

Amanieu (Oct 15 2019 at 16:06, on Zulip):

The threads are overlapping

gnzlbg (Oct 15 2019 at 16:09, on Zulip):

Ah, I see what you mean now

gnzlbg (Oct 15 2019 at 16:11, on Zulip):

Weird

gnzlbg (Oct 15 2019 at 16:15, on Zulip):

@Amanieu so the race is part of the problem

gnzlbg (Oct 15 2019 at 16:16, on Zulip):

without .join(), the thread gets dettached

gnzlbg (Oct 15 2019 at 16:16, on Zulip):

but pthread_cancel was already called

gnzlbg (Oct 15 2019 at 16:16, on Zulip):

so even if unwinding starts happening, it can happen that main exits before that

Amanieu (Oct 15 2019 at 16:16, on Zulip):

If you look up, it's aborting with an error

Amanieu (Oct 15 2019 at 16:16, on Zulip):

FATAL: exception not rethrown

gnzlbg (Oct 15 2019 at 16:17, on Zulip):

In the other case I mean

gnzlbg (Oct 15 2019 at 16:17, on Zulip):

https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=fafb18f0e64c08faae4d602cbd120de1

gnzlbg (Oct 15 2019 at 16:17, on Zulip):

In that case, without .join(), the second thread has already been cancelled before main tries to exit, but it maybe gets dettached before the unwind is raised there

Amanieu (Oct 15 2019 at 16:17, on Zulip):

Oh that's just main exiting before the thread gets to handle the exception

gnzlbg (Oct 15 2019 at 16:17, on Zulip):

yes

Amanieu (Oct 15 2019 at 16:18, on Zulip):

So this is all a false alert, unwind is working fine.

gnzlbg (Oct 15 2019 at 16:18, on Zulip):

yes

gnzlbg (Oct 15 2019 at 16:18, on Zulip):

well, "fine"

gnzlbg (Oct 15 2019 at 16:18, on Zulip):

it is still unsound, but it is working reliably

Amanieu (Oct 15 2019 at 16:18, on Zulip):

Well apart from FATAL: exception not rethrown, but that's because it's crashing in the catch_unwind at the root of the thread

Amanieu (Oct 15 2019 at 16:19, on Zulip):

The unwind exception thrown by pthread_cancel is not meant to be caught

gnzlbg (Oct 15 2019 at 16:19, on Zulip):

Yes, the problem is that catch_unwind probably doesn't know how to handle foreign exceptions

gnzlbg (Oct 15 2019 at 16:19, on Zulip):

or the force exception

Amanieu (Oct 15 2019 at 16:19, on Zulip):

Maybe we should fix catch_unwind to ignore any non-Rust exceptions

gnzlbg (Oct 15 2019 at 16:19, on Zulip):

that's what C++ does

gnzlbg (Oct 15 2019 at 16:20, on Zulip):

well, C++ allows you to catch them in catch(...)

gnzlbg (Oct 15 2019 at 16:20, on Zulip):

but not in any other context

gnzlbg (Oct 15 2019 at 16:20, on Zulip):

so if the error is "exception not rethrown", this must be a "force unwind"

gnzlbg (Oct 15 2019 at 16:20, on Zulip):

and the Itanium ABI says that those must always be rethrown

gnzlbg (Oct 15 2019 at 16:22, on Zulip):

_Unwind_ForcedUnwind

gnzlbg (Oct 15 2019 at 16:23, on Zulip):

https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html#base-framework

"forced" unwinding (such as caused by longjmp or thread termination).

During "forced unwinding", on the other hand, an external agent is driving the unwinding. For instance, this can be the longjmp routine. This external agent, not each personality routine, knows when to stop unwinding. The fact that a personality routine is not given a choice about whether unwinding will proceed is indicated by the _UA_FORCE_UNWIND flag.

Amanieu (Oct 15 2019 at 16:28, on Zulip):

IMO catch_unwind should only catch native rust exceptions. If you want to catch foreign exceptions, write a wrapper in C or C++.

gnzlbg (Oct 15 2019 at 16:29, on Zulip):

but this is not a foreign exception

gnzlbg (Oct 15 2019 at 16:29, on Zulip):

this is a ForceUnwind

gnzlbg (Oct 15 2019 at 16:31, on Zulip):

The ABI allows catching foreign exceptions and interrupting unwinding. What it doesn't allow is catching ForceUnwinds and interrupting unwinding (you can pause it, do something, but must rethrow the ForceUnwind afterwards).

gnzlbg (Oct 15 2019 at 16:31, on Zulip):

That doesn't mean that catch_unwind should catch foreign exceptions

gnzlbg (Oct 15 2019 at 16:32, on Zulip):

But even if it doesn't, we probably want to do so on the thread boundary anyways, and abort

gnzlbg (Oct 15 2019 at 16:33, on Zulip):

or something like that

gnzlbg (Oct 15 2019 at 16:37, on Zulip):

Otherwise the unwind will just terminate the process anyways IIUC

gnzlbg (Oct 15 2019 at 16:38, on Zulip):

Or how is unwinding propagated across a thread boundary ?

gnzlbg (Oct 15 2019 at 16:40, on Zulip):

Ah! This particular unwinding is allowed to reach the thread boundary, and that will cancel the thread

comex (Oct 16 2019 at 01:48, on Zulip):

what happens with -C panic=abort?

RalfJ (Oct 16 2019 at 17:37, on Zulip):

https://github.com/rust-lang/rust-memory-model/issues/46

moved that to the UCG repo: https://github.com/rust-lang/unsafe-code-guidelines/issues/211

RalfJ (Oct 16 2019 at 17:38, on Zulip):

(we don't already have an issue covering that, do we?)

RalfJ (Oct 16 2019 at 17:38, on Zulip):

but we dont have a fitting A-* I think... hm

RalfJ (Oct 16 2019 at 17:38, on Zulip):

A-unwind? A-drop?

gnzlbg (Oct 17 2019 at 08:47, on Zulip):

@comex AFAICT, with -C panic=abort, the stack frame unwinds

gnzlbg (Oct 17 2019 at 08:47, on Zulip):

Otherwise you can't cancel threads in C without -fexceptions

gnzlbg (Oct 17 2019 at 08:48, on Zulip):

Destructors aren't necessarily run when that happens though

Last update: Nov 19 2019 at 17:35UTC