Stream: project-error-handling

Topic: Error-Handling in FFI context


view this post on Zulip Lucius Hu (Sep 24 2020 at 20:12):

I hope this is not a duplicate. I found an annoying pain point: error handling in FFI.


First, for recoverable error itself, its handling in FFI context is not much different from the general case.
But I'd like to get more guidance on how to convert a Rust recoverable error to its C counterpart.

Specifically, C doesn't have a native error handling paradigm, but there are a few common practices. For example, using an error code to represent an error type, or using a string buffer provided by the caller to write the error message into.

So for example, it would be helpful to know how to cut the boilerplate code in the conversion.
It would also help if Rust adopters settled on some common practices, instead of using their own makeshift representations of C-compatible error types.
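
As a rough illustration of the two C conventions mentioned above (an integer error code plus a caller-provided message buffer), a Rust-side conversion might look like the sketch below. `ParseError`, `parse`, `write_message`, and `parse_number` are hypothetical names invented for this example, not an established API, and a real export would also need a `no_mangle` attribute on the `extern "C"` function.

```rust
use std::ffi::CStr;
use std::fmt;
use std::os::raw::{c_char, c_int};

/// Hypothetical Rust-side error type, for illustration only.
#[derive(Debug)]
pub enum ParseError {
    Empty,
    NotANumber,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::Empty => write!(f, "input was empty"),
            ParseError::NotANumber => write!(f, "input was not a number"),
        }
    }
}

impl ParseError {
    /// Integer code for C callers; 0 is reserved for success, -1 for bad arguments.
    fn code(&self) -> c_int {
        match self {
            ParseError::Empty => 1,
            ParseError::NotANumber => 2,
        }
    }
}

/// Ordinary Rust function returning a recoverable error.
fn parse(input: &str) -> Result<i32, ParseError> {
    if input.is_empty() {
        return Err(ParseError::Empty);
    }
    input.parse().map_err(|_| ParseError::NotANumber)
}

/// Copies `msg` into the caller's buffer, truncating if needed and always
/// NUL-terminating. Truncation is byte-wise, which is fine for ASCII messages.
unsafe fn write_message(msg: &str, buf: *mut c_char, len: usize) {
    if buf.is_null() || len == 0 {
        return;
    }
    let n = msg.len().min(len - 1);
    unsafe {
        std::ptr::copy_nonoverlapping(msg.as_ptr(), buf as *mut u8, n);
        *buf.add(n) = 0;
    }
}

/// C-facing wrapper: returns 0 on success, a nonzero code on failure,
/// and writes a human-readable message into `err_buf` on failure.
pub unsafe extern "C" fn parse_number(
    input: *const c_char,
    out: *mut i32,
    err_buf: *mut c_char,
    err_len: usize,
) -> c_int {
    if input.is_null() || out.is_null() {
        return -1;
    }
    let text = match unsafe { CStr::from_ptr(input) }.to_str() {
        Ok(t) => t,
        Err(_) => return -1,
    };
    match parse(text) {
        Ok(value) => {
            unsafe { *out = value };
            0
        }
        Err(e) => {
            unsafe { write_message(&e.to_string(), err_buf, err_len) };
            e.code()
        }
    }
}
```

The `code` and `Display` implementations are where the per-error boilerplate lives; the wrapper itself stays the same shape for every exported function.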


For unrecoverable error, correct me if I'm wrong, it's at least a synonym for, if not identical to, panic in most contexts.
But panic in FFI context is really annoying since it's intertwined with memory management and safety.

view this post on Zulip Jane Lusby (Sep 24 2020 at 21:21):

@Lucius Hu I think improving interoperability with C error handling is quite relevant to the project group and probably something we should look into.

view this post on Zulip Jane Lusby (Sep 24 2020 at 21:22):

as for the panic handling, I think this is potentially something we could look into but might be something that is better handled by project-ffi-unwind

view this post on Zulip oliver (Sep 24 2020 at 21:22):

Generally though Rust can't enforce safety inside another language

view this post on Zulip oliver (Sep 24 2020 at 21:23):

Wouldn't panic handling effectively attempt to do that?

view this post on Zulip Lucius Hu (Sep 24 2020 at 21:23):

true. But we should promote good practice to make things right on Rust side

view this post on Zulip oliver (Sep 24 2020 at 21:23):

So wrapping unsafe in a panic handler

view this post on Zulip oliver (Sep 24 2020 at 21:26):

C does also have a native 'faux' error-handling method, which is the compiler, just as Rust has its compiler, which can identify unsoundness

view this post on Zulip Lucius Hu (Sep 24 2020 at 21:28):

So we cannot prevent errors on the other side of the FFI boundary, but we need to make sure errors that occur in Rust are managed properly.

view this post on Zulip Lucius Hu (Sep 24 2020 at 21:30):

in short we cannot enforce any safety rules on other languages, but we need to minimize the chance that memory safety issues arise, by doing things right on the Rust side.

view this post on Zulip Jane Lusby (Sep 24 2020 at 21:31):

i believe this is what project-ffi-unwind is trying to do maybe

view this post on Zulip Jane Lusby (Sep 24 2020 at 21:31):

not sure

view this post on Zulip Jane Lusby (Sep 24 2020 at 21:31):

should check the charter

view this post on Zulip Jane Lusby (Sep 24 2020 at 21:31):

https://github.com/rust-lang/project-ffi-unwind/blob/master/charter.md

view this post on Zulip Jane Lusby (Sep 24 2020 at 21:31):

cc @BatmanAoD (Kyle Strand)

view this post on Zulip Lucius Hu (Sep 24 2020 at 21:32):

thx

view this post on Zulip Jane Lusby (Sep 24 2020 at 21:32):

and https://github.com/rust-lang/project-ffi-unwind/blob/master/rfcs/0000-c-unwind-abi.md

view this post on Zulip Lucius Hu (Sep 24 2020 at 21:32):

but there's some overlapping parts for error-handling too

view this post on Zulip Jane Lusby (Sep 24 2020 at 21:33):

for sure, I just want to be careful about duplicating effort

view this post on Zulip oliver (Sep 24 2020 at 21:33):

project-ffi-unwind is a lang team project idk what that implies

view this post on Zulip oliver (Sep 24 2020 at 22:03):

I guess I don't know all of the context behind unsafe but it seems to me that one isn't meant to be using it to introduce unsoundness

view this post on Zulip DPC (Sep 25 2020 at 00:40):

project-ffi-unwind is a lang team project idk what that implies

It requires a lot of work on the parts of the codebase that's part of the lang team (while this project is under the libs team)

view this post on Zulip oliver (Sep 25 2020 at 00:43):

it was a rhetorical question but thanks for your input

view this post on Zulip oliver (Sep 25 2020 at 00:43):

for your reference questions in the english language typically end in a ? mark

view this post on Zulip oliver (Sep 25 2020 at 00:45):

rhetorical meaning not requiring a response not intended to further the discussion

view this post on Zulip DPC (Sep 25 2020 at 00:53):

i know what's a rhetorical question, but in chat applications people often tend to omit the ? even for normal questions

view this post on Zulip Ashley Mannix (Sep 25 2020 at 05:22):

for your reference questions in the english language typically end in a ? mark

@Oliver I'm sure it wasn't meant this way but this can come across as a bit condescending when we only have text to work with. Let's all try to keep on topic, with language discussion focused on the ones we program in :smile:

view this post on Zulip BatmanAoD (Kyle Strand) (Sep 26 2020 at 20:33):

Hi, I'll try to explain my views on the scope of the FFI-unwind project as it pertains to error-handling conventions. @Jane Lusby, thanks for tagging me.

First, though, @Lucius Hu, I'd like to understand what you mean by recoverable error and unrecoverable error. I think of those as purely "conceptual", i.e., they're not referring to specific language features but to an idea of how certain errors can or cannot be handled. I think this is how you mean it as well, but this part of your first post confuses me:

Lucius Hu said:

For unrecoverable error, correct me if I'm wrong, it's at least a synonym for, if not identical to, panic in most contexts.

I would agree that panic is the best mechanism for handling errors that the programmer thinks are not recoverable, and I suspect that this is a fairly widely-shared consensus in the community. Is that what you're suggesting here?

I think I can also clarify the connection between panic and aborting:

Lucius Hu said:

- Second, the documentation mentioned that it cannot catch panics which abort the process. But it'd be really helpful, especially for beginners, if the documentation elaborated on how to verify this. Otherwise it's hard for people to confidently use catch_unwind.
- Another common practice is to utilize edition 2018's feature aborting on panic.

These are actually referring to the same feature, which is not specific to the 2018 edition (despite appearing in the edition guide, which is a recognized point of confusion). The feature turns _all_ uses of panic in a compiled Rust binary (executable) into "abort" statements, causing the program to terminate immediately. A single executable should never have a mix of "abort" panics and "normal" (unwinding) panics.

catch_unwind can't catch aborts for the simple reason that when a process aborts, it does not check first whether a catch_unwind exists. So when compiled with panic=abort, catch_unwind essentially has no effect. So it is safe to use catch_unwind in libraries, regardless of whether they are used in programs using panic=abort or not.
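
A minimal sketch of that library-side pattern, assuming the default unwinding panic strategy: the `extern "C"` function catches any panic and reports it as an error code instead of letting it cross the boundary. The names `divide` and `ffi_div` and the specific codes are made up for this example. When built with panic=abort, the process would simply terminate at the panic and the `Err` arm would never run.

```rust
use std::os::raw::c_int;
use std::panic;

/// Plain Rust code that can panic (division by zero, or i32::MIN / -1).
fn divide(a: i32, b: i32) -> i32 {
    a / b
}

/// C-facing wrapper. Returns 0 on success, -1 for a null output pointer,
/// and -2 if the Rust code panicked.
pub extern "C" fn ffi_div(a: i32, b: i32, out: *mut i32) -> c_int {
    if out.is_null() {
        return -1;
    }
    // With unwinding panics, the panic is caught here and converted.
    match panic::catch_unwind(|| divide(a, b)) {
        Ok(value) => {
            unsafe { *out = value };
            0
        }
        Err(_) => -2,
    }
}
```

The default panic hook still prints the panic message to stderr before the unwind is caught, so the C caller sees both the diagnostic and the error code.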

Regarding clean-up on panic=abort, in most contexts, there is actually no problem with not freeing memory. If your target platform has an operating system, it will automatically free all memory associated with a process once that process terminates. This is part of why it's actually considered safe in Rust to leak memory! There are some resources that _won't_ be cleaned up on some platforms; for instance, I believe that on most file systems, file descriptors can leak, which could be a problem. But I believe it is generally considered safe to abort a process on a "full" (i.e. not real-time) OS, because the OS will usually keep anything too bad from happening.

view this post on Zulip Lucius Hu (Sep 26 2020 at 20:37):

@BatmanAoD (Kyle Strand)
I used the terminology defined in the workgroup charter

Recoverable error: An error that can be reacted and recovered from when encountered e.g. a missing file.
Unrecoverable error: An error that cannot reasonably be reacted to or recovered from and which indicates a bug e.g. indexing out of bounds.

view this post on Zulip BatmanAoD (Kyle Strand) (Sep 26 2020 at 20:38):

Oh wow! I didn't see that the charter was so fleshed out already. Sorry for missing that context; thanks for pointing me to it!

view this post on Zulip Lucius Hu (Sep 26 2020 at 20:39):

Thanks for the detailed explanations. It's true that on a modern OS memory not freed would eventually be recycled by OS. But people may use Rust-FFI for jobs where SECURITY is the top priority

view this post on Zulip BatmanAoD (Kyle Strand) (Sep 26 2020 at 20:39):

I would say that the first two bullet points in the "Come to a consensus on current best practices" confirm what you wrote, then, that panic is synonymous with "unrecoverable error", or at least, that by convention they should be treated synonymously.

view this post on Zulip BatmanAoD (Kyle Strand) (Sep 26 2020 at 20:40):

Hm. I wouldn't think of memory-cleanup as a security protection. Allocators do not typically _overwrite_ old memory.

view this post on Zulip BatmanAoD (Kyle Strand) (Sep 26 2020 at 20:40):

(when freeing it)

view this post on Zulip Lucius Hu (Sep 26 2020 at 20:41):

I'm not talking about specific platform or OS, but just generic FFI. For simplicity we can assume it's something like x86_64 *unix. For more specific cases, they may better fit Embedded WG or others.

view this post on Zulip Lucius Hu (Sep 26 2020 at 20:43):

So the concern of memory security really troubles people. So if they don't matter due to some of Rust's design or due to guarantees offered by modern OS, I think we should explicitly document it.

view this post on Zulip Lucius Hu (Sep 26 2020 at 20:43):

Just for references.

view this post on Zulip BatmanAoD (Kyle Strand) (Sep 26 2020 at 20:45):

Either way, I don't see a close connection between security-related cleanup and either FFI or error handling in general. I think you're potentially right that for security-critical data, abort might be a dangerous choice, but I don't know enough about the field, and I think that's probably somewhat outside the scope of this group's concerns.

view this post on Zulip Lucius Hu (Sep 26 2020 at 20:47):

I found the security guide from the French spy agency really helpful https://anssi-fr.github.io/rust-guide/07_ffi.html#memory-and-resource-management

view this post on Zulip BatmanAoD (Kyle Strand) (Sep 26 2020 at 20:48):

Regarding FFI-unwind: our charter is fairly minimal. We are specifically concerned with control-flow constructs that cross between language boundaries, and making sure that the Rust compiler is able to emit code that interacts with these constructs without causing undefined behavior. The connection to error-handling, of course, is that such constructs are usually used for error-handling.

Specifically, our first RFC was to establish a way for C++ exceptions to cross "through" Rust and back into C++, and for Rust panics to cross "through" C++ and back into Rust. In each case:

Our next major effort, I think, will be to specify when it is safe to drop Rust frames without unwinding. This is what happens on most platforms with longjmp (which is true in C++ as well). In general, relying on longjmp should not be considered a "best practice" in Rust, but it's something that we sometimes need to deal with when interfacing with C libraries. Note that there is no way whatsoever for any language to perform cleanup on arbitrary target platforms when C longjmps through it, so we definitely won't be introducing mechanisms for using Drop types in the presence of longjmp.

view this post on Zulip Lucius Hu (Sep 26 2020 at 20:49):

I agree that for some kinds of work a memory leak may not be a very severe issue. But in some cases, there are strict guidelines developers must comply with if they ever want integration of Rust-FFI.

view this post on Zulip BatmanAoD (Kyle Strand) (Sep 26 2020 at 20:49):

I think we would never recommend exposing panic across a language boundary as a go-to error-communication mechanism, though.

view this post on Zulip Lucius Hu (Sep 26 2020 at 20:50):

For example, if I want to make an R extension and publish it in CRAN, I think I need to ensure no memory leak. Because CRAN is running multiple tools to check memory leak all the time.

view this post on Zulip BatmanAoD (Kyle Strand) (Sep 26 2020 at 20:51):

Is that still related to error-handling? I'm a little confused.

view this post on Zulip Lucius Hu (Sep 26 2020 at 20:52):

It is. Because if we just let the Rust program panic, without proper handling, then memory could be leaked.

view this post on Zulip BatmanAoD (Kyle Strand) (Sep 26 2020 at 20:53):

That's more about RAII than error-handling.

view this post on Zulip Jane Lusby (Sep 26 2020 at 20:53):

_wondering how old this document from the spy agency you linked is_

view this post on Zulip Lucius Hu (Sep 26 2020 at 20:53):

It's published this year.

view this post on Zulip Jane Lusby (Sep 26 2020 at 20:53):

interesting

view this post on Zulip Lucius Hu (Sep 26 2020 at 20:53):

It's immensely strict

view this post on Zulip Jane Lusby (Sep 26 2020 at 20:53):

fn foo_create() -> *mut RawFoo;

view this post on Zulip Jane Lusby (Sep 26 2020 at 20:54):

im surprised this isn't something like fn foo_create() -> Option<NonNull<RawFoo>>; or w/e the syntax is
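
For reference, a small sketch of what that alternative signature could look like. `Option<NonNull<T>>` is documented to have the same size and function-call ABI as a raw pointer, with `None` represented as null, so it can appear directly in an `extern "C"` signature. The `RawFoo` layout and the rule that a zero value fails are assumptions made up for this example, not taken from the linked guide.

```rust
use std::ptr::NonNull;

/// Hypothetical opaque-ish type handed to C as a pointer.
#[repr(C)]
pub struct RawFoo {
    value: u32,
}

/// Returns a non-null handle on success, or `None` (a null pointer on the C
/// side) on failure. Here, failure is arbitrarily defined as `value == 0`.
pub extern "C" fn foo_create(value: u32) -> Option<NonNull<RawFoo>> {
    if value == 0 {
        return None;
    }
    let boxed = Box::new(RawFoo { value });
    NonNull::new(Box::into_raw(boxed))
}

/// Frees a handle previously returned by `foo_create`. Passing `None`
/// (null) is a no-op.
pub unsafe extern "C" fn foo_destroy(foo: Option<NonNull<RawFoo>>) {
    if let Some(ptr) = foo {
        drop(unsafe { Box::from_raw(ptr.as_ptr()) });
    }
}
```

The C declaration stays `RawFoo *foo_create(uint32_t)`; the difference is only on the Rust side, where the type forces callers to handle the null case.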

view this post on Zulip Jane Lusby (Sep 26 2020 at 20:54):

keep in mind i've done like no ffi

view this post on Zulip Lucius Hu (Sep 26 2020 at 20:54):

For a state-level adversary, someone must use that level of caution lol

view this post on Zulip BatmanAoD (Kyle Strand) (Sep 26 2020 at 20:55):

Re: unrecoverable errors and FFI: the "sandbox" approach for panic is almost certainly the only viable (and safe) global/default approach. In our RFC for cross-language unwinding, we specified that "C" will now abort rather than let a panic cross the boundary. This makes extern "C" functions somewhat "sandbox-like" in that regard.

view this post on Zulip Jane Lusby (Sep 26 2020 at 20:56):

sick

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:03):

Right, we need some guidelines like this. But sometimes abort alone is not enough to prevent memory leak.

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:04):

@Lucius Hu in abort you'd still have the chance to run a panic hook, assuming you're aborting via libstd's aborting infrastructure

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:04):

i think

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:04):

let me double check
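
A small sketch of the panic-hook idea, for anyone following along. The standard library documents that the hook is invoked when a thread panics, before the panic runtime (unwind or abort) takes over. The `HOOK_RAN` flag and `install_cleanup_hook` are hypothetical names used only to make the effect observable here; a real hook would do whatever last-chance cleanup or logging is needed.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Set by the hook so the effect can be observed after the fact.
static HOOK_RAN: AtomicBool = AtomicBool::new(false);

/// Replaces the default panic hook with one that records that it ran
/// and prints the panic information to stderr.
fn install_cleanup_hook() {
    panic::set_hook(Box::new(|info| {
        HOOK_RAN.store(true, Ordering::SeqCst);
        eprintln!("panic hook ran: {}", info);
    }));
}
```

Note that the hook only covers panics raised through the standard panic machinery; a direct call to `std::process::abort` does not go through it.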

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:04):

In many cases one side of FFI boundary, either Rust or C, may need to manually call the destructor to free the memory. The problem of both abort and panic is that, if that ever happens, the program may not get the chance to call the destructor.
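
One way to reduce that risk on the Rust side, at least for unwinding panics, is to tie the manual destructor call to a `Drop` guard so it runs during unwinding as well as on normal return. This is only a sketch: `RawFoo`, `foo_create`, `foo_destroy`, and `FooGuard` are invented for illustration, the `DESTROYED` counter exists purely so the behaviour can be checked, and none of this helps when the process aborts.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts destructor calls so the example can be checked.
static DESTROYED: AtomicUsize = AtomicUsize::new(0);

pub struct RawFoo {
    value: u32,
}

/// Stand-ins for a create/destroy pair exposed across an FFI boundary.
fn foo_create(value: u32) -> *mut RawFoo {
    Box::into_raw(Box::new(RawFoo { value }))
}

unsafe fn foo_destroy(foo: *mut RawFoo) {
    if !foo.is_null() {
        drop(unsafe { Box::from_raw(foo) });
        DESTROYED.fetch_add(1, Ordering::SeqCst);
    }
}

/// RAII guard: owns the raw handle and calls the destructor on drop,
/// which also happens while a panic unwinds through the owning frame.
struct FooGuard(*mut RawFoo);

impl Drop for FooGuard {
    fn drop(&mut self) {
        unsafe { foo_destroy(self.0) }
    }
}

/// Uses the handle; optionally panics to simulate a bug mid-operation.
fn use_foo(should_panic: bool) -> u32 {
    let guard = FooGuard(foo_create(5));
    if should_panic {
        panic!("simulated bug");
    }
    unsafe { (*guard.0).value }
}
```

With the guard in place, the destructor runs on both the normal path and the panicking path, provided the panic is an unwinding one that is eventually caught or reaches the top of the thread.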

view this post on Zulip BatmanAoD (Kyle Strand) (Sep 26 2020 at 21:05):

I guess my summary would be that FFI-unwind is (currently) for dealing with the messy realities of systems programming as it exists today, where C is the lingua franca, so there is no language-level mechanism for handling errors, much less for distinguishing recoverable from non-recoverable errors. In this context, our job is mostly to "protect" Rust from C, and vice-versa.

For less "antagonistic" cross-language error handling, I think we rely somewhat on the progress of some broader efforts in this direction:

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:06):

Yep there are some facilities for that. But we need to document and summarize them.

view this post on Zulip nagisa (Sep 26 2020 at 21:06):

Lucius Hu said:

For a state-level adversary, someone must use that level of caution lol

why is encrypted memory not being used in the first place, or if it is, is there _really_ any distinction between memory that has been cleared and hasn't?

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:09):

ok, for example, though it's not related to Rust, one way of hacking a MacBook is to dump the memory. Because in hibernation the memory still holds data which could be highly sensitive. Clearly it's not feasible to encrypt all the memory right

view this post on Zulip nagisa (Sep 26 2020 at 21:10):

Fair, I guess we're still a couple years away from memory controllers in conventional laptops having ability to encrypt _all_ memory.

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:11):

In many scenarios we can not assume the underlying infrastructure has encryption, that's why we need extra effort to ensure no memory leak in the first place

view this post on Zulip nagisa (Sep 26 2020 at 21:11):

This is something that is available in server-chip space (e.g. as implemented by EPYCs) and I remember seeing embedded chips that have this functionality too.

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:11):

this is seeming more and more like a security topic that happens to intersect with error handling than something that is _about_ error handling

view this post on Zulip nagisa (Sep 26 2020 at 21:12):

yeah my bad ^^, feel free to split out the messages into a topic in #general

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:12):

no i dont think its you specifically, just the entire conversation has been drifting this way

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:12):

so I agree that we have many ways to ensure security or let the OS recycle leaked memory. But it's best if we do as well as we can so we don't need those alternative approaches

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:13):

Lucius Hu said:

so I agree that we have many ways to ensure security or let the OS recycle leaked memory. But it's best if we do as well as we can so we don't need those alternative approaches

this seems quite reasonable, the point I'm making though is I don't think it makes sense for the error handling project group in particular to oversee this work

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:13):

well to return to the topic, if we don't properly manage the "unrecoverable error", bad things could happen

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:13):

to me this seems more suited to an independent RFC

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:14):

im not sure if theres a security wg of some sort

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:14):

i know theres the unsafe code guidelines wg

view this post on Zulip nagisa (Sep 26 2020 at 21:14):

#wg-secure-code probably

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:15):

cc @Tony Arcieri

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:15):

there could be thousands of ways to cause a security breach, for this WG specifically, I think we need guidelines and documentation

view this post on Zulip Tony Arcieri (Sep 26 2020 at 21:15):

ohai

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:16):

to summarize the backscroll, we're talking about the security implications of panics and ffi

view this post on Zulip Tony Arcieri (Sep 26 2020 at 21:16):

aah

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:16):

and I think this might be more up your alley than mine

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:17):

so I'm not proposing that we need to invent additional stuff to ensure security, which is clearly not related to the WG. But since this WG is for generic error-handling, I think it fits to include guidelines for error-handling in FFI context

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:17):

especially since you got that zeroize crate

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:17):

seems very relevant

view this post on Zulip Tony Arcieri (Sep 26 2020 at 21:17):

so uhh, I'm not sure how much I can talk about here, but Google is working on some cool stuff in LLVM

view this post on Zulip Tony Arcieri (Sep 26 2020 at 21:17):

and yes I wrote zeroize but we can do a lot better :sweat_smile:

view this post on Zulip Tony Arcieri (Sep 26 2020 at 21:18):

the idea I like the most is, for a lack of a better term, "stack bleaching" which is something being used a bit in the Linux kernel but also potentially possible with LLVM even across FFI boundaries

view this post on Zulip Tony Arcieri (Sep 26 2020 at 21:18):

the general idea is you decide you begin an operation with transient secrets on the stack, and the hard part is calculating the high watermark on the stack

view this post on Zulip Tony Arcieri (Sep 26 2020 at 21:19):

and once the entire set of operations is done, you zero the stack up to the high watermark

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:19):

Lucius Hu said:

so I'm not proposing that we need to invent additional stuff to ensure security, which is clearly not related to the WG. But since this WG is for generic error-handling, I think it fits to include guidelines for error-handling in FFI context

aah, okay that sounds fine. I think if we end up writing an error handling book that might be a good place to put guidelines about ffi memory safety concerns around error handling

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:19):

I'll have to defer to you on writing those guidelines though, but I'll be happy to review any chapters you contribute as much as I am able

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:19):

Hi Tony, my intention is, providing guidelines to better manage errors in FFI context, so we may not even need to deal with so much complications

view this post on Zulip Tony Arcieri (Sep 26 2020 at 21:19):

zeroize is kind of a hack to accommodate what is possible in stable Rust today
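
To give a feel for what such a stable-Rust workaround involves, here is a minimal sketch of the general volatile-write technique. It is not the zeroize crate's API and it is not stack bleaching: it only clears one buffer the caller explicitly hands over. `wipe` and `Secret` are hypothetical names for this example.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrites `buf` with zeros using volatile writes, which the compiler
/// is not allowed to elide as dead stores, then adds a compiler fence so
/// later code is not reordered before the wipe.
fn wipe(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        unsafe { ptr::write_volatile(byte, 0) };
    }
    compiler_fence(Ordering::SeqCst);
}

/// Wrapper that wipes its contents when dropped, including when it is
/// dropped during an unwinding panic.
struct Secret([u8; 32]);

impl Drop for Secret {
    fn drop(&mut self) {
        wipe(&mut self.0);
    }
}
```

This says nothing about copies the compiler may have left in registers or elsewhere on the stack, which is the gap the stack-bleaching idea is meant to close.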

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:20):

It's best if there's no error. If there is, it's best it's recoverable. If it's unrecoverable, it's best we "sandbox" it so we can gracefully do the clean up.

view this post on Zulip Tony Arcieri (Sep 26 2020 at 21:20):

yeah I'm not sure exactly what you need here per se. if there were really secrets on the stack, stack bleaching is probably the best option, otherwise I'm not sure what else you can do

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:21):

Otherwise, things get really complicated... And that's what I want to avoid

view this post on Zulip Tony Arcieri (Sep 26 2020 at 21:21):

yeah exactly

view this post on Zulip Tony Arcieri (Sep 26 2020 at 21:21):

the nice thing about the stack bleaching approach is it's actually relatively simple and works cleanly across FFI boundaries, at least in the context of LLVM-compiled languages

view this post on Zulip Tony Arcieri (Sep 26 2020 at 21:22):

but when and if it hits mainline LLVM... TBD

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:22):

Lucius Hu said:

It's best if there's no error. If there is, it's best it's recoverable. If it's unrecoverable, it's best we "sandbox" it so we can gracefully do the clean up.

regarding this point, it seems like the ffi unwind group's current plans cover this

view this post on Zulip Tony Arcieri (Sep 26 2020 at 21:22):

and then there's Cranelift to think about

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:22):

the unwinding panic turns into an abort if it hits a C boundary and isn't caught with catch_unwind

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:23):

but you're still already doing the proper unwinding all the way up to that point

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:23):

with the ability to setup your panic hook and catch it as you see fit to secure the memory

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:23):

that sounds great I will take a look

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:24):

but in the meantime, I'm not sure when their efforts will materialize...

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:29):

so even if their FFI unwinding mechanism cannot be completed, with careful handling of errors we can still do something to mitigate the risk of memory leak, with the existing features of Rust

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:31):

In my view that's also the value of error-handling. Although there are multiple ways to deal with the same problem ( e.g. memory leak ), it's best if we do it right in every aspect

view this post on Zulip Lucius Hu (Sep 26 2020 at 21:32):

so they keep making their FFI unwind, and we could also contain the risk with good error handling

view this post on Zulip BatmanAoD (Kyle Strand) (Sep 26 2020 at 21:36):

Jane Lusby said:

Lucius Hu said:

It's best if there's no error. If there is, it's best it's recoverable. If it's unrecoverable, it's best we "sandbox" it so we can gracefully do the clean up.

regarding this point, it seems like the ffi unwind group's current plans cover this

My only caveat would be that the development and communication of best practices would probably fall more in this group's purview than in FFI-unwind's. But yes, the RFC does specify a way to sandbox panics! I think we would also like to add some additional "sandboxing" features, but that starts to creep into effects-algebra territory, which is much broader.

view this post on Zulip BatmanAoD (Kyle Strand) (Sep 26 2020 at 21:37):

Lucius Hu said:

so even if their FFI unwinding mechanism cannot be completed, with careful handling of errors we can still do something to mitigate the risk of memory leak, with the existing features of Rust

This is absolutely true. I think it would be entirely reasonable for a security-conscious app to have a stack-bleach followed by abort inside a catch_unwind at the top of every extern "C" function. It's noisy, but it's safe.
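
A rough sketch of that shape with today's stable Rust, under a big assumption: true stack bleaching is not available, so a volatile wipe of one known buffer stands in for it. `wipe`, `derive_tag`, and `compute_tag` are hypothetical names, and the "secret" is just a toy byte array derived from the argument.

```rust
use std::panic;
use std::process;
use std::ptr;

/// Stand-in for a stack bleach: volatile-zeroes one known buffer only.
fn wipe(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        unsafe { ptr::write_volatile(byte, 0) };
    }
}

/// Toy computation over secret data; imagine this could panic.
fn derive_tag(key: &[u8]) -> u8 {
    key.iter().fold(0u8, |acc, b| acc.wrapping_add(*b))
}

/// C-facing entry point: run the work inside catch_unwind, wipe the
/// secret on every path, and abort rather than unwind into C on panic.
pub extern "C" fn compute_tag(seed: u8) -> u8 {
    let mut key = [seed; 4];
    let outcome = panic::catch_unwind(|| derive_tag(&key));
    // Runs whether or not the closure panicked.
    wipe(&mut key);
    match outcome {
        Ok(tag) => tag,
        Err(_) => process::abort(),
    }
}
```

The wipe sits after `catch_unwind` rather than inside it so that it runs on both the success and the panic path before control either returns to C or the process aborts.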

view this post on Zulip Jane Lusby (Sep 26 2020 at 21:39):

that sounds sick


Last updated: Jan 29 2022 at 10:29 UTC