Stream: project-error-handling

Topic: Stack overflow errors


view this post on Zulip Mario Carneiro (Feb 25 2021 at 04:47):

This is maybe a problem for the language/compiler, not just libs, but I would really like to see better error reporting for stack overflows:

fn im_a_functional_programmer(a: usize, b: usize) -> usize {
    if a == 0 { return b }
    im_a_functional_programmer(a - 1, b + 1)
}

fn main() {
    println!("{}", im_a_functional_programmer(100000000, 0))
}
$ cargo run
   Compiling rust-test v0.1.0 (/home/mario/Documents/rust-test)
    Finished dev [unoptimized + debuginfo] target(s) in 0.21s
     Running `target/debug/rust-test`

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
fish: “cargo run” terminated by signal SIGABRT (Abort)

In particular, it is really important to get a stack trace to debug stack overflows! Rust is great about this except when the error is a stack overflow.

view this post on Zulip Mario Carneiro (Feb 25 2021 at 04:50):

There are also lots of cool things you can do with a stack trace for stack overflows, like identifying the cycle and presenting only that, rather than a mile-long stack trace
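A minimal sketch of the cycle-detection idea, working on frames that have already been resolved to function names (plain &str values here, ordered innermost first as in a printed backtrace). `find_cycle` and `summarize` are hypothetical helpers, not part of std or backtrace-rs. The sketch assumes the repeating part starts at the innermost frame; a real implementation would have to tolerate a few non-repeating frames at the top of the stack.

```rust
/// Finds the shortest cycle (up to `max_len` frames) that repeats at least
/// `min_repeats` times starting from the innermost frame `frames[0]`.
/// Returns (cycle_len, repeats) if one is found.
fn find_cycle(frames: &[&str], max_len: usize, min_repeats: usize) -> Option<(usize, usize)> {
    for len in 1..=max_len.min(frames.len()) {
        // Count consecutive copies of frames[..len] starting at index 0.
        let mut repeats = 1;
        while (repeats + 1) * len <= frames.len()
            && frames[repeats * len..(repeats + 1) * len] == frames[..len]
        {
            repeats += 1;
        }
        if repeats >= min_repeats {
            return Some((len, repeats));
        }
    }
    None
}

/// Prints one copy of the detected cycle plus a summary line, followed by
/// the non-repeating outer frames. Without a cycle, frames pass through.
fn summarize(frames: &[&str], max_len: usize, min_repeats: usize) -> Vec<String> {
    match find_cycle(frames, max_len, min_repeats) {
        Some((len, repeats)) => {
            let mut out: Vec<String> = frames[..len].iter().map(|f| f.to_string()).collect();
            out.push(format!("[cycle of {} frame(s) repeated {} times]", len, repeats));
            out.extend(frames[len * repeats..].iter().map(|f| f.to_string()));
            out
        }
        None => frames.iter().map(|f| f.to_string()).collect(),
    }
}
```

For mutual recursion such as `b, a, b, a, b, a, main`, this reports a 2-frame cycle repeated 3 times and keeps `main` below it.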

view this post on Zulip oliver (Feb 25 2021 at 04:54):

this is done with backtrace-rs: https://docs.rs/backtrace/0.3.56/backtrace/
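For reference, capturing a backtrace from ordinary code is now possible with std::backtrace::Backtrace, which was stabilized in Rust 1.65 (after this discussion) and is built on the backtrace-rs code that std vendors (note the `crate::backtrace_rs` path in the std excerpt further down the thread). This is a minimal sketch; `capture_here` and `describe` are hypothetical helpers. The capture runs on the current stack, which is exactly the resource that is exhausted during an overflow, so on its own it does not address the problem raised here.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};

/// Captures a backtrace regardless of RUST_BACKTRACE / RUST_LIB_BACKTRACE.
/// This needs free space on the current stack to run.
fn capture_here() -> Backtrace {
    Backtrace::force_capture()
}

/// Turns the capture result into a printable description.
fn describe(bt: &Backtrace) -> String {
    match bt.status() {
        BacktraceStatus::Captured => format!("captured:\n{}", bt),
        BacktraceStatus::Disabled => "disabled".to_string(),
        BacktraceStatus::Unsupported => "unsupported on this platform".to_string(),
        // BacktraceStatus is #[non_exhaustive], so a catch-all arm is required.
        _ => "unknown status".to_string(),
    }
}
```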

view this post on Zulip oliver (Feb 25 2021 at 04:57):

the idea is to define a Backtrace trait in core and std

view this post on Zulip Mario Carneiro (Feb 25 2021 at 04:57):

Which part? The cycle detection?

view this post on Zulip oliver (Feb 25 2021 at 04:57):

the stack tracing

view this post on Zulip Mario Carneiro (Feb 25 2021 at 04:58):

Even if you hit a stack overflow? My impression is that Rust just aborts on this condition and there is no space for a lib to run any code

view this post on Zulip Jane Lusby (Feb 25 2021 at 04:59):

I'm not sure how stack overflows are handled

view this post on Zulip oliver (Feb 25 2021 at 04:59):

I do think that backtrace-rs can do the capture

view this post on Zulip Jane Lusby (Feb 25 2021 at 04:59):

oliver said:

I do think that backtrace-rs can do the capture

i dont think that's the issue

view this post on Zulip oliver (Feb 25 2021 at 05:00):

well, prior art then

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:00):

no, I mean during a stack overflow

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:00):

you have no stack memory left with which to run the functions that capture a stack trace

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:00):

I'm guessing?

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:00):

though this doesn't make a lot of sense to me

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:00):

so I need to dig into this more

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:00):

I don't even know if stack overflows trigger the panic handler

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:01):

but it might be as simple as adding the backtrace-printing call used in the panic handler to whatever part of the runtime handles stack overflows

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:01):

it clearly had enough stack space to report the error, so hopefully that's sufficient

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:01):

and the only issue here is that it isn't integrated with that part of the runtime yet

view this post on Zulip Mario Carneiro (Feb 25 2021 at 05:02):

This needs compiler support, but it might be possible to allocate a separate stack for the stack overflow handler. Alternatively, if the error type is owned and 'static, maybe you can just throw the main stack away and reuse it

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:02):

Mario Carneiro said:

This needs compiler support, but it might be possible to allocate a separate stack for the stack overflow handler. Alternatively, if the error type is owned and 'static, maybe you can just throw the main stack away and reuse it

there shouldn't necessarily even be an error type associated with the overflow, right? At least from the user's perspective

view this post on Zulip Mario Carneiro (Feb 25 2021 at 05:02):

Who prints "fatal runtime error: stack overflow"?

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:03):

Mario Carneiro said:

Who prints "fatal runtime error: stack overflow"?

checking now

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:03):

#[allow(dead_code)] // stack overflow detection not enabled on all platforms
pub unsafe fn report_overflow() {
    dumb_print(format_args!(
        "\nthread '{}' has overflowed its stack\n",
        thread::current().name().unwrap_or("<unknown>")
    ));
}

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:03):

from std/src/sys_common/util.rs

view this post on Zulip Mario Carneiro (Feb 25 2021 at 05:05):

Which is called from https://github.com/rust-lang/rust/blob/master/library/std/src/sys/unix/stack_overflow.rs#L105

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:05):

and the Windows one

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:05):

yea

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:05):

I think I know why this is not working

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:05):

or I mean

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:05):

why we can't do this from stack overflows yet

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:06):

wait

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:06):

no...

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:06):

this is std code

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:06):

ignore me

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:08):

@Mario Carneiro you should try grabbing the panic code from default_hook

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:08):

and put it into report_overflow, see what happens

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:08):

backtrace code*

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:08):

not panic

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:08):

    let backtrace_env = if panic_count::get() >= 2 {
        RustBacktrace::Print(crate::backtrace_rs::PrintFmt::Full)
    } else {
        backtrace::rust_backtrace_env()
    };

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:08):

        match backtrace_env {
            RustBacktrace::Print(format) => drop(backtrace::print(err, format)),
            RustBacktrace::Disabled => {}
            RustBacktrace::RuntimeDisabled => {
                if FIRST_PANIC.swap(false, Ordering::SeqCst) {
                    let _ = writeln!(
                        err,
                        "note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace"
                    );
                }
            }
        }
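From user code, the closest analogue to the default_hook logic above is a custom panic hook that always prints a backtrace. This is a minimal sketch with a hypothetical `install_backtrace_hook` helper; it replaces the default hook. It only covers panics: as far as I can tell, a stack overflow never reaches the panic hook at all, since the runtime's overflow handler prints its message and then aborts, which matches the output at the top of this thread.

```rust
use std::backtrace::Backtrace;
use std::panic;

/// Installs a panic hook that always prints the panic info followed by a
/// freshly captured backtrace, loosely mirroring what `default_hook` does
/// when RUST_BACKTRACE is set. The default hook is replaced.
fn install_backtrace_hook() {
    panic::set_hook(Box::new(|info| {
        // The panic info includes the location and the panic message.
        eprintln!("{info}");
        // force_capture ignores RUST_BACKTRACE / RUST_LIB_BACKTRACE.
        eprintln!("{}", Backtrace::force_capture());
    }));
}
```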

view this post on Zulip Mario Carneiro (Feb 25 2021 at 05:08):

Jane Lusby said:

there shouldn't necessarily even be an error type associated with the overflow, right? At least from the user's perspective

You're right. We still have to do some forensics on the main stack to work out the backtrace, but I think we can just throw the stack away and reuse it if we need space for nice printing

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:09):

Mario Carneiro said:

there shouldn't necessarily even be an error type associated with the overflow, right? At least from the user's perspective

You're right. We still have to do some forensics on the main stack to work out the backtrace, but I think we can just throw the stack away and reuse it if we need space for nice printing

messing with the stack is a bit lower level than I'm familiar with

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:09):

might be good to bring this up with the compiler team

view this post on Zulip Mario Carneiro (Feb 25 2021 at 05:09):

I assume that's what backtrace-rs is doing

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:09):

Mario Carneiro said:

I assume that's what backtrace-rs is doing

I think backtrace-rs calls into libunwind

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:09):

or whatever it's called

view this post on Zulip Mario Carneiro (Feb 25 2021 at 05:10):

I haven't even tried modifying std before :upside_down:

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:10):

at least on unix

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:10):

Mario Carneiro said:

I haven't even tried modifying std before :upside_down:

first time! woo!

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:10):

rustc-dev-guide makes it pretty easy to get started

view this post on Zulip Mario Carneiro (Feb 25 2021 at 05:11):

I need to figure out how Rust signal handlers work. Surely we need some stack space just for that, since they're not written in assembly

view this post on Zulip Mario Carneiro (Feb 25 2021 at 05:12):

like are we just hoping the compiler can inline and eliminate all locals?

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:12):

/shrug

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:12):

I wish I knew or could respond to these questions

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:12):

but yea

view this post on Zulip Mario Carneiro (Feb 25 2021 at 05:14):

Oh, actually, a bit later in that file there are NEED_ALTSTACK and sigaltstack, so it sounds like there is a stack being allocated just for the signal handler

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:14):

fancy

view this post on Zulip Mario Carneiro (Feb 25 2021 at 05:14):

in which case there should be enough stack to print a basic stack trace, assuming no user code is called

view this post on Zulip Jane Lusby (Feb 25 2021 at 05:15):

do it do it do it

view this post on Zulip Charles Ellis O'Riley Jr. (Feb 25 2021 at 06:26):

Don't know if this link will be of benefit to the conversation on "stack overflow errors", but I found this link for debugging the issue specifically on a Mac: https://dev.to/jasonelwood/setup-gdb-on-macos-in-2020-489k

view this post on Zulip Jakub Duchniewicz (Feb 25 2021 at 07:45):

gdb is a useful tool (as is lldb), but for stack overflows it would not help much (it would just point to the place that overflows)

view this post on Zulip Jakub Duchniewicz (Feb 25 2021 at 07:46):

and yeah, stack overflows with a backtrace would be useful (though for recursive calls we would get a long backtrace of almost useless info)

view this post on Zulip Jakub Duchniewicz (Feb 25 2021 at 07:47):

maybe we should stop printing the long backtrace after, e.g., 3 identical function calls?
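Jakub's suggestion can be sketched as a post-processing step over resolved frame names. `collapse_runs` is a hypothetical helper; frames are plain &str values ordered innermost first, and `keep` plays the role of the "3 same func calls" threshold. It only collapses direct recursion (the same frame repeated back to back); mutual recursion such as a, b, a, b would need cycle detection instead.

```rust
/// Collapses runs of identical consecutive frames: the first `keep` frames
/// of each run are kept, and the remainder is replaced by one summary line.
fn collapse_runs(frames: &[&str], keep: usize) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < frames.len() {
        // Find the end of the run of frames equal to frames[i].
        let mut j = i;
        while j < frames.len() && frames[j] == frames[i] {
            j += 1;
        }
        let run = j - i;
        for f in &frames[i..i + run.min(keep)] {
            out.push(f.to_string());
        }
        if run > keep {
            out.push(format!("... {} more frame(s) of {} omitted", run - keep, frames[i]));
        }
        i = j;
    }
    out
}
```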

view this post on Zulip Charles Ellis O'Riley Jr. (Feb 25 2021 at 07:49):

@Jakub Duchniewicz Thanks. I'll file that info away :+1:

view this post on Zulip Jakub Duchniewicz (Feb 25 2021 at 07:50):

and ping me if you need help using gdb; it can be pretty overwhelming at first glance (and is pretty powerful: it even has a TUI mode with the -tui flag!)

view this post on Zulip Charles Ellis O'Riley Jr. (Feb 25 2021 at 07:55):

Thanks, I'll keep that offer in mind but I know absolutely nothing about stack overflows. I just researched it and came across gdb and thought it might be of assistance to those looking into the issue.

view this post on Zulip nagisa (Feb 25 2021 at 13:25):

stack overflows produce messages as bad as they do now because you cannot really allocate anything on the stack if you want any chance of successfully reporting the overflow

view this post on Zulip nagisa (Feb 25 2021 at 13:25):

separate stacks for signal handlers are not universal.

view this post on Zulip nagisa (Feb 25 2021 at 13:26):

If my memory serves me right, anyway.

view this post on Zulip nagisa (Feb 25 2021 at 13:28):

This also reminds me that people were complaining in the past that we don't always print "stack overflowed" message when there's an actual stack overflow. Our policy was always that this detection is on a best-effort basis. And if we start doing more (printing stack traces), people will have further expectations.

view this post on Zulip nagisa (Feb 25 2021 at 13:30):

I wonder if it wouldn't make sense to print out more information on any fault in that case.

view this post on Zulip nagisa (Feb 25 2021 at 13:30):

like, the Go runtime, IIRC, prints a stack trace, register values, etc. on SIGQUIT too.

view this post on Zulip Mario Carneiro (Feb 25 2021 at 13:32):

I think it's okay if the enhanced stack overflow support is also best effort (but hopefully at least works on tier 1)

view this post on Zulip nagisa (Feb 25 2021 at 13:33):

you can't always figure out if a fault was caused by a stack overflow in the first place, even on T1 targets.

view this post on Zulip Mario Carneiro (Feb 25 2021 at 13:34):

The currently implemented method (watching for a guard page access) seems to be fairly reliable, although it has issues if you have really large stack frames

view this post on Zulip Mario Carneiro (Feb 25 2021 at 13:35):

but possibly the compiler can do some special handling for that; most C compilers already have to probe the stack when you put a giant array on the stack or use alloca, to ensure that the guard page is hit
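To make the probing argument concrete, here is a simplified model. Assumptions: the stack grows downward, depths are measured in bytes below the stack pointer at function entry, pages are 4 KiB, and a function without probes touches only the far end of its frame. `probe_offsets` and `hits_guard` are hypothetical names; real probes are emitted by the compiler in the function prologue, not written by hand.

```rust
/// Depths a probing prologue would touch when allocating a frame of
/// `frame_size` bytes (assumed > 0): one touch per page, plus the frame's end.
fn probe_offsets(frame_size: usize, page_size: usize) -> Vec<usize> {
    let mut offsets = Vec::new();
    let mut off = page_size;
    while off < frame_size {
        offsets.push(off);
        off += page_size;
    }
    offsets.push(frame_size); // final touch at the far end of the frame
    offsets
}

/// True if any touched depth falls inside the guard region
/// [guard_start, guard_start + guard_size).
fn hits_guard(touched: &[usize], guard_start: usize, guard_size: usize) -> bool {
    touched
        .iter()
        .any(|&d| d >= guard_start && d < guard_start + guard_size)
}
```

Because consecutive probes are at most one page apart, a frame that reaches past the start of a guard region at least one page long must touch it. In the model, a 20,000-byte frame with the guard starting 5,000 bytes down is caught by the probe at 8192, while touching only depth 20,000 would jump straight over the guard.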

view this post on Zulip Mario Carneiro (Feb 25 2021 at 13:36):

I would assume rust gets that for free via LLVM

view this post on Zulip Mario Carneiro (Feb 25 2021 at 13:36):

but that doesn't necessarily work on embedded targets without a guard page

view this post on Zulip nagisa (Feb 25 2021 at 13:45):

I don't remember the exact particulars, but it doesn't always work despite the probe. See e.g. https://users.rust-lang.org/t/is-rust-guaranteed-to-detect-stack-overflows/52593

view this post on Zulip Jane Lusby (Feb 25 2021 at 17:37):

nagisa said:

stack overflows produce messages as bad as they do now because you cannot really allocate anything on the stack if you want any chance of successfully reporting the overflow

I think the interface for backtrace printing used in panics is currently non-allocating though, so this might be fine?
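The non-allocating idea can be illustrated with a fixed-capacity buffer that implements core::fmt::Write, so formatting never touches the heap (heap allocation is not async-signal-safe, so a signal handler should avoid it anyway). `StackBuf` is a hypothetical type, not std's internal printing code. The buffer itself still lives on the stack, so this only helps if the handler has some stack to run on, such as the alternate signal stack mentioned earlier in the thread.

```rust
use core::fmt::{self, Write};

/// Fixed-capacity text buffer that never allocates on the heap. A piece
/// that does not fit is rejected with fmt::Error; earlier pieces remain.
struct StackBuf<const N: usize> {
    buf: [u8; N],
    len: usize,
}

impl<const N: usize> StackBuf<N> {
    fn new() -> Self {
        StackBuf { buf: [0; N], len: 0 }
    }

    fn as_str(&self) -> &str {
        // Only whole &str pieces are ever copied in, so this is valid UTF-8.
        core::str::from_utf8(&self.buf[..self.len]).unwrap_or("")
    }
}

impl<const N: usize> Write for StackBuf<N> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let bytes = s.as_bytes();
        if bytes.len() > N - self.len {
            return Err(fmt::Error);
        }
        self.buf[self.len..self.len + bytes.len()].copy_from_slice(bytes);
        self.len += bytes.len();
        Ok(())
    }
}
```

Usage: `write!(buf, "frame {}: {:#x}", 3, 0xbeef_usize)` formats a frame line into the buffer, which could then be written to stderr with a single raw write.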

view this post on Zulip Edmund Cape (Feb 25 2021 at 19:04):

Hello, could you send me the audio link? I thought I had it from the previous meeting...

view this post on Zulip Joshua Nelson (Feb 26 2021 at 03:35):

Jane Lusby said:

I'm not sure how stack overflows are handled

there's a stack guard I think

view this post on Zulip Joshua Nelson (Feb 26 2021 at 03:35):

oh sorry nagisa already explained it better than I could


Last updated: Jan 29 2022 at 09:51 UTC