Stream: t-compiler

Topic: Optimizing away bounds checks


Josh Triplett (May 07 2020 at 23:15, on Zulip):

I'm trying to figure out under what circumstances rust can optimize away bounds checks if it knows an index is in bounds. I wrote a very simple test case, and can't seem to get rust to omit the bounds checks, even if I use assert!. (I know that I could trivially write it using iterators instead, but I'm trying to figure out rust's ability to optimize here.)

Josh Triplett (May 07 2020 at 23:15, on Zulip):

I wrote this:

#[no_mangle]
fn f(slice: &[u64], start: usize, end: usize) -> u64 {
    let mut total = 0;
    assert!(start < end && start < slice.len());
    for i in start..end {
        total += slice[i];
    }
    total
}
Josh Triplett (May 07 2020 at 23:16, on Zulip):

I put that into the compiler explorer, with -O, and the resulting assembly looks like this:

f:
        push    rax
        cmp     rsi, rdx
        jbe     .LBB5_6
        cmp     rdx, rcx
        jae     .LBB5_6
        xor     eax, eax
.LBB5_3:
        cmp     rdx, rsi
        jae     .LBB5_7
        add     rax, qword ptr [rdi + 8*rdx]
        add     rdx, 1
        cmp     rcx, rdx
        jne     .LBB5_3
        pop     rcx
        ret
.LBB5_7:
        lea     rax, [rip + .L__unnamed_5]
        mov     rdi, rdx
        mov     rdx, rax
        call    qword ptr [rip + core::panicking::panic_bounds_check@GOTPCREL]
        ud2
.LBB5_6:
        call    std::panicking::begin_panic
        ud2
Josh Triplett (May 07 2020 at 23:18, on Zulip):

Based on the x86 calling convention, rdi contains the slice base address, rsi contains the slice length, rdx contains start, and rcx contains end.

Josh Triplett (May 07 2020 at 23:18, on Zulip):

So, the first two comparisons verify the assertion and jump to .LBB5_6 if it fails, to panic.

Josh Triplett (May 07 2020 at 23:19, on Zulip):

Then inside the loop, there's still a comparison of rdx to rsi, and a jump to .LBB5_7 to panic if out of bounds.

Josh Triplett (May 07 2020 at 23:19, on Zulip):

That's exactly the same comparison.

Josh Triplett (May 07 2020 at 23:20, on Zulip):

Should rustc be able to optimize away that bounds check?

Josh Triplett (May 07 2020 at 23:48, on Zulip):

Submitted as https://github.com/rust-lang/rust/issues/71997 .

nikomatsakis (May 08 2020 at 12:47, on Zulip):

it would be good, @Josh Triplett, to create comparable C code and compare with clang

Josh Triplett (May 08 2020 at 17:38, on Zulip):

@nikomatsakis What would be the equivalent of a bounds-checked slice in C?

Bastian Kauschke (May 08 2020 at 17:47, on Zulip):

manually checking for idx < len I guess :laughing:

Hanna Kruppe (May 08 2020 at 17:59, on Zulip):

If you're fine with C++, you could use std::vector and its at(i) method, I guess?

Josh Triplett (May 08 2020 at 18:19, on Zulip):

@Hanna Kruppe That works. I'm not even slightly experienced with modern C++, but I know how to use STL.

Hanna Kruppe (May 08 2020 at 18:20, on Zulip):

I meant more like, if C++ is as good as C as point of comparison :p

Josh Triplett (May 08 2020 at 18:21, on Zulip):

Ah, fair.

Josh Triplett (May 08 2020 at 18:21, on Zulip):

Worth trying regardless.

nikomatsakis (May 08 2020 at 18:34, on Zulip):

Heh, a fair question, I think C++ with at would be fair

nikomatsakis (May 08 2020 at 18:34, on Zulip):

I'm not so interested in comparing against languages

nikomatsakis (May 08 2020 at 18:34, on Zulip):

as I am in checking whether clang/LLVM does this sort of optimization in ideal cases

nikomatsakis (May 08 2020 at 18:34, on Zulip):

and trying to see what aspect of Rust's translation is preventing it from doing so, if so

nikomatsakis (May 08 2020 at 18:35, on Zulip):

in other words, rustc wouldn't be doing this optimization, llvm would

oli (May 08 2020 at 18:36, on Zulip):

but... but... what if wg-mir-opt does it and then noone notices? :P

nikomatsakis (May 08 2020 at 18:40, on Zulip):

@oli I didn't mean that to imply rustc couldn't :)

nikomatsakis (May 08 2020 at 18:40, on Zulip):

or shouldn't

Josh Triplett (May 08 2020 at 18:42, on Zulip):

Language comparisons aside, it turns out that Rust will do this optimization in some cases, but there's what appears to be an off-by-one issue preventing it from doing so in this case.

Josh Triplett (May 08 2020 at 18:42, on Zulip):

See https://github.com/rust-lang/rust/issues/71997#issuecomment-625958662 and the comment above it.

Josh Triplett (May 08 2020 at 18:43, on Zulip):

(That's leaving aside that the assert! should be enough, without the min.)

Josh Triplett (May 08 2020 at 19:01, on Zulip):

I just confirmed that naively translated test cases in C++ have about the same effect.

Josh Triplett (May 08 2020 at 19:02, on Zulip):

clang with C++ can't eliminate the bounds check for .at(i) with just the assert, nor can it with if (end > slice.size()) end = slice.size();, but it can if I change that to >=.

Josh Triplett (May 08 2020 at 19:02, on Zulip):

And the generated code looks approximately the same, modulo how C++ spells the "panic on bounds check failure".

Nikita Popov (May 09 2020 at 18:50, on Zulip):

I poked at the C++ case a bit. The bounds check gets hoisted out of the loop (this is a new "loop exit predication" feature in LLVM 10), but not optimized away. (It would be possible to get a pretty good result by performing a loop unswitch after induction variable simplification, which would keep only a single unnecessary check before the loop.)

Nikita Popov (May 09 2020 at 18:52, on Zulip):

Neither SCEV nor InstCombine manage to fold the hoisted check. On the SCEV level we need to prove that (((-1 * %start) + (%slice_len umax %start)) umin (-1 + (-1 * %start) + %end)) < ((-1 * %start) + (%slice_len umax %start)), which is not so easy. The %slice_len umax %start parts really should get folded, but don't, because only non-recursive reasoning is used for that, while loop-entry-guard reasoning is needed to simplify.

Nikita Popov (May 09 2020 at 18:55, on Zulip):

For the rest, things would go well if we could eliminate the (-1 * %start) part from both sides, but that's only possible under nowrap conditions. Which we do have here, but that requires reasoning about multiple conditions at the same time.

Josh Triplett (May 09 2020 at 20:38, on Zulip):

When I tested with godbolt and LLVM trunk, it didn't pull the bounds check out of the lop.

Last update: Jun 04 2020 at 18:15UTC