Stream: t-compiler/wg-llvm

Topic: add nsw + sadd.with.overflow


dlrobertson (Mar 02 2019 at 00:02, on Zulip):

@scottmcm mentioned https://bugs.llvm.org/show_bug.cgi?id=38146 to me a while ago as an upstream LLVM bug that could improve Range::step_by performance

dlrobertson (Mar 02 2019 at 00:04, on Zulip):

So would it fall within the goals of this working group?

nagisa (Mar 02 2019 at 01:14, on Zulip):

yes.

Nikita Popov (Mar 02 2019 at 13:02, on Zulip):

I believe that one is already resolved by the switch to the usub.add intrinsics, as far as Range::step_by is concerned.

Nikita Popov (Mar 02 2019 at 13:11, on Zulip):

Though looking at the code it's not clear to me where it would be relevant. In any case, it would be good to have that fold.

Nikita Popov (Mar 02 2019 at 13:14, on Zulip):

If you're interested in doing this, the place to implement it would be around https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCalls.cpp#L2019.

dlrobertson (Mar 02 2019 at 17:49, on Zulip):

This may have actually already been fixed.
EDIT: nvm it was a poorly written test

dlrobertson (Mar 02 2019 at 19:28, on Zulip):

I've got the following to pass llvm-lit, but I've got some serious cleanup to do.

; RUN: opt < %s -instcombine -S | FileCheck %s

declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32)

define { i32, i1 } @fold_uadd_with_overflow(i32) {
  ; CHECK-Label: @fold_add_with_overflow
  ; CHECK:       %2 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %0, i32 20)
  ; CHECK-NEXT:  ret { i32, i1 } %2
  %2 = add nuw i32 %0, 7
  %3 = tail call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %2, i32 13)
  ret { i32, i1 } %3
}
dlrobertson (Mar 02 2019 at 19:29, on Zulip):

In theory I think we could tackle (s|u)mul and (s|u)sub, but I'm super new to LLVM, so I'd like to keep the scope as small as possible

dlrobertson (Mar 02 2019 at 19:30, on Zulip):

So I'll just stick to (s|u)add

nagisa (Mar 02 2019 at 19:33, on Zulip):

I'd like to keep the scope as small as possible

thats what LLVM people prefer either way

dlrobertson (Mar 03 2019 at 02:24, on Zulip):

https://reviews.llvm.org/D58881

dlrobertson (Mar 03 2019 at 02:25, on Zulip):

Decided to just stick to sadd.with.overflow. I'll submit a follow-up for uadd.with.overflow

scottmcm (Mar 04 2019 at 05:08, on Zulip):

I think there's a whole series of potential work related to the overflow intrinsics. For example, it seems like LLVM doesn't know that a non-overflowing multiplication is a multiple of its operands: https://rust.godbolt.org/z/dNB2xK

Nikita Popov (Mar 04 2019 at 09:19, on Zulip):

@scottmcm These kind of optimizations are pretty hard to do, because they only apply in code conditioned on the overflow check. That's not a check that can be performed in InstCombine. There's really no good place where such an optimization can go right now.

Nikita Popov (Mar 04 2019 at 09:21, on Zulip):

I think InstCombine does these kinds of optimizations (not for overflow but for comparisons) in some really primitive fashion, like only if the condition is directly on the only predecessor block. Anything else would be too expensive.

scottmcm (Mar 06 2019 at 09:25, on Zulip):

@Nikita Popov Ah, so basically anything behind control flow (as opposed to direct SSA use/defs) is impractical, basically? Makes sense; thanks.

Nikita Popov (Mar 10 2019 at 22:27, on Zulip):

Status update on sadd.with.overflow: https://reviews.llvm.org/D58881 has landed and @dlrobertson has submitted a followup in https://reviews.llvm.org/D59071. I'm trying to lay some groundwork to turn this into a more general overflow check optimization.

dlrobertson (Mar 11 2019 at 01:10, on Zulip):

@Nikita Popov awesome work!

dlrobertson (Mar 11 2019 at 01:24, on Zulip):

Still trying to wrap my head around it a bit, but I think I see where you're going with it.

dlrobertson (Mar 11 2019 at 01:27, on Zulip):

Are there good docs on how the internals of LLVM work? E.g. the rustc guide or even just a few good descriptions of ValueTracking, instcombine, instsimplify, etc?

Nikita Popov (Mar 16 2019 at 12:34, on Zulip):

As so often, I don't think there's any good docs apart from the language reference and (somewhat less useful) the programmer manual.

Nikita Popov (Mar 16 2019 at 12:38, on Zulip):

Mostly a matter of looking at what the code around you is doing and trying to copy it :) As to the ones you mentioned, the important distinction is that ConstantFolding deals with folding instructions that have only constant or undef operands, InstSimplify deals with simplifications that don't create new instructions and are relatively cheap (no recursive analysis) and InstCombine does anything else, with the restriction that it should never increase instruction count, to avoid looping. ValueTracking are helper utilities to determine various properties of instructions (known bits, known sign bits, etc), often in a recursive fashion.

Nikita Popov (Mar 16 2019 at 12:41, on Zulip):

ValueTracking is a bit of a wart (imho) because the proper way to do this would be to sparsely propagate the information with a fixed point algorithm, but the recursive computations are much easier to throw into random places in the compiler. The end effect is that this information is computed again and again and a sizable chunk of compile time may be spent just in known bits calculations.

Nikita Popov (Mar 16 2019 at 12:47, on Zulip):

@dlrobertson In any case, I ended up hijacking the saddo mixed signed patch for the general overflow optimization... In the meantime, would you like to extend the add nsw + saddo changes to the other cases? The add nuw + uaddo case should be fairly simple and integrate well into your existing code.

Nikita Popov (Mar 16 2019 at 12:49, on Zulip):

The sub case is tricky, because LLVM canonicalizes "sub %x, C" to "add %x, -C" and looses the nuw flag while doing so. For the ssubo case, I think a good way to handle it is to also canonicalize ssubo(X, C) to saddo(X, -C), which means that the existing saddo code will already handle it. (It also has the additional advantage that things like CSE and GVN will know that ssubo(X, C) and saddo(X, -C) are the same.)

Nikita Popov (Mar 16 2019 at 12:51, on Zulip):

I don't think there's anything that can be done in the usubo case, the necessary nuw information is lost during canonicalization, so no way to get that back...

dlrobertson (Mar 16 2019 at 13:35, on Zulip):

@Nikita Popov Thanks for the info! I read through the lang ref a few years ago, but it might be worthwhile to skim through it again.

I ended up hijacking the saddo mixed signed patch for the general overflow optimization

Np. I'll move on to uaddo.

I think a good way to handle it is to also canonicalize ssubo(X, C) to saddo(X, -C)

Interesting. I'll try it out

dlrobertson (Mar 20 2019 at 13:06, on Zulip):

The sub case is tricky, because LLVM canonicalizes "sub %x, C" to "add %x, -C" and looses the nuw flag while doing so. For the ssubo case,

Ah! This makes sense now. I was wondering why saddo (X -nsw C0), C1 -> saddo X, C0 - C1 "just works"

dlrobertson (Mar 20 2019 at 13:08, on Zulip):

Is it worth writing a test case for folding sub nsw + saddo? Or is this sufficiently covered by the tests for sub %x, C -> add %x, -C and the current saddo tests?

Nikita Popov (Mar 20 2019 at 14:59, on Zulip):

@dlrobertson Well, adding it certainly won't hurt ;) Might also test something fancy like sub nsw %x, SignedMin, which will likely not work (at a guess this is canonicalized to add %x, SignedMin without the nsw flag).

dlrobertson (Mar 20 2019 at 15:02, on Zulip):

:+1: I'll bundle it in with the NFC commit with the baseline tests for ssubo X, C -> saddo X, -C

dlrobertson (Apr 09 2019 at 13:22, on Zulip):

@Nikita Popov sorry for being absent recently... I just started a new job... Should be around a bit more again. I'm updating the ssubo diff now

Nikita Popov (Apr 10 2019 at 16:33, on Zulip):

Okay, the ssubo canonicalization has landed, the ConstantRange based overflow checks have landed and I've just added some missing AlwaysOverflows support. With that, I think we have all the missing InstCombine bits for with.overflow intrinsics.

Nikita Popov (Apr 10 2019 at 16:35, on Zulip):

I think Rust should consider moving away from uadd.with.overflow and usub.with.overflow though. These have simple IR replacements, which I suspect to optimize better than the intrinsics.

Nikita Popov (Apr 10 2019 at 16:37, on Zulip):

LLVM can form these intrinsics during CodeGenPrepare if profitable, but I think there's relatively little benefit to using them in IR and a lot of drawbacks.

nagisa (Apr 10 2019 at 19:35, on Zulip):

what are the "simple IR replacements"?

nagisa (Apr 10 2019 at 19:35, on Zulip):

doing stuff on a larger iX?

Nikita Popov (Apr 10 2019 at 22:01, on Zulip):

@nagisa The canonical patterns are %add = add iN %x, %y; %ov = icmp ult iN %add, %x and %sub = sub iN %x, %y; %ov = icmp ult %x, %y.

nagisa (Apr 11 2019 at 11:30, on Zulip):

seems fine to me?

nagisa (Apr 11 2019 at 11:31, on Zulip):

feels like an easy change to make and test

nagisa (Apr 11 2019 at 11:33, on Zulip):

But this does not generalise to multiplication, say.

nagisa (Apr 11 2019 at 11:34, on Zulip):

Is LLVM really able to figure out that it should just look at the overflow flag of the operation instead of doing the comparison on all the relevant architectures?

Nikita Popov (Apr 11 2019 at 16:17, on Zulip):

yes, this is only for uadd and usub, everything else has more complicated expansions and it makes sense to use the overflow intrinsics for that.

Nikita Popov (Apr 11 2019 at 16:19, on Zulip):

uadd.with.overflow and usub.with.overflow are formed during CGP in a target-independent way, though there are TLI hooks to opt out

Nikita Popov (Apr 11 2019 at 16:21, on Zulip):

Though having checked just now, it looks like the usubo formation is disabled by default and only x86 opts in.

nagisa (Apr 11 2019 at 16:22, on Zulip):

what about i128/u128 on, say 32-bit targets?

nagisa (Apr 11 2019 at 16:23, on Zulip):

I recall changing expansion for uaddo i128 to be better in LLVM… I think.

nagisa (Apr 11 2019 at 16:23, on Zulip):

ah no that was umulo

nagisa (Apr 11 2019 at 16:24, on Zulip):

I guess it works then.

Nikita Popov (Apr 11 2019 at 16:24, on Zulip):

It's only going to be done if uaddo/usubo are actually non-expand. If they aren't, it will be expanded back to add/sub + setcc anyway.

nagisa (Apr 11 2019 at 16:24, on Zulip):

if anybody is interested in making change to rust, I could tell where to look, it is a fairly easy change that can be done entirely in the library to test the waters.

Nikita Popov (Apr 11 2019 at 16:26, on Zulip):

https://github.com/rust-lang/rust/blob/9ebf47851a357faa4cd97f4b1dc7835f6376e639/src/libcore/num/mod.rs#L1331 and friends

Nikita Popov (Apr 11 2019 at 16:26, on Zulip):

Might be hard to get actual perf data though

dlrobertson (Apr 11 2019 at 21:41, on Zulip):

With that, I think we have all the missing InstCombine bits for with.overflow intrinsics.

Nice!

Nikita Popov (Apr 18 2019 at 16:27, on Zulip):

For the record, https://reviews.llvm.org/D60650 and https://reviews.llvm.org/D60656 improve with.overflow handling in LVI/CVP, which is where LLVM performs non-local range based optimizations.

Last update: Nov 15 2019 at 09:40UTC