Stream: t-compiler

Topic: Miscompilation with target-cpu=znver1 #63959


pnkfelix (Nov 19 2019 at 14:41, on Zulip):

Is our current goal to try to continue reducing the test case?

pnkfelix (Nov 19 2019 at 14:42, on Zulip):

in particular, have we learned enough to be able to be pretty sure that this is an LLVM bug; i.e. something that we might be able to translate to C or LLVM assembly?

pnkfelix (Nov 19 2019 at 14:49, on Zulip):

at least, that's how I interpreted eddyb's comment here; though it's possible @eddyb actually meant that one could/should trace through the LLVM internals while still driving the test from rustc itself.

Siavosh Zarrasvand (Nov 19 2019 at 14:51, on Zulip):

From what I have read on the GitHub thread and here in this channel, we still haven't nailed it down 100%. @eddyb mentioned he had looked at the IR and ASM of files compiled with different CPU targets, and there are some differences, but they have not been analyzed further. My proposal is to compile for different targets and track down exactly what happens at the bit and byte level when it crashes.

In theory, LLVM and the CPU OEM could just blame each other. LLVM manages to create code that compiles for both znver1 and other CPUs; it is when it is executed on znver1 that the crash happens... It's a bit of a chicken and egg situation, but the overarching problem, it seems to me, is to figure out exactly what happens, and take it from there?

Please correct me if I am wrong.

pnkfelix (Nov 19 2019 at 14:52, on Zulip):

Is it at a point where we could throw our generated .bc file(s) into LLVM's bugpoint tool?

Siavosh Zarrasvand (Nov 19 2019 at 14:54, on Zulip):

I don't know what .bc files are, and not familiar with bugpoint. I've done exploit development so I would have used IDA Pro and GDB :rolling_on_the_floor_laughing: Let me read up on those and come back, either with an answer or further questions?

pnkfelix (Nov 19 2019 at 14:54, on Zulip):

oh: .bc files are the bitcode files that LLVM generates

Siavosh Zarrasvand (Nov 19 2019 at 14:55, on Zulip):

oh: .bc files are the bitcode files that LLVM generates

Are they same as the IR, Intermediate Representation?

pnkfelix (Nov 19 2019 at 14:55, on Zulip):

they are basically a binary file representation of the low-level internal representation that LLVM uses. (And .ll files are a human-readable (or at least semi-human readable) version of that.)

pnkfelix (Nov 19 2019 at 14:55, on Zulip):

Yeah, .bc files hold LLVM IR

pnkfelix (Nov 19 2019 at 14:56, on Zulip):

and you can specify you want .bc output from rustc

pnkfelix (Nov 19 2019 at 14:56, on Zulip):

I cannot tell whether bugpoint is a good match for this particular bug

Siavosh Zarrasvand (Nov 19 2019 at 14:56, on Zulip):

In that case, I believe so, yes. We can reliably compile both passing examples for target-cpu=native and failing ones for target-cpu=znver1

pnkfelix (Nov 19 2019 at 14:56, on Zulip):

since I do not know/remember how flexible it is with how it automatically runs the test code it compiles as part of its search process.

pnkfelix (Nov 19 2019 at 14:57, on Zulip):

i.e. if I understand correctly, you need to run the generated code under Wine in order to observe the bug. I do not know if bugpoint lets one do that (but I hope it does!)

Siavosh Zarrasvand (Nov 19 2019 at 14:57, on Zulip):

It is worth a shot. The issue is that compilation works fine; it is during execution that the znver1-targeted binary hiccups

pnkfelix (Nov 19 2019 at 14:57, on Zulip):

Right. bugpoint can handle issues that arise from running the generated code

Siavosh Zarrasvand (Nov 19 2019 at 14:58, on Zulip):

i.e. if I understand correctly, you need to run the generated code under Wine in order to observe the bug. I do not know if bugpoint lets one do that (but I hope it does!)

I will research and give further details here. But if that doesn't work, then I assume we think using GDB and IDA Pro might not be the worst path forward?

pnkfelix (Nov 19 2019 at 14:59, on Zulip):

Sure, trying to understand the actual dynamic execution under a debugger isn't a terrible idea

Siavosh Zarrasvand (Nov 19 2019 at 14:59, on Zulip):

Aha! Lovely!
Thank you, this was highly helpful :slight_smile:

pnkfelix (Nov 19 2019 at 14:59, on Zulip):

I just figured that we might also learn a lot from finding out which LLVM pass is causing the code's behavior to change

pnkfelix (Nov 19 2019 at 14:59, on Zulip):

since presumably it is some optimization ...

pnkfelix (Nov 19 2019 at 15:00, on Zulip):

... but the other dangerous thing here is that @eddyb earlier hypothesized that the use of llvm.lifetime.start and llvm.lifetime.end could be related to the problem here

pnkfelix (Nov 19 2019 at 15:00, on Zulip):

which brings us back to something I asked near the beginning: whether it's possible that Rust is generating invalid LLVM IR

pnkfelix (Nov 19 2019 at 15:01, on Zulip):

Because if the problem is due to invalid use of llvm.lifetime.start and llvm.lifetime.end, (which are essentially constructs that tell LLVM's optimizers that certain values have reached the end of their dynamic extent and thus optimizations can assume they won't be used), then LLVM will just say "sure, this is a Garbage-In-Garbage-Out scenario."

pnkfelix (Nov 19 2019 at 15:02, on Zulip):

unfortunately dissecting that is probably going to require analyzing the use of llvm.lifetime.{start,end} in the generated code.

Siavosh Zarrasvand (Nov 19 2019 at 15:03, on Zulip):

Interesting, hmm... I guess we'll figure it out. I will keep this in mind when I am debugging.

pnkfelix (Nov 19 2019 at 15:03, on Zulip):

in any case, I'm glad to hear you were able to successfully reproduce the bug locally!

eddyb (Nov 19 2019 at 15:50, on Zulip):

@pnkfelix you'd be right about the llvm.lifetime stuff, except... effectively identical IR doesn't crash when compiled for non-windows or non-znver1

eddyb (Nov 19 2019 at 15:51, on Zulip):

it just explains why certain aspects of the Rust code influence LLVM at all, but I don't think this is about UB

eddyb (Nov 19 2019 at 15:52, on Zulip):

LLVM doesn't seem to optimize differently when it outputs a broken executable, it just uses different registers and whatnot

eddyb (Nov 19 2019 at 15:53, on Zulip):

it's possible someone dedicated enough might be able to reduce the LLVM IR, but it's hard to keep it correct (at least in the Rust code, it being safe is a good metric)

eddyb (Nov 19 2019 at 15:54, on Zulip):

@pnkfelix oh and bugpoint would likely just remove all the checks around the assert and just make LLVM trigger it...

pnkfelix (Nov 19 2019 at 15:54, on Zulip):

can you make bugpoint solely reduce the set of LLVM passes, and leave the input source unchanged?

eddyb (Nov 19 2019 at 15:54, on Zulip):

I haven't seen bugpoint success stories outside of tracking down crashes within LLVM

pnkfelix (Nov 19 2019 at 15:54, on Zulip):

interesting, okay

eddyb (Nov 19 2019 at 15:54, on Zulip):

@pnkfelix it's not a LLVM pass

eddyb (Nov 19 2019 at 15:54, on Zulip):

all of this happens after LLVM IR

eddyb (Nov 19 2019 at 15:55, on Zulip):

it's mis-instruction-selection not mis-optimization :/

pnkfelix (Nov 19 2019 at 15:55, on Zulip):

oh, okay; I had assumed that some LLVM optimization was responsible for part of the transformation at fault.

eddyb (Nov 19 2019 at 15:56, on Zulip):

there's almost nothing going on just a lot of SIMD copies and the instructions and registers chosen differ between working and broken versions

eddyb (Nov 19 2019 at 15:56, on Zulip):

I think @nagisa and I looked at something similar (but much simpler than heavy SIMD code) on ARM and it turned out to be something wrong in LLVM's information for an instruction or something like that

pnkfelix (Nov 19 2019 at 15:57, on Zulip):

@eddyb if you do not think this is about UB, does that mean we might at least be at the point where we could file a bug with LLVM with the generated .bc ?

eddyb (Nov 19 2019 at 15:57, on Zulip):

well, you almost never want to touch .bc, it's like an unstable compression format for .ll :P

rkruppe (Nov 19 2019 at 15:57, on Zulip):

Silly question but did someone try to remove just the lifetime.{start,end} calls from the LLVM IR and see if it still reproduces?

eddyb (Nov 19 2019 at 15:58, on Zulip):

but we could theoretically submit the .ll - maybe we should replace the panic with something else? like exit(1)?

eddyb (Nov 19 2019 at 15:58, on Zulip):

@rkruppe this is a stack corruption bug so if the stack layout is different you can't repro

rkruppe (Nov 19 2019 at 15:58, on Zulip):

oh, too bad

eddyb (Nov 19 2019 at 15:58, on Zulip):

you need weirdly precise sizes of those variables. there's potentially a simpler repro but I'm not aware of it

nagisa (Nov 19 2019 at 15:59, on Zulip):

You will want it to be reproducible with llc when reporting upstream

eddyb (Nov 19 2019 at 15:59, on Zulip):

it requires compiling a windows program and ru- wait no

eddyb (Nov 19 2019 at 15:59, on Zulip):

@nagisa how do I link and run a windows target executable, for/on linux?

rkruppe (Nov 19 2019 at 15:59, on Zulip):

Getting rid of panic seems nice for reducing. But panicking code adds some temporaries on the stack, I think, so it might be difficult to do that while still reproducing, if it's so dependent on the frame layout.

eddyb (Nov 19 2019 at 16:00, on Zulip):

like I want windows codegen but to call linux libc functions

eddyb (Nov 19 2019 at 16:00, on Zulip):

@rkruppe it's dependent on the layout of the data that's being corrupted (with massive SIMD copies)

nagisa (Nov 19 2019 at 16:00, on Zulip):

@eddyb specifying the msvc windows target for that object should suffice in llc

eddyb (Nov 19 2019 at 16:00, on Zulip):

I think we have some flexibility otherwise

eddyb (Nov 19 2019 at 16:00, on Zulip):

@nagisa mhmm then how can I make that into a linux executable?

eddyb (Nov 19 2019 at 16:01, on Zulip):

or link it into one

eddyb (Nov 19 2019 at 16:01, on Zulip):

I couldn't tell from the assembly that it's going to corrupt the data and trigger the panic

nagisa (Nov 19 2019 at 16:01, on Zulip):

gcc object.o -o out maybe?

rkruppe (Nov 19 2019 at 16:01, on Zulip):

That seems too frankenstein to be good for the purpose of demonstrating a bug to upstream. First comment: why the hell does the target triple say windows when you run it on Linux?

eddyb (Nov 19 2019 at 16:02, on Zulip):

we can give both mingw and native repro shrug

eddyb (Nov 19 2019 at 16:03, on Zulip):

I just wanted to avoid needing mingw + wine, but I think that path would also work

eddyb (Nov 19 2019 at 16:03, on Zulip):

@Siavosh Zarrasvand do you want to spend some time on this? or should I try it? I probably won't get to it until the weekend (unless I'm tempted by how ridiculous this bug is)

eddyb (Nov 19 2019 at 16:04, on Zulip):

@rkruppe @nagisa oh the other thing is that the assert_eq prints the data so you can see it's wrong and corruption has occurred, maybe I should just call printf on the data?

rkruppe (Nov 19 2019 at 16:06, on Zulip):

Sure, that would work, if the output is not too long or hard to diff by eye. Would probably make the asm shorter, too.

eddyb (Nov 19 2019 at 16:07, on Zulip):

@Siavosh Zarrasvand so yeah maybe you can make a version that just uses printf to show the bytes and even try to get it to be #[no_std] if you want, but that probably doesn't matter much at this point

eddyb (Nov 19 2019 at 16:08, on Zulip):

@rkruppe it's like ([11; 8], [22; 8], [33; 8], [44; 8]) right now
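
The idea of printing the data instead of relying on assert_eq could look roughly like the sketch below. It is only an illustration: the helper name dump_rows is hypothetical, and the element type is assumed to be u8 (the message above does not say). Each row is rendered as hex bytes so a corrupted row is easy to spot by eye.

```rust
/// Hypothetical helper: renders each 8-byte row as space-separated hex,
/// one row per line, so corruption can be diffed by eye.
/// Assumes the rows are `[u8; 8]`; the real test case may differ.
fn dump_rows(rows: &[[u8; 8]]) -> String {
    rows.iter()
        .map(|row| {
            row.iter()
                .map(|byte| format!("{:02x}", byte))
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Calling `println!("{}", dump_rows(&[[11; 8], [22; 8], [33; 8], [44; 8]]))` would print four lines of eight identical bytes each when nothing has been corrupted.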

Siavosh Zarrasvand (Nov 19 2019 at 16:08, on Zulip):

eddyb specifying the msvc windows target for that object should suffice in llc

Can't cross-compile for msvc in Ubuntu at least, the linker does not exist. I had to use gnu-gcc. See my compile flags.

eddyb (Nov 19 2019 at 16:09, on Zulip):

yeah the -gnu and -msvc targets both work for this, it's really the windows calling convention that seems to be required

Siavosh Zarrasvand (Nov 19 2019 at 16:09, on Zulip):

Siavosh Zarrasvand do you want to spend some time on this? or should I try it? I probably won't get to it until the weekend (unless I'm tempted by how ridiculous this bug is)

I intend to spend time on this from Thursday and onwards... :slight_smile: But do not let me block anyone else. Please work on it if you can. I will post all my updates here so that we don't perform redundant work

eddyb (Nov 19 2019 at 16:09, on Zulip):

@rkruppe @nagisa more evil ideas: can I just set the windows calling convention on a function on a non-windows target, or will that generate weird SEH things when it shouldn't?

Siavosh Zarrasvand (Nov 19 2019 at 16:09, on Zulip):

@eddyb Sorry, forgot to tag you on the above message

nagisa (Nov 19 2019 at 16:10, on Zulip):

@eddyb yes you can

rkruppe (Nov 19 2019 at 16:10, on Zulip):

@eddyb pls stop giving me nightmare fuel

rkruppe (Nov 19 2019 at 16:10, on Zulip):

This will just make people look for bugs in these weird-ass bits

nagisa (Nov 19 2019 at 16:11, on Zulip):

x64 calling convention is universally supported across all targets AFAIK

nagisa (Nov 19 2019 at 16:11, on Zulip):

and is otherwise very well defined.

eddyb (Nov 19 2019 at 16:11, on Zulip):

I wish I had fine-grained register control because it looks like the big difference is something to do with xmm6

nagisa (Nov 19 2019 at 16:11, on Zulip):

(until you reach 128 bit stuff, that is)

eddyb (Nov 19 2019 at 16:11, on Zulip):

hey wouldn't it be funny if one of the registers was just broken but only windows used it?

nagisa (Nov 19 2019 at 16:12, on Zulip):

@eddyb would be hilarious and I kinda want it to happen just so I can return my ryzen and go get a better one :D

eddyb (Nov 19 2019 at 16:12, on Zulip):

(presumably broken on the LLVM side, our chances of finding a hardware bug that isn't even limited to AMD Ryzen seem slim to impossible)

eddyb (Nov 19 2019 at 16:12, on Zulip):

@nagisa this repros on anything that doesn't SIGILL

eddyb (Nov 19 2019 at 16:12, on Zulip):

the generated instructions don't use ymm or zmm AFAICT!

Siavosh Zarrasvand (Nov 19 2019 at 16:13, on Zulip):

hey wouldn't it be funny if one of the registers was just broken but only windows used it?

Do Ubuntu PCs on znver1 targets run the binary? If not, wouldn't it be more of a znver1 issue than a windows issue?

eddyb (Nov 19 2019 at 16:13, on Zulip):

it's just that the Ryzen cost tables + windows calling convention register allocation makes LLVM generate very different machine code

Siavosh Zarrasvand (Nov 19 2019 at 16:14, on Zulip):

it's just that the Ryzen cost tables + windows calling convention register allocation makes LLVM generate very different machine code

Ah, oki, cool :slight_smile:

eddyb (Nov 19 2019 at 16:14, on Zulip):

@Siavosh Zarrasvand if you run it under wine, it will run on the CPU, wine only hooks windows APIs at the library level AFAICT

Vincent Rouillé (Nov 19 2019 at 16:36, on Zulip):

Hi, I own an AMD R7 1800X on Windows. If you need anything to help you find the root cause of this issue, just ask.

Vincent Rouillé (Nov 19 2019 at 16:49, on Zulip):

The set of instructions just before core::ptr::real_drop_in_place<syn::derive::DeriveInput> in the example given by novacrazy looks to be the same kind of those I found when I investigated #65618

eddyb (Nov 19 2019 at 16:54, on Zulip):

@Vincent Rouillé it's probably better to ignore all of the early investigation because of how simple we were able to make the whole setup (also, it's harder to debug crashes in dylibs loaded by rustc, which is where the original failures were)

eddyb (Nov 19 2019 at 16:55, on Zulip):

also, I think we sadly ran out of problems we need a Windows Ryzen machine for and fell into problems we need a lowest-parts-of-LLVM expert for :(

eddyb (Nov 19 2019 at 16:56, on Zulip):

if I were to investigate this further I would just install mingw and use that with wine (or try to bypass cross-compilation entirely and just get the windows+znver1 codegen on non-Ryzen linux, which sounds doable)

eddyb (Nov 19 2019 at 16:58, on Zulip):

@nagisa at least with your llc suggestion, we can have no optimization passes run and (presumably) still repro, so I think that's significant?

eddyb (Nov 19 2019 at 16:59, on Zulip):

and maybe there are some LLVM IR instructions that aren't needed - although I don't look forward to redu- OH

eddyb (Nov 19 2019 at 17:02, on Zulip):

@pnkfelix @nagisa @rkruppe I know how to use bugpoint for this!

have a harness that tests both non-windows and/or non-znver1 and windows+znver1 and only pretends to crash/fail when only windows+znver1 produces the wrong result

eddyb (Nov 19 2019 at 17:02, on Zulip):

you couldn't just look at one of those situations because that would make bugpoint be super helpful and replace the SIMD corruption with something arbitrary that looks corrupted

eddyb (Nov 19 2019 at 17:04, on Zulip):

I suspect we shouldn't use any C library calls for this, but rather IR for a #![no_std] staticlib with one exported u64 -> u64 function that should behave like the identity function (and I guess you'd use only one of the memory locations that gets corrupted - presumably you can get 3 or so different bugpoint reductions)
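
The decision logic of such a harness can be sketched as follows. This is a minimal illustration, not the harness that was eventually used: the names Outcome, classify and is_interesting are hypothetical, and the actual compiling and running of the two configurations is left out. The point is only that a reduction step counts as "still reproducing" when the baseline build behaves like the identity function and the windows+znver1 build does not.

```rust
/// Result of running the exported `u64 -> u64` probe under one configuration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Outcome {
    Correct,
    Wrong,
}

/// The probe is supposed to behave like the identity function.
fn classify(input: u64, output: u64) -> Outcome {
    if output == input {
        Outcome::Correct
    } else {
        Outcome::Wrong
    }
}

/// A reduced candidate is interesting only if the baseline configuration
/// is still correct while the suspect configuration is wrong. If both are
/// wrong, the reducer has most likely broken the program itself.
fn is_interesting(baseline: Outcome, suspect: Outcome) -> bool {
    baseline == Outcome::Correct && suspect == Outcome::Wrong
}
```

Requiring the baseline to stay correct is what stops a reducer from replacing the real corruption with arbitrary code that merely returns the wrong value.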

Vincent Rouillé (Nov 19 2019 at 17:09, on Zulip):

In the repro case at the end of the issue, did you figure out which instructions are wrong?

eddyb (Nov 19 2019 at 17:11, on Zulip):

hmm. maybe if you compile the testcase with full debuginfo enabled (-C debuginfo=2) and get lucky with LLVM so it doesn't get optimized out and single-step through the execution, constantly printing all the locals?

cc @Siavosh Zarrasvand you said you wanted to use a debugger on this

Vincent Rouillé (Nov 19 2019 at 17:11, on Zulip):

From what I just saw: [3u8;8] is stored in rdi and untouched
([1u8;8], [2u8;8]) is stored in xmm6 and is overwritten by vmovups ymm6,ymmword ptr [rsp+0B0h]
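
Why a load into ymm6 destroys a value kept in xmm6 can be shown with a toy model, since xmm6 is simply the low 128 bits of ymm6. The sketch below is a simulation in plain Rust, not real register access; the type Ymm and the function clobber_demo are made up for illustration, and the 0xAAAA... pattern stands in for whatever happened to be on the stack.

```rust
/// Toy model of one 256-bit vector register as four 64-bit lanes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Ymm([u64; 4]);

impl Ymm {
    /// The xmm view is the low 128 bits (lanes 0 and 1).
    fn xmm(&self) -> [u64; 2] {
        [self.0[0], self.0[1]]
    }

    /// Models a VEX-encoded 128-bit load: writes the low lanes, zeroes the rest.
    fn load_xmm(&mut self, value: [u64; 2]) {
        self.0 = [value[0], value[1], 0, 0];
    }

    /// Models a 256-bit load: overwrites all four lanes, including the xmm part.
    fn load_ymm(&mut self, value: [u64; 4]) {
        self.0 = value;
    }
}

/// Returns the xmm view before and after a full-width load into the same register.
fn clobber_demo() -> ([u64; 2], [u64; 2]) {
    let mut reg6 = Ymm([0; 4]);
    // ([1u8; 8], [2u8; 8]) packed into the low 128 bits.
    reg6.load_xmm([0x0101_0101_0101_0101, 0x0202_0202_0202_0202]);
    let before = reg6.xmm();
    // Unrelated stack data loaded into the full register (placeholder pattern).
    reg6.load_ymm([0xAAAA_AAAA_AAAA_AAAA; 4]);
    let after = reg6.xmm();
    (before, after)
}
```

In this model the value read back through the xmm view after the 256-bit load no longer matches what was stored, which mirrors the garbage later written to [rax].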

eddyb (Nov 19 2019 at 17:12, on Zulip):

ooor someone more patient than me can read through the assembly :)

eddyb (Nov 19 2019 at 17:12, on Zulip):

wait I thought ymm wasn't used, only xmm. maybe I wasn't paying attention

eddyb (Nov 19 2019 at 17:14, on Zulip):

oh I was looking at vmovaps xmm6, xmmword ptr [rip + .LCPI5_0] and missed all the other instructions using ymm

Vincent Rouillé (Nov 19 2019 at 17:15, on Zulip):

00007FF7F4A61358  vmovaps     xmm6,xmmword ptr [__xmm@02020202020202020101010101010101 (07FF7F4A7A000h)]
//                            ^ put ([1u8;8], [2u8;8]) into xmm6
00007FF7F4A61360  lea         rsi,[rsp+80h]
00007FF7F4A61368  mov         rdi,303030303030303h
//                            ^ put [3u8;8] into rdi
00007FF7F4A61372  nop         word ptr cs:[rax+rax]
                }
                Evil {
                    data: ([1; 8], [2; 8], [3; 8]),
                    padding: MaybeUninit::uninit(),
                }
            };
            let evil2 = evil1;
00007FF7F4A61380  vmovups     ymm0,ymmword ptr [rsp+140h]
00007FF7F4A61389  vmovups     ymm1,ymmword ptr [rsp+130h]
00007FF7F4A61392  vmovups     ymm6,ymmword ptr [rsp+0B0h]
//                            ^ overwrite xmm6
00007FF7F4A6139B  vmovups     ymm2,ymmword ptr [rsp+0F0h]
00007FF7F4A613A4  vmovups     ymm3,ymmword ptr [rsp+110h]
        let allocated = opaque_iter_next(&mut allocator).unwrap();
00007FF7F4A613AD  mov         rcx,rsi
                }
                Evil {
                    data: ([1; 8], [2; 8], [3; 8]),
                    padding: MaybeUninit::uninit(),
                }
            };
            let evil2 = evil1;
00007FF7F4A613B0  vmovups     ymmword ptr [rsp+1F0h],ymm0
00007FF7F4A613B9  vmovups     ymmword ptr [rsp+1E0h],ymm1
00007FF7F4A613C2  vmovups     ymm1,ymmword ptr [rsp+0D0h]
00007FF7F4A613CB  vmovups     ymmword ptr [rsp+1C0h],ymm3
00007FF7F4A613D4  vmovups     ymmword ptr [rsp+1A0h],ymm2
00007FF7F4A613DD  vmovups     ymmword ptr [rsp+160h],ymm6
            evil2
        };
        let evil4 = evil3;
00007FF7F4A613E6  vmovups     ymmword ptr [rsp+110h],ymm3
00007FF7F4A613EF  vmovups     ymmword ptr [rsp+0F0h],ymm2
00007FF7F4A613F8  vmovups     ymmword ptr [rsp+0B0h],ymm6
                }
                Evil {
                    data: ([1; 8], [2; 8], [3; 8]),
                    padding: MaybeUninit::uninit(),
                }
            };
            let evil2 = evil1;
00007FF7F4A61401  vmovups     ymmword ptr [rsp+180h],ymm1
            evil2
        };
        let evil4 = evil3;
00007FF7F4A6140A  vmovups     ymmword ptr [rsp+0D0h],ymm1
00007FF7F4A61413  vmovups     ymm5,ymmword ptr [rsp+1F0h]
00007FF7F4A6141C  vmovups     ymm4,ymmword ptr [rsp+1E0h]
00007FF7F4A61425  vmovups     ymmword ptr [rsp+140h],ymm5
00007FF7F4A6142E  vmovups     ymmword ptr [rsp+130h],ymm4
        let allocated = opaque_iter_next(&mut allocator).unwrap();
00007FF7F4A61437  vzeroupper
00007FF7F4A6143A  call        rust_63959::opaque_iter_next<mut core::mem::maybe_uninit::MaybeUninit<rust_63959::Evil>*,core::slice::IterMut<core::mem::maybe_uninit::MaybeUninit<rust_63959::Evil>>> (07FF7F4A612F0h)
00007FF7F4A6143F  test        rax,rax
00007FF7F4A61442  je          rust_63959::main+213h (07FF7F4A61523h)
        let data = &allocated.write(evil4).data;
00007FF7F4A61448  vmovups     xmmword ptr [rax],xmm6
//                            ^ put xmm6 (i.e. garbage)  into [rax]
        let data = &allocated.write(evil4).data;
00007FF7F4A6144C  mov         qword ptr [rax+10h],rdi
//                            ^ put rdi (i.e. [3u8;8])  into [rax+10h]

eddyb (Nov 19 2019 at 17:15, on Zulip):

https://godbolt.org/z/KmMdZD is the relevant output ftr

Vincent Rouillé (Nov 19 2019 at 17:16, on Zulip):

it's easier to find where the data flow with a debugger :)

eddyb (Nov 19 2019 at 17:17, on Zulip):

I remember seeing an output like what you pasted there from a tool, ages ago, but I forget what it was

eddyb (Nov 19 2019 at 17:18, on Zulip):

and I've forgotten it's a thing at all

Vincent Rouillé (Nov 19 2019 at 17:18, on Zulip):

The heavy visual studio...

eddyb (Nov 19 2019 at 17:18, on Zulip):

ah that would explain it

eddyb (Nov 19 2019 at 17:18, on Zulip):

@Vincent Rouillé well, that gives me a pretty good idea of one way this might happen

eddyb (Nov 19 2019 at 17:19, on Zulip):

@rkruppe @nagisa @Nikita Popov do you think it's possible for LLVM to miss a conflict between xmmN and ymmN?

eddyb (Nov 19 2019 at 17:21, on Zulip):

the non-windows znver1 codegen uses 0x0303030303030303 in r14

eddyb (Nov 19 2019 at 17:21, on Zulip):

oh wow it still uses ymm6, ugh I should've paid more attention to the actual diff between those two

eddyb (Nov 19 2019 at 17:22, on Zulip):

so if I ignore the stack offsets, the use of r14 vs xmm6 are the only difference

eddyb (Nov 19 2019 at 17:23, on Zulip):

well, either r14 or vmovaps xmm0, xmmword ptr [rip + .LCPI5_0] followed by using xmm0

eddyb (Nov 19 2019 at 17:25, on Zulip):

both versions freely use all ymm registers, so I'm not sure I understand what in the calling convention makes windows use xmm6?!

eddyb (Nov 19 2019 at 17:26, on Zulip):

the windows version appears to have to save and restore xmm6, which may be the calling convention difference, but that wouldn't explain why it uses it for anything else

eddyb (Nov 19 2019 at 17:29, on Zulip):

okay I've taken a closer look and it's vmovaps xmmN, xmmword ptr [rip + .LCPI5_0] in both, but both the N and the location of the instruction differ:

Vincent Rouillé (Nov 19 2019 at 17:29, on Zulip):

in godbolt, the linux (left) is writing ([1u8;8],[2u8;8]) into xmm0 later than the windows (right) version

Vincent Rouillé (Nov 19 2019 at 17:30, on Zulip):

yeah same conclusion

eddyb (Nov 19 2019 at 17:30, on Zulip):

there is the uninlined opaque_iter_next call in between, I wonder if on windows xmm6 specifically is callee-saved, but on non-windows there are no callee-saved xmm registers?

eddyb (Nov 19 2019 at 17:31, on Zulip):

so on windows it tries to aggressively take advantage of that one callee-saved register?

eddyb (Nov 19 2019 at 17:31, on Zulip):

well, there goes my bugpoint harness, we can probably reduce the LLVM by hand now taking into account that one register usage

eddyb (Nov 19 2019 at 17:38, on Zulip):

@pnkfelix @rkruppe I should not trust my intuition with regards to why the size matters, this isn't even a stack corruption but register corruption... (before I managed to remove allocation I was assuming it was an allocator size class, lol)

eddyb (Nov 19 2019 at 17:39, on Zulip):

so I think I know where that large size is coming from too! one ymm register is 4 u64s, so you need at least 4*6+1 u64s for ymm6 to be needed, that's 25. 3 of them are in data, and 22 are in padding
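
The arithmetic can be written out as a small sketch. It assumes, as the message does, that the copy fills registers in order starting from ymm0 and that the three data arrays each take one u64-sized slot; the function names are hypothetical.

```rust
/// A 256-bit ymm register holds four 64-bit values.
const U64S_PER_YMM: usize = 256 / 64;

/// Smallest number of u64s that spills past ymm0..ymm(index-1) and so
/// needs the register with the given index (assuming in-order filling).
fn min_u64s_to_reach_ymm(index: usize) -> usize {
    U64S_PER_YMM * index + 1
}

/// Padding length (in u64s) needed on top of the data slots to reach that register.
fn padding_len(index: usize, data_u64s: usize) -> usize {
    min_u64s_to_reach_ymm(index) - data_u64s
}
```

For ymm6 this gives 4*6+1 = 25 u64s in total, and with 3 of them in data the padding comes out at 22.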

nagisa (Nov 19 2019 at 17:40, on Zulip):

rkruppe nagisa Nikita Popov do you think it's possible for LLVM to miss a conflict between xmmN and ymmN?

Hard to imagine this being a case.

nagisa (Nov 19 2019 at 17:41, on Zulip):

FWIW Microsoft's x64 calling convention only uses xmm0-3

eddyb (Nov 19 2019 at 17:42, on Zulip):

it only treats xmm6 as callee-save despite using ymm0-ymm6 (maybe ymm7 is also callee-save? I should check)

nagisa (Nov 19 2019 at 17:42, on Zulip):

xmm6-xmm15 are non-volatile registers, so it might be a case of it failing to spill/restore it right if it overwrites it

Vincent Rouillé (Nov 19 2019 at 17:43, on Zulip):

by the way, target-cpu=core-avx2 doesn't have this problem: it still writes to xmm6, but does not use ymm6.

nagisa (Nov 19 2019 at 17:43, on Zulip):

the upper portions of YMM0-15 and ZMM0-15 are considered volatile and must be considered destroyed on function calls

also as per the docs.

eddyb (Nov 19 2019 at 17:44, on Zulip):

@Vincent Rouillé I think that's where the znver1 cost tables come in, it's supposedly faster/more efficient on Ryzens to do what LLVM is doing here

eddyb (Nov 19 2019 at 17:45, on Zulip):

@nagisa okay with 22+2*4 for the padding length I can see xmm7 and xmm8 also getting callee-saved

eddyb (Nov 19 2019 at 17:45, on Zulip):

above that it crosses a threshold and uses memcpy or something

Vincent Rouillé (Nov 19 2019 at 17:49, on Zulip):

it's funny to see that at 32, it stops using ymm6 and uses ymm7

eddyb (Nov 19 2019 at 17:52, on Zulip):

I summarized everything so far into https://github.com/rust-lang/rust/issues/63959#issuecomment-555627472

eddyb (Nov 19 2019 at 17:54, on Zulip):

@nagisa found at least where this is declared https://github.com/llvm/llvm-project/blob/e531750c6cf9ab6ca987ffbfe100b1d766269eb5/llvm/lib/Target/X86/X86CallingConv.td#L1071-L1072

eddyb (Nov 19 2019 at 17:54, on Zulip):

now that I know what to look for :D

eddyb (Nov 19 2019 at 17:57, on Zulip):

non-windows doesn't appear to have xmm callee-saved registers, at all? (there are some exceptions i.e. "regcall". hmmmmmmmm)

eddyb (Nov 19 2019 at 17:59, on Zulip):

we don't expose regcall in Rust?

eddyb (Nov 19 2019 at 18:00, on Zulip):

wait, now that we know it's the opaque_iter_next.... extern "win64" fn opaque_iter_next works on non-windows and produces very similar ASM...
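
A minimal sketch of that toggle is shown below. It is not the reproducer from the issue, only an illustration of putting the win64 calling convention on the uninlined helper so that a non-windows build sees the same callee-saved xmm registers. The cfg fallback for non-x86_64 hosts and the allow attribute are additions made here so the sketch builds everywhere.

```rust
/// Uninlined helper with the Windows x64 calling convention, even when
/// building for a non-windows x86_64 target.
#[cfg(target_arch = "x86_64")]
#[allow(improper_ctypes_definitions)]
#[inline(never)]
extern "win64" fn opaque_iter_next<I: Iterator>(iter: &mut I) -> Option<I::Item> {
    iter.next()
}

/// Fallback with the default Rust calling convention, only so the sketch
/// still compiles on hosts where "win64" is not available.
#[cfg(not(target_arch = "x86_64"))]
#[inline(never)]
fn opaque_iter_next<I: Iterator>(iter: &mut I) -> Option<I::Item> {
    iter.next()
}
```

Switching between the two definitions on the same x86_64 Linux host, with -C target-cpu=znver1, is then enough to compare the generated assembly without mingw or wine.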

eddyb (Nov 19 2019 at 18:06, on Zulip):

see https://github.com/rust-lang/rust/issues/63959#issuecomment-555633777

eddyb (Nov 19 2019 at 18:07, on Zulip):

I need to go grab something to eat but this feels like a big step

nagisa (Nov 19 2019 at 18:15, on Zulip):

should be trivial at this point to get to a LLVM-IR to report upstream

mati865 (Nov 19 2019 at 18:16, on Zulip):

For previous questions:
It's definitely not a hardware bug. You can reproduce crash on any CPU that has necessary registers/instructions.
This code works fine on Linux with 2nd gen Ryzen CPU.
I tried to use bugpoint but it appears to not work at all on windows.

nagisa (Nov 19 2019 at 18:16, on Zulip):

@mati865 now that reproducer works on linux, should be viable to use it

mati865 (Nov 19 2019 at 18:17, on Zulip):

Oh, neat!

eddyb (Nov 19 2019 at 18:21, on Zulip):

pnkfelix nagisa rkruppe I know how to use bugpoint for this!

have a harness that tests both non-windows and/or non-znver1 and windows+znver1 and only pretends to crash/fail when only windows+znver1 produces the wrong result

you still need something like this ^^

eddyb (Nov 19 2019 at 18:21, on Zulip):

if you want to use bugpoint

eddyb (Nov 19 2019 at 18:22, on Zulip):

but at least this time it can be an entirely linux-based procedure, and you just toggle whether opaque_id has win64 calling convention or not

mati865 (Nov 19 2019 at 18:22, on Zulip):

That's exactly what I did for creduce :upside_down:

eddyb (Nov 19 2019 at 18:22, on Zulip):

(you'd probably have to inject it into whatever bugpoint gives you, I'm guessing?)

eddyb (Nov 19 2019 at 18:24, on Zulip):

I've sunk enough time as it is into this, I guess if nobody has taken over in an hour or so I'll do that

eddyb (Nov 19 2019 at 18:26, on Zulip):

(I guess both win64 and regcall calling conventions could be attempted, I expect results from both, but maybe not identical)

mati865 (Nov 19 2019 at 18:26, on Zulip):

I'm out of time.
This probably proves #65618 is something entirely different?
It works on mingw but doesn't with msvc.

mati865 (Nov 19 2019 at 18:27, on Zulip):

And is also related to znver1.

nagisa (Nov 19 2019 at 18:27, on Zulip):

could very well be more of the same.

eddyb (Nov 19 2019 at 18:28, on Zulip):

yeah just more sensitive to certain things

mati865 (Nov 19 2019 at 18:30, on Zulip):

Shouldn't it reproduce on windows with mingw then?
Or maybe there is some platform specific code somewhere along.

Nikita Popov (Nov 19 2019 at 21:57, on Zulip):

Looking through debug logs, it looks like originally a correct register allocation is produced, but then critical anti-dep edge breaking replaces a ymm0 use with a conflicting ymm6 use.
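
The kind of check that has to hold when a pass renames a register can be sketched like this. It is a conceptual model only and not LLVM's actual implementation: the types Width and VecReg and both functions are invented for illustration. The naive check compares register names exactly, so it misses that ymm6 and xmm6 share storage; the overlap-aware check rejects the rename.

```rust
/// Width of the view onto a vector register.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Width {
    Xmm,
    Ymm,
}

/// A vector register name such as xmm6 or ymm0.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct VecReg {
    width: Width,
    index: u8,
}

/// xmmN is the low half of ymmN, so any two views with the same index overlap.
fn overlaps(a: VecReg, b: VecReg) -> bool {
    a.index == b.index
}

/// Buggy variant: only rejects a rename target that is live under the exact same name.
fn naive_rename_ok(candidate: VecReg, live: &[VecReg]) -> bool {
    !live.contains(&candidate)
}

/// Overlap-aware variant: rejects a target that aliases any live register.
fn rename_ok(candidate: VecReg, live: &[VecReg]) -> bool {
    live.iter().all(|&reg| !overlaps(candidate, reg))
}
```

With xmm6 live, renaming a ymm0 use to ymm6 passes the naive check but fails the overlap-aware one, which is the shape of conflict described in the message above.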

eddyb (Nov 19 2019 at 21:57, on Zulip):

ymm0 -> ymm6? not xmm0 -> xmm6?

eddyb (Nov 19 2019 at 21:58, on Zulip):

because xmm0 is being used when xmm6 is not available due to being callee-saved, so it would make sense that xmm0 -> xmm6 happens with win64 (or regcall, if you switch that manually in the IR)

eddyb (Nov 25 2019 at 18:14, on Zulip):

uhh it's been (far) more than an hour

eddyb (Nov 25 2019 at 18:15, on Zulip):

but right now I'm reducing bugpoint output

eddyb (Nov 25 2019 at 18:15, on Zulip):

this is what I've used to get that https://gist.github.com/eddyb/7b08cdc2ff0a40aed8adcfaf63120598

eddyb (Nov 25 2019 at 18:16, on Zulip):

bugpoint annoyingly leaves a bunch of things undef, which only do anything useful through the sheer power of strong coincidence, but I was able to get rid of it

eddyb (Nov 25 2019 at 18:43, on Zulip):

wrote it up on thread https://github.com/rust-lang/rust/issues/63959#issuecomment-558287117

eddyb (Nov 25 2019 at 18:44, on Zulip):

cc @nagisa @Nikita Popov @rkruppe does anyone volunteer to submit to LLVM?

eddyb (Nov 25 2019 at 18:44, on Zulip):

it's been a while for me

rkruppe (Nov 25 2019 at 18:45, on Zulip):

I still don't have an LLVM bugzilla account >_>

eddyb (Nov 25 2019 at 18:45, on Zulip):

oh wait I'm not sure I do, I forgot something happened at some point

eddyb (Nov 25 2019 at 18:47, on Zulip):

I guess I do, oh well I'll do it

eddyb (Nov 25 2019 at 19:22, on Zulip):

there we go: https://bugs.llvm.org/show_bug.cgi?id=44140

eddyb (Nov 25 2019 at 19:23, on Zulip):

sorry for all the delays thus far, this turned out to be far more manageable and not as much of a timesink as I was worrying

Nikita Popov (Nov 25 2019 at 22:29, on Zulip):

Based on ctopper's comment it looks like this is indeed caused by critical anti-dep breaking not handling subregs correctly. Sorry for not following up on that.

nagisa (Nov 26 2019 at 23:41, on Zulip):

@eddyb https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.2-Go-Register-Corrupt in related news.

eddyb (Nov 27 2019 at 13:03, on Zulip):

oof

pnkfelix (Nov 28 2019 at 14:30, on Zulip):

hey @eddyb given that you've identified this as an LLVM bug, what next for #63959 itself? Should I close it as "LLVM bug" ? Or downgrade it to P-medium at least? I don't think it warrants further investigation on our part, right?

eddyb (Nov 28 2019 at 16:03, on Zulip):

@pnkfelix I think the issue should be closed by a PR that updates LLVM to include a backport of the fix

pnkfelix (Nov 28 2019 at 16:04, on Zulip):

oh, right, that's probably the way to go

eddyb (Nov 28 2019 at 16:04, on Zulip):

could be done as soon as the fix lands upstream, or even earlier if we don't want to wait. I wish I knew who also handles stuff like this (i.e. LLVM updates and backports), other than @Alex Crichton. worst case someone ping me and I can probably do it

pnkfelix (Nov 28 2019 at 16:05, on Zulip):

thanks!

eddyb (Nov 28 2019 at 16:05, on Zulip):

not sure if we have a strategy for backporting LLVM updates due to LLVM backports back to beta

eddyb (Nov 28 2019 at 16:05, on Zulip):

(that was... a mouthful)

mati865 (Nov 28 2019 at 16:08, on Zulip):

Upstream fix landed in llvm master branch already.
AFAIK anybody can open backport PR, even I did it in the past.

eddyb (Nov 28 2019 at 16:12, on Zulip):

Upstream fix landed in llvm master branch already.

wheeeee

eddyb (Nov 28 2019 at 16:12, on Zulip):

wait wouldn't the bug report get closed and therefore I should've gotten an email?

mati865 (Nov 29 2019 at 09:27, on Zulip):

@eddyb should I open backport PR (after testing OFC) or do you want to do it?

eddyb (Nov 29 2019 at 15:17, on Zulip):

@mati865 you can do it, I'm swamped again

mati865 (Nov 29 2019 at 15:18, on Zulip):

Sure

Last update: Dec 12 2019 at 01:00UTC