Stream: project-inline-asm

Topic: Big endian


Amanieu (Feb 26 2020 at 15:49, on Zulip):

I think the current approach of accepting any type as long as it fits in the target register won't work on big endian.

Amanieu (Feb 26 2020 at 15:51, on Zulip):

For example if you want to put an i8 into a 128-bit SIMD register, you can either zero-extend to i128 and transmute, or convert to i8x1 and then zero-extend the vector to i8x16.

Amanieu (Feb 26 2020 at 15:52, on Zulip):

On big-endian those two operations will yield different results.
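
A minimal sketch of that difference, modelled in plain Rust rather than with real vector registers. The function names are hypothetical, and byte arrays (index 0 = lowest address, or lane 0) stand in for the 128-bit register image; this is an illustration under those assumptions, not how the compiler lowers operands.

```rust
/// Zero-extend an i8 to 128 bits, then reinterpret the integer as 16 bytes.
/// `big_endian` selects the byte order a transmute would observe on that target.
fn widen_then_transmute(x: i8, big_endian: bool) -> [u8; 16] {
    let wide = x as u8 as u128; // zero-extension to 128 bits
    if big_endian {
        wide.to_be_bytes()
    } else {
        wide.to_le_bytes()
    }
}

/// Treat the i8 as an i8x1 and zero-extend the vector to i8x16:
/// lane 0 holds the value, lanes 1..16 are zero, independent of endianness.
fn widen_as_vector(x: i8) -> [u8; 16] {
    let mut lanes = [0u8; 16];
    lanes[0] = x as u8;
    lanes
}
```

In this model the two functions agree for a little-endian target, while for a big-endian target the integer path puts the value in the last byte and the vector path keeps it in lane 0.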

Amanieu (Feb 26 2020 at 15:52, on Zulip):

cc @Josh Triplett since you were arguing for allowing i128 operands for vector registers.

Amanieu (Feb 26 2020 at 15:53, on Zulip):

IMO we have to go back to the old system where only certain types are allowed as asm operands, depending on the register class. (Rather than just checking the size)

Josh Triplett (Feb 26 2020 at 15:55, on Zulip):

Registers don't normally have endianness, but vector registers are an interesting case...

Josh Triplett (Feb 26 2020 at 15:56, on Zulip):

But I can't imagine treating an i8 as i8x1 and extending to i8x16.

Josh Triplett (Feb 26 2020 at 15:57, on Zulip):

Trying to use an i8 where a larger register size is expected only makes sense as the equivalent of an "as" cast to me.

Josh Triplett (Feb 26 2020 at 15:57, on Zulip):

(also, that would be a sign extend)
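
For reference, a short sketch of what an "as" cast does when widening. The helper names are hypothetical; the behaviour shown is standard Rust: signed sources sign-extend, unsigned sources zero-extend.

```rust
/// Widening a signed 8-bit value with `as` sign-extends it.
fn widen_signed(x: i8) -> i64 {
    x as i64
}

/// Widening an unsigned 8-bit value with `as` zero-extends it.
fn widen_unsigned(x: u8) -> u64 {
    x as u64
}
```

So `widen_signed(-1)` has all 64 bits set, whereas `widen_unsigned(0xff)` only has the low 8 bits set.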

Josh Triplett (Feb 26 2020 at 15:58, on Zulip):

All that said, I don't know that I would expect to be able to use an i8 as a 128-bit register directly. I expect that to work on a general purpose register because it may be already stored in one.

Amanieu (Feb 26 2020 at 16:07, on Zulip):

Another example on AArch64. There are separate instructions for loading an i64 and i8x8 from memory, and they return different register contents on big endian from the same bytes in memory.
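
A simplified model of the two big-endian AArch64 loads, in plain Rust. It assumes the scalar load places the byte at the lowest address in the most significant position, and the element-wise vector load places element i in lane i (bits 8*i to 8*i+7). The function names are hypothetical and no real load instruction is executed.

```rust
/// Model of a big-endian 64-bit scalar load: the byte at the lowest
/// address ends up in the most significant bits of the register.
fn scalar_load_be(mem: [u8; 8]) -> u64 {
    u64::from_be_bytes(mem)
}

/// Model of an element-wise load of eight 8-bit lanes: the byte at
/// address i ends up in lane i, i.e. bits 8*i..8*i+8 of the register.
fn vector_load_lanes(mem: [u8; 8]) -> u64 {
    let mut reg = 0u64;
    for (i, byte) in mem.iter().enumerate() {
        reg |= (*byte as u64) << (8 * i);
    }
    reg
}
```

With the same eight bytes 1 through 8 in memory, the scalar model yields 0x0102030405060708 and the lane model yields 0x0807060504030201.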

Amanieu (Feb 26 2020 at 16:22, on Zulip):

Well then, time to redesign this shit again...

Josh Triplett (Feb 26 2020 at 18:34, on Zulip):

I think we'd never want to convert to a vector "implicitly"

Josh Triplett (Feb 26 2020 at 18:34, on Zulip):

But converting from i8 to an i64 register makes sense.

Josh Triplett (Feb 26 2020 at 18:34, on Zulip):

It makes even more sense on x86 where it may already be in al.

Amanieu (Feb 26 2020 at 18:51, on Zulip):

My current plan is to specify the contents of registers as if the input value had been loaded using a specific instruction X.

Amanieu (Feb 26 2020 at 18:52, on Zulip):

So for example, on ARM, vector registers would be as if loaded by the LD1 instruction, which handles big-endian correctly.

Amanieu (Feb 26 2020 at 18:53, on Zulip):

Or on x86 for k[1-7] registers, it is specified as if the value was loaded by KMOVB, KMOVW, KMOVD, KMOVQ.

Josh Triplett (Feb 26 2020 at 19:11, on Zulip):

That sounds reasonable to me.

Amanieu (Feb 26 2020 at 19:12, on Zulip):

But this means that loading i8 into an SSE register is not allowed

Josh Triplett (Feb 26 2020 at 19:12, on Zulip):

I can live with that.

Amanieu (Feb 26 2020 at 19:12, on Zulip):

Because there is no instruction for that. There is only movss and movsd for i32 and i64

Josh Triplett (Feb 26 2020 at 19:12, on Zulip):

I expect to be able to load an i8, i16, i32, or i64 into a general-purpose register.

Josh Triplett (Feb 26 2020 at 19:13, on Zulip):

But I don't necessarily expect to be able to load an i8 directly into an SSE register.

Amanieu (Feb 26 2020 at 19:13, on Zulip):

Sure, there are byte load, i16 load, etc instructions

Amanieu (Feb 26 2020 at 19:13, on Zulip):

for general regs only

Josh Triplett (Feb 26 2020 at 19:13, on Zulip):

Well, there's also al. ;)

Amanieu (Feb 26 2020 at 19:14, on Zulip):

I'm using loading from memory for the spec, anything else is a compiler optimization

Josh Triplett (Feb 26 2020 at 19:14, on Zulip):

Sure.

Josh Triplett (Feb 26 2020 at 19:14, on Zulip):

Though...

Josh Triplett (Feb 26 2020 at 19:14, on Zulip):

That still leaves the question of whether we can avoid doing a movzx.

Josh Triplett (Feb 26 2020 at 19:17, on Zulip):

I would like to be able to say "that's already in al, and you asked for in(reg), so here's al".

Amanieu (Feb 26 2020 at 19:28, on Zulip):

Hmm good point. My specification does leave it somewhat ambiguous as to what value the upper bits will have.

Amanieu (Feb 26 2020 at 19:29, on Zulip):

Strictly speaking a load instruction (on ARM) will zero out the upper bits. But we don't want that since that's a useless zero-extension

Amanieu (Feb 26 2020 at 19:29, on Zulip):

/me goes back to poking at LLVM internals to figure out how this stuff works

Josh Triplett (Feb 26 2020 at 19:37, on Zulip):

What I'd like to do is to detect the type you hand the asm! directive, provide the corresponding register size, and then rely on the asm string to do with that (potentially smaller) register what it wants. Someone can always movzx from a short register into a long one, if they need a full-width value.

Lokathor (Feb 26 2020 at 19:37, on Zulip):

Does the "load with zero extension" take more time than "load without zero extension"? All the bits effectively get determined in parallel, right? Since at that level it's electrical signals per wire and such.

Josh Triplett (Feb 26 2020 at 19:37, on Zulip):

Note that that isn't giving people access to undefined bits, because being handed al does not entitle you to presume anything about the value of the upper 56 bits.

Josh Triplett (Feb 26 2020 at 19:38, on Zulip):

@Lokathor movzx costs more than "do nothing, it's already in a register". ;)

Josh Triplett (Feb 26 2020 at 19:38, on Zulip):

I think it's architecture-dependent whether movzx is more expensive than mov.

Josh Triplett (Feb 26 2020 at 19:39, on Zulip):

On a modern processor, movzx shouldn't be any more expensive than mov, but it's still more expensive than doing nothing. ;)

Amanieu (Feb 26 2020 at 19:42, on Zulip):

If you need zero-extension then you can always just cast with "as u64".

Lokathor (Feb 26 2020 at 19:43, on Zulip):

Ah, so we're comparing with doing nothing at all. Right

Amanieu (Feb 26 2020 at 19:44, on Zulip):

Basically it comes from the calling convention: on most platforms if you pass an i8 (or anything smaller than the register size), the upper bits are undefined.

Lokathor (Feb 26 2020 at 19:52, on Zulip):

For ASM stuff I've exclusively seen docs given in terms of old C code where (naturally) arguments are always just "int"

Amanieu (Feb 26 2020 at 19:55, on Zulip):

Yea inline asm is not really specified properly at all.

Josh Triplett (Feb 26 2020 at 20:33, on Zulip):

Amanieu said:

If you need zero-extension then you can always just cast with "as u64".

Agreed completely. I think in(reg) some_u8_value should pass it in the low 8 bits of a general-purpose register, and in(reg) some_u8_value as u64 can zero-extend, and those are all the options we need.
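
A hedged sketch of the first option, written against the asm! macro as it exists in stable Rust today rather than the design under discussion here. One difference to note: on x86-64 the stable macro spells the 8-bit operand class reg_byte instead of in(reg). The function name is hypothetical, and the non-x86-64 fallback exists only so the sketch compiles elsewhere.

```rust
/// Receives a u8 in an 8-bit register and lets the asm string widen it.
#[cfg(target_arch = "x86_64")]
fn zero_extend_in_asm(x: u8) -> u64 {
    use std::arch::asm;
    let dst: u64;
    // SAFETY: the asm only reads `src` and writes `dst`; it touches no memory.
    unsafe {
        asm!(
            // Writing the 32-bit sub-register also clears bits 32..64.
            "movzx {dst:e}, {src}",
            dst = out(reg) dst,
            src = in(reg_byte) x,
            options(pure, nomem, nostack),
        );
    }
    dst
}

/// Fallback for other targets: the plain cast, which also zero-extends.
#[cfg(not(target_arch = "x86_64"))]
fn zero_extend_in_asm(x: u8) -> u64 {
    x as u64
}
```

The asm string only relies on the low 8 bits of the input register, which matches the point that being handed a byte register says nothing about the upper bits.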

Amanieu (Feb 26 2020 at 20:36, on Zulip):

The only architecture which does not specify the upper bits as undefined is RISC-V.

Amanieu (Feb 26 2020 at 20:37, on Zulip):

It's a bit weird in that if you pass a f32 into inline asm, it gets NaN-boxed into an f64.

Amanieu (Feb 26 2020 at 20:39, on Zulip):

... and since inline asm doesn't have a spec, I don't know whether this is intentional or if this is an accident of the upper bits being undefined.

Amanieu (Feb 26 2020 at 20:40, on Zulip):

(RISC-V single-precision instructions require that the upper 32 bits of the 64-bit FP register are 0xffffffff, otherwise the result is NaN)
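
A small sketch of NaN-boxing as described above, done with bit manipulation in plain Rust. The helper names are hypothetical, and the unboxing rule (anything not boxed with 0xffffffff in the upper half reads back as a NaN) is a simplified model of the behaviour, not a full account of the ISA.

```rust
/// Place an f32 in the low 32 bits of a 64-bit FP register image,
/// with the upper 32 bits set to all ones.
fn nan_box(x: f32) -> u64 {
    0xffff_ffff_0000_0000 | x.to_bits() as u64
}

/// Read an f32 back out: a value that is not properly boxed is
/// treated as NaN by single-precision instructions in this model.
fn unbox(bits: u64) -> f32 {
    if bits >> 32 == 0xffff_ffff {
        f32::from_bits(bits as u32)
    } else {
        f32::NAN
    }
}
```

Interpreted as an f64, a boxed value is always a NaN, since the exponent field is all ones and the mantissa is non-zero; that is where the name comes from.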

Last update: Jul 02 2020 at 20:00 UTC