Stream: project-inline-asm

Topic: WebAssembly


Josh Triplett (Dec 15 2019 at 17:31, on Zulip):

LLVM, and Rust's current asm!, both support inline assembly for WebAssembly. Take a look at https://godbolt.org/z/h9C4KE (previously linked from an internals thread). I think it seems worthwhile to document such support in the initial RFC.

Josh Triplett (Dec 15 2019 at 17:32, on Zulip):

I'm currently looking through LLVM's source trying to find more information on the constraints LLVM supports for inline assembly on a WebAssembly target, since the documentation doesn't talk about that.

Josh Triplett (Dec 15 2019 at 17:55, on Zulip):

...interesting. Apparently that backend only supports the "r" constraint, treating it as an index for a local; you can't access things directly from the stack in any way. That makes a certain amount of sense, and hopefully the optimizer can eliminate inefficiencies like loading the same local multiple times.

Josh Triplett (Dec 15 2019 at 17:55, on Zulip):

So, it seems worth documenting how asm! will work with WebAssembly, as a somewhat unique backend that doesn't have named registers at all.

Amanieu (Dec 15 2019 at 18:04, on Zulip):

Is there any actual use case for inline asm for wasm though? The only one I can think of is hint::black_box.

Josh Triplett (Dec 15 2019 at 18:21, on Zulip):

Sure. Directly writing algorithms in assembly, experimenting with new instructions...

Lokathor (Dec 16 2019 at 16:37, on Zulip):

A direct line to WASM sqrt on stable would be nice. Currently you have to call the unstable core intrinsic (Nightly) or go through std (and the rest of the crate might otherwise be no_std).

Josh Triplett (Dec 16 2019 at 17:07, on Zulip):

This shouldn't stop us from stabilizing intrinsics like that, but yes.

Lokathor (Dec 16 2019 at 17:22, on Zulip):

Oh there's a whole mess of issues on that particular intrinsic :P

Josh Triplett (Dec 16 2019 at 19:26, on Zulip):

Oh? Do tell.

Lokathor (Dec 16 2019 at 22:33, on Zulip):

(deleted)

comex (Dec 17 2019 at 06:58, on Zulip):

The optimizer can't eliminate loading the same local multiple times if you do it within an asm block.

comex (Dec 17 2019 at 06:59, on Zulip):

Unless you want to deviate from the general behavior that the optimizer doesn't inspect asm blocks.

comex (Dec 17 2019 at 07:01, on Zulip):

I think that ideally we'd have a more efficient way to pass inputs to WebAssembly asm blocks, which would put things directly on the stack for you. Of course, that would require backend support, and it's far from urgent.

Josh Triplett (Dec 17 2019 at 11:27, on Zulip):

I don't mean the rustc optimizer in that case.

Josh Triplett (Dec 17 2019 at 11:28, on Zulip):

I mean the wasm optimizer. It should trivially be able to eliminate "store to 1 then load to 1".
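
As a rough illustration of the "store to 1 then load to 1" cleanup mentioned above, here is a toy peephole pass in Rust. It is only a sketch under stated assumptions: the `Instr` enum and the `peephole` function are hypothetical names invented for this example, not part of wasm-opt or any real tool. It fuses `local.set N` followed by `local.get N` into `local.tee N` rather than deleting both, since dropping the store is only valid when the local is never read again.

```rust
/// A tiny, hypothetical subset of WebAssembly instructions (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    LocalGet(u32),
    LocalSet(u32),
    LocalTee(u32),
    I32Add,
}

/// Fuse `local.set N` immediately followed by `local.get N` into `local.tee N`.
/// All other instructions are copied through unchanged.
fn peephole(input: &[Instr]) -> Vec<Instr> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        match (&input[i], input.get(i + 1)) {
            // Same local index on both sides: one `local.tee` does the same job.
            (Instr::LocalSet(a), Some(Instr::LocalGet(b))) if a == b => {
                out.push(Instr::LocalTee(*a));
                i += 2;
            }
            (instr, _) => {
                out.push(instr.clone());
                i += 1;
            }
        }
    }
    out
}

fn main() {
    // Computes local0 + local0, stores it in local1, then adds local0 again.
    let code = vec![
        Instr::LocalGet(0),
        Instr::LocalGet(0),
        Instr::I32Add,
        Instr::LocalSet(1),
        Instr::LocalGet(1),
        Instr::LocalGet(0),
        Instr::I32Add,
    ];
    let optimized = peephole(&code);
    println!("before: {:?}", code);
    println!("after:  {:?}", optimized);
}
```

A real optimizer would also need liveness information to decide when the store itself can be removed; this sketch deliberately avoids that question.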

gnzlbg (Dec 19 2019 at 09:55, on Zulip):

@Josh Triplett which wasm optimizer do you use?

gnzlbg (Dec 19 2019 at 09:56, on Zulip):

AFAIK the way to optimize wasm is to generate it from, e.g., C code, using an optimizing compiler toolchain to WASM (this won't optimize that because the toolchain won't look at the inline assembly string)

gnzlbg (Dec 19 2019 at 09:57, on Zulip):

but once that is generated, most (all?) of the WASM -> machine code generators will just generate machine code without performing any significant optimizations

gnzlbg (Dec 19 2019 at 09:57, on Zulip):

and AFAIK none of them optimizes across two consecutive wasm instructions

gnzlbg (Dec 19 2019 at 09:59, on Zulip):

Is there a tool that takes WASM and optimizes it into better WASM?

Josh Triplett (Dec 19 2019 at 15:32, on Zulip):

There's wasm-opt.

gnzlbg (Dec 23 2019 at 12:37, on Zulip):

Cool! Does it run as part of the LLVM pipeline?

Lokathor (Dec 23 2019 at 14:46, on Zulip):

I believe you're generally expected to run it yourself as part of your make script

gnzlbg (Dec 28 2019 at 13:34, on Zulip):

We would kind of need LLVM to guarantee never to run those as part of its pipeline.

gnzlbg (Dec 28 2019 at 13:34, on Zulip):

(If the semantics of asm!("...") are that we never inspect the assembly string, since otherwise running those would "modify" the assembly string)

gnzlbg (Dec 28 2019 at 13:35, on Zulip):

That is, an argument in favor of not specifying that the assembly string is preserved "verbatim" (modulo interpolation) is that such a guarantee would prevent optimizations on the resulting machine code, e.g., via wasm-opt for WASM.

Amanieu (Dec 28 2019 at 17:28, on Zulip):

I personally think that we should simply not expose inline asm for wasm.

Amanieu (Dec 28 2019 at 17:28, on Zulip):

On most architectures, people really do rely on the fact that the exact instructions specified in the asm block are the ones that are executed. This is observable in various ways such as signal handlers, debugging through ptrace, and even just reading the instruction bytes from memory. None of these are available in WASM, which is why the compiler is allowed to mess around with instructions at the machine code level.

Lokathor (Dec 28 2019 at 18:43, on Zulip):

We should expose it

Lokathor (Dec 28 2019 at 18:43, on Zulip):

We should simply say that on wasm it's not assured to include those final instructions, if we have to do that

Lokathor (Dec 28 2019 at 20:35, on Zulip):

What if we initially allowed inline wasm but not with the volatile effect?

Amanieu (Dec 28 2019 at 20:53, on Zulip):

I'd rather just allow it with that caveat than mess with the asm! flags.

Lokathor (Dec 29 2019 at 01:16, on Zulip):

Also works

gnzlbg (Jan 02 2020 at 13:58, on Zulip):

The only thing that worries me is that this is true for WASM today

gnzlbg (Jan 02 2020 at 13:58, on Zulip):

it's unclear to me whether this will continue to be true for WASM in the future

gnzlbg (Jan 02 2020 at 13:59, on Zulip):

For example, consider https://github.com/WebAssembly/simd/issues/118

gnzlbg (Jan 02 2020 at 14:00, on Zulip):

For some final "targets" some WASM code generators can lower some vector shuffle sequences more efficiently than others

gnzlbg (Jan 02 2020 at 14:00, on Zulip):

that is, people actually want to specify different vector shuffles for WASM depending on which engine and which final target that WASM is going to end up running on

gnzlbg (Jan 02 2020 at 14:01, on Zulip):

One can currently use inline assembly to tell LLVM to emit a particular sequence of instructions, e.g., one that is known to be faster than the "optimized WASM" on, say, V8 targeting x86

gnzlbg (Jan 02 2020 at 14:03, on Zulip):

Arguably, this is a very hacky thing to do in the first place, but these hacks all break down if LLVM is able to optimize the WASM inline assembly, or if a generic WASM optimizer is used after the fact

Lokathor (Jan 02 2020 at 21:01, on Zulip):

well nothing can ever assure that no one anywhere will use wasm-opt after the fact

Lokathor (Jan 02 2020 at 21:01, on Zulip):

the same as you can't assure the programmer that no one will call strip or objcopy on their program after the fact

Lokathor (Jan 02 2020 at 21:02, on Zulip):

so it comes down to how much info LLVM and other backend options will allow us to specify regarding "please don't do anything to alter this text",

Lokathor (Jan 02 2020 at 21:03, on Zulip):

and if the final wasm is then further post-processed after rustc puts it on disk, that's user error

Josh Triplett (Jan 02 2020 at 21:11, on Zulip):

Not necessarily user error. It's perfectly acceptable to run wasm-opt or similar tools.

Josh Triplett (Jan 02 2020 at 21:11, on Zulip):

We should guarantee that rustc won't unexpectedly modify the assembly.

Josh Triplett (Jan 02 2020 at 21:12, on Zulip):

I don't see that as incompatible with providing options that allow the user to invoke wasm-opt or similar, or with letting LLVM's backend run wasm-opt or equivalent, as long as there's a way to avoid that if you don't want it.

gnzlbg (Jan 03 2020 at 13:59, on Zulip):

@Lokathor @Josh Triplett FWIW I was responding to @Amanieu :

I personally think that we should simply not expose inline asm for wasm.

On most architectures, people really do rely on the fact that the exact instructions specified in the asm block are the ones that are executed. This is observable in various ways such as signal handlers, debugging through ptrace, and even just reading the instruction bytes from memory. None of these are available in WASM, which is why the compiler is allowed to mess around with instructions at the machine code level.

by providing an example where WASM users are already relying today on inline assembly not modifying their assembly strings.

Last update: Jan 28 2020 at 00:40UTC