Stream: t-compiler/wg-llvm

Topic: Intrinsic for freezing bytes


aqjune (Sep 19 2019 at 10:29, on Zulip):

Hi all,

There was a discussion at https://bugs.llvm.org/show_bug.cgi?id=42435 with @RalfJ about introducing an operation that replaces uninitialized bytes with arbitrary (but defined) bytes, which would be helpful for implementing fast memory initialization in Rust (e.g. https://github.com/rust-lang/rust/pull/58363 ).
Here is a prototype patch that adds an llvm.freeze_mem intrinsic to LLVM: https://github.com/aqjune/llvm-freeze/tree/freeze_mem .
It is lowered either to no-op assembly or to a single one-byte load/store round trip; the latter addresses the case where the OS may change the bytes of untouched memory.

Rather than pushing this patch myself, it would be great if someone who works on both LLVM and Rust could take it over and drive it, because my colleagues and I are already pushing the freeze instruction to LLVM. If we also pushed llvm.freeze_mem, it might distract people (e.g. they could assume that we are suggesting an alternative to freeze, which we are not), which is not desirable.

Currently this patch works without crashing the compiler and has been tested with SPEC CPU2017 (I tested with a Clang that inserts llvm.freeze_mem after each alloca). It has been tested on x86-64, but not on other architectures.
Also, several memory-related optimizations should be updated so that they are aware of this intrinsic; those updates are not included in the patch above.

Introductions to anyone who might be interested in this patch are very welcome.
I'll gladly help if you have any questions. :)

gnzlbg (Sep 19 2019 at 12:56, on Zulip):

If it is true, this is lowered into assembly that reads a random one byte
between %ptr and %ptr + %size - 1 and writes it back to the same
address.

Note that this is probably overkill

gnzlbg (Sep 19 2019 at 12:56, on Zulip):

You only need to touch one byte per memory page.

gnzlbg (Sep 19 2019 at 12:59, on Zulip):

After freeze_mem(ptr, size, touch_mem) is called, it is
guaranteed that there is no undef or poison bit in bytes from ptr to
ptr + size - 1. Calculation of ptr + size - 1 follows the way how
getelementptr is evaluated.

There are no tests checking that this is the case AFAICT

gnzlbg (Sep 19 2019 at 13:00, on Zulip):

Also, several memory-related optimizations should be updated so that they are aware of this intrinsic; those updates are not included in the patch above.

Is there a different patch that includes these?

gnzlbg (Sep 19 2019 at 13:01, on Zulip):

If touch_mem is true and size is larger than the size of one page
(which is determined at runtime),
only one of the pages that this freeze_mem spans is touched.

This behavior is incorrect, right?

gnzlbg (Sep 19 2019 at 13:01, on Zulip):

Each page must be touched

gnzlbg (Sep 19 2019 at 13:01, on Zulip):

otherwise the content of these other pages has not been frozen

aqjune (Sep 19 2019 at 13:23, on Zulip):

Hi @gnzlbg ! :D

If touch_mem is true and size is larger than the size of one page
(which is determined at runtime),
only one of the pages that this freeze_mem spans is touched.

This behavior is incorrect, right?

It depends on how Rust inserts llvm.freeze_mem.
In my patch, I was assuming that Rust should insert a loop that looks like this:

while (ptr < ptr_end) {
  llvm.freeze_mem(ptr, min(PAGE_SIZE, ptr_end - ptr))
  ptr += PAGE_SIZE
}

Another choice would be to let llvm.freeze_mem itself emit assembly with a loop, but I guess the result would be the same if LLVM's loop analysis is taught what llvm.freeze_mem does.
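
A minimal Rust sketch of how a frontend might compute the per-page calls in the loop above. It is a variant, not the patch's code: unlike the pseudocode, it clips each chunk at a page boundary so that no single call straddles two pages. `PAGE_SIZE` is hard-coded to 4096 as an assumption, and `page_chunks` is a hypothetical helper name; addresses are plain integers so the arithmetic can be checked without touching memory.

```rust
/// Assumed page size, for illustration only; a real frontend would have to
/// query the target or the OS.
const PAGE_SIZE: usize = 4096;

/// Splits the byte range [start, end) into (address, length) chunks such that
/// every chunk lies entirely within one page. Each chunk stands for one
/// hypothetical llvm.freeze_mem(ptr, size) call.
fn page_chunks(start: usize, end: usize) -> Vec<(usize, usize)> {
    let mut chunks = Vec::new();
    let mut ptr = start;
    while ptr < end {
        // First address of the page that follows the one containing `ptr`.
        let next_page = (ptr / PAGE_SIZE + 1) * PAGE_SIZE;
        // Stop at the page boundary or at the end of the range, whichever
        // comes first.
        let len = next_page.min(end) - ptr;
        chunks.push((ptr, len));
        ptr += len;
    }
    chunks
}
```

For example, `page_chunks(4000, 9000)` yields `(4000, 96)`, `(4096, 4096)` and `(8192, 808)`: one chunk for the tail of the first page, one full page, and one chunk for the head of the last page.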

aqjune (Sep 19 2019 at 13:25, on Zulip):

Also, several memory-related optimizations should be updated so that they are aware of this intrinsic; those updates are not included in the patch above.

Is there a different patch that includes these?

No, this is still at a very early stage, so there are no optimizer updates yet.

aqjune (Sep 19 2019 at 13:26, on Zulip):

You only need to touch one byte per memory page.

This is true - but it is pretty hard to statically determine in a compiler whether two allocations are in the same page, I believe, especially because unused mallocs can be removed.

gnzlbg (Sep 19 2019 at 13:27, on Zulip):

In my patch, I was assuming that Rust should insert a loop that looks like this:

That makes sense

gnzlbg (Sep 19 2019 at 13:29, on Zulip):

I'm not sure whether it is better to have an intrinsic with the touch option, or two intrinsics, but that probably does not matter

gnzlbg (Sep 19 2019 at 13:30, on Zulip):

It would be great if there were an optimization that turns a freeze with touch == true into one with touch == false

gnzlbg (Sep 19 2019 at 13:30, on Zulip):

E.g., if the memory is on the stack, then there is no need to touch it, ever, AFAICT

gnzlbg (Sep 19 2019 at 13:32, on Zulip):

What kind of load does freeze with touch = true currently perform? In the code it seems to do a volatile load

gnzlbg (Sep 19 2019 at 13:32, on Zulip):

But wouldn't this be UB if that load were to race?

aqjune (Sep 19 2019 at 13:33, on Zulip):

It would be great if there were an optimization that turns a freeze with touch == true into one with touch == false

Yep, this would be a great optimization. The transformation is valid when a store to any pointer within the range precedes or directly follows the llvm.freeze_mem.

gnzlbg (Sep 19 2019 at 13:33, on Zulip):

Avoiding the race would need a volatile relaxed load, but I don't know if that is supported on all backends

aqjune (Sep 19 2019 at 13:36, on Zulip):

But wouldn't this be UB if that load were to race?

Yeah, a data race should be UB by definition. It can also raise UB if the given pointer is not dereferenceable (e.g. a dangling pointer).
If touch_mem is true, the load is marked volatile, but it does not really need to be a volatile load. Volatile is used only for convenience, because adding a new memory attribute in SelectionDAG seemed like pretty hard work.

aqjune (Sep 19 2019 at 13:36, on Zulip):

Not by definition, sorry. It should be added to the definition in LangRef.

gnzlbg (Sep 19 2019 at 13:39, on Zulip):

I see. So I'd recommend updating the docs to include this information.

RalfJ (Oct 09 2019 at 12:55, on Zulip):

You only need to touch one byte per memory page.

This is true - but it is pretty hard to statically determine in a compiler whether two allocations are in the same page, I believe, especially because unused mallocs can be removed.

it should be fairly easy though to do a loop that increments by 4096 and touches bytes until leaving the given range? the backend should know the page size
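
A rough Rust sketch of such a loop, under stated assumptions: `PAGE_SIZE` is hard-coded to 4096, `touch_pages` is a hypothetical name, and the buffer is already initialized, because reading uninitialized memory is UB at the Rust source level (the real touch would have to be emitted by the backend). Volatile accesses are used here only to keep the compiler from deleting the load/store round trip.

```rust
use std::ptr;

/// Assumed page size, for illustration only.
const PAGE_SIZE: usize = 4096;

/// Reads one byte in every page that `buf` overlaps and writes the same byte
/// back. Returns the number of bytes touched, i.e. the number of pages hit.
fn touch_pages(buf: &mut [u8]) -> usize {
    let base = buf.as_mut_ptr();
    let addr = base as usize;
    let len = buf.len();
    let mut touched = 0;
    let mut offset = 0;
    while offset < len {
        // SAFETY: `offset < len`, so the pointer stays inside `buf`.
        unsafe {
            let p = base.add(offset);
            let byte = ptr::read_volatile(p);
            ptr::write_volatile(p, byte);
        }
        touched += 1;
        // Jump to the first byte of the next page, measured from `base`.
        let next_page = ((addr + offset) / PAGE_SIZE + 1) * PAGE_SIZE;
        offset = next_page - addr;
    }
    touched
}
```

The first touch lands on the first byte of the buffer and every later one on a page boundary, so the contents are unchanged and each overlapped page is hit exactly once. How many pages that is depends on how the allocation happens to be aligned.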

RalfJ (Oct 09 2019 at 12:56, on Zulip):

@aqjune I think touching all the pages is needed, or else we cannot give this intrinsic a sensible semantics in the LLVM abstract machine

RalfJ (Oct 09 2019 at 12:56, on Zulip):

saying that "it freezes memory" is just incorrect with a compilation scheme that does not touch all pages

RalfJ (Oct 09 2019 at 12:57, on Zulip):

also read-write races are not UB in LLVM, so LLVM lowering could easily do a normal non-atomic read here

aqjune (Oct 10 2019 at 06:09, on Zulip):

I am not sure whether there is an assembly instruction that returns the page size, but on the frontend side Linux has the getpagesize() API. Calling getpagesize() may have a cost, so I believe it is better for the programmer to call it explicitly, so that one can write optimized code.

aqjune (Oct 10 2019 at 06:11, on Zulip):

For the semantics of llvm.freeze_mem(ptr, size), we can say that it freezes one of the pages that [ptr, ptr+size) lies on. We can say that the page is given by the environment, just as the pointer size is.

aqjune (Oct 10 2019 at 06:15, on Zulip):

BTW, one thing I realized about llvm.freeze_mem(ptr, size) is that, if ptr and (ptr + size - 1) are in different pages, touching *ptr is not enough to freeze the page of *(ptr + size - 1).
To support this, llvm.freeze_mem should be lowered into touches of both *ptr and *(ptr + size - 1); I hope this would work.
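
A small Rust check of the arithmetic behind this, assuming a 4096-byte page (`PAGE_SIZE`, `pages_spanned` and `endpoints_cover_range` are illustrative names, not part of the patch). Touching both endpoints reaches every page only when the range spans at most two pages, which always holds when size is at most one page, as in the per-page loop suggested earlier in this thread.

```rust
/// Assumed page size, for illustration only.
const PAGE_SIZE: usize = 4096;

/// Number of pages that the byte range [ptr, ptr + size) overlaps.
fn pages_spanned(ptr: usize, size: usize) -> usize {
    if size == 0 {
        return 0;
    }
    let first_page = ptr / PAGE_SIZE;
    let last_page = (ptr + size - 1) / PAGE_SIZE;
    last_page - first_page + 1
}

/// True if touching only *ptr and *(ptr + size - 1) reaches every page that
/// the range overlaps.
fn endpoints_cover_range(ptr: usize, size: usize) -> bool {
    pages_spanned(ptr, size) <= 2
}
```

With ptr = 4000 and size = 200 the range spans two pages and the two touches are enough. With ptr = 4000 and size = 5000 it spans three pages, and the middle page is never touched.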

RalfJ (Oct 10 2019 at 08:21, on Zulip):

For the semantics of llvm.freeze_mem(ptr, size), we can say that it freezes one of the pages that [ptr, ptr+size) lies on. We can say that the page is given by the environment, just as the pointer size is.

so pages are part of the LLVM abstract machine now? I think that'd be new

Last update: Nov 15 2019 at 10:50UTC