Hi all,
There was a discussion at https://bugs.llvm.org/show_bug.cgi?id=42435 with @RalfJ about introducing an operation that replaces uninitialized bytes with arbitrary (but defined) bytes, which would be helpful for implementing fast memory initialization in Rust (e.g. https://github.com/rust-lang/rust/pull/58363 ).
Here is a prototype patch that adds an intrinsic, llvm.freeze_mem, to LLVM: https://github.com/aqjune/llvm-freeze/tree/freeze_mem .
It is lowered either into no assembly at all (a no-op) or into a single one-byte load/store round trip, to handle the case where the OS may change the bytes of untouched memory.
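To make the "one-byte load/store round trip" concrete, here is a minimal Rust sketch of what such a touch does at the machine level. The helper name `touch_byte` is hypothetical and not part of the patch. It goes through `MaybeUninit<u8>` so that reading a possibly uninitialized byte is not itself UB in Rust, and it uses volatile accesses so the compiler cannot delete the round trip. Note that this only mirrors the shape of the lowering: in Rust's abstract machine, writing an uninitialized byte back does not make it initialized, which is exactly why a dedicated intrinsic is being proposed.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical helper: load one byte and store the same value back.
/// This models the machine-level round trip only; it does NOT freeze
/// an uninitialized byte in Rust's abstract machine.
///
/// # Safety
/// `p` must be valid for a one-byte read and write, and no other thread
/// may access that byte concurrently.
unsafe fn touch_byte(p: *mut u8) {
    let p = p as *mut MaybeUninit<u8>;
    unsafe {
        // Volatile so the optimizer cannot drop the load/store pair.
        let b = ptr::read_volatile(p);
        ptr::write_volatile(p, b);
    }
}

fn main() {
    let mut buf = [1u8, 2, 3, 4];
    unsafe { touch_byte(buf.as_mut_ptr().add(2)) };
    // On initialized memory the round trip leaves the contents unchanged.
    assert_eq!(buf, [1, 2, 3, 4]);
    println!("{:?}", buf);
}
```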
Rather than pushing this patch myself, it would be great if someone who works on both LLVM and Rust could take it over and drive it. My colleagues and I are already pushing the freeze instruction into LLVM, and if we pushed llvm.freeze_mem as well, it might distract people (e.g. they might assume that we are proposing an alternative to freeze, which we are not).
Currently the patch works without crashing the compiler and has been tested with SPEC CPU2017 (I tested it with a Clang that inserts llvm.freeze_mem after each alloca). It has been tested on x86-64, but not on other architectures.
Also, several memory-related optimizations should be updated so that they are aware of this intrinsic; that work is not included in the patch above.
Introductions to anyone who might be interested in this patch are very welcome.
I'll gladly help if you have any questions. :)
If it is true, this is lowered into assembly that reads a random one byte between %ptr and %ptr + %size - 1 and writes it back to the same address.
Note that this is probably overkill: you only need to touch one byte per memory page.
After freeze_mem(ptr, size, touch_mem) is called, it is guaranteed that there is no undef or poison bit in the bytes from ptr to ptr + size - 1. The calculation of ptr + size - 1 follows the way getelementptr is evaluated.
There are no tests checking that this is the case, AFAICT.
Also, several memory-related optimizations should be updated so that they are aware of this intrinsic; that work is not included in the patch above.
Is there a separate patch that includes these?
If touch_mem is true and size is larger than the size of one page (which is determined at runtime), only one of the pages that this freeze_mem spans is touched.
This behavior is incorrect, right?
Each page must be touched; otherwise the contents of the other pages have not been frozen.
Hi @gnzlbg! :D
If touch_mem is true and size is larger than the size of one page (which is determined at runtime), only one of the pages that this freeze_mem spans is touched. This behavior is incorrect, right?
It depends on how Rust inserts llvm.freeze_mem.
In my patch, I was assuming that Rust should insert a loop that looks like this:
while (ptr < ptr_end) { llvm.freeze_mem(ptr, min(PAGE_SIZE, ptr_end - ptr)); ptr += PAGE_SIZE; }
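The loop above can be modeled in Rust as a pure function over addresses, which makes it easy to check which (ptr, size) pairs would be handed to the intrinsic. This is a sketch under assumptions: the function name `freeze_chunks` is hypothetical, addresses are plain `usize` values rather than real pointers, and `PAGE_SIZE` is fixed at 4096 although the real value is target/OS dependent.

```rust
/// Assumed page size; the real value is determined by the target/OS.
const PAGE_SIZE: usize = 4096;

/// Hypothetical model of the frontend loop: split [start, end) into the
/// (addr, len) pieces that would each be passed to llvm.freeze_mem.
fn freeze_chunks(start: usize, end: usize) -> Vec<(usize, usize)> {
    let mut chunks = Vec::new();
    let mut ptr = start;
    while ptr < end {
        // Never hand more than one page's worth of bytes to a single call.
        let len = PAGE_SIZE.min(end - ptr);
        chunks.push((ptr, len));
        ptr += PAGE_SIZE;
    }
    chunks
}

fn main() {
    // 10000 bytes starting at a page boundary -> three calls.
    let chunks = freeze_chunks(0x1000, 0x1000 + 10000);
    assert_eq!(chunks, vec![(0x1000, 4096), (0x2000, 4096), (0x3000, 1808)]);
    println!("{:x?}", chunks);
}
```

One thing the model makes visible: the chunks are page-sized but not page-aligned, so when `start` is not on a page boundary (e.g. `freeze_chunks(0x1800, 0x2800)` yields the single chunk `(0x1800, 4096)`), each call can straddle two pages.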
Another option would be to have llvm.freeze_mem emit a loop in assembly itself, but I guess the result would be the same if LLVM's loop analyses were taught what llvm.freeze_mem is.
Also, several memory-related optimizations should be updated so that they are aware of this intrinsic; that work is not included in the patch above.
Is there a separate patch that includes these?
No, this was at a very early stage, so there are no updates to the optimizations yet.
You only need to touch one byte per memory page.
This is true, but it is pretty hard for a compiler to determine statically whether two allocations are on the same page, I believe, especially because unused mallocs can be removed.
In my patch, I was assuming that Rust should insert a loop that looks like this:
That makes sense.
I'm not sure whether it is better to have one intrinsic with a touch option or two separate intrinsics, but that probably does not matter.
It would be great if there were an optimization to turn a freeze with touch == true into one with touch == false.
E.g., if the memory is on the stack, then there is no need to touch it, ever, AFAICT.
What kind of load does freeze with touch = true currently perform? In the code it seems to be a volatile load.
But wouldn't this be UB if that load were to race?
It would be great if there were an optimization to turn a freeze with touch == true into one with touch == false.
Yep, this would be a great optimization. The transformation is valid when a store to a pointer within the range precedes or directly follows the llvm.freeze_mem.
Avoiding the race would need a volatile relaxed load, but I don't know whether that is supported on all backends.
But wouldn't this be UB if that load were to race?
Yeah, a data race should be UB by definition. It is also UB if the given pointer is not dereferenceable (e.g. a dangling pointer).
If touch_mem is true, the load is marked volatile only for convenience; it does not really need to be a volatile load. Adding a new memory attribute in SelectionDAG seemed like a lot of work, so volatile was used instead.
Not by definition, sorry. It should be added to the definition in LangRef.
I see. So I'd recommend updating the docs to include this information.
You only need to touch one byte per memory page.
This is true, but it is pretty hard for a compiler to determine statically whether two allocations are on the same page, I believe, especially because unused mallocs can be removed.
It should be fairly easy, though, to emit a loop that increments by 4096 and touches a byte on each iteration until it leaves the given range? The backend should know the page size.
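A "touch every page" lowering mostly comes down to address arithmetic: touch the first byte of the range, then the first byte of each later page the range overlaps. The Rust sketch below computes those addresses. The function name `page_touch_addrs` and the fixed 4096-byte page size are assumptions for illustration; a real lowering would perform a one-byte load/store at each returned address.

```rust
/// Assumed page size; a real backend would use the target's value.
const PAGE_SIZE: usize = 4096;

/// Hypothetical helper: addresses a per-page touch loop would access for
/// the byte range [start, start + size). Exactly one address per page.
fn page_touch_addrs(start: usize, size: usize) -> Vec<usize> {
    let mut addrs = Vec::new();
    if size == 0 {
        return addrs; // empty range: nothing to touch
    }
    let end = start + size; // exclusive end
    let mut addr = start;
    while addr < end {
        addrs.push(addr);
        // Advance to the first byte of the next page.
        addr = (addr / PAGE_SIZE + 1) * PAGE_SIZE;
    }
    addrs
}

fn main() {
    // [0x1800, 0x3800) overlaps the pages at 0x1000, 0x2000 and 0x3000.
    let addrs = page_touch_addrs(0x1800, 0x2000);
    assert_eq!(addrs, vec![0x1800, 0x2000, 0x3000]);
    println!("{:x?}", addrs);
}
```

Starting from `start` rather than from its page base keeps every touch inside the requested range, so the loop never accesses bytes the caller did not pass in.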
@aqjune I think touching all the pages is needed, or else we cannot give this intrinsic sensible semantics in the LLVM abstract machine.
Saying that "it freezes memory" is just incorrect with a compilation scheme that does not touch all pages.
Also, read-write races are not UB in LLVM, so the LLVM lowering could easily do a normal non-atomic read here.
I am not sure whether there is an assembly instruction that returns the page size, but in the frontend, Linux has the getpagesize() API. Calling getpagesize() may have a cost, so I believe it is better for the programmer to call it explicitly, so that they can write optimized code.
For the semantics of llvm.freeze_mem(ptr, size), we can say that it freezes one of the pages that [ptr, ptr+size) lies on. We can say that the page is given by the environment, just as the pointer size is.
BTW, one thing I realized about llvm.freeze_mem(ptr, size) is that if ptr and (ptr + size - 1) are on different pages, touching *ptr alone is not enough to freeze the page containing *(ptr + size - 1).
To support this, llvm.freeze_mem should be lowered into touches of both *ptr and *(ptr + size - 1); I hope this works.
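Touching both ends covers every page only when the range spans at most two pages, i.e. when size is at most one page, as in the per-page loop proposed earlier in the thread. The Rust sketch below compares the pages hit by the two-touch lowering with all pages the range lies on. The function names and the 4096-byte page size are assumptions for illustration, and addresses are plain `usize` values.

```rust
/// Assumed page size for illustration.
const PAGE_SIZE: usize = 4096;

fn page_of(addr: usize) -> usize {
    addr / PAGE_SIZE
}

/// Pages hit by the proposed lowering: touch *ptr and *(ptr + size - 1).
fn pages_touched_two_ends(ptr: usize, size: usize) -> Vec<usize> {
    assert!(size > 0, "empty ranges have no last byte");
    let first = page_of(ptr);
    let last = page_of(ptr + size - 1);
    if first == last { vec![first] } else { vec![first, last] }
}

/// Every page that [ptr, ptr + size) lies on.
fn pages_spanned(ptr: usize, size: usize) -> Vec<usize> {
    assert!(size > 0, "empty ranges span no pages");
    (page_of(ptr)..=page_of(ptr + size - 1)).collect()
}

fn main() {
    // Crosses one page boundary: the two touches cover both pages.
    assert_eq!(pages_touched_two_ends(0x1ff0, 0x20), pages_spanned(0x1ff0, 0x20));
    // Spans three pages: the middle page (2) is never touched.
    assert_eq!(pages_spanned(0x1ff0, 0x2000), vec![1, 2, 3]);
    assert_eq!(pages_touched_two_ends(0x1ff0, 0x2000), vec![1, 3]);
    println!("ok");
}
```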
For the semantics of llvm.freeze_mem(ptr, size), we can say that it freezes one of the pages that [ptr, ptr+size) lies on. We can say that the page is given by the environment, just as the pointer size is.
So pages are part of the LLVM abstract machine now? I think that would be new.