Stream: t-lang/wg-unsafe-code-guidelines

Topic: Can LLVM truncate a constant?


Lokathor (Aug 19 2019 at 17:46, on Zulip):

Suppose that I have a constant for a null terminated utf16 string like this

const HELLO_UTF16_NULL: &[u16] = &[104, 101, 108, 108, 111, 0];

And then in the actual program I only ever access the 0th index of that constant with something like &HELLO_UTF16_NULL[0] to get &u16 so that I have "a pointer to the start of the string" for sending over FFI.

Is LLVM ever permitted to truncate my constant? Will it remove all the data past the first index because I'm only using the first index, and then screw up my pointer that's supposed to point to the start of a complete null terminated string?

(Unlike many things we chat about here, this is a question that's less about the Rust Abstract Machine and more about what LLVM itself really guarantees and does.)

rkruppe (Aug 19 2019 at 17:49, on Zulip):

This does somewhat relate to the abstract machine, specifically pointer provenance. If taking a pointer to element zero specifically only grants permission to access that element, not the rest of the allocation (which IIRC is what stacked borrows currently does) then accessing the rest of the array from that pointer is UB even if LLVM doesn't elide the rest of the array. (And conversely, of course, if we did allow it we'd have to make sure LLVM doesn't miscompile that!). Better use .as_ptr() on the slice to be sure! It's more readable anyway IMO.

Lokathor (Aug 19 2019 at 17:55, on Zulip):

as_ptr doesn't seem to be magic, in fact its definition is about as dull as you can imagine, just some casting.

If I understand the &HELLO_UTF16_NULL[0] indexing process correctly, it's also grabbing &self on the slice, passing that to <[T] as Index<usize>>::index or however you name that method, which then offsets from the base address by 0 slots and gives that reference back.

So it seems like either both ways should be equally good, or both ways are in equal danger.
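
A minimal sketch of the two routes side by side, assuming the slice is passed in as a parameter (the function names `via_index` and `via_as_ptr` are hypothetical, not from the thread). It only shows that both routes yield the same address. Whether the resulting pointer may be used beyond element zero is a provenance question that an address comparison cannot answer.

```rust
/// Route 1: take a reference to element zero, then cast it to a raw pointer.
fn via_index(s: &[u16]) -> *const u16 {
    &s[0] as *const u16
}

/// Route 2: cast the whole slice to a raw pointer to its first element.
fn via_as_ptr(s: &[u16]) -> *const u16 {
    s.as_ptr()
}

fn main() {
    let data: &[u16] = &[104, 101, 108, 108, 111, 0];
    // Same numeric address either way; this says nothing about which
    // elements each pointer is permitted to access.
    assert_eq!(via_index(data), via_as_ptr(data));
    println!("{:p} {:p}", via_index(data), via_as_ptr(data));
}
```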

Lokathor (Aug 19 2019 at 17:58, on Zulip):

(the full context of the question is for use in a proc-macro, so the end user would write something like

let window_title = L!("hello");

and then L! would expand to something like

{
  const LITERAL: &[u16] = &[the string literal proc expanded to u16 data with 0 on the end goes here];
  &LITERAL[0]
}

So you get back &'static u16, which will coerce to *const u16 as needed)
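
For concreteness, a hand-written sketch of what that expansion could look like for "hello". The proc macro itself is not shown in the thread, so `utf16_with_nul` and `hello_expanded` are hypothetical stand-ins: the first computes at runtime the data the macro would presumably emit at compile time, the second is the expanded block wrapped in a function.

```rust
/// Hypothetical helper: the u16 data with a trailing 0 that the macro
/// would need to produce for a given string literal.
fn utf16_with_nul(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(std::iter::once(0)).collect()
}

/// Hand-expanded form of `L!("hello")` as described in the message above.
fn hello_expanded() -> &'static u16 {
    const LITERAL: &[u16] = &[104, 101, 108, 108, 111, 0];
    &LITERAL[0]
}

fn main() {
    // The literal in the expansion matches the encoded string plus terminator.
    assert_eq!(utf16_with_nul("hello"), vec![104, 101, 108, 108, 111, 0]);
    // The expansion hands back a reference to the first code unit ('h').
    assert_eq!(*hello_expanded(), 104);
}
```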

rkruppe (Aug 19 2019 at 17:58, on Zulip):

I'm not well versed enough in stacked borrows to explain why they differ but see https://github.com/rust-lang/unsafe-code-guidelines/issues/134 for a citation.

Lokathor (Aug 19 2019 at 18:02, on Zulip):

Hmm, okay, I'm sufficiently convinced of as_ptr over &slice[0].

but, still, does that make LLVM keep the entire const allocation instead of truncating it? Does LLVM "know" that it's a pointer to the start of an array and it has to keep the whole array?

rkruppe (Aug 19 2019 at 18:04, on Zulip):

It has to if we lower that code correctly, i.e., if the pointer really can be used to access the whole allocation in LLVM's memory model (which should not be trivial to achieve)

Lokathor (Aug 19 2019 at 18:05, on Zulip):

should not be?

rkruppe (Aug 19 2019 at 18:05, on Zulip):

uh, editing mishap, fixed

Lokathor (Aug 19 2019 at 18:05, on Zulip):

ah ha! Okay then.

RalfJ (Aug 24 2019 at 11:13, on Zulip):

as_ptr doesn't seem to be magic, in fact its definition is about as dull as you can imagine, just some casting.

The key point is that it casts the entire slice to a raw pointer, and then casts the raw pointer to an element. That's what you want. This is very different from getting a reference to the first element (only permitted to access that one element), and then casting that to a raw pointer.
So yes, it's dull, but it's dull in the right way. &HELLO_UTF16_NULL[0] as *const _ and as_ptr() are not equivalent.
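
A minimal sketch of the as_ptr() route in use, assuming a hypothetical `wide_len` function that stands in for the foreign code reading up to the terminator. The pointer comes from the whole slice, so walking all six elements stays within what it was derived from.

```rust
const HELLO_UTF16_NULL: &[u16] = &[104, 101, 108, 108, 111, 0];

/// Hypothetical stand-in for an FFI consumer: counts the u16 units that
/// precede the terminating 0.
///
/// Safety: `p` must point to a 0-terminated run of u16 values that the
/// pointer is allowed to read in full.
unsafe fn wide_len(p: *const u16) -> usize {
    let mut n = 0;
    // Read forward one element at a time until the terminator is found.
    while unsafe { *p.add(n) } != 0 {
        n += 1;
    }
    n
}

fn main() {
    // Derived from the entire slice, not from a reference to element zero.
    let p: *const u16 = HELLO_UTF16_NULL.as_ptr();
    let n = unsafe { wide_len(p) };
    assert_eq!(n, 5);
}
```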

RalfJ (Aug 24 2019 at 11:13, on Zulip):

ah seems you cleared that up, good :)

Last update: Nov 19 2019 at 17:40UTC