Stream: t-lang/wg-unsafe-code-guidelines

Topic: GEP inbounds


rkruppe (May 06 2019 at 21:54, on Zulip):

@RalfJ not to derail the -> thread, but re: your note that you'd like to make inbounds GEPs a rustc-internal optimization rather than an inherent part of field offsets (and presumably of offset more generally).

I also don't care for the vague and apparently useless "must be in bounds of an object" part, but inbounds unfortunately ties that together with additional, rather different, quite useful (and not as onerous for unsafe code) properties: that the address calculation doesn't wrap around and that the offset computation (excepting the addition to the base pointer) doesn't wrap in signed arithmetic. I am already skeptical about how practical it is to teach LLVM (let alone rustc) to figure out the "in bounds of an allocation" aspect from dereferenceable attributes on related pointers, and the non-wrapping properties require even more reasoning and can also depend on the concrete base address. And unlike the "in bounds of an allocation" property, the absence of wrapping does enable a substantial number of optimizations, which I'd rather not make dependent on fragile analyses that have to work against the language semantics.
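A minimal sketch of the two non-wrapping properties described above, under simplifying assumptions: a 64-bit address space, addresses modeled as u64 and offsets as i64, and the "in bounds of an allocation" part left out entirely. The function name gep_nowrap and its (index, element size) input format are hypothetical; the authoritative rules are in the getelementptr section of the LLVM LangRef.

```rust
/// Hypothetical model of the no-wrap conditions implied by
/// `getelementptr inbounds` (the in-bounds-of-an-allocation part is omitted).
/// Each step is `(index, element_size)`. Returns the final address, or `None`
/// if any condition is violated.
fn gep_nowrap(base: u64, steps: &[(i64, i64)]) -> Option<u64> {
    let mut offset: i64 = 0; // running offset, base address excluded
    let mut addr = base;
    for &(index, elem_size) in steps {
        // index * element size must not wrap in signed arithmetic.
        let scaled = index.checked_mul(elem_size)?;
        // The running sum of offsets must not wrap in signed arithmetic.
        offset = offset.checked_add(scaled)?;
        // Base address (unsigned) plus offset (signed) must not wrap
        // the unsigned address space.
        addr = addr.checked_add_signed(scaled)?;
    }
    Some(addr)
}
```

The base is treated as unsigned and the offsets as signed, which is why a negative offset is fine as long as the address does not drop below zero.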

So my long-term hope is rather to separate these aspects in LLVM IR, and then free Rust users from having to worry about the "in bounds" part while keeping the non-wrapping aspects. Which, of course, means keeping -> unsafe, though with fewer proof obligations (and in the case of field offsets, the compiler gives substantial assistance: creating a type that would lead to field offsets > isize::MAX is a compile-time error).

RalfJ (May 07 2019 at 08:41, on Zulip):

Hm... I basically had this thought when people said on llvm-dev that GEPi vs GEP does not have a huge amount of impact. But I guess we'd have to collect some data.

RalfJ (May 07 2019 at 08:42, on Zulip):

I am already skeptical about how practical it is to teach LLVM (let alone rustc) to figure out the "in bounds of an allocation" aspect from dereferenceable attributes on related pointers,

I think it'd be rather easy for rustc to determine when it can add inbounds -- if we are working on a reference, we can; if we are working on a raw ptr, we cannot.
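To make the reference/raw-pointer distinction concrete, here is a sketch in present-day Rust (ptr::addr_of! was stabilized after this discussion, in Rust 1.51). The Pair type and both functions are hypothetical. Projecting through &Pair starts from a pointer that is known to be dereferenceable for the whole struct, so the field address is in bounds by construction; projecting through *const Pair carries no such guarantee, which is the case where rustc would have to omit inbounds under this suggestion.

```rust
use std::ptr;

// Hypothetical example type; `repr(C)` just keeps the layout predictable.
#[repr(C)]
struct Pair {
    a: u32,
    b: u64,
}

/// Field projection through a reference: `p` is guaranteed to point to a
/// live, dereferenceable `Pair`, so `&p.b` is in bounds by construction.
fn field_via_ref(p: &Pair) -> *const u64 {
    &p.b
}

/// Field projection through a raw pointer: nothing in the type says `p` is
/// dereferenceable, so any in-bounds requirement is a proof obligation on
/// the caller. This sketch assumes `p` points to a live `Pair`.
unsafe fn field_via_raw(p: *const Pair) -> *const u64 {
    // `addr_of!` computes the field address without creating a reference.
    unsafe { ptr::addr_of!((*p).b) }
}
```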

RalfJ (May 07 2019 at 08:42, on Zulip):

The only complication I see there is around accessing the last field of an unsized struct

Tom Phinney (May 07 2019 at 13:56, on Zulip):

In the short term rustc could simply omit the inbounds on that last field. That would still provide safety for the other fields.
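The unsized-tail case can be sketched with a hypothetical Tagged type. When the last field is a trait object, its offset is not a compile-time constant: it is the size of the sized prefix rounded up to the tail's alignment, and that alignment is only known at run time, from the vtable. The sketch just measures that offset by subtracting addresses; it does not show what code rustc actually emits.

```rust
use std::fmt::Debug;

// Hypothetical struct whose last field may be unsized.
struct Tagged<T: ?Sized> {
    tag: u8,
    tail: T,
}

/// Byte offset of `tail` within a `Tagged<dyn Debug>`, measured by
/// subtracting addresses. For a `dyn` tail this depends on the alignment
/// of the concrete type behind the trait object.
fn tail_offset(t: &Tagged<dyn Debug>) -> usize {
    let base = t as *const Tagged<dyn Debug> as *const u8 as usize;
    let tail = &t.tail as *const dyn Debug as *const u8 as usize;
    tail - base
}
```

Coercing &Tagged<u8> and &Tagged<u64> to &Tagged<dyn Debug> gives two values of the same type whose tails sit at different offsets, which is why a single static offset (and a statically justified inbounds) is harder to come by for this field.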

rkruppe (May 07 2019 at 15:10, on Zulip):

Yes, we should measure the impact, but doing this properly is difficult because absence of wrapping often affects rather situational and low-level optimizations which can nevertheless become important in hotspots.

For example, by a happy coincidence (having just seen this talk) I can point you at the arm64_32 target, where GEPs without inbounds result in much worse machine code (the high bits have to be masked off after every GEP) than those with inbounds. You wouldn't see that while testing on x86 or most other targets, but it's going to be a real pain if someone ever seriously targets this kind of platform with Rust.
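A toy model of the arm64_32 point, as we understand it: pointers are 32 bits wide but are held in 64-bit registers whose upper half must be zero. The function names, the ADDR_MASK constant and the i64 offset type are simplifications for illustration, not a description of the actual backend. Without inbounds, the sum may have wrapped around the 32-bit address space and has to be masked back down; with inbounds, no wrapping is promised, so the plain 64-bit sum is already correct and the mask disappears.

```rust
/// Mask selecting the low 32 bits, i.e. the 32-bit address space.
const ADDR_MASK: u64 = 0xFFFF_FFFF;

/// Plain GEP: the 32-bit address computation may wrap, so the 64-bit
/// register result has to be masked back into the 32-bit address space.
fn gep_plain(base: u64, offset: i64) -> u64 {
    base.wrapping_add_signed(offset) & ADDR_MASK
}

/// `inbounds` GEP: wrapping is ruled out, so the 64-bit sum already has
/// zero upper bits and no mask is needed. The debug assertion checks the
/// promise that a real compiler would simply assume.
fn gep_inbounds(base: u64, offset: i64) -> u64 {
    let result = base.wrapping_add_signed(offset);
    debug_assert!(result <= ADDR_MASK, "inbounds no-wrap promise violated");
    result
}
```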

Another major use of "no wrapping" I'm aware of is in the analysis of loops whose induction variable is a pointer (see here for example). I'm not sure if you want to change anything about ptr.offset(n) or just built-in field accesses (changing offset is required for fixing the slicing problem that led to that llvm-dev thread), but if offset also loses the inbounds, then e.g. automatic vectorization of loops based on slice iterators could become more fragile.
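A sketch of the slice-iterator pattern, with a pointer as the loop induction variable. As we understand the current lowering, ptr::add and ptr::offset become getelementptr inbounds, while wrapping_add and wrapping_offset become a GEP without inbounds. Both functions below compute the same result; the difference is only in what LLVM may assume about cur never wrapping before it reaches end. No claim is made about which one a given LLVM version actually vectorizes.

```rust
/// Sum a slice using a pointer induction variable advanced with `add`,
/// which requires every intermediate pointer to stay within the slice
/// (or one past its end).
fn sum_inbounds(xs: &[u32]) -> u64 {
    let mut cur = xs.as_ptr();
    // SAFETY: one past the end of the slice is a valid result for `add`.
    let end = unsafe { cur.add(xs.len()) };
    let mut total = 0u64;
    while cur != end {
        // SAFETY: `cur` is in [start, end), so it points to a live element.
        total += u64::from(unsafe { *cur });
        // SAFETY: stepping from an element to at most one past the end.
        cur = unsafe { cur.add(1) };
    }
    total
}

/// The same loop using `wrapping_add`, which makes no in-bounds promise.
fn sum_wrapping(xs: &[u32]) -> u64 {
    let mut cur = xs.as_ptr();
    let end = cur.wrapping_add(xs.len());
    let mut total = 0u64;
    while cur != end {
        // SAFETY: `cur` is in [start, end), so it points to a live element.
        total += u64::from(unsafe { *cur });
        cur = cur.wrapping_add(1);
    }
    total
}
```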

RalfJ (May 07 2019 at 20:33, on Zulip):

note that I am only suggesting this for offset-on-raw-ptr, not for all offsets. but I get your point.

RalfJ (May 07 2019 at 20:35, on Zulip):

"nowrap" instead of "inbounds" would be nice if we used it for all projections (answering questions such as https://github.com/rust-lang/rust/issues/54857). but when it comes to raw ptr projections, as far as spec and Miri complexity goes, both are equally annoying (as in, both need a special case and put extra burden on unsafe code authors in code that already is very hard to get right)

RalfJ (May 08 2019 at 12:08, on Zulip):

that the offset computation (excepting the addition to the base pointer) doesn't wrap in signed arithmetic

just to be sure I remember correctly, this really was with the base being interpreted unsigned but the offset signed, right?

rkruppe (May 08 2019 at 12:30, on Zulip):

Yes. This is the part that actually makes allocations larger than isize::MAX bytes problematic.
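A small worked illustration, assuming a target where usize and isize have the same width: because the offset is a signed isize, the distance from the start of an allocation to its one-past-the-end pointer is only representable if the allocation is at most isize::MAX bytes. The helpers end_offset and layout_accepts are hypothetical; Layout::from_size_align is the real standard-library constructor, which rejects sizes that exceed isize::MAX once rounded up to the alignment.

```rust
use std::alloc::Layout;

/// Signed byte offset from the start of a `size`-byte allocation to its
/// one-past-the-end pointer, or `None` if it does not fit in `isize`.
fn end_offset(size: usize) -> Option<isize> {
    isize::try_from(size).ok()
}

/// Whether the standard library accepts a byte-aligned layout of `size` bytes.
fn layout_accepts(size: usize) -> bool {
    Layout::from_size_align(size, 1).is_ok()
}
```

The two helpers agree at the boundary: isize::MAX bytes is accepted and representable, while one byte more is neither.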

Last update: Nov 19 2019 at 18:35UTC