Stream: t-lang/wg-unsafe-code-guidelines

Topic: noalias semantics


nikomatsakis (Aug 30 2019 at 12:31, on Zulip):

So @RalfJ (or others) I was thinking over our conversation from yesterday. I went looking to read up on what the rules for noalias -- vague or ill-specified as they may be -- are. I didn't find much. From the LLVM language reference manual I see:

This indicates that objects accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. The attribute on a return value also has additional semantics described below. The caller shares the responsibility with the callee for ensuring that these requirements are met. For further details, please see the discussion of the NoAlias response in alias analysis.

Clicking through to the discussion of "no-alias" doesn't add a lot of detail.

Ah, writing this comment has helped clarify my confusion I think. What I was wondering about was: why aren't *mut values to the generator considered to conflict with a &mut to some interior field X (from LLVM's POV)? And the answer, I think, is that the *mut isn't used to access that field X (i.e., the LLVM definition is based solely on the "objects accessed").

I guess the next question is what LLVM means when it says objects. That is, what is the granularity of a conflict -- byte ranges? word ranges? etc. I naively interpret that pretty broadly (e.g., the entire struct), in which case it seems like there is still a problem, but I suspect that the language above may be overly broad. I guess the question would be in what situation LLVM might synthesize access to fields of memory that was not originally accessed?

gnzlbg (Aug 30 2019 at 14:26, on Zulip):

naively interpret that pretty broadly (e.g., the entire struct)

@nikomatsakis IIUC this interpretation is correct. E.g., when accessing &self.field, we tell LLVM that Self is some struct, that &self is a pointer to Self that is noalias, dereferenceable(sizeof(struct)), etc., and that the .field access is inbounds of that struct. IIRC, the LLVM docs cover this in the getelementptr section, and its inbounds subsection.

gnzlbg (Aug 30 2019 at 14:28, on Zulip):

IIUC the problem is that we access some fields of the generator using such an approach, while simultaneously using a pointer not derived from &self to mutate the contents of the struct.

nikomatsakis (Aug 30 2019 at 15:09, on Zulip):

"derived from" here means "based on"?

nikomatsakis (Aug 30 2019 at 15:13, on Zulip):

I guess I have to go refresh my memory as to what the MIR etc looks like, but I'm (naively) assuming that resuming a generator is something like:

I suppose what we could alternatively do is to rewrite local variables to directly access from self, so that we are not using the stack.

Either way, if there is some local variable of type &mut (which points elsewhere in the struct), then at some point we'll have something sort of like:

p = self.stored_value_of_p

and I guess that p is not considered "based on" self here? The LLVM definition made it sound specific to GEP -- i.e., I didn't see loads -- though I guess you could insert an inttoptr to kind of alter things.
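
A minimal Rust sketch of the load described above, for illustration only. The names `FakeGenerator`, `stored_p`, `resume`, and `run` are hypothetical and not taken from rustc's generator lowering. The sketch passes the state as a raw pointer rather than `&mut self`, so it makes no claim about how `noalias` would treat `p`; it only shows the shape of "load a pointer out of the struct, then use it to access another field of the same struct".

```rust
/// Hypothetical stand-in for a generator's saved state.
struct FakeGenerator {
    /// A field that a saved local points into.
    x: i32,
    /// The saved local `p`; points at `x` once initialised.
    stored_p: *mut i32,
}

/// Roughly "resume": reload the saved pointer from the state, then use it.
fn resume(this: *mut FakeGenerator) -> i32 {
    unsafe {
        // p = self.stored_value_of_p
        let p: *mut i32 = (*this).stored_p;
        // Access field `x` through the loaded pointer, not through `this`.
        *p += 1;
        // Read the same field through `this` to observe the write.
        (*this).x
    }
}

/// Builds the self-referential state and resumes it once.
fn run() -> i32 {
    let mut g = FakeGenerator { x: 41, stored_p: std::ptr::null_mut() };
    let gp: *mut FakeGenerator = &mut g;
    // Store a pointer to `x` inside the struct itself, derived from `gp`.
    unsafe { (*gp).stored_p = std::ptr::addr_of_mut!((*gp).x) };
    resume(gp)
}
```

Calling `run()` increments `x` through the loaded pointer and then reads it back through the state pointer. If `resume` took `&mut FakeGenerator` instead, the open question in this thread is exactly whether the write through `p` counts as an access "based on" that argument.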

nikomatsakis (Aug 30 2019 at 15:13, on Zulip):

I guess I should go re-read the original comments more closely.

nikomatsakis (Aug 30 2019 at 15:13, on Zulip):

(Separately and relatedly, I do wonder if "alias sets" or other bits of LLVM metadata give us a bit more expressive power)

RalfJ (Aug 30 2019 at 15:45, on Zulip):

I think "object" can in particular also be a "subobject" here

RalfJ (Aug 30 2019 at 16:20, on Zulip):

so that effectively makes it byte-level

RalfJ (Aug 30 2019 at 16:20, on Zulip):

and I guess that p is not considered "based on" self here? The LLVM definition made it sound specific to GEP -- i.e., I didn't see loads -- though I guess you could insert an inttoptr to kind of alter things.

it would be rather strange IMO when a pointer stored in memory would be "based on" the pointer used to load it

RalfJ (Aug 30 2019 at 16:20, on Zulip):

also we already assume that text is wrong: we assume it is also okay to have many accesses to the same object through different pointers that are all noalias, as long as all accesses are reads...
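
A small Rust sketch of the read-only case mentioned above. It assumes that rustc marks shared references to types without interior mutability as `noalias` (together with `readonly`) in LLVM IR; that is an assumption about codegen, not something this code checks. The function names are hypothetical.

```rust
/// Both parameters are shared references. The assumption here is that each
/// is emitted as a `noalias` pointer, even though nothing prevents the
/// caller from passing the same object twice.
fn sum_twice(a: &u32, b: &u32) -> u32 {
    // Only reads happen through either pointer.
    *a + *b
}

/// Passes the same object through both parameters.
fn read_same_object() -> u32 {
    let x = 21;
    sum_twice(&x, &x)
}
```

This is safe Rust, so if the quoted text were read literally (no access at all through pointers not based on the argument), programs like `read_same_object` would violate it. That is why the reading assumed here only rules out overlapping accesses when at least one of them is a write.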

RalfJ (Aug 30 2019 at 16:20, on Zulip):

I think that also would be a fairly ill-behaved semantics

RalfJ (Aug 30 2019 at 20:11, on Zulip):

you want the round-trip of storing a ptr in memory and loading it again to leave that ptr unchanged, not make it acquire new provenance from the ptr used to store/load the ptr
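
A sketch of that round-trip in Rust, with hypothetical names. A pointer `p` is stored into a memory slot through `slot_ref` and loaded back as `q`. Under the semantics argued for here, `q` is the same pointer as `p` and is not "based on" `slot_ref`.

```rust
/// Stores a pointer into a slot, loads it back, and uses both copies.
/// Returns whether the addresses match, plus the final value.
fn round_trip(value: &mut i32) -> (bool, i32) {
    // Raw pointer derived from `value`.
    let p: *mut i32 = &mut *value;

    // A memory location that holds a pointer, reached through `slot_ref`.
    let mut slot: *mut i32 = std::ptr::null_mut();
    let slot_ref: &mut *mut i32 = &mut slot;

    // Store the pointer, then load it back through the same reference.
    *slot_ref = p;
    let q: *mut i32 = *slot_ref;

    let same_address = std::ptr::eq(p, q);
    unsafe {
        // Write through the loaded copy and then through the original.
        *q += 1;
        *p += 1;
    }
    (same_address, *value)
}
```

For example, calling `round_trip` on a variable holding 40 leaves 42 in it, with both writes landing on the same object. The address comparison only shows that the bits survive the round-trip; the provenance claim is about the abstract semantics and is not observable from this code.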

Last update: Nov 20 2019 at 12:40UTC