So @RalfJ (or others) I was thinking over our conversation from yesterday. I went looking to read up on what the rules for noalias -- vague or ill-specified as they may be -- are. I didn't find much. From the language reference manual I see:
This indicates that objects accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. The attribute on a return value also has additional semantics described below. The caller shares the responsibility with the callee for ensuring that these requirements are met. For further details, please see the discussion of the NoAlias response in alias analysis.
Clicking through to the discussion of "no-alias" doesn't add a lot of detail.
Ah, writing this comment has helped clarify my confusion I think. What I was wondering about was: why aren't *mut values to the generator considered to conflict with a &mut to some interior field X (from LLVM's POV)? And the answer, I think, is that the *mut isn't used to access that field X (i.e., the LLVM definition is based solely on the "objects accessed").
I guess the next question is what LLVM means when it says objects. That is, what is the granularity of a conflict -- byte ranges? word ranges? etc. I naively interpret that pretty broadly (e.g., the entire struct), in which case it seems like there is still a problem, but I suspect that the language above may be overly broad. I guess the question would be in what situation LLVM might synthesize access to fields of memory that was not originally accessed?
naively interpret that pretty broadly (e.g., the entire struct)
@nikomatsakis IIUC this interpretation is correct. E.g., when accessing &self.field we tell LLVM that Self is some struct, that &self is a pointer to Self that is dereferenceable(sizeof(struct)), etc., and that the .field access is inbounds of that struct. IIRC, the LLVM docs cover this in the getelementptr section, and its
IIUC the problem is that we access some fields of the generator using such an approach, while simultaneously using a pointer not derived from &self to mutate the contents of the struct.
"derived from" here means "based on"?
I guess I have to go refresh my memory as to what the MIR etc looks like, but I'm (naively) assuming that resuming a generator is something like:
I suppose what we could alternatively do is to rewrite local variables to directly access from self, so that we are not using the stack.
Either way, if there is some local variable of type &mut (which points elsewhere in the struct), then at some point we'll have something sort of like:
p = self.stored_value_of_p
and I guess that p is not considered "based on" self here? The LLVM definition made it sound specific to GEP -- i.e., I didn't see loads -- though I guess you could insert an inttoptr to kind of alter things.
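To make the p = self.stored_value_of_p step concrete, here is a rough hand-written sketch. Everything in it (State, stored_p, resume, make_state, run_demo) is a hypothetical stand-in, not the compiler's actual generator layout or MIR. It uses raw pointers throughout so that no &mut State ever claims exclusive access to the whole struct, which sidesteps the noalias question rather than answering it.

```rust
// Hypothetical stand-in for a generator's saved state; NOT the real layout.
struct State {
    value: u32,
    // A saved "local variable" that points at `value` in the same struct.
    stored_p: *mut u32,
}

/// One resume step. `this` is a raw pointer, so nothing here asserts
/// exclusive access to the whole struct.
unsafe fn resume(this: *mut State) -> u32 {
    unsafe {
        // p = self.stored_value_of_p: this is a *load* of a pointer value
        // from memory, not an offset computed from `this`.
        let p = (*this).stored_p;
        *p += 1;
        (*this).value
    }
}

/// Allocate the state on the heap and set up the interior pointer.
fn make_state(initial: u32) -> *mut State {
    let raw = Box::into_raw(Box::new(State {
        value: initial,
        stored_p: std::ptr::null_mut(),
    }));
    unsafe {
        // Derive the interior pointer from the same raw pointer.
        (*raw).stored_p = std::ptr::addr_of_mut!((*raw).value);
    }
    raw
}

fn run_demo() -> u32 {
    let raw = make_state(41);
    let out = unsafe { resume(raw) };
    // Free the allocation again.
    unsafe { drop(Box::from_raw(raw)) };
    out
}
```

Calling run_demo() runs one resume step and returns the updated value field. The interesting line is the load of stored_p: whether p counts as "based on" this is exactly the question above.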
I guess I should go re-read the original comments more closely.
(Separately and relatedly, I do wonder if "alias sets" or other bits of LLVM metadata give us a bit more expressive power)
I think "object" can in particular also be a "subobject" here
so that effectively makes it byte-level
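A small sketch of what the byte-level reading permits, as far as I understand it: two &mut references into disjoint fields of one struct, passed to the same function. I'm assuming here that rustc marks &mut arguments as noalias, and the names Pair, bump_both and demo_disjoint_fields are made up for illustration.

```rust
// Hypothetical two-field struct; both fields live in the same allocation.
struct Pair {
    left: u32,
    right: u32,
}

// Two exclusive references (assumed to be emitted as `noalias` arguments).
// They point into the same struct but cover disjoint byte ranges.
fn bump_both(left: &mut u32, right: &mut u32) {
    *left += 1;
    *right += 10;
}

fn demo_disjoint_fields() -> (u32, u32) {
    let mut pair = Pair { left: 1, right: 2 };
    // The borrow checker accepts this because the fields do not overlap.
    bump_both(&mut pair.left, &mut pair.right);
    (pair.left, pair.right)
}
```

If "object" meant the entire struct, a call like this would already conflict, since both arguments access the same Pair.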
and I guess that p is not considered "based on" self here? The LLVM definition made it sound specific to GEP -- i.e., I didn't see loads -- though I guess you could insert an inttoptr to kind of alter things.
it would be rather strange IMO when a pointer stored in memory would be "based on" the pointer used to load it
also we already assume that text is wrong: we assume it is also okay to have many accesses to the same object through different pointers that are all noalias, as long as all accesses are reads...
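For example (a sketch, assuming shared references to plain data like u32 are emitted as noalias, which is my understanding of what rustc does today), both arguments below point at the same object and both are only read. sum_both and demo_shared_reads are hypothetical names.

```rust
// Both parameters are shared references; the function only reads through them.
fn sum_both(a: &u32, b: &u32) -> u32 {
    *a + *b
}

fn demo_shared_reads() -> u32 {
    let x = 21;
    // `a` and `b` refer to the very same object. Neither pointer is
    // "based on" the other, yet the same memory is read through both.
    sum_both(&x, &x)
}
```

Read literally, the LangRef text would rule this out, which is why we assume the read-only case is fine.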
I think that also would be a fairly ill-behaved semantics
you want the round-trip of storing a ptr in memory and loading it again to leave that ptr unchanged, not make it acquire new provenance from the ptr used to store/load the ptr
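A minimal sketch of that round-trip, using only raw pointer read/write from the standard library. round_trip and slot are made-up names, and the sketch only shows the behavior at the Rust level; it says nothing about what LLVM actually tracks.

```rust
/// Store a pointer into a memory slot, load it back, and use the loaded copy.
/// Returns whether the loaded pointer compares equal to the original.
fn round_trip(target: &mut u32) -> bool {
    let original: *mut u32 = target;

    // A memory location that holds a pointer value.
    let mut slot: *mut u32 = std::ptr::null_mut();
    let slot_ptr: *mut *mut u32 = &mut slot;

    unsafe {
        // Store the pointer through `slot_ptr`, then load it again.
        slot_ptr.write(original);
        let loaded = slot_ptr.read();

        // The loaded pointer should behave like `original`: it writes to the
        // same u32, regardless of which pointer was used for the store/load.
        *loaded += 1;
        loaded == original
    }
}
```

With `let mut x = 1;`, calling `round_trip(&mut x)` increments x through the loaded pointer, which only makes sense if the load hands back the same pointer that was stored.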