Stream: t-lang/wg-unsafe-code-guidelines

Topic: stacked borrows: llvm-noalias violation in safe code


Ariel Ben-Yehuda (Nov 26 2018 at 14:29, on Zulip):

example:

let x: &mut u32 = a;
let raw_ptr: *const u32 = x;
foo(x, raw_ptr);

fn foo(x: &mut u32, raw_ptr: *const u32) -> u32 {
    let x = &mut *x; // stack is now [..., Uniq(N)]
    *x = 5;
    let y = &*x; // stack is now [..., Uniq(N), Shr]

    // this is OK according to stacked borrows because of the Shr tag, but is UB
    // according to LLVM noalias and would be hard to make non-UB.
    unsafe { *raw_ptr }
}
RalfJ (Nov 28 2018 at 09:12, on Zulip):

@nikomatsakis This is an example of code that is UB according to LLVM but not according to Stacked Borrows. I designed Stacked Borrows coming more from the Rust side than from the LLVM side, so I am not too surprised these exist, but it means Stacked Borrows cannot be the end of the story for justifying noalias.

RalfJ (Nov 28 2018 at 09:13, on Zulip):

@Ariel Ben-Yehuda you had another interesting example, involving mem::replace or so, didn't you?

RalfJ (Nov 28 2018 at 09:13, on Zulip):

My general plan for this is that I hope to work with some other people on better understanding LLVM noalias, and once we got that we can see how much of that we have to incorporate into Rust, and how it interacts with Stacked Borrows.

RalfJ (Nov 28 2018 at 09:13, on Zulip):

but that's a multi-year plan ;)

Nicole Mazzuca (Nov 28 2018 at 16:16, on Zulip):

I would really hope that's ub in any system we come up with

Nicole Mazzuca (Nov 28 2018 at 16:17, on Zulip):

we should be able to assume &mut pointers don't alias anything

Nicole Mazzuca (Nov 28 2018 at 16:21, on Zulip):

especially other function parameters

RalfJ (Nov 28 2018 at 16:25, on Zulip):

@Nicole Mazzuca that doesn't hold though, here is a counterexample:

use std::cell::RefCell;

fn main() {
    fn inner(rc: &RefCell<i32>, aliasing: &mut i32) {
        // the two pointers alias (they are not equal, but they overlap)
        println!("{:?} + {} = {:?}",
            rc as *const _,
            (aliasing as *mut _ as usize) - (rc as *const _ as usize),
            aliasing as *mut _
        );
    }

    let rc = RefCell::new(23);
    let mut bmut = rc.borrow_mut();
    inner(&rc, &mut *bmut);
}
Nicole Mazzuca (Nov 28 2018 at 16:26, on Zulip):

that's not aliasing?

Nicole Mazzuca (Nov 28 2018 at 16:27, on Zulip):

that's pointers being equal

RalfJ (Nov 28 2018 at 16:28, on Zulip):

so you are not actually talking about the property of pointers then, but about the property of accesses "done through" them

Nicole Mazzuca (Nov 28 2018 at 16:29, on Zulip):

yes

RalfJ (Nov 28 2018 at 16:29, on Zulip):

I think I know what you mean, I am not sure I agree though. I don't think raw pointers should have an identity. That's just an unnecessary complication in usually-already-complicated raw ptr code.

Nicole Mazzuca (Nov 28 2018 at 16:29, on Zulip):

the type system must guarantee that those accesses do not alias

Nicole Mazzuca (Nov 28 2018 at 16:30, on Zulip):

but raw pointers probably must have an identity for purposes of alias analysis

Nicole Mazzuca (Nov 28 2018 at 16:30, on Zulip):

since they do have an identity in LLVM

RalfJ (Nov 28 2018 at 16:33, on Zulip):

since they do have an identity in LLVM

at this point I am still trying to separate "the model we want" from constraints like "how LLVM works". that may be just naive or idealistic, but well, I'm dreaming a bit ;)

Nicole Mazzuca (Nov 28 2018 at 16:33, on Zulip):

I mean that's fair, but we should keep in mind LLVM

Nicole Mazzuca (Nov 28 2018 at 16:34, on Zulip):

it's been designed in a reasonably useful way for optimization

RalfJ (Nov 28 2018 at 16:34, on Zulip):

fair enough

Nicole Mazzuca (Nov 28 2018 at 16:34, on Zulip):

I'm not gonna say raw pointers must have identity, I don't know, but that example must be UB

RalfJ (Nov 28 2018 at 16:35, on Zulip):

hence my long-term plan: 1. understand LLVM noalias (for me this means have a precise formal model, one that one could use to prove things in), and then 2. see if from that we can glean how to make Stacked Borrows compatible, or maybe small tweaks to LLVM that might be acceptable upstream to make it more compatible with an idea of "untracked pointers"

RalfJ (Nov 28 2018 at 16:36, on Zulip):

but that example must be UB

that is because LLVM treats it as such and you consider LLVM immutable, or because there are good reasons for this from a pure Rust perspective?

RalfJ (Nov 28 2018 at 16:37, on Zulip):

I think the current model is sound wrt LLVM if no raw pointers are involved, which is a good start IMO.

Nicole Mazzuca (Nov 28 2018 at 16:39, on Zulip):

because there are good reasons for this from a pure rust perspective

Nicole Mazzuca (Nov 28 2018 at 16:39, on Zulip):

basically, I don't think introducing raw pointers should do anything to optimization

Nicole Mazzuca (Nov 28 2018 at 16:41, on Zulip):

*const T should be treated as a nullable, maybe-unaligned &T, while *mut T should be treated as a nullable, maybe-unaligned &UnsafeCell<T>

Nicole Mazzuca (Nov 28 2018 at 16:41, on Zulip):

(that was weird UX things)

Nicole Mazzuca (Nov 28 2018 at 16:42, on Zulip):

also to be clear, I really like stacked borrows, I think it's a fantastic model which just requires a bit of working out the kinks

RalfJ (Nov 28 2018 at 16:45, on Zulip):

also to be clear, I really like stacked borrows, I think it's a fantastic model which just requires a bit of working out the kinks

good to hear. :) I am sure it's not ready yet

RalfJ (Nov 28 2018 at 16:46, on Zulip):

*const T should be treated as a nullable, maybe-unaligned &T

that can't be entirely right, &i32 e.g. asserts immutability while *const i32 is fine with writes through other pointers to the same location

Nicole Mazzuca (Nov 28 2018 at 16:47, on Zulip):

mmh, good point

RalfJ (Nov 28 2018 at 16:47, on Zulip):

about the example: the reason it is considered okay by stacked borrows is that the mutable ref did get shared. so reads can happen through shared references

RalfJ (Nov 28 2018 at 16:48, on Zulip):

also allowing reads through raw pointers at that point didn't seem too harmful

Nicole Mazzuca (Nov 28 2018 at 16:48, on Zulip):

so *const T should also be treated as &UnsafeCell<T> probably

RalfJ (Nov 28 2018 at 16:48, on Zulip):

fixing this particular example is easy though (just disallow raw ptr reads when the location was never cast to a raw ptr). variants of it are harder.

Nicole Mazzuca (Nov 28 2018 at 16:49, on Zulip):

even then, it seems untenable to support

RalfJ (Nov 28 2018 at 16:49, on Zulip):
fn foo(x: &mut u32, raw_ptr: *const u32) -> u32 {
    let x = &mut *x; // stack is now [..., Uniq(N)]
    *x = 5;
    mem::swap(x, &mut 16); // internally uses raw pointers
    let y = &*x; // stack is now [..., Uniq(N), Shr]

    // this is OK according to stacked borrows because of the Shr tag, but is UB
    // according to LLVM noalias and would be hard to make non-UB.
    unsafe { *raw_ptr }
}
RalfJ (Nov 28 2018 at 16:50, on Zulip):

here the location actually does get "leaked" to raw ptrs by mem::swap, hence justifying the raw ptr access later

Nicole Mazzuca (Nov 28 2018 at 16:50, on Zulip):
let x : *mut i32;
let y : &mut i32;

*y = 0;
y as *mut i32;
*x
RalfJ (Nov 28 2018 at 16:50, on Zulip):

I don't see a way to rule this out without tracking raw ptr identity

Nicole Mazzuca (Nov 28 2018 at 16:50, on Zulip):

right

Nicole Mazzuca (Nov 28 2018 at 16:50, on Zulip):

I'd argue that that's the solution

RalfJ (Nov 28 2018 at 16:51, on Zulip):

I'd argue that cure is worse than the disease ;)

RalfJ (Nov 28 2018 at 16:51, on Zulip):

basically I am not convinced that the additional power this gives to the optimizer is worth the additional complexity this imposes on every unsafe code author

Nicole Mazzuca (Nov 28 2018 at 16:52, on Zulip):

I mean... the complexity is imposed already

Nicole Mazzuca (Nov 28 2018 at 16:52, on Zulip):

I also don't think it's really any extra complexity for unsafe authors

RalfJ (Nov 28 2018 at 16:52, on Zulip):

raw pointers not being tracked seems like a very useful complexity reduction to me

RalfJ (Nov 28 2018 at 16:52, on Zulip):

but I might also be overestimating that extra complexity

Nicole Mazzuca (Nov 28 2018 at 16:52, on Zulip):

for the model, I think

Nicole Mazzuca (Nov 28 2018 at 16:53, on Zulip):

but not for people writing code

RalfJ (Nov 28 2018 at 16:53, on Zulip):

and it might reduce complexity around other questions, like the difference between transmute-to-raw and cast-to-raw, or the funny effects when you transmute a raw ptr to a reference

Nicole Mazzuca (Nov 28 2018 at 16:53, on Zulip):

exactly

RalfJ (Nov 28 2018 at 16:53, on Zulip):

hm now that is curious, people writing the code will have to understand the model, so I'd think any complexity increase in the model is also a complexity increase for programmers?

Nicole Mazzuca (Nov 28 2018 at 16:54, on Zulip):

very few programmers will ever understand the model on a mathematical level

Nicole Mazzuca (Nov 28 2018 at 16:54, on Zulip):

they will get told "this is UB"

RalfJ (Nov 28 2018 at 16:54, on Zulip):

oh sure, I meant on a more intuitive, descriptive level

Nicole Mazzuca (Nov 28 2018 at 16:54, on Zulip):

by people like me

RalfJ (Nov 28 2018 at 16:54, on Zulip):

something with pictures of a stack ;)

Nicole Mazzuca (Nov 28 2018 at 16:55, on Zulip):

on an intuitive level, they probably won't get close to enough mathematical rigor that the extra complexity matters

Nicole Mazzuca (Nov 28 2018 at 16:55, on Zulip):

this is my experience in the C++ community, anyways

Nicole Mazzuca (Nov 28 2018 at 16:56, on Zulip):

having a checker is far more important than having a simple system

Nicole Mazzuca (Nov 28 2018 at 16:57, on Zulip):

because the vast, vast majority of people will never read this part of the specification :P

RalfJ (Nov 28 2018 at 16:58, on Zulip):

judging from the feedback I got, experienced programmers without a CS degree can think in terms of sth on the complexity of Stacked Borrows. so I hope we can keep the model simple enough. but well, only time will tell^^

Nicole Mazzuca (Nov 28 2018 at 16:58, on Zulip):

(or most of the other parts of the specification, let's be honest)

RalfJ (Nov 28 2018 at 16:58, on Zulip):

the vast majority will hopefully never write unsafe code :P

RalfJ (Nov 28 2018 at 16:58, on Zulip):

that's a key advantage we have over C++

Nicole Mazzuca (Nov 28 2018 at 16:58, on Zulip):

I don't mean they can't understand it, I mean they just don't care enough to read it

Nicole Mazzuca (Nov 28 2018 at 16:59, on Zulip):

because we should design our model in a way which means you don't end up having to read it :)

Nicole Mazzuca (Nov 28 2018 at 17:00, on Zulip):

(and I think stacked borrowd follow that rule)

RalfJ (Nov 28 2018 at 17:00, on Zulip):

I feel in the Rust community there is a strong desire for safe abstractions, and there is a sizeable group of people valuing that enough to very carefully evaluate their crates accordingly. I have no experience with the C++ community so I cannot compare, but as an outsider, safe abstractions seem to be less of a core thing there (also because the type system doesn't really let you do that to the extend Rust does)

RalfJ (Nov 28 2018 at 17:00, on Zulip):

my main problem with "lets track raw ptrs" is explaining int-to-ptr casts. we need some kind of "wildcard", some kind of fallback. and if that works like raw ptrs work in Stacked Borrows, nothing is won. (all the examples above would still be fine, then, if you just cast ptrs to integers and back a bit.) that's why I'd first like to understand LLVM noalias in isolation, and then maybe that teaches us how to solve this kind of problem.

Nicole Mazzuca (Nov 28 2018 at 17:01, on Zulip):

I think to understand noalias, one must understand provenance

Nicole Mazzuca (Nov 28 2018 at 17:01, on Zulip):

provenance travels through values of any type

RalfJ (Nov 28 2018 at 17:01, on Zulip):

true. I feel at least I understand it much better now than I did a year ago.

RalfJ (Nov 28 2018 at 17:01, on Zulip):

provenance travels through values of any type

I strongly disagree with that

Nicole Mazzuca (Nov 28 2018 at 17:02, on Zulip):

how?

RalfJ (Nov 28 2018 at 17:02, on Zulip):

integers shouldnt have provenance, and I think any model making them have provenance is untenable

RalfJ (Nov 28 2018 at 17:03, on Zulip):

there are common compiler optimizations that are incompatible with integers having provenance

RalfJ (Nov 28 2018 at 17:03, on Zulip):

like, GVN putting i and j into the same "bucket" inside an if i == j

RalfJ (Nov 28 2018 at 17:03, on Zulip):

and also from a modelling perspective, integers corresponding to mathematical integers (modulo 2^N) is extremely useful

RalfJ (Nov 28 2018 at 17:04, on Zulip):

and breaking that... ugh. no please no.^^

RalfJ (Nov 28 2018 at 17:04, on Zulip):

for the next C/C++ standards, AFAIK the game still seems to be open wrt. integers do or do not have provenance. I very strongly hope they decide against making them have provenance

RalfJ (Nov 28 2018 at 17:05, on Zulip):

(I'll be back later, afk for now)

Nicole Mazzuca (Nov 28 2018 at 17:05, on Zulip):

I would argue that provenance lists are the way to solve that

Nicole Mazzuca (Nov 28 2018 at 17:05, on Zulip):

but this is the way LLVM does noalias:

Nicole Mazzuca (Nov 28 2018 at 17:06, on Zulip):

This indicates that objects accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value.

Nicole Mazzuca (Nov 28 2018 at 17:06, on Zulip):

(on a noalias argument)

RalfJ (Nov 28 2018 at 17:47, on Zulip):

I agree understanding provenance is key to understanding noalias

Nicole Mazzuca (Nov 28 2018 at 18:04, on Zulip):

it means that stuff like:

Nicole Mazzuca (Nov 28 2018 at 18:04, on Zulip):
fn foo(x: &mut i32, y: *const i32) -> i32 {
  *(x as *mut _ as usize as *mut i32) = 1;
  *y
}
Nicole Mazzuca (Nov 28 2018 at 18:05, on Zulip):

is still UB, if x and y alias

RalfJ (Nov 28 2018 at 20:27, on Zulip):

is still UB, if x and y alias

what about this?

fn foo(x: &mut i32, y: *const i32) -> i32 {
  let xi = x as *mut _ as usize;
  let yi = y as usize;
  if xi == yi {
    *(yi as *mut i32) = 1;
  }
  *y
}

And what if I now replace yi by xi inside the conditional (which GVN will do if it deems it profitable)?

Nicole Mazzuca (Nov 28 2018 at 21:33, on Zulip):

LLVM can introduce that kind of UB in its passes, as long as it makes it not UB by some other transformation.

Nicole Mazzuca (Nov 28 2018 at 21:33, on Zulip):

That doesn't mean you can.

rkruppe (Nov 28 2018 at 21:34, on Zulip):

Sure, but LLVM wants to run alias analysis before and after GVN, so there is a tension here that can't just be explained away that way

Nicole Mazzuca (Nov 28 2018 at 21:34, on Zulip):

for example, LLVM can transform struct { unsigned x; unsigned y; } foo; do_smth(&foo); return foo.x | (static_cast<uint64_t>(foo.y) << 32); with return *(uint64_t*)&foo;, but you sure can't.

Nicole Mazzuca (Nov 28 2018 at 21:34, on Zulip):

(in C++ (because of alignment _and_ TBAA), nor Rust (because of alignment))

Nicole Mazzuca (Nov 28 2018 at 21:36, on Zulip):

if LLVM wants to make that transformation, it has to make the transformed code not UB via some method

rkruppe (Nov 28 2018 at 21:38, on Zulip):

Yes, the issue is how to do that in a model where integers have provenance.

Nicole Mazzuca (Nov 28 2018 at 21:39, on Zulip):

huh?

Nicole Mazzuca (Nov 28 2018 at 21:39, on Zulip):

integers have provenance in LLVM

Nicole Mazzuca (Nov 28 2018 at 21:40, on Zulip):

if GVN wants to make that optimization, that's LLVM's problem

Nicole Mazzuca (Nov 28 2018 at 21:41, on Zulip):

we have to guarantee that Rust code which does not have UB transforms into LLVM IR that doesn't have undefined behavior, though

Nicole Mazzuca (Nov 28 2018 at 21:41, on Zulip):

LLVM also has to make sure its transformations follow this property

Nicole Mazzuca (Nov 28 2018 at 21:41, on Zulip):

one easy way to do this would be to remove the noalias on the parameter

rkruppe (Nov 28 2018 at 21:43, on Zulip):

I am well aware of the roles of rustc and LLVM passes. I think we're coming from different perspectives? Like, yes, ultimately justifying transformations is the problem of the one doing them, but Ralf's point was more generally that models in which integers have provenance suck, in part because it's so difficult to justify perfectly sensible optimizations on pure integer operations (such as GVN in that example).

Nicole Mazzuca (Nov 28 2018 at 21:45, on Zulip):

I mean, okay, but we have to deal with LLVM as it exists, and LLVM as it exists gives integers provenance. I don't see a way of keeping the optimizations without giving integers provenance in any model we use.

Nicole Mazzuca (Nov 28 2018 at 21:46, on Zulip):

(also, we can give integers provenance in the Rust model, and then helpfully ignore integer provenance if it's helpful for optimization)

Nicole Mazzuca (Nov 28 2018 at 21:47, on Zulip):

(which is what I assume LLVM does)

rkruppe (Nov 28 2018 at 21:49, on Zulip):

Well, it's not like LLVM is immutable and infallible. Its integer optimizations and aliasing model have in the past contradicted itself.

rkruppe (Nov 28 2018 at 21:51, on Zulip):

As detailed in https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf which also lays out a memory model that justifies many existing optimizations LLVM does without provenance for integers (though IIRC it don't say anything about noalias specifically)

rkruppe (Nov 28 2018 at 21:56, on Zulip):

None of this means we're in the clear wrt how LLVM works today but to me it illustrates that maybe noalias in its current form isn't the be-all-end-all and we either have to conform to it or lose all hope of improving AA. I can definitely see a future where either LLVM changes its aliasing/memory model or gains more fine-grained alternatives to noalias that we have a better shot at using (there's plenty of non-Rust-incentives for this, e.g. noalias is known to be pretty inadequate for restrict too)

RalfJ (Nov 28 2018 at 21:56, on Zulip):

LLVM as it exists gives integers provenance.

I doubt that. I think it does not. And they have a GVN transformation which witnesses that it does not.

RalfJ (Nov 28 2018 at 21:56, on Zulip):

LLVM IR has the same semantics before and after GVN

RalfJ (Nov 28 2018 at 21:57, on Zulip):

so, while it is true that LLVM can have semantics different from Rust, it must have some consistent semantics

RalfJ (Nov 28 2018 at 21:57, on Zulip):

and I am saying "naive noalias with int-ptr-casts" + "GVN based on int equality" cannot coexist as transformations on the same IR

RalfJ (Nov 28 2018 at 21:58, on Zulip):

it seems likely that LLVM currently does both, and hence is buggy. would be interesting to construct a miscompilation from this actually.

RalfJ (Nov 28 2018 at 21:58, on Zulip):

noalias is known to be pretty inadequate for restrict too

it is?

rkruppe (Nov 28 2018 at 21:59, on Zulip):

I believe it's fine for arguments but not expressive enough for locals with restrict

RalfJ (Nov 28 2018 at 21:59, on Zulip):

ah yeah that seems plausible

RalfJ (Nov 28 2018 at 22:02, on Zulip):

As detailed in https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf which also lays out a memory model that justifies many existing optimizations LLVM does without provenance

oh but we have provenance :) just not on integers

RalfJ (Nov 28 2018 at 22:03, on Zulip):

but pointers carry around extra information, reflecting their provenance

RalfJ (Nov 28 2018 at 22:03, on Zulip):

a bit like the tags which pointers carry in Stacked Borrows, which also are a form of provenance

rkruppe (Nov 28 2018 at 22:03, on Zulip):

er, yeah, I meant provenance on integers

rkruppe (Nov 28 2018 at 22:03, on Zulip):

sorry for the confusion

RalfJ (Nov 28 2018 at 22:03, on Zulip):

okay, then I agree :)

RalfJ (Nov 28 2018 at 22:03, on Zulip):

but yes, noalias is future work

RalfJ (Nov 28 2018 at 22:03, on Zulip):

I am meeting with some other authors of that paper in Jan to start thinking about noalias

rkruppe (Nov 28 2018 at 22:04, on Zulip):

:confetti:

RalfJ (Nov 28 2018 at 22:05, on Zulip):

and the example above will be an interesting litmus test

RalfJ (Nov 28 2018 at 22:05, on Zulip):

one of my coauthors is really good at turning such problems into miscompilations, I'll ask them if they can't exploit this one^^

RalfJ (Nov 28 2018 at 22:06, on Zulip):

but anyway it's getting late... good night!

Ariel Ben-Yehuda (Nov 28 2018 at 23:02, on Zulip):

and I am saying "naive noalias with int-ptr-casts" + "GVN based on int equality" cannot coexist as transformations on the same IR

My original solution what I called "broadcast semantics" - doing the "unfreeze and push an Shr" on the ptr->int cast or pointer comparison (of course, this makes them side-effectful, which has its own disadvantages. Did anyone measure the perf impact of that?)

@Ariel Ben-Yehuda you had another interesting example, involving mem::replace or so, didn't you?

That was just about mem::replace using raw pointers internally, and us not wanting to treat it as a "leak".

Ariel Ben-Yehuda (Nov 28 2018 at 23:03, on Zulip):

good night in any casez.

RalfJ (Nov 29 2018 at 08:55, on Zulip):

My original solution what I called "broadcast semantics" - doing the "unfreeze and push an Shr" on the ptr->int cast or pointer comparison (of course, this makes them side-effectful, which has its own disadvantages. Did anyone measure the perf impact of that?)

Yes and that's what Stacked Borrows does, but this breaks your mem::replace example because replace performs a broadcast

RalfJ (Nov 29 2018 at 08:56, on Zulip):

the issue is somehow "scoping" the broadcast, I think

Ariel Ben-Yehuda (Nov 29 2018 at 08:58, on Zulip):

The idea was limiting the broadcast to ptr->int casts (and maybe pointer comparisons), rather than to raw pointer ops.

Ariel Ben-Yehuda (Nov 29 2018 at 08:58, on Zulip):

with the idea being that ptr->int casts on pointers that are not "very much escaped" should be rare

RalfJ (Nov 29 2018 at 08:59, on Zulip):

hm. but then calling an unknown function could still do that.

RalfJ (Nov 29 2018 at 08:59, on Zulip):

Stacked Borrows, for me, is all about the guarantees you get around unknown function calls

Ariel Ben-Yehuda (Nov 29 2018 at 08:59, on Zulip):

yea, and it can be because the unknown function could be as_ptr

Ariel Ben-Yehuda (Nov 29 2018 at 09:00, on Zulip):

so one idea I had was to have some kind of references following lifetimes in safe code, that's all the deal about the "tootsie pop model"

RalfJ (Nov 29 2018 at 09:00, on Zulip):

I mean Stacked Borrows gives you some of that

Ariel Ben-Yehuda (Nov 29 2018 at 09:00, on Zulip):

but only after you do a write

RalfJ (Nov 29 2018 at 09:01, on Zulip):

it currently makes even ref-to-raw "broadcast" and still you get some guarantees around unknown code

RalfJ (Nov 29 2018 at 09:01, on Zulip):

no, any Retag will do it

RalfJ (Nov 29 2018 at 09:01, on Zulip):

and also there are these "function call barriers" that give you even more information (and have found a bug already)

Ariel Ben-Yehuda (Nov 29 2018 at 09:01, on Zulip):

which go in the other direction, don't they?

Ariel Ben-Yehuda (Nov 29 2018 at 09:02, on Zulip):

Wouldn't that bug have been caught even without function call barriers?

RalfJ (Nov 29 2018 at 09:02, on Zulip):

no

RalfJ (Nov 29 2018 at 09:02, on Zulip):

because there was a Shr somewhere on the stack

RalfJ (Nov 29 2018 at 09:03, on Zulip):

so turning the raw ptr into an &mut popped to make that Shr active

Ariel Ben-Yehuda (Nov 29 2018 at 09:04, on Zulip):

so that's the "raw pointer escape" rule?

Ariel Ben-Yehuda (Nov 29 2018 at 09:05, on Zulip):

the equivalent of

let x: &mut A = ...;
let y: *mut A = &mut *x;
let z: &A = (&mut *x); // "WRONG"
use(z, x); // ok, because we have an Shr ?

Ariel Ben-Yehuda (Nov 29 2018 at 09:06, on Zulip):

ah no there were only 2 shared references there, AND a raw pointer

Ariel Ben-Yehuda (Nov 29 2018 at 09:06, on Zulip):

so

let x: &A = ...;
let y: *const A = &*x;
let wrong_z: &A =(&mut *(x as *const A as *mut A);
use(x, wrong_z);
RalfJ (Nov 29 2018 at 09:07, on Zulip):

I am not sure exactly where the Shr is coming from in this case

RalfJ (Nov 29 2018 at 09:08, on Zulip):

it might have been more like

let x: &mut T = ...;
let xraw = x as *mut _;
let y = &mut *xraw;
let z = &*y;
let bad = &mut *(z as *const _); // okay because this raw ptr can match the item pushed for xraw
RalfJ (Nov 29 2018 at 09:09, on Zulip):

adding a barrier anywhere between xraw and bad then detects this because you cannot use xraw any more when creating bad

Ariel Ben-Yehuda (Nov 29 2018 at 09:11, on Zulip):

that seems wrong (not being able to use xraw after using bad).

Ariel Ben-Yehuda (Nov 29 2018 at 09:11, on Zulip):

you mean y?

Ariel Ben-Yehuda (Nov 29 2018 at 09:11, on Zulip):

I thought that broadcastness remained until you used an "old" raw pointer

Ariel Ben-Yehuda (Nov 29 2018 at 09:12, on Zulip):

or reference

Ariel Ben-Yehuda (Nov 29 2018 at 09:12, on Zulip):

aka x

RalfJ (Nov 29 2018 at 09:16, on Zulip):

that seems wrong (not being able to use xraw after using bad).

I said no such thing?

RalfJ (Nov 29 2018 at 09:16, on Zulip):

Creating bad will make the Shr item pushed for xraw top-of-stack

RalfJ (Nov 29 2018 at 09:16, on Zulip):

so now you can use both bad and xraw interchangably: two raw pointers with the same tag, pointing to a location whose top-of-stack is Shr

RalfJ (Nov 29 2018 at 09:17, on Zulip):

the barrier makes it so that bad cannot be created, because the item it needs (Shr) is behind a barrier

RalfJ (Nov 29 2018 at 09:19, on Zulip):

that would be code like

let x: &mut T = ...;
let xraw = x as *mut _;
foo(&mut *xraw);

fn foo(y: &mut i32) {
// here we add a barrier to the stack, making sure y cannot be popped off
let z = &*y;
let bad = &mut *(z as *const _); // NOT okay because this would pop y off the stack,
// which is prevent by a barrier
}
Ariel Ben-Yehuda (Nov 29 2018 at 09:28, on Zulip):

ok I just misread you

rkruppe (Nov 30 2018 at 10:18, on Zulip):

@RalfJ is this the result of asking your coauthor who's good at creating miscompiles? :D https://bugs.llvm.org/show_bug.cgi?id=39846

RalfJ (Nov 30 2018 at 10:27, on Zulip):

it is^^ (not the coauthor I expected, but I sent this to all of them)

RalfJ (Nov 30 2018 at 10:29, on Zulip):

I'm actually rather surprised by this particular miscompile, I expected them to be different, but I won't be picky^^. Anything that demonstrates that LLVM's noalias handling is inconsistent is helpful towards figuring out a consistent story.

RalfJ (Nov 30 2018 at 10:30, on Zulip):

@Ariel Ben-Yehuda @Nicole Mazzuca you might also be interested in this example, see the link a few messages up

Ariel Ben-Yehuda (Nov 30 2018 at 14:24, on Zulip):

One another solution would be to track derivation "precisely" in the spec, but to have LLVM treat inttoptr as a broadcast.

Ariel Ben-Yehuda (Nov 30 2018 at 14:25, on Zulip):

which would make GVN not a legal endo-optimization on Rust code, but maintain it as an endo-optimization on LLVM IR

Ariel Ben-Yehuda (Nov 30 2018 at 14:25, on Zulip):

*ptrtoint is broadcast, not inttoptr

Ariel Ben-Yehuda (Nov 30 2018 at 14:27, on Zulip):

so this would keep the spec clean, but (I think) enable all the optimizations we want in practice

Ariel Ben-Yehuda (Nov 30 2018 at 14:27, on Zulip):

other than pointers being passed to foreign functions being treated as escaping. But that's a must given as_ptr I think

Ariel Ben-Yehuda (Nov 30 2018 at 14:40, on Zulip):

we need to make sure that *(ptr as *const usize) = *(ptr as *const usize); "breaks derived" while having memcpy work

Ariel Ben-Yehuda (Nov 30 2018 at 14:40, on Zulip):

as in, keep derived, preferably without broadcast

Ariel Ben-Yehuda (Nov 30 2018 at 14:40, on Zulip):

moving memcpy to use a "databyte" type that is GVN-less instead of u8 would work

Ariel Ben-Yehuda (Nov 30 2018 at 14:47, on Zulip):

so in the MIR you can't even optimize x*0 [if we really wanted, we could add broadcast to MIR and optimize x*0 to { broadcast(x); 0 }]

Ariel Ben-Yehuda (Nov 30 2018 at 14:47, on Zulip):

and we can even lower the likes if inttoptr(ptrtoint(x) | 1) in a way that doesn't broadcast anything

Ariel Ben-Yehuda (Nov 30 2018 at 14:48, on Zulip):

we'll need a set_tag intrinsic that doesn't allow intermediate GVN, but that will work

Ariel Ben-Yehuda (Nov 30 2018 at 14:52, on Zulip):

and add "Torvalds' delight" where in the MIR, (x & 0xff) is not derived from x but x & 0xffffff00 is

Ariel Ben-Yehuda (Nov 30 2018 at 14:54, on Zulip):

integer optimizations on MIR require adding broadcasts to random places, let LIR handle them

RalfJ (Nov 30 2018 at 15:05, on Zulip):

I really dislike provenance on integers. I think u32 should exactly model Z_[2^32], i.e., integers modulo 2^32, not the product of that and some magic metadata. We shouldn't make people's reasoning about code harder than it already is.

RalfJ (Nov 30 2018 at 15:06, on Zulip):

that's basically externalizing the cost and letting our users pay the price

Nicole Mazzuca (Nov 30 2018 at 15:34, on Zulip):

our integers don't model the modulo field tho

Nicole Mazzuca (Nov 30 2018 at 15:34, on Zulip):

also, why is that making code harder to reason about?

RalfJ (Nov 30 2018 at 15:36, on Zulip):

because it is extra state that has to be considered during verification? it means all sorts of equations that hold trivially for integers need to be carefully verified for these "strange integers". likely, they are not even associative and/or commutative. it might even make the set of possible values infinite depending on what exactly provenance looks like.

RalfJ (Nov 30 2018 at 15:36, on Zulip):

it means that just because I know that some integer is 0, I still cannot fully describe its behavior. How is that not a huge burden for verification?

Nicole Mazzuca (Nov 30 2018 at 15:37, on Zulip):

you said users

Nicole Mazzuca (Nov 30 2018 at 15:38, on Zulip):

I don't think it's a burden for users

Nicole Mazzuca (Nov 30 2018 at 15:40, on Zulip):

also, remember that these are simply checks on functions - if someone has a complex enough function that it's impossible to model the provenance of integer values (noting that the provenance of integer values is usually nil), I don't really mind that they don't get UB checking

Nicole Mazzuca (Nov 30 2018 at 15:44, on Zulip):

if you have an integer value that has provenance for more than 20 objects, what are you doing with your life

Nicole Mazzuca (Nov 30 2018 at 15:54, on Zulip):

probably hashing a vector of raw pointers

Ariel Ben-Yehuda (Nov 30 2018 at 17:45, on Zulip):

I mean, here provenance can "leak" through booleans

Ariel Ben-Yehuda (Nov 30 2018 at 17:45, on Zulip):

and other small integers

Nicole Mazzuca (Nov 30 2018 at 17:46, on Zulip):

I mean yeah.

Ariel Ben-Yehuda (Nov 30 2018 at 17:46, on Zulip):

so you could mix a "real" raw pointer with a lots of "booleans"

Ariel Ben-Yehuda (Nov 30 2018 at 17:46, on Zulip):

and have a value with 30 provenances, only one of them matters

Nicole Mazzuca (Nov 30 2018 at 17:46, on Zulip):

yeah

Ariel Ben-Yehuda (Nov 30 2018 at 17:46, on Zulip):

if the problem is automated verifications, maybe have annotations for that?

Ariel Ben-Yehuda (Nov 30 2018 at 17:47, on Zulip):

intrinsics::fix_provenance(ptr, orig)

Ariel Ben-Yehuda (Nov 30 2018 at 17:47, on Zulip):

which prevents ptr from gaining provenances beyond orig?

Ariel Ben-Yehuda (Nov 30 2018 at 17:47, on Zulip):

(and checks that ptr has orig as a provenance)

Ariel Ben-Yehuda (Nov 30 2018 at 17:48, on Zulip):

and have the verifier otherwise complain if you dereference a raw pointer with too many provenances

Ariel Ben-Yehuda (Nov 30 2018 at 17:50, on Zulip):

the intrinsic doesn't even need to have language semantics, just verifier semantics

Ariel Ben-Yehuda (Nov 30 2018 at 17:52, on Zulip):

otoh, this has the problem of preventing a "mindless" verification

Ariel Ben-Yehuda (Nov 30 2018 at 17:53, on Zulip):

you couldn't run MIRI on an unmodified program that makes a mess with ints & ptrs

Ariel Ben-Yehuda (Nov 30 2018 at 17:54, on Zulip):

otoh, twin allocation is SO MUCH worse when it comes to soundness

Ariel Ben-Yehuda (Nov 30 2018 at 17:58, on Zulip):

*when it comes to having a easy-to-compute, mindless way of determining whether a program execution is defined

Ariel Ben-Yehuda (Nov 30 2018 at 18:00, on Zulip):

I mean, even if we define it so that disassembling a pointer into bits and reassembling it doesn't give you a usable pointer

Ariel Ben-Yehuda (Nov 30 2018 at 18:01, on Zulip):

I don't think anyone would care

Ariel Ben-Yehuda (Nov 30 2018 at 18:01, on Zulip):

actually, I'm not that sure about ^

Ariel Ben-Yehuda (Nov 30 2018 at 18:02, on Zulip):

So I think the problem that would result here is that SOMEONE will deflate and inflate their RAM, and then complain that that breaks the verifier

Ariel Ben-Yehuda (Nov 30 2018 at 18:03, on Zulip):

to which I'll answer "if you do these crazy things, please do broadcast on the RAM that you deflate and inflate and then everyone will be happy"

Ariel Ben-Yehuda (Nov 30 2018 at 18:08, on Zulip):

OTOH, "deflating an inflating RAM" is a good way provenance can be really "lost", if you don't broadcast

Ariel Ben-Yehuda (Nov 30 2018 at 18:08, on Zulip):

(because it might all use array indexes in the middle, which destroy provenance)

Ariel Ben-Yehuda (Nov 30 2018 at 18:09, on Zulip):

and the compiler might even theoretically notice that

Ariel Ben-Yehuda (Nov 30 2018 at 18:11, on Zulip):

@Nicole Mazzuca how bad do you think is forcing anyone who does these kinds of mixing-up to do a broadcast on all the bytes?

Ariel Ben-Yehuda (Nov 30 2018 at 18:12, on Zulip):

OTOH, I believe that the "deflating RAM" might do their sins with references, and not raw pointers

Ariel Ben-Yehuda (Nov 30 2018 at 18:12, on Zulip):

in which case that'll be UB even with Stacked Borrows

Ariel Ben-Yehuda (Nov 30 2018 at 18:13, on Zulip):

take a bunch of RAM, deflate it, write that over a socket, read on the other end, unpack it

Ariel Ben-Yehuda (Nov 30 2018 at 18:13, on Zulip):

inflate it back, use it

Nicole Mazzuca (Nov 30 2018 at 19:56, on Zulip):

@Ariel Ben-Yehuda I doubt it's that big of a deal, for most things.

Nicole Mazzuca (Nov 30 2018 at 20:03, on Zulip):

although on the other hand, broadcast semantics also make sense

Nicole Mazzuca (Nov 30 2018 at 20:04, on Zulip):

we'd just need to make sure LLVM wasn't miscompiling us

RalfJ (Nov 30 2018 at 20:04, on Zulip):

users also want to at least informally reason about their code

RalfJ (Nov 30 2018 at 20:04, on Zulip):

the closer we can make the real model to what they are going to assume anyway, the better

RalfJ (Nov 30 2018 at 20:05, on Zulip):

every diversion from that should be very well-justified. I dont think integer provenance is even remotely justified enough, in particular if not even LLVM has it.

Nicole Mazzuca (Nov 30 2018 at 20:09, on Zulip):
int foo(int *restrict x, int *restrict y) {
  *y = 0;

  intptr_t x_addr = (intptr_t)x;
  intptr_t y_addr = (intptr_t)y;

  *(int *)x_addr = 1;
  return *(int *)y_addr;
}

converting this to

int foo(int *restrict x, int *restrict y) {
  *y = 0;
  *x = 1;
  return 0;
}

I believe requires integer provenance.

RalfJ (Nov 30 2018 at 20:13, on Zulip):

it does. which is why I think this transformation should not be allowed.

Ariel Ben-Yehuda (Nov 30 2018 at 20:14, on Zulip):

users also want to at least informally reason about their code

I think that users informally reason about integer provenance

Ariel Ben-Yehuda (Nov 30 2018 at 20:14, on Zulip):

pretty well

Ariel Ben-Yehuda (Nov 30 2018 at 20:15, on Zulip):

and the main problem with it is that it annoys optimizer & verification tool writers

Ariel Ben-Yehuda (Nov 30 2018 at 20:15, on Zulip):

(or at least, I don't expect users to distinguish much between integer provenance, raw pointer provenance, and reference provenance)

Nicole Mazzuca (Nov 30 2018 at 20:16, on Zulip):

I agree

Ariel Ben-Yehuda (Nov 30 2018 at 20:16, on Zulip):

the only case where there might be confusion is the "deflating & inflating RAM" case

Nicole Mazzuca (Nov 30 2018 at 20:16, on Zulip):

Also, if you want these semantics to be true for Rust, you'll need to fix LLVM @RalfJ

Nicole Mazzuca (Nov 30 2018 at 20:17, on Zulip):

see https://gcc.godbolt.org/z/2rwaFe

RalfJ (Nov 30 2018 at 20:17, on Zulip):

@Nicole Mazzuca we'll need to fix LLVM anyway: https://bugs.llvm.org/show_bug.cgi?id=39846

Nicole Mazzuca (Nov 30 2018 at 20:17, on Zulip):

sure

RalfJ (Nov 30 2018 at 20:17, on Zulip):

right now, given that LLVM is not self-consistent, I do not think it is a useful basis for discussion

Nicole Mazzuca (Nov 30 2018 at 20:17, on Zulip):

but there are different levels of "fixing"

Ariel Ben-Yehuda (Nov 30 2018 at 20:18, on Zulip):

and I think that adding annotations to RAM inflate/deflate is not such a high cost

Nicole Mazzuca (Nov 30 2018 at 20:18, on Zulip):

and I very much doubt that LLVM would be willing to give up optimizations like this

RalfJ (Nov 30 2018 at 20:18, on Zulip):

well, LLVM does GVN on integer equality. and it is my understanding that they strongly desire to keep it.

RalfJ (Nov 30 2018 at 20:19, on Zulip):

so when LLVM is self-consistent, I assume it'll still do this GVN and we have a model that can do noalias without integer provenance. or maybe it will have integer provenance and give up on GVN, and that may persuade me to do the same.
EDIT: or maybe we'll have found a way to have both, that'd be even better!

RalfJ (Nov 30 2018 at 20:20, on Zulip):

users also want to at least informally reason about their code

I think that users informally reason about integer provenance

my experience is that users are already surprised by pointers not being "just integers" and having some kind of provenance. if not even integers are "just (mathematical) integers", I think this will get worse.

Ariel Ben-Yehuda (Nov 30 2018 at 20:21, on Zulip):

it is very unnatural to write code that can observe that integers are not "just (mathematical) integers"

RalfJ (Nov 30 2018 at 20:22, on Zulip):

pretty much any code casting between ints and ptrs would be such code

RalfJ (Nov 30 2018 at 20:23, on Zulip):

I was originally worried that e.g. crypto code would be affected, but likely provenance would not affect integer arithmetic so one can work on equivalence classes of values quotiented by "the integer part is the same"

RalfJ (Nov 30 2018 at 20:23, on Zulip):

that would require one thing though: integer comparison is deterministic and ignores provenance

RalfJ (Nov 30 2018 at 20:23, on Zulip):

(this is not the case for ptr comparison)

Ariel Ben-Yehuda (Nov 30 2018 at 20:24, on Zulip):

that would require one thing though: integer comparison is deterministic and ignores provenance

Sounds like a good idea. I personally prefer to also define ptr comparison as integer comparison.

RalfJ (Nov 30 2018 at 20:24, on Zulip):

have fun convincing LLVM of that :P

Ariel Ben-Yehuda (Nov 30 2018 at 20:24, on Zulip):

we can do it in Rust :-)

Ariel Ben-Yehuda (Nov 30 2018 at 20:24, on Zulip):

by emitting ptrtoint

RalfJ (Nov 30 2018 at 20:24, on Zulip):

C actually says that in the standard, and LLVM still doesn't care

RalfJ (Nov 30 2018 at 20:25, on Zulip):

true, we can. it'd make me sleep better. but I think some people wouldn't like it as it'll likely cost some perf...

Ariel Ben-Yehuda (Nov 30 2018 at 20:25, on Zulip):

we wouldn't want to do it if ptrtoint has broadcast semantics

RalfJ (Nov 30 2018 at 20:26, on Zulip):

yeah and that

Nicole Mazzuca (Nov 30 2018 at 20:26, on Zulip):

one thing to note: the C standard gives integers provenance

RalfJ (Nov 30 2018 at 20:26, on Zulip):

@Nicole Mazzuca source? I've not seen that

Nicole Mazzuca (Nov 30 2018 at 20:26, on Zulip):

the C17 standard, page 89, 6.7.3.1.3

Nicole Mazzuca (Nov 30 2018 at 20:26, on Zulip):

http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf

Nicole Mazzuca (Nov 30 2018 at 20:27, on Zulip):

(technically it's the draft standard but :shrug:)

RalfJ (Nov 30 2018 at 20:27, on Zulip):

I keep looking for a good HTML version of the standard...

Ariel Ben-Yehuda (Nov 30 2018 at 20:28, on Zulip):

actually, for iterators to work, pointer comparison between pointers with the same provenance must not be a broadcast

Nicole Mazzuca (Nov 30 2018 at 20:28, on Zulip):

In what follows, a pointer expression E is said to be based on object P if (at some sequence point in the execution of B [the block which P is declared to be for] prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E. Note that “based” is defined only for expressions with pointer types.

RalfJ (Nov 30 2018 at 20:28, on Zulip):

@Nicole Mazzuca oh, that sentence. yeah it's also in older standards. it's about pointers though, and their definition is basically information flow control which is... not even a trace-based property.

RalfJ (Nov 30 2018 at 20:29, on Zulip):

like, that mandates that x - x kills all provenance

Nicole Mazzuca (Nov 30 2018 at 20:29, on Zulip):

true

RalfJ (Nov 30 2018 at 20:29, on Zulip):

It is not in general even defined when looking only at a single trace

RalfJ (Nov 30 2018 at 20:29, on Zulip):

(aka, it's a hyperproperty)

Nicole Mazzuca (Nov 30 2018 at 20:30, on Zulip):

I think broadcast semantics are likely the only reasonable semantics, looking at it

Nicole Mazzuca (Nov 30 2018 at 20:30, on Zulip):

although we'd need a way to tell LLVM "broadcast this pointer value"

RalfJ (Nov 30 2018 at 20:30, on Zulip):

so, that's not just giving integers provenance. that's killing trace-based reasoning. I don't intend to abandon trace-based reasoning for Rust^^

Ariel Ben-Yehuda (Nov 30 2018 at 20:30, on Zulip):

broadcast @ ptr->int cast?

Nicole Mazzuca (Nov 30 2018 at 20:30, on Zulip):

because we don't in general want ptrtoint to broadcast

Ariel Ben-Yehuda (Nov 30 2018 at 20:31, on Zulip):

@ Rust ptr->int cast ?

RalfJ (Nov 30 2018 at 20:31, on Zulip):

yeah its very useful to keep ptrtoint side-effect free. reorderdable, can be DCO'd, and so on

Nicole Mazzuca (Nov 30 2018 at 20:31, on Zulip):

the issue I see is with function calls

Ariel Ben-Yehuda (Nov 30 2018 at 20:31, on Zulip):

yeah its very useful to keep ptrtoint side-effect free. reorderdable, can be DCO'd, and so on

is it?

Ariel Ben-Yehuda (Nov 30 2018 at 20:32, on Zulip):

did you measure/know why?

RalfJ (Nov 30 2018 at 20:32, on Zulip):

I dont think I have numbers, no

RalfJ (Nov 30 2018 at 20:32, on Zulip):

but if ptrtoint broadcasts, LLVM has to stop reordering it freely and treat it specially

RalfJ (Nov 30 2018 at 20:32, on Zulip):

that seems hard

Nicole Mazzuca (Nov 30 2018 at 20:32, on Zulip):

i.e., <&T as Pointer>::fmt(&p) should broadcast, but <i32 as Display>::fmt(p) should not

Ariel Ben-Yehuda (Nov 30 2018 at 20:32, on Zulip):

or we could emit an intrinsic instead

RalfJ (Nov 30 2018 at 20:32, on Zulip):

and there is no good reason to marry the type-conversion and the side-effecting broadcast into one operation

Ariel Ben-Yehuda (Nov 30 2018 at 20:32, on Zulip):

I suspect that even emitting a call to an external function won't kill performance

Nicole Mazzuca (Nov 30 2018 at 20:33, on Zulip):

but it does mean that every unknown call that accepts an &T now broadcasts

RalfJ (Nov 30 2018 at 20:33, on Zulip):

@Nicole Mazzuca yeah taht's exactly the mem::replace problem

Nicole Mazzuca (Nov 30 2018 at 20:33, on Zulip):

(it might be useful to have a nobroadcast type for parameters)

RalfJ (Nov 30 2018 at 20:33, on Zulip):

"it uses raw ptr internally so it broadcasted"

RalfJ (Nov 30 2018 at 20:33, on Zulip):

now we may avoid this with raw

RalfJ (Nov 30 2018 at 20:33, on Zulip):

but with int...

Nicole Mazzuca (Nov 30 2018 at 20:33, on Zulip):

and then if a function casts a pointer to an int, it has broadcasted

Nicole Mazzuca (Nov 30 2018 at 20:34, on Zulip):

(we'll be able to eliminate most of these, I think, in MIR)

RalfJ (Nov 30 2018 at 20:34, on Zulip):

shame that that's a safe operation

Ariel Ben-Yehuda (Nov 30 2018 at 20:34, on Zulip):

casting a pointer to an int needs to be a safe op

Ariel Ben-Yehuda (Nov 30 2018 at 20:34, on Zulip):

because HashMap<*const (), VAL>

Ariel Ben-Yehuda (Nov 30 2018 at 20:34, on Zulip):

must exist

RalfJ (Nov 30 2018 at 20:34, on Zulip):

is that so common?

Ariel Ben-Yehuda (Nov 30 2018 at 20:35, on Zulip):

yea

RalfJ (Nov 30 2018 at 20:35, on Zulip):

that HashMap is probably not even correct in C++. at least not if you compare ptrs at ptr type

RalfJ (Nov 30 2018 at 20:35, on Zulip):

because ptrs with the same address but different provenance may compare inequal

Ariel Ben-Yehuda (Nov 30 2018 at 20:35, on Zulip):

hashing pointers converts them to integers IIRC

Nicole Mazzuca (Nov 30 2018 at 20:36, on Zulip):

that seems like it should be invalid?

Nicole Mazzuca (Nov 30 2018 at 20:36, on Zulip):

it does

RalfJ (Nov 30 2018 at 20:36, on Zulip):

right but HashMaps also do == on the keys because collisions

Ariel Ben-Yehuda (Nov 30 2018 at 20:36, on Zulip):

why would you be comparing == pointers with different provenances?

Nicole Mazzuca (Nov 30 2018 at 20:36, on Zulip):

@RalfJ there's no good HTML C standard; do you have the C++ standard?

Ariel Ben-Yehuda (Nov 30 2018 at 20:36, on Zulip):

HashMap already deals with erratic PartialEq, from a safety perspective

RalfJ (Nov 30 2018 at 20:37, on Zulip):

@Nicole Mazzuca I got https://timsong-cpp.github.io/cppwp/n4140/

Nicole Mazzuca (Nov 30 2018 at 20:37, on Zulip):

ah, that one also works, yeah

RalfJ (Nov 30 2018 at 20:37, on Zulip):

HashMap already deals with erratic PartialEq, from a safety perspective

true but then it violates its functional properties

Ariel Ben-Yehuda (Nov 30 2018 at 20:37, on Zulip):

so the consequence is that you might get "safe erratic" behavior if you leave GCed keys in your hashmap

Nicole Mazzuca (Nov 30 2018 at 20:37, on Zulip):

the C standard people are more friendly to ISO rules about "not publishing their work online"

Ariel Ben-Yehuda (Nov 30 2018 at 20:37, on Zulip):

I'm not sure that users intentionally do that

RalfJ (Nov 30 2018 at 20:37, on Zulip):

ah, that one also works, yeah

you know another (better?) HTML version?

Nicole Mazzuca (Nov 30 2018 at 20:37, on Zulip):

no, they run the same code; eel.is/c++draft is what we commonly use

RalfJ (Nov 30 2018 at 20:37, on Zulip):

the C standard people are more friendly to ISO rules about "not publishing their work online"

I'd frame that as "less friendly to the public"^^

Nicole Mazzuca (Nov 30 2018 at 20:38, on Zulip):

not inaccurate

Nicole Mazzuca (Nov 30 2018 at 20:38, on Zulip):

side note, if we do get to the point where we release a standard, we should go with ECMA

RalfJ (Nov 30 2018 at 20:38, on Zulip):

no, they run the same code; eel.is/c++draft is what we commonly use

who is "we" here? (just curious)

RalfJ (Nov 30 2018 at 20:39, on Zulip):

side note, if we do get to the point where we release a standard, we should go with ECMA

my entirely uninformed opinion is to go with IETF. but whatever^^

Nicole Mazzuca (Nov 30 2018 at 20:40, on Zulip):

"people who work on C++"

Nicole Mazzuca (Nov 30 2018 at 20:40, on Zulip):

the IETF is not a standards organization, afaict

RalfJ (Nov 30 2018 at 20:41, on Zulip):

well IETF does TCP/IP and similar. if these are not standards I dont know what is^^

Nicole Mazzuca (Nov 30 2018 at 20:41, on Zulip):

ECMA is a standards organization that encourages open standards, and has experience with languages

RalfJ (Nov 30 2018 at 20:41, on Zulip):

I don't care about whether governments consider these standards. practically speaking, IETF does standards. but yeah I dont think they do programming languages^^

Nicole Mazzuca (Nov 30 2018 at 20:42, on Zulip):

it's where C++ would move if the politics were to work out

RalfJ (Nov 30 2018 at 20:42, on Zulip):

^^

Nicole Mazzuca (Nov 30 2018 at 20:44, on Zulip):

ECMA currently does Javascript, C#, Eiffel, and the CLI (C# backend)

RalfJ (Nov 30 2018 at 20:44, on Zulip):

wow I had no idea they do C#

Nicole Mazzuca (Nov 30 2018 at 20:44, on Zulip):

also apparently Dart

Ariel Ben-Yehuda (Nov 30 2018 at 20:45, on Zulip):

HashMap<*const (), T> at least has the semantics where
1. If you don't leave invalid elements in it, it acts as desired
2. If you do leave invalid elements in it, it doesn't UB but can have erratic semantics

RalfJ (Nov 30 2018 at 20:45, on Zulip):

@Ariel Ben-Yehuda that is the intention, yes. unfortunately I dont know how to give a proper account to ptr equality that is deterministic, and LLVM assumes it to be deterministic.

RalfJ (Nov 30 2018 at 20:46, on Zulip):

(non-determinism would break HashMap but breaks some LLVM optimizations)

Ariel Ben-Yehuda (Nov 30 2018 at 20:46, on Zulip):

(non-determinism would break HashMap but breaks some LLVM optimizations)

If it is not "undef" non-determinism, then it is no worse than any other erratic PartialEq impl

RalfJ (Nov 30 2018 at 20:46, on Zulip):

BTreeMap<*const (), T> is a totally different story... I am pretty sure that's UB in C/C++ and have no idea how to make it work in LLVM^^

Ariel Ben-Yehuda (Nov 30 2018 at 20:46, on Zulip):

you might get random panics, or not find your elements, but you won't get UB

RalfJ (Nov 30 2018 at 20:47, on Zulip):

If it is not "undef" non-determinism, then it is no worse than any other erratic PartialEq impl

agreed.

Ariel Ben-Yehuda (Nov 30 2018 at 20:47, on Zulip):

and if LLVM just returns undef, then that's prima facia unsafety.

RalfJ (Nov 30 2018 at 20:47, on Zulip):

yeah if LLVM can make ptr comparison undef then we are in trouble^^

RalfJ (Nov 30 2018 at 20:47, on Zulip):

I dont think they want to, though

Ariel Ben-Yehuda (Nov 30 2018 at 20:48, on Zulip):

And doesn't BTreeMap<*const (), T> have the same semantics in Rust?

RalfJ (Nov 30 2018 at 20:48, on Zulip):

it uses < in pointers

Ariel Ben-Yehuda (Nov 30 2018 at 20:48, on Zulip):

yea

Ariel Ben-Yehuda (Nov 30 2018 at 20:48, on Zulip):

which should work when they are valid

RalfJ (Nov 30 2018 at 20:49, on Zulip):

that's... uh. we make it UB/undef in our paper because we dont know how else to make the optimizations work^^

Ariel Ben-Yehuda (Nov 30 2018 at 20:49, on Zulip):

and return an nondeterministic-but-not-poison value when they are not

RalfJ (Nov 30 2018 at 20:49, on Zulip):

but TBH I dont think we spent much time on inequalities

Ariel Ben-Yehuda (Nov 30 2018 at 20:49, on Zulip):

same comment about returning UB/undef being prima facia unsafety

RalfJ (Nov 30 2018 at 20:50, on Zulip):

yes if it returns undef then it would be wrong to make it safe in Rust

RalfJ (Nov 30 2018 at 20:51, on Zulip):

I think replacing all undef by "non-deterministic but valid boolean" wouldn't break anything in our model but I might well be wrong. I should probably ask.

RalfJ (Nov 30 2018 at 20:52, on Zulip):

oh actually I misremembered. it is non-deterministic in our model, not undef or even UB

RalfJ (Nov 30 2018 at 20:52, on Zulip):

so, good enough for erratic but safe BTreeMap

RalfJ (Nov 30 2018 at 20:52, on Zulip):

but not very satisfying

Ariel Ben-Yehuda (Nov 30 2018 at 20:53, on Zulip):

so I think we concluded that the MIR translation to LLVM should use broadcast, probably on ptr->int casts

Ariel Ben-Yehuda (Nov 30 2018 at 20:53, on Zulip):

and we are not sure whether MIR itself should be defined as broadcast or provenance-based tracking

Ariel Ben-Yehuda (Nov 30 2018 at 20:53, on Zulip):

and that calling an unknown function is a broadcast because as_ptr exists

RalfJ (Nov 30 2018 at 20:53, on Zulip):

that seems to be the rough idea currently

RalfJ (Nov 30 2018 at 20:54, on Zulip):

I still dont know exactly what broadcast means for noalias

RalfJ (Nov 30 2018 at 20:54, on Zulip):

it seems that unknown functions broadcasting might be a big problem for noalias? (on the LLVM side)

Ariel Ben-Yehuda (Nov 30 2018 at 20:54, on Zulip):

depends on how much unknown functions you have

Ariel Ben-Yehuda (Nov 30 2018 at 20:55, on Zulip):

and on how well LLVM can infer nocapture

RalfJ (Nov 30 2018 at 20:55, on Zulip):

and nobroadcast I guess^^

RalfJ (Nov 30 2018 at 20:55, on Zulip):

it might also be useful to have a way to "scope" broadcast

Ariel Ben-Yehuda (Nov 30 2018 at 20:55, on Zulip):

nocapture implies nobroadcast

RalfJ (Nov 30 2018 at 20:55, on Zulip):

no idea what that means though^^

Ariel Ben-Yehuda (Nov 30 2018 at 20:55, on Zulip):

sort of

Ariel Ben-Yehuda (Nov 30 2018 at 20:56, on Zulip):

broadcast is already scoped

Ariel Ben-Yehuda (Nov 30 2018 at 20:56, on Zulip):

once you pop it off the stack, it is gone

RalfJ (Nov 30 2018 at 20:56, on Zulip):

oh I wasn't aware there's still a stack. LLVM doesn't have the stacks, or do you think it does?

Ariel Ben-Yehuda (Nov 30 2018 at 20:57, on Zulip):

I mean, from a Rust POV there's still the stack thing

Ariel Ben-Yehuda (Nov 30 2018 at 20:57, on Zulip):

or the "cactus stack"

RalfJ (Nov 30 2018 at 20:57, on Zulip):

true

RalfJ (Nov 30 2018 at 20:57, on Zulip):

lol^^

Ariel Ben-Yehuda (Nov 30 2018 at 20:57, on Zulip):

which still has the same pop-when-you-write

RalfJ (Nov 30 2018 at 20:57, on Zulip):

(even reads may pop)

Ariel Ben-Yehuda (Nov 30 2018 at 20:57, on Zulip):

yea, pop-when-you-access

RalfJ (Nov 30 2018 at 20:57, on Zulip):

ack

Ariel Ben-Yehuda (Nov 30 2018 at 20:57, on Zulip):

I'm not sure whether LLVM has pop-when-you-access ATM

Ariel Ben-Yehuda (Nov 30 2018 at 20:58, on Zulip):

that could be something worth adding

RalfJ (Nov 30 2018 at 20:58, on Zulip):

true, braodcast is scoped in MIR by popping currently. but there is no way for a function to say "I am not like as_ptr", as in, "I may broadcast internally but not beyond this call"

RalfJ (Nov 30 2018 at 20:58, on Zulip):

something like "auto-pop the stack after this function to be like it was right before"

Ariel Ben-Yehuda (Nov 30 2018 at 20:58, on Zulip):

so one idea I had was to do auto-pops when you are in a function that doesn't contain an unsafe block

RalfJ (Nov 30 2018 at 20:59, on Zulip):

mem::replace would want that

Ariel Ben-Yehuda (Nov 30 2018 at 20:59, on Zulip):

*to do them at the callee

RalfJ (Nov 30 2018 at 20:59, on Zulip):

uuhh

RalfJ (Nov 30 2018 at 20:59, on Zulip):

I... dont think I like that

Ariel Ben-Yehuda (Nov 30 2018 at 20:59, on Zulip):

mem::replace is not an unknown function

Ariel Ben-Yehuda (Nov 30 2018 at 20:59, on Zulip):

LLVM will very quickly discover that it is nocapture nobroadcast

Ariel Ben-Yehuda (Nov 30 2018 at 20:59, on Zulip):

unless it does a broadcast by accident

RalfJ (Nov 30 2018 at 21:00, on Zulip):

that's not enough on the MIR side though. it pushes a Shr and doesnt pop it

RalfJ (Nov 30 2018 at 21:00, on Zulip):

so after calling it you are allowed to use raw ptrs to access

RalfJ (Nov 30 2018 at 21:00, on Zulip):

well okay we said only for ints, but that's easily adapted

Ariel Ben-Yehuda (Nov 30 2018 at 21:00, on Zulip):

that's not enough on the MIR side though. it pushes a Shr and doesnt pop it

umm that's why I wanted provenance for raw pointers

RalfJ (Nov 30 2018 at 21:00, on Zulip):

imagine mem::replace would go through usize

Ariel Ben-Yehuda (Nov 30 2018 at 21:00, on Zulip):

yea, in that case it would be bad

RalfJ (Nov 30 2018 at 21:00, on Zulip):

or something with tagged pointers

RalfJ (Nov 30 2018 at 21:00, on Zulip):

or so

Ariel Ben-Yehuda (Nov 30 2018 at 21:01, on Zulip):

I feel that's a bogeyman

RalfJ (Nov 30 2018 at 21:01, on Zulip):

(the problems are the same no matter whether you make raw ptrs broadcast or just integers broadcast, so for now I made raw ptrs broadcast in Stacked Borrows for easier experimentation)

Ariel Ben-Yehuda (Nov 30 2018 at 21:02, on Zulip):

we could add intrinsics for tags that don't broadcast

Ariel Ben-Yehuda (Nov 30 2018 at 21:02, on Zulip):

if tags are improtant

RalfJ (Nov 30 2018 at 21:02, on Zulip):

true

Ariel Ben-Yehuda (Nov 30 2018 at 21:02, on Zulip):

and I expect more complicated uses to only occur in cases that are "definitely broadcast"

Ariel Ben-Yehuda (Nov 30 2018 at 21:02, on Zulip):

*when the pointer is already "definitely broadcast"

RalfJ (Nov 30 2018 at 21:02, on Zulip):

same as having an intrinsic for cast-to-int-then-icmp that doesnt broadcast. which we'd likely want for rust's ptr comparisons to make them definitely deterministic.

Ariel Ben-Yehuda (Nov 30 2018 at 21:03, on Zulip):

@RalfJ I prefer the latter intrinsic to be PartialOrd::le

RalfJ (Nov 30 2018 at 21:06, on Zulip):

seems we just have to convince LLVM to have such broadcasts then. should be easy. :P

Ariel Ben-Yehuda (Nov 30 2018 at 21:06, on Zulip):

And work items:
1. investigate & measure the perf impact of making ptrtoint side-effectful
2. investigate & measure the perf impact of using deterministic pointer comparisons
3. figure out how to add broadcast to LLVM

RalfJ (Nov 30 2018 at 21:07, on Zulip):

some of this measuring will be hard because LLVM doesn't currently implement a consistent semantics

RalfJ (Nov 30 2018 at 21:07, on Zulip):

in particular around ptrtoint/inttoptr

Ariel Ben-Yehuda (Nov 30 2018 at 21:07, on Zulip):

"4". figure out whether we want provenance-based or broadcast-based semantics for MIR, but that feels like too much of a bikeshed right now (keywords: deflate-and-inflate RAM, easiness for checking)

Ariel Ben-Yehuda (Nov 30 2018 at 21:08, on Zulip):

@RalfJ that's why it's "investigate & measure"

RalfJ (Nov 30 2018 at 21:08, on Zulip):

;)

RalfJ (Nov 30 2018 at 21:08, on Zulip):

also 3a. or so, "investigate how to integrate raw ptr provenance with Stacked Borrows"

Ariel Ben-Yehuda (Nov 30 2018 at 21:09, on Zulip):

sure, cool

RalfJ (Nov 30 2018 at 21:12, on Zulip):

it's not entirely trivial, the naive thing doesnt work... I forgot why though. I should find out and describe it to y'all^^

Nicole Mazzuca (Nov 30 2018 at 21:50, on Zulip):

@RalfJ C++ has those same semantics for < (unspecified, nondeterministic, but not UB)

Nicole Mazzuca (Nov 30 2018 at 21:51, on Zulip):

and it uses std::less to compare pointers for std::map, which creates a total ordering

RalfJ (Nov 30 2018 at 22:06, on Zulip):

@Nicole Mazzuca IIRC C++ allows an "indeterminate" result at least for ==, which would be undef in LLVM? Or has that changed on more recent versions?

Nicole Mazzuca (Nov 30 2018 at 22:06, on Zulip):

yeah, I believe it changed in C++14?

Nicole Mazzuca (Nov 30 2018 at 22:08, on Zulip):

it's now "unspecified"

RalfJ (Nov 30 2018 at 22:08, on Zulip):

hm... http://eel.is/c++draft/expr.eq#3.1 says the result may be "unspecified"

RalfJ (Nov 30 2018 at 22:08, on Zulip):

I have no idea what that means

Nicole Mazzuca (Nov 30 2018 at 22:08, on Zulip):

lemme look and check for sure what that means

RalfJ (Nov 30 2018 at 22:09, on Zulip):

http://eel.is/c++draft/defns.unspecified#:behavior,unspecified

behavior, for a well-formed program construct and correct data, that depends on the implementation
[ Note: The implementation is not required to document which behavior occurs.

RalfJ (Nov 30 2018 at 22:09, on Zulip):

now I am left wondering if the compiler may choose to make the result "indeterminate"

Nicole Mazzuca (Nov 30 2018 at 22:09, on Zulip):

I don't believe so

RalfJ (Nov 30 2018 at 22:09, on Zulip):

but it also doesnt say the result may be non-deterministic

RalfJ (Nov 30 2018 at 22:10, on Zulip):

as in, does comparing the same two pointers twice have to produce the same result?

RalfJ (Nov 30 2018 at 22:10, on Zulip):

The range of possible behaviors is usually delineated by this document.

well, not in this case :/

Nicole Mazzuca (Nov 30 2018 at 22:11, on Zulip):

I believe it's something like "a < b" must produce either true or false for any specific expression

RalfJ (Nov 30 2018 at 22:11, on Zulip):

but does (a < b) == (a < b) always have to produce true?

RalfJ (Nov 30 2018 at 22:11, on Zulip):

it doesnt allow non-determinism

Nicole Mazzuca (Nov 30 2018 at 22:11, on Zulip):

i.e., it's definitely gonna be one or the other, but it doesn't have to be consistent

RalfJ (Nov 30 2018 at 22:11, on Zulip):

it also doesnt forbid an indeterminate value

RalfJ (Nov 30 2018 at 22:11, on Zulip):

so this seems fairly vague to me

Nicole Mazzuca (Nov 30 2018 at 22:12, on Zulip):

the forbidding of an indeterminate value is implied

Nicole Mazzuca (Nov 30 2018 at 22:12, on Zulip):

so the comparison is nondeterministic, but each comparison produces exactly one value

RalfJ (Nov 30 2018 at 22:12, on Zulip):

hm that would make sense (that it can't be indeterminate or plain UB), though I'd prefer if it said that in the def.n of "unspecified"

Nicole Mazzuca (Nov 30 2018 at 22:13, on Zulip):

I agree

RalfJ (Nov 30 2018 at 22:13, on Zulip):

so the comparison is nondeterministic, but each comparison produces exactly one value

that seems to be the most liberal interpretation

RalfJ (Nov 30 2018 at 22:13, on Zulip):

and it is in line with what we model for LLVM

RalfJ (Nov 30 2018 at 22:13, on Zulip):

however, LLVM will (AFAIK) duplicate icmp, so it actually expects its own operation to be deterministic

RalfJ (Nov 30 2018 at 22:14, on Zulip):

and I don't know how to model that^^

Nicole Mazzuca (Nov 30 2018 at 22:14, on Zulip):

yeah

Nicole Mazzuca (Nov 30 2018 at 22:14, on Zulip):

I doubt it's necessary, tbh

RalfJ (Nov 30 2018 at 22:14, on Zulip):

duplicating?

Nicole Mazzuca (Nov 30 2018 at 22:14, on Zulip):

no, modeling anything more than nondeterminism

RalfJ (Nov 30 2018 at 22:15, on Zulip):

on the surface language level, maybe

RalfJ (Nov 30 2018 at 22:15, on Zulip):

I'd like to understand LLVM as well though

Nicole Mazzuca (Nov 30 2018 at 22:15, on Zulip):

faut

Nicole Mazzuca (Nov 30 2018 at 22:15, on Zulip):

*fair

RalfJ (Nov 30 2018 at 22:19, on Zulip):

well, this has been interesting but it's getting late. see you later everyone!

Nicole Mazzuca (Nov 30 2018 at 22:20, on Zulip):

o7

RalfJ (Nov 30 2018 at 22:20, on Zulip):

I already wanted to ask yesterday... what does "o7" mean?

RalfJ (Nov 30 2018 at 22:20, on Zulip):

or is that "o/" plus "too lazy for shift"?^^

RalfJ (Nov 30 2018 at 22:20, on Zulip):

(at least on the German keyboard, / is Shift-7)

Nicole Mazzuca (Nov 30 2018 at 22:21, on Zulip):

nah, it's just a different wave

RalfJ (Nov 30 2018 at 22:21, on Zulip):

okay :D :wave:

Nicole Mazzuca (Nov 30 2018 at 22:21, on Zulip):

also, I think, from the documentation, icmp on pointers should be exactly a ptrtoint comparison

Nicole Mazzuca (Nov 30 2018 at 22:22, on Zulip):

If the operands are pointer typed, the pointer values are compared as if they were integers.

Nicole Mazzuca (Nov 30 2018 at 22:22, on Zulip):

if a pass actually doesn't treat pointers like that, I feel like it should probably be seen as a bug

RalfJ (Dec 01 2018 at 08:14, on Zulip):

also, I think, from the documentation, icmp on pointers should be exactly a ptrtoint comparison

you mean in LLVM? we certainly have counterexamples for that and I think they are deliberate: LLVM will optimize x == y to false if they come from different malloc, even if (using getelementptr) they might be equal

Nicole Mazzuca (Dec 01 2018 at 08:25, on Zulip):

even if that's true, that feels like it should be treated as a miscompilation :<

Nicole Mazzuca (Dec 01 2018 at 08:25, on Zulip):

at least according to the documentation on icmp

RalfJ (Dec 01 2018 at 08:37, on Zulip):

You are right, it is pretty explicit about that:

If the operands are pointer typed, the pointer values are compared as if they were integers.

RalfJ (Dec 01 2018 at 08:45, on Zulip):

LLVM also works really hard to make sure that icmp is observably deterministic

RalfJ (Dec 01 2018 at 08:46, on Zulip):

but I dont know how to turn that into a formal operational spec, because that would basically require knowing how an allocation is used already when it is created. and a syntactic analysis does not work because LLVM may optimize away uses.

Last update: Nov 19 2019 at 18:00UTC