Stream: general

Topic: c2rust and Unique references


Elichai Turkel (Feb 09 2020 at 16:22, on Zulip):

Hi I've been playing with c2rust all day, which worked pretty awesome. but then I ran Miri(@RalfJ ) and saw trying to reborrow for SharedReadWrite, but parent tag <untagged> does not have an appropriate item in the borrow stack
when I realized that this is probably because rust doesn't allow 2 mutable(unique) references to the same object, so a common C pattern like:

multiply(result: *mut T, a: *mut T, b: *mut T);
multiply(&mut a, &mut a, &mut b);

is just not allowed in rust if result and a are deferenced together. ( while writing started to wonder if it's possible to useptr::read and ptr:write for that)

Elichai Turkel (Feb 09 2020 at 16:23, on Zulip):

Thinking about it more, read+write isn't enough, sometimes you want to access fields and that's not possible without creating a reference

Hanna Kruppe (Feb 09 2020 at 16:34, on Zulip):

The language feature for this not in stable Rust yet, so it would be unreasonable to expect c2rust to use it, but the Right Solution (eventually) is to never create references, always work with raw pointers, using &mut raw / &const raw for operations such as taking pointers and projecting through pointers.

Elichai Turkel (Feb 09 2020 at 16:50, on Zulip):

Right I've seen people use it.
one thing I don't like is that it makes it harder to understand what the heck a dereference is.
ie let a:&raw T = &raw *t; like is this safe or unsafe? what did it dereference to? can t be null/dangling?

Lokathor (Feb 09 2020 at 17:45, on Zulip):

it's unsafe if you try to write it outside of an unsafe block and get told you can't.

Lokathor (Feb 09 2020 at 17:47, on Zulip):

More helpfully, it depends on what type t is.

if t is a reference it's safe (and will never dangle of course)
if t is a raw pointer it's unsafe (and yes it might be null or dangle)

Elichai Turkel (Feb 10 2020 at 09:38, on Zulip):

@Lokathor So you're saying that let t: *const T; let r = &raw *t requires that t points to a valid T just like &*t requires? the only difference is that it doesn't create a shared reference?

Lokathor (Feb 10 2020 at 16:12, on Zulip):

My understanding is that you'd be able to write that and it would create a raw pointer r based on the raw pointer t, in this case r would just end up pointing to what t is pointing to.

However I think that the raw syntax isn't even implemented in Nightly yet, so I don't want to get too carried away with fine details here, because they might change during the implementation and stabilization.

RalfJ (Feb 10 2020 at 17:55, on Zulip):

@Elichai Turkel do you have a minimal example that runs on the playground? that makes it easier to figure out what is going on

RalfJ (Feb 10 2020 at 18:18, on Zulip):

Lokathor So you're saying that let t: *const T; let r = &raw *t requires that t points to a valid T just like &*t requires? the only difference is that it doesn't create a shared reference?

for now, yes

RalfJ (Feb 10 2020 at 18:19, on Zulip):

not creating a shared ref means it doesnt have to be readonly, etc

RalfJ (Feb 10 2020 at 18:19, on Zulip):

@Lokathor &raw is implemented in nightly :)

RalfJ (Feb 10 2020 at 18:20, on Zulip):

in the future we probably will want to relax the rules around deref so that &raw (*var).field doesnt require anything to be a valid instance of anything

Lokathor (Feb 10 2020 at 18:20, on Zulip):

ah excellent!

RalfJ (Feb 10 2020 at 18:20, on Zulip):

but I am not sure yet what the most elegant way is to state this...

Lokathor (Feb 10 2020 at 18:21, on Zulip):

seems easy if you don't run a deref impl

Lokathor (Feb 10 2020 at 18:21, on Zulip):

and not easy otherwise

RalfJ (Feb 10 2020 at 18:22, on Zulip):

well we still want &(*raw).field to require validity... maybe? or not?

RalfJ (Feb 10 2020 at 18:22, on Zulip):

having deref semantics depend on the surrounding & vs &raw seems really ugly though

RalfJ (Feb 10 2020 at 18:23, on Zulip):

also I am thinking on the MIR level, there is no deref impl

RalfJ (Feb 10 2020 at 18:23, on Zulip):

those are already lowered to fn calls

Lokathor (Feb 10 2020 at 18:25, on Zulip):

well if &(*raw).field makes &F it has to point to valid since all references must. If it makes *const F then it doesn't have to point to anything.

Lokathor (Feb 10 2020 at 18:25, on Zulip):

so &raw (*x).field can point to invalid

Lokathor (Feb 10 2020 at 18:26, on Zulip):

but &(*x).field must be valid

RalfJ (Feb 10 2020 at 18:26, on Zulip):

@Lokathor we were talking about the deref, not the ref, I thought

RalfJ (Feb 10 2020 at 18:27, on Zulip):

so, the *x subexpression of this, what exactly are its proof obligations? x must be non-null and aligned for sure. and probably also dereferencable. and maybe point to a valid T?

Lokathor (Feb 10 2020 at 18:27, on Zulip):

well we still want &(*raw).field to require validity... maybe? or not?

i was responding to this thought here. Im saying that we shoud want that to require validity there

RalfJ (Feb 10 2020 at 18:28, on Zulip):

well... we might not want to require all references to point to a valid T even when unused. and if we go down that route, I dont think there is anything in that expression that makes such an obligation arise.

Lokathor (Feb 10 2020 at 18:29, on Zulip):

as to *x on its own, honestly the point of a lot of this is for maybe uninit, so reading it as just *x is probably the wrong design because then you couldn't initialize fields individually of a MaybeUninit (if i follow you correctly)

RalfJ (Feb 10 2020 at 18:34, on Zulip):

the behavior of a compung expression like &raw (*x).field ought to be defined by defining all of its pieces separately

RalfJ (Feb 10 2020 at 18:34, on Zulip):

if we cant do that we'll just end up with a mess

RalfJ (Feb 10 2020 at 18:35, on Zulip):

though to do this we will have to be more explicit about place-to-value coercions...

RalfJ (Feb 10 2020 at 18:35, on Zulip):

which there are none here (well, x is really load(x) but that's trivial), but in let val = *x there's one: let val = load(*x)

Lokathor (Feb 10 2020 at 18:46, on Zulip):

if *x doesn't need to be valid until you load it that probably works out.

Then (*x).field is
"follow the address of X to a place, then offset to the sub-place of the particular field"

Elichai Turkel (Feb 11 2020 at 09:31, on Zulip):

the behavior of a compung expression like &raw (*x).field ought to be defined by defining all of its pieces separately

I 100% agree. that was my main complaint. the behavior of *x cannot(imo) be defined by if it has & or &raw before it.
maybe we can change the syntax though? and not have the * operation when creating raw "references". not sure how it would look though

Personally I prefer to stick with the common well known "deference requires the pointer to be a valid aligned pointer to an existing object" and just not use dereferencing in raw references

Elichai Turkel (Feb 11 2020 at 09:59, on Zulip):

Elichai Turkel do you have a minimal example that runs on the playground? that makes it easier to figure out what is going on

I was able to get it down to this minimal example: https://play.rust-lang.org/?edition=2018&gist=ee4b825dc44c918c6fc8c9fb58810bfa (obviously the real code is more contrived but it all starts because of 2 mutable references to the same place)

I can try going over c2rust output and replace all unneeded mutable references with shared references, but idk if it will be possible in all places where 2 mutable references to the same place are created.

Lokathor (Feb 11 2020 at 16:56, on Zulip):

@Elichai Turkel Requiring all pointers to be aligned and valid is just unacceptable, because then there is no way to get to fields.

Hanna Kruppe (Feb 11 2020 at 17:03, on Zulip):

Giving two different interpretations to place expressions depending on what context they are used is not a very elegant formalism, but since this context always comes from the immediate syntactic surroundings it does not actually cause any technical problems. While the potential for confusion when reading code is important, inventing a whole second set of syntax doesn't sound like it's an obvious improvement, especially given that expressions like (*raw_ptr).field are already wide-spread and should presumably have just as few restrictions on raw_ptr as whatever new syntax gets introduced.

Elichai Turkel (Feb 11 2020 at 17:10, on Zulip):

@Lokathor they can. Just not when dereferencing.
My point is that dereferencing a raw pointer and creating a reference should be a different operation than creating a "raw reference"

(honestly I'm not quite sure why we to add raw references and we can't just implement all the needed functionality on pointers)

Hanna Kruppe (Feb 11 2020 at 17:22, on Zulip):

There's no "raw references". There's just an operator to create a raw pointer to a memory location, immediately, without involving a reference at all. Describing the memory location in question can involve many nested operations, not just dereferencing, regardless of what you wind up doing with the memory location (taking a reference, taking a raw pointer, reading data directly, writing data directly, etc.). For example, consider &mut raw (*ptr).data[1] where ptr: *mut Foo with struct Foo { header: u16, data: [u64; 8] }.

Lokathor (Feb 11 2020 at 17:29, on Zulip):

Yes, to be clear, &raw isn't some new data type, it's just the way to immediately make *const T instead of making &T and the having it be coerced to *const T by usage.

Elichai Turkel (Feb 12 2020 at 09:40, on Zulip):

@Lokathor Why not allow doing ptr.field, which will create a raw pointer to the field?

Lokathor (Feb 12 2020 at 09:44, on Zulip):

This is a pretty wide design space of what could have happened.

Long story short, we got &raw and that's what we got.

Elichai Turkel (Feb 12 2020 at 09:44, on Zulip):

Yeah I guess it's too late to do any major change here

Jake Goulding (Feb 12 2020 at 13:56, on Zulip):

Not too late, just has a higher hill to climb.

Jake Goulding (Feb 12 2020 at 13:56, on Zulip):

It's not stable yet.

RalfJ (Feb 13 2020 at 08:58, on Zulip):

@Hanna Kruppe I think we can and should avoid both context-dependent interpretation of place expressions and separate syntax

RalfJ (Feb 13 2020 at 08:58, on Zulip):

and it's not like that's impossible, it's just that there's a bunch of choices and no clear winner

RalfJ (Feb 13 2020 at 08:59, on Zulip):

@Elichai Turkel the main counterexample for that was that it is very inconsistent with the behavior of .field on anything else. the same syntax shouldnt have to widely different behaviors.

Elichai Turkel (Feb 13 2020 at 09:06, on Zulip):

@RalfJ I think as a user of the language I feel it makes more sense to do that then to have the behavior through the & operator.
I'm already not quite sure what happens when I do something like&mut (*(*(*p).a).b)[0] is there any shared reference produced in the middle? Does it produce unique references at each deref? Neither?

RalfJ (Feb 13 2020 at 09:15, on Zulip):

not sure what you mean by "behavior through the & op"

RalfJ (Feb 13 2020 at 09:16, on Zulip):

we now have two & ops that have different behavior

RalfJ (Feb 13 2020 at 09:16, on Zulip):

but it is only locally different, the p,lace below them still gets evaluated exactly the same

Hanna Kruppe (Feb 13 2020 at 10:31, on Zulip):

@RalfJ I'd like to hear what choices you see for having a single semantics for place expressions. It's not that I can't imagine any such design, but there's some UB in place expression evaluation that we'd (presumably) want to keep for most contexts but may want to remove from raw pointer projections (most importantly, the requirement that the base pointer is not dangling).

RalfJ (Feb 13 2020 at 10:38, on Zulip):

Hm, I think I was imagining we'd keep inbounds because, as you kept pointing out, it's important to LLVM to know that the arithmetic is non-wrapping.

Hanna Kruppe (Feb 13 2020 at 10:39, on Zulip):

I have hope for a future where "non-wrapping" and "base pointer not dangling" are decoupled.

RalfJ (Feb 13 2020 at 21:28, on Zulip):

you think it is realistic that LLVM will adopt this? so we should design MIR semantics for such a future?

Hanna Kruppe (Feb 13 2020 at 21:35, on Zulip):

I don't know. It may very well fall below the threshold of being worth proposing, hashing out. implementing, etc.
I also don't think we should currently define our semantics in anticipation of it: it's better to remove some UB in the future than to eat the performance impact of not emitting inbounds for however long inbounds exists.

But I think it's definitely a possibility, and so I think it's also possible that we'd have a good reason to want multiple interpretations of place expressions eventually.

RalfJ (Feb 14 2020 at 08:09, on Zulip):

I see. I am concerned about our complexity budget here, which we are already straining...

RalfJ (Feb 14 2020 at 08:11, on Zulip):

but if we go for a "contextual" semantics, I feel we should at least do some benchmarks to see how badly we need inbounds for these raw ptr ops... if we pay the complexity price anyway, being able to say that with raw ptrs the ops are entirely safe would be quite nice

RalfJ (Feb 14 2020 at 08:12, on Zulip):

I think an alternative to a contextual semantics might be to use the type of the ptr? so, make it depend not on & vs &raw furhter outside, but on whether the ptr is raw or a ref? that would be structural and not require context info

Last update: Feb 25 2020 at 03:00UTC