Stream: t-compiler/wg-polonius

Topic: Understanding the Polonius model


Will Crichton (Apr 01 2021 at 14:04, on Zulip):

Hi Polonius team, after a productive thread (#general > Relationship between lifetimes and aliasing) I've set out to better understand the Polonius model. A first question: would you agree with this statement?

In the Polonius model of borrows, the borrow checker is an abstract interpreter that computes the set of possible locations that a pointer can have at any point in the program.

I realize that doesn't include details about whether the loans are mutable and whether loan terms are violated. But it struck me that Polonius looks much more like an abstract interpreter than the classic borrow checker does.

Will Crichton (Apr 01 2021 at 14:12, on Zulip):

Also, importantly Polonius is (in a way) a _flow-insensitive_ data analysis because the set of possible loans is accumulated across all points, rather than computed differently for each point. Therefore a reference's region has a single value, rather than a different value per point.

But that description doesn't seem to take into account the refinement of subset/requires as described in Niko's blog post: http://smallcultfollowing.com/babysteps/blog/2018/04/27/an-alias-based-formulation-of-the-borrow-checker/#refining-constraint-propagation-with-liveness

Vytautas Astrauskas [he/him] (Apr 01 2021 at 20:03, on Zulip):

Will Crichton said:

Hi Polonius team, after a productive thread (#general > Relationship between lifetimes and aliasing) I've set out to better understand the Polonius model. A first question: would you agree with this statement?

In the Polonius model of borrows, the borrow checker is an abstract interpreter that computes the set of possible locations that a pointer can have at any point in the program.

Yes. This is exactly what Prusti exploits for verification.

Vytautas Astrauskas [he/him] (Apr 01 2021 at 20:04, on Zulip):

Just it is important to note that when you cross the function boundary, the lifetimes act as a way to abstract over possible locations.

Vytautas Astrauskas [he/him] (Apr 01 2021 at 20:06, on Zulip):

Will Crichton said:

Also, importantly Polonius is (in a way) a _flow-insensitive_ data analysis because the set of possible loans is accumulated across all points, rather than computed differently for each point. Therefore a reference's region has a single value, rather than a different value per point.

This sounds wrong to me, probably I do not understand what you mean. Could you give an example?

Vytautas Astrauskas [he/him] (Apr 01 2021 at 20:09, on Zulip):

In Polonius, loan_live_at(L, P) is computed for each program point P as described in Niko's blog post.

Will Crichton (Apr 01 2021 at 20:24, on Zulip):

@Vytautas Astrauskas [he/him] thanks for the clarification. And actually the flow-sensitive part isn't super important, so don't worry about that.

But I do actually have a question for you -- I'm trying to write an OOPSLA paper about my Rust program slicer. One goal is to create a more formal argument for why the borrow checker's information is sufficient to conservatively estimate the aliases for any borrow (at the MIR level). Do you know of any related work that makes a similar argument? I've checked out Prusti, RustBelt, and Oxide but none seems quite appropriate -- Prusti focuses on encoding into Viper, RustBelt focuses on implications of unsafe, and Oxide focuses on surface-level Rust and not MIR.

Will Crichton (Apr 01 2021 at 20:30, on Zulip):

My current intuition (and the relevance to Polonius) is to make the following argument:

Will Crichton (Apr 01 2021 at 20:35, on Zulip):

I'm trying to figure out:
a) is this argument correct?
b) if so, do I need to define my own MIR formalism, or I can reuse one like e.g. in RustBelt
c) which parts of this argument have already been made in related work vs. I need to justify myself
d) how to precisely talk about the "necessary set of outlives relations", ie the sufficient condition that rustc must meet for alias analysis to be correct

Vytautas Astrauskas [he/him] (Apr 01 2021 at 20:46, on Zulip):

One goal is to create a more formal argument for why the borrow checker's information is sufficient to conservatively estimate the aliases for any borrow (at the MIR level).

Isn't this basically an argument that the Rust type system is sound?

Vytautas Astrauskas [he/him] (Apr 01 2021 at 20:48, on Zulip):

Or to phrase it differently: if the borrow checker's information would not be sufficient, I, think, that would imply that the Rust type system is unsound.

Vytautas Astrauskas [he/him] (Apr 01 2021 at 20:51, on Zulip):

RustBelt focuses uses an outdated notion of lifetimes

I would not agree with this. Even though the paper was written before NLL and Polonius, I think the presented approach is flexible enough to handle most examples. I know that there are problems with two-phase borrows, but that is the only limitation I remember.

Vytautas Astrauskas [he/him] (Apr 01 2021 at 21:07, on Zulip):

a) is this argument correct?

I do not see any mistakes, but it is also a bit late now for me to give a proper answer.

b) if so, do I need to define my own MIR formalism, or I can reuse one like e.g. in RustBelt

This literary looks like you are trying to prove the soundness of Polonius. One option would be to try to prove it against operational semantics, which would be Stacked Borrows.

c) which parts of this argument have already been made in related work vs. I need to justify myself

Do you actually need to? Cannot you just assume the correctness of the compiler?

Will Crichton (Apr 01 2021 at 21:07, on Zulip):

Re: the type system, I agree, and part of the point is to elucidate the link between alias analysis and Rust's type system. Rust doesn't actually compute aliases and use them in type checking, so it's not immediately obvious that the borrow checker outputs are sufficient to compute aliases.

And yes I don't meant to imply that the RustBelt formalism is no longer useful. But rather the connection between borrows and aliases is much clearer (I think) when lifetimes are provenances (ie set of loans) instead of regions (sets of locations).

Will Crichton (Apr 01 2021 at 21:10, on Zulip):

One key bit is that I'd like to articulate this point in a way that's distinct from whether Rust is using classic borrow checking, NLL, or Polonius. I want to say "if a compiler infers at least the right set of outlives constraints, then this method of computing aliases is sound" and then make a high-level argument as to why any particular version of the borrow checker infers the "right" set of constraints.

Vytautas Astrauskas [he/him] (Apr 01 2021 at 21:12, on Zulip):

Couldn't you make an argument that if the borrow checker would not do a proper over-approximation of aliases, the type system would be unsound?

Vytautas Astrauskas [he/him] (Apr 01 2021 at 21:13, on Zulip):

I do not think you can have the same proof for all three versions of the borrow checker.

Vytautas Astrauskas [he/him] (Apr 01 2021 at 21:14, on Zulip):

And yes I don't meant to imply that the RustBelt formalism is no longer useful. But rather the connection between borrows and aliases is much clearer (I think) when lifetimes are provenances (ie set of loans) instead of regions (sets of locations).

As far as I understand RustBelt, it actually does not care which interpretation of lifetimes you take. It works with both.

Will Crichton (Apr 01 2021 at 21:16, on Zulip):

Couldn't you make an argument that if the borrow checker would not do a proper over-approximation of aliases, the type system would be unsound?

Yes this seems reasonable. I'll have to think about the argument, thanks for the suggestion.

As far as I understand RustBelt, it actually does not care which interpretation of lifetimes you take. It works with both.

Ok, I will see if I can make the argument above using the RustBelt model.

Last update: Jun 20 2021 at 00:15UTC