Stream: t-compiler/major changes

Topic: Add StatementKind::Call to MIR compiler-team#348


triagebot (Aug 21 2020 at 19:52, on Zulip):

A new proposal has been announced: Add StatementKind::Call to MIR #348. It will be announced at the next meeting to try and draw attention to it, but usually MCPs are not discussed during triage meetings. If you think this would benefit from discussion amongst the team, consider proposing a design meeting.

nagisa (Aug 21 2020 at 20:06, on Zulip):

this I think needs a benefit vs complexity evaluation. Rust seems like a language that's excessively unlikely to not have cleanups in any given function.

nagisa (Aug 21 2020 at 20:07, on Zulip):

(this is also super easy to check, just count how many of our current terminators end up without a cleanup edge.

Taylor Cramer (Aug 21 2020 at 20:31, on Zulip):

function calls that do not under any circumstance panic or unwind

Would this include all functions that are compiled under panic=abort?

nagisa (Aug 21 2020 at 21:04, on Zulip):

yeah, but panic=abort is a fairly special case scenario by itself. That said, I wouldn’t oppose this change if we also start requiring that the calls at a terminator position have a unwind edge.

nagisa (Aug 21 2020 at 21:05, on Zulip):

(i.e. make https://github.com/rust-lang/rust/blob/de521cbb303c08febd9fa3755caccd4f3e491ea3/src/librustc_middle/mir/terminator/mod.rs#L118 a BasicBlock instead of Option<BasicBlock>)

Amanieu (Aug 22 2020 at 06:12, on Zulip):

This breaks the assumption that Statements have no side effects (in particular that they do not diverge). LlvmAsm works around this with some hacks (1 2).

Amanieu (Aug 22 2020 at 06:13, on Zulip):

Ideally we will want to remove these hacks once llvm_asm! is fully deprecated and removed (replaced by asm! which is a terminator in MIR).

oli (Aug 22 2020 at 10:44, on Zulip):

Right, the point of these StatementKind::Call is to not have diverging calls at all. Things like transmute, memcpy, size_of, ...

bjorn3 (Aug 22 2020 at 10:50, on Zulip):

Maybe call it CallIntrinsic then and give it a (DefId, SubstsRef) as function to call instead of Operand, indirect calls can't be represented.

nagisa (Aug 22 2020 at 13:13, on Zulip):

oli said:

Right, the point of these StatementKind::Call is to not have diverging calls at all. Things like transmute, memcpy, size_of, ...

I guess that makes sense, I recall us wanting to special case some of these for quite some time.

nagisa (Aug 22 2020 at 13:16, on Zulip):

but that also requires that we have in MIR something like the readnone attribute in LLVM which we don’t currently. We could special case some specific symbols but something named Call seems too generic for what it would do without such an attribute (and CallIntrinsic too limiting given that we might be able to eventually add such a attribute or whatever)

Rich Kadel (Aug 22 2020 at 16:03, on Zulip):

@Wesley Wiser this sounds very relevant to the options we discussed/considered before moving injected coverage "calls" from the Call terminator to the new StatementKind::Coverage, before landing PR #75563.

nagisa (Aug 22 2020 at 16:58, on Zulip):

to me StatementKind::Coverage seems like the most elegant solution, especially when thinking who will want to care about that. Nesting these constructs or hiding them behind a Call makes them more difficult inspect/handle.

oli (Aug 22 2020 at 21:03, on Zulip):

Side note: we managed to special case size_of, align_of and any other functions with no arguments by making them constants

jknodt (Aug 23 2020 at 00:56, on Zulip):

Would it be significantly difficult to start of super conservatively with something like CallIntrinsic that handles just things like transmute, memcpy, and then when it becomes more clear what a more general solution would be to expand to encompass that as well?

oli (Aug 25 2020 at 22:27, on Zulip):

I like starting out with the conservative thing, but we should at least have a plan for future extensions. Side note: memcpy is actually copy_nonoverlapping

jknodt (Aug 26 2020 at 01:56, on Zulip):

This is fair enough, I guess the main intent of this is to make it easier to perform optimizations on the code, because I can't see it enabling any functionality that was not existing prior. As such, I think it would be idea to also follow nagisa's suggestion that

nikomatsakis (Aug 28 2020 at 13:34, on Zulip):

I'm not close enough to the work on compiler optimizations, I guess, to make a call on this. From the POV of borrow checker and type-checker, it seems to just make everything a bit more complicated.

nikomatsakis (Aug 28 2020 at 13:34, on Zulip):

Can someone elaborate a bit on the benefits?

nikomatsakis (Aug 28 2020 at 13:35, on Zulip):

Is it that basic-block-local optimizations become more effective?

nikomatsakis (Aug 28 2020 at 13:35, on Zulip):

I could imagine memory use improvements as well

oli (Aug 28 2020 at 15:48, on Zulip):

It would make several optimizations more effective as they must currently treat function calls as these opaque side-effecting things

nikomatsakis (Aug 28 2020 at 15:49, on Zulip):

@oli I don't understand why a function call in a statement is any less opaque

nikomatsakis (Aug 28 2020 at 15:50, on Zulip):

if it's a matter of "we've analyzed it and we've identified it doesn't have side effects", couldn't we set a flag on the terminator too?

oli (Aug 28 2020 at 15:50, on Zulip):

these are just function calls that cannot unwind

nikomatsakis (Aug 28 2020 at 15:50, on Zulip):

right but we already have only one outgoing edge from such functions

oli (Aug 28 2020 at 15:50, on Zulip):

sure, that would work, too, and is basically what @nagisa suggested (make the unwind path an Option)

oli (Aug 28 2020 at 15:51, on Zulip):

so... we'd change the optimizations to handle function calls without an unwind edge the way we'd treat these hypothetical function call statements

oli (Aug 28 2020 at 15:52, on Zulip):

that's an alternative I guess. it won't reduce the number of basic blocks, but I gotta look at the motivating examples for this MCP to see if that would work, too. Doing that now

nagisa (Aug 28 2020 at 16:42, on Zulip):

If we're doing anything I’m a strong proponent for trying statements for specific things we want to handle specially and only when that becomes unwieldy to look into generalizing.

nikomatsakis (Aug 29 2020 at 13:54, on Zulip):

@oli I thought we already had functions with an optional unwind edge

nikomatsakis (Aug 29 2020 at 13:54, on Zulip):

am I misremembering?

nikomatsakis (Aug 29 2020 at 13:55, on Zulip):

@nagisa I'm not sure I understand that comment, I can't tell if it is in favor of the MCP as proposed or some alternative -- e.g., maybe having a specialized statement for specific builtin functions?

Wesley Wiser (Aug 29 2020 at 14:48, on Zulip):

@nikomatsakis I think that was in reference to the Coverage statement we recently added for the LLVM code coverage feature. Before adding that, we were using a regular LLVM intrinsic so we generated function calls to it which had unwind edges. That worked but it made the instrumentation pass pretty complex because we basically doubled (or more) the number of function calls in a given body.

Recently, we changed this and added a Coverage statement which just lowers to the intrinsic and cannot unwind. I actually suggested adding StatementKind::Call at that time instead of the more specific Coverage statement but now I'm glad we didn't do that. Coverage has simple semantics and 90% of the compiler doesn't need to care about it at all so it's very easy to see that it's handled correctly. Only cg_llvm needs to do anything with it.

So I totally agree with @nagisa in regards to "trying statements for specific things we want to handle specially and only when that becomes unwieldy to look into generalizing"

nagisa (Aug 29 2020 at 14:49, on Zulip):

Yeah, I’m advocating for an alternative much like Coverage statement we have right now.

Wesley Wiser (Aug 29 2020 at 14:55, on Zulip):

That said, I could see how making function calls that cannot unwind statements instead of terminators could improve performance. I think we have some MIR passes which are super linear in the number of basic blocks (I could totally be wrong about that) so reducing the number of blocks would be a win. I'm not sure this makes writing MIR optimizations easier as call statements are just as side effect-y as call terminators are.

So, if this improves performance it may be worth the additional complexity but I don't see how we could know that without doing the work and then testing it.

RalfJ (Aug 29 2020 at 18:16, on Zulip):

Amanieu said:

This breaks the assumption that Statements have no side effects (in particular that they do not diverge). LlvmAsm works around this with some hacks (1 2).

there are statements with side-effects -- *x = ..., for example. but indeed diverging statements would be something new which seems problematic, also for optimizations.

RalfJ (Aug 29 2020 at 18:19, on Zulip):

for example, currently if we have stmt1; stmt2; stmt3 and the last one causes UB, we can just replace the entire block by "unreachable". once stmt2 might diverge, that is no longer correct.

RalfJ (Aug 29 2020 at 18:20, on Zulip):

in fact I think the term "basic block" should be changed then, as it no longer matches what the rest of the world calls a "basic block"

jknodt (Aug 29 2020 at 23:29, on Zulip):

Ah but for the proposed StatementKind::Call, it would only permit convergent functions. Whether that should be a small subset of intrinsics that are known to converge as a conservative approach or to open up much more to all functions that converge and restricting TerminatorKind::Call I think is where the split among opinion currently is. I think it would be more feasible to start with a limited subset and expand later if it's promising, altho I think earlier I thought it might be cleaner to refactor TerminatorKind::Call, that would be significantly larger effort which while related is not necessarily tied to this change

RalfJ (Aug 30 2020 at 06:46, on Zulip):

Ah but for the proposed StatementKind::Call, it would only permit convergent functions.

That would help a lot. This was said somewhere in the beginning of this thread, but later people kept just talking about doing this for all nounwind functions, so I got worried.

RalfJ (Aug 30 2020 at 06:47, on Zulip):

it's still a bit tricky though -- we'd have to say something like "it is UB for a StatementKind::Call to never return". this is UB Miri cannot test for, it is non-decidable UB (see halting problem). That's not great.

RalfJ (Aug 30 2020 at 06:48, on Zulip):

also the exact clause is something like "the function must always return in every execution", i.e., if the function depends on external input (say, reads from a file or so) then no matter what happens it must return.

RalfJ (Aug 30 2020 at 06:49, on Zulip):

so considering all that I am definitely in favor of doing this only for intrinsics; AFAIK all our intrinsics always return (except maybe the ones around unwinding?). then I'd call it StatementKind::Intrinsic. whether or not we also use that for nullary intrinsics (or keep treating them as consts), I am not sure what is better.

bjorn3 (Aug 30 2020 at 07:20, on Zulip):

I think the volatile group of intrinsics should also be forbidden for StatementKind::Call.

jknodt (Aug 30 2020 at 07:34, on Zulip):

These are all the intrinsics which have "volatile" in the function name, or are there others?

RalfJ (Aug 30 2020 at 07:56, on Zulip):

bjorn3 said:

I think the volatile group of intrinsics should also be forbidden for StatementKind::Call.

hm... that is probably a reasonably conservative choice, so we do not have to answer the question if UB is allowed to go backwards in time across volaltile accesses

RalfJ (Aug 30 2020 at 07:56, on Zulip):

jknodt said:

These are all the intrinsics which have "volatile" in the function name, or are there others?

that should be the ones yes

RalfJ (Aug 30 2020 at 07:57, on Zulip):

@bjorn3 I dont think compiler fences are UB barriers? (If they are, then we'd also have to exclude those and atomic fences... and atomic accesses...)

bjorn3 (Aug 30 2020 at 09:02, on Zulip):

@RalfJ I was thinking about the fact that volatile memory accesses may have side-effects, which could for example reasonably include pausing the process forever allowing another process to read the value stored by the volatile store. I just read the LLVM lang ref and it says "The compiler may assume execution will continue after a volatile operation, so operations which modify memory or may have undefined behavior can be hoisted past a volatile operation." (https://llvm.org/docs/LangRef.html#volatile-memory-accesses)., so it would indeed be UB for a volatile memory access to be followed by "unreachable".

RalfJ (Aug 30 2020 at 09:48, on Zulip):

ah, interesting... so given we could likely make them statements as well

RalfJ (Aug 30 2020 at 09:49, on Zulip):

I am thinking of volatile accesses as "syscalls the compiler knows something about (so it can optimize more around them)"; I just didn't have "the syscall always returns" on my list of "things the compiler knows"

bjorn3 (Aug 30 2020 at 09:57, on Zulip):

Neither did I.

nikomatsakis (Sep 02 2020 at 21:33, on Zulip):

I see, if this is not a general call mechanism, but is tailored to specific intrinsics, then I think I agree with @nagisa and @Wesley Wiser that custom statements is likely better

jknodt (Sep 03 2020 at 00:01, on Zulip):

Should I update the MCP with the relevant discussion then?

jknodt (Sep 07 2020 at 17:50, on Zulip):

I've updated the main issue with relevant discussion, not sure if I correctly interpreted all the comments here.

oli (Sep 08 2020 at 07:28, on Zulip):

I think @nagisa suggested a special Rvalue::Transmute instead of a StatementKind::Intrinsic and potentially other variants for intrinsics or other operations as we need them

oli (Sep 08 2020 at 07:31, on Zulip):

I think the original motivator for this MCP was copy_nonoverlapping, which could be created via StatementKind::CopyNonoverlapping(Place, Place)

oli (Sep 08 2020 at 09:55, on Zulip):

Oh. we also need a byte count I guess. so StatementKind::CopyNonoverlapping { source: Place, dest: Place, num_bytes: Size }

oli (Sep 08 2020 at 09:55, on Zulip):

we can always change it to have a type and an number of elements field if that becomes a use case

jknodt (Sep 09 2020 at 08:08, on Zulip):

Ah alright, that's even more specific than I thought but sounds good

jknodt (Sep 09 2020 at 16:50, on Zulip):

alright I've updated the MCP again

nikomatsakis (Sep 18 2020 at 18:57, on Zulip):

note that since the MCP issue title changed, the bot posted my "second" in #t-compiler/major changes > Add StatementKind::Intrinsic to MIR compiler-team#348

nikomatsakis (Sep 18 2020 at 18:58, on Zulip):

also @Wesley Wiser and @nagisa I took the liberty of seconding this, but I'd like y'all to approve since I'm not closely monitoring this sort of work really :)

nagisa (Sep 18 2020 at 23:34, on Zulip):

thirded.

Last update: May 07 2021 at 07:45UTC