Stream: general

Topic: black_box dejavu


gnzlbg (Feb 12 2019 at 22:43, on Zulip):

@rkruppe http://lists.llvm.org/pipermail/llvm-dev/2015-November/091968.html

gnzlbg (Feb 12 2019 at 22:44, on Zulip):

i just found that, not sure if you have seen it

rkruppe (Feb 12 2019 at 22:44, on Zulip):

Oh gods I recognize that name

gnzlbg (Feb 12 2019 at 22:52, on Zulip):

@rkruppe after reading all that, i have some thoughts

gnzlbg (Feb 12 2019 at 22:53, on Zulip):

first, we don't want to say which optimizations black_box inhibits

gnzlbg (Feb 12 2019 at 22:53, on Zulip):

that would tie black_box definition to some optimization pipeline

gnzlbg (Feb 12 2019 at 22:54, on Zulip):

second, it feels like we want black_box to "inhibit optimizations" but while still generating optimized code, this is a problem

gnzlbg (Feb 12 2019 at 22:56, on Zulip):

third, the identity function is not what we want, because we don't want black_box(T) -> T to return the same T that it takes as an argument

gnzlbg (Feb 12 2019 at 22:57, on Zulip):

i think that what we want is for the code around black_box(T) -> T to be optimized as if black_box performs a volatile write of its argument to global memory, and then performs a volatile read of global memory, returning that result

gnzlbg (Feb 12 2019 at 22:57, on Zulip):

at the same time, we don't want any machine code to be generated to perform those reads and writes
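
A rough way to picture the model described above is a function that really does perform the volatile write and read. This is only a sketch under assumptions: the name black_box_model is hypothetical, a local stack slot stands in for "global memory", and unlike the behaviour being asked for, this version does emit a real store and load.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical model of `black_box`: a volatile write of the argument
/// followed by a volatile read of the result.
/// Assumption: a local slot stands in for the "global memory" in the discussion.
pub fn black_box_model<T>(value: T) -> T {
    let mut slot = MaybeUninit::<T>::uninit();
    unsafe {
        // Volatile accesses may not be elided, so the compiler has to store
        // `value` and then load the result back from the slot.
        ptr::write_volatile(slot.as_mut_ptr(), value);
        // The slot is `MaybeUninit`, so the value moved out here is not dropped twice.
        ptr::read_volatile(slot.as_ptr())
    }
}
```

Calling black_box_model(42) returns 42 at run time; the point is only that the optimizer is not allowed to skip the round trip through memory.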

gnzlbg (Feb 12 2019 at 22:59, on Zulip):

so that's basically what the asm! macros do

rkruppe (Feb 12 2019 at 23:00, on Zulip):

I am very wary of defining an operation as "pretend it does these operations, but don't actually codegen it that way". For starters, that doesn't indicate what the right semantics are, and the semantics are ultimately what drives development of optimizations (and lots of other stuff!). I also do not know how to implement that other than carrying this operation not just through all of IR but also practically all of the backend. That's a lot of real estate.

gnzlbg (Feb 12 2019 at 23:02, on Zulip):

right now we implement it with asm! telling LLVM that we read some memory with side-effects, but emitting no instructions within the assembly block, generating no machine code

gnzlbg (Feb 12 2019 at 23:03, on Zulip):

that might well be the right way to implement these semantics in LLVM
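
For illustration, the empty-asm approach might look roughly like this with the asm! macro that was stabilized later (std::arch::asm), not the 2019 syntax used in this thread. The name black_box_asm is hypothetical, and the sketch assumes a target where asm! is available (for example x86_64 or aarch64) and an assembler that accepts /* */ comments.

```rust
use std::arch::asm;

/// Hypothetical sketch of an asm-based `black_box`.
/// Assumption: this is not the actual rustc implementation, only the same idea.
pub fn black_box_asm<T>(mut value: T) -> T {
    // Hand the address of `value` to the asm block as a thin pointer.
    let ptr = &mut value as *mut T as *mut u8;
    unsafe {
        // The template is only an assembler comment, so no instruction is emitted.
        // Without `nomem` or `readonly`, the compiler must assume the block may
        // read or write any memory it can reach, including `value` through `ptr`.
        asm!("/* {0} */", in(reg) ptr, options(nostack, preserves_flags));
    }
    value
}
```

The value comes back unchanged at run time, but the compiler can no longer assume that when optimizing the surrounding code.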

rkruppe (Feb 12 2019 at 23:04, on Zulip):

I may be misunderstanding what you're targeting here. Is this wording supposed to go into the user facing docs? Is that supposed to provide any tangible guarantees to users?

gnzlbg (Feb 12 2019 at 23:08, on Zulip):

The thing is that the guarantees that we provide to users cannot be used for optimizations

gnzlbg (Feb 12 2019 at 23:09, on Zulip):

To users we want to say black_box returns its argument and has no effects (it is a nop).
To the optimizer we want to say: black_box performs a volatile write of its argument to global memory, and a volatile read of some other value from global memory, and returns that.

gnzlbg (Feb 12 2019 at 23:10, on Zulip):

I don't know how a specification could specify both, they sound incompatible to me.

rkruppe (Feb 12 2019 at 23:12, on Zulip):

It's true that this kind of function is inherently self-contradicting to a degree, but I don't see the actual problem with what is the status quo implementation and in the RFC text right now?

gnzlbg (Feb 12 2019 at 23:12, on Zulip):

The RFC says that black_box is a nop, and that it might inhibit some optimizations as a quality of implementation issue.

rkruppe (Feb 12 2019 at 23:13, on Zulip):

Yes. And in practice we implement it as inline asm, which as you said seems to do something roughly sensible. What more is needed?

gnzlbg (Feb 12 2019 at 23:13, on Zulip):

Remove the quality of implementation.

rkruppe (Feb 12 2019 at 23:14, on Zulip):

I see no remotely reasonable way to do that, for many reasons some of which you stated earlier. Nor do I see any need for that for the benchmarking use case.

gnzlbg (Feb 12 2019 at 23:14, on Zulip):

We have to say that black_box is a nop, but for the purposes of code generation, it is treated as an unknown function that performs a volatile write of its argument to global memory, and returns some value that is volatile read from global memory, except that when all optimizations are over, it is replaced by a nop.

rkruppe (Feb 12 2019 at 23:14, on Zulip):

Why do we "have to say" that?

gnzlbg (Feb 12 2019 at 23:15, on Zulip):

We'd have to say something like that if we wanted to more precisely guarantee how optimizations are inhibited by the function

gnzlbg (Feb 12 2019 at 23:16, on Zulip):

That would be a way of doing that.

gnzlbg (Feb 12 2019 at 23:17, on Zulip):

I'd guess the question is whether we want people to be able to rely on black_box being optimized in a certain way or not.

gnzlbg (Feb 12 2019 at 23:17, on Zulip):

the RFC says that this is not what we want

gnzlbg (Feb 12 2019 at 23:17, on Zulip):

but some want this

gnzlbg (Feb 12 2019 at 23:17, on Zulip):

not necessary for benchmarks, but would be nice to have

rkruppe (Feb 12 2019 at 23:21, on Zulip):

I think a lot of people have some idea in their head that they could do a certain thing if they could just shut up the pesky optimizer, but I think a lot of these fall under the same category as <https://twitter.com/johnregehr/status/1074712581749657600> and the ones that can be salvaged would be far better served by understanding what it is they actually want in terms of the language semantics (not the optimizer) and finding or adding a way to express that.

rkruppe (Feb 12 2019 at 23:22, on Zulip):

See also Ralf's excellent comment on the RFC PR

rkruppe (Feb 12 2019 at 23:23, on Zulip):

To take a particular example, I do not think anything in the vicinity of "constant time code" will ever be achievable by any compilation pipeline that involves LLVM in any way.

gnzlbg (Feb 12 2019 at 23:24, on Zulip):

Not constant time code, but having some mental model of how LLVM sees black_box in terms of Rust operations (volatile reads/writes) would be useful when using it in benchmarks

rkruppe (Feb 12 2019 at 23:26, on Zulip):

Possibly, but I doubt I personally would get much value from that, and I eat and breathe LLVM IR and LLVM passes. But more importantly, this can be a rustc-specific note, not "the spec".

RalfJ (Feb 13 2019 at 08:55, on Zulip):

I wonder if we could try playing some trick, such as saying the Rust abstract machine has a "magic global" or so, and black_box writes a pointer there. Then, if stuff gets written to that global with atomic instructions, there could be another thread (or so?) that takes the data and manipulates it. We also have a simple syntactic condition making sure the "magic global" is never read by real Rust code, meaning we don't have to actually compile the instructions that manipulate this global to real code. But when considering whether an optimization is correct, we do allow the context (the unknown code surrounding the code we are optimizing) to read from this magic global.
The concurrency part about this seems "wrong" somehow, though. maybe signal handlers are a better model? not sure.

rkruppe (Feb 13 2019 at 10:16, on Zulip):

For the shared memory thing cramertj described, something like that seems fine. But for benchmarks, generally the "real" accesses to the data will be neither atomic nor volatile, so other threads or signal handlers accessing the data would be UB and that possibility will be rightly disregarded by optimizations. (Plus, there's also the question of how to map whatever semantics we choose to LLVM IR.)

eddyb (Feb 16 2019 at 10:08, on Zulip):

I can't tell, was this thread before my "two primitives" comment?

RalfJ (Feb 16 2019 at 10:09, on Zulip):

yeah

RalfJ (Feb 16 2019 at 10:09, on Zulip):

though I don't see that fundamentally changing anything, TBH^^ it's still "magic", isn't it?

eddyb (Feb 16 2019 at 10:10, on Zulip):

IMO the nice thing is we can require the values to still be valid

eddyb (Feb 16 2019 at 10:10, on Zulip):

i.e. it should affect performance but we guarantee 0 UB avoidance

eddyb (Feb 16 2019 at 10:11, on Zulip):

could even noop them out if --test is not passed or something silly like that

eddyb (Feb 16 2019 at 10:11, on Zulip):

I guess it's still dubious in terms of accidentally supporting constant-time stuff?

eddyb (Feb 16 2019 at 10:14, on Zulip):

@RalfJ I guess my point was that it's tamer if bench_input(mem::uninitialized()) will always error in miri

eddyb (Feb 16 2019 at 10:15, on Zulip):

and bench_output couldn't accidentally be used to assume a pointer may escape

eddyb (Feb 16 2019 at 10:15, on Zulip):

(not sure if we can tell LLVM about the latter limitation)

eddyb (Feb 16 2019 at 10:16, on Zulip):

I kind of like the closure form but idk if we can spec it any better

eddyb (Feb 16 2019 at 10:22, on Zulip):

@RalfJ now I am tempted to do something silly like x ^ bench_obscured_zero() to replace bench_input(x), especially since I suspect it's almost always integers, but I don't want to actually generate the xor instruction. if x is undef/poison I suspect this will propagate to the xor
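
A minimal sketch of the xor idea, assuming u64 inputs and using std::hint::black_box (stabilized later) to hide the zero. The names bench_obscured_zero and bench_input_xor are hypothetical, and unlike what is wished for here, this version does emit the xor instruction.

```rust
use std::hint::black_box;

/// Hypothetical: a zero the optimizer cannot see through.
pub fn bench_obscured_zero() -> u64 {
    black_box(0)
}

/// Hypothetical replacement for `bench_input(x)` on integers:
/// the result equals `x`, but the optimizer cannot prove that.
pub fn bench_input_xor(x: u64) -> u64 {
    x ^ bench_obscured_zero()
}
```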

eddyb (Feb 16 2019 at 10:23, on Zulip):

defining the semantics of these functions is frustrating since they're "just" optimizer hints

nagisa (Feb 16 2019 at 10:24, on Zulip):

uhhhh

eddyb (Feb 16 2019 at 10:24, on Zulip):

you could just as well have an attribute on the bench closure to make its args obscured and return preserved, while allowing inlining

nagisa (Feb 16 2019 at 10:24, on Zulip):

if you want constant time code you write assembly, there’s currently no other way around it

nagisa (Feb 16 2019 at 10:24, on Zulip):

and definitely no way around it if your code goes through llvm

eddyb (Feb 16 2019 at 10:24, on Zulip):

yeah my point is I don't want to mix bench stuff with freeze or ct-time

eddyb (Feb 16 2019 at 10:24, on Zulip):

the bench stuff should be implementable as noops

nagisa (Feb 16 2019 at 10:25, on Zulip):

+

eddyb (Feb 16 2019 at 10:25, on Zulip):

#[inline(bench)] lol

eddyb (Feb 16 2019 at 10:26, on Zulip):

#[inline(obscure)] might be better, idk. basically #[inline(always)] plus obscured args and used return value

RalfJ (Feb 16 2019 at 10:41, on Zulip):

"i.e. it should affect performance but we guarantee 0 UB avoidance"

wait but that's what the RFC already says, right?

RalfJ (Feb 16 2019 at 10:42, on Zulip):

as far as I am concerned, specifying the "bench" part is pretty much solved, the RFC seems to do that fine

RalfJ (Feb 16 2019 at 10:42, on Zulip):

some people don't like the name, is all

RalfJ (Feb 16 2019 at 10:42, on Zulip):

but then the hard open question is how to specify the "UB-avoidance" stuff

eddyb (Feb 16 2019 at 10:46, on Zulip):

yeah it's just that some people have suggested renaming to bench_used but that only makes sense for bench outputs, not inputs

eddyb (Feb 16 2019 at 10:47, on Zulip):

if anyone thinks avoiding UB might be part of what a black_box is and will abuse it without reading the docs, we should probably use a different name

eddyb (Feb 16 2019 at 10:47, on Zulip):

black_box is not a particularly helpful name anyway

RalfJ (Feb 16 2019 at 11:21, on Zulip):

bench_obfuscate :P

eddyb (Feb 16 2019 at 11:24, on Zulip):

that's good for inputs (and not far off my "obscure")

eddyb (Feb 16 2019 at 11:25, on Zulip):

I really want the input one to not keep the computation alive if it's unused and the output one to only do that keeping it alive thing

eddyb (Feb 16 2019 at 11:25, on Zulip):

but it seems hard to do in LLVM exactly like that
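
To make the split concrete, the two operations could have signatures along these lines. This is a sketch under assumptions: the exact signatures are guessed, and both functions are simply layered on std::hint::black_box (stabilized later), which is coarser than the behaviour wanted here, since this bench_input also keeps its argument alive.

```rust
use std::hint::black_box;

/// Hypothetical input half: returns a value the optimizer knows nothing about.
/// Assumption: delegating to `black_box` over-approximates the intended semantics.
pub fn bench_input<T>(value: T) -> T {
    black_box(value)
}

/// Hypothetical output half: only pretends the value is used, returns nothing.
pub fn bench_output<T>(value: T) {
    let _ = black_box(value);
}
```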

RalfJ (Feb 16 2019 at 11:33, on Zulip):

I agree it makes sense to distinguish these two from a "spec" perspective ("spec" as in "compiler hint spec", not language spec)

RalfJ (Feb 16 2019 at 11:34, on Zulip):

and I agree the bench harness should incorporate that:

bench::iter(inputs, |inputs| {
  let out = do_something_with(inputs);
  out
})

should pass inputs and outputs through black_box (or through the two separate operations you suggested)
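
A harness along those lines might look like the following sketch. The function iter, its rounds parameter and its Duration return value are hypothetical and do not correspond to any existing bench API; std::hint::black_box (stabilized later) plays the role of black_box.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical harness: runs `body` `rounds` times and returns the elapsed time.
/// Inputs and outputs both pass through `black_box`, so the benchmark author
/// does not have to remember to do it.
pub fn iter<I, O>(inputs: &I, rounds: u32, mut body: impl FnMut(&I) -> O) -> Duration {
    let start = Instant::now();
    for _ in 0..rounds {
        // Hide the inputs so the body cannot be constant-folded.
        let out = body(black_box(inputs));
        // Pretend the output is used so the body is not removed as dead code.
        black_box(out);
    }
    start.elapsed()
}
```

A call such as iter(&data, 1000, |v| v.iter().sum::<u64>()) would then time the closure without black_box appearing in the benchmark body.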

maksimsco (Feb 16 2019 at 11:40, on Zulip):

Is pretend_use far off?

eddyb (Feb 16 2019 at 11:43, on Zulip):

sounds okay for the output half

eddyb (Feb 16 2019 at 11:44, on Zulip):

but the current black_box is a weird "pretend the value escapes and can also be modified" all-in-one thing hence me suggesting to split it up

RalfJ (Feb 16 2019 at 11:58, on Zulip):

that might also make finding less contentious names easier

centril (Feb 16 2019 at 17:22, on Zulip):

@maksimsco pretend_use seems pretty good actually

maksimsco (Feb 16 2019 at 17:53, on Zulip):

@centril In my humble opinion anything other than black_box would be a step in the right direction, black_box is a very misleading name IMO until you look at the code

centril (Feb 16 2019 at 17:54, on Zulip):

@maksimsco I mostly just want to ship the thing :P

maksimsco (Feb 16 2019 at 17:55, on Zulip):

It's a good thing :)

gnzlbg (Feb 22 2019 at 20:28, on Zulip):

@RalfJ when using asm! you can either say, this reads this register and uses it, or you can also say this reads the memory behind this pointer

gnzlbg (Feb 22 2019 at 20:28, on Zulip):

"*m"(&x) states that this reads the memory behind the pointer &x, which is just x

gnzlbg (Feb 22 2019 at 20:29, on Zulip):

if you want to state that the block also reads other memory, either you manually pass that as inputs to the block, or you can say clobber "memory"

gnzlbg (Feb 22 2019 at 20:29, on Zulip):

which means that all memory is read by the assembly block, and/or written to

gnzlbg (Feb 22 2019 at 20:30, on Zulip):

that's globals, memory reachable by pointers, etc. private locals that the block cannot know anything about, aren't affected by it

gnzlbg (Feb 22 2019 at 20:30, on Zulip):

and that's pretty much all the tools we have with asm!

gnzlbg (Feb 22 2019 at 20:31, on Zulip):

if you wanted to say, reads the memory behind some pointer, and all the memory behind all the pointers that might be there, you basically have to write asm!("" : : "*m"(&x), "*m"(x.0), ....) somehow (where x.0 is a ptr)

gnzlbg (Feb 22 2019 at 20:32, on Zulip):

for the Vec case, the pointer is of type *mut T and there are len elements behind it, so we somehow would need to generate a block that reads the memory of each of those T, from the knowledge that this is encoded in len

gnzlbg (Feb 22 2019 at 20:34, on Zulip):

so if we had a way of finding all memory reachable from a pointer, then we could do this with asm! or other tools, but AFAICT we do not have a way of finding that out

RalfJ (Feb 22 2019 at 20:41, on Zulip):

hm okay, they are more limited than I thought then

RalfJ (Feb 22 2019 at 20:41, on Zulip):

I'd have expected something like a "read-only clobber" -- can read anything, but write nothing

RalfJ (Feb 22 2019 at 20:42, on Zulip):

(well, write whatever registers it gets assigned as writeable, but not anything else)
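
For comparison, the asm! macro that was stabilized later has an options(readonly) flag, which the Rust Reference describes as a promise that the block does not write to memory. Whether that matches the "read-only clobber" meant here is an assumption. The name observe_readonly is hypothetical, and the sketch assumes a target where asm! is available.

```rust
use std::arch::asm;

/// Hypothetical sketch: tell the compiler the block may read memory
/// (including what `value` points to) but writes none of it.
pub fn observe_readonly<T>(value: &T) {
    let ptr = value as *const T as *const u8;
    unsafe {
        // Comment-only template: no instruction is emitted.
        // `readonly` promises that the block performs no memory writes.
        asm!("/* {0} */", in(reg) ptr, options(readonly, nostack, preserves_flags));
    }
}
```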

gnzlbg (Feb 22 2019 at 21:42, on Zulip):

yeah, that would be cool

gnzlbg (Feb 22 2019 at 21:43, on Zulip):

we could use that in both bench_input and bench_output

gnzlbg (Feb 22 2019 at 21:43, on Zulip):

i'm tending to think that the best is to propose black_box as is, and that we can implement bench_input and bench_output behind a feature gate, and maybe some day stabilize those and implement black_box on top of them

gnzlbg (Feb 22 2019 at 21:44, on Zulip):

black_box still addresses the use case of "please don't remove my benchmark", and if someone needs something more fine-grained than that, while bench_input / _output might be enough, chances are that they won't be

gnzlbg (Feb 22 2019 at 21:45, on Zulip):

like, playing with all the asm! options, LLVM accepts many of them and documents their semantics, but everything is full of disclaimers stating things like "LLVM will assume the worst and clobber everything"

gnzlbg (Feb 22 2019 at 21:46, on Zulip):

reading the LangRef, it sounds more like things are barely implemented to make clang correct, and all code around asm! blocks is unnecessarily pessimized atm

Last update: Nov 20 2019 at 13:15UTC