Stream: t-lang/wg-unsafe-code-guidelines

Topic: Safe access to volatile memory


Alan Jeffrey (Aug 30 2019 at 22:55, on Zulip):

Hi everyone... long time no see... I've been writing some unsafe code, which I thiiiiink is safe, but as always who knows?

Alan Jeffrey (Aug 30 2019 at 22:57, on Zulip):

It's providing safe access to shared memory, which means that there could be attacker processes writing into the memory at arbitrary times, so taking &T access is definitely UB for some Ts (e.g. &usize into shared memory is right out).
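
A minimal sketch of the access pattern being described, assuming base points into a mapped shared-memory region (the function name is illustrative, not from the crate linked below):

use core::ptr;

// Go through a raw pointer and a volatile load; never materialize a
// &usize into memory that an attacker process can write at arbitrary times.
unsafe fn read_shared_usize(base: *const usize) -> usize {
    ptr::read_volatile(base)
}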

Alan Jeffrey (Aug 30 2019 at 22:57, on Zulip):

I put all the unsafe code in one file, which will hopefully make it easier to nerd-snipe. https://github.com/asajeffrey/shared-data/blob/master/src/unsafe_code.rs

Alan Jeffrey (Aug 30 2019 at 22:59, on Zulip):

One issue I'm aware of is that if the attacker can write undef values into shared memory, then we have UB if read_volatile can read back the undef.

Alan Jeffrey (Aug 30 2019 at 23:07, on Zulip):

But I think read_volatile can only produce defined values, not undef.

nikomatsakis (Aug 31 2019 at 00:04, on Zulip):

this seems related to the question of whether ints can have undefined bits

Tom Phinney (Aug 31 2019 at 00:41, on Zulip):

int semantics is relevant here. If only some of the bits are undefined, it's clearly a bit-string or a packed struct and not an int. However, if all of the bits are undefined it could be anything of that size.

RalfJ (Aug 31 2019 at 07:02, on Zulip):

@Alan Jeffrey ah, you're opening that can of worms again ;)

RalfJ (Aug 31 2019 at 07:02, on Zulip):

there's a loooong discussion on the subject at https://github.com/rust-lang/unsafe-code-guidelines/issues/152

RalfJ (Aug 31 2019 at 07:03, on Zulip):

But I think read_volatile can only produce defined values, not undef.

in particular, for this part, I would like to agree -- but LLVM devs seem to think this is not worth stating, and I am reluctant to guarantee it for Rust if LLVM won't guarantee it to us. See https://github.com/rust-lang/unsafe-code-guidelines/issues/152.

RalfJ (Aug 31 2019 at 07:04, on Zulip):

If LLVM had a freeze instruction we could just make the read_volatile intrinsic emit a load followed by a freeze and wouldn't have to rely on LLVM specifying their stuff better... but alas, we've been waiting for freeze since forever

RalfJ (Aug 31 2019 at 07:06, on Zulip):

oh and then of course there are all the concerns about having anything freeze-like in Rust, those would also have to be resolved before we can guarantee this. see e.g. https://github.com/rust-lang/rfcs/pull/837

RalfJ (Aug 31 2019 at 07:08, on Zulip):

also I think treating AtomicBool as share-safe is plain wrong. After all, if this is shared with untrusted code, that code might write things to memory that are not a valid bool

Alan Jeffrey (Aug 31 2019 at 13:38, on Zulip):

@RalfJ you're not kidding, that is a long read!

Alan Jeffrey (Aug 31 2019 at 13:39, on Zulip):

In this case, there are two ways to deal with undef: either stop the Rust code from reading it, or make the attacker model one which can't write it.

Alan Jeffrey (Aug 31 2019 at 13:40, on Zulip):

It sounds like restricting the attacker model so they can't write undef is the easiest one to go for?

RalfJ (Aug 31 2019 at 13:41, on Zulip):

RalfJ you're not kidding, that is a long read!

yeah... we should get better at having summaries of such discussions. not enough time for all the things that need doing :/

RalfJ (Aug 31 2019 at 13:42, on Zulip):

It sounds like restricting the attacker model so they can't write undef is the easiest one to go for?

restricting the attacker is always the easier option, isn't it? ;)

RalfJ (Aug 31 2019 at 13:42, on Zulip):

But I think read_volatile can only produce defined values, not undef.

in particular, for this part, I would like to agree -- but LLVM devs seem to think this is not worth stating, and I am reluctant to guarantee it for Rust if LLVM won't guarantee it to us. See https://github.com/rust-lang/unsafe-code-guidelines/issues/152.

sorry, that link should have been https://bugs.llvm.org/show_bug.cgi?id=42435

Alan Jeffrey (Aug 31 2019 at 13:44, on Zulip):

Also, on AtomicBool, I had a look at the source, and it does sanitize reads to make sure they're bools (e.g. read x:u8 and return x!=0) but it may be that future versions of AtomicBool will change that?
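
A sketch of the sanitizing read described here (illustrative; not the actual libcore source):

// Whatever byte another process wrote, the result is a valid bool,
// so no invalid bool value can be produced from shared memory.
unsafe fn load_bool_sanitized(p: *const u8) -> bool {
    let x: u8 = core::ptr::read_volatile(p);
    x != 0
}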

Alan Jeffrey (Aug 31 2019 at 13:49, on Zulip):

Goes and reads llvm docs... AFAICT if the attacker is any llvm program, then the attacker just spawns two threads, does concurrent writes into shared memory, and boom the shared memory is filled with undef. I think I'm going to ignore that :)
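
A sketch of that attacker as a plain Rust program: two threads doing unsynchronized writes to the same location. This is a data race, hence deliberately UB; in the LLVM model the racing stores are exactly what fills the memory with undef/poison.

use std::thread;

static mut SHARED: u64 = 0;

fn attacker() {
    // Two racing non-atomic writes to the same location: a data race.
    let a = thread::spawn(|| unsafe { SHARED = 0xAAAA_AAAA_AAAA_AAAA });
    let b = thread::spawn(|| unsafe { SHARED = 0x5555_5555_5555_5555 });
    a.join().unwrap();
    b.join().unwrap();
}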

Alan Jeffrey (Aug 31 2019 at 13:50, on Zulip):

Interesting that llvm is weaker than all the hardware models, which (afaik) don't have undef.

RalfJ (Aug 31 2019 at 14:08, on Zulip):

Also, on AtomicBool, I had a look at the source, and it does sanitize reads to make sure they're bools (e.g. read x:u8 and return x!=0) but it may be that future versions of AtomicBool will change that?

If it's not in the docs it is certainly not guaranteed

RalfJ (Aug 31 2019 at 14:08, on Zulip):

also, even if we allow undef in ints, that sanitization would be UB if the value read is undef

RalfJ (Aug 31 2019 at 14:10, on Zulip):

Interesting that llvm is weaker than all the hardware models, which (afaik) don't have undef.

true. but having such a form of "delayed UB value" is basically necessary in an optimizing IR. This paper has a fairly good explanation.

RalfJ (Aug 31 2019 at 14:11, on Zulip):

note: my undef above is Miri's undef; LLVM calls that poison.
LLVM's undef is a mess that not even LLVM optimizations use correctly, so I am going to pretend that the proposal in said paper is implemented, so LLVM no longer has undef, it just has poison.

Alan Jeffrey (Aug 31 2019 at 14:15, on Zulip):

Yes, for regular reads and writes, I can see why you need something like undef, but it's not obvious that's true for atomics.

Alan Jeffrey (Aug 31 2019 at 14:19, on Zulip):

BTW, does anyone know if I'm reinventing the wheel here? Is there a crate that already does safe access to shared memory?

RalfJ (Sep 15 2019 at 13:02, on Zulip):

Yes, for regular reads and writes, I can see why you need something like undef, but it's not obvious that's true for atomics.

atomics can be replaced by non-atomic accesses if the compiler can determine that they are only used in a single thread

RalfJ (Sep 15 2019 at 13:02, on Zulip):

so, we have to be able to deal with them yielding undef/poison as well

RalfJ (Sep 15 2019 at 13:02, on Zulip):

@Alan Jeffrey ^

Alan Jeffrey (Sep 15 2019 at 14:31, on Zulip):

@RalfJ the compiler is allowed to replace atomics by non-atomics?????? Argh, is there a way to stop that?

Alan Jeffrey (Sep 15 2019 at 14:34, on Zulip):

In particular, how is that sound for shared memory, unless the compiler is required to keep track of which memory may be shared with other processes?

Alan Jeffrey (Sep 15 2019 at 14:35, on Zulip):

(I'm guessing that is under-spec'd, since that seems to be the state of most relaxed models wrt shared memory.)

Josh Triplett (Sep 16 2019 at 05:02, on Zulip):

@RalfJ By what means can the compiler know that?

rkruppe (Sep 16 2019 at 07:53, on Zulip):

One possibility is if the compiler can find out where the object being accessed atomically was allocated and determine that its address does not escape to any place where it might become available to other threads. (Or, for a simpler version, the address just isn't captured at all, period.)

gnzlbg (Sep 16 2019 at 08:13, on Zulip):

@rkruppe @RalfJ do we have a minimal example that shows this at work?

rkruppe (Sep 16 2019 at 08:16, on Zulip):

IIRC LLVM does not actually do that currently, it's just allowed under as-if rule

gnzlbg (Sep 16 2019 at 08:21, on Zulip):

LLVM does completely eliminate atomic loads in some cases

gnzlbg (Sep 16 2019 at 08:33, on Zulip):

But even for something as simple as: https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=0f1e25e488323bd2d9be740397ce1095 no interesting optimizations happen (e.g. I'd expect that to be optimized to just exit(43)).

RalfJ (Sep 16 2019 at 09:51, on Zulip):

(I'm guessing that is under-spec'd, since that seems to be the state of most relaxed models wrt shared memory.)

well, it's not a spec's role to say "the compiler may do this or that transformation". the spec describes possible program behaviors, and then the compiler can do whatever it wants as long as it doesn't make the program do anything the spec doesn't let it do.

RalfJ (Sep 16 2019 at 09:53, on Zulip):

RalfJ By what means can the compiler know that?

trivial case:

use std::sync::atomic::{AtomicU32, Ordering};

fn foo() -> u32 {
  let x = AtomicU32::new(14);
  x.load(Ordering::SeqCst)
}

this can be optimized to just return 14. the way that happens is that the atomic-load can first become non-atomic (a non-trivial transformation since the compiler has to prove that this does not remove synchronizes-with edges!), and then it's plain load forwarding.

RalfJ (Sep 16 2019 at 09:53, on Zulip):

@Josh Triplett more generally speaking, if foo did a whole bunch of stuff but the compiler could tell that all accesses to x come from the same thread, it can just make them all non-atomic.

gnzlbg (Sep 16 2019 at 09:54, on Zulip):

trivial case:

I saw that case, but the moment I add a write, all optimizations are disabled :/

RalfJ (Sep 16 2019 at 09:54, on Zulip):

(I'm guessing that is under-spec'd, since that seems to be the state of most relaxed models wrt shared memory.)

@Alan Jeffrey what is under-spec'd is the "open" case -- like, having a part of a program that gets composed with some other unknown code and then saying what the heck is going on. that's a genuinely hard research problem even for languages much simpler than C or Rust.

RalfJ (Sep 16 2019 at 09:54, on Zulip):

trivial case:

I saw that case, but the moment I add a write, all optimizations are disabled :/

I am not talking about what LLVM does (I have no idea of the status there), but what it is allowed to do

RalfJ (Sep 16 2019 at 09:57, on Zulip):

RalfJ the compiler is allowed to replace atomics by non-atomics?????? Argh, is there a way to stop that?

de jure? no, if the compiler understands what your code is doing (or thinks it does...) then it can do whatever it wants as long as the code still does the same thing when viewed from the abstract machine. so, you'd need something like volatile to be able to "plug in" hardware-level reasoning. LLVM has "atomic volatile" operations, so maybe those would help here.
de facto, if the address of an atomic gets leaked to unknown code that is never optimized together (including via xLTO), the compiler has no way to prove that the atomic is not used from multiple threads, so it cannot apply these optimizations.
but really, if you want to reason outside of the abstract machine, volatile should come in. that is exactly what it is made for.

gnzlbg (Sep 16 2019 at 09:58, on Zulip):

@RalfJ writes here work too: https://rust.godbolt.org/z/m_iP1f

I am not talking about what LLVM does (I have no idea of the status there), but what it is allowed to do

I think we all agree that it is allowed to do these optimizations. My question was if it was specifically already doing them in some cases, since from @rkruppe's comment I got the impression that it was not.

Alan Jeffrey (Sep 16 2019 at 15:35, on Zulip):

@RalfJ

Alan Jeffrey (Sep 16 2019 at 15:35, on Zulip):

LLVM has "atomic volatile" operations,

Alan Jeffrey (Sep 16 2019 at 15:35, on Zulip):

does Rust have a way of using them?

gnzlbg (Sep 16 2019 at 15:38, on Zulip):

no

RalfJ (Sep 16 2019 at 16:57, on Zulip):

but we've had proposals before to make read_volatile and write_volatile atomic (with the weakest "unordered" ordering)

RalfJ (Sep 16 2019 at 16:57, on Zulip):

the issue here is tearing -- volatile accesses can tear, atomic accesses cannot

RalfJ (Sep 16 2019 at 16:57, on Zulip):

the thing is, when you use volatile for MMIO you don't want tearing
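
For instance, a typical MMIO read looks like the sketch below (the register address is made up for illustration); the whole point is that it compile to exactly one 32-bit load, which volatile is expected, but not formally guaranteed by LLVM, to do:

use core::ptr;

// Hypothetical device register address, for illustration only.
const REG: *mut u32 = 0x4000_0000 as *mut u32;

unsafe fn mmio_read() -> u32 {
    // Must be one 32-bit load: a torn pair of 16-bit loads would hit the
    // device twice, with different semantics.
    ptr::read_volatile(REG)
}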

RalfJ (Sep 16 2019 at 16:58, on Zulip):

so we need to solve this one way or another either way. just no one is pushing for a solution ATM.

gnzlbg (Sep 16 2019 at 18:44, on Zulip):

there was also the idea of providing intrinsics that are guaranteed not to tear

gnzlbg (Sep 16 2019 at 18:44, on Zulip):

since whether volatile load / stores tear in LLVM is... unspecified at best

gnzlbg (Sep 16 2019 at 18:44, on Zulip):

at least, i don't know of any way to query that

RalfJ (Sep 16 2019 at 18:45, on Zulip):

yes. seems like a natural choice then for the compiler to provide "unordered" atomic volatile intrinsics that cannot tear, and do the rest in libstd?

RalfJ (Sep 16 2019 at 18:46, on Zulip):

not sure what the status of "atomic memcpy/memset" is, but that could have a volatile variant as well (with tearing specified the same way as it is for the atomic variant)

RalfJ (Sep 16 2019 at 18:48, on Zulip):

there is https://github.com/rust-lang-nursery/compiler-builtins/pull/311

RalfJ (Sep 16 2019 at 18:48, on Zulip):

so the various memcpy/memset variants all explicitly specify their tearing. we could have the same for volatile. the only question then is how to implement the existing interface that works for any type.

RalfJ (Sep 16 2019 at 18:49, on Zulip):

in terms of avoiding API duplication, it might make sense to have Ordering::Volatile, so that all the existing atomic APIs can just also be used here.

gnzlbg (Sep 16 2019 at 18:52, on Zulip):

FWIW i wouldn't try to do that

gnzlbg (Sep 16 2019 at 18:52, on Zulip):

At least, until we find a use case for volatile operations with tearing

gnzlbg (Sep 16 2019 at 18:53, on Zulip):

I think it would be sufficient to provide the Ordering::UnorderedVolatile for those atomic operations that cannot tear, and have those be unordered+volatile

gnzlbg (Sep 16 2019 at 18:54, on Zulip):

On a platform where those would tear, they should just fail to compile, just like we do for atomics

gnzlbg (Sep 16 2019 at 18:55, on Zulip):

If someone finds a use case with tearing, they can always at least emulate it on top of non-tearing accesses

gnzlbg (Sep 16 2019 at 18:55, on Zulip):

And we could add support for that in the future if we wanted

RalfJ (Sep 16 2019 at 18:57, on Zulip):

At least, until we find a use case for volatile operations with tearing

we have to support the current read_volatile/write_volatile methods, which in general require tearing, since T can be larger than any native memory access

RalfJ (Sep 16 2019 at 18:57, on Zulip):

so we already do support volatile-with-tearing. when we get atomic-with-tearing, we might as well make it support volatile as well.

gnzlbg (Sep 16 2019 at 20:18, on Zulip):

atomic-with-tearing does not make sense to me

gnzlbg (Sep 16 2019 at 20:18, on Zulip):

atomic is the opposite of tearing

gnzlbg (Sep 16 2019 at 20:19, on Zulip):

as in, either an operation is atomic, or it tears

gnzlbg (Sep 16 2019 at 20:19, on Zulip):

but it can't be both at the same time

Alan Jeffrey (Sep 16 2019 at 21:10, on Zulip):

Do we need SeqCstVolatile, AqVolatile, RelVolatile and AqRelVolatile too? We'll need something stronger than UnorderedVolatile if we want happens-before to propagate between processes as well as threads.

Alan Jeffrey (Sep 16 2019 at 21:11, on Zulip):

Or alternatively, can we make SeqCst use VolatileSeqCst under the hood? (and ditto AqRel and friends)

gnzlbg (Sep 17 2019 at 09:37, on Zulip):

Or alternatively, can we make SeqCst use VolatileSeqCst under the hood? (and ditto AqRel and friends)

I don't think so. This would disable pretty much all optimizations on atomic operations.

gnzlbg (Sep 17 2019 at 09:37, on Zulip):

Do we need SeqCstVolatile, AqVolatile, RelVolatile and AqRelVolatile too? We'll need something stronger than UnorderedVolatile if we want happens-before to propagate between processes as well as threads.

Probably.

gnzlbg (Sep 17 2019 at 09:37, on Zulip):

Volatile is orthogonal to the ordering

gnzlbg (Sep 17 2019 at 09:38, on Zulip):

and just means that the load / store has side effects the compiler cannot see, and must happen just as the code was written (e.g. two subsequent relaxed loads can be merged, but two subsequent volatile relaxed loads cannot).
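
An illustration of that difference: the compiler may legally fold the two relaxed loads below into a single load, whereas two volatile relaxed loads would have to remain two machine loads.

use std::sync::atomic::{AtomicU32, Ordering};

fn relaxed_pair(x: &AtomicU32) -> (u32, u32) {
    // These two relaxed loads may be merged into one, returning the
    // same value twice; volatile loads could not be merged this way.
    (x.load(Ordering::Relaxed), x.load(Ordering::Relaxed))
}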

Alan Jeffrey (Sep 17 2019 at 13:40, on Zulip):

@gnzlbg Yeah, I was mainly thinking about SeqCst. I don't think there are many valid optimizations we'd lose; the ones I know of are the ones which need (e.g.) escape analysis to ensure no concurrent access.

Alan Jeffrey (Sep 17 2019 at 13:41, on Zulip):

The reason for asking is that I'm not sure at this late date that we'll be able to add any more cases to Ordering, since that'd be a breaking change.

rkruppe (Sep 17 2019 at 13:50, on Zulip):

Many of the optimizations discussed in https://jfbastien.github.io/no-sane-compiler/#/32 are valid for SeqCst and don't need escape analysis

RalfJ (Sep 18 2019 at 21:06, on Zulip):

as in, either an operation is atomic, or it tears

well there are atomic memset and atomic memcpy. so... I think you are wrong.

RalfJ (Sep 18 2019 at 21:07, on Zulip):

We'll need something stronger than UnorderedVolatile if we want happens-before to propagate between processes as well as threads.

what synchronization do you want to propagate to untrusted code? that's a first.

comex (Sep 18 2019 at 23:07, on Zulip):

not a first :) https://internals.rust-lang.org/t/add-volatile-operations-to-core-x86-64/10480/18?u=comex

RalfJ (Sep 19 2019 at 06:53, on Zulip):

well see my reply there. I don't see why this would need volatile.

RalfJ (Sep 19 2019 at 06:54, on Zulip):

volatile is for "I want to reason based on assembly-level memory accesses"

Alan Jeffrey (Oct 02 2019 at 16:12, on Zulip):

Oh, and more fun, AFAICT there's no way to stop another process from truncating your shared memory file at any point, causing a SIGBUS if you access it. https://github.com/asajeffrey/shared-data/issues/7

Alan Jeffrey (Oct 02 2019 at 16:17, on Zulip):

So is there any way to use shared memory safely in Rust? Maaaaybe by putting down a SIGBUS signal handler that tears down the current thread, though I suspect this would mean not even running panic or drop code.

nagisa (Oct 02 2019 at 17:15, on Zulip):

You are not guaranteed any particular signal, I think.

nagisa (Oct 02 2019 at 17:16, on Zulip):

on linux you want a sealed memfd

nagisa (Oct 02 2019 at 17:17, on Zulip):

specifically you can prevent memfd from being resized by placing a seal on it.
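
A minimal sketch of that, assuming Linux and the libc crate:

use std::io;

fn sealed_memfd(len: libc::off_t) -> io::Result<libc::c_int> {
    unsafe {
        let fd = libc::memfd_create(b"shared\0".as_ptr().cast(), libc::MFD_ALLOW_SEALING);
        if fd < 0 {
            return Err(io::Error::last_os_error());
        }
        // F_SEAL_SHRINK forbids truncation by any process, so pages that
        // were successfully mapped cannot start raising SIGBUS later.
        if libc::ftruncate(fd, len) < 0
            || libc::fcntl(fd, libc::F_ADD_SEALS, libc::F_SEAL_SHRINK) < 0
        {
            let err = io::Error::last_os_error();
            libc::close(fd);
            return Err(err);
        }
        Ok(fd)
    }
}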

nagisa (Oct 02 2019 at 17:18, on Zulip):

reading manpages, it seems like you should be able to seal regular files too

nagisa (Oct 02 2019 at 17:19, on Zulip):

Ah no, you cannot:

Currently, file seals can be applied only to a file descriptor returned by memfd_create(2) (if the MFD_ALLOW_SEALING flag was employed). On other filesystems, all fcntl() operations that operate on seals will return EINVAL.

Alan Jeffrey (Oct 03 2019 at 15:42, on Zulip):

@nagisa Yeah, on Linux it seems like sealed memfd is the way to go; I wonder what the equivalent is on other OSs?

nagisa (Oct 03 2019 at 15:46, on Zulip):

Not sure, I haven't looked into what options there are elsewhere; I bet on Windows you can just do an exclusive lock for that kind of operation on a filesystem file as well.

Alan Jeffrey (Oct 03 2019 at 15:49, on Zulip):

@nagisa yes, though it depends on the permission system as to whether you can hand out write permission but not truncate permission.

nagisa (Oct 03 2019 at 15:51, on Zulip):

On BSDs you might be able to set a append-only flag: https://man.openbsd.org/FreeBSD-11.1/chflags.2, though not sure if that allows mutating the already written data.

Alan Jeffrey (Oct 03 2019 at 15:53, on Zulip):

Yes, but that makes the current contents immutable :(

nagisa (Oct 03 2019 at 15:54, on Zulip):

But coming back to the original question: I would indeed not consider it to be a part of the attacker model. What behaviour you get here seems dependent on the kernel that the code is running on too

nagisa (Oct 03 2019 at 15:54, on Zulip):

i.e. linux may guarantee a sigbus, but I can imagine a bad kernel just returning garbage :slight_smile:

Alan Jeffrey (Oct 03 2019 at 15:55, on Zulip):

@nagisa yeah, we're into :shrug: territory here I suspect, about when you're allowed to mark access to shared memory safe.

nagisa (Oct 03 2019 at 15:55, on Zulip):

So it would be out-of-scope for unsafe-code-guidelines past what is already specified (e.g. reading unallocated memory is UB)

Alan Jeffrey (Oct 03 2019 at 15:56, on Zulip):

@nagisa nondeterministic values are fine, as long as they're not poison, or another form of UB.

nagisa (Oct 03 2019 at 15:57, on Zulip):

Well this is a volatile read, so you won’t get poison, but you could read out 42 as a bool :slight_smile:

nagisa (Oct 03 2019 at 15:57, on Zulip):

Which I guess is the same concern you need to handle when dealing with any shared memory anyway

Alan Jeffrey (Oct 03 2019 at 15:59, on Zulip):

@nagisa 42 is fine, there's a lot of jumping through hoops to make sure that any time you construct a &T into shared memory that it's OK, e.g. &AtomicUsize but not &bool.
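
A sketch of that hoop-jumping as a marker trait (names are illustrative, not the actual API of the crate):

use std::sync::atomic::AtomicUsize;

/// Safety: implementors must be valid for every bit pattern and must
/// tolerate arbitrary concurrent modification by another process.
unsafe trait SharedSafe {}

unsafe impl SharedSafe for AtomicUsize {}
// Deliberately NOT implemented for bool: the byte 42 is not a valid bool.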

Alan Jeffrey (Oct 03 2019 at 16:00, on Zulip):

the thing I don't know how to handle safely is the shared memory being truncated :frown:

Alan Jeffrey (Oct 03 2019 at 16:00, on Zulip):

https://github.com/asajeffrey/shared-data

Josh Triplett (Oct 04 2019 at 06:53, on Zulip):

In general I don't think you can make that safe on all systems. Certainly not on general POSIX. And even on Linux you can't count on having memfd.

Josh Triplett (Oct 04 2019 at 06:56, on Zulip):

That said, I don't think that's unsafe. It might raise a signal, but it won't generate UB.

Josh Triplett (Oct 04 2019 at 06:57, on Zulip):

So I think you should just ignore it. (You can recommend memfd though.)

nagisa (Oct 04 2019 at 13:19, on Zulip):

Is the behaviour of reading mapped pages from a previously-mapped area of a file that has since been truncated specified for all POSIXes?

nagisa (Oct 04 2019 at 13:20, on Zulip):

(as in, behaviour of "you will get a signal" and not "unspecified/undefined")

Josh Triplett (Oct 04 2019 at 13:33, on Zulip):

I've never heard of one that has undefined behavior.

Josh Triplett (Oct 04 2019 at 13:33, on Zulip):

You might check the spec.

Josh Triplett (Oct 04 2019 at 13:39, on Zulip):

The Single Unix Specification v2 specifies that behavior.

Josh Triplett (Oct 04 2019 at 13:39, on Zulip):

I think you can safely rely on it.

Josh Triplett (Oct 04 2019 at 13:42, on Zulip):

Ah, hang on...

Josh Triplett (Oct 04 2019 at 13:43, on Zulip):

So, it specifies that references past the end will generate SIGBUS.

Josh Triplett (Oct 04 2019 at 13:43, on Zulip):

But:

Josh Triplett (Oct 04 2019 at 13:43, on Zulip):

If the size of the mapped file changes after the call to mmap() as a result of some other operation on the mapped file, the effect of references to portions of the mapped region that correspond to added or removed portions of the file is unspecified.

RalfJ (Oct 09 2019 at 13:25, on Zulip):

Well this is a volatile read, so you won’t get poison, but you could read out 42 as a bool :)

@nagisa we hope you don't but LLVM doesn't want to guarantee that :(

RalfJ (Oct 09 2019 at 13:26, on Zulip):

@Alan Jeffrey regarding SIGBUS, one tricky bit here is that the compiler might have license to insert spurious accesses (e.g. if you had a reference into that memory) so you might get SIGBUS even if your code didn't access that memory

Alan Jeffrey (Oct 09 2019 at 15:03, on Zulip):

Indeed. I don't think Atomic* makes any guarantees that it won't introduce extra reads, just that the accesses in your program will be preserved (up to the appropriate Ordering).

RalfJ (Oct 09 2019 at 15:18, on Zulip):

no it doesn't even guarantee that anything is preserved

RalfJ (Oct 09 2019 at 15:18, on Zulip):

it makes no guarantees in terms of the final assembly memory pattern, only volatile does

RalfJ (Oct 09 2019 at 15:18, on Zulip):

"atomic" just guarantees something about the behavior observable inside the Rust program

RalfJ (Oct 09 2019 at 15:19, on Zulip):

and with a whole-program analysis, that might well mean optimizing away memory accesses entirely, demoting atomic to non-atomic, or whatever

Alan Jeffrey (Oct 09 2019 at 15:40, on Zulip):

@RalfJ I was brushing a lot of stuff under the carpet with "up to the appropriate Ordering" :)

Alan Jeffrey (Oct 09 2019 at 15:41, on Zulip):

and yes, it's only giving guarantees to other threads, not other processes.

gnzlbg (Oct 10 2019 at 05:38, on Zulip):

“to other threads of execution”

gnzlbg (Oct 10 2019 at 05:42, on Zulip):

Something like:

fn main() {
    extern "C" {
        static PTR: *mut u8;
    }
    let x = atomic_load(PTR, Relaxed);
    let y = atomic_load(PTR, Relaxed);
    assert_eq!(x, y); // Might fail
}

gnzlbg (Oct 10 2019 at 05:43, on Zulip):

can fail if there is a second thread of execution mutating the memory behind PTR

gnzlbg (Oct 10 2019 at 05:44, on Zulip):

If we want to be able to support multiple processes Rust cannot assume that won’t happen

gnzlbg (Oct 10 2019 at 05:46, on Zulip):

If Rust can assume that won’t happen, volatile won’t help you here anyways

RalfJ (Oct 10 2019 at 08:21, on Zulip):

can fail if there is a second thread of execution mutating the memory behind PTR

there isn't, though, as we can see the entire program

RalfJ (Oct 10 2019 at 08:22, on Zulip):

If Rust can assume that won’t happen, volatile won’t help you here anyways

yes volatile will help, precisely because it applies assembly-level semantics instead of Rust-level semantics

gnzlbg (Oct 10 2019 at 16:27, on Zulip):

there isn't, though, as we can see the entire program

If two processes are working on inter-process shared memory, Rust doesn't see all the changes to that memory

gnzlbg (Oct 10 2019 at 16:28, on Zulip):

In the program I have above, the PTR address is opaque to Rust, and it could point to inter-process shared memory, so other processes might be modifying it concurrently

gnzlbg (Oct 10 2019 at 16:30, on Zulip):

It should be enough for all processes to use atomics to synchronize reads and writes without having to reach for volatile

gnzlbg (Oct 10 2019 at 16:31, on Zulip):

And using volatile would actually prevent optimizations that are sound for this type of program, so it isn't a good solution either.

gnzlbg (Oct 10 2019 at 16:32, on Zulip):

So IMO the problematic issue here is the incorrect assumption that, if the Rust compiler does not see a program creating a second thread of execution, such second thread of execution does not exist

gnzlbg (Oct 10 2019 at 16:36, on Zulip):

This assumption would also break signal handling, e.g., if only the threads of execution that are created by the program exist, then only they can raise signals, so if none of them do, then no signals can be raised and we could optimize appropriately.

RalfJ (Oct 10 2019 at 18:35, on Zulip):

there isn't, though, as we can see the entire program

If two processes are working on inter-process shared memory, Rust doesn't see all the changes to that memory

there's no code to set up shared memory so I think it is fair game for rustc to assume there's no such thing

RalfJ (Oct 10 2019 at 18:35, on Zulip):

if you'd call unknown code somewhere -- like, make a single syscall -- the game would be very different

RalfJ (Oct 10 2019 at 18:35, on Zulip):

This assumption would also break signal handling, e.g., if only the threads of execution that are created by the program exist, then only they can raise signals, so if none of them do, then no signals can be raised and we could optimize appropriately.

yeah signals suck. I am not aware of any formal model of them.^^

RalfJ (Oct 10 2019 at 18:36, on Zulip):

but to be clear, the only memory that might be shared here is PTR, right? x and y are definitely private and we can do whatever we want

RalfJ (Oct 10 2019 at 18:38, on Zulip):

so we could imagine having a clause that says that the compiler always needs to assume that there might be other threads in the Abstract Machine, even if it can see main. like, the initial state could consist of more than just running main. That seems reasonable, but I am not sure if LLVM implements this correctly.

Lokathor (Oct 10 2019 at 22:14, on Zulip):

PTR can have been placed by the linker. And it can have been placed at an MMIO address for example. However, since they are atomic loads and not volatile loads, my understanding is that LLVM is free to make x and y both share the value of a single load.

gnzlbg (Oct 11 2019 at 07:13, on Zulip):

there's no code to set up shared memory so I think it is fair game for rustc to assume there's no such thing

I don't think that's required? The OS kernel can set that up for you and, e.g., it can share part of the kernel address space with that of a process when that process is launched and put an address to it in a static somewhere (e.g. in the auxiliary vectors of an ELF binary, etc.).

but to be clear, the only memory that might be shared here is PTR, right? x and y are definitely private and we can do whatever we want

Yes, I think that's a reasonable restriction. In practice, this isn't necessarily the case, e.g., a debugger or a profiler might want to inspect what a running thread is doing behind its back, including inspecting its stack, or maybe even modify it, but I think it's reasonable for Rust to assume this never happens. If that happens, we provide no guarantees.

so we could imagine having a clause that says that the compiler always needs to assume that there might be other threads in the Abstract Machine, even if it can see main. like, the initial state could consist of more than just running main. That seems reasonable, but I am not sure if LLVM implements this correctly.

Yes, such a clause is what I had in mind.

gnzlbg (Oct 14 2019 at 10:02, on Zulip):

One big concern here is reading undef, but even if we added a dedicated intrinsic to Rust, we still couldn't prevent that -- LLVM quite simply does not provide the required primitives to avoid undef here.

@RalfJ LLVM provides volatile inline assembly blocks

gnzlbg (Oct 14 2019 at 10:02, on Zulip):

LLVM probably doesn't guarantee that you can use them to turn undef into "not undef", but... it wouldn't not work either

gnzlbg (Oct 14 2019 at 10:03, on Zulip):

That particular implementation strategy would just have "stronger" semantics than what we guarantee for those intrinsics

gnzlbg (Oct 14 2019 at 10:04, on Zulip):

(e.g. atomic loads can be re-ordered around volatile loads, but if we were to use such an approach, they probably wouldn't be)

Hadrien Grasland (Oct 14 2019 at 12:59, on Zulip):

@gnzlbg You have to admit though, that it would be a shame to resolve the issue in an architecture-specific way (and therefore once per architecture), when all it would take to resolve the problem cleanly is for LLVM to close what is admittedly a spec hole that makes "their" volatile less well-defined than the underlying hardware loads and stores.

Hadrien Grasland (Oct 14 2019 at 13:00, on Zulip):

The purpose of undefined behavior is to enable optimizations of higher-level languages. The purpose of volatile is to disable optimizations and defer to hardware semantics. Therefore, the two are fundamentally at odds, and volatile should only have the minimal possible amount of UB to enable optimization of surrounding non-volatile code.

Hadrien Grasland (Oct 14 2019 at 13:03, on Zulip):

Besides, if "LLVM doesn't guarantee it but it wouldn't not work either" is considered enough, using today's volatile is arguably enough for many purposes ;)

gnzlbg (Oct 14 2019 at 14:32, on Zulip):

when all it would take to resolve the problem cleanly and portably is for LLVM to close what is admittedly a spec hole

That looks like quite a bit of work to me.

gnzlbg (Oct 14 2019 at 14:33, on Zulip):

There is an LLVM bug open, and nobody appears to think that such a hole is worth fixing, so if somebody wanted to fix it they would probably need to achieve consensus on what the fix is, implement it, and land it.

Hadrien Grasland (Oct 14 2019 at 14:39, on Zulip):

That is indeed quite a bit of work. However, the same amount of work would be needed to guarantee that inline assembly loads are not allowed to return undef.

gnzlbg (Oct 14 2019 at 14:41, on Zulip):

My point was that we only have to guarantee that at the Rust language level

Hadrien Grasland (Oct 14 2019 at 14:41, on Zulip):

Therefore, in my view...

- If we want to fix that hole, we need LLVM's help, one way or another.
- If we are satisfied with a "should work, but isn't guaranteed to" status quo, volatile is as good as inline assembly.

gnzlbg (Oct 14 2019 at 14:42, on Zulip):

I'm of the opinion that we should provide whatever guarantees we want for Rust

Hadrien Grasland (Oct 14 2019 at 14:42, on Zulip):

If the inline assembly gets into LLVM's hands (and I think it does eventually), then we need LLVM's help too.

gnzlbg (Oct 14 2019 at 14:43, on Zulip):

On paper, sure. In practice, I'm fine with opening an issue about that, and punting on how to fix it until the first user provides an example of a miscompilation

Hadrien Grasland (Oct 14 2019 at 14:43, on Zulip):

I'm of the opinion that we should provide whatever guarantees we want for Rust

I would agree with you in an ideal world. But as long as rustc uses LLVM as a backend, we are not living in such an ideal world.

gnzlbg (Oct 14 2019 at 14:43, on Zulip):

We don't have to use inline assembly, we could use global_asm!

gnzlbg (Oct 14 2019 at 14:43, on Zulip):

or the linker to call some assembly blob, or ..

Hadrien Grasland (Oct 14 2019 at 14:43, on Zulip):

And writing hardware-specific backends is an awful lot of work.

gnzlbg (Oct 14 2019 at 14:44, on Zulip):

If we put the intrinsics in core::arch, we can implement them as users need them

Hadrien Grasland (Oct 14 2019 at 14:44, on Zulip):

AFAIK, non-inline assembly means one CALL per volatile load or store, which seems too much.

gnzlbg (Oct 14 2019 at 14:45, on Zulip):

I don't think it's worth it to invest too much time to support, e.g., s390x, when no user might ever need that there

gnzlbg (Oct 14 2019 at 14:45, on Zulip):

AFAIK, non-inline assembly means one CALL per volatile load or store, which seems too much.

Such is life if inline assembly turns out to not be enough.

gnzlbg (Oct 14 2019 at 14:46, on Zulip):

The point is, we can enforce Rust semantics with a hammer if need be. Right now, we don't have to, because inline assembly would work, and maybe even read_volatile

Hadrien Grasland (Oct 14 2019 at 14:46, on Zulip):

Sure, but just supporting "mainstream" platforms already means supporting x86, x86_64, infinitely many dialects of ARM and POWER, RISC-V, WASM...

gnzlbg (Oct 14 2019 at 14:46, on Zulip):

We have asm tests in rustc that check the assembly that some code generates, we can add them for these intrinsics

gnzlbg (Oct 14 2019 at 14:46, on Zulip):

@Hadrien Grasland if you want to invest time in implementing the codegen for all those platforms, or to fix LLVM, go for it

gnzlbg (Oct 14 2019 at 14:47, on Zulip):

For me, x86_64 is enough

gnzlbg (Oct 14 2019 at 14:47, on Zulip):

And if some day all platforms support these, we can even implement generic ones on top

Hadrien Grasland (Oct 14 2019 at 14:47, on Zulip):

But I think we are in violent agreement here regarding what concretely needs to be done.

gnzlbg (Oct 14 2019 at 14:47, on Zulip):

Yes, I think we are kind of also in agreement of what the ideal implementation would be

gnzlbg (Oct 14 2019 at 14:47, on Zulip):

(LLVM fixing their bug)

gnzlbg (Oct 14 2019 at 14:48, on Zulip):

But that's not going to happen on its own, and I don't think waiting for that fix should block adding a way to solve these problems in Rust

Hadrien Grasland (Oct 14 2019 at 14:48, on Zulip):

  1. Specify the semantics that we want at the Rust level.
  2. Implement it via LLVM's atomic volatile.
  3. Add checks that the generated ASM is correct on various archs.
  4. Postpone any other work, including arch-specific volatile loads and stores, until we actually have a problem that this solves.

gnzlbg (Oct 14 2019 at 14:49, on Zulip):

One example is SIMD, x86_64 is good enough for 99% of our current users, some ARM SIMD intrinsics are ok for the rest, MIPS intrinsics for ~1 user, and PPC intrinsics for ~1 user

gnzlbg (Oct 14 2019 at 14:49, on Zulip):

The users that cared about MIPS, ARM, and PPC, invested their time into implementing them

Hadrien Grasland (Oct 14 2019 at 14:49, on Zulip):

That's short-sighted. Many supercomputing centers use POWER CPUs.

Hadrien Grasland (Oct 14 2019 at 14:50, on Zulip):

And ARM SVE is also getting a lot of attention from that community.

gnzlbg (Oct 14 2019 at 14:50, on Zulip):

So? If they want to use them, they should implement them, or pay someone that does

gnzlbg (Oct 14 2019 at 14:50, on Zulip):

That's how open source works

gnzlbg (Oct 14 2019 at 14:50, on Zulip):

We are not making supporting those platforms impossible

gnzlbg (Oct 14 2019 at 14:50, on Zulip):

We are providing a trivial way of supporting those platforms

gnzlbg (Oct 14 2019 at 14:51, on Zulip):

For anybody interested

gnzlbg (Oct 14 2019 at 14:51, on Zulip):

For this particular case, implementing two intrinsics for any platform using inline assembly should be trivial

Hadrien Grasland (Oct 14 2019 at 14:52, on Zulip):

My point is, you're introducing extra rustc porting work for every new platform just to solve a problem which we're not sure we actually have.

Hadrien Grasland (Oct 14 2019 at 14:52, on Zulip):

And it's not even guaranteed to resolve said problem in any meaningful way.

gnzlbg (Oct 14 2019 at 14:52, on Zulip):

Not really

gnzlbg (Oct 14 2019 at 14:53, on Zulip):

Adding a new target to rustc doesn't require adding those intrinsics

gnzlbg (Oct 14 2019 at 14:53, on Zulip):

E.g. ARM does not support x86 SIMD intrinsics

Hadrien Grasland (Oct 14 2019 at 14:53, on Zulip):

Sure. But there's nothing x86-specific about volatile loads and stores.

Hadrien Grasland (Oct 14 2019 at 14:53, on Zulip):

So it doesn't belong in core::arch imo.

gnzlbg (Oct 14 2019 at 14:54, on Zulip):

No, but since LLVM doesn't guarantee any useful semantics, we can't provide them in a portable way that's guaranteed to work either

gnzlbg (Oct 14 2019 at 14:54, on Zulip):

So either we don't provide them, or provide intrinsics that are not guaranteed to work, or provide target-dependent intrinsics that are guaranteed to work

gnzlbg (Oct 14 2019 at 14:55, on Zulip):

Right now we provide intrinsics that are not guaranteed to work, and people want more

Hadrien Grasland (Oct 14 2019 at 14:56, on Zulip):

We can also provide arch-agnostic intrinsics that are guaranteed to work at the spec level, implement them the LLVM way, and wait for bug reports proving that it doesn't work before going for anything more complicated.

Hadrien Grasland (Oct 14 2019 at 14:56, on Zulip):

I mean, we're already relying on a lot of undocumented LLVM semantics.

gnzlbg (Oct 14 2019 at 14:57, on Zulip):

Sure.

gnzlbg (Oct 14 2019 at 14:57, on Zulip):

But then we are back to "volatile are hardware semantics", "native types", and all other target-dependent stuff

gnzlbg (Oct 14 2019 at 14:58, on Zulip):

We'd need to document the guarantees in a target-dependent way

Hadrien Grasland (Oct 14 2019 at 14:58, on Zulip):

Sure. Volatile is an arch-agnostic interface with arch-specific semantics.

Hadrien Grasland (Oct 14 2019 at 14:59, on Zulip):

We don't need to document the semantics ourselves, they are in each architecture's respective documentation.

gnzlbg (Oct 14 2019 at 14:59, on Zulip):

We need to guarantee that loads / stores won't tear

Hadrien Grasland (Oct 14 2019 at 14:59, on Zulip):

All we need to do is to defeat the stupid spec hole that allows a wide volatile access to be split into multiple narrow ones.

Hadrien Grasland (Oct 14 2019 at 14:59, on Zulip):

And that can be done just by switching to atomic volatile.

gnzlbg (Oct 14 2019 at 14:59, on Zulip):

no, that's not enough

Hadrien Grasland (Oct 14 2019 at 15:00, on Zulip):

Why?

gnzlbg (Oct 14 2019 at 15:00, on Zulip):

if you then try to use an atomic volatile operation for a platform that doesn't have an appropriate instruction, you'll get at best an LLVM assertion

gnzlbg (Oct 14 2019 at 15:00, on Zulip):

more likely an "instruction fails to select"-type LLVM error, or a segfault

Hadrien Grasland (Oct 14 2019 at 15:00, on Zulip):

I think I have a way to resolve that.

gnzlbg (Oct 14 2019 at 15:01, on Zulip):

at worst, LLVM will just ignore the atomic and emit multiple instructions that tear

gnzlbg (Oct 14 2019 at 15:01, on Zulip):

so we need to encode in the front-end all that information for each target

Hadrien Grasland (Oct 14 2019 at 15:01, on Zulip):

It's not even a very original one, someone already suggested it before. But it needs to be bikeshedded to death by the RFC process ;)

Hadrien Grasland (Oct 14 2019 at 15:02, on Zulip):

The way I propose to resolve this is to add AtomicXyz::load_volatile(self_: *const Self, o: Ordering) and AtomicXyz::store_volatile(self_: *const Self, o: Ordering).
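
A sketch of the proposed shape (hypothetical API, not in std). The body is only a stand-in: a real implementation would need to emit an LLVM "atomic volatile" load, which Rust currently has no way to express.

use std::sync::atomic::{AtomicU32, Ordering};

unsafe fn atomic_u32_load_volatile(this: *const AtomicU32, _order: Ordering) -> u32 {
    // Stand-in body: volatile, but NOT atomic, so it does not actually
    // deliver the proposed semantics.
    core::ptr::read_volatile(this.cast::<u32>())
}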

gnzlbg (Oct 14 2019 at 15:02, on Zulip):

If I can call such an intrinsic for a 64-bit load for some target, but that fails to compile for another target, then that intrinsic isn't "portable" either

gnzlbg (Oct 14 2019 at 15:02, on Zulip):

How is that any different from adding core::arch intrinsics ?

gnzlbg (Oct 14 2019 at 15:02, on Zulip):

it's just different "in syntax", but the atomic types are not available on all targets

gnzlbg (Oct 14 2019 at 15:02, on Zulip):

each target exposes different subsets of the atomic types, or even none

Hadrien Grasland (Oct 14 2019 at 15:03, on Zulip):

This is being fixed.

gnzlbg (Oct 14 2019 at 15:03, on Zulip):

How?

gnzlbg (Oct 14 2019 at 15:03, on Zulip):

They are implemented in libcore, so they can't be emulated

gnzlbg (Oct 14 2019 at 15:03, on Zulip):

Do they then fail at run-time ?

Hadrien Grasland (Oct 14 2019 at 15:03, on Zulip):

By adding support for platforms which have only atomic loads and stores.

gnzlbg (Oct 14 2019 at 15:03, on Zulip):

?

Hadrien Grasland (Oct 14 2019 at 15:04, on Zulip):

(Which are not actually atomic, but anyhow)

gnzlbg (Oct 14 2019 at 15:04, on Zulip):

Where is that being done ?

gnzlbg (Oct 14 2019 at 15:04, on Zulip):

Adding atomic types for platforms for which their semantics aren't atomic sounds like a bad idea

gnzlbg (Oct 14 2019 at 15:04, on Zulip):

Not even C++ does that

Hadrien Grasland (Oct 14 2019 at 15:04, on Zulip):

https://github.com/rust-lang/rust/pull/65214

gnzlbg (Oct 14 2019 at 15:05, on Zulip):

That's not what I was referring to

gnzlbg (Oct 14 2019 at 15:05, on Zulip):

That's just conditionally making different atomic operations available depending on the target

gnzlbg (Oct 14 2019 at 15:05, on Zulip):

some targets have 0 operations available

Hadrien Grasland (Oct 14 2019 at 15:05, on Zulip):

Sorry, that was wrong.

gnzlbg (Oct 14 2019 at 15:05, on Zulip):

Code that uses Atomic* isn't portable

Hadrien Grasland (Oct 14 2019 at 15:05, on Zulip):

some targets have 0 operations available

Sure about that?

gnzlbg (Oct 14 2019 at 15:06, on Zulip):

For any example you give me, I can give you a --target that will fail to compile

gnzlbg (Oct 14 2019 at 15:06, on Zulip):

Sure about that?

Yes, for example, nvptx64-nvidia-cuda

Hadrien Grasland (Oct 14 2019 at 15:06, on Zulip):

Not even relaxed loads and stores of AtomicUsize?

Hadrien Grasland (Oct 14 2019 at 15:06, on Zulip):

That's plain broken, as CUDA kernels can actually have atomic operations inside...

gnzlbg (Oct 14 2019 at 15:07, on Zulip):

Yes, but not with C11 semantics

gnzlbg (Oct 14 2019 at 15:07, on Zulip):

they have different atomic operations, with different semantics, that do not match the orderings that we have

Hadrien Grasland (Oct 14 2019 at 15:07, on Zulip):

sigh one more argument for adding unordered, I guess.

gnzlbg (Oct 14 2019 at 15:07, on Zulip):

There are proposals for trying to bring C11 atomics to nvidia GPUs though

Hadrien Grasland (Oct 14 2019 at 15:07, on Zulip):

Anyhow.

Hadrien Grasland (Oct 14 2019 at 15:08, on Zulip):

You started this discussion by saying you wanted a solution for mainstream platforms, not obscure niche.

Hadrien Grasland (Oct 14 2019 at 15:08, on Zulip):

Now we're definitely getting into obscure niche territory.

gnzlbg (Oct 14 2019 at 15:08, on Zulip):

Does i386 have atomics?

Hadrien Grasland (Oct 14 2019 at 15:08, on Zulip):

Yes

gnzlbg (Oct 14 2019 at 15:08, on Zulip):

we also have at least one i386 target

Hadrien Grasland (Oct 14 2019 at 15:08, on Zulip):

Otherwise you couldn't implement a mutex on it.

Hadrien Grasland (Oct 14 2019 at 15:10, on Zulip):

IMO, getting volatile to work as intended on every platform that has Relaxed atomic loads and stores would already be a reasonable achievement.

Hadrien Grasland (Oct 14 2019 at 15:11, on Zulip):

And I could live with having arch-specific abstractions for the other platforms until either 1/they become C11-compliant or 2/we add weaker atomic orderings that match their semantics.

gnzlbg (Oct 14 2019 at 15:12, on Zulip):

IMO, getting volatile to work as intended on every platform that has Relaxed atomic loads and stores would already be a reasonable achievement.

This should be doable for those platforms

Hadrien Grasland (Oct 14 2019 at 15:14, on Zulip):

I think this would match the spirit of what you're trying to do with x86-specific volatile operations, but in a way that can reach basically all mainstream platforms targeted by Rust.

Hadrien Grasland (Oct 14 2019 at 15:14, on Zulip):

Without custom assembly, CALL overhead on each volatile operation, etc.

gnzlbg (Oct 14 2019 at 15:14, on Zulip):

oh, check this out: https://www.youtube.com/watch?v=VogqOscJYvk

gnzlbg (Oct 14 2019 at 15:14, on Zulip):

I haven't watched it, but maybe they manage to make C++ atomics work with cuda already?

Hadrien Grasland (Oct 14 2019 at 15:17, on Zulip):

IIRC, the problem is that GPUs are not cache-coherent.

Hadrien Grasland (Oct 14 2019 at 15:17, on Zulip):

So they miss one guarantee from C11, which is that all threads must observe stores to a given memory location as occurring in the same order.

Hadrien Grasland (Oct 14 2019 at 15:18, on Zulip):

Unless fences are used to flush them.

Hadrien Grasland (Oct 14 2019 at 15:19, on Zulip):

They could probably be modeled by LLVM's unordered atomics, but well... I think we've established that someone must do a PhD on them before they're accepted into the Rust memory model.

gnzlbg (Oct 14 2019 at 15:19, on Zulip):

yes, just like POWER, GPUs share store buffers between threads

gnzlbg (Oct 14 2019 at 15:19, on Zulip):

or at least nvidia ones do

gnzlbg (Oct 14 2019 at 15:20, on Zulip):

so multiple stores can be viewed by other threads in the wrong orders

Hadrien Grasland (Oct 14 2019 at 15:20, on Zulip):

They have store coherence on a given Compute Unit, but not across CUs, I believe.

gnzlbg (Oct 14 2019 at 15:20, on Zulip):

yes

gnzlbg (Oct 14 2019 at 15:21, on Zulip):

I think we've established that someone must do a PhD on them before they're accepted into the Rust memory model.

More like 5 PhDs should build up an operational semantics for Rust using atomics, and then another 5 PhDs should extend that with further useful orderings

gnzlbg (Oct 14 2019 at 15:22, on Zulip):

That was @RalfJ plan all along

Hadrien Grasland (Oct 14 2019 at 15:22, on Zulip):

So barring that, NVidia providing a way to implement C11 Relaxed on top of nvptx, even if with suboptimal efficiency, sounds like a more reasonable short-term plan.

Hadrien Grasland (Oct 14 2019 at 15:24, on Zulip):

Possibly combined with NVPTX-specific unordered atomic intrinsics with a big warning attached to them.

gnzlbg (Oct 14 2019 at 15:24, on Zulip):

Maybe they did already, that cppcon video is a couple of days old

Hadrien Grasland (Oct 14 2019 at 15:25, on Zulip):

"We have no formal Rust semantics for these and they will eat your laundry if you try to use them to synchronize Rust memory operations. See the CUDA Programming Guide to learn more about their hardware-level semantics."

Hadrien Grasland (Oct 14 2019 at 15:29, on Zulip):

But thanks, that's actually useful material for one question which I'm currently asking myself as part of my nefarious plan for soft-deprecating non-atomic volatile.

Hadrien Grasland (Oct 14 2019 at 15:30, on Zulip):

Namely, do we actually have a platform where we can have non-atomic volatile, but not Relaxed atomic volatile.

Hadrien Grasland (Oct 14 2019 at 15:30, on Zulip):

Now I have one example.

Hadrien Grasland (Oct 14 2019 at 15:35, on Zulip):

Posted at https://github.com/rust-lang/unsafe-code-guidelines/issues/152

Hadrien Grasland (Oct 14 2019 at 15:41, on Zulip):

So, I have given the NVidia presentation a quick skim.

Hadrien Grasland (Oct 14 2019 at 15:41, on Zulip):

Essentially, they propose to have C11-like atomic semantics, but on a restricted device scope (e.g. thread block).

gnzlbg (Oct 14 2019 at 15:44, on Zulip):

So atomics don't have C11-like semantics within a warp?

gnzlbg (Oct 14 2019 at 15:44, on Zulip):

But they do across warps?

Hadrien Grasland (Oct 14 2019 at 15:45, on Zulip):

No, I think it's more like, you have C11 semantics "up to" a certain scope.

Hadrien Grasland (Oct 14 2019 at 15:45, on Zulip):

That could be only inside a warp (architecturally free), or within warps + thread blocks (what CUDA has today if I understand correctly).

Hadrien Grasland (Oct 14 2019 at 15:46, on Zulip):

But again, I only gave this a quick skim, and that's definitely not enough for something as subtle as a memory model.

Hadrien Grasland (Oct 14 2019 at 15:48, on Zulip):

Also, best slide ever. SAT.png

Hadrien Grasland (Oct 14 2019 at 15:53, on Zulip):

Apparently, they even have a way to do device-wide or cross-device atomics if you're interested in some expensive fences.

Hadrien Grasland (Oct 14 2019 at 15:58, on Zulip):

But well, I'm speculating quite a bit here, and this presentation is more about how they got C++-style atomics than what the finally shipped product actually does.

Hadrien Grasland (Oct 14 2019 at 16:01, on Zulip):

I tried to search for documentation a bit on the web, but it doesn't seem published yet. Even the CUDA programming guide is not up to date yet, it's not marked as implemented in the famous https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cpp11-language-features table, nor does this document contain any occurrence of "std::atomic" or "<atomic>"

Hadrien Grasland (Oct 14 2019 at 16:14, on Zulip):

So I suspect that this hasn't shipped in an actual CUDA toolkit release yet.

Hadrien Grasland (Oct 14 2019 at 16:16, on Zulip):

And it will be hard to make any progress on nvptx until it does.

Hadrien Grasland (Oct 14 2019 at 16:37, on Zulip):

So, to summarize...

For the common case of a platform that has at least Relaxed loads and stores that directly map into native loads and stores, I think we agree that AtomicXyz::load_volatile/store_volatile is better than the current ptr::read_volatile/write_volatile in every way and should basically replace it.

I personally think that atomic volatile would be good enough for "defer to hardware semantics" use cases in order to alleviate the need for platform-specific volatile load and store intrinsics in core::arch for those platforms. I'll probably keep thinking that unless someone can come up with a concrete case of LLVM compiling atomic volatile loads or stores into anything other than hardware loads and stores in an important use case like interaction with untrusted code via shared memory.

So far, even if LLVM doesn't want to guarantee that it won't happen, it has never happened, and it is unclear under which circumstances that could possibly happen (as UB-induced miscompilation could only be triggered by knowledge which LLVM doesn't have, such as "this memory location is never initialized" or "some foreign code will do a non-atomic store there"). Therefore, I think the "wait for a bug report" strategy is fine. If and when that happens we can think about going for more complex solutions to this problem, such as custom assembly.

The intrinsics route can still be useful today, in the case of platforms like nvptx which...

  1. Cannot have Relaxed atomic volatile operations, or which implement them in a needlessly inefficient way.
  2. Benefit from having sane volatile ops (in the nvptx case, that can be used when calling NVidia's synchronization intrinsics).

Sounds like a plan?

gnzlbg (Oct 14 2019 at 16:41, on Zulip):

Sounds good to me

gnzlbg (Oct 14 2019 at 16:42, on Zulip):

Regarding the API, I think you raised an issue about Self pointer types; aren't those supported on nightly behind arbitrary_self_types? If so, then the standard library can just use them.

gnzlbg (Oct 14 2019 at 16:43, on Zulip):

But I don't know how that feature works for raw pointers exactly

Hadrien Grasland (Oct 14 2019 at 16:43, on Zulip):

Can clients of the standard library using stable do so as well?

Hadrien Grasland (Oct 14 2019 at 16:44, on Zulip):

In any case, if Self pointer types are supported, I would expect all the basic cases to work, and brokenness to lie in corner cases like trait objects or slices ;)

Hadrien Grasland (Oct 14 2019 at 16:55, on Zulip):

Okay, I tested it.

Hadrien Grasland (Oct 14 2019 at 16:55, on Zulip):

It's clunkier than references because there isn't e.g. an automatic *mut -> *const conversion, but it actually works.

Hadrien Grasland (Oct 14 2019 at 16:58, on Zulip):

People may be anxious about de-facto stabilizing this feature by making it part of a public std API though.

gnzlbg (Oct 14 2019 at 17:35, on Zulip):

@Hadrien Grasland clients of the standard library can use standard library APIs that use this feature on stable

gnzlbg (Oct 14 2019 at 17:36, on Zulip):

they cannot use the feature themselves on stable Rust to write their own APIs

Hadrien Grasland (Oct 14 2019 at 18:15, on Zulip):

This much I know. What I'm afraid of is that someone could say that it is not acceptable to stabilize a public API that uses #![feature(arbitrary_self_types)], because then it is not possible to remove the arbitrary_self_types feature without breaking the public API of std, which amounts to making a compatibility-binding promise that at least the relevant subset of arbitrary_self_types will be eventually stabilized.

Hadrien Grasland (Oct 14 2019 at 18:19, on Zulip):

For example, at the time where the array traits implementation was switched to const generics, @centril made sure that this implementation was fully hidden from the public API for this reason. In that case, I don't think they feared that const generics would be eventually removed, but I understand that they were more anxious about the possibility of const generics semantics changing in a fundamental way that would alter the semantics of the public std API throughout the feature development process.

Hadrien Grasland (Oct 14 2019 at 18:21, on Zulip):

As far as I know, the "pointer receiver" subset of arbitrary_self_types is somewhat contentious at this point in time, and it's not 100% clear whether we want to eventually stabilize it or not. Therefore, there might be similar reservations about the prospect of exposing it in a public API.

RalfJ (Oct 14 2019 at 20:36, on Zulip):

LLVM provides volatile inline assembly blocks

I have no idea what that means

RalfJ (Oct 14 2019 at 20:37, on Zulip):

On paper, sure. In practice, I'm fine with opening an issue about that, and punting how to fix it when the first user provides an example of a miscompilation

I find it highly unlikely that an affected user will be able to identify that they are affected, let alone provide a clear miscompilation example. So this strategy IMO equals "we ignore the problem and put our head in the sand".
Which, admittedly, seems like a fair approach with LLVM :P

RalfJ (Oct 14 2019 at 20:40, on Zulip):

I think we've established that someone must do a PhD on them before they're accepted into the Rust memory model.

More like 5 PhDs should build up an operational semantics for Rust using atomics, and then another 5 PhDs should extend that with further useful orderings

I don't think the scope is quite that large^^ but yes, I personally think that to accept more things into our concurrency model I'd like to see at least 2 papers studying the same model and proving some results about it.
That said, I do realize those are high standards and I could totally understand if the lang team set the bar lower.

RalfJ (Oct 14 2019 at 20:41, on Zulip):

(also I didn't read all of your conversation, I figured the gist of it ends up on GH. too much text here.^^)

Hadrien Grasland (Oct 14 2019 at 20:41, on Zulip):

@RalfJ In GCC, marking an inline assembly block volatile means that the compiler can neither move it around in code nor remove it.

Hadrien Grasland (Oct 14 2019 at 20:42, on Zulip):

LLVM probably understands it similarly.

Hadrien Grasland (Oct 14 2019 at 20:43, on Zulip):

And yes, I did add the TL;DR to my github post.

Hadrien Grasland (Oct 14 2019 at 20:44, on Zulip):

Basically, we ended up agreeing that atomic volatile is fine for most archs, but there are actual archs like nvptx where regular loads and stores are not Relaxed and we may want to keep non-atomic volatile for the sake of those.

Hadrien Grasland (Oct 14 2019 at 20:51, on Zulip):

Also, here's a nice NVidia slide for you : https://rust-lang.zulipchat.com/user_uploads/4715/yoBU9sbUqpr5SJ7IPpkeiJn-/SAT.png

RalfJ (Oct 14 2019 at 20:56, on Zulip):

RalfJ In GCC, marking an inline assembly block volatile means that the compiler can neither move it around in code nor remove it.

wait so normally the compiler can remove it...?!?

RalfJ (Oct 14 2019 at 20:57, on Zulip):

and how is "you can't move it around" different from "it clobbers all the state of everything"?

Hadrien Grasland (Oct 14 2019 at 20:57, on Zulip):

IIRC, the compiler can indeed drop an asm block if it can prove that e.g. the clobbers are not observable.

Hadrien Grasland (Oct 14 2019 at 20:58, on Zulip):

Yeah, optimizers are a pain at times.

RalfJ (Oct 14 2019 at 20:58, on Zulip):

Also, here's a nice NVidia slide for you : https://rust-lang.zulipchat.com/user_uploads/4715/yoBU9sbUqpr5SJ7IPpkeiJn-/SAT.png

yeah seen that :D
not big into SAT solvers / model checkers myself. they work great for whole-program stuff but I'm more interested in compositional methods. but that's just saying my research focus is different, those are still great tools!

RalfJ (Oct 14 2019 at 20:58, on Zulip):

IIRC, the compiler can indeed drop an asm block if it can prove that e.g. the clobbers are not observable.

lol

centril (Oct 15 2019 at 07:57, on Zulip):

That said, I do realize those are high standards and I could totally understand if the lang team set the bar lower.

@RalfJ If you think that's where the bar should be set then I will set the bar there :slight_smile:

gnzlbg (Oct 15 2019 at 09:31, on Zulip):

I have no idea what that means

It means that no operations are re-ordered around the inline assembly block IIUC, and that the block is also assumed to have side-effects

gnzlbg (Oct 15 2019 at 09:32, on Zulip):

wait so normally the compiler can remove it...?!?

That's how I understand it. Rust makes all inline assembly blocks "volatile", but LLVM doesn't require that

gnzlbg (Oct 15 2019 at 09:33, on Zulip):

So if your inline assembly block would be "const" (like a const fn in the Rust sense), then LLVM could execute it once, cache the result, and replace other executions with the result

gnzlbg (Oct 15 2019 at 09:35, on Zulip):

or if you write an assembly block that has no inputs, no outputs, and no effects, LLVM could just remove it
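
As a sketch of the distinction, using the asm! macro that was stabilized after this discussion (the instructions are illustrative and x86_64-only): by default the block is assumed to have side effects, while options(pure, nomem) opts in to the "const"-like treatment described above:

```rust
use core::arch::asm;

unsafe fn demo() -> u64 {
    // Default: the block carries the LLVM `sideeffect` flag ("volatile"
    // in GCC terms), so the compiler must keep it and must not
    // duplicate it or reorder it across other side effects.
    asm!("nop");

    // options(pure, nomem): the block is declared to be a pure function
    // of its inputs, so the compiler may hoist it, run it once and
    // cache the result, or delete it if the output goes unused.
    let x: u64;
    asm!("mov {0}, 42", out(reg) x, options(pure, nomem));
    x
}
```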

rkruppe (Oct 15 2019 at 09:59, on Zulip):

"volatile inline assembly" is a GCC-ism and a misleading name IMO. LLVM IR calls the flag sideeffect, a much more indicative name: the asm is doing ~something~ besides what's visible from the output specifiers, and of course correct optimizations must not duplicate or drop that side effect in any execution or reorder it with respect to other side effects. But unlike the volatile term it can't be interpreted as meaning "don't duplicate this snippet of assembly in the --emit asm output" (which is something some people want when e.g. inserting labels into the assembly for instrumentation or post-processing).

Lokathor (Oct 15 2019 at 16:52, on Zulip):

(deleted)

Lokathor (Oct 15 2019 at 16:54, on Zulip):

So my question is, and I think I've asked this before and gotten a vague answer: if a platform has no atomic instructions, how do atomic types work? Like old ARM has no atomics, can the compiler just fake it and use normal load/store as relaxed or something like that?

Hadrien Grasland (Oct 15 2019 at 17:35, on Zulip):

This PR by @Amanieu suggests that it can work indeed: https://github.com/rust-lang/rust/pull/65214

Hadrien Grasland (Oct 15 2019 at 17:36, on Zulip):

Conceptually speaking, there is no reason why it shouldn't, since Relaxed atomic load/store basically translates into "global cache coherence on regular loads and stores" in hardware, and most CPUs actually guarantee this.

Hadrien Grasland (Oct 15 2019 at 17:36, on Zulip):

(though as @gnzlbg found out, GPUs don't)

Amanieu (Oct 15 2019 at 17:36, on Zulip):

@Lokathor The Atomic* types are still available but they only provide load and store, not any of the other atomic operations.
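
A sketch of what this looks like in code; note that the target_has_atomic cfg shown here was still unstable at the time of this discussion:

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

static COUNTER: AtomicUsize = AtomicUsize::new(0);

// Load and store are available even on targets with no atomic
// read-modify-write instructions (e.g. thumbv6m): an aligned load or
// store is already single-copy atomic on such a machine.
fn peek() -> usize {
    COUNTER.load(Ordering::Relaxed)
}

fn set(v: usize) {
    COUNTER.store(v, Ordering::Relaxed);
}

// RMW operations like fetch_add only exist when the target supports
// them, which code can gate on as follows.
#[cfg(target_has_atomic = "ptr")]
fn bump() -> usize {
    COUNTER.fetch_add(1, Ordering::Relaxed)
}
```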

Lokathor (Oct 15 2019 at 17:39, on Zulip):

oh well that's fine then

Hadrien Grasland (Oct 15 2019 at 17:39, on Zulip):

@Amanieu Out of curiosity, do we ever allow ourselves to not support all atomic orderings?

Hadrien Grasland (Oct 15 2019 at 17:40, on Zulip):

That could be convenient if we ever allowed ourselves to port to platforms without global cache coherence.

Amanieu (Oct 15 2019 at 17:40, on Zulip):

No, all orderings must be supported.

Hadrien Grasland (Oct 15 2019 at 17:41, on Zulip):

OK, then if I understand the nvptx weirdness correctly, we'll need to do something similar to what NVidia did in C++ and provide hardware-specific "scoped atomics" which only synchronize at a certain scale.

Hadrien Grasland (Oct 15 2019 at 17:41, on Zulip):

If we ever want atomics on nvptx, that is.

Amanieu (Oct 15 2019 at 17:41, on Zulip):

In practice this means that the architecture must support a simple memory barrier instruction (SeqCst fence) or be so simple (no OoO, caches, etc) that only a compiler fence is needed.
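
In Rust terms these two cases map roughly onto fence vs compiler_fence; a minimal sketch:

```rust
use core::sync::atomic::{compiler_fence, fence, Ordering};

fn barriers() {
    // On an out-of-order, multi-core target this lowers to a real
    // hardware barrier instruction (e.g. `dmb` on ARMv7).
    fence(Ordering::SeqCst);

    // On a simple in-order, single-core target it is enough to stop
    // the compiler from reordering; this emits no instruction at all.
    compiler_fence(Ordering::SeqCst);
}
```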

Amanieu (Oct 15 2019 at 17:42, on Zulip):

I'll need to look into nvptx, but yea, this sounds like the kind of thing that you would put into std::arch

Hadrien Grasland (Oct 15 2019 at 17:43, on Zulip):

To save you some search trouble, it seems that the relevant NVidia atomic header is not yet part of the published version of the CUDA toolkit, and for now only exists as fragments of slides from a C++ conference talk.

Hadrien Grasland (Oct 15 2019 at 17:44, on Zulip):

I'm eagerly waiting for the matching release of NVidia tooling and docs to finally get a proper understanding of what GPU memory models actually look like. Seems they are quite a bit weirder than CPU memory models.

Amanieu (Oct 15 2019 at 17:45, on Zulip):

I think it's simply a matter of scoping. Normal atomics work across the entire address space, while nvptx has faster atomics that only work within the current thread group or something similar.

Hadrien Grasland (Oct 15 2019 at 17:46, on Zulip):

This is what I understood, but I would like to cross-check my understanding.

Lokathor (Oct 15 2019 at 17:46, on Zulip):

@Amanieu ARM (at least the early versions) would fall under the "so simple" category

Amanieu (Oct 15 2019 at 17:46, on Zulip):

Yes, anything pre-ARMv6 is basically single-core

Amanieu (Oct 15 2019 at 17:47, on Zulip):

and even then, I think ARMv5/ARMv4 have memory barrier instructions, through CP15

Lokathor (Oct 15 2019 at 17:48, on Zulip):

I don't recall seeing a memory barrier in ARMv4T, but I could have missed something

gnzlbg (Oct 15 2019 at 18:15, on Zulip):

I think it's simply a matter of scoping. Normal atomics work across the entire address space, while nvptx has faster atomics that only work within the current thread group or something similar.

Well, this is a good way to see it I guess.

gnzlbg (Oct 15 2019 at 18:15, on Zulip):

The main issue is that normal atomics on such a target are so slow that nobody would use them.

Hadrien Grasland (Oct 15 2019 at 18:43, on Zulip):

I think this schematic from AMD's GCN presentations best sums up how fsked up GPU cache hierarchies are: https://mynameismjp.files.wordpress.com/2018/01/amd_caches.png

Hadrien Grasland (Oct 15 2019 at 18:43, on Zulip):

(schematic also gives an overly simple view as texture L1 cache is also split per compute unit)

Hadrien Grasland (Oct 15 2019 at 18:44, on Zulip):

(while other data and icache may be shared between multiple CUs... yeah, I'm really curious about how the NVidia folks managed to formalize that kind of craziness)

nagisa (Nov 06 2019 at 02:57, on Zulip):

https://github.com/asajeffrey/shared-data

oh wow I entirely missed this link, would definitely have linked you to our crates earlier
