Stream: t-lang/wg-unsafe-code-guidelines

Topic: Uninit data and DMA


Thales Fragoso (Feb 28 2020 at 20:48, on Zulip):

Is there a way to create a safe abstraction over DMA transactions without UB? Let's say I have a [MaybeUninit<u8>; 1024] and tell the DMA to fill it, and I want to turn it into a &[u8] after it's done. Would it be UB to use slice::from_raw_parts?

Thales Fragoso (Feb 28 2020 at 20:48, on Zulip):

How would I tell the compiler that the thing is in fact initialized, since it has no idea of a DMA peripheral ?
Sorry if this isn't the right place for this

Amanieu (Feb 28 2020 at 22:33, on Zulip):

So first of all you would need a fence or memory barrier to ensure the DMA is properly completed as seen by the CPU and compiler. After that sure, slice::from_raw_parts is fine. MaybeUninit<u8> is guaranteed to have the same layout as u8.
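
A minimal sketch of the sequence described in this message, assuming a host-side stand-in for the DMA engine (`fake_dma_fill` and `receive` are hypothetical names, not from any crate). On real hardware the transfer would be started and polled through volatile MMIO accesses, and the target may also need a hardware barrier in addition to the compiler fence.

```rust
use core::mem::MaybeUninit;
use core::slice;
use core::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical stand-in for the DMA engine: something other than the
/// surrounding Rust code writes `len` bytes starting at `dst`.
///
/// # Safety
/// `dst` must be valid for `len` byte writes.
unsafe fn fake_dma_fill(dst: *mut u8, len: usize) {
    for i in 0..len {
        // SAFETY: guaranteed by the caller.
        unsafe { dst.add(i).write_volatile(i as u8) };
    }
}

/// Hands the buffer to the "DMA" and builds the `&[u8]` only after the
/// transfer is complete and the fence has been issued.
fn receive(buf: &mut [MaybeUninit<u8>]) -> &[u8] {
    let len = buf.len();
    // `MaybeUninit<u8>` has the same layout as `u8`, so the cast is fine.
    let ptr = buf.as_mut_ptr() as *mut u8;

    // Real code: start the transfer, then poll the completion flag.
    // SAFETY: `ptr` is valid for `len` byte writes.
    unsafe { fake_dma_fill(ptr, len) };

    // Reads of the buffer must not be moved before this point.
    compiler_fence(Ordering::Acquire);

    // SAFETY: all `len` bytes were written above, and no `&[u8]` to the
    // buffer existed while the transfer was in flight.
    unsafe { slice::from_raw_parts(ptr, len) }
}
```

The returned slice borrows from `buf`, so the buffer cannot be handed to another transfer while the `&[u8]` is still alive.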

ecstatic-morse (Feb 28 2020 at 23:00, on Zulip):

@Thales Fragoso You need to use ptr::read_volatile and ptr::write_volatile to access the DMA buffer. I would avoid creating a &[u8], since it's too easy to do a non-volatile read by mistake. Instead, you should define your own slice equivalent that uses volatile reads/writes for all operations.
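
One way the "own slice equivalent" suggested here could look, as a rough sketch only. `VolatileBuf` is a hypothetical type invented for illustration; it never hands out a `&[u8]`, so every access goes through `read_volatile` or `write_volatile`.

```rust
use core::cell::UnsafeCell;
use core::mem::MaybeUninit;

/// Hypothetical DMA buffer wrapper that only allows volatile access.
pub struct VolatileBuf<const N: usize> {
    bytes: UnsafeCell<[MaybeUninit<u8>; N]>,
}

impl<const N: usize> VolatileBuf<N> {
    /// Creates the buffer without initializing its contents.
    pub const fn new() -> Self {
        Self { bytes: UnsafeCell::new([MaybeUninit::uninit(); N]) }
    }

    /// Raw pointer to the first byte, e.g. to program into a DMA register.
    pub fn as_mut_ptr(&self) -> *mut u8 {
        self.bytes.get() as *mut u8
    }

    /// Volatile write of one byte; panics if `index` is out of bounds.
    pub fn write(&self, index: usize, value: u8) {
        assert!(index < N);
        // SAFETY: `index` is in bounds and the storage lives in `self`.
        unsafe { self.as_mut_ptr().add(index).write_volatile(value) }
    }

    /// Volatile read of one byte; panics if `index` is out of bounds.
    ///
    /// # Safety
    /// The byte at `index` must have been written, by the CPU or by the
    /// device, before this call.
    pub unsafe fn read(&self, index: usize) -> u8 {
        assert!(index < N);
        // SAFETY: in bounds; initialization is guaranteed by the caller.
        unsafe { self.as_mut_ptr().add(index).read_volatile() }
    }
}
```

The `UnsafeCell` makes the type `!Sync` by default, so sharing it with an interrupt handler would need an extra wrapper that this sketch leaves out.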

Thales Fragoso (Feb 28 2020 at 23:03, on Zulip):

@ecstatic-morse Why would I need volatile operations after the transaction has been completed ?

ecstatic-morse (Feb 28 2020 at 23:06, on Zulip):

Because optimizing compilers are free to omit and/or reorder reads and writes if the changes cannot be detected on the abstract machine.

ecstatic-morse (Feb 28 2020 at 23:07, on Zulip):

And the abstract machine has no idea that a DMA transaction is occurring behind the scenes.

Lokathor (Feb 28 2020 at 23:08, on Zulip):

You don't need volatile once the action is complete. However, you also must not make the &[T] before the action occurs of course.

Thales Fragoso (Feb 28 2020 at 23:08, on Zulip):

Yes, the compiler fence thing, I know

Lokathor (Feb 28 2020 at 23:10, on Zulip):

I would suggest skipping the MaybeUninit part, if you can

Lokathor (Feb 28 2020 at 23:10, on Zulip):

Just have a zeroed buffer

Thales Fragoso (Feb 28 2020 at 23:11, on Zulip):

That wouldn't make much sense, if I have to zero it I might just as well copy the thing myself

Thales Fragoso (Feb 28 2020 at 23:11, on Zulip):

@Lokathor is this UB ?
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9139b07bc1015bfeb286e881602e98ea
If it is, how is that different from the DMA case ?

Thales Fragoso (Feb 28 2020 at 23:21, on Zulip):

@ecstatic-morse the write volatile makes sense, in the sense of only working with raw pointers

ecstatic-morse (Feb 28 2020 at 23:24, on Zulip):

@Lokathor the OP still needs volatile operations. If they were to check the DMA flag in a loop, then emit an mfence or whatever, and only then create a slice pointing to memory that is uninitialized or zeroed or whatever, the compiler is still free to const-propagate across the memory barrier.

Lokathor (Feb 28 2020 at 23:26, on Zulip):

@Thales Fragoso well if miri says it's fine i guess it's fine. But it sure looks to me like that code makes a reference into uninit memory

ecstatic-morse (Feb 28 2020 at 23:27, on Zulip):

@Lokathor @Thales Fragoso miri is not normative. miri has false negatives ATM.

Thales Fragoso (Feb 28 2020 at 23:27, on Zulip):

@Lokathor It sure does, but it will be the same thing with DMA, i.e. the physical memory would actually be initialized, but the compiler would have no idea about that

ecstatic-morse (Feb 28 2020 at 23:28, on Zulip):

and that code is UB

ecstatic-morse (Feb 28 2020 at 23:29, on Zulip):

although I don't think we've committed to whether mem::uninitialized::<u8>() will always be UB

Thales Fragoso (Feb 28 2020 at 23:29, on Zulip):

Yes, I thought so, I will wrap the buffer and only allow access through volatile operations and raw pointers

Thales Fragoso (Feb 28 2020 at 23:30, on Zulip):

Then it would be okay, right ?

ecstatic-morse (Feb 28 2020 at 23:32, on Zulip):

@Thales Fragoso Yes. I might ask the embedded rust discord if there are preexisting libraries for wrapping a DMA buffer.

Lokathor (Feb 28 2020 at 23:32, on Zulip):

Hmm, this form of DMA is nothing like the operation I was expecting, honestly.

Thales Fragoso (Feb 28 2020 at 23:33, on Zulip):

@ecstatic-morse heh, I came from there, there isn't any as far as I know dealing with uninitialized data

Thales Fragoso (Feb 28 2020 at 23:33, on Zulip):

There is actually one PR that does the &[u8] thing, that's why I was questioning myself about this

Thales Fragoso (Feb 28 2020 at 23:33, on Zulip):

I will comment on the thread about this

Lokathor (Feb 28 2020 at 23:34, on Zulip):

How are you calling the DMA anyway?

Thales Fragoso (Feb 28 2020 at 23:34, on Zulip):

Memory Mapped IO

Thales Fragoso (Feb 28 2020 at 23:35, on Zulip):

Just write to a bit on a specific memory location

Thales Fragoso (Feb 28 2020 at 23:35, on Zulip):

And you can check another bit to see if it's done, or activate an interrupt

Jonas Schievink (Feb 28 2020 at 23:38, on Zulip):

none of the existing DMA code I've seen in the wild uses volatile

Thales Fragoso (Feb 28 2020 at 23:38, on Zulip):

Isn't there a freeze-type method to tell the compiler that I don't care that it's just random bits, just read it and don't throw it away?

Jonas Schievink (Feb 28 2020 at 23:39, on Zulip):

I believe LLVM has something like that, but Rust doesn't expose it atm

Thales Fragoso (Feb 28 2020 at 23:39, on Zulip):

@Jonas Schievink this is true, I wonder if we are swinging in UB
Edit: regarding the volatile

Lokathor (Feb 28 2020 at 23:39, on Zulip):

LLVM just barely added it

Lokathor (Feb 28 2020 at 23:40, on Zulip):

our version of LLVM doesn't have it i think

ecstatic-morse (Feb 28 2020 at 23:41, on Zulip):

@Jonas Schievink what do they use? Maybe my reading of compiler_fence is too conservative? Does it forbid omitting reads as well as reordering like asm volatile("" ::: "memory"); would in C?

Lokathor (Feb 28 2020 at 23:41, on Zulip):

@Thales Fragoso so you write a pointer to X register, write a value to Y register to copy N bytes, and then read Z until it's done?

Thales Fragoso (Feb 28 2020 at 23:42, on Zulip):

Yep

Thales Fragoso (Feb 28 2020 at 23:42, on Zulip):

Pretty much

Thales Fragoso (Feb 28 2020 at 23:43, on Zulip):

All volatile operations on MMIO, of course

Jonas Schievink (Feb 28 2020 at 23:44, on Zulip):

@ecstatic-morse DMA code usually just uses a compiler fence and maybe an UnsafeCell around the buffer or its elements

Lokathor (Feb 28 2020 at 23:44, on Zulip):

Yeah okay that's basically what I expected now that you put it that way

Thales Fragoso (Feb 28 2020 at 23:45, on Zulip):

@Jonas Schievink which implementation uses UnsafeCell ?

Jonas Schievink (Feb 28 2020 at 23:45, on Zulip):

@Thales Fragoso The one in stm32-usbd uses VolatileCell https://github.com/stm32-rs/stm32-usbd/blob/master/src/endpoint_memory.rs

Lokathor (Feb 28 2020 at 23:45, on Zulip):

Then accessing the buffer shouldn't need volatile

Lokathor (Feb 28 2020 at 23:46, on Zulip):

just the fence after the DMA completes, and then it's all normal memory after that

Jonas Schievink (Feb 28 2020 at 23:46, on Zulip):

That specific one should be fine I guess

Jonas Schievink (Feb 28 2020 at 23:47, on Zulip):

The &'static mut worries me a little, does that still allow an UnsafeCell to change behind the compiler's back?

Thales Fragoso (Feb 28 2020 at 23:47, on Zulip):

@Lokathor what if the compiler decides not to read the thing after the compiler fence since it thinks it's all zeros?

Thales Fragoso (Feb 28 2020 at 23:47, on Zulip):

Does compiler fence prevent that ?

Jonas Schievink (Feb 28 2020 at 23:49, on Zulip):

I'd really expect a function that's documented to "restrict the kinds of memory re-ordering the compiler is allowed to do" to also affect const prop though

Thales Fragoso (Feb 28 2020 at 23:49, on Zulip):

@Jonas Schievink Oh, yes the USB one, it's not really using the DMA peripheral but it's the same principle

Jonas Schievink (Feb 28 2020 at 23:49, on Zulip):

Doesn't the USB peripheral have built-in DMA or something?

Jonas Schievink (Feb 28 2020 at 23:50, on Zulip):

(Currently bringing up USB on the nRF52840, so all of this is pretty relevant)

Thales Fragoso (Feb 28 2020 at 23:50, on Zulip):

In the stm case it has an arbiter which moderates the accesses of the USB peripheral and the core

Thales Fragoso (Feb 28 2020 at 23:51, on Zulip):

The USB memory is in another place, reserved to it

Jonas Schievink (Feb 28 2020 at 23:51, on Zulip):

Right, but mapped to the CPU bus. So effectively this is just DMA with a slightly weirder memory setup.

Thales Fragoso (Feb 28 2020 at 23:52, on Zulip):

Effectively, yes, the USB peripheral does the writing to the memory

Thales Fragoso (Feb 28 2020 at 23:54, on Zulip):

@Jonas Schievink all of this is because I'm trying to do USB with uninit buffers, but the end_point trait expects a &mut [u8] to write to it

Thales Fragoso (Feb 28 2020 at 23:55, on Zulip):

And I was worrying about creating a slice to a uninit thing, quite a show stopper

Thales Fragoso (Feb 28 2020 at 23:56, on Zulip):

Just wished uninitialized u8 was fine

ecstatic-morse (Feb 28 2020 at 23:57, on Zulip):

@Thales Fragoso Since you've already been to the discord channel, you've probably already read the book. They seem to be very careful to do only volatile writes in addition to the memory barriers.

Thales Fragoso (Feb 28 2020 at 23:59, on Zulip):

I read it some time ago, but don't they just hand the &[u8] back to the user afterwards ?

Thales Fragoso (Feb 28 2020 at 23:59, on Zulip):

Then the user is free to read/write to it the way they want

Lokathor (Feb 29 2020 at 00:00, on Zulip):

@Thales Fragoso you escaped a pointer to a buffer to the wild world and then did a memory fence, it can't assume the buffer is still anything

ecstatic-morse (Feb 29 2020 at 00:00, on Zulip):

Ah, I thought they were copying it out.

Thales Fragoso (Feb 29 2020 at 00:01, on Zulip):

Copy is no good, heh, that's why they are using DMA in the first place

Thales Fragoso (Feb 29 2020 at 00:02, on Zulip):

@Lokathor that seems plausible, apart from the outside world part, to rust it's just another address on memory

ecstatic-morse (Feb 29 2020 at 00:02, on Zulip):

Well if compiler_fence also declares "arbitrary-side effects may have occurred" then it's fine to just create a &[u8]. I've not read anything that explicitly states this, but it seems like it's maybe implicit/obvious?

ecstatic-morse (Feb 29 2020 at 00:02, on Zulip):

Not to me obviously.

Jonas Schievink (Feb 29 2020 at 00:02, on Zulip):

There has to be some way we can give a &[u8] back to the user though, otherwise DMA would become a total nuisance in all APIs

Thales Fragoso (Feb 29 2020 at 00:03, on Zulip):

@ecstatic-morse I really don't like "implicit" things

ecstatic-morse (Feb 29 2020 at 00:03, on Zulip):

Nor do I.

Lokathor (Feb 29 2020 at 00:04, on Zulip):

@Thales Fragoso volatile means "special spooky actions happen here, you must do it exactly as often as i say"

Thales Fragoso (Feb 29 2020 at 00:05, on Zulip):

That's the thing, if we give &[u8] back to the user they sure won't be using volatile_read

Lokathor (Feb 29 2020 at 00:06, on Zulip):

Volatile is still greatly restricted in what those special things can do, it isn't total chaos, but it's allowed to do side effects the compiler doesn't see

Lokathor (Feb 29 2020 at 00:06, on Zulip):

that's the whole point of the attribute

Amanieu (Feb 29 2020 at 00:06, on Zulip):

You can't have a &[u8] because the compiler assumes no aliasing. This is false since DMA has a reference to the buffer.

Amanieu (Feb 29 2020 at 00:07, on Zulip):

Basically you need to construct the slice after the DMA has completed and you have issued your fence.

Jonas Schievink (Feb 29 2020 at 00:07, on Zulip):

The &[u8] would of course only exist when DMA is not running

Amanieu (Feb 29 2020 at 00:07, on Zulip):

Then no problem.

Jonas Schievink (Feb 29 2020 at 00:07, on Zulip):

Yeah, that's what I'm thinking

Thales Fragoso (Feb 29 2020 at 00:07, on Zulip):

But it's not constructed, it's given back

Jonas Schievink (Feb 29 2020 at 00:08, on Zulip):

@ecstatic-morse FWIW the docs of compiler_fence pretty clearly say "with Acquire, subsequent reads and writes cannot be moved ahead of preceding reads" and that's essentially the fundamental guarantee we need

Thales Fragoso (Feb 29 2020 at 00:08, on Zulip):

But it doesn't say it can't be omitted

Jonas Schievink (Feb 29 2020 at 00:09, on Zulip):

Ugh, is that really going to be an issue?

Thales Fragoso (Feb 29 2020 at 00:10, on Zulip):

Let's say we have a zeroed buffer, the compiler knows it's zeroed, then we DMA into the buffer, and after the whole thing is complete we ask it for buffer[0]

Thales Fragoso (Feb 29 2020 at 00:10, on Zulip):

The compiler is sure that this would return zero, why would it do the read?

Jonas Schievink (Feb 29 2020 at 00:10, on Zulip):

Yeah...

Jonas Schievink (Feb 29 2020 at 00:11, on Zulip):

I meant to be joking when I said "another week another soundness issue" last week, but it looks like this is really going to continue indefinitely

Thales Fragoso (Feb 29 2020 at 00:13, on Zulip):

We should create a DmaSlice type and spread it over the ecosystem

Thales Fragoso (Feb 29 2020 at 00:13, on Zulip):

Is there a volatile_copy_nonoverlapping operation ? Heh

ecstatic-morse (Feb 29 2020 at 00:14, on Zulip):

@Amanieu This is the relevant section of the LLVM reference

Jonas Schievink (Feb 29 2020 at 00:14, on Zulip):

This will probably poison all APIs that transitively call DMA functions, and greatly affect performance too

Jonas Schievink (Feb 29 2020 at 00:15, on Zulip):

Well, unless it's read-only DMA

ecstatic-morse (Feb 29 2020 at 00:15, on Zulip):

Given that definition, R_byte is defined as follows:

If R is volatile, the result is target-dependent. (Volatile is supposed to give guarantees which can support sig_atomic_t in C/C++, and may be used for accesses to addresses that do not behave like normal memory. It does not generally provide cross-thread synchronization.)

Otherwise, if there is no write to the same byte that happens before R_byte, R_byte returns undef for that byte.

ecstatic-morse (Feb 29 2020 at 00:16, on Zulip):

where R is defined as:

Every (defined) read operation (load instructions, memcpy, atomic loads/read-modify-writes, etc.) R reads a series of bytes written by (defined) write operations (store instructions, atomic stores/read-modify-writes, memcpy, etc.). For the purposes of this section, initialized globals are considered to have a write of the initializer which is atomic and happens before any other read or write of the memory in question. For each byte of a read R, R_byte may see any write to the same byte, except:

Lokathor (Feb 29 2020 at 00:17, on Zulip):

Just have old and crappy DMA that doesn't run as a co-processing unit, like me ;3

Amanieu (Feb 29 2020 at 00:20, on Zulip):

Sure, you need a volatile write to initiate DMA and to read the DMA-complete flag. But apart from that you just need to insert a fence to establish that the write by the DMA engine happens before you reading the result of the DMA.

Amanieu (Feb 29 2020 at 00:21, on Zulip):

You can think of the DMA engine as a separate thread. It uses a store-release on the DMA-complete flag and you use a load-acquire to read it. Then you are free to read the DMA results normally
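
The analogy can be sketched with an ordinary thread standing in for the DMA engine. This is only an illustration of the release/acquire pairing; `Shared` and `transfer` are hypothetical names, and a real DMA engine is of course not a Rust thread.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Buffer plus a "transfer complete" flag, shared with the fake engine.
struct Shared {
    buf: UnsafeCell<MaybeUninit<[u8; 4]>>,
    done: AtomicBool,
}

// SAFETY: `buf` is written only before `done` is set and read only after
// `done` has been observed, so the accesses never overlap.
unsafe impl Sync for Shared {}

fn transfer() -> [u8; 4] {
    let shared = Arc::new(Shared {
        buf: UnsafeCell::new(MaybeUninit::uninit()),
        done: AtomicBool::new(false),
    });

    // The "DMA engine": fills the buffer, then store-releases the flag.
    let engine = Arc::clone(&shared);
    let handle = thread::spawn(move || {
        // SAFETY: nobody reads `buf` until `done` is set.
        unsafe { (engine.buf.get() as *mut [u8; 4]).write([1, 2, 3, 4]) };
        engine.done.store(true, Ordering::Release);
    });

    // The CPU side: load-acquire the flag, then read the data normally.
    while !shared.done.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    // SAFETY: the acquire load saw `true`, so the write above happens
    // before this read and the buffer is initialized.
    let out = unsafe { (shared.buf.get() as *const [u8; 4]).read() };

    handle.join().unwrap();
    out
}
```

In the DMA case the flag is a status register read with `read_volatile` rather than an `AtomicBool`, which is where the explicit fence comes in.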

Jonas Schievink (Feb 29 2020 at 00:22, on Zulip):

Okay, sounds like I'll punt on rewriting half the ecosystem until the UCG WG agrees on this :)

Thales Fragoso (Feb 29 2020 at 00:23, on Zulip):

@Amanieu Won't the compiler be allowed to omit a read to the dma Buffer if it thinks it didn't change ?

Lokathor (Feb 29 2020 at 00:23, on Zulip):

Well Amanieu and I have made basically the same case here

ecstatic-morse (Feb 29 2020 at 00:23, on Zulip):

@Amanieu but where's the "write" to memory backing the DMA buffer? Won't any non-volatile read be undef since there's no observable write within that thread?

Amanieu (Feb 29 2020 at 00:24, on Zulip):

@ecstatic-morse The write is done by the DMA engine. You have to stretch the definition of a parallel thread of execution to include the DMA engine.

Amanieu (Feb 29 2020 at 00:25, on Zulip):

I'll put that under "there are platform-specific ways to create them, and we define LLVM IR’s behavior in their presence"

Thales Fragoso (Feb 29 2020 at 00:25, on Zulip):

That's what I think too, there's no write anywhere in any part of the Rust abstract machine

Jonas Schievink (Feb 29 2020 at 00:26, on Zulip):

It does sound like the compiler would have to prove absence of such writes before it can const prop though, which is not really possible once the buffer address has been written to a register with a volatile_write, right?

Thales Fragoso (Feb 29 2020 at 00:27, on Zulip):

Does it even know what a DMA is ?

Amanieu (Feb 29 2020 at 00:27, on Zulip):

It doesn't need to, it is just treated as FFI.

Thales Fragoso (Feb 29 2020 at 00:29, on Zulip):

Does it really ?

Amanieu (Feb 29 2020 at 00:29, on Zulip):

Yes.

Jonas Schievink (Feb 29 2020 at 00:29, on Zulip):

FWIW we also model interrupt handlers as threads, and that model seems to fit them perfectly

Thales Fragoso (Feb 29 2020 at 00:30, on Zulip):

@Amanieu So if I write the address of a &[MaybeUninit<u8>] to a random location with volatile_write, can I take a &[u8] to it afterwards without UB?

Thales Fragoso (Feb 29 2020 at 00:32, on Zulip):

If it treats it like a FFI it won't assume that isn't uninit and throw it away, right ?

Amanieu (Feb 29 2020 at 00:32, on Zulip):

You need a fence. Otherwise the compiler will assume the contents haven't changed and are still uninitialized.

Thales Fragoso (Feb 29 2020 at 00:32, on Zulip):

Yeah yeah, with a fence too

Thales Fragoso (Feb 29 2020 at 00:33, on Zulip):

Then, would it be okay (no UB) even with no DMA present?

Amanieu (Feb 29 2020 at 00:34, on Zulip):

Well technically you still need someone at the other end to actually initialize the memory for it to be valid.

Thales Fragoso (Feb 29 2020 at 00:34, on Zulip):

Everything is valid for an u8 no ?

ecstatic-morse (Feb 29 2020 at 00:34, on Zulip):

@Amanieu To clarify, you're saying that after a fence instruction, the compiler must assume that all memory may have changed arbitrarily and thus is no longer undef?

ecstatic-morse (Feb 29 2020 at 00:34, on Zulip):

Or just heap memory?

Amanieu (Feb 29 2020 at 00:35, on Zulip):

Anything whose address may be visible to other threads of execution.

ecstatic-morse (Feb 29 2020 at 00:35, on Zulip):

Or just memory whose address is published?

Thales Fragoso (Feb 29 2020 at 00:35, on Zulip):

@ecstatic-morse I think he is also saying that the address would have to be written somewhere with a volatile to emulate a ffi

Amanieu (Feb 29 2020 at 00:35, on Zulip):

Yes, that.

ecstatic-morse (Feb 29 2020 at 00:36, on Zulip):

Ah, okay. How do I know when an address is published?

Amanieu (Feb 29 2020 at 00:36, on Zulip):

(I'd love to continue this discussion but it's getting late)

Jonas Schievink (Feb 29 2020 at 00:36, on Zulip):

some sort of escape analysis perhaps?

Thales Fragoso (Feb 29 2020 at 00:37, on Zulip):

That would fix my initial problem at least...

ecstatic-morse (Feb 29 2020 at 00:37, on Zulip):

@Amanieu np, I'll check the LLVM docs

ecstatic-morse (Feb 29 2020 at 00:54, on Zulip):

@Thales Fragoso So the term I was looking for is "pointer capture". When you take the address of your DMA buffer and write it to the MMIO register used to initiate DMA transfer, that pointer becomes captured and the buffer escapes the current thread of execution. Since you have a single-threaded fence after you write the address, the compiler cannot reorder any subsequent reads of the buffer before the point at which the pointer was captured, and it can no longer assume that their value (whether 0 or undef) is known.
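
A sketch of the register sequence being discussed, under loud assumptions: `DmaRegs`, `CTRL_START`, `STATUS_DONE`, `start_transfer` and `wait_done` are all hypothetical, and a real register layout comes from the device datasheet. Whether the pointer-capture argument is enough is exactly what the thread goes on to debate, so this shows the shape of the code rather than a soundness proof.

```rust
use core::ptr::{addr_of, addr_of_mut, read_volatile, write_volatile};
use core::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical DMA register block.
#[repr(C)]
pub struct DmaRegs {
    pub addr: usize,
    pub len: usize,
    pub ctrl: u32,
    pub status: u32,
}

pub const CTRL_START: u32 = 1;
pub const STATUS_DONE: u32 = 1;

/// Publishes the buffer address to the device and starts the transfer.
///
/// # Safety
/// `regs` must point to the register block, and `buf` must stay valid for
/// `len` byte writes until the transfer has completed.
pub unsafe fn start_transfer(regs: *mut DmaRegs, buf: *mut u8, len: usize) {
    unsafe {
        // The volatile write of the address is where the pointer escapes.
        write_volatile(addr_of_mut!((*regs).addr), buf as usize);
        write_volatile(addr_of_mut!((*regs).len), len);
        // Earlier accesses to the buffer must not sink below the start bit.
        compiler_fence(Ordering::Release);
        write_volatile(addr_of_mut!((*regs).ctrl), CTRL_START);
    }
}

/// Polls the completion flag, then fences before the buffer is read.
///
/// # Safety
/// `regs` must point to the register block.
pub unsafe fn wait_done(regs: *const DmaRegs) {
    unsafe {
        while (read_volatile(addr_of!((*regs).status)) & STATUS_DONE) == 0 {
            core::hint::spin_loop();
        }
    }
    // Later reads of the buffer must not be hoisted above the flag read.
    compiler_fence(Ordering::Acquire);
}
```

Only after `wait_done` returns would the caller build a `&[u8]` over the buffer.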

Lokathor (Feb 29 2020 at 00:54, on Zulip):

@ecstatic-morse when you volatile write the address to a location outside of any location LLVM is using, you've published it

Lokathor (Feb 29 2020 at 00:55, on Zulip):

oh Zulip didn't update at first, you already got it

ecstatic-morse (Feb 29 2020 at 01:00, on Zulip):

Thanks all! I learned things today.

Thales Fragoso (Feb 29 2020 at 01:01, on Zulip):

@Lokathor what random memory can I write to emulate this ? Would it have to be out of ram ?

ecstatic-morse (Feb 29 2020 at 01:05, on Zulip):

@Thales Fragoso Writing it to any non-stack address you didn't get from malloc should cause the pointer to be marked as captured. Capture tracking is itself not precise, so it's not possible to say precisely what locations a pointer can be written to that don't cause it to become captured.

ecstatic-morse (Feb 29 2020 at 01:05, on Zulip):

https://llvm.org/doxygen/CaptureTracking_8cpp_source.html

Lokathor (Feb 29 2020 at 01:05, on Zulip):

@Thales Fragoso You can't just write to random memory XD that's also not allowed

ecstatic-morse (Feb 29 2020 at 01:06, on Zulip):

(is what I'm reading ATM)

Thales Fragoso (Feb 29 2020 at 01:08, on Zulip):

@Lokathor well, not random, but precisely chosen to not cause side effects

Lokathor (Feb 29 2020 at 01:09, on Zulip):

the semantics of volatile are not only specific to a general build target, they're specific to the device. on any device with an OS and a memory management unit, you'll run afoul of the MMU

Thales Fragoso (Feb 29 2020 at 01:09, on Zulip):

Good thing I don't have a MMU heh

Lokathor (Feb 29 2020 at 01:09, on Zulip):

;3

Lokathor (Feb 29 2020 at 01:10, on Zulip):

"consult your datasheets"

Thales Fragoso (Feb 29 2020 at 01:22, on Zulip):

Thanks all

RalfJ (Feb 29 2020 at 08:53, on Zulip):

Lokathor said:

Thales Fragoso well if miri says it's fine i guess it's fine. But it sure looks to me like that code makes a reference into uninit memory

Miri doesn't catch all UB, just a lot of it. Also see the README. And also to make it possible to run Miri on more code, Miri currently does not complain about uninitialized integers, or references to uninitialized data.

RalfJ (Feb 29 2020 at 08:57, on Zulip):

Thales Fragoso said:

Everything is valid for an u8 no ?

uninitialized memory isn't (maybe -- see https://github.com/rust-lang/unsafe-code-guidelines/issues/71)

RalfJ (Feb 29 2020 at 08:58, on Zulip):

also while "captured/escaped pointers" etc are important notions in the compiler, they are not part of the spec, so one has to be very careful when using them to reason about absence of UB...

Thales Fragoso (Feb 29 2020 at 14:03, on Zulip):

@RalfJ But won't pointer escape "trick" the compiler into thinking that this is in fact initialized?

Thales Fragoso (Feb 29 2020 at 14:04, on Zulip):

If we can't trust that then we will need to change the entire API for DMA in embedded, and that will bring a lot of overhead and hard to use apis

Hanna Kruppe (Feb 29 2020 at 15:42, on Zulip):

The point is that it's precarious to reason about soundness by thinking of specific code transformations that may "break your code" and how those transformations may be blocked by throwing wrenches into (your mental model of) the compiler's internal reasoning. That does not necessarily mean that any particular conclusion you get that way is wrong, just that it's easier to get wrong conclusions.

Thales Fragoso (Feb 29 2020 at 16:03, on Zulip):

I understand that this is a fragile reasoning, but I don't see any other way to do DMA in a sensible way, especially mem-to-mem DMA

Thales Fragoso (Feb 29 2020 at 16:05, on Zulip):

It doesn't make sense to do mem-to-mem DMA without uninitialized data that the compiler has no way to know that it was in fact initialized

Thales Fragoso (Feb 29 2020 at 16:06, on Zulip):

If the CPU has to initialize the data why use DMA then ?

Lokathor (Feb 29 2020 at 16:32, on Zulip):

Well, just as an example, ASM doesn't care at all, so you can do DMA by having a C FFI call that actually links to an assembly block that does the copy and returns with the buffer filled and LLVM literally can't tell that it was DMA or CPU.

Thales Fragoso (Feb 29 2020 at 16:36, on Zulip):

@Lokathor that's why I think pointer escape is sufficient in this case, the mechanism that makes LLVM drop all assumptions about something that got passed to FFI is pointer escape/capture, or am I wrong ?

Amanieu (Feb 29 2020 at 16:37, on Zulip):

The DMA engine is another thread from LLVM's point of view. So what you are doing is just standard inter-thread communication.

Lokathor (Feb 29 2020 at 16:37, on Zulip):

you are correct Thales

Hanna Kruppe (Feb 29 2020 at 16:45, on Zulip):

@Thales Fragoso Note the difference between "makes LLVM drop all assumptions" (which btw is arguably wrong as stated) and casting the DMA transfer as a kind of multi-threaded communication. Both ways end at "this code should be fine", but the latter rests on specifications giving actual guarantee about how any implementation will behave, regardless of how clever or exotic it is. You don't even need the precarious reasoning to get your goal.

Thales Fragoso (Feb 29 2020 at 16:53, on Zulip):

I agree that some semantics I used were very brittle, but that doesn't change the fact that there is no need for an actual DMA transfer to happen to make the code UB-free

Lokathor (Feb 29 2020 at 17:00, on Zulip):

Uhm, yes? I don't think we ever stated that you needed to literally do a DMA to freeze the memory. There's a few ways you can just freeze memory if that's the goal.

Thales Fragoso (Feb 29 2020 at 17:03, on Zulip):

@Lokathor yes, the DMA transfer was the initial goal, but freezing memory is also useful in other contexts, are there other ways to do it without FFI ?

Lokathor (Feb 29 2020 at 17:05, on Zulip):

Not really, the basic idea is always some form of "let code outside of LLVM's view edit the memory and LLVM won't know what happened, so if nothing actually happened then the memory is just plain frozen."

Some day LLVM will have a full intrinsic for it, but I think that is currently just in development (last I heard).

Thales Fragoso (Feb 29 2020 at 17:07, on Zulip):

Thanks for all the explanations

Lokathor (Feb 29 2020 at 17:08, on Zulip):

And as I'm sure Ralf would want me to say: any current way to freeze memory is unspecified and just "happens to work that way" status.

Thales Fragoso (Feb 29 2020 at 17:15, on Zulip):

Is there a portable way of doing this ? without the need to link in a C or asm routine that just returns ?

Hanna Kruppe (Feb 29 2020 at 17:18, on Zulip):

Note that "freezing memory without writing it" is brittle for reasons beyond just compiler behavior / lack of specification. See e.g. https://github.com/rust-lang/rust/pull/58363#issuecomment-512119241

Thales Fragoso (Feb 29 2020 at 17:22, on Zulip):

@Hanna Kruppe thanks for the link, it's good information to keep in mind, but it doesn't concern me in this case, since I don't have an OS or even a heap

Thales Fragoso (Feb 29 2020 at 17:35, on Zulip):

And I sure won't depend on any "freezed" value to make sense before I write to it, I just need it to not be UB

Amanieu (Feb 29 2020 at 18:30, on Zulip):

Remember that there are platforms where uninitialized memory is tracked at the hardware level. You need to actually write to the memory, otherwise it's still uninitialized.

Amanieu (Feb 29 2020 at 18:31, on Zulip):

One surprising example is Linux: if you use madvise(MADV_FREE) (which jemalloc does), then it's telling the OS that if nobody has written to the page since that syscall, it can be reclaimed and replaced with a zero page. This can happen at any time unless you actually write to the memory.

RalfJ (Feb 29 2020 at 18:35, on Zulip):

Thales Fragoso said:

I agree that some semantics I used were very brittle, but that doesn't change the fact that there is no need for an actual DMA transfer to happen to make the code UB-free

that's exactly the point -- this is wrong. unless you are doing something for which the spec says that it freezes memory (which so far isn't possible), then the code does have UB unless something actually initialized that memory (that something can be another device via DMA, sure).
This is the kind of UB that cannot bite you unless the compiler can somehow prove that no DMA is going to happen, but it's still UB.
the fact that "pointers were escaped" doesn't change this -- the spec doesn't have any clause that says "if the pointer escaped it's not UB".

I admit that this might sound like academic nitpicking, but I do think it is important which part of the reasoning here is based on the "source of truth" (the spec, which is ideally written without even mentioning a compiler) and which part is based on looking at how current compilers happen to implement the spec. The latter can lead to correct results, but it can also be very misleading.
In this case though it looks like things work out the same either way, so that's good :)

Thales Fragoso (Mar 09 2020 at 11:21, on Zulip):

There was a question regarding pointer capture/escape on the rust-embedded chat, I wasn't sure of the answer so I will ask here.

The setup is a SPI peripheral reading data of a buffer through DMA. There is a setup stage where the pointer to the buffer gets written to the DMA registers (escaped) and after that a compiler fence.

After that there will be some computations and writing to this buffer in an interrupt handler and using a compiler fence before asserting the DMA transfer, the buffer never gets read anywhere in the code.

The question is, the pointer only escapes in the beginning of the program, would this be enough to prevent any writes to the buffer of being omitted throughout the rest of the program or there is a need for write_volatile ?

There is no concern with reordering apart from the very specific places that will have compiler fences

RalfJ (Mar 09 2020 at 12:10, on Zulip):

There is a setup stage where the pointer to the buffer gets written to the DMA registers (escaped) and after that a compiler fence.

is that a write_volatile?

Thales Fragoso (Mar 09 2020 at 12:12, on Zulip):

Yes, all MMIO interactions use write/read volatile by design

Thales Fragoso (Mar 09 2020 at 12:13, on Zulip):

The important part of the question is that the setup only occurs once in the beginning, but the buffer will be written over and over throughout the course of the program

Thales Fragoso (Mar 09 2020 at 12:14, on Zulip):

There will be compiler fences after every complete interaction with the buffer to prevent it from getting too postponed

Lokathor (Mar 09 2020 at 20:10, on Zulip):

So, rust writes to the buffer without reading it and the DMA unit copies the buffer to some other place periodically?

Thales Fragoso (Mar 09 2020 at 20:13, on Zulip):

Yes, and the rust code changes the buffer in-between the DMA readings

Lokathor (Mar 09 2020 at 20:13, on Zulip):

with compiler fencing that sounds fine

Lokathor (Mar 09 2020 at 20:15, on Zulip):

Honestly the volatile part probably doesn't really hurt anyway

Thales Fragoso (Mar 09 2020 at 20:15, on Zulip):

Yeah, I also think that, wasn't sure that if rust code keeps writing to it somewhere it would disregard the pointer as escaped and start optimizing out the writes

Thales Fragoso (Mar 09 2020 at 20:16, on Zulip):

I think the volatile could have a bad effect depending of the amount of data and processing that occurs while filling the buffer

Thales Fragoso (Mar 09 2020 at 20:16, on Zulip):

But that would have to be measured

Lokathor (Mar 10 2020 at 02:32, on Zulip):

if you only ever write to the buffer, unless you're writing the same spot more than once, every byte gets written once volatile or not

Thales Fragoso (Mar 10 2020 at 03:02, on Zulip):

There will be multiple writes to the same byte on the buffer, it would work like that:

1:

Thales Fragoso (Mar 10 2020 at 03:04, on Zulip):

What I'm not sure about is whether the writes in loop 1 will get optimized out

Thales Fragoso (Mar 10 2020 at 03:04, on Zulip):

Maybe the best solution would be to escape the pointer once every loop

Thales Fragoso (Mar 10 2020 at 03:04, on Zulip):

Optimizing compilers are hard

Lokathor (Mar 10 2020 at 03:44, on Zulip):

Sorry, I'm speaking specifically about the time between DMA uses.

Each byte of the buffer is written at most once, and then you do a fence/DMA cycle, then each byte is written at most once again on the next loop, then a fence/DMA cycle, etc.

If that is the case, then using volatile or not makes no difference.

Thales Fragoso (Mar 10 2020 at 03:52, on Zulip):

That is the case, but the DMA cycle no longer writes the buffer pointer to a register: the peripheral doesn't need it, because it already has the address from the first setup

Lokathor (Mar 10 2020 at 03:55, on Zulip):

Sure, that's not really related though.

The issue with using volatile vs non-volatile for the actual buffer filling steps is that volatile must do exactly the count of reads and writes written, and normal reads/writes can elide repeated reads or skip early writes if there's no read before the next write. In other words, normal access can turn two writes into one write by skipping the first write. However, if there is already exactly one write per byte into the buffer, the compiler cannot reduce that any further. Thus, both normal access and volatile access would have exactly 1 write per byte into the buffer. So volatile would have no speed difference.
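To make the argument concrete, here is a small sketch (all function names are hypothetical) contrasting plain and volatile stores. With one store per byte, both versions perform the same number of writes. Only in the double-write case is the compiler allowed to drop the first plain store, while the volatile version must emit both.

```rust
use core::ptr;

/// One plain store per byte: there is nothing left for the compiler to merge.
pub fn fill_plain(buf: &mut [u8], value: u8) {
    for b in buf.iter_mut() {
        *b = value;
    }
}

/// One volatile store per byte: the same number of stores as `fill_plain`.
pub fn fill_volatile(buf: &mut [u8], value: u8) {
    for b in buf.iter_mut() {
        unsafe { ptr::write_volatile(b, value) };
    }
}

/// Two plain stores to the same location: the compiler may skip the first
/// one, because nothing reads the slot before it is overwritten.
pub fn double_write_plain(slot: &mut u8) {
    *slot = 1;
    *slot = 2;
}

/// Two volatile stores to the same location: both must be emitted.
pub fn double_write_volatile(slot: &mut u8) {
    let p: *mut u8 = slot;
    unsafe {
        ptr::write_volatile(p, 1);
        ptr::write_volatile(p, 2);
    }
}
```

The final memory contents are identical either way; the difference is only in how many stores reach memory along the way.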

Thales Fragoso (Mar 10 2020 at 03:58, on Zulip):

Oh, ok, now I understand, you were talking about the performance hit not about the problem with omitted writes

Lokathor (Mar 10 2020 at 03:58, on Zulip):

Yes, right

Thales Fragoso (Mar 10 2020 at 03:59, on Zulip):

Yes, putting it that way it looks like it's much better to not really depend on compiler behavior and use volatile writes

Lokathor (Mar 10 2020 at 03:59, on Zulip):

Yes, since it can't hurt, just be volatile

Thales Fragoso (Mar 10 2020 at 04:01, on Zulip):

The bad thing would be if that was a lib that hides the implementation and just passes the buffer to the user for the filling etc

Thales Fragoso (Mar 10 2020 at 04:02, on Zulip):

I mean, not necessarily bad, but cumbersome

Lokathor (Mar 10 2020 at 04:04, on Zulip):

In that case, I'd possibly just eat the cost of a few cycles and just re-assign the pointer to the buffer to the DMA unit

Lokathor (Mar 10 2020 at 04:13, on Zulip):

In fact I think you'd have to because in calling the closure with a &mut [MyType] arg I'm pretty sure you'd invalidate the existing pointer under the stacked borrow rules (?)

Thales Fragoso (Mar 10 2020 at 04:18, on Zulip):

Hmm, I didn't quite follow, what closure ? A user API ?

Lokathor (Mar 10 2020 at 05:14, on Zulip):

Yeah, I guess it depends on what sort of API you have. I was thinking something like

pub fn use_dma_to_move_buffers<F: FnMut(&mut [u8])>(dma: DMAUnit, count: usize, fill: F) -> Result<(), ()> { ... }

Or something where you're prepping a buffer in the lib, then user code fills the buffer, then you send off the buffer. In that case, calling the user code would start moving around a unique reference to the buffer, which would (if I remember my stacked borrows properly) invalidate the pointer that you had to the buffer. And once the pointer is invalidated... does it count as escaped any more? I dunno. Like, I really don't. We'd need some real LLVM wizards to tell us that one.

But you could probably make _some sort_ of good API that's using DMA, depending on your exact needs. I think that it's probably device specific enough that I wouldn't expect the embedded-wg to be able to have a super suitable abstraction in the embedded-hal crate, for example. Probably it'd be simpler to write up a page of suggested guidelines and then each specific device's crate can have functions for doing DMA on that device.
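One possible shape for such an API, as a rough sketch only. `DmaUnit`, its `arm`/`start` methods, and the extra `buf` parameter are assumptions made for illustration, and the "register" is an ordinary field rather than real MMIO. Following the suggestion to re-assign the pointer, the raw pointer is derived again after every call into user code, so no pointer created before the `&mut [u8]` borrow is reused.

```rust
use core::ptr;
use core::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical DMA unit; `src_addr` stands in for a memory-mapped register.
pub struct DmaUnit {
    src_addr: usize,
    pub transfers: usize,
}

impl DmaUnit {
    pub fn new() -> Self {
        Self { src_addr: 0, transfers: 0 }
    }

    /// Write the buffer address to the (mock) register with a volatile store.
    fn arm(&mut self, buf: *const u8) {
        unsafe { ptr::write_volatile(&mut self.src_addr, buf as usize) };
    }

    /// Pretend to kick off a transfer; real hardware would set a control bit.
    fn start(&mut self) {
        self.transfers += 1;
    }

    pub fn source(&self) -> usize {
        unsafe { ptr::read_volatile(&self.src_addr) }
    }
}

pub fn use_dma_to_move_buffers<F: FnMut(&mut [u8])>(
    dma: &mut DmaUnit,
    buf: &mut [u8],
    count: usize,
    mut fill: F,
) -> Result<(), ()> {
    if buf.is_empty() {
        return Err(());
    }
    for _ in 0..count {
        // User code gets unique access to the buffer while it fills it.
        fill(&mut *buf);
        // Derive a fresh pointer after the unique borrow has ended.
        dma.arm(buf.as_ptr());
        compiler_fence(Ordering::SeqCst);
        dma.start();
        // Real hardware: wait for completion here before the next fill.
        compiler_fence(Ordering::SeqCst);
    }
    Ok(())
}
```

Re-arming costs one extra register write per cycle, which is the "few cycles" trade-off mentioned above.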

Thales Fragoso (Mar 10 2020 at 05:25, on Zulip):

Yes, now I understand, we cast the pointer to a u32 to write to the register, but I guess that even that way it still applies

RalfJ (Mar 10 2020 at 09:26, on Zulip):

Thales Fragoso said:

Yeah, I also think that, wasn't sure that if rust code keeps writing to it somewhere it would disregard the pointer as escaped and start optimizing out the writes

so as mentioned elsewhere, "escaped pointer" is not a thing in the Rust spec and thus not terribly useful if you are asking "what does the compiler have to do per spec".
but if you are asking exclusively about how current Rust happens to think about it, then -- a pointer, once escaped, can never "unescape" as long as it remains a raw pointer. (creating a mutable reference is a promise to the compiler that the reference is unique and all potentially escaped aliases will not be used again, so you cannot use references.)

RalfJ (Mar 10 2020 at 09:29, on Zulip):

as far as the spec is concerned, I think of the DMA device as something akin to another thread, and once that thread has access to your memory it can just keep accessing it. and since DMA "pseudo-threads" through some magic (that I don't understand but well^^) don't need "real fences" (just compiler fences), the fences you have should be enough synchronization to enable proper communication.

Lokathor (Mar 10 2020 at 17:01, on Zulip):

It becomes very target specific because DMA on some devices halts the CPU while it works (similar to an interrupt), and with other devices the DMA is running along side the CPU (similar to a multi-core situation). So "pseudo-threads" is probably the best general abstraction.

RalfJ (Mar 10 2020 at 17:10, on Zulip):

with other devices the DMA is running along side the CPU (similar to a multi-core situation)

but somehow still in a way that compiler fences are enough, no real fences are needed?

RalfJ (Mar 10 2020 at 17:11, on Zulip):

that is the part that makes little sense to me

Thales Fragoso (Mar 10 2020 at 17:21, on Zulip):

I would say that in a lot of cases you need the memory fence, but that isn't the case for most cortex-m processors

Thales Fragoso (Mar 10 2020 at 17:22, on Zulip):

Omitting the DMB or DSB instruction in the examples in Figure 41 on page 47 and Figure 42 would not cause any error because the Cortex-M processors:

do not re-order memory transfers

do not permit two write transfers to be overlapped.

Thales Fragoso (Mar 10 2020 at 17:23, on Zulip):

The use of DMB is rarely needed in Cortex-M processors because they do not reorder memory transactions. However, it is needed if the software is to be reused on other ARM processors, especially multi-master systems.

RalfJ (Mar 10 2020 at 17:29, on Zulip):

Thales Fragoso said:

I would say that in a lot of cases you need the memory fence, but that isn't the case for most cortex-m processors

ah, fair. so these pseudo-threads have some target-specific rules for whether synchronizing with them requires a hardware fence or just a compiler fence. makes sense.
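A small sketch of how a driver might encode that target-specific choice. `DmaSync` and `dma_barrier` are hypothetical names. Whether `fence` lowers to an instruction that is sufficient for a given DMA engine is an assumption here; some devices may need a stronger barrier or cache maintenance from a device-specific crate.

```rust
use core::sync::atomic::{compiler_fence, fence, Ordering};

/// Which kind of synchronization the target needs around DMA hand-offs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DmaSync {
    /// Only constrain the compiler; the core itself does not reorder
    /// memory transfers.
    CompilerOnly,
    /// Also emit a hardware fence, for cores or buses that may reorder.
    Hardware,
}

/// Barrier to place between buffer accesses and DMA start/completion.
#[inline]
pub fn dma_barrier(kind: DmaSync) {
    match kind {
        DmaSync::CompilerOnly => compiler_fence(Ordering::SeqCst),
        DmaSync::Hardware => fence(Ordering::SeqCst),
    }
}
```

A device crate could fix the variant once per target and call `dma_barrier` at every hand-off point.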

Lokathor (Mar 10 2020 at 17:32, on Zulip):

when dma runs along side cpu there usually is external fencing i think. for example, when using vulkan to move data to the gpu memory there are fences involved.

Last update: Jun 05 2020 at 21:40UTC