Stream: t-compiler/wg-self-profile

Topic: mmap concerns


Jake Goulding (Mar 27 2019 at 12:15, on Zulip):

Jake Goulding that's fine in this case, i.e. there's no real risk here

Why is that?

Jake Goulding (Mar 27 2019 at 12:16, on Zulip):

(This is one of my big problems with the mmap crate, which doesn't clearly explain this unsafety in the docs the last time I checked)

mw (Mar 27 2019 at 12:16, on Zulip):

the file is exclusively accessed by a single process and all data is written just once

mw (Mar 27 2019 at 12:17, on Zulip):

i.e. we open a new file, touch each byte once for writing and never look at it again

Jake Goulding (Mar 27 2019 at 12:17, on Zulip):

exclusively accessed by a single process

That's my point though; you literally cannot guarantee that. While it's writing, an automated program can see it and start writing to it as well

mw (Mar 27 2019 at 12:18, on Zulip):

the same is true for basically everything in the compiler where the file system is touched

Jake Goulding (Mar 27 2019 at 12:18, on Zulip):

Sure, but in the case of File, you don't introduce memory unsafety, the file just gets clobbered

mw (Mar 27 2019 at 12:19, on Zulip):

do you have a specific scenario in mind that worries you?

mw (Mar 27 2019 at 12:20, on Zulip):

we are not using the mmap for communicating between threads or processes

mw (Mar 27 2019 at 12:21, on Zulip):

it is just supposed to let the OS do some paging magic

Jake Goulding (Mar 27 2019 at 12:21, on Zulip):

With file-backed-mmap, you get a *mut T and convert it to a &T or &mut T. During the time that reference exists, if anything else decides to write to that file, the reference's guarantees are invalidated.

Jake Goulding (Mar 27 2019 at 12:22, on Zulip):

I do not have any specific scenarios, no. I'm just pointing out that, as far I have been able to determine, file-backed-mmap cannot be used in Rust without introducing memory unsafety based on my understanding of the rules.

Jake Goulding (Mar 27 2019 at 12:22, on Zulip):

It doesn't matter what the intended use is, only what the possible uses are.

mw (Mar 27 2019 at 12:22, on Zulip):

well yeah, the memory/file is accessed only from unsafe code anyway

mw (Mar 27 2019 at 12:23, on Zulip):

but that's like saying you can't use &mut [] safely because someone else might use unsafe to get another reference

Jake Goulding (Mar 27 2019 at 12:24, on Zulip):

only from unsafe code anyway

You aren't allowed to violate Rust's guarantees in an unsafe block. The compiler just cannot validate them, so it's up to the programmer.

Jake Goulding (Mar 27 2019 at 12:24, on Zulip):

but that's like saying you can't use &mut [] safely because someone else might use unsafe to get another reference

I'm sorry if I've given this impression, but it's definitely not the same case.

mw (Mar 27 2019 at 12:25, on Zulip):

we are doing a runtime check that makes sure that we don't violate the invariants in the current process

Jake Goulding (Mar 27 2019 at 12:25, on Zulip):

In the &mut[] case, the person who wrote the unsafe block code didn't uphold the reference's guarantees of non-mutability / single-mutable

mw (Mar 27 2019 at 12:25, on Zulip):

if another process writes over the file, well that's a bug

Jake Goulding (Mar 27 2019 at 12:25, on Zulip):

A bug that (I believe) introduces memory unsafety

Jake Goulding (Mar 27 2019 at 12:26, on Zulip):

Thus my statement:

file-backed-mmap cannot be used in Rust without introducing memory unsafety

mw (Mar 27 2019 at 12:26, on Zulip):

so you are worried about an attacker?

rkruppe (Mar 27 2019 at 12:26, on Zulip):

Chiming in from the back row to say that you can side step these concerns by creating a &[Cell<u8>] and never having a reference to a non-cell type in the first place

mw (Mar 27 2019 at 12:26, on Zulip):

if no other process access the file, then no assumptions that the compiler made are violated

Jake Goulding (Mar 27 2019 at 12:27, on Zulip):

There's two schools of thought about that: does UB occur only when it happens, or when it could happen? I believe Rust falls into the latter case.

mw (Mar 27 2019 at 12:28, on Zulip):

I thought the problem with UB is that the compiler does invalid optimizations

Jake Goulding (Mar 27 2019 at 12:28, on Zulip):

@rkruppe ah, that's a missing piece I hadn't considered. I wonder how painful it is to read/write to such a thing

Jake Goulding (Mar 27 2019 at 12:31, on Zulip):

@mw for example

fn main() {
    let mut i = 42;
    unsafe {
        let mut a = &mut *(&mut i as *mut _);
        let mut b = &mut *(&mut i as *mut _);

        println!("{:p}, {:p}", a, b);
    }
}

Miri reports this as causing undefined behavior. We never actually mutate anything; the sheer existence of the aliased pointers is enough.

mw (Mar 27 2019 at 12:36, on Zulip):

I don't see how this relates to memory mapped files that are accessed exclusively by one process

rkruppe (Mar 27 2019 at 12:45, on Zulip):

The caveat "exclusively accessed by one process" is a big one because that's not something you can do in that process to ensure others won't access the file, and unlike e.g. unsafe code in the same process duplicating mutable references, there is no convention or assignment of responsibility that would prevent it. Perfectly well-behaved processes and well-intentioned users can trample all over your mmap'd file. IMO a program that "only has UB if other processes access the file" is just as buggy as a program that "only has UB if the input file contains <perfectly plausible errors>", e.g. a malformed gzip archive that leads to a buffer overflow in a decompression program.

Jake Goulding (Mar 27 2019 at 12:46, on Zulip):

That is a wonderfully better way of expressing my concerns, thank you :-)

Jake Goulding (Mar 27 2019 at 12:47, on Zulip):

Practically, I don't know if anything bad will happen now. What's most insidious is when compiler changes in the future break seemingly "working" code.

mw (Mar 27 2019 at 12:51, on Zulip):

I'm still not convinced this applies here :)

mw (Mar 27 2019 at 12:51, on Zulip):

I'm not trying to convince you that mutable shared memory maps are fine in general

mw (Mar 27 2019 at 12:51, on Zulip):

I totally see what your concern is

mw (Mar 27 2019 at 12:52, on Zulip):

I just don't see a real risk in this specific case

mw (Mar 27 2019 at 12:55, on Zulip):

I mean by that same argument, we would not be allowed to read crate metadata from disk because someone could have changed the length of an array in the binary format

mw (Mar 27 2019 at 12:56, on Zulip):

this is a form of FFI where there is always some risk that some code outside of Rust has changed something unexpectedly

Jake Goulding (Mar 27 2019 at 13:02, on Zulip):

I mean by that same argument, we would not be allowed to read crate metadata from disk because someone could have changed the length of an array in the binary format

I'm not sure this is the same argument. In that case, with an untrusted input like the file, we would perform some check in our process that the array length is correct. If it wasn't, we don't allow access to that data (panic, return a Result, etc.).

If we didn't perform that check, but assumed that the array length was accurate and assumed that data at some offset was of a specific form, then that might lead to memory unsafety and should indeed be disallowed.

Jake Goulding (Mar 27 2019 at 13:06, on Zulip):

With file-backed-mmap, there's no possible check that our program can make to know that the underlying data hasn't changed.

mw (Mar 27 2019 at 13:08, on Zulip):

but the data is in the file is not read by the rustc process, it is only written

rkruppe (Mar 27 2019 at 13:08, on Zulip):

I personally don't care super strongly about this situation :) But the one thing that actually gives me a stomach ache when mmap'd files come up, is: what precedent does this set? Even if in this situation the risk is tiny and acceptable for big performance wins, I worry that there is insufficient caution about the pitfalls of mmap in the community. For example, rustc's clearly not intended to be run in a hostile execution environment, and that's fine for rustc, but a different project may have different needs and if its authors are unaware of the potential UB of mmap, they may handle it insufficiently.

mw (Mar 27 2019 at 13:10, on Zulip):

that's something that should be documented in the memmap crate, I think

mw (Mar 27 2019 at 13:11, on Zulip):

I mean, I don't mind adding a big warning comment to the measureme crate too, should it turn out that mmap files are actually faster

mw (Mar 27 2019 at 13:12, on Zulip):

but memory mapped files are a fundamental OS facility, they should be usable in Rust

Jake Goulding (Mar 27 2019 at 13:17, on Zulip):

that's something that should be documented in the memmap crate, I think

I agree, which is why I started this thread with "(This is one of my big problems with the mmap crate, which doesn't clearly explain this unsafety in the docs the last time I checked)"

Jake Goulding (Mar 27 2019 at 13:19, on Zulip):

Although I think that the crate itself is flawed, as it gives you back a type which derefs to [u8] so you can't even use a Cell, it's already too late.

Jake Goulding (Mar 27 2019 at 13:20, on Zulip):

I mean, I don't mind adding a big warning comment to the measureme crate too, should it turn out that mmap files are actually faster

The problem is that the warning is unactionable. It would be like "if you use mmap, then your program has undefined behavior, but it's faster!". What do I do with that as a user?

mw (Mar 27 2019 at 13:22, on Zulip):

the warning would be "we carefully checked that this specific use of memory mapped files does not introduce undefined behaviour. that does not mean that memory mapped files are easy to use from Rust"

mw (Mar 27 2019 at 13:22, on Zulip):

that's rather different, I'd say

Jake Goulding (Mar 27 2019 at 13:30, on Zulip):

I'm afraid that I'm saying the same thing over and over, so I apologize. I can see that I'm not going to be able to convince you, so perhaps I'm wrong. My last attempt:

this specific use of memory mapped files does not introduce undefined behaviour

There is no way we can make such a statement. The running program is completely unable to make this guarantee.

mw (Mar 27 2019 at 13:34, on Zulip):

even if it nevers casts a pointer into the memory in question to a & or &mut? i.e. if it only ever uses raw pointers to that memory?

Wesley Wiser (Mar 27 2019 at 13:34, on Zulip):

On the one hand, I agree: using mmap does seem to create the potential to violate Rust's semantics and introduce UB. On the other hand, if the scenario we're worried about is another process causing the violation to happen, I'm not sure why this is an issue and ptrace isn't. Ultimately we're at the mercy of the OS not to do something which violates the language semantics.

mw (Mar 27 2019 at 13:37, on Zulip):

i.e. if I only ever use raw pointers, there should not be any UB, right? same as with Cell

nagisa (Mar 27 2019 at 13:43, on Zulip):

That's my point though; you literally cannot guarantee that. While it's writing, an automated program can see it and start writing to it as well

You can lock the file.

nagisa (Mar 27 2019 at 13:45, on Zulip):

The API you’re looking for is flock that could maintain the invariant of 1 writer 0 readers or at least 1 writer many readers.

mw (Mar 27 2019 at 13:46, on Zulip):

then the next complaint will be that these locks are only advisory and can be ignored at will :P

nagisa (Mar 27 2019 at 13:53, on Zulip):

… Sure.

nagisa (Mar 27 2019 at 14:08, on Zulip):

As I mentioned in some issue before, you could use pwrite and pread to write and read at arbitrary offsets in a file.

nagisa (Mar 27 2019 at 14:09, on Zulip):

You would have to get rid of conveniences that mmap affords you, but functionally there should be little difference.

mw (Mar 27 2019 at 14:22, on Zulip):

these sound expensive, being system calls?

Last update: Nov 17 2019 at 07:50UTC