Stream: t-lang/wg-unsafe-code-guidelines

Topic: reading uninit memory *is* ub right?


nagisa (Apr 01 2019 at 22:16, on Zulip):

Reading uninitialized memory (as in let x = Vec::with_capacity(128); x.set_len(128); println!("{:?}", x)) is UB, right? I’m not going crazy, right?

nagisa (Apr 01 2019 at 22:21, on Zulip):

/me sanity checks self with miri

Daniel Carosone (Apr 02 2019 at 00:11, on Zulip):

sure looks like it

Jake Goulding (Apr 02 2019 at 02:06, on Zulip):

It would be "fun" to see what would be true in a world where that's somehow not UB.

Cem Karan (Apr 02 2019 at 13:42, on Zulip):

Reading uninitialized memory (as in let x = Vec::with_capacity(128); x.set_len(128); println!("{:?}", x)) is UB, right? I’m not going crazy, right?

I couldn't even get it to compile without putting x.set_len() inside of an unsafe block, so this is my version:

fn main() {
    let mut x: Vec<u8> = Vec::with_capacity(128);
    unsafe {
        x.set_len(128);
    }
    println!("{:?}", x);
}

And even then, the output under either debug or release profiles is always

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

So I'd say that it is technically UB, but someone thought to use calloc() or something similar underneath it all (at least on Ubuntu 18.04.2 LTS, rustc 1.33.0 (2aa4c46cf 2019-02-28)) so the behavior is well-defined by accident at least... :thinking:

rkruppe (Apr 02 2019 at 13:51, on Zulip):

I don't think it's using calloc. It just happens to be the first use of this part of memory in that process, and linux zeroes pages before handing them out to processes.

Cem Karan (Apr 02 2019 at 13:53, on Zulip):

You're probably right about that, I haven't dug into the code to check. That said, the docs do not make any promises about initializing memory, so even if it just happens to work out that the memory is always initialized, it is still UB.
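
For contrast with the snippets in this thread, here is a minimal sketch of two ways to end up with a 128-byte Vec<u8> whose contents really are initialized before they are read. The function names zeroed_buffer and filled_via_set_len and the fill byte 0xAB are made up for illustration; nobody in the conversation proposed them.

```rust
use std::ptr;

/// Hypothetical helper: the simplest sound option, every element starts as 0.
fn zeroed_buffer(len: usize) -> Vec<u8> {
    vec![0u8; len]
}

/// Hypothetical helper: still uses set_len, but only after writing every element.
fn filled_via_set_len(len: usize) -> Vec<u8> {
    let mut v: Vec<u8> = Vec::with_capacity(len);
    unsafe {
        // Initialize the first `len` bytes of the allocation through the raw pointer.
        ptr::write_bytes(v.as_mut_ptr(), 0xAB, len);
        // SAFETY: `len` does not exceed the capacity requested above, and
        // elements 0..len were initialized by the write_bytes call.
        v.set_len(len);
    }
    v
}

fn main() {
    let a = zeroed_buffer(128);
    let b = filled_via_set_len(128);
    println!("{:?}", &a[..4]); // prints [0, 0, 0, 0]
    println!("{:?}", &b[..4]); // prints [171, 171, 171, 171]
}
```

The difference from the code above is only the order of operations: the elements are written first, and set_len is called afterwards.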

Jake Goulding (Apr 02 2019 at 13:56, on Zulip):

For example:

fn main() {
    let mut x: Vec<u8> = (0..).take(128).collect();
    x.clear();

    unsafe {
        x.set_len(128);
    }

    println!("{:?}", x);
}

Cem Karan (Apr 02 2019 at 13:57, on Zulip):

@Jake Goulding What's the output of that (too lazy to compile and test at the moment...)

Jake Goulding (Apr 02 2019 at 13:57, on Zulip):

You know about the playground, right? :wink:

Jake Goulding (Apr 02 2019 at 13:57, on Zulip):

[0, 1, 2, 3...]

Cem Karan (Apr 02 2019 at 13:58, on Zulip):

I did say I was being lazy! :stuck_out_tongue:

Jake Goulding (Apr 02 2019 at 14:00, on Zulip):

the memory is always initialized

Is this accurate terminology? I'd think that the memory isn't initialized at this point; it just happens to have some values in it from the beforetime.

Cem Karan (Apr 02 2019 at 14:01, on Zulip):

Who are you quoting? If I said that, then I was dead wrong...

rkruppe (Apr 02 2019 at 14:03, on Zulip):

Just to be completely sure that we're all on the same page: that it's UB to read uninitialized memory does not simply mean that you get some unpredictable string of ones and zeros from whatever was last written there. It is capital-U Undefined Behavior which can have arbitrary and arbitrarily bad consequences, such as running rm -rf / or making demons come out of your nose.

Cem Karan (Apr 02 2019 at 14:07, on Zulip):

@rkruppe To be that level of UB, you'd have to show that you can trigger something that bad just by reading the memory (no writing). Can you? I don't know of a method of triggering something like rm -rf solely by reading memory, but I could be wrong. (Some weird trick involving mmap() or /proc? IDK...)

rkruppe (Apr 02 2019 at 14:09, on Zulip):

That's not how this works. As you said previously, even if everything appears to work fine in test programs, it's still UB because the docs say so. Because it's declared to be UB by the language, if you write and run a program that does it, Rust implementations are free to do whatever with it, including such bad and apparently contrived things.

Cem Karan (Apr 02 2019 at 14:10, on Zulip):

Good point, you're right. I was jumping ahead and thinking about practical consequences.

rkruppe (Apr 02 2019 at 14:11, on Zulip):

(That said, I actually do have a program in mind that could plausibly wind up running rm -rf / even though under a naive "uninitialized memory is just an unpredictable fixed bit string" model it shouldn't)

Cem Karan (Apr 02 2019 at 14:14, on Zulip):

Would that be using safe code only?

Cem Karan (Apr 02 2019 at 14:14, on Zulip):

I mean, that is starting to sound like something the secure code working group would want to look into as well...

rkruppe (Apr 02 2019 at 14:16, on Zulip):

Safe Rust can't read uninitialized memory because reading uninitialized memory is UB and safe code is designed to not have UB. You need some unsafe to make the uninitialized memory available. (Or you exploit one of the open soundness issues to do it, but that's not really the point of the exercise.)

Cem Karan (Apr 02 2019 at 14:17, on Zulip):

OK, I'm more comfortable with unsafe having this possibility. Pure safe code should be safe at all times.

rkruppe (Apr 02 2019 at 14:19, on Zulip):

Huh? If reading uninitialized memory wasn't UB, then safe code would be allowed to do it (directly or indirectly). Because all "safe" means is "no UB". Safe code isn't (and shouldn't be) a stand-in for some standard of "security" or "defensive coding" or other meanings of "safe".
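
To illustrate the "safe means no UB, not secure" point, here is a small sketch of operations that may be undesirable but are not UB, and so are available without unsafe. The helper names leak_buffer, wrapping_increment, and checked_read are invented for this example; the standard library calls they use (Box::leak, u8::wrapping_add, slice::get) are real.

```rust
/// Hypothetical helper: leaking memory is wasteful but not UB, so Box::leak is safe.
fn leak_buffer() -> &'static mut Vec<u8> {
    Box::leak(Box::new(vec![1, 2, 3]))
}

/// Hypothetical helper: wraparound is a defined result here, not UB.
fn wrapping_increment(x: u8) -> u8 {
    x.wrapping_add(1)
}

/// Hypothetical helper: out-of-bounds access is reported as None, never performed.
fn checked_read(data: &[u8], idx: usize) -> Option<u8> {
    data.get(idx).copied()
}

fn main() {
    let leaked = leak_buffer();
    leaked.push(4);
    println!("{:?}", leaked); // prints [1, 2, 3, 4]
    println!("{}", wrapping_increment(255)); // prints 0
    println!("{:?}", checked_read(&[1, 2, 3], 10)); // prints None
}
```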

Cem Karan (Apr 02 2019 at 14:19, on Zulip):

So, are you able to test out your idea and see if this is 'UB' or just 'ub'?

Cem Karan (Apr 02 2019 at 14:21, on Zulip):

I agree, 'safe' does not mean 'secure'. However, from an ergonomics point of view, it does feel like a nice set of training wheels before you hit the big-boy land of unsafe{}

rkruppe (Apr 02 2019 at 14:27, on Zulip):

So, are you able to test out your idea and see if this is 'UB' or just 'ub'?

The couple variants I tried get optimized out before they reach the point in the optimization pipeline where I would have hoped things go really awry. I was shooting for a similar approach to the C or C++ example that made the rounds a while ago where calling a function pointer that could only be null or one specific function (and which in reality would be null) would call that function instead of crashing.

rkruppe (Apr 02 2019 at 14:29, on Zulip):

I agree, 'safe' does not mean 'secure'. However, from an ergonomics point of view, it does feel like a nice set of training wheels before you hit the big-boy land of unsafe{}

That is pretty fundamentally opposed to how safe and unsafe Rust are designed and related. If something can be exposed to safe code without UB (either as primitive or by writing a safe API that uses unsafe internally) then it should be exposed, to minimize the amount of times people have to use unsafe and open themselves up to even worse than whatever "dangerous" thing we're talking about.
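
As a rough sketch of "a safe API that uses unsafe internally": the function below works with uninitialized storage, yet callers can never observe an uninitialized value. It assumes a toolchain with std::mem::MaybeUninit and MaybeUninit::write, which is newer than the rustc 1.33.0 mentioned earlier in the thread; the name first_four_squares is hypothetical.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: fills an array slot by slot without zeroing it first.
pub fn first_four_squares() -> [u32; 4] {
    // Each slot is explicitly marked as possibly uninitialized.
    let mut buf: [MaybeUninit<u32>; 4] = [MaybeUninit::uninit(); 4];

    // Writing into a MaybeUninit is safe; only reading it back needs care.
    for (i, slot) in buf.iter_mut().enumerate() {
        slot.write((i * i) as u32);
    }

    // SAFETY: the loop above wrote to every one of the four slots.
    buf.map(|slot| unsafe { slot.assume_init() })
}

fn main() {
    println!("{:?}", first_four_squares()); // prints [0, 1, 4, 9]
}
```

The unsafe block is confined to one line whose justification can be checked locally, and the public signature contains no unsafe at all.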

Cem Karan (Apr 02 2019 at 14:42, on Zulip):

That is pretty fundamentally opposed to how safe and unsafe Rust are designed and related...

I don't think that they are fundamentally opposed. I agree that 'safe' doesn't mean 'secure', or 'it is impossible to shoot yourself in the foot in safe code'; training wheels don't guarantee that you won't find a way to flip your bike, or prevent getting hit by a truck either. However, the 'no UB in safe rust' rule is a set of training wheels; the compiler won't let me use uninitialized memory, won't let me overflow buffers, etc. That makes it easier for new programmers to learn rust; just avoid using unsafe until you have a good idea of what you're doing, and then be really, really careful when using unsafe in your code.

rkruppe (Apr 02 2019 at 14:50, on Zulip):

I guess I have two issues with this angle. The other one is that it casts safe Rust as a kid's playground and unsafe Rust as the thing for Real Programmers. In reality, everyone should minimize their use of unsafe, and IMHO there is no discernible "minimum programming skill" for being allowed to write unsafe, because most conventional measures of programming skill don't correlate well with being able to avoid UB (cf. how highly skilled and experienced C programmers continue to write programs full of UB) and the detailed knowledge necessary for writing sound unsafe code has practically no overlap with the knowledge necessary to be productive in safe Rust.

Cem Karan (Apr 02 2019 at 15:00, on Zulip):

I see and agree with your points. However, for someone that is new to rust, avoiding 'unsafe' is a pretty good first step towards being productive while you're still learning the language. That's where I'm at right now; I've been programming professionally for about 18 years now, but only just started learning rust this year. I needed to be productive within 3 days of starting to learn the language; by avoiding unsafe blocks, I got a feel for the borrow checker, and quickly got a better idea of rust's semantics while still being productive. I could have written all of my code inside of unsafe blocks right from the start, but that would have been code that was syntactically rust, but morally equivalent to C code with a lot of void pointers all over the place. Not a good use of the language.

IMHO, safe vs. unsafe is actually one of the major selling points of rust; it lets you take risks that you can't in other languages.

Jake Goulding (Apr 02 2019 at 17:01, on Zulip):

See the pictures in https://stackoverflow.com/a/51224196/155423

nagisa (Apr 02 2019 at 18:04, on Zulip):

FWIW it is possible to get uninitialized buffers in "safe" code if the unsafe is abstracted away from your code in certain libraries.

nagisa (Apr 02 2019 at 18:04, on Zulip):

/me won’t be pointing fingers

Tom Phinney (Apr 02 2019 at 18:58, on Zulip):

UB is a compiler-writer concept that is intended to circumscribe those aspects of programming where the compiler is not required to uphold the programmer's expressed intentions. In other words, compiler optimizations are permitted – one might almost say encouraged – to screw up all code that contains any amount of UB. rustc attempts to prove to itself that code is not UB, aborting compilation when it fails to complete those proofs. unsafe is simply a compiler keyword via which a programmer tells the compiler that the programmer has taken on a limited part of that proof responsibility. If the programmer lies, or simply is wrong, and UB is actually present, then any compiler guarantees about realizing the programmer's intent are null and void.
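
The "proof responsibility" framing can be sketched with a pair of functions: an unsafe fn that states a precondition the compiler does not check, and a safe wrapper that discharges it with a runtime test. The names byte_at_unchecked and byte_at are hypothetical; slice::get_unchecked is a real standard library method.

```rust
/// Hypothetical helper.
///
/// # Safety
/// The caller must guarantee that `idx < data.len()`.
unsafe fn byte_at_unchecked(data: &[u8], idx: usize) -> u8 {
    // SAFETY: the in-bounds requirement is forwarded to our caller.
    unsafe { *data.get_unchecked(idx) }
}

/// Hypothetical safe wrapper: performs the bounds check itself.
fn byte_at(data: &[u8], idx: usize) -> Option<u8> {
    if idx < data.len() {
        // SAFETY: the condition above establishes idx < data.len().
        Some(unsafe { byte_at_unchecked(data, idx) })
    } else {
        None
    }
}

fn main() {
    let data = [10u8, 20, 30];
    println!("{:?}", byte_at(&data, 1)); // prints Some(20)
    println!("{:?}", byte_at(&data, 3)); // prints None
}
```

If the check in byte_at were wrong, the compiler would still accept the program, and an out-of-bounds call would be UB.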

Last update: Nov 19 2019 at 18:55UTC