Stream: t-compiler/shrinkmem-sprint

Topic: rustc_middle mac os x weirdness


view this post on Zulip pnkfelix (Feb 22 2021 at 20:36):

While experimenting with reporting maxrss acquired via getrusage, I have found something odd
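
(For context, a minimal sketch of this kind of measurement, assuming the `libc` crate; this is not the actual patch being discussed. One wrinkle worth noting up front: `getrusage` reports `ru_maxrss` in kilobytes on Linux but in bytes on macOS, so the raw numbers aren't directly comparable across the two platforms.)

```rust
// Sketch only: read the process's max RSS via getrusage (libc crate).
fn max_rss_bytes() -> Option<u64> {
    let mut usage: libc::rusage = unsafe { std::mem::zeroed() };
    if unsafe { libc::getrusage(libc::RUSAGE_SELF, &mut usage) } != 0 {
        return None;
    }
    let raw = usage.ru_maxrss as u64;
    // ru_maxrss is in bytes on macOS but in kilobytes on Linux.
    Some(if cfg!(target_os = "macos") { raw } else { raw * 1024 })
}

fn main() {
    println!("maxrss: {:?} bytes", max_rss_bytes());
}
```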

view this post on Zulip pnkfelix (Feb 22 2021 at 20:37):

I’m still trying to confirm this, but it seems like on my mac, I am consistently observing rustc_middle taking about 2.5 GB maxrss

view this post on Zulip Joshua Nelson (Feb 22 2021 at 20:39):

is that lower or higher than you expect?

view this post on Zulip pnkfelix (Feb 22 2021 at 20:40):

lower

view this post on Zulip pnkfelix (Feb 22 2021 at 20:40):

let me go check what i was seeing on linux

view this post on Zulip pnkfelix (Feb 22 2021 at 20:40):

but I thought on Linux it was on the order of 6 GB

view this post on Zulip tm (Feb 22 2021 at 20:42):

it could be a consequence of #70951 :-).

view this post on Zulip pnkfelix (Feb 22 2021 at 20:46):

oh, you’re right; I wasn’t checking if I was comparing the same builds

view this post on Zulip pnkfelix (Feb 22 2021 at 20:55):

(but, no; I think both of the builds I’m evaluating are at the same point in the git history, and both predate PR #70951 landing. I think. Checking again now.)

view this post on Zulip Tyson Nottingham (Feb 22 2021 at 21:40):

The memory usage is going to depend on the number of CPUs on the system, so if they're different, that could account for it. Also, the allocators will behave differently.

Btw, I found that changes over the last ~8 days have reduced max-RSS when compiling all of rustc by around 750 MB on my 8-CPU system. I'm sure #70951 accounts for all or most of that.

view this post on Zulip pnkfelix (Feb 22 2021 at 21:49):

Tyson Nottingham said:

The memory usage is going to depend on the number of CPUs on the system, so if they're different, that could account for it.

is this true even if I pass -j1 to x.py, or will passing that act as a control for that variable?

view this post on Zulip Tyson Nottingham (Feb 22 2021 at 21:51):

It's true even with -j1, at least until #82127 is in the bootstrapping compiler.

view this post on Zulip simulacrum (Feb 22 2021 at 21:52):

I'm not sure about default allocators, but I recall that eddyb ran into a jemalloc behavior where a background thread cleans up memory every ~30 seconds or something like that, which meant that if your benchmark ran for about 30 seconds (easily landing under or over depending on noise) you could see massively different instruction counts. I imagine it'd have similar effects on memory.
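
(One rough, Linux-only way to see that kind of time-dependent allocator behavior, sketched here and unrelated to any rustc tooling: sample current RSS over time instead of only looking at maxrss, e.g. by polling `VmRSS` in `/proc/self/status`. A periodic purge shows up as a step down in the series, which maxrss by definition never shows.)

```rust
use std::{fs, thread, time::Duration};

// Rough Linux-only probe: print the process's current RSS once per second.
// Unlike maxrss, a time series like this can reveal an allocator returning
// memory on a timer (e.g. a periodic background purge).
fn main() {
    for _ in 0..60 {
        let status = fs::read_to_string("/proc/self/status").unwrap();
        if let Some(line) = status.lines().find(|l| l.starts_with("VmRSS")) {
            println!("{line}");
        }
        thread::sleep(Duration::from_secs(1));
    }
}
```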

view this post on Zulip simulacrum (Feb 22 2021 at 21:53):

Tyson Nottingham said:

It's true even with -j1, at least until #82127 is in the bootstrapping compiler.

I find this surprising -- I would expect -j1 to isolate you to one CPU (aside from allocator shenanigans like the one I mentioned), so the number of CPUs shouldn't matter

view this post on Zulip simulacrum (Feb 22 2021 at 21:53):

comparing macOS's default allocator to glibc, though, seems like it's going to give different behavior no matter what

view this post on Zulip simulacrum (Feb 22 2021 at 21:53):

(or to jemalloc)

view this post on Zulip pnkfelix (Feb 22 2021 at 21:54):

different behavior, I totally expect. But going from 2.5 GB to 6 GB is more extreme than I would have expected.

view this post on Zulip pnkfelix (Feb 22 2021 at 21:54):

(Caveat: I still need to verify this. I’m doing something else at the moment but hoping to come back to this in a sec.)

view this post on Zulip Tyson Nottingham (Feb 22 2021 at 21:58):

simulacrum said:

I find this surprising -- I would expect -j1 to isolate you to one CPU (aside from allocator shenanigans like the one I mentioned), so the number of CPUs shouldn't matter

It does isolate you to one CPU, but the problem is that the codegen scheduler still thinks we need to codegen a bunch of CGUs to LLVM modules ahead of time in order to meet the demand of number-of-CPUs workers, basically. So it has more LLVM modules in memory at once than it needs to with -j1. That's one of the things that #82127 resolves.
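
(A toy illustration of that shape, not rustc's actual scheduler: with a bounded queue standing in for the ahead-of-time codegen buffer, peak memory tracks the queue's capacity, so sizing it by CPU count rather than by the requested -j job count inflates max-RSS even when only one worker is running.)

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

fn main() {
    // What the scheduler sized its buffer for vs. what -j1 actually allows.
    let num_cpus = 8;
    let queue_depth = num_cpus; // try 1 here: peak memory drops accordingly

    // A bounded channel caps how many "modules" can exist at once; its
    // capacity plays the role of the ahead-of-time codegen demand.
    let (tx, rx) = sync_channel::<Vec<u8>>(queue_depth);
    let worker = thread::spawn(move || {
        for module in rx {
            std::hint::black_box(module.len()); // stand-in for LLVM work
        }
    });
    for _ in 0..32 {
        tx.send(vec![0u8; 1 << 20]).unwrap(); // stand-in for a ~1 MB module
    }
    drop(tx);
    worker.join().unwrap();
}
```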

view this post on Zulip simulacrum (Feb 22 2021 at 21:59):

Ah, right, and since we use time as a metric for how to schedule, that can definitely vary between CPUs

view this post on Zulip pnkfelix (Feb 22 2021 at 22:06):

pnkfelix said:

(Caveat: I still need to verify this. I’m doing something else at the moment but hoping to come back to this in a sec.)

hmm, apparently what I remembered as 6 GB on Linux is actually 3.4 GB … so, not as severe as I had thought… :shrug:

view this post on Zulip pnkfelix (Feb 22 2021 at 22:11):

(argh wait let me try again with a clean build… What’s wrong with me…)

view this post on Zulip pnkfelix (Feb 23 2021 at 15:52):

hmm. So some of my clean Linux runs yielded maxrss 3.4 GB for rustc_middle, but one I just did (starting from the same point) yielded 7.2 GB. Argh! I forgot, this more recent run wasn’t passing -j1! Passing -j1 brings it back down to 3.4 GB.

view this post on Zulip pnkfelix (Feb 23 2021 at 15:52):

so that’s not great, in terms of trying to rely on this data

view this post on Zulip pnkfelix (Feb 23 2021 at 15:53):

I suppose a natural next step would be to use a different tool to try to determine what the truth is

view this post on Zulip pnkfelix (Feb 23 2021 at 15:53):

but I’m also wondering: Do I attempt to land this code at this point? The main data I want to get from it is the maxrss reading, but I am now worried that the values are too noisy to be of real use. (Update: less worried after remembering the -j1 issue. Looking into it more…)

