Stream: t-compiler

Topic: excessive disk usage


nikomatsakis (Mar 24 2020 at 19:32, on Zulip):

Has anybody looked into what exactly causes us to use so much disk space when building rustc?

> du -h -s rust-*
25G     rust-0
12G     rust-1
29G     rust-2
57G     rust-3
46G     rust-4
21G     rust-5
15G     rust-6

nikomatsakis (Mar 24 2020 at 19:33, on Zulip):

it seems a bit wild that I regularly run out of disk space on my linux desktop, even though it is quite old and the disk is admittedly "only" 512GB or something

nikomatsakis (Mar 24 2020 at 19:33, on Zulip):

/me tempted to talk about the days of floppy disks and "kids today not appreciating what they have"

Jonas Schievink (Mar 24 2020 at 19:34, on Zulip):

Is that with debuginfo enabled? That tends to take a lot of space

nikomatsakis (Mar 24 2020 at 19:34, on Zulip):

I don't think so, but I should check

nikomatsakis (Mar 24 2020 at 19:34, on Zulip):

in some cases it is doing an incremental build

nikomatsakis (Mar 24 2020 at 19:34, on Zulip):

it might be debuginfo-lines or something

nikomatsakis (Mar 24 2020 at 19:35, on Zulip):

ok I guess debuginfo-level=1

Jonas Schievink (Mar 24 2020 at 19:36, on Zulip):

Ah, line tables only still generates heaps of type descriptions according to @eddyb

nikomatsakis (Mar 24 2020 at 19:36, on Zulip):

hmm ok

nikomatsakis (Mar 24 2020 at 19:36, on Zulip):

interesting

Jonas Schievink (Mar 24 2020 at 19:36, on Zulip):

https://github.com/rust-lang/rust/issues/69074

nikomatsakis (Mar 24 2020 at 19:37, on Zulip):

I guess I could do some experiments and turn off debuginfo

nikomatsakis (Mar 24 2020 at 19:37, on Zulip):

though that'd be pretty useless

nikomatsakis (Mar 24 2020 at 19:37, on Zulip):

from an actual "debug the compiler" POV

nikomatsakis (Mar 24 2020 at 19:38, on Zulip):

nikomatsakis said:

I guess I could do some experiments and turn off debuginfo

in order to verify the hypothesis, I mean

davidtwco (Mar 24 2020 at 19:39, on Zulip):

Most of the space IMO is stuff from old builds - if I delete a directory and re-build, it isn't that large. It's when I've been rebuilding in a directory for a while that there's a build-up.

nikomatsakis (Mar 24 2020 at 19:44, on Zulip):

I was wondering about that, @davidtwco

nikomatsakis (Mar 24 2020 at 19:44, on Zulip):

if we were leaking space

bjorn3 (Mar 24 2020 at 19:46, on Zulip):

Cargo doesn't clean up artifacts compiled using an older compiler, so every time there is a bootstrap bump, disk usage will increase if rustbuild doesn't remove all target dirs.

simulacrum (Mar 24 2020 at 19:47, on Zulip):

(and we indeed do not)

simulacrum (Mar 24 2020 at 19:48, on Zulip):

mostly because people complain when we delete their caches, and we don't know if it's really old (really same reason why cargo doesn't in some sense)

eddyb (Mar 24 2020 at 19:49, on Zulip):

@Jonas Schievink @nikomatsakis FWIW #69080 landed

Jonas Schievink (Mar 24 2020 at 19:50, on Zulip):

ah, nice

eddyb (Mar 24 2020 at 19:50, on Zulip):

went from just under 1GiB (1019MiB) down to 402MiB

eddyb (Mar 24 2020 at 19:50, on Zulip):

(this is one file specifically :P)

nikomatsakis (Mar 25 2020 at 20:01, on Zulip):

simulacrum said:

mostly because people complain when we delete their caches, and we don't know if it's really old (really same reason why cargo doesn't in some sense)

I'm unconvinced by this argument :P

nikomatsakis (Mar 25 2020 at 20:02, on Zulip):

is the idea that people might rebase their working directory to some older branch, basically?

nikomatsakis (Mar 25 2020 at 20:02, on Zulip):

it'd be nice to at least have an option in config.toml

davidtwco (Mar 25 2020 at 20:05, on Zulip):

nikomatsakis said:

it'd be nice to at least have an option in config.toml

or a ./x.py give-me-my-disk-space-back command that could be run when it's an issue.

simulacrum (Mar 25 2020 at 20:07, on Zulip):

@nikomatsakis I'm also unconvinced -- but I would strongly prefer that the disk space clearing on compiler change was implemented in cargo, not rustbuild

simulacrum (Mar 25 2020 at 20:07, on Zulip):

i.e. if we feel that it should have different behavior, then that seems like a change for cargo

nikomatsakis (Mar 25 2020 at 20:08, on Zulip):

it seems equally important there

nikomatsakis (Mar 25 2020 at 20:08, on Zulip):

I agree

nikomatsakis (Mar 25 2020 at 20:08, on Zulip):

like, if we expect people to keep up with our releases, we're ..

nikomatsakis (Mar 25 2020 at 20:08, on Zulip):

it feels like we can just have an LRU cache

nikomatsakis (Mar 25 2020 at 20:08, on Zulip):

something like "keep the last N toolchains"

nikomatsakis (Mar 25 2020 at 20:08, on Zulip):

and people can configure N

nikomatsakis (Mar 25 2020 at 20:08, on Zulip):

maybe it is 3 by default or something, idk

simulacrum (Mar 25 2020 at 20:08, on Zulip):

Sure. I think most people would be happy if we even just said N=1

nikomatsakis (Mar 25 2020 at 20:08, on Zulip):

yes, I suspect so

nikomatsakis (Mar 25 2020 at 20:09, on Zulip):

it's got to be the minority that are pinging around between toolchains

nikomatsakis (Mar 25 2020 at 20:09, on Zulip):

but I can imagine that people want stable/nightly

nikomatsakis (Mar 25 2020 at 20:09, on Zulip):

at minimum

simulacrum (Mar 25 2020 at 20:09, on Zulip):

(or perhaps N=1 but within a "channel" i.e. you could share beta/stable/nightly)

nikomatsakis (Mar 25 2020 at 20:09, on Zulip):

so maybe N=2 :)

nikomatsakis (Mar 25 2020 at 20:09, on Zulip):

simulacrum said:

(or perhaps N=1 but within a "channel" i.e. you could share beta/stable/nightly)

yeah, that

centril (Mar 25 2020 at 20:16, on Zulip):

N=2 seems eminently reasonable
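
The "keep the last N toolchains" idea above can be sketched roughly as follows. This is only an illustration of the policy being discussed, not anything Cargo or rustbuild implements; the function names, the toolchain labels, and the `channel_of` helper are all hypothetical.

```python
from collections import defaultdict


def stale_toolchains(last_used, keep=2):
    """Return the toolchain ids to evict, keeping the `keep` most recently used.

    `last_used` maps a (hypothetical) toolchain id to the time it was last
    used, as any comparable number such as a Unix timestamp.
    """
    # Sort newest first; everything past the first `keep` entries is stale.
    by_recency = sorted(last_used, key=last_used.get, reverse=True)
    return by_recency[keep:]


def stale_per_channel(last_used, channel_of, keep=1):
    """Apply the same policy separately within each channel."""
    groups = defaultdict(dict)
    for toolchain, stamp in last_used.items():
        groups[channel_of(toolchain)][toolchain] = stamp
    stale = []
    for members in groups.values():
        stale.extend(stale_toolchains(members, keep))
    return sorted(stale)


# Hypothetical usage data: two recent nightlies and two older stables.
last_used = {
    "nightly-2020-03-24": 300,
    "nightly-2020-03-20": 250,
    "stable-1.42.0": 200,
    "stable-1.41.0": 50,
}

# A global N=2 evicts both stables, because the nightlies are newer.
print(stale_toolchains(last_used, keep=2))
# ['stable-1.42.0', 'stable-1.41.0']

# N=1 per channel keeps one nightly and one stable.
print(stale_per_channel(last_used, lambda t: t.split("-")[0], keep=1))
# ['nightly-2020-03-20', 'stable-1.41.0']
```

With this sample data, a single global N can drop every stable cache when someone mostly uses nightly, which is the case the per-channel variant is meant to avoid.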

Josh Triplett (Mar 27 2020 at 16:12, on Zulip):

How about "keep everything that corresponds to a toolchain still available by rustup or system path"?

Josh Triplett (Mar 27 2020 at 16:12, on Zulip):

If you no longer have the toolchain you're unlikely to want the cache.

simulacrum (Mar 27 2020 at 17:29, on Zulip):

I imagine that would be quite costly for cargo -- you'd need to run all possible toolchains on each "run"

simulacrum (Mar 27 2020 at 17:29, on Zulip):

otherwise e.g. nightly is never going to get cleared out because it's always at the same on-disk location

pnkfelix (Mar 27 2020 at 18:07, on Zulip):

simulacrum said:

I imagine that would be quite costly for cargo -- you'd need to run all possible toolchains on each "run"

by "run all possible toolchains", do you mean invoke them with --version to observe what version of the compiler they correspond to (which sounds lightweight to me, at least for cargo's purposes)?

pnkfelix (Mar 27 2020 at 18:07, on Zulip):

or do you mean actually feed the crate source into each of the toolchains, which does sound absurdly expensive.

simulacrum (Mar 27 2020 at 18:25, on Zulip):

no, I mean just --version

simulacrum (Mar 27 2020 at 18:25, on Zulip):

that takes around 100ms -- very noticeable on no-op builds

simulacrum (Mar 27 2020 at 18:25, on Zulip):

cargo does a lot to try to avoid that

simulacrum (Mar 27 2020 at 18:28, on Zulip):

well okay locally I'm seeing 25ms, but that's still quite costly

pnkfelix (Mar 27 2020 at 18:32, on Zulip):

/me wonders if rustup could keep this info in a separate file, locally

pnkfelix (Mar 27 2020 at 18:33, on Zulip):

then you'd just pay to read from one file, rather than invoking a bunch of commands.

pnkfelix (Mar 27 2020 at 18:33, on Zulip):

(but it would also be an extra source of complexity and potential inconsistency within rustup...)
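
One way to picture the "pay to read a cache rather than invoke a bunch of commands" idea is a lookup that only re-runs the expensive probe when the compiler binary on disk has changed. This is a minimal sketch under stated assumptions: the cache is an in-memory dict rather than a file maintained by rustup, the staleness check (modification time plus size) is an assumption of this example, and `cached_version` and `probe_with_rustc` are hypothetical names.

```python
import os
import subprocess


def cached_version(path, cache, probe):
    """Return the version string for the compiler at `path`.

    `probe` is only called when the file's modification time or size differs
    from what was recorded in `cache` (a plain dict in this sketch).
    """
    stat = os.stat(path)
    stamp = (stat.st_mtime_ns, stat.st_size)
    entry = cache.get(path)
    if entry is not None and entry[0] == stamp:
        return entry[1]  # cache hit: no process is spawned
    version = probe(path)
    cache[path] = (stamp, version)
    return version


def probe_with_rustc(path):
    """The expensive path: actually run `<path> --version`."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Because the check looks at the file rather than its location, a nightly that is replaced in place at the same path would still be noticed, at the cost of one `stat` call per toolchain.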

simulacrum (Mar 27 2020 at 18:47, on Zulip):

yeah, I guess, though it wouldn't help in rustbuild (where we constantly swap out e.g. stage1 compilers)

simulacrum (Mar 27 2020 at 18:48, on Zulip):

and relatively frequently -- every 6 weeks at least -- get a new bootstrap compiler

mark-i-m (Mar 27 2020 at 20:48, on Zulip):

Personally, I would be fine with some sort of cargo clean --cache that just does whatever the expensive thing is and aggressively clears out things it doesn't think we will need later

Eric Huss (Mar 28 2020 at 17:45, on Zulip):

We have an issue for this, and I sketched a solution here: https://github.com/rust-lang/cargo/issues/5026#issuecomment-444967785. The idea is to use a single hash for the "nightly" channel (and "beta"), so there wouldn't need to be any cleanup when you switch nightly versions, and Cargo would keep the full version in the fingerprint so it would get recompiled when you get a new nightly. I was concerned mostly that this would disrupt some workflows, but I can't think of any specific problems (not sure if people swap between different nightlies). I could try it out if you'd like (maybe a -Z flag)? I also wasn't sure if rustc's incremental files somehow know which version they are for, or if that directory would also grow without bounds. (maybe one of the smart people here can answer that)
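
The split described above, one hash for where artifacts live and a separate fingerprint for whether they are up to date, can be illustrated with a toy model. This is not Cargo's actual hashing: the version-string format, the `channel_of` parsing, and the function names are assumptions made for the example.

```python
import hashlib


def channel_of(version):
    """Crude channel detection; assumes strings like '1.44.0-nightly (...)'."""
    if "-nightly" in version:
        return "nightly"
    if "-beta" in version:
        return "beta"
    return "stable"


def _short_hash(*parts):
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:16]


def filename_hash(package, version):
    """Hash used to name output files.

    For nightly and beta only the channel name goes in, so a new nightly
    reuses (and overwrites) the previous nightly's files.
    """
    channel = channel_of(version)
    key = channel if channel in ("nightly", "beta") else version
    return _short_hash(package, key)


def fingerprint(package, version):
    """Freshness check: always includes the full version string."""
    return _short_hash(package, version)


old = "rustc 1.44.0-nightly (aaaaaaa 2020-03-24)"  # made-up version strings
new = "rustc 1.44.0-nightly (bbbbbbb 2020-03-25)"

print(filename_hash("syn", old) == filename_hash("syn", new))  # True
print(fingerprint("syn", old) == fingerprint("syn", new))      # False
```

In this model, switching nightlies changes the fingerprint, so the package is rebuilt, but the rebuilt files land on the same names instead of accumulating next to the old ones.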

simulacrum (Mar 28 2020 at 21:43, on Zulip):

@Eric Huss IIRC, I've been historically told that incremental is not guaranteed to work, but should be able to auto-detect itself being stale. I imagine if it's just a -Z flag then I think we can shake out problems as we go.

nikomatsakis (Mar 30 2020 at 22:07, on Zulip):

I believe the toolchain info is in the hash

nikomatsakis (Mar 30 2020 at 22:07, on Zulip):

but if not, we ought to be able to fix that

Russell Cohen (Mar 30 2020 at 23:41, on Zulip):

off topic ish, but as a newish person contributing to rustc, it wasn't super fun when I ran out of disk space and my computer stopped working. Incremental builds seemed to massively magnify the issue -- it's called out in forge, but I at least didn't expect "a lot of disk space" to be >35GB

Russell Cohen (Mar 30 2020 at 23:42, on Zulip):

could be worth calling it out louder in the docs, or even having x.py monitor how much disk space you've used and warn / abort if you're running out
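
A "warn / abort if you're running out" check could look something like the sketch below. It is not an existing x.py feature; the function name and the 10 GiB default threshold are hypothetical, and only the standard library's `shutil.disk_usage` is relied on.

```python
import shutil


def check_free_space(path, min_free_gib=10.0):
    """Return (ok, free_gib) for the filesystem that contains `path`.

    `ok` is False when less than `min_free_gib` GiB remain. The default
    threshold is an arbitrary value chosen for this sketch.
    """
    free_gib = shutil.disk_usage(path).free / 2**30
    return free_gib >= min_free_gib, free_gib


ok, free_gib = check_free_space(".")
if not ok:
    print(f"warning: only {free_gib:.1f} GiB free; the build may run out of disk space")
```

A build script could run such a check before starting and either print the warning or stop, depending on how strict it wants to be.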

eddyb (Mar 31 2020 at 04:46, on Zulip):

on Linux at least, my desktop environment does that, and I think recent Windows versions have started doing it too

eddyb (Mar 31 2020 at 04:46, on Zulip):

it's worse on servers, and I have had that happen

eddyb (Mar 31 2020 at 04:48, on Zulip):

my non-incremental build dirs are 4GB

eddyb (Mar 31 2020 at 04:49, on Zulip):

my incremental ones... lol the worst one is 76GB

eddyb (Mar 31 2020 at 04:50, on Zulip):

oh wait no there's another difference: the non-incremental ones also don't have debuginfo or debug-assertions (they're basically nightly-like)

eddyb (Mar 31 2020 at 04:52, on Zulip):

hmm debuginfo+debug-assertions build seems to have 20GB in stage0-rustc outside of the incremental dir

eddyb (Mar 31 2020 at 04:53, on Zulip):

it's gonna be a while before https://github.com/rust-lang/rust/pull/69080 reaches beta

eddyb (Mar 31 2020 at 04:53, on Zulip):

kind of sad, I should've gotten it merged when I opened it

eddyb (Mar 31 2020 at 04:59, on Zulip):

hmm looking in the incremental dir, I have up to 4 copies for some crates

eddyb (Mar 31 2020 at 05:02, on Zulip):

@nikomatsakis oh this is that silly thing where incremental keeps a duplicate of the .o files in the rlib...

eddyb (Mar 31 2020 at 05:02, on Zulip):

we should've stopped doing that years ago

eddyb (Mar 31 2020 at 05:03, on Zulip):

and here I was thinking there was 1GB of librustc_middle and 1GB of librustc_mir cached query data..

eddyb (Mar 31 2020 at 05:05, on Zulip):

134M rust-1/build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/incremental/rustc_middle-vbnouhhfsdso/s-fm1o95587b-13ujuvc-tq0tgxcz60na/dep-graph.bin
116M rust-1/build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/incremental/rustc_middle-vbnouhhfsdso/s-fm1o95587b-13ujuvc-tq0tgxcz60na/query-cache.bin

eddyb (Mar 31 2020 at 05:06, on Zulip):

so only about 1/5 of the directory is actually not duplicated

eddyb (Mar 31 2020 at 05:10, on Zulip):

across the entire incremental dir, which has up to 4 stale copies by the looks of it, only 4.5G of 29G are .bin files
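
A measurement like the one above (how much of the incremental directory is .bin files versus everything else) can be reproduced with a short script. This is a sketch, not the command that was actually used; the function names are hypothetical, and it sums apparent file sizes, which can differ slightly from what `du` reports.

```python
import os
from collections import Counter


def bytes_by_extension(root):
    """Sum file sizes under `root`, grouped by file extension."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not count symlink targets
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext] += os.path.getsize(path)
    return totals


def share(totals, ext):
    """Fraction of all counted bytes that belongs to extension `ext`."""
    total = sum(totals.values())
    return totals[ext] / total if total else 0.0
```

Pointing `bytes_by_extension` at an `incremental` directory and calling `share(totals, ".bin")` gives the fraction taken by the dep-graph and query-cache files; the remainder is mostly the object files that are duplicated elsewhere.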

eddyb (Mar 31 2020 at 05:15, on Zulip):

there's also https://github.com/rust-lang/rust/issues/66961

pnkfelix (Mar 31 2020 at 13:32, on Zulip):

eddyb said:

it's gonna be a while before https://github.com/rust-lang/rust/pull/69080 reaches beta

maybe we should consider backporting it...

eddyb (Mar 31 2020 at 14:00, on Zulip):

feel free to nominate it, I guess

mark-i-m (Apr 01 2020 at 16:43, on Zulip):

Just ran into this again, btw:

86G     ./rust
91G     ./rust2
75G     ./rust3
$ du -hsc ./build/*
7.2G    ./build/bootstrap
2.0G    ./build/cache
6.4M    ./build/tmp
188K    ./build/tmp-dry-run
812K    ./build/tmp-rustbuild-tests
66G     ./build/x86_64-unknown-linux-gnu
75G     total
$ du -hsc ./build/x86_64-unknown-linux-gnu/*
4.0K    ./build/x86_64-unknown-linux-gnu/compiler-doc
225M    ./build/x86_64-unknown-linux-gnu/crate-docs
112M    ./build/x86_64-unknown-linux-gnu/doc
1.4G    ./build/x86_64-unknown-linux-gnu/llvm
1.6M    ./build/x86_64-unknown-linux-gnu/md-doc
44K     ./build/x86_64-unknown-linux-gnu/native
377M    ./build/x86_64-unknown-linux-gnu/stage0
1.9G    ./build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools
12G     ./build/x86_64-unknown-linux-gnu/stage0-rustc
214M    ./build/x86_64-unknown-linux-gnu/stage0-std
145M    ./build/x86_64-unknown-linux-gnu/stage0-sysroot
2.3G    ./build/x86_64-unknown-linux-gnu/stage0-tools
4.0K    ./build/x86_64-unknown-linux-gnu/stage0-tools-bin
5.2G    ./build/x86_64-unknown-linux-gnu/stage1
29G     ./build/x86_64-unknown-linux-gnu/stage1-rustc
70M     ./build/x86_64-unknown-linux-gnu/stage1-std
2.0G    ./build/x86_64-unknown-linux-gnu/stage1-tools
145M    ./build/x86_64-unknown-linux-gnu/stage2
61M     ./build/x86_64-unknown-linux-gnu/stage2-std
600M    ./build/x86_64-unknown-linux-gnu/stage2-tools
4.0K    ./build/x86_64-unknown-linux-gnu/stage2-tools-bin
11G     ./build/x86_64-unknown-linux-gnu/test
66G     total

mark-i-m (Apr 01 2020 at 16:43, on Zulip):

I'm glad to provide more data if it's helpful

eddyb (Apr 01 2020 at 16:43, on Zulip):

that's similar to what I'm seeing

eddyb (Apr 01 2020 at 16:44, on Zulip):

if you nuke non-LLVM dirs you can recover some of the space usage

eddyb (Apr 01 2020 at 16:44, on Zulip):

I usually use rm -rf build/*/stage* for this
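
For anyone nervous about running that glob blind, the same selection can be previewed first. The sketch below mirrors the `build/*/stage*` pattern and only deletes when asked to; the function names and the `dry_run` parameter are hypothetical, not part of x.py.

```python
import glob
import os
import shutil


def stage_dirs(build_root="build"):
    """List directories matching build/*/stage*, as in the rm command above."""
    pattern = os.path.join(build_root, "*", "stage*")
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))


def clean_stage_dirs(build_root="build", dry_run=True):
    """Return the matching directories and delete them unless `dry_run` is set."""
    targets = stage_dirs(build_root)
    if not dry_run:
        for path in targets:
            shutil.rmtree(path)
    return targets
```

In a listing like the one earlier in the thread, this matches stage0-rustc, stage1-rustc and the other stage* directories while leaving llvm untouched, so the LLVM build does not have to be redone.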

eddyb (Apr 05 2020 at 19:39, on Zulip):

almost forgot about the duplication between incremental and regular artifacts, opened #70823 just now

Eric Huss (Apr 05 2020 at 21:30, on Zulip):

I posted https://github.com/rust-lang/cargo/pull/8073 to use the same filenames between nightly versions. From my tests, I think it should be safe, though it can always be reverted or put behind a config option if it is a problem. Ofc this won't help rustc development until the next release.

eddyb (Apr 05 2020 at 21:32, on Zulip):

@Eric Huss will Cargo still rebuild artifacts correctly?

eddyb (Apr 05 2020 at 21:32, on Zulip):

i.e. does it still use the full version elsewhere?

Eric Huss (Apr 05 2020 at 21:32, on Zulip):

Yes, the full version is in the fingerprint.

eddyb (Apr 05 2020 at 21:32, on Zulip):

does your PR pass the same -C metadata for all nightlies?

Eric Huss (Apr 05 2020 at 21:32, on Zulip):

yes

eddyb (Apr 05 2020 at 21:33, on Zulip):

because uhhh this might fix a perf nightmare in a simpler way, no -Z build-std hacks

Eric Huss (Apr 05 2020 at 21:33, on Zulip):

what is that?

eddyb (Apr 05 2020 at 21:33, on Zulip):

@Eric Huss https://github.com/rust-lang/rust/issues/69060#issuecomment-604928032

eddyb (Apr 05 2020 at 21:33, on Zulip):

the compiler is non-deterministic in -C metadata content

eddyb (Apr 05 2020 at 21:34, on Zulip):

because crate hashes are sort of the "roots" of all hashes and they are used to sort things in incrementally-stable ways

eddyb (Apr 05 2020 at 21:35, on Zulip):

so when we compare two compilers, the simple fact that they accurately report their versions is enough to cause, through Cargo, up to 3% or so noise on syn-opt, and a bit less on other crates

eddyb (Apr 05 2020 at 21:35, on Zulip):

results are much more stable if the compiler is built with the dev instead of nightly channel

Eric Huss (Apr 05 2020 at 21:41, on Zulip):

Hm. Well let me know if it causes a problem. I'm not sure if it is relevant to what you're talking about, but I also noticed that the incremental cache fingerprint includes the full release version (with git hash). That could be part of why -dev works better (because it doesn't have a git hash).

eddyb (Apr 05 2020 at 21:42, on Zulip):

@Eric Huss nah we know it's -C metadata

eddyb (Apr 05 2020 at 21:43, on Zulip):

without it there's no change in hashes inside the compiler and no order differences

eddyb (Apr 05 2020 at 21:45, on Zulip):

the version the compiler knows doesn't flow into any hashes AFAIK, it's just used for sanity checks

eddyb (Apr 05 2020 at 21:50, on Zulip):

@Eric Huss anyway whenever we start shipping a Cargo with your changes, perf.rust-lang.org should stop being so noisy

Last update: Jun 04 2020 at 17:05UTC