Stream: t-compiler

Topic: internalize_symbols and incremental artifacts #59535


pnkfelix (Nov 12 2019 at 15:40, on Zulip):

hey @mw and/or @Alex Crichton , I wanted to bounce some thoughts off of you two

pnkfelix (Nov 12 2019 at 15:40, on Zulip):

in particular, I wanted to talk about a theory I put forth here.

pnkfelix (Nov 12 2019 at 15:42, on Zulip):

We currently make decisions about whether to change a symbol's Linkage from External to Internal based on the set of accessors we currently observe. However, I theorize that old incremental codegen artifacts may be relying on a symbol's previous classification as External

pnkfelix (Nov 12 2019 at 15:43, on Zulip):

so I was wondering: can/should we have a rule that if a symbol was considered External in any previous compile, then an incremental compilation is forced to keep it External?

pnkfelix (Nov 12 2019 at 16:05, on Zulip):

(this is not the only possible solution to the theorized problem. For example, instead of constraining which symbols can be marked Internal based on previous compiles, we could instead force the codegen artifacts to be recompiled rather than reused; but doing this well would require determiniing which codegen artifacts depend on the symbols previously marked as External, and I do not know how hard that is to implement.)

Alex Crichton (Nov 12 2019 at 16:46, on Zulip):

mw will probably have more to say about this than I, but at least from my perspective I would say that artifacts from a previous compile should only ever accelerate future compiles, never actually change the future compile

Alex Crichton (Nov 12 2019 at 16:46, on Zulip):

so that way no matter what your cache looks like cargo build always produces the same thing

Alex Crichton (Nov 12 2019 at 16:46, on Zulip):

so I'd probably say that this should lean on the side of recompiling more rather than keeping something External

Alex Crichton (Nov 12 2019 at 16:46, on Zulip):

although if that's too onerous, then we could also perhaps start exporting more symbols in incremental mode

Zoxc (Nov 13 2019 at 01:55, on Zulip):

However, I theorize that old incremental codegen artifacts may be relying on a symbol's previous classification as External

Old artifacts must not be used if they are not the same as the artifacts produced by the current compilation session, so what the previous classification was should not matter. In theory at least, there might be bugs in the compiler.

pnkfelix (Nov 13 2019 at 04:12, on Zulip):

my current inclination is to try turning off the internalize_symbols optimization if we have incremental compilation turned on...

pnkfelix (Nov 13 2019 at 04:14, on Zulip):

(Though I worry that may cause too great a regression in code quality, especially code size...)

pnkfelix (Nov 13 2019 at 04:17, on Zulip):

One question related to this: Am I right that LLVM could be inlining a function that comes from a different codegen unit in the current crate? To be honest that part of the story does not match my mental model of how LLVM optimization works.

pnkfelix (Nov 13 2019 at 04:21, on Zulip):

Hmm: is there some #[rustc_attribute] I can use to force items to be assigned to particular codegen units? That would help me write a test case that would allow exploring the behavior here without relying on the oddities of this particular test.

mw (Nov 13 2019 at 10:57, on Zulip):

/me is busy at the moment. Will take a closer look on Friday.

pnkfelix (Nov 13 2019 at 11:22, on Zulip):

One question related to this: Am I right that LLVM could be inlining a function that comes from a different codegen unit in the current crate? To be honest that part of the story does not match my mental model of how LLVM optimization works.

further update on this detail: I think the inlining here is occurring during LTO

pnkfelix (Nov 13 2019 at 11:54, on Zulip):

Hmm: is there some #[rustc_attribute] I can use to force items to be assigned to particular codegen units? That would help me write a test case that would allow exploring the behavior here without relying on the oddities of this particular test.

(maybe getting this effect is supposed to be as simple as choosing the right set of mod items. still looking; but I think an explicit attribute could potentially be a good thing.)

pnkfelix (Nov 14 2019 at 09:35, on Zulip):

I left more notes in the bug (#59535). I now better understand what's happening. Part of the problem is that we are reusing the post-ThinLTO object file for a codegen-unit (and that's what holds the result of inlining fn B from a different codegen-unit); so fn B's call to fn A has been inlined into that object file. But independently, due to changes elsewhere in the source code, we reclassified fn A as internal, and subsequently optimized away the definition of fn A.

pnkfelix (Nov 14 2019 at 09:38, on Zulip):

I am looking over lto.rs now, and also trying to understand the LLVM LTO docs. One oddity is that the erroneously reused codegen unit claims it does not import from any modules; but surely the call to the other function should be considered an import?

mw (Nov 15 2019 at 16:13, on Zulip):

Argh, sorry I didn't get to this before the weekend ...

mw (Nov 15 2019 at 16:16, on Zulip):

I think there is a small bug somewhere here, as opposed to a conceptual problems because otherwise we'd see much more of this

mw (Nov 15 2019 at 16:17, on Zulip):

and I remember that I ran into problems immediately when not handling symbol internalization properly while initially implementing incremental ThinLTO

pnkfelix (Nov 15 2019 at 16:18, on Zulip):

yes I am working my way gradually towards the camp of "small bug somewhere" rather than "conceptual problem"

pnkfelix (Nov 15 2019 at 16:18, on Zulip):

it just took me a while to put all the pieces together

pnkfelix (Nov 15 2019 at 16:19, on Zulip):

at this point I'm not sure whether the small bug is on our end or within LLVM.

mw (Nov 15 2019 at 16:19, on Zulip):

yeah, it's non trivial and should be better documented

pnkfelix (Nov 15 2019 at 16:19, on Zulip):

I was pretty sure for a while it was our fault, but then when I was trying to walk through some of the execution, the behavior I saw from LLVM was pretty strange

mw (Nov 15 2019 at 16:19, on Zulip):

I'll reserved a 2h time slot for this on Tuesday

pnkfelix (Nov 15 2019 at 16:19, on Zulip):

anyway don't worry about delaying this

mw (Nov 15 2019 at 16:20, on Zulip):

:+1:

pnkfelix (Nov 15 2019 at 16:20, on Zulip):

I spent a long time on it because I know how painful incremetnal bugs are, but also, we have so many that we don't have reproducible test cases for

pnkfelix (Nov 15 2019 at 16:20, on Zulip):

so the fact that we could actually reproduce this one got me very excited.

mw (Nov 15 2019 at 16:20, on Zulip):

:D

pnkfelix (Nov 15 2019 at 16:20, on Zulip):

perhaps over enthusiastically so

mw (Nov 19 2019 at 12:20, on Zulip):

this is incredibly weird :)

mw (Nov 19 2019 at 12:20, on Zulip):

I've been able to reproduce with doit2.sh from https://github.com/pnkfelix/rust-issue-59535

mw (Nov 19 2019 at 12:21, on Zulip):

but I have no clue what's going on yet

mw (Nov 19 2019 at 12:21, on Zulip):

it looks like this has nothing to do with ThinLTO

mw (Nov 19 2019 at 12:29, on Zulip):

the volatile cgu for rubble::ble declares <rubble[317d481089b8c8fe]::ble::Duration as core[5e7ea8cd6850b425]::fmt::Display>::fmt in the initial version

mw (Nov 19 2019 at 12:30, on Zulip):

in the next version it should declare <rubble[317d481089b8c8fe]::ble::Duration as core[5e7ea8cd6850b425]::fmt::Debug>::fmt but the CGU is exactly the same version as initially

mw (Nov 19 2019 at 12:32, on Zulip):

the internalizer correctly makes the symbol for <rubble[317d481089b8c8fe]::ble::Duration as core[5e7ea8cd6850b425]::fmt::Display>::fmt internal, but it looks like the volatile object file for rubble::ble is not recompiled, although the MIR it is generated from should have changed.

mw (Nov 19 2019 at 12:33, on Zulip):

I'm trying to debug further but for that I need a local build of rustc that can handle the thumbv7em-none-eabi target

mw (Nov 19 2019 at 12:34, on Zulip):

how hard can that be :P

pnkfelix (Nov 19 2019 at 12:42, on Zulip):

Oh I should have warned you about that target dependence

pnkfelix (Nov 19 2019 at 12:42, on Zulip):

what did I end up having to do there ...

pnkfelix (Nov 19 2019 at 12:43, on Zulip):

I'm curious you say the MIR should have changed. I could have sworn that I validated that the inlining in question was happening during a LTO optimization pass

pnkfelix (Nov 19 2019 at 12:44, on Zulip):

regarding the target dependence, I know that I installed thumbv7em-none-eabi via rustup, but that's not the local build

mw (Nov 19 2019 at 12:45, on Zulip):

if I compile v2 with an empty cache, the volatile cgu does not reference the function in question anymore

mw (Nov 19 2019 at 12:45, on Zulip):

it references a different function instead

mw (Nov 19 2019 at 12:45, on Zulip):

and everything works

mw (Nov 19 2019 at 12:46, on Zulip):

but let me take another look...

pnkfelix (Nov 19 2019 at 12:47, on Zulip):

okay, this is how I built: time python /home/pnkfelix/Dev/Mozilla/rust.git/x.py dist --stage 2 --target thumbv7em-none-eabi

pnkfelix (Nov 19 2019 at 12:47, on Zulip):

though that might need to be preceded by a python x.py build --stage 2 --target thumbv7em-none-eabi

pnkfelix (Nov 19 2019 at 12:48, on Zulip):

(for some reason I vaguely recall being unhappy about my attempts to go directly to dist)

mw (Nov 19 2019 at 12:49, on Zulip):

I'm getting thread 'main' panicked at 'All the *-none-* and nvptx* targets are no-std targets', src/bootstrap/sanity.rs:186:17 with that

mw (Nov 19 2019 at 12:50, on Zulip):

regarding the obj file, already the "no-opt" version of the LLVM IR should look different, so this is before ThinLTO, unless I'm overlooking something

pnkfelix (Nov 19 2019 at 12:50, on Zulip):

hmm.

pnkfelix (Nov 19 2019 at 12:52, on Zulip):

when you say "the volatile cgu", is that referring to mks4f5lxe2i4sqc ? (I am assuming the names are stable between your platform and my own, but perhaps that is an invalid assumption)

mw (Nov 19 2019 at 12:53, on Zulip):

I'm compiling with -Zhuman-readable-cgu-names, which gives me names corresponding to the module path

pnkfelix (Nov 19 2019 at 12:53, on Zulip):

ah hmm

mw (Nov 19 2019 at 12:53, on Zulip):

e.g. rubble.rubble.7rcbfp3g-ble.volatile.rcgu.bc

pnkfelix (Nov 19 2019 at 12:53, on Zulip):

didn't know about that flag

mw (Nov 19 2019 at 12:54, on Zulip):

the above can be decoded as rubble[7rcbfp3g]::ble

mw (Nov 19 2019 at 12:54, on Zulip):

where the version with volatile will contain the code corresponding to generic instances

mw (Nov 19 2019 at 12:55, on Zulip):

and the one without volatile will contain the code for non-generic stuff from that module

mw (Nov 19 2019 at 13:04, on Zulip):

I don't understand the no_std logic in x.py

mw (Nov 19 2019 at 13:10, on Zulip):

@simulacrum or @Pietro Albini, do you happen to know how I can locally build a rustc that can target thumbv7em-none-eabi?

simulacrum (Nov 19 2019 at 13:10, on Zulip):

should be able to x.py build src/libstd --target thumbv7em-none-eabi

simulacrum (Nov 19 2019 at 13:10, on Zulip):

unless you need C compiler or something

mw (Nov 19 2019 at 13:10, on Zulip):

x.py dist --stage 2 --target thumbv7em-none-eabi gives me thread 'main' panicked at 'All the *-none-* and nvptx* targets are no-std targets', src/bootstrap/sanity.rs:186:17

mw (Nov 19 2019 at 13:11, on Zulip):

maybe if I delete my config.toml?

pnkfelix (Nov 19 2019 at 13:11, on Zulip):

should be able to x.py build src/libstd --target thumbv7em-none-eabi

I know this was not sufficient for me

simulacrum (Nov 19 2019 at 13:11, on Zulip):

you can also wait for a while and just use dist-various-1

simulacrum (Nov 19 2019 at 13:11, on Zulip):

that'll definitely work

pnkfelix (Nov 19 2019 at 13:11, on Zulip):

I definitely had to do something beyond x.py build to get a libstd built (and put into a working location) for the target

simulacrum (Nov 19 2019 at 13:11, on Zulip):

./src/ci/docker/run.sh dist-various-1

mw (Nov 19 2019 at 13:13, on Zulip):

Re-running x.py build src/libstd --target thumbv7em-none-eabi without a config.toml now ...

mw (Nov 19 2019 at 13:13, on Zulip):

(I think) I have the correct GCC arm toolchain installed already

simulacrum (Nov 19 2019 at 13:13, on Zulip):

we don't seem to do anything special in that docker container so I would expect that to work

simulacrum (Nov 19 2019 at 13:14, on Zulip):

https://github.com/rust-lang/rust/blob/a0d40f8bdfcc3c28355467973f97fd4c45ac5876/src/ci/docker/dist-various-1/Dockerfile#L170

pnkfelix (Nov 19 2019 at 13:16, on Zulip):

I'll try a clean build with x.py build src/libstd --target thumbv7em-none-eabi to try to get a better record of what went wrong there. I had assume it was user error back when I found that to be insufficient.

mw (Nov 19 2019 at 13:18, on Zulip):

/me is still waiting for build ...

mw (Nov 19 2019 at 13:18, on Zulip):

(I'm glad I have a rather snappy desktop machine available)

mw (Nov 19 2019 at 13:20, on Zulip):

OK, it failed with something different

mw (Nov 19 2019 at 13:20, on Zulip):

7 mins for a stage 2 build is not bad though!

mw (Nov 19 2019 at 13:21, on Zulip):
cargo:warning=/xoxo/3-rust/src/llvm-project/compiler-rt/lib/builtins/int_util.c:57:10: fatal error: stdlib.h: No such file or directory
cargo:warning=   57 | #include <stdlib.h>
cargo:warning=      |          ^~~~~~~~~~
cargo:warning=compilation terminated.
simulacrum (Nov 19 2019 at 13:25, on Zulip):

hm that is ... unexpected

simulacrum (Nov 19 2019 at 13:25, on Zulip):

dist-various-1 would be my recommendation (you can comment a bunch in the relevant dockerfile to get it faster)

mw (Nov 19 2019 at 13:29, on Zulip):

I had to install arm-none-eabi-newlib on my system, now it works

mw (Nov 19 2019 at 13:30, on Zulip):

why would the GCC package not include that? weird

mw (Nov 19 2019 at 13:30, on Zulip):

now I need to get rust-lld working :(

pnkfelix (Nov 19 2019 at 13:32, on Zulip):

Is that not just a rustup thing?

pnkfelix (Nov 19 2019 at 13:32, on Zulip):

you are really making wish I had kept a full transcript of my full debugging session...

mw (Nov 19 2019 at 13:41, on Zulip):

using rust-lld for the rustup's nightly now fails compiling the support libraries... frustrating

mw (Nov 19 2019 at 13:51, on Zulip):

now my reproduction script seems broken even with nightly :(

mw (Nov 19 2019 at 13:51, on Zulip):

(btw, thanks for you help, @simulacrum!)

mw (Nov 19 2019 at 14:07, on Zulip):

Hm, so the thing that changes is <&rubble[317d481089b8c8fe]::ble::Duration as core[5e7ea8cd6850b425]::fmt::Debug>::fmt. Is this maybe something completely compiler generated for which we never instantiate any MIR?

mw (Nov 19 2019 at 14:07, on Zulip):
; Function Attrs: nounwind optsize
define hidden zeroext i1 @_RNvXsL_NtCs86ZAEM1TFWr_4core3fmtRNtNtCs4fqI2P2rA04_6rubble3ble8DurationNtB5_5Debug3fmtBz_(i32** noalias nocapture readonly align 4 dereferenceable(4), %"core::fmt::Formatter"* align 4 dereferenceable(52)) unnamed_addr #0 {
  %3 = load i32*, i32** %0, align 4, !nonnull !0
  %4 = tail call zeroext i1 @_RNvXs_NtCs4fqI2P2rA04_6rubble3bleNtB4_8DurationNtNtCs86ZAEM1TFWr_4core3fmt5Debug3fmt(i32* noalias nonnull readonly align 4 dereferenceable(4) %3, %"core::fmt::Formatter"* nonnull align 4 dereferenceable(52) %1)
  ret i1 %4
}
mw (Nov 19 2019 at 14:08, on Zulip):

Might be a shim that incr. comp. is somehow missing?

mw (Nov 19 2019 at 14:08, on Zulip):

/me needs to run now unfortunately

mw (Nov 21 2019 at 10:28, on Zulip):

@pnkfelix I suggest trying to find out where the above LLVM IR is generated and what MIR it is generated from if any. You can use -Zprint-mono-items=lazy to get additional information.

pnkfelix (Nov 21 2019 at 10:28, on Zulip):

okay thanks

pnkfelix (Nov 21 2019 at 10:29, on Zulip):

I did have a question for you

pnkfelix (Nov 21 2019 at 10:29, on Zulip):

related to this

pnkfelix (Nov 21 2019 at 10:30, on Zulip):

but I don't think I'm going to be able to re-establish the context for my Q quickly enough...

pnkfelix (Nov 21 2019 at 10:31, on Zulip):

the big picture of the Question was that I had tracked down a spot in the rust <-> LLVM interactions

pnkfelix (Nov 21 2019 at 10:31, on Zulip):

where Rust was querying LLVM for a list of the modules that a given cgu depended upon

pnkfelix (Nov 21 2019 at 10:31, on Zulip):

and I remember being surprised to see an empty list for the case in question, something where the given cgu was making a call out to a function in another cgu

pnkfelix (Nov 21 2019 at 10:32, on Zulip):

So that was what I had wanted to ask you about, the expected semantics of what that list contained

pnkfelix (Nov 21 2019 at 10:32, on Zulip):

but now I cannot find any of the details quickly. (I restarted the relevant emacs sessions so that I could reboot the machine, and this effectively obliterates my brain's working memory.)

pnkfelix (Nov 21 2019 at 10:39, on Zulip):

ah, wait, thank goodness, my working directory still has my instrumentation in it

pnkfelix (Nov 21 2019 at 10:41, on Zulip):

rustc_codegen_llvm::back::lto is the relevant module in rustc

pnkfelix (Nov 21 2019 at 10:42, on Zulip):

and I added some code to instrument this loop: https://github.com/rust-lang/rust/blob/f1b882b55805c342e46ee4ca3beeef1d1fa2044b/src/librustc_codegen_llvm/back/lto.rs#L509

pnkfelix (Nov 21 2019 at 10:43, on Zulip):

but of course I have since lost my build of the libcore for the thumbv7em target. :sad:

pnkfelix (Nov 21 2019 at 10:45, on Zulip):

well, I guess my question is still relevant, nonetheless: should we expect import_map.modules_imported_by(module_name) to include modules that hold the definitions for functions being called from module_name ? Or do I misunderstand what "imported_by" is meant to mean here?

pnkfelix (Nov 21 2019 at 10:46, on Zulip):

(because it is just those sorts of functions, undergoing changes, that I would expect would lead us to conclude that we cannot reuse the post-ThinLTO-optimized artifact.)

mw (Nov 21 2019 at 12:20, on Zulip):

iirc, import_map.modules_imported_by(module_name) should contain all LLVM modules that ThinLTO actually pulled definitions in from. So it is ThinLTO specific. There is another "import step" that is done already in rustc: All functions that are marked as #[inline] are instantiated in all LLVM module that contain calls to them.

mw (Nov 21 2019 at 12:22, on Zulip):

so there can be functions that get "inlined across modules" before ThinLTO is invoked.

mw (Nov 21 2019 at 12:23, on Zulip):

but in that case we actually don't inline across module boundaries, we just instantiate internal copies of a given function in each calling module.

mw (Nov 21 2019 at 12:24, on Zulip):

this functionality comes from a time before there was ThinLTO and it yields better results than ThinLTO, so we had to keep it.

pnkfelix (Nov 21 2019 at 12:37, on Zulip):

The reason I asked the question is that my (old, but still possibly current) hypothesis was that ThinLTO was doing an inlining step here that is subsequently invalidated by a later incremental compile

pnkfelix (Nov 21 2019 at 12:38, on Zulip):

and that we fail to deduce that we cannot reuse the post-ThinLTO-optimized object because the import_map is failing to tell us about the dependence

pnkfelix (Nov 21 2019 at 12:39, on Zulip):

But I need to create my original work environment before I can provide better information

mw (Nov 21 2019 at 12:50, on Zulip):

hypothesis was that ThinLTO was doing an inlining step here that is subsequently invalidated by a later incremental compile

that's not what I found. it would be good to verify that you are really seeing something different than I.

pnkfelix (Nov 22 2019 at 12:13, on Zulip):

@mw I want to spend a little bit of time on this, to try to get us in sync on our mutual understanding of the problem

pnkfelix (Nov 22 2019 at 12:14, on Zulip):

I can either: 1. try to recreate the steps I followed previously that led me to my earlier hypothesis, spelling out my thinking as I go in more detail. Or I could: 2. try to reproduce your result and in the process understand your thinking

pnkfelix (Nov 22 2019 at 12:15, on Zulip):

I do want to at least confirm: You did get to a point eventually where you were yourself reproducing the whole problem locally, right?

pnkfelix (Nov 22 2019 at 12:16, on Zulip):

(it seems like you must have, based on the statements you were making. but I just want to be certain that you weren't drawing conclusions based on reading stuff I had posted.)

mw (Nov 22 2019 at 13:22, on Zulip):

Yes, I was able to reproduce the linker error mentioned in the GH issue:

= note: rust-lld: error: undefined symbol: _$LT$rubble..ble..time..Duration$u20$as$u20$core..fmt..Display$GT$::fmt::h343296729f83afc4
          >>> referenced by time.rs:129 (src/ble/time.rs:129)
          >>>               /home/jonas/dev/rubble/target/thumbv7em-none-eabi/debug/deps/rubble-a38354bf4561073f.3qtzuenakrd0qcaf.rcgu.o:(_$LT$$RF$T$u20$as$u20$core..fmt..Debug$GT$::fmt::h26ac05bd6f4f461e)
mw (Nov 22 2019 at 13:23, on Zulip):

I was able to do so with your reproduction repro, via doit2.sh

mw (Nov 22 2019 at 13:25, on Zulip):

I added -Zsymbol-mangling-version=v0 -Zhuman-readable-cgu-names to all rustc invocations for better readability.

pnkfelix (Nov 22 2019 at 13:35, on Zulip):

okay. Oh, maybe the -Zsymbol-mangling-version=v0 is why I've been getting different symbols than you

mw (Nov 22 2019 at 13:43, on Zulip):

definitely :)

mw (Nov 22 2019 at 13:43, on Zulip):

I don't know if we have an online demangler for the new scheme somewhere yet

mw (Nov 22 2019 at 13:44, on Zulip):

the rustc-demangle crate can handle it.

mw (Nov 22 2019 at 13:45, on Zulip):

it's better than the old scheme for debugging because symbol names actually contain information about generic arguments in a decodable form

pnkfelix (Nov 22 2019 at 14:19, on Zulip):

I've been sitting here wondering why my stage2 is taking so long to build. And then I remembered that in this build, I set llvm.optimize to false (to get assertions and try to improve my ability to use it with rr)

pnkfelix (Nov 22 2019 at 14:20, on Zulip):

I think stage1 rustc has been building librustc for 2.5 hours

davidtwco (Nov 22 2019 at 14:34, on Zulip):

(having been interacting with rustc symbol mangling recently, I’ve been intending to try rustfilt when I’m next looking at things, allegedly supports the new mangling scheme)

mw (Nov 22 2019 at 14:59, on Zulip):

@davidtwco rustfilt needs to update it's rustc-demangle version, but then it should work

pnkfelix (Nov 25 2019 at 10:05, on Zulip):

@mw what is your schedule like today? Maybe we could pair-program on this a bit so that I can show you my observations?

pnkfelix (Nov 25 2019 at 10:06, on Zulip):

(the other options is that I could try to document them, as I described above. But if we can talk live, that would probably be more efficient.)

mw (Nov 25 2019 at 10:29, on Zulip):

I'd could make time for this between 15:30 and 17:00 today.

pnkfelix (Nov 25 2019 at 10:37, on Zulip):

hmm, I have to pick up my son at school today, and I have to leave at like 15:45

pnkfelix (Nov 25 2019 at 10:37, on Zulip):

we could either try for a very short sync up today, or we wait for another day

pnkfelix (Nov 25 2019 at 10:38, on Zulip):

I will say that I still don't see the exact symbols that you are seeing in your code snippet; but I see something similar enough (in terms of its code) that I do think we are talking about the same thing

pnkfelix (Nov 25 2019 at 10:39, on Zulip):

you asked there where that code was coming from. I had thought it was an artifact of the derived fmt::Debug impl for Option<Duration>, but I don't know if I actually verified that.

pnkfelix (Nov 25 2019 at 10:45, on Zulip):

I see something similar enough (in terms of its code) that I do think we are talking about the same thing

(To be clear: I mean that we are talking about the same code snippet. But I don't think we've reached consensus on what is going wrong.)

pnkfelix (Nov 25 2019 at 10:51, on Zulip):

you asked there where that code was coming from. I had thought it was an artifact of the derived fmt::Debug impl for Option<Duration>, but I don't know if I actually verified that.

(Hmm, I guess if this were the case, then we would be finding this in one of the cgu's named core.xxxx-in-rubble.yyyy-coremodname.volatile.*. So that impression may be wrong.)

pnkfelix (Nov 25 2019 at 10:55, on Zulip):

but, oh: There is a call from core.3u8ad80z-in-rubble.7rcbfp3g-option.volatile over into the function we're discussing. From reading the .ll, it is indeed called referenced from the method <core::option::Option<rubble::ble::Duration> as core::fmt::Debug>::fmt

mw (Nov 25 2019 at 12:41, on Zulip):

I can try to make it ~15:00 if that works for you

pnkfelix (Nov 25 2019 at 12:52, on Zulip):

Okay yeah if you can do 15:00 I think that would be good

pnkfelix (Nov 26 2019 at 11:07, on Zulip):

hey @mw i have more questions

pnkfelix (Nov 26 2019 at 11:07, on Zulip):

in particular, after adding some more instrumentation and then looking at things more, I noticed this:

pnkfelix (Nov 26 2019 at 11:08, on Zulip):

during the first (clean) compile, we do observe from LLVM that the ble.volatile module imports the ble module:

[DEBUG rustc_codegen_llvm::back::lto] imported_module_callback importing_module_name: rubble.7rcbfp3g-ble.volatile imported_module_name: rubble.7rcbfp3g-ble
pnkfelix (Nov 26 2019 at 11:09, on Zulip):

but during the second compile (the one reusing incremental state, including the PostLTO optimized version of that cgu) , the data fed back from LLVM does not include that import information anymore.

pnkfelix (Nov 26 2019 at 11:10, on Zulip):

which leads me to wonder: Who is in charge in this scenario of remembering that dependency link?

pnkfelix (Nov 26 2019 at 11:10, on Zulip):

since the whole point here is that the imported module has changed. But I think its the Rust's compilers job to know that, not LLVM's, right?

pnkfelix (Nov 26 2019 at 11:13, on Zulip):

Do we somehow seed the initial LLVM state with information about previous compilation sessions? Or should we be incorporating past dependencies into this map somehow?

pnkfelix (Nov 26 2019 at 12:05, on Zulip):

... Update: Using rr to step through the second LLVM run now. It looks like LLVM's computeImportForFunction may be failing to resolve the callee for the ble.volatile function's call.

pnkfelix (Nov 26 2019 at 12:19, on Zulip):

AH! Huge realization: I had one hand tied behind my back this whole time because every time I tried to pass along -C llvm-args=..., the bug went away

pnkfelix (Nov 26 2019 at 12:20, on Zulip):

But I just realized: Its because we (rightly) invalidate the previous incremental state if you don't pass the same set of llvm-args to both compiler invocations.

pnkfelix (Nov 26 2019 at 12:20, on Zulip):

So now I just pass the same extra settings to both invocations, and boom, I now can start getting debug output from LLVM!

mw (Nov 26 2019 at 13:16, on Zulip):

iirc, the ThinLTO import logic should always run on the full set of modules, so including the modules that we are going to re-use.

pnkfelix (Nov 26 2019 at 13:17, on Zulip):

So here's what LLVM debug info tells me so far:

pnkfelix (Nov 26 2019 at 13:18, on Zulip):

in the following, rubble1 means the first (clean) compile, and rubble2 means the second one that reuses previous work-products.

pnkfelix (Nov 26 2019 at 13:18, on Zulip):

Some rubble1 output:

pnkfelix (Nov 26 2019 at 13:18, on Zulip):
* Module rubble.7rcbfp3g-ble.volatile exports 0 functions and 0 vars. Imports from 1 modules.
 - 1 functions imported from rubble.7rcbfp3g-ble
 - 0 global vars imported from rubble.7rcbfp3g-ble
pnkfelix (Nov 26 2019 at 13:19, on Zulip):

analogous rubble2 output:

* Module rubble.7rcbfp3g-ble.volatile exports 0 functions and 0 vars. Imports from 0 modules.
pnkfelix (Nov 26 2019 at 13:19, on Zulip):

when I worked my way backwards to the place where these import lists were computed, I found this:

pnkfelix (Nov 26 2019 at 13:19, on Zulip):

rubble1 output:

Computing import for Module 'rubble.7rcbfp3g-ble.volatile'
Initialize import for 544407980792537262 (_RNvXsL_NtCs2KcseFXz5HB_4core3fmtRNtNtCs4fqI2P2rA04_6rubble3ble8DurationNtB5_5Debug3fmtBz_)
 edge -> 5161150766116740131 (_RNvXs_NtCs4fqI2P2rA04_6rubble3bleNtB4_8DurationNtNtCs2KcseFXz5HB_4core3fmt5Debug3fmt) Threshold:100
 edge -> 10788096245104502577 (_RNvXNtCs4fqI2P2rA04_6rubble3bleNtB2_8DurationNtNtCs2KcseFXz5HB_4core3fmt7Display3fmt) Threshold:70
ignored! No qualifying callee with summary found.
pnkfelix (Nov 26 2019 at 13:20, on Zulip):

Analogous rubble2 output:

Computing import for Module 'rubble.7rcbfp3g-ble.volatile'
Initialize import for 544407980792537262 (_RNvXsL_NtCs2KcseFXz5HB_4core3fmtRNtNtCs4fqI2P2rA04_6rubble3ble8DurationNtB5_5Debug3fmtBz_)
 edge -> 5161150766116740131 (_RNvXs_NtCs4fqI2P2rA04_6rubble3bleNtB4_8DurationNtNtCs2KcseFXz5HB_4core3fmt5Debug3fmt) Threshold:100
ignored! No qualifying callee with summary found.
pnkfelix (Nov 26 2019 at 13:21, on Zulip):

When I tried to step through the latter computation, it seemed like it was rejecting during selectCallee because of ImportFailureReason::TooLarge

pnkfelix (Nov 26 2019 at 13:23, on Zulip):

But there's something super weird there. The call that its showing is the call as if the inlining didn't happen. I guess the import list construction is based on the LLVM IR prior to the optimization steps, even though we are going to end up deciding to reuse the PostLTO optimized cgu in the end?

pnkfelix (Nov 26 2019 at 13:24, on Zulip):

iirc, the ThinLTO import logic should always run on the full set of modules, so including the modules that we are going to re-use.

So basically, it looks to me like it is running on the modules we are re-using, but its making different decisions about what to construct for the Import List

pnkfelix (Nov 26 2019 at 13:25, on Zulip):

Which actually brings me to a question I've had in the back of my mind for the past hour or so...:

pnkfelix (Nov 26 2019 at 13:28, on Zulip):

@mw What do you think of this theory as to the root bug here: lto.rs, when it decides which modules can be reused in PostLTO form, looks at the import-list and sees if they are all on the green modules list. But it is looking at the import list that it currently gets from LLVM. Should we be instead (or additionally) looking at the old import list, from the time that we did the LTO optimization that we are reusing? (Presumably we would just serialize the ThinLTOImports and re-load that, or add it to the dep-graph maybe.)

mw (Nov 26 2019 at 13:28, on Zulip):

hm, so the import list of rubble1 above says that rubble.ble.volatile imports from rubble.ble, and rubble.ble clearly changes in rubble2, so my question is: why is rubble.ble.volatile marked as PostLTO reuse?

mw (Nov 26 2019 at 13:29, on Zulip):

we are only looking at the previous import list

mw (Nov 26 2019 at 13:29, on Zulip):

the new import list is just stored for subsequent sessions

pnkfelix (Nov 26 2019 at 13:29, on Zulip):

I don't think that's true?

pnkfelix (Nov 26 2019 at 13:29, on Zulip):

By "import list" I mean the ThinLTOImports

pnkfelix (Nov 26 2019 at 13:30, on Zulip):

those are never serialized today AFAICT

pnkfelix (Nov 26 2019 at 13:30, on Zulip):

(let me just make sure this hasn't changed in the last month)

mw (Nov 26 2019 at 13:31, on Zulip):

maybe I'm misremembering. let me take a closer look

pnkfelix (Nov 26 2019 at 13:31, on Zulip):

so this import_map is relying on this LLVM call

mw (Nov 26 2019 at 13:36, on Zulip):

Yes, I think what I was talking about was how I did things in the first version of the PR implementing all of this. but that was missing out on some re-use. Maybe we need a combination of both? I need to think more thoroughly about this...

pnkfelix (Nov 26 2019 at 13:55, on Zulip):

@mw would it be plausible for the incr_comp_session_dir to store, for each LTO-optimized work-product M, the value associated in the import_map for key M (i.e. import_map[M])?

mw (Nov 26 2019 at 13:57, on Zulip):

Yeah, I had already implemented that in https://github.com/rust-lang/rust/pull/53673. Alex talked me out of it ;)

pnkfelix (Nov 26 2019 at 13:58, on Zulip):

does that mean this comment from alex ?

pnkfelix (Nov 26 2019 at 13:59, on Zulip):

I'm willing to treat this bug as a concrete counter-example. Though of course I'd prefer a smaller example... :(

mw (Nov 26 2019 at 13:59, on Zulip):

Given all the information you've gathered so far, I have the feeling that I would understand exactly what's going if I sat down and thought about this for half an hour in peace and quiet. Unfortunately I need to run now :(

mw (Nov 26 2019 at 14:00, on Zulip):

But if you want, you can assign the bug to me.

mw (Nov 26 2019 at 14:01, on Zulip):

does that mean this comment from alex ?

rather this comment, I think: https://github.com/rust-lang/rust/pull/53673#discussion_r212708336

mw (Nov 26 2019 at 14:03, on Zulip):

But I still haven't paged in all the logic here -- having just discovered that some of my "knowledge" is in fact based on a version of the PR that never landed.

pnkfelix (Nov 26 2019 at 14:38, on Zulip):

Yeah, I had already implemented that in https://github.com/rust-lang/rust/pull/53673. Alex talked me out of it ;)

I'm guessing the answer here is "No", but: Do you still have the original code for this lying around somewhere, for me to look at and maybe try out myself, rather than trying to reimplement it myself?

mw (Nov 26 2019 at 19:27, on Zulip):

the github comments are still referencing it, so it might be just a matter of a few clicks?

pnkfelix (Nov 27 2019 at 10:51, on Zulip):

The github comments say "outdated code" and when I tried to follow them, I got one of those pages that says "We went looking everywhere, but couldn’t find those commits."

bjorn3 (Nov 27 2019 at 20:16, on Zulip):

The force-push messages have working before and after commit links in them.

pnkfelix (Nov 28 2019 at 10:06, on Zulip):

ah thanks @bjorn3 , I've always overlooked that

pnkfelix (Nov 29 2019 at 16:05, on Zulip):

@mw okay I think I have a patch for this

pnkfelix (Nov 29 2019 at 16:06, on Zulip):

I'm going to do some internal testing, but here is the relevant commit; it is largely inspired by the approach you referenced in that earlier PR.

mw (Dec 02 2019 at 07:56, on Zulip):

Great, I'll put looking at that on my todo list

pnkfelix (Dec 02 2019 at 10:25, on Zulip):

at this point I'm going to spend some time today exploring whether I can make a more robust regression test

pnkfelix (Dec 02 2019 at 11:26, on Zulip):

@mw am I correct that the tests in src/incremental/thinlto are supposedly exercising ThinLTO because they have -O in their // compile-flags ?

mw (Dec 02 2019 at 12:17, on Zulip):

yes

pnkfelix (Dec 02 2019 at 13:33, on Zulip):

I'm not sure I'm going to be able to construct a (robust) test case for this. Even with the knowledge of the bug that I've garnered, I really don't know how to reliably get LLVM to reproduce the at-LTO-time inlining behavior that we need to witness on my local machine.

mw (Dec 02 2019 at 13:37, on Zulip):

small functions like fn foo() -> u32 { 0 } are imported very reliably by ThinLTO

mw (Dec 02 2019 at 13:38, on Zulip):

and functions with #[inline(never)] are also reliably not imported by ThinLTO, afaik

mw (Dec 02 2019 at 13:38, on Zulip):

maybe that helps?

pnkfelix (Dec 02 2019 at 13:44, on Zulip):

The scenario I need to recreate is one where we change our mind about whether to mark a fn as internal or not

pnkfelix (Dec 02 2019 at 13:45, on Zulip):

while at the same time, LTO has inlined the reference to that (to-be-marked internal) fn into some cgu that won't be rebuilt under the old logic

pnkfelix (Dec 02 2019 at 13:45, on Zulip):

I'm able to get the first step down pretty reliably

pnkfelix (Dec 02 2019 at 13:45, on Zulip):

but I haven't made much luck with the second part.

pnkfelix (Dec 02 2019 at 13:46, on Zulip):

I guess I could try inspecting the generated intermediate products from the LTO passes to try to figure things out more

pnkfelix (Dec 02 2019 at 16:07, on Zulip):

is there an #[inline(..)] attribute with the effect of "inline at link-time if possible, but not earlier." ?

mw (Dec 02 2019 at 16:25, on Zulip):

nope

mw (Dec 02 2019 at 16:26, on Zulip):

but functions without an #[inline] attribute are guaranteed to not be inlined across CGU boundaries before ThinLTO

mw (Dec 02 2019 at 16:27, on Zulip):

and you can rely on different mods to end up as separate CGUs (when compiling incrementally), even if they are defined in the same source file.

pnkfelix (Dec 02 2019 at 16:42, on Zulip):

yeah, i've been making use of the latter property

pnkfelix (Dec 02 2019 at 16:43, on Zulip):

(the test infrastructure is impressive. The main thing missing is docs explaining how the tests work in the first place.)

pnkfelix (Dec 04 2019 at 13:31, on Zulip):

@mw I decided to give up on finding a more robust test

pnkfelix (Dec 04 2019 at 13:32, on Zulip):

I filed PR #67020. It still has a test, it just embeds the one we've already been using as a run-make test for the corresponding thumb target.

mw (Dec 04 2019 at 15:25, on Zulip):

I'll try to review tomorrow before the triage meeting

pnkfelix (Dec 05 2019 at 12:57, on Zulip):

thanks for your input. I really wish we could get a smaller test case

pnkfelix (Dec 05 2019 at 12:58, on Zulip):

I keep musing about trying to dive deeper into my understanding of LLVM optimization passes, with the thought that if I knew the right -C llvm-opts=... to feed in, I could replicate this more readily (and more generally)

pnkfelix (Dec 05 2019 at 12:59, on Zulip):

When you write "Yes, but the import map is always re-computed from scratch from the entire set of modules, including the older ones", do you mean that the import map somehow uses the state of the compiled object code ?

pnkfelix (Dec 05 2019 at 13:01, on Zulip):

another option here, which may be unpalatable, but would satisfy your desire to ensure we don't rely on anything beyond the previous serialized import map: for each CGU, if there is any change at all to the import map, just do not reuse the PostLTO object file

pnkfelix (Dec 05 2019 at 13:02, on Zulip):

given that the import map seems to very rarely change for green modules, this condition seems like it would fire so rarely that it would not regress anything.

pnkfelix (Dec 05 2019 at 13:03, on Zulip):

oh, maybe this is what you meant by " If the two things match up" ?

mw (Dec 05 2019 at 13:03, on Zulip):

When you write "Yes, but the import map is always re-computed from scratch from the entire set of modules, including the older ones", do you mean that the import map somehow uses the state of the compiled object code ?

the import map is based on the state of the pre-optimization LLVM bitcode (which we also keep around for cached modules)

mw (Dec 05 2019 at 13:05, on Zulip):

another option here, which may be unpalatable, but would satisfy your desire to ensure we don't rely on anything beyond the previous serialized import map: for each CGU, if there is any change at all to the import map, just do not reuse the PostLTO object file

I need to think about that. It might be the correct fix actually

pnkfelix (Dec 05 2019 at 13:06, on Zulip):

I would have no problem adopting the latter.

pnkfelix (Dec 05 2019 at 13:06, on Zulip):

and I think I could hack it up fast

pnkfelix (Dec 05 2019 at 13:06, on Zulip):

just set-equality; so sort both slices and compare them for equality. easy peasy.

mw (Dec 05 2019 at 13:07, on Zulip):

I can believe that it is correct and clean because it sounds like a manual re-implementation of the red-green algorithm

pnkfelix (Dec 05 2019 at 13:07, on Zulip):

okay cool, i prefer this

mw (Dec 05 2019 at 13:08, on Zulip):

and then only save the "current" version of the import map?

pnkfelix (Dec 05 2019 at 13:09, on Zulip):

exactly

mw (Dec 05 2019 at 13:10, on Zulip):

that sounds good then

pnkfelix (Dec 05 2019 at 13:10, on Zulip):

let me go ahead and give it a whirl. I just finished pushing a new version of the PR with the test fixed in a couple ways

pnkfelix (Dec 05 2019 at 13:10, on Zulip):

oh but I guess I need to do triage for the rustc meeting first (!)

pnkfelix (Dec 05 2019 at 13:11, on Zulip):

i'll follow up on this after the meeting.

pnkfelix (Dec 05 2019 at 13:11, on Zulip):

(or when I finish triage; whichever comes first)

mw (Dec 05 2019 at 13:11, on Zulip):

this bug is a cautionary tale about trying to do incremental compilation outside of the well-tested framework :)

mw (Dec 05 2019 at 13:11, on Zulip):

no rush :)

pnkfelix (Dec 05 2019 at 13:11, on Zulip):

its definitely getting on my list of "favorite bugs" now, though

pnkfelix (Dec 05 2019 at 13:11, on Zulip):

right up there with some great GC bugs I tracked down back in the day

pnkfelix (Dec 05 2019 at 13:12, on Zulip):

It also has led me to think harder about how to go about testing incremental in the first place

mw (Dec 05 2019 at 13:12, on Zulip):

my favorite is still a random memory corruption/dangling pointer bug in LLVM that only rustc triggered :)

pnkfelix (Dec 05 2019 at 13:13, on Zulip):

this was my attempt to create a test that would expose the bug for me locally: https://gist.github.com/pnkfelix/8a5870135def2f6c46bbd37feed70e2c

pnkfelix (Dec 05 2019 at 13:14, on Zulip):

I was basically trying to manually enumerate as many ways to combine calls between modules that I could think of

mw (Dec 05 2019 at 13:17, on Zulip):

I'll probably give this a quick try tomorrow morning, just from the scenario description in your PR.

pnkfelix (Dec 05 2019 at 13:20, on Zulip):

I'll probably give this a quick try tomorrow morning, just from the scenario description in your PR.

making a test, you mean? That would be great.

pnkfelix (Dec 05 2019 at 21:39, on Zulip):

@mw FYI I have updated the PR with the simpler strategy. It makes me happy.

mw (Dec 06 2019 at 10:51, on Zulip):

I'm so proud: https://github.com/rust-lang/rust/pull/67020#issuecomment-562526208

mw (Dec 06 2019 at 10:52, on Zulip):

Your description of the failure scenario is spot on

pnkfelix (Dec 06 2019 at 10:55, on Zulip):

I cannot say how happy I am you found that

pnkfelix (Dec 06 2019 at 10:55, on Zulip):

import-instr-limit seems important. :)

pnkfelix (Dec 06 2019 at 10:55, on Zulip):

(as in, I should learn about it. it may have made my life easier)

mw (Dec 06 2019 at 10:56, on Zulip):

it should make the test more stable, although I don;t think it is needed with the current defaults

mw (Dec 06 2019 at 10:56, on Zulip):

I only found out about it recently too

pnkfelix (Dec 06 2019 at 10:57, on Zulip):

the fact that the intermediate function needs to be a certain size was something I was not paying attention to in my attempts to replicate

pnkfelix (Dec 06 2019 at 10:57, on Zulip):

it needs to be not too big, but not too small either, right?

mw (Dec 06 2019 at 10:57, on Zulip):

(it can massively speed up optimized compilation in exchange for slightly lower runtime performance: https://github.com/rust-lang/rust/pull/66625)

pnkfelix (Dec 06 2019 at 10:57, on Zulip):

(I'm speaking of foo::bar, here)

mw (Dec 06 2019 at 10:58, on Zulip):

Since it is an internal function in cfail2 and LLVM knows that there is only a single call site, it will probably inline it even if it gets very big

pnkfelix (Dec 06 2019 at 13:21, on Zulip):

@mw I assume we'd all be happiest if I incorporate your new test into my PR. :smiley:

mw (Dec 06 2019 at 13:41, on Zulip):

yes please

Josh Triplett (Dec 10 2019 at 23:59, on Zulip):

Very nice compilation time speedup! I would love to see that in CI.

pnkfelix (Dec 11 2019 at 22:00, on Zulip):

Very nice compilation time speedup! I would love to see that in CI.

@Josh Triplett this was aimed at a different topic, no?

Josh Triplett (Dec 11 2019 at 22:02, on Zulip):

No, it was related to https://github.com/rust-lang/rust/pull/66625 posted above.

Last update: Dec 12 2019 at 00:50UTC