hey @mw and/or @Alex Crichton , I wanted to bounce some thoughts off of you two
in particular, I wanted to talk about a theory I put forth here.
We currently make decisions about whether to change a symbol's Linkage from External to Internal based on the set of accessors we currently observe. However, I theorize that old incremental codegen artifacts may be relying on a symbol's previous classification as External
so I was wondering: can/should we have a rule that if a symbol was considered External in any previous compile, then an incremental compilation is forced to keep it External?
(this is not the only possible solution to the theorized problem. For example, instead of constraining which symbols can be marked Internal based on previous compiles, we could instead force the codegen artifacts to be recompiled rather than reused; but doing this well would require determining which codegen artifacts depend on the symbols previously marked as External, and I do not know how hard that is to implement.)
mw will probably have more to say about this than I, but at least from my perspective I would say that artifacts from a previous compile should only ever accelerate future compiles, never actually change the future compile
so that way, no matter what your cache looks like, `cargo build` always produces the same thing
so I'd probably say that this should err on the side of recompiling more rather than keeping something External
although if that's too onerous, then we could also perhaps start exporting more symbols in incremental mode
However, I theorize that old incremental codegen artifacts may be relying on a symbol's previous classification as External
Old artifacts must not be used if they are not the same as the artifacts the current compilation session would produce, so what the previous classification was should not matter. That's the theory, at least; there might be bugs in the compiler.
my current inclination is to try turning off the internalize_symbols optimization if we have incremental compilation turned on...
(Though I worry that may cause too great a regression in code quality, especially code size...)
One question related to this: Am I right that LLVM could be inlining a function that comes from a different codegen unit in the current crate? To be honest that part of the story does not match my mental model of how LLVM optimization works.
Hmm: is there some #[rustc_attribute] I can use to force items to be assigned to particular codegen units? That would help me write a test case that would allow exploring the behavior here without relying on the oddities of this particular test.
/me is busy at the moment. Will take a closer look on Friday.
One question related to this: Am I right that LLVM could be inlining a function that comes from a different codegen unit in the current crate? To be honest that part of the story does not match my mental model of how LLVM optimization works.
further update on this detail: I think the inlining here is occurring during LTO
Hmm: is there some #[rustc_attribute] I can use to force items to be assigned to particular codegen units? That would help me write a test case that would allow exploring the behavior here without relying on the oddities of this particular test.
(maybe getting this effect is supposed to be as simple as choosing the right set of `mod` items. still looking; but I think an explicit attribute could potentially be a good thing.)
I left more notes in the bug (#59535). I now better understand what's happening. Part of the problem is that we are reusing the post-ThinLTO object file for a codegen-unit (and that's what holds the result of inlining `fn B` from a different codegen-unit); so `fn B`'s call to `fn A` has been inlined into that object file. But independently, due to changes elsewhere in the source code, we reclassified `fn A` as internal, and subsequently optimized away the definition of `fn A`.
I am looking over lto.rs now, and also trying to understand the LLVM LTO docs. One oddity is that the erroneously reused codegen unit claims it does not import from any modules; but surely the call to the other function should be considered an import?
Argh, sorry I didn't get to this before the weekend ...
I think there is a small bug somewhere here, as opposed to a conceptual problem, because otherwise we'd see much more of this
and I remember that I ran into problems immediately when not handling symbol internalization properly while initially implementing incremental ThinLTO
yes I am working my way gradually towards the camp of "small bug somewhere" rather than "conceptual problem"
it just took me a while to put all the pieces together
at this point I'm not sure whether the small bug is on our end or within LLVM.
yeah, it's non trivial and should be better documented
I was pretty sure for a while it was our fault, but then when I was trying to walk through some of the execution, the behavior I saw from LLVM was pretty strange
I'll reserve a 2h time slot for this on Tuesday
anyway don't worry about delaying this
:+1:
I spent a long time on it because I know how painful incremental bugs are, but also, we have so many that we don't have reproducible test cases for
so the fact that we could actually reproduce this one got me very excited.
:D
perhaps over-enthusiastically so
this is incredibly weird :)
I've been able to reproduce with `doit2.sh` from https://github.com/pnkfelix/rust-issue-59535
but I have no clue what's going on yet
it looks like this has nothing to do with ThinLTO
the `volatile` cgu for `rubble::ble` declares `<rubble[317d481089b8c8fe]::ble::Duration as core[5e7ea8cd6850b425]::fmt::Display>::fmt` in the initial version
in the next version it should declare `<rubble[317d481089b8c8fe]::ble::Duration as core[5e7ea8cd6850b425]::fmt::Debug>::fmt`
but the CGU is exactly the same version as initially
the internalizer correctly makes the symbol for `<rubble[317d481089b8c8fe]::ble::Duration as core[5e7ea8cd6850b425]::fmt::Display>::fmt` internal, but it looks like the `volatile` object file for `rubble::ble` is not recompiled, although the MIR it is generated from should have changed.
I'm trying to debug further but for that I need a local build of rustc that can handle the `thumbv7em-none-eabi` target
how hard can that be :P
Oh I should have warned you about that target dependence
what did I end up having to do there ...
I'm curious that you say the MIR should have changed. I could have sworn that I validated that the inlining in question was happening during an LTO optimization pass
regarding the target dependence, I know that I installed `thumbv7em-none-eabi` via `rustup`, but that's not the local build
if I compile v2 with an empty cache, the volatile cgu does not reference the function in question anymore
it references a different function instead
and everything works
but let me take another look...
okay, this is how I built: `time python /home/pnkfelix/Dev/Mozilla/rust.git/x.py dist --stage 2 --target thumbv7em-none-eabi`
though that might need to be preceded by a `python x.py build --stage 2 --target thumbv7em-none-eabi`
(for some reason I vaguely recall being unhappy about my attempts to go directly to `dist`)
I'm getting `thread 'main' panicked at 'All the *-none-* and nvptx* targets are no-std targets', src/bootstrap/sanity.rs:186:17` with that
regarding the obj file: the "no-opt" version of the LLVM IR should already look different, so this is before ThinLTO, unless I'm overlooking something
hmm.
when you say "the volatile cgu", is that referring to `mks4f5lxe2i4sqc`? (I am assuming the names are stable between your platform and my own, but perhaps that is an invalid assumption)
I'm compiling with `-Zhuman-readable-cgu-names`, which gives me names corresponding to the module path
ah hmm
e.g. rubble.rubble.7rcbfp3g-ble.volatile.rcgu.bc
didn't know about that flag
the above can be decoded as `rubble[7rcbfp3g]::ble`, where the version with `volatile` will contain the code corresponding to generic instances, and the one without `volatile` will contain the code for non-generic stuff from that module
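To illustrate the split mw describes, a minimal sketch (the module and function names here are made up, not from rubble): under incremental compilation, a module's non-generic items land in its plain CGU, while monomorphized generic instances land in the corresponding `.volatile` CGU.

```rust
// Hypothetical example of how items are partitioned across CGUs under
// incremental compilation. Names are illustrative only.
mod ble {
    // Non-generic: monomorphic code like this lands in the plain
    // `crate.xxxx-ble` CGU.
    pub fn non_generic() -> u32 {
        42
    }

    // Generic: each instantiation (e.g. `generic::<u32>`) lands in the
    // `crate.xxxx-ble.volatile` CGU instead.
    pub fn generic<T: core::fmt::Debug>(x: T) -> String {
        format!("{:?}", x)
    }
}

fn main() {
    println!("{}", ble::non_generic());
    println!("{}", ble::generic(7u32));
}
```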
I don't understand the no_std logic in x.py
@simulacrum or @Pietro Albini, do you happen to know how I can locally build a rustc that can target `thumbv7em-none-eabi`?
should be able to `x.py build src/libstd --target thumbv7em-none-eabi`, unless you need a C compiler or something
`x.py dist --stage 2 --target thumbv7em-none-eabi` gives me `thread 'main' panicked at 'All the *-none-* and nvptx* targets are no-std targets', src/bootstrap/sanity.rs:186:17`
maybe if I delete my config.toml?
should be able to `x.py build src/libstd --target thumbv7em-none-eabi`
I know this was not sufficient for me
you can also wait for a while and just use dist-various-1
that'll definitely work
I definitely had to do something beyond `x.py build` to get a libstd built (and put into a working location) for the target
./src/ci/docker/run.sh dist-various-1
Re-running `x.py build src/libstd --target thumbv7em-none-eabi` without a config.toml now ...
(I think) I have the correct GCC arm toolchain installed already
we don't seem to do anything special in that docker container so I would expect that to work
I'll try a clean build with `x.py build src/libstd --target thumbv7em-none-eabi` to try to get a better record of what went wrong there. I had assumed it was user error back when I found that to be insufficient.
/me is still waiting for build ...
(I'm glad I have a rather snappy desktop machine available)
OK, it failed with something different
7 mins for a stage 2 build is not bad though!
```
cargo:warning=/xoxo/3-rust/src/llvm-project/compiler-rt/lib/builtins/int_util.c:57:10: fatal error: stdlib.h: No such file or directory
cargo:warning=   57 | #include <stdlib.h>
cargo:warning=      |          ^~~~~~~~~~
cargo:warning=compilation terminated.
```
hm that is ... unexpected
dist-various-1 would be my recommendation (you can comment a bunch in the relevant dockerfile to get it faster)
I had to install `arm-none-eabi-newlib` on my system, now it works
why would the GCC package not include that? weird
now I need to get `rust-lld` working :(
Is that not just a `rustup` thing?
you are really making me wish I had kept a full transcript of my debugging session...
using rust-lld for rustup's nightly now fails compiling the support libraries... frustrating
now my reproduction script seems broken even with nightly :(
(btw, thanks for your help, @simulacrum!)
Hm, so the thing that changes is `<&rubble[317d481089b8c8fe]::ble::Duration as core[5e7ea8cd6850b425]::fmt::Debug>::fmt`. Is this maybe something completely compiler-generated for which we never instantiate any MIR?
```
; Function Attrs: nounwind optsize
define hidden zeroext i1 @_RNvXsL_NtCs86ZAEM1TFWr_4core3fmtRNtNtCs4fqI2P2rA04_6rubble3ble8DurationNtB5_5Debug3fmtBz_(i32** noalias nocapture readonly align 4 dereferenceable(4), %"core::fmt::Formatter"* align 4 dereferenceable(52)) unnamed_addr #0 {
  %3 = load i32*, i32** %0, align 4, !nonnull !0
  %4 = tail call zeroext i1 @_RNvXs_NtCs4fqI2P2rA04_6rubble3bleNtB4_8DurationNtNtCs86ZAEM1TFWr_4core3fmt5Debug3fmt(i32* noalias nonnull readonly align 4 dereferenceable(4) %3, %"core::fmt::Formatter"* nonnull align 4 dereferenceable(52) %1)
  ret i1 %4
}
```
Might be a shim that incr. comp. is somehow missing?
/me needs to run now unfortunately
@pnkfelix I suggest trying to find out where the above LLVM IR is generated and what MIR it is generated from, if any. You can use `-Zprint-mono-items=lazy` to get additional information.
okay thanks
I did have a question for you
related to this
but I don't think I'm going to be able to re-establish the context for my Q quickly enough...
the big picture of the question was that I had tracked down a spot in the rust <-> LLVM interactions
where Rust was querying LLVM for a list of the modules that a given cgu depended upon
and I remember being surprised to see an empty list for the case in question, something where the given cgu was making a call out to a function in another cgu
So that was what I had wanted to ask you about, the expected semantics of what that list contained
but now I cannot find any of the details quickly. (I restarted the relevant emacs sessions so that I could reboot the machine, and this effectively obliterates my brain's working memory.)
ah, wait, thank goodness, my working directory still has my instrumentation in it
`rustc_codegen_llvm::back::lto` is the relevant module in rustc
and I added some code to instrument this loop: https://github.com/rust-lang/rust/blob/f1b882b55805c342e46ee4ca3beeef1d1fa2044b/src/librustc_codegen_llvm/back/lto.rs#L509
but of course I have since lost my build of the libcore for the thumbv7em target. :sad:
well, I guess my question is still relevant, nonetheless: should we expect `import_map.modules_imported_by(module_name)` to include modules that hold the definitions for functions being called from `module_name`? Or do I misunderstand what "imported_by" is meant to mean here?
(because it is just those sorts of functions, undergoing changes, that I would expect would lead us to conclude that we cannot reuse the post-ThinLTO-optimized artifact.)
iirc, `import_map.modules_imported_by(module_name)` should contain all LLVM modules that ThinLTO actually pulled definitions in from. So it is ThinLTO-specific. There is another "import step" that is done already in rustc: all functions that are marked as `#[inline]` are instantiated in all LLVM modules that contain calls to them.
so there can be functions that get "inlined across modules" before ThinLTO is invoked.
but in that case we actually don't inline across module boundaries, we just instantiate internal copies of a given function in each calling module.
this functionality comes from a time before there was ThinLTO and it yields better results than ThinLTO, so we had to keep it.
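A minimal sketch of what that pre-ThinLTO "import step" looks like at the source level (module and function names are illustrative): with `#[inline]`, each calling CGU gets its own internal copy of the function before ThinLTO ever runs, so no cross-module inlining by ThinLTO is needed.

```rust
// Hypothetical layout: under incremental compilation, `a` and `b`
// become separate CGUs, and each gets its own internal copy of
// `shared::helper` because it is marked #[inline].
mod shared {
    #[inline]
    pub fn helper(x: u32) -> u32 {
        x + 1
    }
}

mod a {
    pub fn call() -> u32 {
        super::shared::helper(1)
    }
}

mod b {
    pub fn call() -> u32 {
        super::shared::helper(2)
    }
}

fn main() {
    println!("{} {}", a::call(), b::call());
}
```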
The reason I asked the question is that my (old, but still possibly current) hypothesis was that ThinLTO was doing an inlining step here that is subsequently invalidated by a later incremental compile
and that we fail to deduce that we cannot reuse the post-ThinLTO-optimized object because the import_map is failing to tell us about the dependence
But I need to create my original work environment before I can provide better information
hypothesis was that ThinLTO was doing an inlining step here that is subsequently invalidated by a later incremental compile
that's not what I found. it would be good to verify that you are really seeing something different from what I saw.
@mw I want to spend a little bit of time on this, to try to get us in sync on our mutual understanding of the problem
I can either: 1. try to recreate the steps I followed previously that led me to my earlier hypothesis, spelling out my thinking as I go in more detail; or 2. try to reproduce your result and in the process understand your thinking
I do want to at least confirm: You did get to a point eventually where you were yourself reproducing the whole problem locally, right?
(it seems like you must have, based on the statements you were making. but I just want to be certain that you weren't drawing conclusions based on reading stuff I had posted.)
Yes, I was able to reproduce the linker error mentioned in the GH issue:
```
= note: rust-lld: error: undefined symbol: _$LT$rubble..ble..time..Duration$u20$as$u20$core..fmt..Display$GT$::fmt::h343296729f83afc4
        >>> referenced by time.rs:129 (src/ble/time.rs:129)
        >>>               /home/jonas/dev/rubble/target/thumbv7em-none-eabi/debug/deps/rubble-a38354bf4561073f.3qtzuenakrd0qcaf.rcgu.o:(_$LT$$RF$T$u20$as$u20$core..fmt..Debug$GT$::fmt::h26ac05bd6f4f461e)
```
I was able to do so with your reproduction repo, via `doit2.sh`. I added `-Zsymbol-mangling-version=v0 -Zhuman-readable-cgu-names` to all rustc invocations for better readability.
okay. Oh, maybe the `-Zsymbol-mangling-version=v0` is why I've been getting different symbols than you
definitely :)
I don't know if we have an online demangler for the new scheme somewhere yet
the `rustc-demangle` crate can handle it.
it's better than the old scheme for debugging because symbol names actually contain information about generic arguments in a decodable form
I've been sitting here wondering why my stage2 is taking so long to build. And then I remembered that in this build, I set llvm.optimize to false (to get assertions and try to improve my ability to use it with `rr`)
I think stage1 rustc has been building librustc for 2.5 hours
(having been interacting with rustc symbol mangling recently, I've been intending to try rustfilt when I'm next looking at things; it allegedly supports the new mangling scheme)
@davidtwco `rustfilt` needs to update its `rustc-demangle` version, but then it should work
@mw what is your schedule like today? Maybe we could pair-program on this a bit so that I can show you my observations?
(the other option is that I could try to document them, as I described above. But if we can talk live, that would probably be more efficient.)
I could make time for this between 15:30 and 17:00 today.
hmm, I have to pick up my son at school today, and I have to leave at like 15:45
we could either try for a very short sync up today, or we wait for another day
I will say that I still don't see the exact symbols that you are seeing in your code snippet; but I see something similar enough (in terms of its code) that I do think we are talking about the same thing
you asked there where that code was coming from. I had thought it was an artifact of the derived `fmt::Debug` impl for `Option<Duration>`, but I don't know if I actually verified that.
I see something similar enough (in terms of its code) that I do think we are talking about the same thing
(To be clear: I mean that we are talking about the same code snippet. But I don't think we've reached consensus on what is going wrong.)
you asked there where that code was coming from. I had thought it was an artifact of the derived `fmt::Debug` impl for `Option<Duration>`, but I don't know if I actually verified that.
(Hmm, I guess if this were the case, then we would be finding this in one of the cgus named `core.xxxx-in-rubble.yyyy-coremodname.volatile.*`. So that impression may be wrong.)
but, oh: there is a call from `core.3u8ad80z-in-rubble.7rcbfp3g-option.volatile` over into the function we're discussing. From reading the .ll, it is indeed referenced from the method `<core::option::Option<rubble::ble::Duration> as core::fmt::Debug>::fmt`
I can try to make it ~15:00 if that works for you
Okay yeah if you can do 15:00 I think that would be good
hey @mw i have more questions
in particular, after adding some more instrumentation and then looking at things more, I noticed this:
during the first (clean) compile, we do observe from LLVM that the ble.volatile module imports the ble module:
[DEBUG rustc_codegen_llvm::back::lto] imported_module_callback importing_module_name: rubble.7rcbfp3g-ble.volatile imported_module_name: rubble.7rcbfp3g-ble
but during the second compile (the one reusing incremental state, including the PostLTO-optimized version of that cgu), the data fed back from LLVM does not include that import information anymore.
which leads me to wonder: Who is in charge in this scenario of remembering that dependency link?
since the whole point here is that the imported module has changed. But I think it's the Rust compiler's job to know that, not LLVM's, right?
Do we somehow seed the initial LLVM state with information about previous compilation sessions? Or should we be incorporating past dependencies into this map somehow?
... Update: Using `rr` to step through the second LLVM run now. It looks like LLVM's `computeImportForFunction` may be failing to resolve the callee for the ble.volatile function's call.
AH! Huge realization: I had one hand tied behind my back this whole time because every time I tried to pass along `-C llvm-args=...`, the bug went away
But I just realized: it's because we (rightly) invalidate the previous incremental state if you don't pass the same set of llvm-args to both compiler invocations.
So now I just pass the same extra settings to both invocations, and boom, I now can start getting debug output from LLVM!
iirc, the ThinLTO import logic should always run on the full set of modules, so including the modules that we are going to re-use.
So here's what LLVM debug info tells me so far:
in the following, `rubble1` means the first (clean) compile, and `rubble2` means the second one that reuses previous work-products.
Some `rubble1` output:
```
* Module rubble.7rcbfp3g-ble.volatile exports 0 functions and 0 vars. Imports from 1 modules.
 - 1 functions imported from rubble.7rcbfp3g-ble
 - 0 global vars imported from rubble.7rcbfp3g-ble
```
analogous `rubble2` output:
```
* Module rubble.7rcbfp3g-ble.volatile exports 0 functions and 0 vars. Imports from 0 modules.
```
when I worked my way backwards to the place where these import lists were computed, I found this:
`rubble1` output:
```
Computing import for Module 'rubble.7rcbfp3g-ble.volatile'
Initialize import for 544407980792537262 (_RNvXsL_NtCs2KcseFXz5HB_4core3fmtRNtNtCs4fqI2P2rA04_6rubble3ble8DurationNtB5_5Debug3fmtBz_)
 edge -> 5161150766116740131 (_RNvXs_NtCs4fqI2P2rA04_6rubble3bleNtB4_8DurationNtNtCs2KcseFXz5HB_4core3fmt5Debug3fmt) Threshold:100
 edge -> 10788096245104502577 (_RNvXNtCs4fqI2P2rA04_6rubble3bleNtB2_8DurationNtNtCs2KcseFXz5HB_4core3fmt7Display3fmt) Threshold:70
ignored! No qualifying callee with summary found.
```
Analogous `rubble2` output:
```
Computing import for Module 'rubble.7rcbfp3g-ble.volatile'
Initialize import for 544407980792537262 (_RNvXsL_NtCs2KcseFXz5HB_4core3fmtRNtNtCs4fqI2P2rA04_6rubble3ble8DurationNtB5_5Debug3fmtBz_)
 edge -> 5161150766116740131 (_RNvXs_NtCs4fqI2P2rA04_6rubble3bleNtB4_8DurationNtNtCs2KcseFXz5HB_4core3fmt5Debug3fmt) Threshold:100
ignored! No qualifying callee with summary found.
```
When I tried to step through the latter computation, it seemed like it was rejecting during `selectCallee` because of `ImportFailureReason::TooLarge`
But there's something super weird there. The call that it's showing is the call as if the inlining didn't happen. I guess the import-list construction is based on the LLVM IR prior to the optimization steps, even though we are going to end up deciding to reuse the PostLTO-optimized cgu in the end?
iirc, the ThinLTO import logic should always run on the full set of modules, so including the modules that we are going to re-use.
So basically, it looks to me like it is running on the modules we are re-using, but it's making different decisions about what to construct for the import list
Which actually brings me to a question I've had in the back of my mind for the past hour or so...:
@mw What do you think of this theory as to the root bug here: `lto.rs`, when it decides which modules can be reused in PostLTO form, looks at the import list and checks whether they are all on the green-modules list. But it is looking at the import list that it currently gets from LLVM. Should we instead (or additionally) be looking at the old import list, from the time that we did the LTO optimization that we are reusing? (Presumably we would just serialize the `ThinLTOImports` and re-load that, or add it to the dep-graph, maybe.)
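A sketch of the shape of the reuse check being discussed; the names `can_reuse_post_lto` and `green_modules` are hypothetical, and this is not the actual `lto.rs` code: a CGU's post-ThinLTO object may only be reused if every module it imported from is itself green (unchanged).

```rust
use std::collections::HashSet;

// Hypothetical reuse condition: every imported module must be green.
fn can_reuse_post_lto(
    imports_for_module: &[String],
    green_modules: &HashSet<String>,
) -> bool {
    imports_for_module
        .iter()
        .all(|imported| green_modules.contains(imported))
}

fn main() {
    let green: HashSet<String> =
        ["rubble-ble".to_string()].into_iter().collect();
    // ble.volatile imported from ble, and ble is green: reuse is OK.
    assert!(can_reuse_post_lto(&["rubble-ble".to_string()], &green));
    // But if it imported from a non-green module, we must re-optimize.
    assert!(!can_reuse_post_lto(&["rubble-other".to_string()], &green));
    println!("ok");
}
```

The bug under discussion is exactly that the import list fed into this kind of check can differ between sessions, so which list you consult matters.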
hm, so the import list of `rubble1` above says that `rubble.ble.volatile` imports from `rubble.ble`, and `rubble.ble` clearly changes in `rubble2`, so my question is: why is `rubble.ble.volatile` marked as PostLTO reuse?
we are only looking at the previous import list
the new import list is just stored for subsequent sessions
I don't think that's true?
By "import list" I mean the `ThinLTOImports`; those are never serialized today AFAICT
(let me just make sure this hasn't changed in the last month)
maybe I'm misremembering. let me take a closer look
so this `import_map` is relying on this LLVM call
Yes, I think what I was talking about was how I did things in the first version of the PR implementing all of this. but that was missing out on some re-use. Maybe we need a combination of both? I need to think more thoroughly about this...
@mw would it be plausible for the incr_comp_session_dir to store, for each LTO-optimized work-product `M`, the value associated in the `import_map` for key `M` (i.e. `import_map[M]`)?
Yeah, I had already implemented that in https://github.com/rust-lang/rust/pull/53673. Alex talked me out of it ;)
does that mean this comment from alex ?
I'm willing to treat this bug as a concrete counter-example. Though of course I'd prefer a smaller example... :(
Given all the information you've gathered so far, I have the feeling that I would understand exactly what's going on if I sat down and thought about this for half an hour in peace and quiet. Unfortunately I need to run now :(
But if you want, you can assign the bug to me.
does that mean this comment from alex ?
rather this comment, I think: https://github.com/rust-lang/rust/pull/53673#discussion_r212708336
But I still haven't paged in all the logic here -- having just discovered that some of my "knowledge" is in fact based on a version of the PR that never landed.
Yeah, I had already implemented that in https://github.com/rust-lang/rust/pull/53673. Alex talked me out of it ;)
I'm guessing the answer here is "No", but: Do you still have the original code for this lying around somewhere, for me to look at and maybe try out, rather than reimplementing it from scratch?
the github comments are still referencing it, so it might be just a matter of a few clicks?
The github comments say "outdated code" and when I tried to follow them, I got one of those pages that says "We went looking everywhere, but couldn’t find those commits."
The force-push messages have working before and after commit links in them.
ah thanks @bjorn3 , I've always overlooked that
@mw okay I think I have a patch for this
I'm going to do some internal testing, but here is the relevant commit; it is largely inspired by the approach you referenced in that earlier PR.
Great, I'll put looking at that on my todo list
at this point I'm going to spend some time today exploring whether I can make a more robust regression test
@mw am I correct that the tests in `src/incremental/thinlto` are supposed to be exercising ThinLTO because they have `-O` in their `// compile-flags`?
yes
I'm not sure I'm going to be able to construct a (robust) test case for this. Even with the knowledge of the bug that I've garnered, I really don't know how to reliably get LLVM to reproduce the at-LTO-time inlining behavior that we need to witness on my local machine.
small functions like `fn foo() -> u32 { 0 }` are imported very reliably by ThinLTO
and functions with `#[inline(never)]` are also reliably not imported by ThinLTO, afaik
maybe that helps?
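These building blocks might be sketched like this in a test crate (module and function names are invented); under incremental compilation each `mod` becomes its own CGU, so the two functions exercise opposite cross-CGU import behaviors:

```rust
// Hypothetical test-case building blocks, per mw's suggestion.
mod imported {
    // Tiny functions like this are imported (inlined) across CGUs by
    // ThinLTO very reliably.
    pub fn foo() -> u32 {
        0
    }
}

mod not_imported {
    // #[inline(never)] reliably keeps ThinLTO from importing this
    // function into calling CGUs.
    #[inline(never)]
    pub fn bar() -> u32 {
        1
    }
}

fn main() {
    println!("{}", imported::foo() + not_imported::bar());
}
```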
The scenario I need to recreate is one where we change our mind about whether to mark a fn as internal or not
while at the same time, LTO has inlined the reference to that (to-be-marked internal) fn into some cgu that won't be rebuilt under the old logic
I'm able to get the first step down pretty reliably
but I haven't had much luck with the second part.
I guess I could try inspecting the generated intermediate products from the LTO passes to try to figure things out more
is there an `#[inline(..)]` attribute with the effect of "inline at link-time if possible, but not earlier"?
nope
but functions without an `#[inline]` attribute are guaranteed not to be inlined across CGU boundaries before ThinLTO
and you can rely on different `mod`s to end up as separate CGUs (when compiling incrementally), even if they are defined in the same source file.
yeah, i've been making use of the latter property
(the test infrastructure is impressive. The main thing missing is docs explaining how the tests work in the first place.)
@mw I decided to give up on finding a more robust test
I filed PR #67020. It still has a test, it just embeds the one we've already been using as a run-make test for the corresponding thumb target.
I'll try to review tomorrow before the triage meeting
thanks for your input. I really wish we could get a smaller test case
I keep musing about trying to dive deeper into my understanding of LLVM optimization passes, with the thought that if I knew the right `-C llvm-args=...` to feed in, I could replicate this more readily (and more generally)
When you write "Yes, but the import map is always re-computed from scratch from the entire set of modules, including the older ones", do you mean that the import map somehow uses the state of the compiled object code?
another option here, which may be unpalatable, but would satisfy your desire to ensure we don't rely on anything beyond the previous serialized import map: for each CGU, if there is any change at all to the import map, just do not reuse the PostLTO object file
given that the import map seems to very rarely change for green modules, this condition seems like it would fire so rarely that it would not regress anything.
oh, maybe this is what you meant by " If the two things match up" ?
When you write "Yes, but the import map is always re-computed from scratch from the entire set of modules, including the older ones", do you mean that the import map somehow uses the state of the compiled object code ?
the import map is based on the state of the pre-optimization LLVM bitcode (which we also keep around for cached modules)
another option here, which may be unpalatable, but would satisfy your desire to ensure we don't rely on anything beyond the previous serialized import map: for each CGU, if there is any change at all to the import map, just do not reuse the PostLTO object file
I need to think about that. It might be the correct fix actually
I would have no problem adopting the latter.
and I think I could hack it up fast
just set-equality; so sort both slices and compare them for equality. easy peasy.
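A minimal sketch of that check (the function name is hypothetical, and this is not the actual patch): sort both import lists and compare; if the current list differs at all from the previous session's, don't reuse the post-LTO object.

```rust
// Hypothetical set-equality check between the previous session's
// import list for a CGU and the current one.
fn imports_unchanged(prev: &[String], curr: &[String]) -> bool {
    let mut prev: Vec<String> = prev.to_vec();
    let mut curr: Vec<String> = curr.to_vec();
    prev.sort();
    curr.sort();
    prev == curr
}

fn main() {
    let a = vec!["ble".to_string(), "time".to_string()];
    let b = vec!["time".to_string(), "ble".to_string()];
    let c = vec!["ble".to_string()];
    assert!(imports_unchanged(&a, &b)); // same set, different order
    assert!(!imports_unchanged(&a, &c)); // an import disappeared
    println!("ok");
}
```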
I can believe that it is correct and clean because it sounds like a manual re-implementation of the red-green algorithm
okay cool, i prefer this
and then only save the "current" version of the import map?
exactly
that sounds good then
let me go ahead and give it a whirl. I just finished pushing a new version of the PR with the test fixed in a couple ways
oh but I guess I need to do triage for the rustc meeting first (!)
i'll follow up on this after the meeting.
(or when I finish triage; whichever comes first)
this bug is a cautionary tale about trying to do incremental compilation outside of the well-tested framework :)
no rush :)
it's definitely getting on my list of "favorite bugs" now, though
right up there with some great GC bugs I tracked down back in the day
It also has led me to think harder about how to go about testing incremental in the first place
my favorite is still a random memory corruption/dangling pointer bug in LLVM that only rustc triggered :)
this was my attempt to create a test that would expose the bug for me locally: https://gist.github.com/pnkfelix/8a5870135def2f6c46bbd37feed70e2c
I was basically trying to manually enumerate as many ways to combine calls between modules that I could think of
I'll probably give this a quick try tomorrow morning, just from the scenario description in your PR.
I'll probably give this a quick try tomorrow morning, just from the scenario description in your PR.
making a test, you mean? That would be great.
@mw FYI I have updated the PR with the simpler strategy. It makes me happy.
Your description of the failure scenario is spot on
I cannot say how happy I am you found that
import-instr-limit seems important. :)
(as in, I should learn about it. it may have made my life easier)
it should make the test more stable, although I don't think it is needed with the current defaults
I only found out about it recently too
the fact that the intermediate function needs to be a certain size was something I was not paying attention to in my attempts to replicate
it needs to be not too big, but not too small either, right?
(it can massively speed up optimized compilation in exchange for slightly lower runtime performance: https://github.com/rust-lang/rust/pull/66625)
(I'm speaking of `foo::bar`, here)
Since it is an `internal` function in cfail2 and LLVM knows that there is only a single call site, it will probably inline it even if it gets very big
@mw I assume we'd all be happiest if I incorporate your new test into my PR. :smiley:
yes please
Very nice compilation time speedup! I would love to see that in CI.
Very nice compilation time speedup! I would love to see that in CI.
@Josh Triplett this was aimed at a different topic, no?
No, it was related to https://github.com/rust-lang/rust/pull/66625 posted above.