I have been wondering recently if we could rely solely on ThinLTO for all inlining - that should work (from my understanding) with the exception of when dylibs are used (e.g. rustc).
So far I think all #[inline] imports are codegen-ed at downstream crates. Is my understanding correct and do you think the above idea is worth implementing?
One way of experimenting with this without too much upfront implementation effort would be to modify the compiler to not duplicate inline functions across codegen units and then do performance tests with different numbers of codegen units.
I suspect that ThinLTO can't quite reach the same runtime performance, but it might be worth a try.
I could imagine -Copt-level=2 doing only ThinLTO while -Copt-level=3 would do the code duplication.
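For anyone who wants a quick sanity check before touching the compiler, here is a minimal sketch of the kind of test program I have in mind. Everything in it (the module names, `square`, the loop sizes) is made up for illustration, and whether the two caller modules actually end up in different codegen units depends on how rustc partitions the crate. The idea is just to have one small #[inline] helper used from two places and time it under different settings.

```rust
use std::hint::black_box;
use std::time::Instant;

mod helpers {
    // Hypothetical small helper: the kind of function #[inline] is meant for.
    #[inline]
    pub fn square(x: u64) -> u64 {
        x.wrapping_mul(x)
    }
}

mod user_a {
    // First caller of the helper; may land in its own codegen unit.
    pub fn sum_squares(n: u64) -> u64 {
        (0..n).fold(0u64, |acc, x| acc.wrapping_add(super::helpers::square(x)))
    }
}

mod user_b {
    // Second caller of the helper, in a separate module.
    pub fn sum_even_squares(n: u64) -> u64 {
        (0..n)
            .filter(|x| x % 2 == 0)
            .fold(0u64, |acc, x| acc.wrapping_add(super::helpers::square(x)))
    }
}

fn main() {
    // black_box keeps the optimizer from folding the whole loop at compile time.
    let n = black_box(50_000_000u64);
    let start = Instant::now();
    let a = user_a::sum_squares(n);
    let b = user_b::sum_even_squares(n);
    println!("a={a} b={b} elapsed={:?}", start.elapsed());
}
```

It could then be built a few times with something like `rustc -C opt-level=2 -C codegen-units=1 main.rs` and `rustc -C opt-level=2 -C codegen-units=16 -C lto=thin main.rs` and the timings compared. A toy like this says very little about real-world code, so it is only a starting point, not a replacement for a proper benchmark suite.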
I think that logic is implemented here but for non-optimized builds. Should I try having a build with this line changed?
Actually, there's a debugging option for this, so I can just run lolbench with it instead of changing the line.
I'm abandoning lolbench since it keeps refusing to work. If anyone knows a good alternative, please let me know.