One thing that worries me about the parallel compiler is that it will make profiling and benchmarking harder.
Instruction count is the metric with the least variance, by a long way. Currently it correlates well with wall time, which is the true metric of concern (i.e. what users observe). In a parallel compiler, that correlation will be greatly weakened.
As a result, our ability to detect small regressions (and improvements) to compiler performance will be harmed.
More generally, the use of coarse-grained parallelism in compilers is well-established and is known to work well. But I'm worried about fine-grained parallelism. The perf numbers I've seen so far have had some good improvements on some workloads, but also some drastic regressions on others.
I agree that fine-grained parallelism in a compiler is a mostly uncharted field. I wonder what it would look like if we went for a more traditional approach (which usually scales better to distributed compilation).
On the other hand, we should soon be in a position to do a real-world evaluation of a compiler with fine-grained parallelism built in, which is pretty interesting.
Regarding profiling: the compiler can still be restricted to a single thread, which should restore the correlation between instruction count and wall time. Maybe we should keep collecting numbers for single-threaded runs to make regressions easier to detect?
Agree with @mw about regression detection. We can always constrain the compiler to one thread to make the performance comparison meaningful.
About profiling, I'm not quite sure what it refers to. Does it mean profiling rustc itself while it compiles? Is such profiling useful?
We should measure what we ship. I'm worried that measuring single-thread runs would be misleading if we are shipping a multi-threaded compiler.
Isn't multi-threaded codegen already enabled in rustc by default?
Do perf runs disable it (thus decreasing variance) or not (thus not measuring what we ship)?
I believe that we do not disable parallel codegen.
We definitely should measure what we ship. But single-threaded runs could be done in addition in order to have improved regression detection.
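To make the variance argument concrete, here is a minimal sketch of why a low-noise metric detects small regressions that a noisy one hides. The function names and the "k times the relative noise" rule are hypothetical choices for illustration only. This is not how any existing perf tooling decides significance.

```rust
/// Sample mean of a set of measurements.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Sample standard deviation (Bessel-corrected); needs at least two samples.
fn std_dev(xs: &[f64]) -> f64 {
    let m = mean(xs);
    let var = xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (xs.len() as f64 - 1.0);
    var.sqrt()
}

/// Relative noise of a metric: standard deviation divided by the mean.
fn coefficient_of_variation(xs: &[f64]) -> f64 {
    std_dev(xs) / mean(xs)
}

/// Hypothetical rule: treat a change as significant only if the relative
/// difference of the means exceeds `k` times the baseline's relative noise.
fn is_significant(baseline: &[f64], candidate: &[f64], k: f64) -> bool {
    let rel_change = (mean(candidate) - mean(baseline)).abs() / mean(baseline);
    rel_change > k * coefficient_of_variation(baseline)
}
```

With a rule like this, the same 1% shift in the mean is flagged when the baseline's run-to-run noise is around 0.1%, but is lost when the noise is several percent. That is the case for collecting a low-variance single-threaded metric in addition to the multi-threaded numbers that reflect what we ship.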