As you talked about getting some numbers, I gave it a try on my 16-core, 32-thread CPU running Windows.
built rustc@797fd92 with `parallel-compiler = true`
| Zthreads | total time |
| ------------ | ------------- |
| 1 | 41.62s |
| 2 | 31.92s |
| 4 | 27.34s |
| 8 | 25.00s |
| 16 | 24.69s |
| 32 | 27.42s |
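To make the scaling easier to read, here is a small sketch that turns the timings in the table above into speedups relative to the 1-thread run. The function name `speedups` and the "efficiency" column (speedup divided by thread count) are just illustrative choices, not something measured separately.

```python
# Wall-clock times copied from the table above: -Zthreads value -> seconds.
timings = {1: 41.62, 2: 31.92, 4: 27.34, 8: 25.00, 16: 24.69, 32: 27.42}


def speedups(timings):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    baseline = timings[1]
    return {
        n: (baseline / t, baseline / t / n)
        for n, t in sorted(timings.items())
    }


# Print one line per thread count.
for n, (speedup, efficiency) in speedups(timings).items():
    print(f"{n:>2} threads: {speedup:.2f}x speedup, {efficiency:.1%} efficiency")
```

The best result is roughly a 1.7x speedup at 16 threads, and the 32-thread run lands back at about the 4-thread time.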
the numbers are for this command, with N replaced by the thread count:
`git clean -fdx && set RUSTFLAGS=-Zthreads=N && cargo +stage2 check`
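For anyone who wants to repeat the measurement, the command above could be looped over the thread counts with something like the sketch below. The helper names (`env_for`, `time_check`, `run_all`) are made up for illustration, and it assumes that `git`, `cargo` and a `stage2` toolchain are on the PATH. It times only the `cargo check` step with a wall clock, which may differ slightly from the total time Cargo itself reports.

```python
import os
import subprocess
import time

THREAD_COUNTS = [1, 2, 4, 8, 16, 32]  # same values as in the table


def env_for(threads, base_env=None):
    """Copy the environment and set RUSTFLAGS for one run."""
    env = dict(os.environ if base_env is None else base_env)
    env["RUSTFLAGS"] = f"-Zthreads={threads}"
    return env


def time_check(threads):
    """Clean the tree, then time one `cargo +stage2 check` in seconds."""
    # WARNING: `git clean -fdx` deletes every untracked and ignored file.
    subprocess.run(["git", "clean", "-fdx"], check=True)
    start = time.perf_counter()
    subprocess.run(["cargo", "+stage2", "check"], env=env_for(threads), check=True)
    return time.perf_counter() - start


def run_all():
    """Print one markdown table row per thread count."""
    for n in THREAD_COUNTS:
        print(f"| {n} | {time_check(n):.2f}s |")
```

Calling `run_all()` from the root of the crate being checked prints rows in the same shape as the table. Nothing runs on import, since the clean step is destructive.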
Interesting! Looks like Windows has a similar issue, getting slower once it gets more threads
It may mean semaphores aren't as good as we thought
At least on Windows
hm, so that might be just "windows has other problems" -- I've not done any benchmarking at least on windows
(it might be that e.g. semaphores are much better, but don't actually solve the problem, since we still see a ton of contention)
Yeah I think we will want to settle the Linux story anyway first
And then we can tackle other platforms if still necessary
@simulacrum I did a local build of https://github.com/rust-lang/rust/pull/67029 and ran it the same way as above; the numbers are almost exactly the same with this build
oh! I was going to ping you when it finished :)
but that sounds not great
Do you know how feasible it is to determine how much time is being spent in system vs. userspace?
I do not know how to do that, but I can try to look around a bit
that would be amazing; I don't know anything about Windows I'm afraid :)
what I did yesterday was to try WPA (Windows Performance Analyzer), the GUI for Windows event tracing; there I see that all the threads are woken at the same time, then there is some delay, and then it happens again
so that's the behavior we were suspecting we'd see on linux without semaphores
I guess windows is exhibiting similar behavior then which is ... not great
(considering we're using semaphores there)
I wonder if we can pass some flag or something
I can try to see if I get the same for the new build.
but I do not know the tools that well, so maybe I misread the logs
I cannot find any way to see the system vs. user time on Windows. Maybe all the threads waking at the same time is not a problem: when I looked at the self-profile Chrome tracing logs, the execution is bursty, so maybe there was just a lot of work to do at that point.
@andjo403 thanks for doing the legwork here!
Unfortunately the only API I know of to get user/kernel times is https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getprocesstimes, but that's very low-level and you'd have to iterate over all processes spawned by Cargo
the total CPU time alone unfortunately won't be too useful, because it doesn't tell us whether it's all being spent in the kernel
(which is the problem we had on linux)
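As a rough idea of what calling `GetProcessTimes` could look like, here is a minimal Windows-only sketch using Python's `ctypes`. The helper names (`filetime_to_seconds`, `process_times`, `run_and_measure`) are hypothetical, and `proc._handle` is a private CPython attribute, used here only as a shortcut. As noted above, this reports times for a single process only, so pointing it at `cargo` would not include the `rustc` processes it spawns.

```python
import ctypes
import subprocess


class FILETIME(ctypes.Structure):
    # Win32 FILETIME: a 64-bit count of 100-nanosecond units, in two halves.
    _fields_ = [
        ("dwLowDateTime", ctypes.c_uint32),
        ("dwHighDateTime", ctypes.c_uint32),
    ]


def filetime_to_seconds(ft):
    """Convert a FILETIME duration to seconds."""
    ticks = (ft.dwHighDateTime << 32) | ft.dwLowDateTime
    return ticks / 10_000_000


def process_times(handle):
    """Return (user_seconds, kernel_seconds) for one process handle."""
    # ctypes.WinDLL exists only on Windows, so it is looked up here, not at import.
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetProcessTimes.argtypes = (
        [ctypes.c_void_p] + [ctypes.POINTER(FILETIME)] * 4
    )
    kernel32.GetProcessTimes.restype = ctypes.c_int
    creation, exited, kernel, user = (FILETIME() for _ in range(4))
    ok = kernel32.GetProcessTimes(
        handle,
        ctypes.byref(creation),
        ctypes.byref(exited),
        ctypes.byref(kernel),
        ctypes.byref(user),
    )
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return filetime_to_seconds(user), filetime_to_seconds(kernel)


def run_and_measure(cmd):
    """Run one command to completion and return its (user, kernel) seconds."""
    proc = subprocess.Popen(cmd)
    proc.wait()
    # Assumption: the private handle is still open after wait() on Windows.
    return process_times(int(proc._handle))
```

For example, `run_and_measure(["rustc", "main.rs"])` would give the split for that one compiler process; a kernel share that grows with the thread count would point at contention in the OS rather than in rustc itself.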