I'm looking at changing how the self-profiler records query execution time to be per-query instead of per-category, so we can show more detailed profiling data at the query level. As part of this, I think it would be great to also support the parallel queries option (currently, using both -Z self-profile and -Z query-threads=n together results in an ICE). One obvious option would be to make the storage use atomics. Another option might be to have some kind of TLS/task context which would record the data without using atomics. Perhaps there's a third option?
Does anyone have an opinion on which would be the better path to go down? cc @Zoxc
You could use a per thread list of events. I think the more important question is how to visualize things (and what to record).
I'd like to remove query cache hits from the results without debug_assertions, mostly because that is in the query hot path and isn't affected by the time things take (so enabling debug_assertions doesn't affect the results, assuming we don't run different queries with
You could use a per thread list of events.
Is there an api or something I should use? Or can I just use regular TLS stuff?
(I've never used Rayon before so I don't know if there's a better way of doing this in that library)
I was expecting per-thread-lists of events
I would probably just use regular TLS
though how to "sweep all the threads"...? I can't remember but I think we make a separate "tcx" per thread or something?
(mm perhaps not?)
but basically I wonder if there is some per-thread state we can already build on (and if not, I suspect we may want it)
Does Rayon send a shutdown message to each thread to get them to terminate, or does it just kill them from the outside or something?
it certainly doesn't kill them
Maybe there's a way to send a message to each thread to get them to consolidate the data
I don't remember exactly what hooks there are
Ok. I guess I should research Rayon more then :)
I feel like it'd be nice if we could make this distinct from rayon
i'm imagining something like this
you have some thread-local state
if it is not initialized for a given thread, you create an Arc<ThreadLocalState> and store it in the compiler
you also keep one handle for your current thread
I guess that would still have to use atomics
but they would be uncontended operations
I have a WorkerLocal type, which allows you to have thread-local state and collect it up afterwards.
You need ownership to do so though, so you can't do that in GlobalCtxt with safe code (since I removed the
I also want to add a SharedWorkerLocal which would allow you to iterate over everything without ownership.
I wouldn't worry too much about this though. Just a single global list with a mutex should be fine
Ok. I'll start with that and then we can optimize later.
I'll use a HashMap<ThreadId, ProfilingState> so making it thread-local later will be easy.
BTW, are there any docs describing how parallel queries works? Or the vision for how they're supposed to work?
/me notes the topic about design docs in all hands discussion :)
@nikomatsakis That's about writing more docs right? I'm not missing some already existing docs? :)
Last I checked there wasn't anything about parallel queries in the rustc guide.
@Wesley Wiser correct =)
I'm basically saying "I want to try and do more design docs so that the answer is 'yes' and not 'no'"
that said, there were some internals threads that may be helpful
this is probably the main one: https://internals.rust-lang.org/t/parallelizing-rustc-using-rayon/6606
Thanks for the pointer! There's a lot of good info in that thread
@Wesley Wiser I have a PR to the rustc-guide that will add a little bit about parallel queries: https://github.com/rust-lang/rustc-guide/pull/270/files#diff-7b2b8e856b6cf936c97794a1cc7ec846R212
that does not go into how it's concretely implemented though
Thanks @mw!! :tada: