Stream: t-compiler

Topic: self-profiler and parallel queries


Wesley Wiser (Jan 24 2019 at 17:19, on Zulip):

I'm looking at changing how the self-profiler records query execution time to be per-query instead of per-category to allow us to show more detailed profiling data at the query level. As part of this, I think it would be great to also support the parallel queries option (currently, using both -Z self-profile and -Z query-threads=n results in an ICE). One obvious option would be to make the storage use atomics. Another option might be to have some kind of TLS/task context which would record the data without using atomics. Perhaps there's a third option?

Does anyone have an opinion on which would be the better path to go down? cc @Zoxc

Zoxc (Jan 24 2019 at 17:23, on Zulip):

You could use a per thread list of events. I think the more important question is how to visualize things (and what to record).

Zoxc (Jan 24 2019 at 17:25, on Zulip):

I'd like to remove query cache hits from the results without debug_assertions, mostly because that is in the query hot path and isn't affected by the time things take (so enabling debug_assertions doesn't affect the results, assuming we don't run different queries with debug_assertions...)

Wesley Wiser (Jan 24 2019 at 17:27, on Zulip):

> You could use a per thread list of events.

Is there an api or something I should use? Or can I just use regular TLS stuff?

Wesley Wiser (Jan 24 2019 at 17:27, on Zulip):

(I've never used Rayon before so I don't know if there's a better way of doing this in that library)

nikomatsakis (Jan 24 2019 at 17:28, on Zulip):

I was expecting per-thread-lists of events

nikomatsakis (Jan 24 2019 at 17:28, on Zulip):

I would probably just use regular TLS

nikomatsakis (Jan 24 2019 at 17:28, on Zulip):

though how to "sweep all the threads"...? I can't remember but I think we make a separate "tcx" per thread or something?

nikomatsakis (Jan 24 2019 at 17:29, on Zulip):

(mm perhaps not?)

nikomatsakis (Jan 24 2019 at 17:29, on Zulip):

but basically I wonder if there is some per-thread state we can already build on (and if not, I suspect we may want it)

Wesley Wiser (Jan 24 2019 at 17:30, on Zulip):

Does Rayon send a shutdown message to each thread to get them to terminate or does it just kill them from the outside or something?

nikomatsakis (Jan 24 2019 at 17:30, on Zulip):

it certainly doesn't kill them

Wesley Wiser (Jan 24 2019 at 17:30, on Zulip):

Maybe there's a way to send a message to each thread to get them to consolidate the data

nikomatsakis (Jan 24 2019 at 17:30, on Zulip):

I don't remember exactly what hooks there are

Wesley Wiser (Jan 24 2019 at 17:30, on Zulip):

Ok. I guess I should research Rayon more then :)

nikomatsakis (Jan 24 2019 at 17:30, on Zulip):

that said

nikomatsakis (Jan 24 2019 at 17:31, on Zulip):

I feel like it'd be nice if we could make this distinct from rayon

nikomatsakis (Jan 24 2019 at 17:31, on Zulip):

i'm imagining something like this

nikomatsakis (Jan 24 2019 at 17:31, on Zulip):

you have some thread-local state

nikomatsakis (Jan 24 2019 at 17:31, on Zulip):

if it is not initialized for a given thread, you create an Arc<ThreadLocalState> and store it in the compiler

nikomatsakis (Jan 24 2019 at 17:31, on Zulip):

you also keep one handle for your current thread

nikomatsakis (Jan 24 2019 at 17:31, on Zulip):

I guess that would still have to use atomics

nikomatsakis (Jan 24 2019 at 17:32, on Zulip):

but they would be uncontended operations
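
A minimal sketch of the idea described above, not rustc's actual implementation. It assumes a hypothetical `ThreadLocalState` with two counters, a global `REGISTRY` standing in for "the compiler", and helper names (`record_query`, `sweep`) invented for illustration. Each thread lazily creates its own `Arc<ThreadLocalState>`, registers one handle in the shared registry, and keeps one handle for itself. The counters are atomics so the registry can read them, but only the owning thread writes, so the operations are uncontended.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical per-thread profiling state. Only the owning thread writes
/// to these counters; other threads only read them during a sweep.
#[derive(Default)]
struct ThreadLocalState {
    queries_run: AtomicU64,
    nanos_spent: AtomicU64,
}

/// Stand-in for state "stored in the compiler": one handle per thread
/// that has recorded at least one event.
static REGISTRY: Mutex<Vec<Arc<ThreadLocalState>>> = Mutex::new(Vec::new());

thread_local! {
    /// The handle kept by the current thread. The initializer runs on first
    /// use and registers a second handle in the shared registry.
    static STATE: Arc<ThreadLocalState> = {
        let state = Arc::new(ThreadLocalState::default());
        REGISTRY.lock().unwrap().push(Arc::clone(&state));
        state
    };
}

/// Record one query execution on the current thread (no lock taken here
/// after the first call on a given thread).
fn record_query(nanos: u64) {
    STATE.with(|s| {
        s.queries_run.fetch_add(1, Ordering::Relaxed);
        s.nanos_spent.fetch_add(nanos, Ordering::Relaxed);
    });
}

/// "Sweep all the threads": sum the counters of every registered state.
fn sweep() -> (u64, u64) {
    let states = REGISTRY.lock().unwrap();
    states.iter().fold((0, 0), |(queries, nanos), s| {
        (
            queries + s.queries_run.load(Ordering::Relaxed),
            nanos + s.nanos_spent.load(Ordering::Relaxed),
        )
    })
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..10 {
                    record_query(5);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let (queries, nanos) = sweep();
    println!("{queries} queries, {nanos} ns total");
}
```

Because the registry holds its own `Arc`, the data outlives the worker thread, so the sweep can run after the threads have exited without any shutdown message.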

Zoxc (Jan 24 2019 at 17:32, on Zulip):

I have a WorkerLocal type, which allows you to have thread local state and collect it up after

Zoxc (Jan 24 2019 at 17:34, on Zulip):

You need ownership to do so though. So you can't do that in GlobalCtxt with safe code (since I removed the 'a lifetime).

Zoxc (Jan 24 2019 at 17:36, on Zulip):

I also want to add a SharedWorkerLocal which would allow you to iterate over everything without ownership

Zoxc (Jan 24 2019 at 17:38, on Zulip):

I wouldn't worry too much about this though. Just a single global list with a mutex should be fine

Wesley Wiser (Jan 24 2019 at 17:38, on Zulip):

Ok. I'll start with that and then we can optimize later.

Wesley Wiser (Jan 24 2019 at 17:38, on Zulip):

Thanks!

Zoxc (Jan 24 2019 at 17:40, on Zulip):

Maybe use HashMap<ThreadId, ProfilingState> so making it thread local later will be easy
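
A rough sketch of what that suggestion could look like, assuming a hypothetical `ProfilingState` that stores a list of (query name, duration) events. The `SelfProfiler` struct, its `record` and `total_events` methods, and the query labels here are made up for illustration and are not the real rustc types or APIs. A single mutex guards the whole map, and keying by `ThreadId` keeps each thread's events separate so the storage could later be moved into thread-local state.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread::{self, ThreadId};
use std::time::{Duration, Instant};

/// Hypothetical per-thread data: one entry per executed query.
#[derive(Default, Debug)]
struct ProfilingState {
    events: Vec<(&'static str, Duration)>,
}

/// Illustrative stand-in for the profiler: a single global map behind a mutex.
#[derive(Default)]
struct SelfProfiler {
    per_thread: Mutex<HashMap<ThreadId, ProfilingState>>,
}

impl SelfProfiler {
    /// Time `f` and append an event to the calling thread's entry.
    /// The lock is taken only after the work is done, so it is held briefly.
    fn record<R>(&self, query: &'static str, f: impl FnOnce() -> R) -> R {
        let start = Instant::now();
        let result = f();
        let elapsed = start.elapsed();
        let mut map = self.per_thread.lock().unwrap();
        map.entry(thread::current().id())
            .or_default()
            .events
            .push((query, elapsed));
        result
    }

    /// Number of events recorded across all threads.
    fn total_events(&self) -> usize {
        let map = self.per_thread.lock().unwrap();
        map.values().map(|state| state.events.len()).sum()
    }
}

fn main() {
    let profiler = SelfProfiler::default();
    // Scoped threads let the workers borrow `profiler` without an Arc.
    thread::scope(|scope| {
        for _ in 0..3 {
            scope.spawn(|| {
                profiler.record("query_a", || 1 + 1);
                profiler.record("query_b", || 2 + 2);
            });
        }
    });
    let threads = profiler.per_thread.lock().unwrap().len();
    println!("{} events across {} threads", profiler.total_events(), threads);
}
```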

Wesley Wiser (Jan 24 2019 at 17:40, on Zulip):

Will do

Wesley Wiser (Jan 24 2019 at 17:40, on Zulip):

BTW, are there any docs describing how parallel queries work? Or the vision for how they're supposed to work?

nikomatsakis (Jan 24 2019 at 17:42, on Zulip):

/me notes the topic about design docs in all hands discussion :)

Wesley Wiser (Jan 24 2019 at 17:43, on Zulip):

@nikomatsakis That's about writing more docs right? I'm not missing some already existing docs? :)

Wesley Wiser (Jan 24 2019 at 17:45, on Zulip):

Last I checked there wasn't anything about parallel queries in the rustc guide.

nikomatsakis (Jan 24 2019 at 19:32, on Zulip):

@Wesley Wiser correct =)

nikomatsakis (Jan 24 2019 at 19:32, on Zulip):

I'm basically saying "I want to try and do more design docs so that the answer is 'yes' and not 'no'"

nikomatsakis (Jan 24 2019 at 19:32, on Zulip):

that said, there were some internals threads that may be helpful

nikomatsakis (Jan 24 2019 at 19:33, on Zulip):

this is probably the main one: https://internals.rust-lang.org/t/parallelizing-rustc-using-rayon/6606

Wesley Wiser (Jan 24 2019 at 19:53, on Zulip):

Thanks for the pointer! There's a lot of good info in that thread

mw (Jan 25 2019 at 08:30, on Zulip):

@Wesley Wiser I have a PR to the rustc-guide that will add a little bit about parallel queries: https://github.com/rust-lang/rustc-guide/pull/270/files#diff-7b2b8e856b6cf936c97794a1cc7ec846R212

mw (Jan 25 2019 at 08:30, on Zulip):

that does not go into how it's concretely implemented though

Wesley Wiser (Jan 25 2019 at 14:24, on Zulip):

Thanks @mw!! :tada:

Last update: Nov 16 2019 at 01:30UTC