Stream: t-compiler/wg-self-profile

Topic: measureme performance


Wesley Wiser (Mar 26 2019 at 11:59, on Zulip):

I have a rough prototype integrating measureme with rustc. Here's some quick measurements showing the performance including https://github.com/rust-lang/measureme/pull/7:

Test Case Time taken Profile output file size
regex baseline rustc 3.2s N/A
regex baseline rustc + -Z self-profile 6.1s 423mb
regex rustc + measureme + -Z self-profile 4.0s 19.6mb

The prototype doesn't yet deduplicate query names so the output size should shrink quite a bit more once that is implemented.

cc @mw

mw (Mar 26 2019 at 12:06, on Zulip):

Nice!

mw (Mar 26 2019 at 12:06, on Zulip):

That's already a big improvement.

mw (Mar 26 2019 at 12:09, on Zulip):

It would be great to have an end-to-end test case for event serialization & deserialization.

mw (Mar 26 2019 at 12:09, on Zulip):

then we could test the current serialization sink and later the mmap based one

mw (Mar 26 2019 at 14:45, on Zulip):

@Wesley Wiser, I opened https://github.com/rust-lang/measureme/pull/8 which contains a test case. Seems like the FileSerializationSink works as expected :)

mw (Mar 27 2019 at 10:34, on Zulip):

I've created an implementation of MmapSerializationSink on this branch: https://github.com/michaelwoerister/measureme/tree/mmap-sink-2

mw (Mar 27 2019 at 10:34, on Zulip):

I also added some benchmarks and it looks like the FileSerializationSink is actually faster :/

mw (Mar 27 2019 at 10:35, on Zulip):

at least on Windows. I'll test later on Linux too.

mw (Mar 27 2019 at 10:37, on Zulip):

Note that the current MmapSerializationSink implementation creates and anonymous memory map and then writes it out to disk in one go at the end. I tried having a memory map that is directly backed up by a file but that turned out to be more complicated than I have time for right now.

mw (Mar 27 2019 at 10:37, on Zulip):

I want to try that at some point though.

Wesley Wiser (Mar 27 2019 at 11:14, on Zulip):

I see similar results on macOS:

running 2 tests
test bench_file_serialization_sink ... bench: 188,348,802 ns/iter (+/- 3,373,099)
test bench_mmap_serialization_sink ... bench: 178,566,142 ns/iter (+/- 6,515,375)
Jake Goulding (Mar 27 2019 at 12:02, on Zulip):

a memory map that is directly backed up by a file

Gentle reminder that this in basically intrinsically unsafe in Rust terms, as anything on the system can write to that file, invalidating immutability and single-mutable notions with one quick cat.

mw (Mar 27 2019 at 12:13, on Zulip):

These are the Linux numbers for me:

test bench_file_serialization_sink ... bench: 129,967,832 ns/iter (+/- 12,956,926)
test bench_mmap_serialization_sink ... bench: 126,248,624 ns/iter (+/- 8,204,924)
mw (Mar 27 2019 at 12:13, on Zulip):

so mmap slightly faster

mw (Mar 27 2019 at 12:13, on Zulip):

on windows it was more like 188ms (file) vs 240ms (mmap)

mw (Mar 27 2019 at 12:15, on Zulip):

@Jake Goulding that's fine in this case, i.e. there's no real risk here

mw (Mar 27 2019 at 16:26, on Zulip):

I'd say we use the FileSerializationSink for now. It seems to be OK performance-wise and won't affect memory usage.

Wesley Wiser (Mar 27 2019 at 16:26, on Zulip):

Sounds reasonable to me

Wesley Wiser (Mar 30 2019 at 20:08, on Zulip):

So the earlier measurements didn't include the QueryCacheHit event and that turns out to be most of the events recorded in the profile. The correct measurements are:

Test Case Time taken Profile output file size
regex baseline rustc 3.2s N/A
regex baseline rustc + -Z self-profile 6.23s 416mb
regex rustc + measureme + -Z self-profile 6.8s 74mb

The captured file has 3,087,550 events of which 87% are QueryCacheHit. We should figure out a better way to capture that data.

Wesley Wiser (Apr 01 2019 at 02:31, on Zulip):

I've removed the unnecessary double locking which brings us to

Test Case Time taken Profile output file size
regex baseline rustc 3.2s N/A
regex baseline rustc + -Z self-profile 6.23s 416mb
regex rustc + measureme + -Z self-profile 5.7s 74mb
mw (Apr 01 2019 at 09:25, on Zulip):

yeah that's not good

mw (Apr 01 2019 at 09:26, on Zulip):

we'll also need to profile where exactly we are spending that time

mw (Apr 01 2019 at 09:27, on Zulip):

I'd also be interested in how Mmap sink performs in this scenario

Wesley Wiser (Apr 01 2019 at 12:00, on Zulip):

mmap sink performs much better:

Test Case Time taken Profile output file size
regex baseline rustc 3.2s N/A
regex baseline rustc + -Z self-profile 6.23s 416mb
regex rustc + measureme + -Z self-profile 5.7s 74mb
regex rustc + measureme + -Z self-profile mmap sink 3.7s 74mb
mw (Apr 01 2019 at 14:58, on Zulip):

awesome :)

mw (Apr 01 2019 at 14:59, on Zulip):

I have another version of the mmap sink locally which needs some cleanup but might be even faster.

Wesley Wiser (Apr 01 2019 at 14:59, on Zulip):

Awesome!

mw (Apr 01 2019 at 14:59, on Zulip):

this also tells us that the benchmarks in the crate are no good :)

mw (Apr 01 2019 at 15:00, on Zulip):

it still might be interesting to think about multiple verbosity levels for self-profiling

Wesley Wiser (Apr 01 2019 at 15:00, on Zulip):

Even a modestly sized crate generates an events file with 1,000,000+ events

mw (Apr 01 2019 at 15:01, on Zulip):

does that version already deduplicate strings?

Wesley Wiser (Apr 01 2019 at 15:01, on Zulip):

Yeah

mw (Apr 01 2019 at 15:01, on Zulip):

i.e. does it allocate a new label string per event?

mw (Apr 01 2019 at 15:01, on Zulip):

ok

Wesley Wiser (Apr 01 2019 at 15:01, on Zulip):

It dedups query names and event type strings

mw (Apr 01 2019 at 15:01, on Zulip):

that's a lot of data apparently :)

Wesley Wiser (Apr 01 2019 at 15:01, on Zulip):

The custom GenericEvent labels are not deduplicated

Wesley Wiser (Apr 01 2019 at 15:01, on Zulip):

But I don't think there's a lot to be gained there

mw (Apr 01 2019 at 15:02, on Zulip):

yes, I think so too

Wesley Wiser (Apr 01 2019 at 15:02, on Zulip):

At least currently

mw (Apr 01 2019 at 15:02, on Zulip):

maybe we only need to record query cache hit events when also recording individual query keys

mw (Apr 01 2019 at 15:03, on Zulip):

that will be super slow anyway

Wesley Wiser (Apr 01 2019 at 15:03, on Zulip):

Yeah

Wesley Wiser (Apr 01 2019 at 15:03, on Zulip):

We do something similar with the existing cache stats

mw (Apr 01 2019 at 15:03, on Zulip):

I haven't had time to look at that PR yet

mw (Apr 01 2019 at 15:03, on Zulip):

hopefully tomorrow

Wesley Wiser (Apr 01 2019 at 15:04, on Zulip):

No worries :)

Wesley Wiser (Apr 01 2019 at 15:04, on Zulip):

I think it's mostly working now there's just some unresolved questions

mw (Apr 01 2019 at 15:04, on Zulip):

:+1:

Wesley Wiser (Apr 05 2019 at 00:04, on Zulip):

For anyone following along, I posted some numbers for Windows on the PR

Last update: Nov 15 2019 at 20:35UTC