Stream: t-compiler/wg-self-profile

Topic: binary format versioning


mw (Apr 26 2019 at 07:02, on Zulip):

We might want to look into supporting different versions of the binary format. The specific encoding might change, but it would probably not be too hard to make measureme support different encodings at the same time. Then people would not need to be so careful about using tools that match the compiler.

mw (Apr 26 2019 at 07:02, on Zulip):

At a minimum this would allow us to give a sensible error message if a tool can't handle a file.

Alice Ryhl (Apr 26 2019 at 07:03, on Zulip):

Besides, it would seem more important that the compiled summarize binary uses the same version of measureme as the compiler being measured than that summarize is compiled by that compiler?

Alice Ryhl (Apr 26 2019 at 07:09, on Zulip):

Perhaps one should keep master up to date with the measureme in nightly/rust master, so this matches up every time you pull both from master?

Alice Ryhl (Apr 26 2019 at 07:10, on Zulip):

(or just don't mess up support of previous formats or something of that kind)

mw (Apr 26 2019 at 07:25, on Zulip):

I imagined that measureme would keep supporting old formats

mw (Apr 26 2019 at 07:27, on Zulip):

and yes, what you say is true: the measureme versions have to match up. what compiler the tools are compiled with doesn't matter

Alice Ryhl (Apr 26 2019 at 07:28, on Zulip):

But at least if we support older formats, the measureme version in summarize would just have to be greater than or equal to the one in the compiler

mw (Apr 26 2019 at 07:29, on Zulip):

yes

mw (Apr 26 2019 at 07:29, on Zulip):

but if we stick to a stable encoding of just the version identifier then older versions can at least tell that they are dealing with a newer encoding

Alice Ryhl (Apr 26 2019 at 07:30, on Zulip):

That would also be really nice
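
A minimal sketch of the check discussed above, assuming the version identifier has already been read from the file. The names `MAX_SUPPORTED_VERSION`, `VersionError` and `check_version` are hypothetical and not part of measureme; the point is only that a tool built against an older measureme can report a sensible error when it meets a newer encoding.

```rust
use std::fmt;

/// Newest format version this (hypothetical) tool knows how to decode.
pub const MAX_SUPPORTED_VERSION: u32 = 2;

#[derive(Debug, PartialEq, Eq)]
pub enum VersionError {
    /// The file was written with a newer encoding than the tool understands.
    TooNew { found: u32, max_supported: u32 },
}

impl fmt::Display for VersionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VersionError::TooNew { found, max_supported } => write!(
                f,
                "file uses format version {}, but this tool only supports versions up to {}; \
                 try updating the tool",
                found, max_supported
            ),
        }
    }
}

/// Accepts any version up to the newest supported one (older formats stay readable),
/// and rejects newer versions with a descriptive error.
pub fn check_version(found: u32) -> Result<u32, VersionError> {
    if found <= MAX_SUPPORTED_VERSION {
        Ok(found)
    } else {
        Err(VersionError::TooNew {
            found,
            max_supported: MAX_SUPPORTED_VERSION,
        })
    }
}
```

With this shape, `check_version(1)` succeeds in a tool that supports up to version 2, while `check_version(3)` yields an error whose message names both versions.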

mw (Apr 26 2019 at 07:31, on Zulip):

I'd probably just add a small header to each of the files

mw (Apr 26 2019 at 07:31, on Zulip):

some magic 4 byte tag and 4 little-endian bytes for a version number

Alice Ryhl (Apr 26 2019 at 07:32, on Zulip):

Yeah, 32 bits for the version sounds enough

mw (Apr 26 2019 at 07:33, on Zulip):

definitely :)
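
The header layout proposed above (a 4-byte magic tag followed by a 4-byte little-endian version number) could be sketched as follows. The tag value `MMEV` and the function names are made up for illustration; the chat does not say which tag would be used.

```rust
/// Hypothetical magic tag; the actual value is not specified in the discussion.
pub const MAGIC: [u8; 4] = *b"MMEV";
/// 4 bytes of tag plus 4 bytes of version.
pub const HEADER_SIZE: usize = 8;

/// Builds the 8-byte header for the given format version.
pub fn encode_header(version: u32) -> [u8; HEADER_SIZE] {
    let mut header = [0u8; HEADER_SIZE];
    header[..4].copy_from_slice(&MAGIC);
    header[4..].copy_from_slice(&version.to_le_bytes());
    header
}

/// Reads the version from the start of a file's contents,
/// or explains why the bytes do not look like a valid header.
pub fn decode_header(bytes: &[u8]) -> Result<u32, String> {
    if bytes.len() < HEADER_SIZE {
        return Err(format!(
            "file is too short to contain a header ({} bytes)",
            bytes.len()
        ));
    }
    if !bytes.starts_with(&MAGIC) {
        return Err("magic tag not found; this does not look like a profiling data file".to_string());
    }
    Ok(u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]))
}
```

A fixed-size tag and a fixed-width integer keep the header trivially stable, which is what lets older tools recognize a newer file.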

Alice Ryhl (Apr 26 2019 at 07:36, on Zulip):

Do the files have any tag or similar now?

Alice Ryhl (Apr 26 2019 at 07:39, on Zulip):

Also, I was wondering: why is a profile three files instead of being bundled into one? It seems like having one file would improve usability for the user.

mw (Apr 26 2019 at 07:43, on Zulip):

the files don't have a tag or any kind of header at the moment

mw (Apr 26 2019 at 07:43, on Zulip):

using three files allows us to be a bit more efficient when generating the data

mw (Apr 26 2019 at 07:44, on Zulip):

I think at least :) maybe the circumstances have changed since that decision was made, we should validate that

mw (Apr 26 2019 at 07:45, on Zulip):

however, ideally it shouldn't matter if there's one or three files because the tooling should hide this implementation detail

Alice Ryhl (Apr 26 2019 at 07:46, on Zulip):

I mean, my point is you can't really hide that detail. The user has three files in their directory.

mw (Apr 26 2019 at 07:55, on Zulip):

yeah, I imagine that that can be inconvenient when having to get rid of those files manually

mw (Apr 26 2019 at 07:55, on Zulip):

but those binary files are not really meant to live long or be copied somewhere

mw (Apr 26 2019 at 07:56, on Zulip):

they are a means of serializing the profiling data while the compiler runs, in a way that has as little overhead as possible, which is important in order not to skew the measurements too much

mw (Apr 26 2019 at 07:57, on Zulip):

once all profiling is done and the system has resources for the postprocessing step, the binary files should be made obsolete asap

mw (Apr 26 2019 at 07:58, on Zulip):

all of this should be handled by a tool so that the user doesn't have to deal with it

mw (Apr 26 2019 at 07:59, on Zulip):

what exactly this tool looks like we don't know yet :)

Alice Ryhl (Apr 26 2019 at 08:08, on Zulip):

I mean, in principle you could just convert them immediately after the compiling finishes, at which point they no longer skew the profiling. On the other hand, doing IO at all during profiling sounds like it would skew it unnecessarily compared to just keeping it in memory? Maybe. I don't know how large the data is.

mw (Apr 26 2019 at 08:13, on Zulip):

I think for the regex crate (which is medium-sized) it was something like 20 megabytes

mw (Apr 26 2019 at 08:14, on Zulip):

it would certainly make sense to write this data to a RAM-disk of some kind

Alice Ryhl (Apr 26 2019 at 08:15, on Zulip):

You could even pre-allocate a large buffer like 100 MB and write it there, then flush to a file when we're done. Then if the buffer is filled, allocate another 100 MB buffer in some sort of rope to avoid the cost of copying during profiling.
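
A rough sketch of this buffering idea, assuming a single-threaded recorder. `ChunkedBuffer` and its methods are hypothetical names; each chunk is allocated with a fixed capacity and never grows, so filling one chunk does not copy earlier data, and everything is written out in one pass at the end.

```rust
use std::io::{self, Write};

/// A list of fixed-capacity chunks that are only appended to, never reallocated.
pub struct ChunkedBuffer {
    chunk_size: usize,
    chunks: Vec<Vec<u8>>,
}

impl ChunkedBuffer {
    /// `chunk_size` would be something like 100 MB in the scenario described above.
    pub fn new(chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk size must be non-zero");
        ChunkedBuffer {
            chunk_size,
            chunks: vec![Vec::with_capacity(chunk_size)],
        }
    }

    /// Appends `data`, starting a fresh chunk whenever the current one is full.
    pub fn record(&mut self, mut data: &[u8]) {
        let chunk_size = self.chunk_size;
        while !data.is_empty() {
            let full = self.chunks.last().map_or(true, |c| c.len() == chunk_size);
            if full {
                self.chunks.push(Vec::with_capacity(chunk_size));
            }
            let last = self.chunks.last_mut().expect("at least one chunk exists");
            let n = (chunk_size - last.len()).min(data.len());
            last.extend_from_slice(&data[..n]);
            data = &data[n..];
        }
    }

    /// Number of chunks allocated so far.
    pub fn chunk_count(&self) -> usize {
        self.chunks.len()
    }

    /// Total number of bytes recorded.
    pub fn total_len(&self) -> usize {
        self.chunks.iter().map(|c| c.len()).sum()
    }

    /// Writes all chunks, in order, to `out` once profiling is done.
    pub fn flush_to<W: Write>(&self, mut out: W) -> io::Result<()> {
        for chunk in &self.chunks {
            out.write_all(chunk)?;
        }
        out.flush()
    }
}
```

Whether this beats writing to a file or a memory map during compilation would still have to be measured.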

Alice Ryhl (Apr 26 2019 at 08:16, on Zulip):

or if you really need to flush it to a file, try to profile how much time that took?

mw (Apr 26 2019 at 08:17, on Zulip):

we are using a memory map on Unix at the moment, which is a lazily allocated 1GiB buffer

mw (Apr 26 2019 at 08:17, on Zulip):

on Windows it turned out that writing to file immediately performs the best

mw (Apr 26 2019 at 08:18, on Zulip):

the memory map allows us to not do any locking during event recording
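
One way recording into a pre-sized buffer can avoid locks is to hand out disjoint byte ranges with an atomic counter, so that each thread writes only into the range it reserved. The sketch below shows just that reservation step; `SlotReserver` is a hypothetical name, and this is an assumption about the general technique rather than a description of measureme's actual code.

```rust
use std::ops::Range;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hands out non-overlapping byte ranges of a buffer with `capacity` bytes.
pub struct SlotReserver {
    capacity: usize,
    next: AtomicUsize,
}

impl SlotReserver {
    pub fn new(capacity: usize) -> Self {
        SlotReserver {
            capacity,
            next: AtomicUsize::new(0),
        }
    }

    /// Reserves `num_bytes` without taking a lock. Returns `None` once the
    /// buffer is exhausted. Simplification: a failed reservation still advances
    /// the counter, so later requests fail as well.
    pub fn reserve(&self, num_bytes: usize) -> Option<Range<usize>> {
        // A single atomic add gives every caller a unique starting offset.
        let start = self.next.fetch_add(num_bytes, Ordering::Relaxed);
        let end = start.checked_add(num_bytes)?;
        if end <= self.capacity {
            Some(start..end)
        } else {
            None
        }
    }
}
```

Because every caller gets its own range, threads never contend for anything beyond the one atomic increment.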

Wesley Wiser (Apr 26 2019 at 09:10, on Zulip):

"I think for the regex crate (which is medium-sized) it was something like 20 megabytes"

Regex is actually 75 MB (at least on macOS, but I'd imagine all of the platforms are similar)

Wesley Wiser (Apr 26 2019 at 09:10, on Zulip):

It zips down to 12 MB though

Alice Ryhl (Apr 26 2019 at 09:11, on Zulip):

Uhh, on my linux laptop it is only 3.4 MB

mw (Apr 26 2019 at 09:11, on Zulip):

@Wesley Wiser was that before event filtering?

Wesley Wiser (Apr 26 2019 at 09:11, on Zulip):

Huh

Wesley Wiser (Apr 26 2019 at 09:11, on Zulip):

Yeah

Wesley Wiser (Apr 26 2019 at 09:11, on Zulip):

Is the default all events?

Alice Ryhl (Apr 26 2019 at 09:11, on Zulip):

My number is just the size of the three files created by cargo +nightly rustc -- -Z self-profile

Wesley Wiser (Apr 26 2019 at 09:12, on Zulip):

Or all - incremental hit?

mw (Apr 26 2019 at 09:12, on Zulip):

the default is everything except for query cache hits, I think

Wesley Wiser (Apr 26 2019 at 09:12, on Zulip):

Oh ok

Wesley Wiser (Apr 26 2019 at 09:12, on Zulip):

Ignore me then

Alice Ryhl (Apr 26 2019 at 09:13, on Zulip):

How did you get 75 MB though?

Wesley Wiser (Apr 26 2019 at 09:13, on Zulip):

That event is like 85% of the file

Wesley Wiser (Apr 26 2019 at 09:13, on Zulip):

When all events are turned on, it's something like 3.5 million events out of ~4 million total events

Alice Ryhl (Apr 26 2019 at 09:13, on Zulip):

How do you turn on all events though?

Wesley Wiser (Apr 26 2019 at 09:14, on Zulip):

-Z self-profile -Z self-profile-events all

Wesley Wiser (Apr 26 2019 at 09:16, on Zulip):

(This is also a clean compilation which is the worst case)

mw (May 09 2019 at 11:14, on Zulip):

I'm going to look into implementing this...

mw (May 09 2019 at 11:15, on Zulip):

https://github.com/rust-lang/measureme/issues/40

mw (May 09 2019 at 13:08, on Zulip):

Hm, it looks like continuing to support older versions of the binary format is a bit more complicated than I want to implement right now.

mw (May 09 2019 at 13:13, on Zulip):

The StringRef is tied to the binary encoding of the string data. In order to solve that there are a few options:

mw (May 09 2019 at 13:14, on Zulip):

Since we are at version zero anyway, I'll take the easy route and just defer this decision.

mw (May 09 2019 at 13:15, on Zulip):

If we make the measureme suite a rustup component then we might decide to not support old formats at all.

Wesley Wiser (May 09 2019 at 13:52, on Zulip):

I had a thought about this yesterday after the meeting. Could we just make the first event in the file a "metadata" event by convention?

Wesley Wiser (May 09 2019 at 13:52, on Zulip):

We could then stuff a JSON blob or something in the string data for that event

Wesley Wiser (May 09 2019 at 13:53, on Zulip):

The deserializer would know to look at the first event and to try deserializing its data

Wesley Wiser (May 09 2019 at 13:53, on Zulip):

If that fails, then we know it's an old style file

Wesley Wiser (May 09 2019 at 13:53, on Zulip):

If it succeeds, then we can pull the version from the JSON blob

simulacrum (May 09 2019 at 15:24, on Zulip):

Could we just make a header "event" that's always, say, the first 64 bits of all files and put (for now) just the version info in there?

mw (May 10 2019 at 09:17, on Zulip):

In https://github.com/rust-lang/measureme/issues/15 I suggested to have a string table entry with JSON metadata

mw (May 10 2019 at 09:18, on Zulip):

however, the version determines how events are encoded, so we can't put that information into an encoded event

mw (May 10 2019 at 09:19, on Zulip):

we'd have to decode the event in order to get the version and we have to know the version in order to know how to decode the event

mw (May 10 2019 at 09:20, on Zulip):

I think a simple scheme, as implemented in https://github.com/rust-lang/measureme/pull/41, where we reserve the first 8 bytes of each file for a header with just the version in it is a good choice.

mw (May 10 2019 at 09:21, on Zulip):

we only have to keep the encoding of the header stable, all other decoding can then dispatch on the version number
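
To illustrate dispatching on the version number, here is a small sketch that reads the version from bytes 4..8 of the 8-byte header and then picks a decoder. The `Event` type and both event layouts are invented for the example and do not reflect measureme's real encodings; the magic-tag check is left out for brevity.

```rust
/// Size of the header whose encoding must stay stable across versions.
pub const HEADER_SIZE: usize = 8;

/// Hypothetical decoded event, not measureme's real event type.
#[derive(Debug, PartialEq, Eq)]
pub struct Event {
    pub id: u32,
    pub nanos: u64,
}

/// Made-up "version 0" layout: 4-byte id followed by a 4-byte timestamp.
/// Trailing bytes that do not form a full event are ignored.
fn decode_v0(body: &[u8]) -> Vec<Event> {
    body.chunks_exact(8)
        .map(|c| Event {
            id: u32::from_le_bytes([c[0], c[1], c[2], c[3]]),
            nanos: u64::from(u32::from_le_bytes([c[4], c[5], c[6], c[7]])),
        })
        .collect()
}

/// Made-up "version 1" layout: 4-byte id followed by an 8-byte timestamp.
fn decode_v1(body: &[u8]) -> Vec<Event> {
    body.chunks_exact(12)
        .map(|c| Event {
            id: u32::from_le_bytes([c[0], c[1], c[2], c[3]]),
            nanos: u64::from_le_bytes([c[4], c[5], c[6], c[7], c[8], c[9], c[10], c[11]]),
        })
        .collect()
}

/// Only the header is decoded unconditionally; everything after it is
/// handed to the decoder that matches the version found there.
pub fn decode_file(bytes: &[u8]) -> Result<Vec<Event>, String> {
    if bytes.len() < HEADER_SIZE {
        return Err("file is too short to contain a header".to_string());
    }
    let version = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    let body = &bytes[HEADER_SIZE..];
    match version {
        0 => Ok(decode_v0(body)),
        1 => Ok(decode_v1(body)),
        other => Err(format!("unsupported format version {}", other)),
    }
}
```

Supporting a new encoding then means adding one more decoder and one more match arm, while the header parsing stays untouched.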

Last update: Nov 15 2019 at 20:45UTC