Stream: t-compiler/wg-self-profile

Topic: dumping sources of LLVM slowness


simulacrum (Sep 27 2019 at 12:24, on Zulip):

@mw So I experimented with the CGU dumping functionality already built-in to rustc, and dumped the CGUs for librustc -- as a case study -- while in incremental mode.

Here are the top CGUs by number of items in them for just librustc, though post-inlining in the mono collector.

Do you think it'd be useful to clean this up and make it a more easily obtainable output from rustc? I could imagine us integrating it with self-profile or just dumping into a text file via -Zdump-cgus or so...

9106 rustc.c0mi9ud5-dep_graph-graph.volatile
9339 rustc_data_structures.9qcb4nu3-in-rustc_interface.f5pc1f2p-box_region.volatile
9582 rustc.c0mi9ud5-ty-query
9792 core.d2krjpsn-in-rustc_mir.34aofsg3-iter-adapters.volatile
9942 hashbrown.dxqb203g-in-rustc_typeck.6k1cg5uj-raw.volatile
11002 rustc.c0mi9ud5-in-rustc_mir.34aofsg3-ty-context.volatile
11903 rustc.c0mi9ud5-in-rustc_typeck.6k1cg5uj-ty-context.volatile
13000 rustc.c0mi9ud5-ty-context
14853 hashbrown.dxqb203g-in-rustc_mir.34aofsg3-raw.volatile
15977 alloc.3yhu0pmp-in-rustc.c0mi9ud5-raw_vec.volatile
19555 alloc.3yhu0pmp-in-rustc_mir.34aofsg3-vec.volatile
19729 core.d2krjpsn-in-rustc.c0mi9ud5-iter-adapters.volatile
26826 rustc.c0mi9ud5-ty-context.volatile
27567 alloc.3yhu0pmp-in-rustc.c0mi9ud5-vec.volatile
32196 hashbrown.dxqb203g-in-rustc.c0mi9ud5-raw.volatile
simulacrum (Sep 27 2019 at 12:26, on Zulip):

I also don't quite understand how we get e.g. hashbrown.dxqb203g-in-rustc_mir.34aofsg3-raw.volatile as a codegen unit inside librustc, since it doesn't have a dependency on librustc_mir at all, so I wouldn't expect that to show up -- maybe I'm missing something though?

simulacrum (Sep 27 2019 at 12:31, on Zulip):

This is the documented syntax

<crate-name>.<crate-disambiguator>[-in-<local-crate-id>](-<component>)*[.<special-suffix>]
<local-crate-id> = <local-crate-name>.<local-crate-disambiguator>
simulacrum (Sep 27 2019 at 12:32, on Zulip):

So I guess one would read that as functions from hashbrown, inlined into rustc_mir due to being generic (volatile)
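
A minimal sketch of reading a CGU name according to the syntax quoted above. The `CguName` struct, `take_id` and `parse_cgu_name` are hypothetical helpers written for illustration, not part of rustc, and the sketch assumes that neither crate names nor path components contain `-` or `.`.

```rust
/// Parsed form of a codegen unit name (hypothetical type, not rustc's).
#[derive(Debug, PartialEq)]
struct CguName {
    crate_name: String,
    crate_disambiguator: String,
    /// (local crate name, local crate disambiguator) from the `-in-` part.
    local_crate: Option<(String, String)>,
    components: Vec<String>,
    special_suffix: Option<String>,
}

/// Splits `s` at the first '-' or '.'; the delimiter stays in the tail.
fn take_id(s: &str) -> (&str, &str) {
    let end = s.find(|c: char| c == '-' || c == '.').unwrap_or(s.len());
    s.split_at(end)
}

fn parse_cgu_name(name: &str) -> Option<CguName> {
    // <crate-name>.<crate-disambiguator>
    let (crate_name, rest) = name.split_once('.')?;
    let (crate_disambiguator, mut rest) = take_id(rest);
    if crate_name.is_empty() || crate_disambiguator.is_empty() {
        return None;
    }

    // Optional [-in-<local-crate-name>.<local-crate-disambiguator>]
    let mut local_crate = None;
    if let Some(after_in) = rest.strip_prefix("-in-") {
        let (local_name, after_name) = after_in.split_once('.')?;
        let (local_disambiguator, after_local) = take_id(after_name);
        local_crate = Some((local_name.to_string(), local_disambiguator.to_string()));
        rest = after_local;
    }

    // (-<component>)* followed by an optional .<special-suffix>
    let (component_part, special_suffix) = match rest.split_once('.') {
        Some((components, suffix)) => (components, Some(suffix.to_string())),
        None => (rest, None),
    };
    let components = component_part
        .split('-')
        .filter(|c| !c.is_empty())
        .map(str::to_string)
        .collect();

    Some(CguName {
        crate_name: crate_name.to_string(),
        crate_disambiguator: crate_disambiguator.to_string(),
        local_crate,
        components,
        special_suffix,
    })
}

fn main() {
    let name = "hashbrown.dxqb203g-in-rustc_mir.34aofsg3-raw.volatile";
    match parse_cgu_name(name) {
        Some(parsed) => println!("{:#?}", parsed),
        None => println!("not a CGU name: {}", name),
    }
}
```

For the name in `main`, the sketch reports crate `hashbrown`, local crate `rustc_mir`, the single component `raw` and the suffix `volatile`, which matches the reading given in the message above.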

simulacrum (Sep 27 2019 at 12:33, on Zulip):

oh, wait, this is not from just librustc, this is across all crates in a standard build

simulacrum (Sep 27 2019 at 12:33, on Zulip):

This is from just librustc:

3243 arena.29va6d19-in-rustc.c0mi9ud5.volatile
4361 core.d2krjpsn-in-rustc.c0mi9ud5-slice-sort.volatile
4588 rustc.c0mi9ud5-ty-query-on_disk_cache.volatile
5275 alloc.3yhu0pmp-in-rustc.c0mi9ud5-collections-btree-node.volatile
5439 std.4uew1l7o-in-rustc.c0mi9ud5-thread-local.volatile
7962 core.d2krjpsn-in-rustc.c0mi9ud5-intrinsics.volatile
8535 rustc.c0mi9ud5-ty-query-plumbing.volatile
9106 rustc.c0mi9ud5-dep_graph-graph.volatile
9582 rustc.c0mi9ud5-ty-query
13000 rustc.c0mi9ud5-ty-context
15977 alloc.3yhu0pmp-in-rustc.c0mi9ud5-raw_vec.volatile
19729 core.d2krjpsn-in-rustc.c0mi9ud5-iter-adapters.volatile
26826 rustc.c0mi9ud5-ty-context.volatile
27567 alloc.3yhu0pmp-in-rustc.c0mi9ud5-vec.volatile
32196 hashbrown.dxqb203g-in-rustc.c0mi9ud5-raw.volatile
mw (Sep 27 2019 at 12:42, on Zulip):

So I guess one would read that as functions from hashbrown, inlined into rustc_mir due to being generic (volatile)

"instantiated in rustc_mir" would be more accurate than "inlined into rustc_mir", but otherwise that's correct, yes

mw (Sep 27 2019 at 12:43, on Zulip):

hashbrown seems to instantiate lots of generic code :)

mw (Sep 27 2019 at 12:45, on Zulip):

Regarding your more general question: Yes, I'd like to add some kind of monomorphization tracking into self-profiling

mw (Sep 27 2019 at 12:45, on Zulip):

I don't know what form it would take exactly but seems very useful

simulacrum (Sep 27 2019 at 12:50, on Zulip):

I guess to get to a more interesting question: do we think this information is useful? How would one use it?

simulacrum (Sep 27 2019 at 12:50, on Zulip):

cc @Alex Crichton as well since you might be interested

simulacrum (Sep 27 2019 at 12:52, on Zulip):

I think answering that question can help drive figuring out how to integrate

mw (Sep 27 2019 at 12:52, on Zulip):

I think it is more interesting at the function level

mw (Sep 27 2019 at 12:53, on Zulip):

also for actual inlining

simulacrum (Sep 27 2019 at 12:53, on Zulip):

Sure - I have that info too

simulacrum (Sep 27 2019 at 12:53, on Zulip):

Though that I know even less what to do with

mw (Sep 27 2019 at 12:53, on Zulip):

inlining?

simulacrum (Sep 27 2019 at 12:54, on Zulip):

No, the functions - like, sure there's a lot of them, but it's not clear how to act on that

mw (Sep 27 2019 at 12:56, on Zulip):

I guess if you had the actual call graph (that the collector works with) then a tool could find parts of the graph that are instantiated by a single invocation and point you to that invocation

mw (Sep 27 2019 at 12:56, on Zulip):

saying: if you turn this into a trait object, the compiler has X thousand functions less to optimize

mw (Sep 27 2019 at 12:57, on Zulip):

something like that
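
A small sketch of the trade-off behind the trait-object suggestion. The function names are made up for illustration: the generic version is instantiated once per concrete type it is called with, while the `dyn` version is compiled once and dispatches through a vtable at runtime.

```rust
use std::fmt::Display;

// Generic: the compiler instantiates (monomorphizes) one copy of this
// function for every concrete `T` it is called with.
fn describe_generic<T: Display>(value: T) -> String {
    format!("value = {}", value)
}

// Trait object: one compiled copy serves every type, at the cost of a
// dynamic (vtable) call to `Display::fmt`.
fn describe_dyn(value: &dyn Display) -> String {
    format!("value = {}", value)
}

fn main() {
    // Three instantiations: for i32, &str and f64.
    println!("{}", describe_generic(1));
    println!("{}", describe_generic("two"));
    println!("{}", describe_generic(3.5));

    // One function body, three calls.
    println!("{}", describe_dyn(&1));
    println!("{}", describe_dyn(&"two"));
    println!("{}", describe_dyn(&3.5));
}
```

Whether the switch pays off depends on how much code hangs off each instantiation; the sketch only shows where the extra copies come from.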

simulacrum (Sep 27 2019 at 12:57, on Zulip):

Hm that would be interesting

mw (Sep 27 2019 at 12:57, on Zulip):

similar for function inlining

mw (Sep 27 2019 at 12:57, on Zulip):

a tool could tell you for each function how often it gets copied because of inlining

simulacrum (Sep 27 2019 at 12:57, on Zulip):

Though I wonder - e.g. hashbrown can't be, right?

mw (Sep 27 2019 at 12:57, on Zulip):

and how many other inline functions get pulled into each CGU because of it

mw (Sep 27 2019 at 12:58, on Zulip):

Though I wonder - e.g. hashbrown can't be, right?

what do you mean?

simulacrum (Sep 27 2019 at 12:58, on Zulip):

There's no way to have a non-generic hashmap

mw (Sep 27 2019 at 12:59, on Zulip):

true

mw (Sep 27 2019 at 13:00, on Zulip):

a tool might have to know which traits are object safe in order to give good suggestions?

mw (Sep 27 2019 at 13:01, on Zulip):

I don't know what the hashbrown code looks like, but it might still give the author an idea how to refactor things in order to have less generic code

simulacrum (Sep 27 2019 at 13:01, on Zulip):

I guess it would be useful to know that e.g. librustc compile time is 40% hashmaps

mw (Sep 27 2019 at 13:01, on Zulip):

indeed :)

mw (Sep 27 2019 at 13:01, on Zulip):

it's probably also interesting to know how big these functions are

simulacrum (Sep 27 2019 at 13:02, on Zulip):

Yeah I'm working on that

mw (Sep 27 2019 at 13:02, on Zulip):

I bet the hashbrown functions are all rather small

simulacrum (Sep 27 2019 at 13:02, on Zulip):

It's mostly things like ManuallyDrop::deref_mut, tbh

mw (Sep 27 2019 at 13:04, on Zulip):

hm

mw (Sep 27 2019 at 13:04, on Zulip):

I wonder how much of that is left after LLVM has done inlining and function merging

simulacrum (Sep 27 2019 at 13:06, on Zulip):

probably very little -- but that takes time

mw (Sep 27 2019 at 13:08, on Zulip):

is this for a debug or a release build

mw (Sep 27 2019 at 13:08, on Zulip):

I think we are a lot more aggressive about duplicating code for release builds

simulacrum (Sep 27 2019 at 13:08, on Zulip):

release build

simulacrum (Sep 27 2019 at 13:08, on Zulip):

(this is literally just an x.py build with some dumping built in)

mw (Sep 27 2019 at 13:08, on Zulip):

inline functions are treated differently and -Zshare-generics is on by default for debug builds

simulacrum (Sep 27 2019 at 13:14, on Zulip):

I wonder if it's worth trying to run this with -Zshare-generics in release mode; if that's a big win for compile time and not a big loss at runtime, it might be worth considering

mw (Sep 27 2019 at 13:21, on Zulip):

Seems worth benchmarking, yes

simulacrum (Sep 27 2019 at 13:22, on Zulip):

I'll try to allocate some time to a PR that does that to run it through perf

simulacrum (Sep 27 2019 at 13:31, on Zulip):

oof

1664 stmts for fn syntax::visit[0]::walk_expr[0]<syntax::util[0]::node_count[0]::NodeCounter[0]>
1791 stmts for fn syntax::parse[0]::parser[0]::ty[0]::{{impl}}[0]::parse_ty_common[0]
1832 stmts for fn syntax::print[0]::pprust[0]::{{impl}}[5]::print_item[0]
2073 stmts for fn syntax::ast[0]::{{impl}}[255]::fmt[0]
2073 stmts for fn syntax::print[0]::pprust[0]::{{impl}}[5]::print_expr_outer_attr_style[0]
2225 stmts for fn syntax::parse[0]::parser[0]::expr[0]::{{impl}}[2]::parse_bottom_expr[0]
2241 stmts for fn syntax::feature_gate[0]::check[0]::{{impl}}[0]::check_abi[0]
2738 stmts for fn syntax::feature_gate[0]::check[0]::{{impl}}[1]::visit_item[0]
3868 stmts for fn syntax::parse[0]::parser[0]::item[0]::{{impl}}[0]::parse_item_implementation[0]
4183 stmts for fn proc_macro::bridge[0]::server[0]::{{impl}}[18]::dispatch[0]<syntax::ext[0]::proc_macro_server[0]::Rustc[0]>
3240 stmts for fn rustc::ty[0]::query[0]::{{impl}}[0]::print_stats[0]
4007 stmts for fn rustc::ty[0]::query[0]::plumbing[0]::force_from_dep_node[0]
4744 stmts for fn rustc::ty[0]::layout[0]::{{impl}}[3]::layout_raw_uncached[0]
5092 stmts for fn rustc::ty[0]::context[0]::{{impl}}[14]::print_debug_stats[0]::inner[0]::go[0]
5685 stmts for fn rustc::ty[0]::print[0]::pretty[0]::PrettyPrinter[0]::pretty_print_type[0]<rustc::ty[0]::print[0]::pretty[0]::FmtPrinter[0]<&mut alloc::string[0]::String[0]>>
5685 stmts for fn rustc::ty[0]::print[0]::pretty[0]::PrettyPrinter[0]::pretty_print_type[0]<rustc::ty[0]::print[0]::pretty[0]::FmtPrinter[0]<&mut core::fmt[0]::Formatter[0]>>
6501 stmts for fn rustc::ty[0]::query[0]::{{impl}}[401]::fmt[0]
16401 stmts for fn rustc::dep_graph[0]::dep_node[0]::{{impl}}[14]::new[0]
16401 stmts for fn rustc::dep_graph[0]::dep_node[0]::{{impl}}[14]::new[0]
16401 stmts for fn rustc::dep_graph[0]::dep_node[0]::{{impl}}[14]::new[0]
simulacrum (Sep 27 2019 at 13:31, on Zulip):

in particular those 16,000 statement long new functions are a bit sad to see duplicated

simulacrum (Sep 27 2019 at 13:41, on Zulip):

@mw is there any point in us generating empty functions? At least size_estimate is 0... maybe that's not entirely accurate?

simulacrum (Sep 27 2019 at 13:41, on Zulip):

e.g. I get a lot of:

0 stmts for fn core::ptr[0]::real_drop_in_place[0]<&u64>
0 stmts for fn core::mem[0]::needs_drop[0]<(syntax_pos::SpanData[0], u32)>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<hashbrown::map[0]::HashMap[0]<syntax_pos::SpanData[0], u32, core::hash[0]::BuildHasherDefault[0]<rustc_hash::FxHasher[0]>>>
0 stmts for fn core::mem[0]::align_of[0]<(&str, syntax_pos::symbol[0]::Symbol[0])>
0 stmts for fn core::mem[0]::size_of[0]<syntax_pos::SpanLabel[0]>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<&bool>
0 stmts for fn core::mem[0]::size_of[0]<(&str, syntax_pos::symbol[0]::Symbol[0])>
0 stmts for fn core::hint[0]::unreachable_unchecked[0]
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<&alloc::vec[0]::Vec[0]<(syntax_pos::span_encoding[0]::Span[0], alloc::string[0]::String[0])>>
0 stmts for fn hashbrown::raw[0]::{{impl}}[5]::free_buckets[0]::{{closure}}[0]<((syntax_pos::hygiene[0]::SyntaxContext[0], syntax_pos::hygiene[0]::ExpnId[0], syntax_pos::hygien
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<syntax_pos::hygiene[0]::HygieneData[0]>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<std::path[0]::PathBuf[0]>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<std::collections[0]::hash[0]::map[0]::HashMap[0]<(syntax_pos::hygiene[0]::SyntaxContext[0], syntax_pos::hygiene[0]::ExpnId[0]
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<&u8>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<std::ffi[0]::os_str[0]::OsString[0]>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<std::collections[0]::hash[0]::map[0]::HashMap[0]<syntax_pos::SpanData[0], u32, core::hash[0]::BuildHasherDefault[0]<rustc_has
0 stmts for fn core::mem[0]::size_of[0]<u8>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<std::collections[0]::hash[0]::map[0]::HashMap[0]<&str, syntax_pos::symbol[0]::Symbol[0], core::hash[0]::BuildHasherDefault[0]
0 stmts for fn core::slice[0]::size_from_ptr[0]<syntax_pos::span_encoding[0]::Span[0]>
0 stmts for fn core::mem[0]::size_of[0]<(syntax_pos::span_encoding[0]::Span[0], alloc::string[0]::String[0])>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<&syntax_pos::FileName[0]>
0 stmts for fn hashbrown::raw[0]::{{impl}}[5]::free_buckets[0]::{{closure}}[0]<(syntax_pos::SpanData[0], u32), i32, extern "rust-call" fn(()) -> (core::alloc[0]::Layout[0], usi
0 stmts for fn core::mem[0]::size_of[0]<((syntax_pos::hygiene[0]::SyntaxContext[0], syntax_pos::hygiene[0]::ExpnId[0], syntax_pos::hygiene[0]::Transparency[0]), syntax_pos::hyg
0 stmts for fn core::slice[0]::size_from_ptr[0]<syntax_pos::SpanLabel[0]>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<syntax_pos::symbol[0]::Interner[0]>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<&std::path[0]::PathBuf[0]>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<arena::DroplessArena[0]>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<core::cell[0]::UnsafeCell[0]<syntax_pos::symbol[0]::Interner[0]>>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<syntax_pos::BytePos[0]>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<core::cell[0]::RefCell[0]<syntax_pos::span_encoding[0]::SpanInterner[0]>>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<&alloc::rc[0]::Rc[0]<syntax_pos::SourceFile[0]>>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<core::cell[0]::UnsafeCell[0]<syntax_pos::hygiene[0]::HygieneData[0]>>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<&alloc::vec[0]::Vec[0]<syntax_pos::span_encoding[0]::Span[0]>>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<&syntax_pos::BytePos[0]>
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<core::cell[0]::BorrowMutError[0]>
0 stmts for fn core::mem[0]::size_of[0]<syntax_pos::span_encoding[0]::Span[0]>
0 stmts for fn core::mem[0]::align_of[0]<(syntax_pos::SpanData[0], u32)>
mw (Sep 27 2019 at 14:04, on Zulip):

I don't remember how the size estimate is done. it might not be reliable

Alex Crichton (Sep 27 2019 at 14:27, on Zulip):

this does seem like very interesting data to me! I suspect that hashbrown is generating so much because it seems literally all functions in the crate are #[inline]...

mw (Sep 27 2019 at 14:28, on Zulip):

maybe not all of them need to be? the majority of the code seems to be generic though

Alex Crichton (Sep 27 2019 at 14:29, on Zulip):

yup...

mw (Sep 27 2019 at 14:29, on Zulip):

i.e. if it ends up in the volatile CGU, it means that it is not inline

Alex Crichton (Sep 27 2019 at 14:29, on Zulip):

hm what would end up in the volatile CGU?

mw (Sep 27 2019 at 14:29, on Zulip):

or maybe it is inline and pulled in by something not inline :)

Alex Crichton (Sep 27 2019 at 14:29, on Zulip):

just generics w/o #[inline]?

mw (Sep 27 2019 at 14:29, on Zulip):

yes

mw (Sep 27 2019 at 14:29, on Zulip):

and everything inline they pull in

Alex Crichton (Sep 27 2019 at 14:30, on Zulip):

that's... probably a lot

Alex Crichton (Sep 27 2019 at 14:30, on Zulip):

we did indeed measure compile time slowdown on perf benchmarks b/c of hashbrown

mw (Sep 27 2019 at 15:05, on Zulip):

an interesting question is: what kind of tool would help improve the situation here?

mw (Sep 27 2019 at 15:06, on Zulip):

/me goes crazy and thinks about profile-guided CGU partitioning :smiley: :explosion:

mw (Sep 27 2019 at 15:08, on Zulip):

/me goes crazy and thinks about profile-guided CGU partitioning :smiley: :explosion:

Although I'm not sure this makes sense. All the information is also available at compile time ??

Alex Crichton (Sep 27 2019 at 15:15, on Zulip):

I've posted https://github.com/rust-lang/rust/pull/64846 to investigate what happens if we remove #[inline] from hashbrown

Alex Crichton (Sep 27 2019 at 15:15, on Zulip):

it could also just very well be the case that hashbrown has a lot more code

simulacrum (Sep 27 2019 at 15:31, on Zulip):

@Alex Crichton Yeah, hashbrown today compiles to the following in a standalone sense

POST INLINING:
CodegenUnit hashbrown.dxqb203g-cgu.0 (3 items):
0 stmts for fn core::ptr[0]::real_drop_in_place[0]<&core::alloc[0]::Layout[0]>
9 stmts for fn core::fmt[0]::{{impl}}[47]::fmt[0]<core::alloc[0]::Layout[0]>
61 stmts for fn hashbrown::{{impl}}[0]::fmt[0]
simulacrum (Sep 27 2019 at 15:55, on Zulip):

Top 15 functions by our size_estimate across all of rust-lang/rust pretty much (not including tools):

total    # of copies   per copy size
278817   17            16401           rustc::dep_graph[0]::dep_node[0]::{{impl}}[14]::new[0]
177814   977           182             core::ptr[0]::swap_nonoverlapping_bytes[0]
85560    1240          69              core::intrinsics[0]::copy_nonoverlapping[0]<u8>
70180    605           116             hashbrown::raw[0]::imp[0]::{{impl}}[0]::load_aligned[0]
62370    462           135             core::alloc[0]::{{impl}}[0]::extend[0]
61920    180           344             core::char[0]::methods[0]::{{impl}}[0]::encode_utf8[0]
51072    532           96              core::alloc[0]::{{impl}}[0]::repeat[0]
44826    1446          31              core::num[0]::{{impl}}[17]::overflowing_mul[0]
33258    1446          23              core::num[0]::{{impl}}[17]::checked_mul[0]
31360    490           64              core::core_arch[0]::simd[0]::{{impl}}[95]::new[0]
31070    26            1195            rustc::dep_graph[0]::dep_node[0]::{{impl}}[13]::can_reconstruct_query_key[0]
29988    588           51              core::alloc[0]::{{impl}}[0]::from_size_align[0]
29951    491           61              rustc::hir[0]::def_id[0]::{{impl}}[36]::eq[0]
29187    423           69              core::intrinsics[0]::copy_nonoverlapping[0]<usize>
26505    855           31              core::num[0]::{{impl}}[17]::overflowing_add[0]
simulacrum (Sep 27 2019 at 15:56, on Zulip):

Some of these are really a bit sad to see so high up, though others mostly make sense

simulacrum (Sep 27 2019 at 15:58, on Zulip):

@Alex Crichton filtering all instantiations down to just hashbrown I see

total   # of copies   per copy size
70180   605           116             hashbrown::raw[0]::imp[0]::{{impl}}[0]::load_aligned[0]
18816   84            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(&rustc::ty[0]::sty[0]::RegionKind[0]
18468   486           38              hashbrown::raw[0]::{{impl}}[1]::next[0]
17696   79            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0]
16800   75            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::ty[0]::UpvarId[0]
16800   75            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0]
16800   75            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0]
16800   75            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0]
16800   75            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0]
16800   75            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::def_id[0]::DefId[0]
16576   74            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0]
16576   74            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0]
16576   74            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0]
16576   74            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0]
16576   74            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0]
simulacrum (Sep 27 2019 at 15:59, on Zulip):

I guess most of those have their "values" cut off -- but still

Alex Crichton (Sep 27 2019 at 15:59, on Zulip):

@simulacrum what does that mean # of copies?

Alex Crichton (Sep 27 2019 at 15:59, on Zulip):

is this for one rustc crate or the whole crate graph?

simulacrum (Sep 27 2019 at 16:00, on Zulip):

whole crate graph

Alex Crichton (Sep 27 2019 at 16:00, on Zulip):

do you perhaps have a compiler with debug assertions enabled?

simulacrum (Sep 27 2019 at 16:00, on Zulip):

yeah, I do

Alex Crichton (Sep 27 2019 at 16:00, on Zulip):

so load_aligned was instantiated 605 times?

simulacrum (Sep 27 2019 at 16:00, on Zulip):

across the whole crate graph (std, rustc, test, codegen, etc)

Alex Crichton (Sep 27 2019 at 16:00, on Zulip):

this'd be why it's so expensive -- https://github.com/rust-lang/hashbrown/blob/d1ad4fc3aae2ade446738eea512e50b9e863dd0c/src/raw/sse2.rs#L57

Alex Crichton (Sep 27 2019 at 16:01, on Zulip):

but nice data!

Alex Crichton (Sep 27 2019 at 16:01, on Zulip):

hashbrown::raw[0]::{{impl}}[1]::next[0] <- how would I interpret that?

simulacrum (Sep 27 2019 at 16:01, on Zulip):

I believe that's the next function in the 2nd impl in the raw module

simulacrum (Sep 27 2019 at 16:01, on Zulip):

probably an Iterator?
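
A sketch of splitting such a dumped path into its segments and bracketed indices. `parse_def_path` is a hypothetical helper, and reading the index as zero-based (so `{{impl}}[1]` is the second impl) follows the interpretation suggested above rather than documented behaviour.

```rust
/// Splits a dumped path such as `hashbrown::raw[0]::{{impl}}[1]::next[0]`
/// into (segment name, optional bracketed index) pairs. Generic arguments
/// after the first '<' are dropped.
fn parse_def_path(path: &str) -> Vec<(String, Option<u32>)> {
    let path = path.split('<').next().unwrap_or(path);
    path.split("::")
        .map(|seg| {
            // A segment looks like `name[index]`; the crate name has no index.
            match seg.strip_suffix(']').and_then(|s| s.rsplit_once('[')) {
                Some((name, idx)) => match idx.parse::<u32>() {
                    Ok(i) => (name.to_string(), Some(i)),
                    Err(_) => (seg.to_string(), None),
                },
                None => (seg.to_string(), None),
            }
        })
        .collect()
}

fn main() {
    for (name, index) in parse_def_path("hashbrown::raw[0]::{{impl}}[1]::next[0]") {
        println!("{} {:?}", name, index);
    }
}
```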

Alex Crichton (Sep 27 2019 at 16:02, on Zulip):

ok cool

Alex Crichton (Sep 27 2019 at 16:02, on Zulip):

looks like this in that case -- https://github.com/rust-lang/hashbrown/blob/d1ad4fc3aae2ade446738eea512e50b9e863dd0c/src/raw/mod.rs#L1185-L1208

Alex Crichton (Sep 27 2019 at 16:02, on Zulip):

which makes sense

Alex Crichton (Sep 27 2019 at 16:03, on Zulip):

not sure why that has 486 copies...

simulacrum (Sep 27 2019 at 16:03, on Zulip):

well, it's probably getting instantiated in every single module ever pretty much inside the compiler

Alex Crichton (Sep 27 2019 at 16:03, on Zulip):

FWIW building libstd with debug assertions, I think all bets are off in terms of compile time

Alex Crichton (Sep 27 2019 at 16:03, on Zulip):

like even really core things like ptr and mem methods have debug assertions

simulacrum (Sep 27 2019 at 16:03, on Zulip):

Is there a way to disable debug asserts for std but enable them for the compiler?

Alex Crichton (Sep 27 2019 at 16:03, on Zulip):

not currently

Alex Crichton (Sep 27 2019 at 16:03, on Zulip):

or at least not that I know of

simulacrum (Sep 27 2019 at 16:04, on Zulip):

we should probably expose that tbh

Alex Crichton (Sep 27 2019 at 16:05, on Zulip):

@simulacrum do you have a branch w/ this data collection I could poke at?

Alex Crichton (Sep 27 2019 at 16:05, on Zulip):

I'd be curious if I can get info out of cargo

simulacrum (Sep 27 2019 at 16:05, on Zulip):

yeah, give me one moment to push it up

simulacrum (Sep 27 2019 at 16:06, on Zulip):

@Alex Crichton https://github.com/Mark-Simulacrum/rust/tree/dump-codegen -- you probably want just the first commit

simulacrum (Sep 27 2019 at 16:06, on Zulip):

I currently have it set up to just dump to /tmp/cg-dumps/{crate name}/{label} if debug asserts are on for all crates

Alex Crichton (Sep 27 2019 at 16:07, on Zulip):

ah so this isn't actually instrumenting llvm

Alex Crichton (Sep 27 2019 at 16:07, on Zulip):

it's just using a size estimate

simulacrum (Sep 27 2019 at 16:07, on Zulip):

right, yeah

simulacrum (Sep 27 2019 at 16:08, on Zulip):

RUSTFLAGS_NOT_BOOTSTRAP="-Zhuman-readable-cgu-names" is also somewhat helpful to figure out where things are being emitted

simulacrum (Sep 27 2019 at 16:08, on Zulip):

but I believe this size estimate has a pretty good correspondence with what we end up codegen'ing -- some constant factor

simulacrum (Sep 27 2019 at 16:09, on Zulip):

I think it'd be feasible to do some more low-level instrumentation of LLVM but I jumped on the fact that we already had this infra mostly in place on master

simulacrum (Sep 27 2019 at 16:09, on Zulip):

i.e., the function was hooked up and such

simulacrum (Sep 27 2019 at 16:11, on Zulip):

cat /tmp/cg-dumps/*/POST_INLINING\:.codegen-units | rg 'stmts for' | sort -n -k1 --parallel=8 | uniq -c > /tmp/st
(printf 'total, # of copies, per copy size\n'; (rg hashbrown /tmp/st | awk '{ print $1*$2 ", " $1 ", " $2 ", " $6 }' | sort -rn -k1 | head -n15)) | column -t -s,
# or
(printf 'total, # of copies, per copy size\n'; (awk '{ print $1*$2 ", " $1 ", " $2 ", " $1=$2=$3=$4 }' /tmp/st | sort -rn -k1 | head -n15)) | column -t -s,
simulacrum (Sep 27 2019 at 16:11, on Zulip):

those are the 2 commands I've been using to get that chart output
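
The same aggregation can be sketched in Rust for readers who prefer it to the shell pipeline. The `Row` struct and `summarize` function are hypothetical. The sketch assumes dump lines of the form `<size> stmts for fn <name>`, groups identical lines the way `uniq -c` does, and ranks them by copies × per-copy size.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct Row {
    total: u64,
    copies: u64,
    per_copy: u64,
    name: String,
}

/// Counts identical "<size> stmts for fn <name>" lines and returns the
/// `top` rows with the largest total (copies * per-copy size).
fn summarize(dump: &str, top: usize) -> Vec<Row> {
    let mut counts: HashMap<(u64, &str), u64> = HashMap::new();
    for line in dump.lines() {
        let line = line.trim();
        // Skip anything that is not a per-function size line.
        let (size, name) = match line.split_once(" stmts for fn ") {
            Some(parts) => parts,
            None => continue,
        };
        let size: u64 = match size.parse() {
            Ok(n) => n,
            Err(_) => continue,
        };
        *counts.entry((size, name)).or_insert(0) += 1;
    }

    let mut rows: Vec<Row> = counts
        .into_iter()
        .map(|((per_copy, name), n)| Row {
            total: per_copy * n,
            copies: n,
            per_copy,
            name: name.to_string(),
        })
        .collect();
    // Largest total first; ties are ordered by name for stable output.
    rows.sort_by(|a, b| b.total.cmp(&a.total).then_with(|| a.name.cmp(&b.name)));
    rows.truncate(top);
    rows
}

fn main() {
    // Made-up sample in the same shape as the dump lines in this thread.
    let dump = "\
116 stmts for fn hashbrown::load_aligned
116 stmts for fn hashbrown::load_aligned
224 stmts for fn hashbrown::new
0 stmts for fn core::mem::size_of
";
    println!("total\tcopies\tper copy\tname");
    for row in summarize(dump, 15) {
        println!("{}\t{}\t{}\t{}", row.total, row.copies, row.per_copy, row.name);
    }
}
```

Filtering to one crate, as the `rg hashbrown` step does, could be added by skipping lines whose name does not contain the crate name.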

simulacrum (Sep 27 2019 at 16:12, on Zulip):

if in non-incremental mode you probably want POST_MERGING instead of POST_INLINING, since we merge into 16 or so codegen units then

simulacrum (Sep 27 2019 at 16:26, on Zulip):

fixed the cutting of the values etc

total   # of copies   per copy size
69575   605           115             hashbrown::raw[0]::imp[0]::{{impl}}[0]::load_aligned[0]
18816   84            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(&rustc::ty[0]::sty[0]::RegionKind[0], rustc_data_structures::transitive_relation[0]::Index[0])>
17696   79            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0], ())>
17496   486           36              hashbrown::raw[0]::{{impl}}[1]::next[0]
16800   75            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::ty[0]::UpvarId[0], rustc::ty[0]::UpvarCapture[0])>
16800   75            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0], rustc::ty[0]::sty[0]::FnSig[0])>
16800   75            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0], rustc::infer[0]::canonical[0]::Canonical[0]<rustc::ty[0]::context[0]::UserType[0]>)>
16800   75            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0], alloc::vec[0]::Vec[0]<&rustc::ty[0]::TyS[0]>)>
16800   75            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0], (syntax_pos::span_encoding[0]::Span[0], syntax_pos::symbol[0]::Symbol[0]))>
16800   75            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::def_id[0]::DefId[0], rustc::infer[0]::canonical[0]::Canonical[0]<rustc::ty[0]::sty[0]::Binder[0]<rustc::ty[0]::sty[0]::FnSig[0]>>)>
16576   74            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0], usize)>
16576   74            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0], rustc::ty[0]::binding[0]::BindingMode[0])>
16576   74            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0], core::result[0]::Result[0]<(rustc::hir[0]::def[0]::DefKind[0], rustc::hir[0]::def_id[0]::DefId[0]), rustc::util[0]::common[0]::ErrorReported[0]>)>
16576   74            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0], alloc::vec[0]::Vec[0]<rustc::ty[0]::adjustment[0]::Adjustment[0]>)>
16576   74            224             hashbrown::raw[0]::{{impl}}[11]::new[0]<(rustc::hir[0]::item_local_id_inner[0]::ItemLocalId[0], &rustc::ty[0]::TyS[0])>
simulacrum (Sep 27 2019 at 16:27, on Zulip):

(this is with debug asserts in std off, I think)

mw (Sep 30 2019 at 07:29, on Zulip):

@simulacrum There are quite a few methods in DepNode that could be made const fns:

mw (Sep 30 2019 at 07:30, on Zulip):

That might improve things a little.

mw (Sep 30 2019 at 07:34, on Zulip):

(because the compiler might be able to "inline" const-fns away before the monomorphizer sees them)
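
A minimal sketch of the const fn idea, using a made-up `Kind` enum and `has_params` helper rather than rustc's actual `DepNode` code. A call in a const context is evaluated at compile time, so only the resulting value is used there; a call in an ordinary runtime context is still compiled like any other function. Whether this reduces what the monomorphizer sees in rustc's case is the open question raised above.

```rust
// Hypothetical stand-in for a small DepNode-style helper; not rustc's code.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Hir,
    Mir,
    Codegen,
}

// `const fn` allows calls from const contexts as well as at runtime.
const fn has_params(kind: Kind) -> bool {
    match kind {
        Kind::Hir => false,
        Kind::Mir | Kind::Codegen => true,
    }
}

// Evaluated by the compiler: this use needs only the resulting `bool`.
const MIR_HAS_PARAMS: bool = has_params(Kind::Mir);

fn main() {
    println!("{}", MIR_HAS_PARAMS);
    // Ordinary runtime calls still work and are compiled as usual.
    println!("{}", has_params(Kind::Hir));
    println!("{}", has_params(Kind::Codegen));
}
```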

Last update: Nov 15 2019 at 20:40 UTC