@nikomatsakis suggested it'd be a good idea to split off a separate topic for this work - is that fine?
I spent some time understanding the code and slowly tried to convert more stuff on the expansion/hir/analysis boundary, only to just now discover @Zoxc's recent great work (e.g. allocating HIR on an arena, pushing more checks into the incremental infra) :grinning_face_with_smiling_eyes:
so I decided not to touch much of the code from those PRs and instead slowly chip away at Session/pass-based code elsewhere
I have a question - I first decided to querify `Session::proc_macro_decls_static`, which worked out okay, but then I found there's already a `proc_macro_decls_static` query defined for external crates via cstore in the librustc_metadata crate. So I created a separate `proc_macro_decls_static_local` query, which only works for `LOCAL_CRATE` - is there a way to combine those into a single query, where for the `LOCAL_CRATE` key it traverses the AST like before, but for other crate nums it pulls them from the externally loaded crates?
or is such a split still beneficial from a data-flow visualization/analysis point of view?
for context, this is my current diff: https://github.com/rust-lang/rust/commit/16bd7050b53f95882eaef0aa2f975316e969b081
You should be able to use the `proc_macro_decls_static` query. It will dispatch to the metadata if the crate is external; otherwise it will run the regular provider
oooh, so this exists on `Providers`, but we can specify the functions there separately, either locally or via extern providers
makes sense, thank you!
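For anyone skimming this later, here's a tiny self-contained sketch of the dispatch being described - toy types and illustrative names, not rustc's actual `Providers` or query machinery: one provider function serves the local crate and another (registered by the metadata loader in real rustc) serves extern crates, and the single query picks one based on the key.

```rust
// Toy model of provider dispatch (illustrative names only).

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct CrateNum(u32);

const LOCAL_CRATE: CrateNum = CrateNum(0);

// One provider slot for the local crate, one for extern crates.
struct Providers {
    local_proc_macro_decls_static: fn(CrateNum) -> Option<u32>,
    extern_proc_macro_decls_static: fn(CrateNum) -> Option<u32>,
}

// The single query: callers never see the local/extern split;
// the dispatch happens on the key.
fn proc_macro_decls_static(p: &Providers, krate: CrateNum) -> Option<u32> {
    if krate == LOCAL_CRATE {
        (p.local_proc_macro_decls_static)(krate)
    } else {
        (p.extern_proc_macro_decls_static)(krate)
    }
}

// Stand-ins: the real local provider would walk the AST/HIR,
// the extern one would read crate metadata.
fn local_provider(_: CrateNum) -> Option<u32> { Some(1) }
fn extern_provider(k: CrateNum) -> Option<u32> { Some(100 + k.0) }
```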
Sorry @Igor Matuszewski just saw this message, but yeah -- "what @Zoxc said", basically =)
I now have a concrete plan on how to bring the queries I defined in https://internals.rust-lang.org/t/making-the-compiler-interface-on-demand-driven/9099/6 into the proper query system without causing performance regressions.
Using `no_hash` (added in https://github.com/rust-lang/rust/pull/57770) and `eval_always` on the queries, we can avoid the overhead of incremental compilation. We also avoid the need to introduce …
Creating the HIR map will be a single query, which also sets up the `HirBody` and `Hir` dep nodes. These will no longer be treated as inputs, and forcing them will result in the HIR map query being run.
We can think of this as the HIR map query producing multiple return values from the perspective of incremental compilation.
`tcx.hir()` will invoke the HIR map query in an ignore context and return a reference to the HIR map. Accessing the HIR map will add the required read edges to the dep graph. Since the HIR map is accessed frequently, it will be cached in `GlobalCtxt` to avoid the overhead of the query system.
This won't make anything more incremental, but when this is in place we can start splitting these larger queries into smaller incremental ones.
`hir_map` would be a `no_hash` query, and then there would be additional regular queries for specific lookups?
(kind of like the `codegen_units` query?)
@mw No. We'd just use the HIR map as-is (when returned by the query which creates it), since it does its own dependency tracking.
Yeah, that would work too. But it would be nice to get rid of the special casing in the medium term.
hm, although, how would that work?
a query would do something like `let hir_map = tcx.hir_map();` and then `hir_map.expect_item(...)` or something like that.
`tcx.hir_map()` is a `no_hash` query, so every query calling it would always be re-evaluated.
or maybe I am misunderstanding something?
No query will actually call `hir_map` though; they'll use `tcx.hir()` instead, which calls `hir_map` in an ignore context, so no dependency on the `hir_map` query will be added
wait what :)
why do we need an ignore context exactly?
you could also imagine that the hir-map is just treated as 'always dirty'
so that it is always constructed?
is that the same thing?
(in general, I'd definitely prefer to have fewer "exceptions" where possible)
If there's no change to the crate, the HIR map will be green, but still executed since we don't save the result
I'm not entirely convinced the HIR map makes a ton of sense, but it seems like if we move the HIR map methods to queries, we can avoid having any exceptions at all, no? (except maybe a "volatile" flag on the base map, purely for efficiency and/or to avoid adding impls / serialization?) Maybe it results in some overhead -- is that the fear?
We need an ignore context since any change to the crate will result in the HIR map query's dep node being red. Any queries that have a dependency on it would then need to be executed again.
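As a toy illustration of the ignore-context mechanics (an assumed model, not rustc's real dep-graph code): a read normally records an edge from the current task, but inside an ignore scope the current task is cleared, so no edge is recorded even though the same data is accessed.

```rust
use std::cell::RefCell;

// Toy dependency graph: tasks record read edges unless we are
// inside an "ignore" scope.
#[derive(Default)]
struct DepGraph {
    current_task: RefCell<Option<&'static str>>,
    edges: RefCell<Vec<(&'static str, &'static str)>>,
}

impl DepGraph {
    // A read adds an edge from the current task, if there is one.
    fn read(&self, node: &'static str) {
        if let Some(task) = *self.current_task.borrow() {
            self.edges.borrow_mut().push((task, node));
        }
    }

    // Run `f` as a tracked task; its reads become dependencies.
    fn with_task<R>(&self, task: &'static str, f: impl FnOnce(&DepGraph) -> R) -> R {
        let prev = self.current_task.replace(Some(task));
        let result = f(self);
        self.current_task.replace(prev);
        result
    }

    // Run `f` with tracking suspended: reads inside add no edges.
    fn with_ignore<R>(&self, f: impl FnOnce(&DepGraph) -> R) -> R {
        let prev = self.current_task.replace(None);
        let result = f(self);
        self.current_task.replace(prev);
        result
    }
}
```

In this model, calling `hir_map` inside `with_ignore` is why a red `hir_map` dep node doesn't force every caller of `tcx.hir()` to re-run.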
that seems like it's sort of "by design"
that is, most things should not directly interact with the HIR map
but rather queries on the HIR map
but ok I see, you are aiming not to modify the code that is doing
I feel like I'd rather that the HIR map as an entity goes away, at least eventually, and we just move to
I am aiming to not modify existing code and also avoid the performance overhead of adding queries which just forward information from the HIR map.
it feels like those queries potentially play an important role, though, in terms of helping to propagate changes
right now, at least IIRC, we have special dep-nodes for sub-parts of the HIR map, basically, but we could sort of remove that whole concept and instead just be talking about the results of queries
(but cost is obviously a concern, we may indeed want some special treatment for optimization purposes -- though I feel like I'd sort of prefer if these still "feel like" queries in the end, just specially implemented ones)
it might be a workable temporary solution, but I really want to avoid the HIR map having to duplicate all the dep-tracking and hashing logic.
regarding existing code: `tcx.hir()` could return a small wrapper struct that provides the interface of the HIR map but forwards to regular queries.
I imagine things looking kind of like this: there's an `hir_map()` query that does all the mapping for the crate. Then there is a regular `item_hir_map(DefIndex)` query that returns a map from `LocalId` to the actual HIR. And then there's the mentioned wrapper behind `tcx.hir()` that keeps providing the existing interface. Something like that. There might have to be a separate query for bodies?
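A minimal sketch of that wrapper idea, with made-up names (`item_name`, `HirWrapper`) standing in for the real queries: callers keep using the old HIR-map-shaped interface, while each method just forwards to a regular tracked query.

```rust
// Hypothetical sketch; `item_name` and `HirWrapper` are stand-ins,
// not real rustc queries.

struct TyCtxt {}

impl TyCtxt {
    // Imagined per-item query replacing a direct HIR map lookup.
    fn item_name(&self, def_index: u32) -> String {
        format!("item{}", def_index)
    }

    // `tcx.hir()` keeps working, but now hands out a thin wrapper.
    fn hir(&self) -> HirWrapper<'_> {
        HirWrapper { tcx: self }
    }
}

struct HirWrapper<'a> {
    tcx: &'a TyCtxt,
}

impl<'a> HirWrapper<'a> {
    // Same shape as the old HIR map method; the body just forwards
    // to the regular query, so existing callers need no changes.
    fn expect_item(&self, def_index: u32) -> String {
        self.tcx.item_name(def_index)
    }
}
```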
@nikomatsakis I've been thinking about Lark's "fast" parse that was described in the video and was wondering how we might apply the laziness in our case and how expansion comes into play
I imagine we could do a fast parse without macro expansion to get a guaranteed but incomplete set of 'definitions' per file, and then, when name resolution hits a definition not on that list (meaning it must've been produced during expansion), we might try to macro-expand the given file and cache any intermediate results?
(I hope this topic is good for general idea/discussion and not specifically the existing query system details :grinning: )
@Igor Matuszewski applying to Rust definitely seems harder in some ways. Something I hadn't fully appreciated before is how having had an "IDE-focused compiler" earlier on might have affected the language design in various ways -- closures, impls, and nested items are all examples of things I might have done differently.
@mw @Zoxc so I was thinking more about this. I agree that as a temporary measure, what @Zoxc proposed makes sense, but for a longer term plan I definitely think we should not be doing anything "fancy" but just leveraging the incremental system.
I am not convinced we want a `hir_map` query at all longer term. We should see if we can make HIR lowering more "targeted" -- e.g., I think you can get the HIR for a struct without having to also create the HIR for its sibling items.
I guess that we always have to lower an entire "hir owner" at a time so that we can properly assign
I would be OK with doing @Zoxc's proposal as an intermediate step, but I'd really want to get rid of the manual dep-tracking for the HIR soon thereafter.
I'd prefer to even skip this intermediate step but that might mean maintaining a branch for the refactoring for quite a while.
I think we should remove NodeIds before trying to split the HIR map up. We also need some way to deal with non-item local things like
Agreed, it'd be good to move https://github.com/rust-lang/rust/pull/57578 over the finish line and merge it - removing `NodeId` and other such refactorings make sense
I found yet another reason to want my end-to-end queries PR. I should make a to-do list of things blocking on it
@Zoxc actually, that would be a good idea :) (a list of things blocked etc)