I’ve been thinking about emplacement lately and how we could "fix" the optimisation-related issue that led to us removing the feature altogether. I’ve come to the conclusion that, for emplacement to work, the whole call graph starting at the function on the RHS of the emplacement operator has to return its result via an sret. This can be done by:
1. Implicitly creating copies of functions that use the desired calling convention;
2. Requiring users to explicitly annotate functions that are able to participate in emplacement;
3. Changing the "Rust" ABI to always be emplacement-capable.
All of the solutions have their problems, namely: (1) results in binary bloat; (2) is just inconvenient to use; (3) has a cost when emplacement is not used.
Oh, and in addition to all that, it is also fairly critical that we do not copy all over our stack in debug mode somehow (which requires some sort of MIR-level propagation to finally happen).
Although this specific lack of optimisation is slightly orthogonal IME.
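For reference, here is a rough source-level sketch of what "return via sret through the whole call graph" amounts to, written out by hand with an out-pointer. The names `Big`, `make_big`, `make_big_into`, `outer_into` and `emplaced_sum` are made up for illustration, and this only approximates option 2 (explicit opt-in); it is not how any proposed emplacement feature is implemented.

```rust
use std::mem::MaybeUninit;

// Hypothetical "large aggregate"; the size is arbitrary.
const N: usize = 512;
type Big = [u64; N];

// Ordinary by-value return. Whether the value is built in its final
// destination or copied there is left to the ABI and the optimiser.
fn make_big() -> Big {
    [7; N]
}

// Hand-written out-pointer variant: the callee writes straight into the
// caller-provided destination, roughly what an sret convention does implicitly.
fn make_big_into(out: &mut MaybeUninit<Big>) {
    let first = out.as_mut_ptr() as *mut u64;
    for i in 0..N {
        // SAFETY: `first` points to storage for N `u64`s and `i < N`.
        unsafe { first.add(i).write(7) };
    }
}

// Every function in the chain has to forward the destination; a single
// by-value link in the middle would reintroduce a copy.
fn outer_into(out: &mut MaybeUninit<Big>) {
    make_big_into(out);
}

// "Emplace" into a heap allocation without building a `Big` on the stack first.
fn emplaced_sum() -> u64 {
    let mut slot: Box<MaybeUninit<Big>> = Box::new(MaybeUninit::uninit());
    outer_into(&mut slot);
    // SAFETY: `make_big_into` initialised every element.
    let big: &Big = unsafe { slot.assume_init_ref() };
    big.iter().sum()
}
```

Calling `emplaced_sum()` gives the same sum as summing `make_big()`, but only the `_into` chain guarantees that the array is written once, directly into the box.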
I'm not sure I understand what trade-off you're getting at. What's the cost with (3), are you talking about small return values that are more profitable to return in registers?
Sure. That’s specifically a tradeoff for the "solution number 3"
oh, another problem with number 3 is that it won’t work if there's a non extern-Rust function call anywhere in the callgraph
I don't understand why we'd force an out-pointer based ABI on functions returning register-sized data. If the data is this small, storing it to the destination in the caller, if necessary, is basically just as good as storing it there in the callee to begin with.
the problem we want to avoid is repeatedly memcpy'ing aggregates of significant size, not moving any data at all twice
This is the orthogonal problem to the emplacement feature itself that I’ve mentioned above.
Also, in my personal opinion being able to avoid moving any data at all twice is a very, very useful property
It’s just that achieving this in C++ is super involved, and I’d like to have something better for Rust.
I am not even talking about avoiding memcpy's in code like `let x = foo(); let y = x;`. I'm saying, for something like `place <- (foo(), bar());` we'd generally want `foo` and `bar` to write their return values directly to space allocated for the tuple, without any large memcpy's, but no change in ABI is needed if they return something small: `call foo; mov DWORD PTR [tuple.0], eax` is basically as good as `lea rdi, [tuple.0]; call foo`
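A hand-rolled sketch of the tuple case, assuming one small and one large component: the small value comes back by value (in a register on common ABIs) and the caller stores it into the tuple slot, while the large one is written through an out-pointer. `small`, `large_into` and `build_pair` are hypothetical names, and the `<-` operator itself is not used since it is not available.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

const N: usize = 256;

// Register-sized result: returned by value, stored by the caller.
fn small() -> u32 {
    42
}

// Large result: written directly into the destination it is given.
// SAFETY contract: `out` must point to writable storage for `[u64; N]`.
unsafe fn large_into(out: *mut [u64; N]) {
    let first = out as *mut u64;
    for i in 0..N {
        // SAFETY: `i < N`, so the write stays inside the array.
        unsafe { first.add(i).write(i as u64) };
    }
}

// Builds the pair in its final heap location, field by field.
fn build_pair() -> Box<(u32, [u64; N])> {
    let mut slot: Box<MaybeUninit<(u32, [u64; N])>> = Box::new(MaybeUninit::uninit());
    let p = slot.as_mut_ptr();
    // SAFETY: `p` points to valid, uninitialised storage for the tuple;
    // both fields are written before the box is treated as initialised.
    unsafe {
        // Caller-side store of a small value.
        addr_of_mut!((*p).0).write(small());
        // Callee-side write of a large value.
        large_into(addr_of_mut!((*p).1));
        // `MaybeUninit<T>` has the same layout as `T`.
        Box::from_raw(Box::into_raw(slot) as *mut (u32, [u64; N]))
    }
}
```

Only the large field needs the callee to know its destination; the `u32` store costs a single move either way, which is the point being made above.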