So, a brief update as to my progress after this morning's meeting:
The compile unit in the binary now carries a `DW_AT_GNU_dwo_name` attribute (GNU's pre-standard split DWARF; if DWARF 5 is enabled, LLVM emits `DW_AT_dwo_name` instead), as in the example below.
The code needs some cleaning up, but so far the changes aren't all that invasive. The dwo file path in the example is per-codegen-unit; I don't know whether that's how it will end up, it's just the strings I had to hand.
```
 <0><9b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <9c>   DW_AT_stmt_list   : 0xe8
    <a0>   DW_AT_comp_dir    : (indirect string, offset: 0x13b): /home/david/Projects/rust/rust0
    <a4>   DW_AT_GNU_dwo_name: (indirect string, offset: 0x15b): foo.foo.7rcbfp3g-cgu.0.rcgu.dwo
    <a8>   DW_AT_GNU_dwo_id  : 0x357472a2b032d7b9
    <b0>   DW_AT_low_pc      : 0x0
    <b8>   DW_AT_ranges      : 0x40
    <bc>   DW_AT_GNU_addr_base: 0x0
```
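For the dump above, a consumer such as gdb finds the `.dwo` by taking `DW_AT_GNU_dwo_name` as-is when it's absolute and otherwise resolving it against `DW_AT_comp_dir`. A minimal sketch of that lookup, using a hypothetical `resolve_dwo_path` helper (not code from rustc or gdb):

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve the file named by DW_AT_GNU_dwo_name
/// (or DW_AT_dwo_name in DWARF 5) against DW_AT_comp_dir.
/// `Path::join` returns `dwo_name` unchanged when it is absolute, so
/// absolute names are used as-is and relative ones hang off comp_dir.
fn resolve_dwo_path(comp_dir: &str, dwo_name: &str) -> PathBuf {
    Path::new(comp_dir).join(dwo_name)
}
```

With the values from the dump, this gives `/home/david/Projects/rust/rust0/foo.foo.7rcbfp3g-cgu.0.rcgu.dwo`, which is why moving the `.dwo` (or the build directory) breaks the lookup.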
Once I've got it emitting the dwo files correctly, I'll start looking into platform checks and things like that.
I now have split DWARF working. It still needs some tidying up, but compiling with `-Zsplit-dwarf=split` outputs a `.dwo` file per codegen unit, which is referenced by the DWARF in the binary. If I run gdb, it appears to load them, and if I move them, gdb complains.
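With one `.dwo` per codegen unit, the file name can be derived from the CGU's object file by swapping the extension, and "gdb complains when I move them" amounts to the existence check below. Both helpers are hypothetical and assume rustc's `*.rcgu.o` object naming, consistent with the dwo name in the dump.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical: derive a per-codegen-unit DWO name from its object file,
/// e.g. `foo.foo.7rcbfp3g-cgu.0.rcgu.o` -> `foo.foo.7rcbfp3g-cgu.0.rcgu.dwo`.
fn dwo_for_object(object: &Path) -> PathBuf {
    // Only the final extension (`o`) is replaced.
    object.with_extension("dwo")
}

/// Return the referenced DWO files that no longer exist on disk; a debugger
/// can't load the split debuginfo for any of these.
fn missing_dwos(dwos: &[PathBuf]) -> Vec<PathBuf> {
    dwos.iter().filter(|p| !p.exists()).cloned().collect()
}
```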
I want to clean up the code a little, make some changes to the flag I've added, check for appropriate targets (I'm unsure what LLVM does when I pass it this on Windows, for example), figure out whether to write a single dwo file rather than one per codegen unit (or whether I can link them together, or something like that), and check that the single mode works.
Oh, and I need to figure out how to write tests for this.
Would `-Zsplit-dwarf=split` be equivalent to `-Zrun-dsymutil=no` on macOS?
I'm not familiar with `-Zrun-dsymutil=no` on macOS.
On macOS the linker doesn't add the debuginfo to the generated executable. Instead it records a debug map (stab entries in the executable's symbol table) that specifies which part of which object file ended up where in the executable. `dsymutil` is then run to take all the debuginfo for used functions and rewrite it into a `.dSYM` bundle. If `-Zrun-dsymutil=no` is used, no `.dSYM` is generated; instead the temporary object files are kept so that debuggers can still get at the debuginfo.
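To make the debug map concrete: conceptually it's a table from address ranges in the executable back to the object file (and the address within it) they came from, which is what lets a debugger or `dsymutil` find the right debuginfo. The sketch below models that as a hypothetical in-memory table; it doesn't reflect how the map is actually encoded or dsymutil's own data structures.

```rust
/// Hypothetical model of one debug-map entry: code the linker copied from
/// `object` (starting at `obj_addr`) into the executable (at `exe_addr`).
struct DebugMapEntry {
    object: &'static str,
    obj_addr: u64,
    exe_addr: u64,
    size: u64,
}

/// Translate an executable address back to (object file, address in object),
/// i.e. where the debuginfo for that code actually lives.
fn to_object_addr(map: &[DebugMapEntry], addr: u64) -> Option<(&'static str, u64)> {
    map.iter()
        .find(|e| addr >= e.exe_addr && addr < e.exe_addr + e.size)
        .map(|e| (e.object, e.obj_addr + (addr - e.exe_addr)))
}
```

This is also why the object files have to be kept with `-Zrun-dsymutil=no`: the table only points at them, it doesn't contain the debuginfo.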
It's similar, I think:
My understanding is that split DWARF partitions the debuginfo sections into those that require link-time relocation and those that don't, and the ones that don't are typically larger. Under normal circumstances the linker still processes that relocation-free debuginfo, which wastes time and memory; split DWARF arranges things so the linker never sees it. There are two ways to do that, which clang calls "split" and "single" ("kinds of DWARF fission", a name that comes from the original project that did this in GCC land).
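One rough way to picture the partition: in LLVM's output the sections destined for the DWO carry a `.dwo` suffix (e.g. `.debug_info.dwo`), while the small skeleton that still needs relocation (`.debug_info`, `.debug_line`, `.debug_addr`, ...) stays in the object, matching the `DW_AT_stmt_list` and `DW_AT_GNU_addr_base` in the dump. The sketch just splits section names on that suffix; it illustrates the outcome, not how LLVM decides.

```rust
/// Split section names into (kept in the linked object, moved to the .dwo).
/// Assumes the `.dwo` suffix convention; section names are illustrative.
fn partition_sections<'a>(names: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    names.iter().copied().partition(|name| !name.ends_with(".dwo"))
}
```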
Split fission creates DWO (DWARF object) files containing the debuginfo that doesn't require link-time relocation, and the linker doesn't look at them at all. The objects contain `DW_AT_GNU_dwo_name` attributes with a path to the file (currently relative). Those attributes change if LLVM thinks we're doing DWARF 5, but it all works the same as far as I can tell.
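The `DW_AT_GNU_dwo_id` next to the name (`0x357472a2b032d7b9` in the dump) is what ties the skeleton unit in the binary to the full unit in the `.dwo`: a consumer should only accept a split unit whose ID matches. A minimal sketch with hypothetical, already-parsed types, leaving out the actual DWARF reading:

```rust
/// Hypothetical, pre-parsed view of the skeleton compile unit in the binary.
struct SkeletonUnit {
    dwo_name: String,
    dwo_id: u64,
}

/// Hypothetical, pre-parsed view of a compile unit read from a .dwo file.
struct SplitUnit {
    dwo_id: u64,
}

/// Pick the split unit belonging to `skeleton`. No matching ID means the
/// .dwo is stale (e.g. left over from an older build) and must not be used.
fn match_split_unit<'a>(
    skeleton: &SkeletonUnit,
    candidates: &'a [SplitUnit],
) -> Result<&'a SplitUnit, String> {
    candidates
        .iter()
        .find(|u| u.dwo_id == skeleton.dwo_id)
        .ok_or_else(|| format!("{}: no unit with dwo_id {:#x}", skeleton.dwo_name, skeleton.dwo_id))
}
```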
Single fission still writes the debuginfo to the relocatable object, but in such a way that it's ignored by the linker. I don't know more about it than that.
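A guess at the mechanism, to be verified: on ELF, LLVM's single mode appears to keep the `.dwo` sections in the object but set the `SHF_EXCLUDE` section flag on them, which tells the linker to leave them out of the output. The sketch only shows the flag check a linker would apply, using the standard ELF value `0x8000_0000` and a hypothetical `Section` type.

```rust
/// ELF section flag: exclude this section from the linked output.
const SHF_EXCLUDE: u64 = 0x8000_0000;

/// Hypothetical, minimal view of an ELF section header.
struct Section {
    name: &'static str,
    flags: u64,
}

/// Names of the sections a linker would actually copy into the output.
fn linked_sections(sections: &[Section]) -> Vec<&'static str> {
    sections
        .iter()
        .filter(|s| (s.flags & SHF_EXCLUDE) == 0)
        .map(|s| s.name)
        .collect()
}
```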
So, compared with `-Zrun-dsymutil=no`, `-Zsplit-dwarf=split` will put the debuginfo in a separate file, but whether that's one file or many depends on how I implement this. Currently it outputs a dwo file per codegen unit. I suspect that'll change and I'll just output a `foo.dwo` alongside the `foo` binary, and using `-C save-temps` might keep the original per-codegen-unit files, but I don't know exactly yet; I haven't looked into how to do any of that part.
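If it does end up as a single `foo.dwo` next to the `foo` binary, the naming is the easy part. A sketch with a hypothetical helper; it appends `.dwo` rather than replacing an extension, so a binary with a dot in its name (say `foo.bar`) doesn't lose part of it.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical: path of a single DWO file placed beside the output binary.
fn dwo_beside_binary(binary: &Path) -> PathBuf {
    // Append rather than `with_extension`, which would turn `foo.bar` into `foo.dwo`.
    let mut name = binary.as_os_str().to_owned();
    name.push(".dwo");
    PathBuf::from(name)
}
```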
Does that make sense?
Opened a draft PR, #77117, with what I've got so far.
Updated the PR today to resolve the linking issue I described at our last meeting; it turns out LLVM has a tool for doing what I needed, which I wasn't aware of.
cc #t-compiler > split dwarf and dependencies
Last updated: Oct 21 2021 at 21:20 UTC