@Riccardo D'Ambrosio reports that https://github.com/domenicquirl/cstree, a variation of rowan much closer to libsyntax (and to the early versions of rowan!) significantly (2x) outperforms rowan for their use-case, both in speed and memory.
I haven't looked into this closely yet. However, I tried building rowan with the thread-local node cache completely disabled... and it is somewhat faster.
This ... makes me question design choices :-)
haha, yeah there are some questionable decisions in rowan, i think the biggest of which being cloning and regenerating node data way too much
however, there is a drawback in dropping the tree all at once, if anything is holding stray nodes or tokens then the tree will still be kept alive and you may run into memory leaks. For me it isnt a big problem because my analyses are pretty simple but for RA im not sure if the approach will work
cc @Domenic Quirl who is the dev of the fork btw
Hello everyone! This comes a bit unexpectedly for me, especially since I just created the repository this week. So maybe first of all be aware that there still are a few things not yet put into place, and that I've been working on this in my free time. I was previously using rowan for my current project, which is more or less building new language tooling for a dynamic DSL, and am very happy with it in terms of the functionality it offers.
That said, I originally started looking into the implementation of tree building, i.e. green trees, as that showed up unexpectedly much in my performance data and I was curious. So a lot of the performance improvements I got from using what has now/will now become
cstree stem from changes that happened there before I then went further and also looked at the syntax layer. I am fairly sure there is at least one issue/PR on rowan about only allocating new nodes if they are not deduplicated, and I think @Riccardo D'Ambrosio has a nice picture somewhere of recursive subtree hashing being very visible in his flamegraphs, this is where I started. At some point I brougt in an interner to also deduplicate the token strings, which replaced SmolStr and also had a noticable impact.
Regarding the free list, it was never my primary goal to remove it, except that at one point I had been annoyed that you can't really have generics in the cursor layer. I only looked at it when, after the changes described above, cloning and dropping Rc was the thing that still showed up in benchmarks. So the removal of clones became the introduction of references, which meant the children I want to give out references to have to live somewhere and the tree became persistent, so the free list wasn't used that much anymore. I think it's quite nice that this lets you basically remove the API layer, which has allowed ideas like custom data in tree nodes. However I am not so sure about the touted magnitude of the memory improvements - a lot of them also came from the original changes to the green node allocation and the savings might be somewhat offset by persisting the tree (I know you are aware of that tradeoff, since memoizing red nodes is discussed in the syntax docs). In particular, I don't know how memory performs on very large trees. I have a reasonably large test case, but so far I have only benched parsing (and tree building) and tree traversal speed on that consistently.
Overall I hope it's fine for you that this is now out on the web. I'm happy if this is in any way useful to you, please don't hesitate to ask further questions if you have any!
@Domenic Quirl do you think interning is required even if tokens are deduplicated? I'd expect that token deduplication to be more or less equivalent to interning. (But SmalStr dosn't make too much sense, the text should just be stored in tokens allocation)
I think @Riccardo D'Ambrosio has a nice picture somewhere of recursive subtree hashing being very visible in his flamegraphs
Required is a strong word. In my benchmarks it made a 20-25% difference (depending on whether you look at time or throughput) for the whole parsing process. I expect I'm getting above average mileage because the DSL tends to be fairly repetitive for larger inputs (it has "fixed" actions, so no new function definitions and calls etc.), but also the thing with the text is that even if the token is deduplicated and you don't end up _storing_ the text, you still need to make a SmolStr, which gets made into a token, and only _then_ do you check the token against the cache. I.e. the tree sink makes the SmolStr either way, but maybe it gets dropped again.
This also shows that rowan methods (my fork which just used an arc instead of an rc) were most of the run time Screenshot_1908.png
The first function is descendants_with_tokens
With the interner, you only "unnecessarily" make a key (or actually intern the string, but only in cases where you will also make a new token), and you also only hash that key I believe instead of the actual string
haha, lol, I figured why thread-local cache dosn't help
I originally wanted to allow for different text storage options, so the user can choose what he needs. But that's kinda complicated both because of the type erasure (packed pointers) and because the mechanisms are so different
it helds up to 128 node concurrently -- the thinking is that's enough for dfs
the problem is... we do a bfs, so therer are 100k nodes. oopsie
Yeah for me the biggest impact was iterating over the tree many times, and every time you iterate, every single node and token is cloned. I didn't quite like how rowan did not allow you to have children, descendants, and other things by reference :/