Stream: t-compiler/wg-rls-2.0

Topic: rowan-next (bonsai)


Christopher Durham (Feb 26 2020 at 03:20, on Zulip):

cc @matklad

Context links: rust-analyzer/rowan; rust-analyzer/bonsai; cad97/sorbus

In sorbus (sorbus is the genus that the rowan tree belongs to), I've written out a new version of a space-optimized rowan-style green tree. This is, to the extent of my knowledge, fully sound (unlike bonsai, which is definitely unsound in layout calculation as currently written). sorbus currently publicly takes advantage of erasable and slice-dst to do the heavy unsafe lifting, and rc-borrow for +0 Arcs in iteration. It also currently uses the str-index sketch from a bit back, but that can be easily changed to a vendored newtype or text_unit. Additionally, it currently relies on two nightly features hopefully targeted to stabilize in 1.43 (still FCP-merge): a subset of alloc_layout_extras and associated int consts. The former can be shimmed trivially (I've done it before) and the latter is just an import away from not having a feature gate, but if I'm blocked on 1.43 anyway....

Before I put further time into this new internals rewrite of rowan, I have a two-pronged question:

1) Is sorbus the direction we should go with rowan? I find erasable/slice-dst to be a much more coherent story around custom slice DSTs than our previous approaches (thin-dst and scuffed custom VLAs (even a mixed size one at one point)). If so, it'd probably be nice to move it into rust-analyzer, either as rust-analyzer/bonsai or a next branch of rust-analyzer.
2) What index newtype do we want to use (if any)? Do we revisit and clean up str-index as a successor to text_unit, do we just use text_unit as is, do we find a middle ground new release for text_unit, do we vendor a newtype, or do we even just use u32 everywhere (or even u32 internally but usize for API)?
- If we decide to go for str-index as a replacement to text_unit, it should just be a matter of making sure any new API that's been added to text_unit is represented, and a final bikeshed pass. Again, probably a good idea to migrate the repo to the rust-analyzer org if we adopt it.

matklad (Feb 26 2020 at 09:34, on Zulip):

wow, that's some impressive infra @Christopher Durham ! So let me write down some thoughts:

On the one hand, I do want to use a set of reusable abstractions for rowan. I like how swift just uses llvm_trailing_objects to do all the heavy lifting.

On the other hand, I do rationally fear explosion of complexity here (it is hard for me to wrap my head around all those pointers). I also irrationally fear this not being in a single repository. It _seems_ easier when you can read a single create to learn about this stuff.

But I actually do trust you more than myself in the ability to wrangle those DSTs :D

matklad (Feb 26 2020 at 09:35, on Zulip):

Did you manage to get rid of that double indirection in sorbus, when we used references to arc unions?

matklad (Feb 26 2020 at 09:36, on Zulip):

I think it makes sense for me to bite the bullet and just commit to moving to your slice-dst infra :)

matklad (Feb 26 2020 at 09:38, on Zulip):

for 2, I actually was thinking about this the other day and left a couple of fixmes for a breaking change to text-unit:

https://github.com/rust-analyzer/text_unit/commit/0258341dcfb5e7f3074490e058652b36e0631abf#diff-b4aea3e418ccdb71239b96952d9cddb6

matklad (Feb 26 2020 at 09:41, on Zulip):

On think I've realized that I don't like StrIndex any more:

matklad (Feb 26 2020 at 09:41, on Zulip):

So, I'd maybe game for text-size crate which defines TextSize and TextRange.

matklad (Feb 26 2020 at 09:43, on Zulip):

I also am unsure about how to construct TextSize from from chars and strings....

Just now I've though about adding impl TextSize { fn len(what: impl LenTextSize) -> TextSize } to have function overloading....

matklad (Feb 26 2020 at 09:45, on Zulip):

Sorry for all that indecisiveness on my part, I imagine it is very frustrating to you :(

matklad (Feb 26 2020 at 09:49, on Zulip):

Ok, let's start small :)

I want to commit to having 1.0 version of text-unit/str-index/text-size crate in three months. To that end:

matklad (Feb 26 2020 at 09:51, on Zulip):

For sorbus I am maybe not ready to do commitments right now. One specific big design bit I am unsure about is should we keep green/red layers in separate crates, or should they be one crate? The red layer is also tricky, with those thread-local arenas.

matklad (Feb 26 2020 at 09:52, on Zulip):

Btw, @Christopher Durham have you looked at red trees in rowan? I'd love to hear your feedback -- these thread-local object poool shenanigans were just something I've come up with one day, I haven't discussed this broadly and I don't know if there are better ways to do that.

matklad (Feb 26 2020 at 12:04, on Zulip):

@Christopher Durham you might also want to take a look at https://rust-lang.zulipchat.com/#narrow/stream/185405-t-compiler.2Fwg-rls-2.2E0/topic/Design.20question.3A.20identity.20in.20syntax.20trees

Christopher Durham (Feb 26 2020 at 19:10, on Zulip):

On the other hand, I do rationally fear explosion of complexity here (it is hard for me to wrap my head around all those pointers). I also irrationally fear this not being in a single repository. It _seems_ easier when you can read a single create to learn about this stuff.

This is somewhat already served by pointer-utils having master documentation of all the crates in the collection. You can tab around there and observe how the different parts (which are decently decoupled) work and interact. The nice part is that though geiger shows 71 lines of unsafe in sorbus, they're all trivial to verify against the requirements put forward by pointer-utils.

Did you manage to get rid of that double indirection in sorbus, when we used references to arc unions?

Yep, a green::Element in sorbus is an alignment-tagged union of Arc<green::Node> and Arc<green::Token>. &Element only exists as a temporary to make a single-indirection NodeOrToken<ArcBorrow<'_, Node>, ArcBorrow<'_, Token>>. Doing this in a principled way was actually the primary driver for pointer-utils.

Christopher Durham (Feb 26 2020 at 19:15, on Zulip):

I want to commit to having 1.0 version of text-unit/str-index/text-size crate in three months. To that end: [snip]

TextSize sounds fine to me. I can do a pass over on text_unit to propose how it could look like as text-size. I like the idea of trait TextSized { fn text_size(&self) -> TextSize } and fn TextSize::of(&impl TextSized) for getting the text size of things, modulo exact naming of course.

Though I should mention that 0.99.0 is technically against the semver spec (with 0.MAJOR.PATCH extension), as semver makes no provision for skipping versions :stuck_out_tongue_wink:

matklad (Feb 26 2020 at 19:46, on Zulip):

as semver makes no provision for skipping versions :stuck_out_tongue_wink:

I don't belive semver specifies the initial version, and by how much you should increment

Christopher Durham (Feb 26 2020 at 19:47, on Zulip):

have you looked at red trees in rowan?

I actually have a draft of what the thread-local freelist design could look like rebuilt from first principles over in my rowan/first-principles experimental branch, currently with deployed docs. It manages to unify the "cursor" types in rowan master with the Language-parameterized ones by having a Generic language. Also it currently carries around a Language handle, to enable runtime semantics for interpreters if desired. (Statically known languages would use a ZST, of course.) And hey, that offers an injection point for e.g. span origin information! (Though, maybe a questionable one?)

One specific big design bit I am unsure about is should we keep green/red layers in separate crates, or should they be one crate?

Ideally? The syntax layer could be offered as a layer on top of the green layer, and this would even allow pick-your-abstraction, using no cache, freelist cache, arena cache, etc. Practically? The only bit of "internals" of the green tree that the first-principles experiment uses (IIRC) is the moral equivalent of Union2<ArcBorrow<'_, green::Node>, ArcBorrow<'_, green::Token>>. As of current sorbus does not expose a packed version of NodeOrToken; this is exclusively an internal (to the green tree) type.

Should it be a separate crate, or a separate module? My preference recently has been to treat crates as compilation units; if a clear directional dependency edge (including coherence!) can be drawn, _and publishing on crates-io isn't the primary target_, there should be a split. But I also would put them in a singular repository.

Christopher Durham (Feb 26 2020 at 19:50, on Zulip):

semver tangent:

Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable; so per the official spec 0.MINOR.PATCH can do anything. Thus the 0.MAJOR.PATCH extension.

But I do believe the initial is specified along with the increment:

  1. Version 1.0.0 defines the public API. 2. Each element MUST increase numerically. For instance: 1.9.0 -> 1.10.0 -> 1.11.0.
Christopher Durham (Feb 26 2020 at 19:52, on Zulip):

You can argue that 5 -> 99 is a numeric increase, and thus follows the spec. So, ok. But you can also argue that a numeric increment is always +1.

matklad (Feb 26 2020 at 19:52, on Zulip):

I concede, you pedantry is stronger than my pedantry, "numeric increment" does sound like a roundabout way to say +1 :D

Christopher Durham (Feb 26 2020 at 19:57, on Zulip):

Looking at Identity in syntax trees makes me feel the first-principles design for the red tree interface even more, though I'd rename the Language trait to LanguageSemantics or something else slightly more general.

Christopher Durham (Feb 26 2020 at 19:58, on Zulip):

I'll start at the base with text-size, though.

Christopher Durham (Feb 26 2020 at 20:00, on Zulip):

(Also, I just like the idea of e.g. sorbus/sorbus (rowan), sorbus/aria (whitebeam), playing on genus/subgenera for splitting crates for the tree impl)
(S. chamaemespilus is even commonly called a "dwarf whitebeam")

Christopher Durham (Feb 26 2020 at 20:00, on Zulip):

(But calling a crate Chamaemespilus is probably brand recognition suicide tbh)

Last update: Sep 30 2020 at 16:30UTC