Stream: rustdoc

Topic: Case insensitive RFC take two


view this post on Zulip Manish Goregaokar (Mar 30 2021 at 22:57):

I wrote up a kinda-RFC, @T-rustdoc please have a look! https://hackmd.io/@nbw0Mih0RLieoV6JBAtI3Q/rkXNEQ-Bd/edit

Feel free to comment there or here, and fill in some of the blank sections. As I said in the intro: this is currently written in RFC format but at present its purpose is to be discussed amongst the team; and there's a chance we may not need an RFC.

view this post on Zulip Joshua Nelson (Mar 30 2021 at 23:46):

Currently rustdoc simply overwrites one file with the other, which means that doc generation on most Windows and Mac systems simply drops some files. This is suboptimal.

lol understatement of the year :laughing:

view this post on Zulip Joshua Nelson (Mar 30 2021 at 23:47):

Ideally, rustdoc will _never_ link to disambiguation pages on its own; these exist purely to make the URL scheme continue to work.

:+1:

view this post on Zulip Joshua Nelson (Mar 30 2021 at 23:48):

For this use case, rustdoc (and cargo doc?) will have a --generate-case-insensitive option that forces the disambiguation behavior regardless of the filesystem being used.

I don't think this common enough to need a cargo flag. I don't see anyone using it besides the standard library.

view this post on Zulip Joshua Nelson (Mar 30 2021 at 23:49):

It is a hard error for #[doc(filename)] files to ever clash with other files when compared case insensitively, so you cannot have #[doc(filename = "Bar")] struct Foo; and struct Bar; in the same module.

this is internally consistent, but did you mean to write doc(filename = "bar") to show that it's case-insensitive?

view this post on Zulip Manish Goregaokar (Mar 30 2021 at 23:49):

I didn't but good idea

view this post on Zulip Joshua Nelson (Mar 30 2021 at 23:50):

rustdoc::differing_only_by_case

ooh I am so tempted to bikeshed this name but I'll hold off for now :laughing:

view this post on Zulip Manish Goregaokar (Mar 30 2021 at 23:50):

FWIW if --generate-case-insensitive is permaunstable internal-only and if we don't do the lints and such we don't need an RFC (and we can RFC the lints and such later), but I think it would be good to do so

view this post on Zulip Manish Goregaokar (Mar 30 2021 at 23:50):

yeah idgaf about the lint name

view this post on Zulip Joshua Nelson (Mar 30 2021 at 23:51):

Manish Goregaokar said:

FWIW if --generate-case-insensitive is permaunstable internal-only and if we don't do the lints and such we don't need an RFC (and we can RFC the lints and such later), but I think it would be good to do so

hmm, do lints need an RFC? a lot of lints have been added recently and we only did an FCP when they were stabilized

view this post on Zulip Joshua Nelson (Mar 30 2021 at 23:51):

although in general I'm not a fan of "unstable" lints since the names still leak onto stable, I'd rather do an FCP right away

view this post on Zulip Joshua Nelson (Mar 30 2021 at 23:55):

(And in general rustdoc's stability story is not great because docs.rs uses nightly)

view this post on Zulip Mario Carneiro (Mar 31 2021 at 00:14):

It is a hard error for #[doc(filename)] files to ever clash with other files when compared case insensitively, so you cannot have #[doc(filename = "bar")] struct Foo; and struct Bar; in the same module.

This seems inconsistent with the behavior without doc(filename). Nothing about the doc(filename) feature implies that it has to do with case-insensitive filesystems, although of course that's the motivation here. I think it would make more sense for #[doc(filename = "bar")] struct Foo; to act exactly the same as if the user had written struct bar; except it only affects the URL scheme. So you would get exactly the same disambiguation behavior: a disambiguation page at struct.bar.html is created (if --generate-case-insensitive is on), and the lint is triggered warning about the conflict (not sure if this lint is dependent on --generate-case-insensitive).

view this post on Zulip Manish Goregaokar (Mar 31 2021 at 05:37):

I'm okay with that, I guess we should just emit another lint.

view this post on Zulip Manish Goregaokar (Mar 31 2021 at 05:38):

@Joshua Nelson no, lints do not need an RFC. I am morally against lints which cannot be _fixed_ (and this is a rough policy rust has), which is why the lint would need to be paired with doc(filename) -- that _does_ need an RFC

view this post on Zulip Manish Goregaokar (Mar 31 2021 at 05:38):

also: RFC does not imply stability trains

view this post on Zulip Manish Goregaokar (Mar 31 2021 at 05:38):

RFC just means we're getting input

view this post on Zulip Manish Goregaokar (Mar 31 2021 at 05:39):

but i'm totally happy with landing disambiguation first, and figuring out the lint and stuff later

view this post on Zulip Manish Goregaokar (Mar 31 2021 at 05:39):

and doing that as a separate RFC, or a single joined RFC.

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 08:59):

So in the end, what I was afraid of and didn't want to happen is happening: we add a new doc() attribute...

Anyway, commented on the document.

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 17:34):

Oh also, how will it work with "--extern-html-root-url"?

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:00):

(posting this here instead of on the hackmd because of the unresolved question)

Should rustdoc ever link to a disambiguation page directly?

External crates would need to use the same defid map. It shouldn't be difficult to come up with a numbering scheme that can be reliably replayed: for example sort all items in the case insensitive equivalence class by ascii order and number them in increasing order. Unresolved question: this numbering scheme has to either include private items, in which case adding private items changes public URLs, or else private items need a separate name mangling scheme.

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:07):

@Mario Carneiro It doesn't work for the option "--extern-html-root-url" like I said above

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:08):

I'm not sure how those are related

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:09):

I haven't used it but it looks like "--extern-html-root-url" is supposed to set the base path for extern rustdoc links, while the stuff in this RFC is about the name mangling scheme for links inside a (possibly extern) crate, relative to the root path for the crate, whatever it is

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:11):

Also @GuillaumeGomez you responded twice on that RFC to doc(filename) with a complaint that can be paraphrased as "no, just no." Could you elaborate on the reasons behind the disapproval?

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:13):

From my POV it is strictly better to give people an option to tweak behavior rather than not, especially if they don't have to use it and still get reasonable behavior

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:13):

Because, once again, users shouldn't fix tools limitations. If the code is valid, the users shouldn't have to add such things. I used as comparison doc(cfg): we are trying to remove it and instead directly extract the information from the code

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:14):

Well, this proposition has a lot of flaws

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:14):

They can also silence the lint if they are happy with the disambiguation page

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:15):

Adding a lint for that sounds like a really bad thing. "Your code is perfectly fine but our tool cannot handle it because it was poorly deisgn so please fix your perfectly fine code"

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:15):

I'm really unhappy with the proposed solution. The only thing that makes me accept it is that at least it will generate the disambiguator only on windows and macOS

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:16):

Also, the implementation will be a nightmare at some many level...

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:17):

I guess you are assuming that the existing naming scheme doesn't need to be respected? I don't think we can move to change it in any significant way until intra-doc links are around for a while longer and everyone migrates

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:18):

There are still tons of manual intra-doc links out there and changing the naming scheme will break all those links

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:18):

And the naming scheme was even directly taught to people who had to write those links (like me circa 6 months ago)

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:19):

The fact that there is a naming conflict is a direct consequence of the combination of the naming scheme + case insensitive file systems. Tools have nothing to do with it

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:20):

Mario Carneiro said:

I guess you are assuming that the existing naming scheme doesn't need to be respected? I don't think we can move to change it in any significant way until intra-doc links are around for a while longer and everyone migrates

The whole point of this solution is that it allows to keep the current URL scheme. If we don't care about it, then let's fix it once and for all XD

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:20):

Mario Carneiro said:

The fact that there is a naming conflict is a direct consequence of the combination of the naming scheme + case insensitive file systems. Tools have nothing to do with it

It's the fault of the tool in the first place for not taking this limitation into consideration.

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:20):

The tool is not the naming scheme

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:21):

The tool generates the URL scheme

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:21):

But the naming scheme is ossified in the minds of programmers who use it and URL links across the web

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:21):

the tool can't do anything about this

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:21):

It can. The alternative solution I suggested fixed it.

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:22):

No more naming conflicts on windows/mac and no possible conflicts on modules either

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:22):

but it would break all the current URLs, and that's why it was rejected

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:22):

Maybe in an alternate reality where you proposed that pre-1.0 that would work

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:22):

At the time, I was focusing mostly on compiler errors unfortunately :)

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:23):

This is certainly a patch on a suboptimal naming scheme, but that's the price of backward compatibility

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:23):

But I still think it would be possible to switch to this new URL scheme. To make it better, we could simply add a scheme converter on docs.rs and doc.rust-lang.org so that old URLs still work in new docs (it's pretty easy in fact)

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:25):

Backward compatibility could be maintained with a URL scheme converter on docs.rs and doc.rust-lang.org like I suggested above. But not locally unfortunately

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:25):

Even setting aside backward compatibilty, I think Foo-2 looks better than -foo though

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:25):

But it's incoherent. This can change without prior notice and it's a nightmare to implement

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:26):

And we didn't even try to look at modules conflicts yet. I'm very curious at how we will fix it in the current suggestion.

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:26):

Wasn't that addressed in the RFC as well?

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:26):

aa-1/index.html and such

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:27):

For one level eventually, it's fine, but for multiple levels? Oh boy

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:27):

?

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:27):

aa-1/bb-2/struct.FoO-3.html

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:27):

I'm not seeing the issue

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:27):

When you change the name of a module, all its children have to take this change into account because it changes their URL too

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:28):

yeah, that is true for normal module renames too

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:29):

Yep, but then it's expected

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:29):

But you are changing the name of a module, so the name changes

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:29):

this seems quite expected

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:29):

if you have a::b::c::d but a is a has a name conflict, then d is at completely different location

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:30):

different than what?

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:30):

a/b/c/d/index.html doesn't exist anymore if any part of the path has a name conflict

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:31):

Anyway, to try to see all differences, I'm writing a little gist: https://gist.github.com/GuillaumeGomez/a9d5e0d92987599a7966f5363543f81f

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:31):

we'll see where it leads us to

view this post on Zulip Mario Carneiro (Mar 31 2021 at 18:33):

Oh I see, you want a disambiguation page there too. I guess you could make disambiguation pages at all combinations of disambiguated and undisambigated names, although for backward compatibility you only need a/b/c/d/index.html for the disambiguation page and e.g. a-1/b-3/c-2/d-1/index.html for the real thing

view this post on Zulip GuillaumeGomez (Mar 31 2021 at 18:34):

Yes, but you start to see the issue there, no? XD

view this post on Zulip Manish Goregaokar (Mar 31 2021 at 22:04):

No, you don't need to have disambiguation pages for every combo

view this post on Zulip Manish Goregaokar (Mar 31 2021 at 22:15):

@GuillaumeGomez Ah, I think I see your issue with modules: Is your worry that _rustdoc_ won't be able to link to disambiguated modules from other crates?

I don't think we have to worry about it -- we can teach rustdoc how to link to this scheme, and for efficiency we can store lists of clashing DefIds in metadata or in a rustdoc cache.

view this post on Zulip Manish Goregaokar (Mar 31 2021 at 22:15):

basically, if rustdoc needs to link to foo::bar::Baz, it looks up if bar has clashes, then if Baz has clashes

view this post on Zulip Manish Goregaokar (Mar 31 2021 at 22:16):

this is complicated but doable

view this post on Zulip Manish Goregaokar (Mar 31 2021 at 22:20):

@GuillaumeGomez so from my POV the purpose of the disambiguation page is simple: it's so that you can type the URL you expect and it _sometimes_ works. We don't actually have to even implement it. The purpose of the disambiguation page is emphatically _not_ for rustdoc to ever link to it

view this post on Zulip Manish Goregaokar (Mar 31 2021 at 22:20):

and rustdoc should be capable of linking to the right thing directly without going via a disambiguation page

view this post on Zulip Manish Goregaokar (Mar 31 2021 at 22:21):

this is doable


Last updated: Oct 11 2021 at 22:34 UTC