Stream: t-lang/wg-unsafe-code-guidelines

Topic: validity of str / &str


gnzlbg (Nov 25 2018 at 17:50, on Zulip):

@RalfJ I don't think a &str pointing to non-UTF-8 data is undefined behavior. Why would you say that?

gnzlbg (Nov 25 2018 at 17:53, on Zulip):

some operations on &str might be undefined behavior if it does not point to a UTF-8 string, but others definitely are not (e.g. str.len())
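To make the distinction concrete: `str::len` only reads the byte length stored in the `&str` fat pointer, whereas counting `chars()` decodes the bytes and therefore relies on them being UTF-8. This sketch uses only safe code and valid UTF-8, so it shows which operation looks at the contents, not what happens on invalid data. The helper `byte_and_char_counts` is hypothetical, made up for illustration.

```rust
/// Hypothetical helper: returns (byte length, number of chars).
/// `len()` reads only the length stored in the fat pointer;
/// `chars()` has to decode the UTF-8 contents.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // 'é' (U+00E9) is encoded as two bytes in UTF-8: 0xC3 0xA9.
    let s = "héllo";
    let (bytes, chars) = byte_and_char_counts(s);
    assert_eq!(bytes, 6);
    assert_eq!(chars, 5);
    // The length is exactly that of the underlying byte slice.
    assert_eq!(s.len(), s.as_bytes().len());
    println!("{bytes} bytes, {chars} chars");
}
```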

gnzlbg (Nov 25 2018 at 17:54, on Zulip):

I see str as just another custom DST that we have, and as a custom DST it can impose arbitrary preconditions on how it is supposed to be constructed, on its methods, etc.

gnzlbg (Nov 25 2018 at 17:56, on Zulip):

the only way in which I think constructing a &str that points to a non-UTF-8 string could be undefined behavior is if you want a model in which, if you have an object of a type T, then all its safe methods _must_ be safe to call, and therefore T must be valid

gnzlbg (Nov 25 2018 at 17:57, on Zulip):

under that model, creating a &str that points to a non-UTF-8 string would be undefined behavior, but I wonder how we could catch that in miri?

gnzlbg (Nov 25 2018 at 17:58, on Zulip):

we could hardcode how to catch that for &str, but thinking of &my_custom_dst with arbitrary validity invariants...

RalfJ (Nov 25 2018 at 18:07, on Zulip):

> @RalfJ I don't think a &str pointing to non-UTF-8 data is undefined behavior. Why would you say that?

that was just me interpreting what I heard people say

RalfJ (Nov 25 2018 at 18:07, on Zulip):

I am perfectly fine with it not being UB :D Miri currently encodes it as UB, but since there is no validation behind references, that doesn't mean much

RalfJ (Nov 25 2018 at 18:08, on Zulip):

> I see str as just another custom DST that we have, and as a custom DST it can impose arbitrary preconditions on how it is supposed to be constructed, on its methods, etc.

ack. that works fine as long as rustc does not exploit this during compilation -- which it could, because ty::Str is a thing it can recognize.

gnzlbg (Nov 25 2018 at 19:27, on Zulip):

we could do &str-specific optimizations, but if we aren't doing them right now I'd rather try to keep all DSTs as simple as possible

gnzlbg (Nov 25 2018 at 19:32, on Zulip):

ideally, once we have custom DSTs, str would just be something defined in the core library as a thin wrapper over [u8]
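A minimal sketch of the idea, assuming a library-defined type: `MyStr` below is hypothetical, not a real or proposed `core` type. A struct whose last field is `[u8]` is already a DST, so the reference side can be written today. The UTF-8 requirement lives entirely in the safe constructor and in the `unsafe` block that relies on it. What such a type cannot get without compiler support is literal syntax (`"..."`).

```rust
use std::str::Utf8Error;

/// Hypothetical library-defined string slice: a `#[repr(transparent)]`
/// wrapper around `[u8]` whose safety invariant is "contents are UTF-8".
#[repr(transparent)]
pub struct MyStr([u8]);

impl MyStr {
    /// Safe constructor: checks the invariant before handing out a reference.
    pub fn from_bytes(bytes: &[u8]) -> Result<&MyStr, Utf8Error> {
        std::str::from_utf8(bytes)?;
        // SAFETY: `MyStr` is `repr(transparent)` over `[u8]`, so the pointer
        // cast keeps the same address and slice length; the UTF-8 check
        // above establishes the safety invariant.
        Ok(unsafe { &*(bytes as *const [u8] as *const MyStr) })
    }

    /// Like `str::len`: reads only the length, never the contents.
    pub fn len(&self) -> usize {
        self.0.len()
    }

    /// Relies on the invariant to view the contents as a `&str`.
    pub fn as_str(&self) -> &str {
        // SAFETY: every `&MyStr` handed out by the safe API is valid UTF-8.
        unsafe { std::str::from_utf8_unchecked(&self.0) }
    }
}

fn main() {
    let s = MyStr::from_bytes(b"hello").unwrap();
    assert_eq!(s.len(), 5);
    assert_eq!(s.as_str(), "hello");
    // 0xFF and 0xFE can never appear in well-formed UTF-8.
    assert!(MyStr::from_bytes(&[0xff, 0xfe]).is_err());
}
```

Under this design, whether a `&MyStr` with non-UTF-8 contents is UB or merely unsafe to use is decided by what the compiler assumes about `MyStr`, which for a library type is nothing beyond the validity of `[u8]`.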

Matthew Jasper (Nov 25 2018 at 19:36, on Zulip):

It's a language type due to literals, not lack of dst support.

rkruppe (Nov 25 2018 at 20:25, on Zulip):

some archeology: https://github.com/rust-lang/rust/pull/19612

gnzlbg (Nov 26 2018 at 09:20, on Zulip):

It might be possible to not make it a language type while still keeping literal support for it (not necessarily via user-defined literals, but some rustc hook for that)

nikomatsakis (Nov 29 2018 at 16:24, on Zulip):

the key thing here is that there are safe methods on &str that require a UTF-8 representation (or there have been in the past). So it would at minimum be part of the "safety invariant" for str, I think?

gnzlbg (Nov 29 2018 at 16:25, on Zulip):

yes, that would be part of safety

gnzlbg (Nov 29 2018 at 16:25, on Zulip):

but that's something that the API of &str can specify however it wants

gnzlbg (Nov 29 2018 at 16:25, on Zulip):

it wouldn't really need to be spelled out in the UCG
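The split described here is visible in the standard library's constructors: `std::str::from_utf8` checks the bytes and returns an error, while `std::str::from_utf8_unchecked` is `unsafe` and makes UTF-8 the caller's obligation. A small sketch; the helper `checked` is hypothetical. It deliberately never builds a non-UTF-8 `&str`, since whether that is UB is the open question in this thread.

```rust
use std::str;

/// Hypothetical helper: build a `&str` from bytes nobody has checked yet.
fn checked(bytes: &[u8]) -> Option<&str> {
    // Safe API: validates UTF-8 and reports an error instead of a `&str`.
    str::from_utf8(bytes).ok()
}

fn main() {
    assert_eq!(checked(b"ok"), Some("ok"));
    // 0xFF never appears in well-formed UTF-8, so no `&str` is produced.
    assert_eq!(checked(&[b'o', 0xFF]), None);

    // Unsafe API: the caller promises the bytes are UTF-8 and the check is
    // skipped. The promise holds here because the bytes come from a `&str`.
    let bytes = "ümlaut".as_bytes();
    // SAFETY: `bytes` was produced by `str::as_bytes`, so it is valid UTF-8.
    let s = unsafe { str::from_utf8_unchecked(bytes) };
    assert_eq!(s, "ümlaut");
}
```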

RalfJ (Nov 29 2018 at 16:25, on Zulip):

yes, it's certainly part of safety

RalfJ (Nov 29 2018 at 16:26, on Zulip):

but if the compiler wants to exploit it in any way for optimizations or layout, it must be more than that
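For comparison, a case where rustc does exploit an invariant for layout: `char` has a validity invariant (it must be a Unicode scalar value), and the compiler uses the invalid bit patterns as a niche. The sizes asserted below are what current rustc produces on common targets; treat them as observations rather than documented layout guarantees, and they say nothing about what rustc did at the time of this thread.

```rust
use std::mem::size_of;

fn main() {
    // `char` values never exceed 0x10FFFF, so rustc can encode `None` in an
    // out-of-range bit pattern: in practice `Option<char>` stays 4 bytes.
    assert_eq!(size_of::<char>(), 4);
    assert_eq!(size_of::<Option<char>>(), 4);

    // `Option<&str>` is also no bigger than `&str`, but that niche comes from
    // the non-null data pointer, not from the UTF-8 contents behind it.
    assert_eq!(size_of::<Option<&str>>(), size_of::<&str>());

    // `u32` has no invalid bit patterns, so its `Option` needs extra space.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());
}
```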

nikomatsakis (Nov 29 2018 at 16:27, on Zulip):

> it wouldn't really need to be spelled out in the UCG

this is not clear to me, but it's more a question of the "scope" of the UCG, I guess. I think fundamental types like str (which is even a lang item) seem worth documenting outside of rustdoc, but that's a long-term thing.

gnzlbg (Nov 29 2018 at 16:29, on Zulip):

Ah yes, I didn't mean to suggest that the API of &str shouldn't be safe, but rather that this work would be similar to making sure that Vec is safe. We could tackle these types here, but at this point it is probably better to specify things so that the safety of these types, which is already documented, doesn't break.

nikomatsakis (Nov 29 2018 at 16:30, on Zulip):

yep

Last update: Nov 19 2019 at 18:40 UTC