TLDR: I'm wondering if we should keep the "lazy constant pattern" approach around, which ensures that we don't expand a
const FOO: &[u8] = include_bytes!(super_large_file); if used in a pattern. It has a significant cost to keep around, but saves some compile time for large constants.
While working on https://github.com/rust-lang/rust/pull/70699 I've tried to analyze our entire setup for how constants are used in patterns. We have a very whacky scheme that I've just bolted on more parts over the years. I think we should implement a really thought through scheme once and for all instead of playing whack-a-mole with ICEs and weird "unreachable pattern" bugs. Right now we have both https://github.com/rust-lang/rust/blob/master/src/librustc_mir_build/hair/pattern/const_to_pat.rs#L20 and https://github.com/rust-lang/rust/blob/master/src/librustc_mir_build/hair/pattern/_match.rs#L260 which expand different parts of constants to patterns. Amusingly nowadays both of these are just invoked from https://github.com/rust-lang/rust/blob/master/src/librustc_mir_build/hair/pattern/check_match.rs#L136 right after each other, so merging them will strictly be an improvement.
But even beyond that, we don't actually expand all constants fully into their components. There is/was a worry that if we expanded a
const FOO: &[u8] = include_bytes!(some_huge_file); that compilation would get very slow (as we'd create a slice pattern with one sub-pattern per byte in that file. As const eval gets more powerful, other, similarly large constants may exist. Out of that reason we have a lazy way to expand constants only when needed, but that is a rather buggy scheme (whether a const pattern comes before or after a slice pattern makes compilation behave differently) that I don't really grok (as in I understand all parts of it but I have never managed to hold the entire thing in my head at once).
So... one way to go forward would be to just expand all constants to their component patterns. This would be the easiest way, as we have all the support for user-written patterns already existing. It would be a nicely modular way to keep handling of constants out of the pattern match code.
This also gets very problematic in the presence of slices of zsts, as the user can specify a constant with a repeat expression, which creates a very small constant, but expanding it to a full slice pattern gives us a sub-pattern per slice element.
If we want to keep a lazy scheme, everything becomes a bit more complex. The problem is our exhaustiveness checks. The exhaustiveness checks need to look at all fields. Right now they do this in the hybrid system where some things are expanded and some are not. This is also where the ordering problem comes from. If a constant is used before a slice pattern, the constant did not get expanded and was treated as a single unknown value. So even if a following slice pattern had the exact same value as the constant, we treated it as separate and did not emit an unreachable pattern lint. If the constant comes after, we already know that we are looking at slice patterns of certain lengths and check the constant against that. If we want to fix these things correctly, we'll need to fetch the length of const slice patterns without looking at their elements. The current implementation treats constants as non-aggregate (https://github.com/rust-lang/rust/blob/548afdbe1a600ba868ac00acc4e94cca0242b001/src/librustc_mir_build/hair/pattern/_match.rs#L956).
So my question is whether I should put in work into fixing the lazy const scheme we have or make it dead code by expanding all constants into their smallest components and then removing it.
wouldn't it better, for byte slices specifically, to represent their elements as
Vec<Pat> | &'tcx [u8] or w/e it is?
wrap that in a type and provide element querying methods
you can easily generate other huge slice patterns
ZST arrays in constants created via repeat expressions are one thing
okay so repeats and byte slices
but with const eval you can basically generate any huge constant you want at some point
oh I see that's the concern
and you can convert
include_bytes! (which give you an array) to any super huge type of your own
the last thing works with unions right now
okay but I guess that gets into how we rely on "structural match"
do we use
== and assume we don't know the pattern? do we only try to convert the const into a pattern if we can polymorphically evaluate it?
yes and no
it's fairly arbitrary
that's what I'm trying to say, the current status quo is a mess
sometimes you can break things just by introducing another level of indirection in the type that you match on
also we error if we can't evaluate a constant
so generic parameters of any kind are not a concern
unless we want to support this at some point
then we do need the lazy approach anyway
are people doing things like
const FOO: &[u8] = include_bytes!(super_large_file);?
this seems like something that ought not to be a problem in practice
it likely isn't
that is, it would be a nice surprise to users if it turned out rustc optimised for it, but there's no good reason it should
considering how long pattern matching on consts has been buggy, I think it'd be better to drop the lazy handling and just fix up the existing issues
we could also warn on constants in patterns that get expanded to more than SOME_LIMIT elements
if loading in large consts is the only concern
ah, that's a good idea too
it's going to be obvious the compiler is taking a long time to do something if a user accidentally includes something gargantuan too
I can only really see this happening by accident
I've seen a few people do it intentionally
it seems like a pretty obvious way to bundle large blobs with an application
In priroda I did it intentionally to include all resources in a static bundle: https://github.com/oli-obk/priroda/blob/753b539752b7267ff76bea7366035cc1b97ea789/src/main.rs#L217-L241
yeah but not used for patterns
It's not just slow in patterns
Or, well, was
so generic parameters of any kind are not a concern
Do you mean generic constants here, e.g.,
const Foo<T>: u8 = expr;?
Presumably those would be handled as associated constants are, namely, they result in an error
No I mean both associated constants and const generic parameters of the current function
But in the RFC for https://github.com/rust-lang/rust/issues/31434 there's talk about allowing such constants
Ah; If associated or generic constants would be mentionable in patterns, then they'd need to be some sort of opaque pattern; but that's I think something that definitely would need an RFC
@oli so I wouldn't worry about it this year
so far, all const generics are disallowed in patterns, so it's something we can not think about entirely for now