Stream: wg-traits

Topic: infinite stack growth in chalk


matklad (Oct 24 2019 at 08:47, on Zulip):

I've got a bug report today about rust-analyzer using 200 gigabytes of ram (sic!).

I've tried to catch some stack-traces when this happens, and they seem to point at chalk: https://gist.github.com/matklad/c6320c19b08afc8e6d8ca8612354d5b8

matklad (Oct 24 2019 at 08:48, on Zulip):

I haven't dived into this yet, but I wonder if anyone with a better chalk understanding can think of a reason for such behavior? cc @Florian Diebold

matklad (Oct 24 2019 at 08:52, on Zulip):

Note that I don't think this is a recent chalk regression: we updated chalk 14 days ago, and the issues seems much fresher. My hypothesis is that we just started to run chalk even more aggresively recently (due to find usages), and it hit some nasty code somewhere. Specifically, I can reproduce this problem if I try to find usages of Iterator::next

matklad (Oct 24 2019 at 08:56, on Zulip):

One interesting bit here is that the stacktrace, if I read it right, looks like basically a tight loop which does stacker allocation. This is strange: I would expect more stuff to happen in between calls to stacker::maybe_grow

detrumi (Oct 24 2019 at 08:58, on Zulip):

Looks like the stack_protection feature toggle is on, so somehow the stacker crate doesn't work correctly here

detrumi (Oct 24 2019 at 09:00, on Zulip):

Oh nvm, the 200gb is on the heap, so that's not the issue

matklad (Oct 24 2019 at 09:01, on Zulip):

I am almost sure it is the stacker allocated stack

matklad (Oct 24 2019 at 09:03, on Zulip):

Or maybe not? The stacktrace looks like it's basically allocatiing in a tight loop, but maybe not each maybe_grow_stack is an allocation?

detrumi (Oct 24 2019 at 09:05, on Zulip):

Isn't it growing the stack and storing parts of the stack on the heap when it gets too big?

detrumi (Oct 24 2019 at 09:06, on Zulip):

But looking at those 3 line numbers in logic.rs, it seems like it's recursively looking at the same (sub)goal in a loop

detrumi (Oct 24 2019 at 09:14, on Zulip):

It's weird that the issue started in the last 14 days, though. The only relevant change in Chalk seems to be adding support for Ty::Dyn/Ty::Opaque, which rust-analyzer isn't using yet

matklad (Oct 24 2019 at 09:20, on Zulip):

I think the reason why we are seeing this is that we've started to exercise chalk more, via find usages, which happned yesterday

detrumi (Oct 24 2019 at 09:23, on Zulip):

Ah, this one

nikomatsakis (Oct 24 2019 at 09:50, on Zulip):

Ah, this one

what are you highlighting here, @detrumi ?

nikomatsakis (Oct 24 2019 at 09:50, on Zulip):

(ps this would be a perfect case where it'd be so cool if we had a facility to dump out chalk-ir into a format we can reproduce)

nikomatsakis (Oct 24 2019 at 09:51, on Zulip):

@matklad -- do you have a few more details? like, what are you analyzing?

matklad (Oct 24 2019 at 09:53, on Zulip):

@nikomatsakis just created a non-minimized reproduction:

cargo run --release -p ra_cli -- analysis-stats --with-deps .
matklad (Oct 24 2019 at 09:53, on Zulip):

(--with-deps flag was added moments ago)

nikomatsakis (Oct 24 2019 at 09:53, on Zulip):

ok

matklad (Oct 24 2019 at 09:54, on Zulip):

pasted image

detrumi (Oct 24 2019 at 09:54, on Zulip):

what are you highlighting here, detrumi ?

Just the 'find usages' PR that @matklad was talking about, I didn't highlight any code

matklad (Oct 24 2019 at 09:54, on Zulip):

It says try_reduce_with is a problem, it should be somewhere almong stdlib or analyzer's deps

matklad (Oct 24 2019 at 09:56, on Zulip):

So, it's try_reduce_with from rayon

matklad (Oct 24 2019 at 09:56, on Zulip):

https://github.com/rayon-rs/rayon/blob/fc69e50f298b2f5fa2ce9be27827a0850f3bc8f2/src/iter/try_reduce_with.rs#L7-L19

nikomatsakis (Oct 24 2019 at 09:57, on Zulip):

hmm

nikomatsakis (Oct 24 2019 at 09:58, on Zulip):

I have to run now to get breakfast etc ready for the day, but I guess the obvious thing to do is to try and make a minmal test case from

    fn try_reduce_with<T, OP>(self, op: OP) -> Option<Self::Item>
    where
        OP: Fn(T, T) -> Self::Item + Sync + Send,
        Self::Item: Try<Ok = T>,
    {
        try_reduce_with::try_reduce_with(self, op)
    }

I'm wondering if it's something to do with the Self::Item normalization stuff

nikomatsakis (Oct 24 2019 at 09:59, on Zulip):

er, sorry, wrong try_reduce_with I guess?

nikomatsakis (Oct 24 2019 at 09:59, on Zulip):
pub(super) fn try_reduce_with<PI, R, T>(pi: PI, reduce_op: R) -> Option<T>
where
    PI: ParallelIterator<Item = T>,
    R: Fn(T::Ok, T::Ok) -> T + Sync,
    T: Try + Send,
{
    let full = AtomicBool::new(false);
    let consumer = TryReduceWithConsumer {
        reduce_op: &reduce_op,
        full: &full,
    };
    pi.drive_unindexed(consumer)
}
matklad (Oct 24 2019 at 09:59, on Zulip):

Are more minimal reproduction is

cargo run --release -p ra_cli -- --only "try_reduce_with" ~/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-1.2.0
Florian Diebold (Oct 24 2019 at 10:01, on Zulip):

@matklad there's an analysis-stats missing there, right

matklad (Oct 24 2019 at 10:01, on Zulip):

yup, fixed

matklad (Oct 24 2019 at 10:03, on Zulip):

I must say, I love how useful analysis_stats is for pin-pointing an offending function: we didn't have quite as good infra in IntelliJ Rust

matklad (Oct 24 2019 at 12:02, on Zulip):

minimized example: https://gist.github.com/matklad/612f7c6142e0b27030711e6047cb55fc to a single file

matklad (Oct 24 2019 at 12:02, on Zulip):

it doesn't compile for various reasons, but still throws chalk into an allocating loop

matklad (Oct 24 2019 at 12:14, on Zulip):

So, the repro looks strange:

matklad (Oct 24 2019 at 12:24, on Zulip):

:tada: , minimized to a rust-analyzer unit-test: https://github.com/matklad/rust-analyzer/commit/d907b00f440ca0a127e6975994a54391ce5a976d

matklad (Oct 24 2019 at 12:27, on Zulip):

@Florian Diebold would you like to look at it closer at some point? I feel you'll be much more effective than me there. On the rust-analyzer side, I've fixed the semi-unrelated bug in find-usages which was the primary trigger for this issue, so it's not urgent anymore

matklad (Oct 24 2019 at 12:30, on Zulip):

I wonder if we should spend some time building a c-reduce-like infra, which can go from analysis-stats --with-deps . to a single-small-file test automatically?

Florian Diebold (Oct 24 2019 at 12:48, on Zulip):

Florian Diebold would you like to look at it closer at some point?

sure, I can probably take a look on the weekend

Florian Diebold (Oct 24 2019 at 12:59, on Zulip):

(the first thing to check here would be the logs from CHALK_DEBUG=1, btw)

nikomatsakis (Oct 24 2019 at 13:00, on Zulip):

I'm going to try to run the unit test

nikomatsakis (Oct 24 2019 at 13:06, on Zulip):

definitely related to associated types

nikomatsakis (Oct 24 2019 at 13:06, on Zulip):

@Florian Diebold I see a number of things like

ProjectionEq((TypeId(2))<!0_1, StructId(0)<(TypeId(0))<!0_2>, (TypeId(0))<!0_2>>> = !0_18446744073709551615)

are those gigantic indices normal for rust-analyzer?

nikomatsakis (Oct 24 2019 at 13:06, on Zulip):

I'm referring to !0_18446744073709551615

nikomatsakis (Oct 24 2019 at 13:07, on Zulip):

I'm guessing that is some kind of internal id

nikomatsakis (Oct 24 2019 at 13:09, on Zulip):

something else I see:

It seems like we're seeing duplicate program clauses, leading us to have duplicate strands. Kind of wasteful.

Example:

     62:: | program clause = for<type> Implemented(^0: TraitId(1)) :- Compatible, DownstreamType(^0), ¯\_(ツ)_/¯
     95:: | program clause = for<type> Implemented(^0: TraitId(1)) :- FromEnv(^0 as TraitId(1))
    116:: | program clause = for<type> Implemented(^0: TraitId(1)) :- Compatible, DownstreamType(^0), ¯\_(ツ)_/¯
    149:: | program clause = for<type> Implemented(^0: TraitId(1)) :- FromEnv(^0 as TraitId(1))
nikomatsakis (Oct 24 2019 at 13:09, on Zulip):

not sure what gives rise to that

Florian Diebold (Oct 24 2019 at 13:16, on Zulip):

I'm referring to !0_18446744073709551615

yeah, that's our representative for "Unknown" :neutral:

Florian Diebold (Oct 24 2019 at 13:17, on Zulip):

we use a placeholder type with max index for that currently

Florian Diebold (Oct 24 2019 at 13:17, on Zulip):

maybe you have a better idea how to represent that? ideally of course, we'd like to have an Error type on Chalk side

nikomatsakis (Oct 24 2019 at 13:17, on Zulip):

maybe you have a better idea how to represent that? ideally of course, we'd like to have an Error type on Chalk side

yeah maybe we should just add that

nikomatsakis (Oct 24 2019 at 13:21, on Zulip):

OK, I think I know what the problem is

nikomatsakis (Oct 24 2019 at 13:21, on Zulip):

let me see if I can make a minimized repro for now

Florian Diebold (Oct 24 2019 at 13:43, on Zulip):

yeah maybe we should just add that

I could try doing that, if we are agreed that it should be added :slight_smile:

nikomatsakis (Oct 24 2019 at 13:44, on Zulip):

(distracted, but I think it has something to do with processing of answers causing new projection-eq subgoals)

nikomatsakis (Oct 24 2019 at 13:44, on Zulip):

I thought we avoided doing that, but something's going wrong

nikomatsakis (Oct 24 2019 at 13:44, on Zulip):

I could try doing that, if we are agreed that it should be added :slight_smile:

I think it makes sense, yeah

nikomatsakis (Oct 29 2019 at 14:31, on Zulip):

I looked a bit more into this test this morning

nikomatsakis (Oct 29 2019 at 14:31, on Zulip):

I haven't yet produced the "chalk example" that I want to

nikomatsakis (Oct 29 2019 at 14:31, on Zulip):

but if you change try_reduce_with to

fn try_reduce_with<PI, R, T>(pi: PI, reduce_op: R) -> Option<T>
where
    PI: ParallelIterator<Item = T>,
    R: FnOnce(T::Ok) -> T, // <-- changed this line from FnOnce(T::Ok, T::Ok)
    T: Try,
{
    pi.drive_unindexed()
}

it seems like everything works more like I expect

nikomatsakis (Oct 29 2019 at 14:32, on Zulip):

I think the next obvious step would be to put a bit of energy into making the standalone chalk test, which shouldn't be that hard

nikomatsakis (Oct 29 2019 at 14:33, on Zulip):

although some of the things that rust-analyzer are doing

nikomatsakis (Oct 29 2019 at 14:33, on Zulip):

are not trivial for us to do in a test

nikomatsakis (Oct 29 2019 at 14:33, on Zulip):

and may or may not play an important role

nikomatsakis (Oct 29 2019 at 14:34, on Zulip):

in particular rust-analyzer is inserting "normalized" associated types into the environment (for T::Ok), which I don't think chalk's testing infrastructure alows us to do right now

nikomatsakis (Oct 29 2019 at 14:34, on Zulip):

maybe worth making a syntax for that

nikomatsakis (Oct 29 2019 at 14:44, on Zulip):

it may be key; my naive attempt to recreate the problem failed -- this test passes

matklad (Nov 08 2019 at 21:51, on Zulip):

@nikomatsakis let's create an inssue on the chalk repo about this, lest it slips through the cracks. I can do that if you like.

nikomatsakis (Nov 08 2019 at 21:51, on Zulip):

@matklad thanks. I had indeed forgotten about it

nikomatsakis (Nov 08 2019 at 21:51, on Zulip):

I remember being close to understanding the problem at some point

nikomatsakis (Nov 08 2019 at 21:51, on Zulip):

I'll have to look back in the history...

matklad (Nov 09 2019 at 10:01, on Zulip):

filed https://github.com/rust-lang/chalk/issues/280

Jack Huey (Nov 09 2019 at 13:25, on Zulip):

(deleted)

Jack Huey (Nov 09 2019 at 13:25, on Zulip):

(deleted)

Jack Huey (Nov 09 2019 at 13:25, on Zulip):

Whoops wrong thread

nikomatsakis (Nov 11 2019 at 20:21, on Zulip):

@matklad or @Florian Diebold : remind me how I redirect to use chalk from a local directory?

matklad (Nov 11 2019 at 20:24, on Zulip):

There’s commented out patch section in the root cargo.toml

matklad (Nov 11 2019 at 20:26, on Zulip):

Smth like this should work: https://github.com/rust-analyzer/rust-analyzer/commit/54f77ad68dc47ca8d99a12f1467bebe8e5441d7e#diff-80398c5faae3c069e4e6aa2ed11b28c0

matklad (Nov 11 2019 at 20:26, on Zulip):

(We should just keep commended out salsa and chalk replacements there I think)

nikomatsakis (Nov 11 2019 at 20:48, on Zulip):

OK, so the another relevant factor is that rust-analyzer sets the max-size to 4

nikomatsakis (Nov 11 2019 at 20:48, on Zulip):

I still haven't reproduced this, but if I se the max-size to 10, it goes away

nikomatsakis (Nov 11 2019 at 21:09, on Zulip):

ok, I reproduced the problem :tada:

Florian Diebold (Nov 11 2019 at 22:21, on Zulip):

OK, so the another relevant factor is that rust-analyzer sets the max-size to 4

I set that so low because of the performance problems, it might not be necessary anymore with floundering

nikomatsakis (Nov 11 2019 at 23:39, on Zulip):

@Florian Diebold well it caught a useful edge case :)

Last update: Nov 12 2019 at 15:30UTC