Stream: t-compiler

Topic: stack growing


nagisa (Nov 02 2018 at 15:51, on Zulip):

@Oli btw I have thought of a way to have virtually unlimited stacks in a fairly portable manner that also properly supports stack guards.

nagisa (Nov 02 2018 at 15:51, on Zulip):

on any platform that has a virtual/physical memory distinction, which is, I believe, everything we care about

nagisa (Nov 02 2018 at 15:52, on Zulip):

Basically mmap without committing a huge area of address space and then commit pages explicitly on specifically annotated points.

nagisa (Nov 02 2018 at 15:52, on Zulip):

cc @Alex Crichton

nagisa (Nov 02 2018 at 15:53, on Zulip):

If windows did overcommit like posixes, then the explicit points of commit would not be necessary even

nagisa (Nov 02 2018 at 15:57, on Zulip):

I was planning to develop a crate around that concept as soon as I was done with optimize attribute PR, but I got stuck with testing optimize and now I have a headache too :frown:

oli (Nov 02 2018 at 17:10, on Zulip):

I have partially understood what you are saying, but the last time I came in contact with mmap and paging was in uni. I can start reading up on it, but it'll take me a while to get up to speed. If you have it in the cache, I'd rather wait for your concept crate and see how to integrate that. Now that we have an entry point in librustc, we can replace the implementation at any time

nagisa (Nov 02 2018 at 17:14, on Zulip):

Indeed. I want to make a prototype myself anyway.

nikomatsakis (Nov 02 2018 at 18:25, on Zulip):

@nagisa that still requires -- iiuc -- some sort of "maximum size", right?

nikomatsakis (Nov 02 2018 at 18:25, on Zulip):

I remember, at a company I worked at, somebody added a buffer that would reallocate automatically on overflow by intercepting the segv handler

nikomatsakis (Nov 02 2018 at 18:26, on Zulip):

this worked great until we ran into problems where we had to relocate the base address

nikomatsakis (Nov 02 2018 at 18:26, on Zulip):

I remember, at a company I worked at, somebody added a buffer that would reallocate automatically on overflow by intercepting the segv handler

that is, it would allocate the page once you overflowed

nikomatsakis (Nov 02 2018 at 18:26, on Zulip):

(by putting a guard page at the end)

nikomatsakis (Nov 02 2018 at 18:26, on Zulip):

these days, overcommit will kind of do it for you...

nikomatsakis (Nov 02 2018 at 18:27, on Zulip):

anyway maybe I didn't really understand what you are saying

blitzerr (Nov 03 2018 at 00:53, on Zulip):

@nagisa

Basically mmap without committing a huge area of address space and then commit pages explicitly on specifically annotated points.

What do you mean by committing? Do you mean writing something to the page so that the system is bound to allocate you the memory, since Linux allocates memory lazily?

nagisa (Nov 03 2018 at 05:06, on Zulip):

Yes, that would still require a maximum size of some sort… And, indeed, since we must support 32-bit systems, we’d have to be careful about virtual address space exhaustion… hmm.

nagisa (Nov 03 2018 at 05:09, on Zulip):

@blitzerr Windows does not support overcommit, so it is necessary to "reserve" the virtual memory and "commit" it in two separate calls of VirtualAllocEx. "commit" here refers to reserving actual physical space in RAM and the page file.
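
The reserve-then-commit split described above can be sketched as a toy model. This is bookkeeping only and never calls the OS: the names `ReservedStack`, `reserve`, and `ensure_committed` are hypothetical, and a real implementation would issue the platform's reserve and commit calls where the comments indicate.

```rust
/// Toy model of a stack region whose address space is reserved up front
/// and whose pages are committed only at explicit, annotated points.
/// Hypothetical names; this tracks counters and never touches the OS.
#[derive(Debug)]
pub struct ReservedStack {
    page_size: usize,
    reserved_pages: usize,
    committed_pages: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GrowError {
    /// The request does not fit inside the reserved address range.
    ExceedsReservation { requested_pages: usize, reserved_pages: usize },
}

/// Number of whole pages needed to hold `bytes` (rounds up).
fn pages_for(bytes: usize, page_size: usize) -> usize {
    bytes / page_size + usize::from(bytes % page_size != 0)
}

impl ReservedStack {
    /// "Reserve": claim address space for `max_bytes`, commit nothing.
    /// (A real version would reserve virtual memory here.)
    pub fn reserve(max_bytes: usize, page_size: usize) -> Self {
        assert!(page_size > 0, "page size must be non-zero");
        ReservedStack {
            page_size,
            reserved_pages: pages_for(max_bytes, page_size),
            committed_pages: 0,
        }
    }

    /// "Commit": called at an annotated point to make sure at least
    /// `needed_bytes` are backed. Returns the number of newly committed pages.
    /// (A real version would commit the missing pages here.)
    pub fn ensure_committed(&mut self, needed_bytes: usize) -> Result<usize, GrowError> {
        let needed_pages = pages_for(needed_bytes, self.page_size);
        if needed_pages > self.reserved_pages {
            return Err(GrowError::ExceedsReservation {
                requested_pages: needed_pages,
                reserved_pages: self.reserved_pages,
            });
        }
        let newly_committed = needed_pages.saturating_sub(self.committed_pages);
        self.committed_pages += newly_committed;
        Ok(newly_committed)
    }

    /// Bytes currently counted as committed.
    pub fn committed_bytes(&self) -> usize {
        self.committed_pages * self.page_size
    }
}
```

In this model the reservation size is the "maximum size" asked about earlier: a request past it fails instead of relocating the region.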

nagisa (Nov 03 2018 at 05:11, on Zulip):

@nikomatsakis I did not intend to require use of a segv signal handler (but I would not prohibit it, as long as mmap is signal safe) and I specifically want to avoid reallocation by making users allocate all of the necessary virtual address space up-front.

nagisa (Nov 03 2018 at 05:12, on Zulip):

At first I wasn’t thinking of 32-bit systems, and figured that something like 128GB of stack limit ought to be enough for everyone :slight_smile:

nikomatsakis (Nov 08 2018 at 15:39, on Zulip):

I believe @nagisa that the 'main thread' linux stack already operates in this way, right? -- though the compiler runs on a thread, of course.

nagisa (Nov 08 2018 at 15:42, on Zulip):

@nikomatsakis that’s right.

mw (Nov 08 2018 at 15:42, on Zulip):

Seems like a good solution for 64 bit processes (I want to do something like that for serialization buffers too some time)

nagisa (Nov 08 2018 at 15:43, on Zulip):

It might be sensible to implement some sort of hybrid between what stacker does and what I’m suggesting, so that we would not use too much of virtual memory on 32-bit while also virtually never hitting the turns-out-pretty-expensive stacker code paths on 64-bit targets.

mw (Nov 08 2018 at 15:44, on Zulip):

we could also just make 32 bit rustc just use stacker?

nagisa (Nov 08 2018 at 15:45, on Zulip):

Sure, but implementing a common API would still be desirable.

mw (Nov 08 2018 at 15:45, on Zulip):

but yeah, if there's a good hybrid solution, that would be even better

nagisa (Nov 08 2018 at 16:04, on Zulip):

@nikomatsakis well, one difference that comes to mind is that "growing" the stack would still be manual

nagisa (Nov 08 2018 at 16:04, on Zulip):

from the API standpoint

nagisa (Nov 08 2018 at 16:05, on Zulip):

even if on Linux that is not necessary it may be necessary on the other targets.

nagisa (Nov 08 2018 at 16:05, on Zulip):

(such as Windows)

nagisa (Nov 08 2018 at 16:05, on Zulip):

And that conveniently also allows easily integrating hybrid approaches as well.
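
One way to picture the hybrid is a single entry point that picks a strategy by pointer width. This is only a sketch: `StackStrategy` and `pick_strategy` are hypothetical names, the 12 GiB reservation stands in for "a dozen gigs", and the 1 MiB chunk size is an arbitrary assumption.

```rust
/// How a common "grow the stack" API might behave on a given target.
/// Hypothetical type; the sizes below are illustrative assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StackStrategy {
    /// Reserve one large virtual region up front and grow in place.
    ReserveInPlace { reserve_bytes: u64 },
    /// Allocate a fresh chunk and switch stacks when running low,
    /// which is the approach stacker takes.
    SwitchChunks { chunk_bytes: u64 },
}

/// Pick a strategy from the pointer width in bits.
pub fn pick_strategy(pointer_width_bits: u32) -> StackStrategy {
    if pointer_width_bits >= 64 {
        // Plenty of address space: reserve a large region (assumed 12 GiB).
        StackStrategy::ReserveInPlace { reserve_bytes: 12 << 30 }
    } else {
        // Scarce address space: fall back to small chunks (assumed 1 MiB).
        StackStrategy::SwitchChunks { chunk_bytes: 1 << 20 }
    }
}

/// Strategy for the target this code was compiled for.
pub fn pick_strategy_for_target() -> StackStrategy {
    pick_strategy((std::mem::size_of::<usize>() * 8) as u32)
}
```

Callers would go through one API either way, so the choice stays an implementation detail of the crate.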

nikomatsakis (Nov 08 2018 at 16:16, on Zulip):

ah @nagisa so the idea would be that we "reserve" space for things to grow "in place" but not actually allocate that space until later? (essentially doing "overcommitment" of a sort?)

oli (Nov 08 2018 at 17:33, on Zulip):

the turns-out-pretty-expensive stacker code paths

oli (Nov 08 2018 at 17:33, on Zulip):

I'm working on figuring this out

oli (Nov 08 2018 at 17:34, on Zulip):

it might not be as bad as projected in the perf run, as that was using the most naive algorithm available

nagisa (Nov 09 2018 at 11:10, on Zulip):

@nikomatsakis yes, my plan was to allocate a dozen gigs of virtual memory on 64-bit systems and only back it with physical pages on demand.

nagisa (Nov 09 2018 at 11:11, on Zulip):

making the stack… ahem, virtually unlimited

nagisa (Nov 09 2018 at 11:11, on Zulip):

doesn’t work on 32-bit of course, as there ain’t that much virtual memory there

nagisa (Nov 10 2018 at 18:49, on Zulip):

@Oli the first thing I’m doing right now is making a cross-platform stack manipulation library that could replace the stacker’s arch module

nagisa (Nov 10 2018 at 18:49, on Zulip):

this should end up unblocking at least the "only x86" part of the problem that stacker has.

nagisa (Nov 10 2018 at 18:51, on Zulip):

I had hoped to circumvent this issue altogether by not needing to switch stacks at all, ever, but that is unfeasible (1) on 32-bit targets as discussed above; and (2) there’s nothing that can avoid switching stacks while also integrating nicely with the std’s thread and everything that is built on it.

nagisa (Nov 10 2018 at 18:53, on Zulip):

(2) is because you cannot simply provide your own stack buffer when creating a thread.

oli (Nov 11 2018 at 09:27, on Zulip):

Is 2 fixable by changing the internal thread API?

nagisa (Nov 11 2018 at 09:43, on Zulip):

Maybe? we use rayon though, which has another layer of API over the std's Thread.

nikomatsakis (Nov 12 2018 at 17:55, on Zulip):

@Oli which is the PR that uses stacker?

nikomatsakis (Nov 12 2018 at 17:56, on Zulip):

Maybe? we use rayon though, which has another layer of API over the std's Thread.

we need to revisit this

oli (Nov 13 2018 at 08:17, on Zulip):

https://github.com/rust-lang/rust/pull/55617 is the PR, but I need to update it to use the less-stack-thrashing PR I made to stacker and rerun perf

Jake Goulding (Nov 16 2018 at 02:11, on Zulip):

Where can I learn more about the why behind all this?

oli (Nov 16 2018 at 08:42, on Zulip):

What kind of why are you interested in? I made a list of issues in the PR that are fixed by it. Or are you wondering how those issues came to be?

Jake Goulding (Nov 16 2018 at 14:31, on Zulip):

I made a list of issues in the PR that are fixed by it.

That's a good start for me, thank you

nagisa (Dec 05 2018 at 13:55, on Zulip):

@Oli so do you know why your fix didn’t help?

oli (Dec 05 2018 at 13:55, on Zulip):

nope

nagisa (Dec 05 2018 at 13:55, on Zulip):

because the stack that is overflowing for mach is… intense drumming

nagisa (Dec 05 2018 at 13:55, on Zulip):

for the main thread! ba dum tsh

oli (Dec 05 2018 at 13:56, on Zulip):

huh

nagisa (Dec 05 2018 at 13:56, on Zulip):

and that is obviously not affected by the stack size stuff, because that only affects the threads, not main stack

nagisa (Dec 05 2018 at 13:56, on Zulip):

if it was a proper rustc thread the error message would say "thread 'rustc' overflowed stack" or something, not "thread 'main' overflowed stack".
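
For illustration, this is how a named thread with an explicit stack size is created through `std::thread::Builder`. The setting applies only to the spawned thread and leaves the main thread's stack alone. The helper names, the 64 MiB size, and the recursion depth used in the example call are assumptions, not what rustc does.

```rust
use std::thread;

/// Simple recursion that consumes one stack frame per level
/// (an optimizing build may flatten it into a loop).
fn depth_sum(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        1 + depth_sum(n - 1)
    }
}

/// Run `depth_sum(depth)` on a freshly spawned thread with the given name
/// and stack size. Returns the thread's name as seen from inside it,
/// together with the result of the recursion.
pub fn run_on_named_thread(name: &str, stack_bytes: usize, depth: u64) -> (Option<String>, u64) {
    let handle = thread::Builder::new()
        .name(name.to_string())
        .stack_size(stack_bytes) // affects this thread only, not 'main'
        .spawn(move || {
            let thread_name = thread::current().name().map(str::to_owned);
            (thread_name, depth_sum(depth))
        })
        .expect("failed to spawn thread");
    handle.join().expect("worker thread panicked")
}
```

Calling `run_on_named_thread("rustc", 64 * 1024 * 1024, 100_000)` runs the recursion on a thread that reports its name as `rustc`. Anything that overflows before such a thread is spawned is still running on `main`.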

nagisa (Dec 05 2018 at 13:57, on Zulip):

This is related to https://github.com/rust-lang/rust/pull/48575

oli (Dec 05 2018 at 14:00, on Zulip):

I even saw that PR after it got merged... I guess it's revert time (+adding a flag)

Pietro Albini (Dec 05 2018 at 14:01, on Zulip):

ugh, and we backported the increase to 1.31.0

nagisa (Dec 05 2018 at 14:12, on Zulip):

@Pietro Albini it is not harmful in most ways though

Pietro Albini (Dec 05 2018 at 14:13, on Zulip):

sure

nagisa (Dec 05 2018 at 14:13, on Zulip):

other than just being an observable increase in memory usage

nagisa (Dec 08 2018 at 21:38, on Zulip):

@Oli omg I dread having to deal with delay slots for mips in its implementation of stack manipulation :D

nagisa (Dec 08 2018 at 21:39, on Zulip):

(good news is that I have all ARMs and x86(_64) covered)

nagisa (Dec 08 2018 at 21:39, on Zulip):

(I believe, anyway)

nagisa (Dec 08 2018 at 21:41, on Zulip):

(MIPS also has like 20 different ABIs...)

nagisa (Dec 08 2018 at 21:42, on Zulip):

/me goes to re-learn how powerpc assembly works

nagisa (Jan 08 2019 at 18:47, on Zulip):

FWIW I made a PR against stacker that implements support for all the platforms I could muster so far and some interesting questions regarding the maintainership of the crate arose

nagisa (Jan 08 2019 at 18:49, on Zulip):

See https://github.com/alexcrichton/stacker/pull/13 for the discussion

nagisa (Jan 08 2019 at 18:49, on Zulip):

but in summary, if it seems that rustc will become the primary user of the crate… then perhaps it should be maintained by T-compiler?

nagisa (Jan 08 2019 at 18:50, on Zulip):

however we do not per-se have a nice place to put crates like these anywhere, other than in the compiler’s own tree

nagisa (Jan 08 2019 at 18:50, on Zulip):

which I would rather not do.

nagisa (Jan 08 2019 at 18:50, on Zulip):

Also @Oli I’d love to see your comment on the matter (both the maintainership & the idea of psm itself)

nagisa (Jan 08 2019 at 18:54, on Zulip):

Perhaps the ideal solution in my eyes currently is to have a repository in a nursery or a similar group that would contain the crates we end up developing to manipulate the stack as directories

nagisa (Jan 08 2019 at 18:54, on Zulip):

with everyone on T-compiler as maintainer of it.

Taylor Cramer (Jan 08 2019 at 18:59, on Zulip):

i'm really curious what this "infinite stack" idea you have is

Taylor Cramer (Jan 08 2019 at 19:00, on Zulip):

but i guess that's unrelated to the maintainership question you're asking ;)

Taylor Cramer (Jan 08 2019 at 19:01, on Zulip):

On that note, it seems like it would be a worthwhile thing to pull into the nursery at least

nagisa (Jan 08 2019 at 19:04, on Zulip):

i'm really curious what this "infinite stack" idea you have is

My plan is to abuse virtual memory and allocate enough virtual memory to be virtually (ahem…) infinite for anybody involved…

nagisa (Jan 08 2019 at 19:04, on Zulip):

of course there are considerations and issues to solve on 32-bit, but the idea is very sound for 64-bit systems.

nagisa (Jan 08 2019 at 19:05, on Zulip):

That should get rid of almost any overhead that the original stacker approach might have.

Taylor Cramer (Jan 08 2019 at 19:16, on Zulip):

hmm

Taylor Cramer (Jan 08 2019 at 19:16, on Zulip):

I'm a bit skeptical

Taylor Cramer (Jan 08 2019 at 19:16, on Zulip):

but maybe I'm missing info

Taylor Cramer (Jan 08 2019 at 19:17, on Zulip):

so on modern 64-bit hardware the actual address space is usually still limited to 48 bits or so, right?

Taylor Cramer (Jan 08 2019 at 19:18, on Zulip):

which is something like 280 terabytes

Taylor Cramer (Jan 08 2019 at 19:18, on Zulip):

if each thread grabs a terabyte stack

Taylor Cramer (Jan 08 2019 at 19:18, on Zulip):

you're not that far off from running out

nagisa (Jan 08 2019 at 19:18, on Zulip):

Recently it has been bumped to 57 on x86_64, and other architectures may or may not have their own limits as well
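
The arithmetic in this exchange can be checked with a few lines. This sketch counts how many fixed-size stack reservations fit in an address space of a given width. It ignores everything else that lives in the address space (code, heap, any kernel portion), so real limits are lower. The function names are hypothetical.

```rust
/// One tebibyte in bytes.
pub const TIB: u128 = 1 << 40;

/// Total bytes addressable with `bits` bits of virtual address.
pub fn address_space_bytes(bits: u32) -> u128 {
    1u128 << bits
}

/// Upper bound on how many reservations of `stack_bytes` fit in a
/// `bits`-wide address space, ignoring all other mappings.
pub fn max_stacks(bits: u32, stack_bytes: u128) -> u128 {
    assert!(stack_bytes > 0, "stack size must be non-zero");
    address_space_bytes(bits) / stack_bytes
}
```

With 48 bits the address space is 281,474,976,710,656 bytes, so at most 256 one-terabyte stacks fit. With 57 bits the bound rises to 131,072. On a 32-bit address space, only 40 reservations of 100 MiB fit.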

nagisa (Jan 08 2019 at 19:19, on Zulip):

I don’t see a reason to leave the user of such solution out of the decision on what size to use, and also a combination of multiple approaches may be used too.

nagisa (Jan 08 2019 at 19:20, on Zulip):

A 100MB-sized stack that might in some very rare cases get stacker-switched to another 100MB-sized virtual memory region still amortizes the costs of stack switching greatly

Taylor Cramer (Jan 08 2019 at 19:21, on Zulip):

oh, cool-- i'd be curious how many of the platforms we build rustc for have 57-bit

Taylor Cramer (Jan 08 2019 at 19:21, on Zulip):

but yeah, the idea of grabbing 100MB chunks for a known program like rustc where you can specifically limit the number of per-process threads seems pretty sound
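
A rough way to see the amortization: for a single descent that needs a given amount of stack, the number of switches is bounded below by how many chunk boundaries it crosses. The model below is a simplification (it ignores recursion that bounces back and forth across a boundary), `min_switches` is a hypothetical name, and the 1 MiB comparison chunk is an assumption.

```rust
/// One mebibyte in bytes.
pub const MIB: u64 = 1 << 20;

/// Lower bound on stack switches for one monotonic descent that needs
/// `needed_bytes` of stack when every chunk holds `chunk_bytes`.
/// The first chunk is the stack already in use, so it costs no switch.
pub fn min_switches(needed_bytes: u64, chunk_bytes: u64) -> u64 {
    assert!(chunk_bytes > 0, "chunk size must be non-zero");
    needed_bytes.saturating_sub(1) / chunk_bytes
}
```

Under this model a descent needing 250 MiB crosses 249 boundaries with 1 MiB chunks but only 2 with 100 MiB chunks, and anything up to 100 MiB never switches at all.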

nagisa (Jan 08 2019 at 19:22, on Zulip):

The most important tidbit here is that it ought to grab virtual memory, committing actual pages on access.

nagisa (Jan 08 2019 at 19:23, on Zulip):

which would not impact the resident-set for your usual-case.

nagisa (Jan 08 2019 at 19:23, on Zulip):

I’m still hazy on how to make that work on Windows as well… but I’ll figure something out

Zoxc (Jan 08 2019 at 19:30, on Zulip):

You only need to allocate enough virtual address space to exhaust physical memory, so you don't need terabyte stacks unless you also have a terabyte of RAM

nagisa (Jan 08 2019 at 19:33, on Zulip):

wouldn’t be good if we couldn’t spawn more than a single thread on one of those supercomputers that are running out of those 48 bits for actual physical memory :slight_smile:

nikomatsakis (Jan 09 2019 at 21:10, on Zulip):

I forked off a discussion about where the crate should live into the "rust-lang crates owned by compiler" topic

nagisa (Jan 18 2019 at 15:56, on Zulip):

One question: what do we do about the procedural macros? We cannot possibly tell people to "invoke this awesome function to make your stack problems go away", right?

nagisa (Jan 18 2019 at 15:57, on Zulip):

wasn’t there an idea to run procedural macros in some sort of isolated environment or something, so that it does not crash the compiler accidentally etc?

nagisa (Jan 18 2019 at 15:59, on Zulip):

I guess it would be less of a problem with stacker being an independent thing... but then it would be ideal if procedural macro authors didn’t need to think about it at all

nikomatsakis (Jan 18 2019 at 16:18, on Zulip):

@nagisa @eddyb did that work, I believe, but we wound up with the same process for perf reasons

Vadim Petrochenkov (Jan 18 2019 at 16:24, on Zulip):

wound up with the same process for perf reasons

Not even same process, same thread.

nagisa (Jan 18 2019 at 16:25, on Zulip):

Right, so it would definitely benefit from at least having a fresh stack

oli (Mar 18 2019 at 14:16, on Zulip):

@nagisa I don't remember, what are the next steps for https://github.com/rust-lang/rust/pull/55617 ? Are you still planning to implement basic windows support in https://github.com/alexcrichton/stacker/pull/13 ?

oli (Mar 18 2019 at 14:17, on Zulip):

Or should we preemptively start moving stacker and psm into one repo and then into the rust-lang org?

nagisa (Mar 18 2019 at 15:51, on Zulip):

The Windows support is there... technically the only thing that does not work is 32-bit windows which can be emulated by using fibers still

nagisa (Mar 18 2019 at 15:51, on Zulip):

The things to do are to figure out why the CI does not pass and merge everything together, move it into the T-compiler purview...

nagisa (Mar 18 2019 at 15:52, on Zulip):

I still am currently adapting to my new job, and dealing with finishing my involvement with the previous one

nagisa (Mar 18 2019 at 15:53, on Zulip):

so I don’t really have much time to dedicate to anything open source currently

Zoxc (Mar 18 2019 at 15:53, on Zulip):

What are you using instead of fibers on x64 Windows?

nagisa (Mar 18 2019 at 15:53, on Zulip):

@Zoxc Nothing, the code just carefully preserves the SEH unwinding information, so it "just works".

nagisa (Mar 18 2019 at 15:53, on Zulip):

on 64-bit anyway.

Zoxc (Mar 18 2019 at 15:54, on Zulip):

Are you just allocating a large stack? =P

nagisa (Mar 18 2019 at 15:55, on Zulip):

Oh, currently we are still allocating stack in chunks, like before. The windows code properly implements rust_psm_on_stack (which switches the stack to a new one) in a way that works on 64-bit windows.

Zoxc (Mar 18 2019 at 15:56, on Zulip):

You can't switch stacks on Windows without fibers though

nagisa (Mar 18 2019 at 15:57, on Zulip):

You very much can, you just need to keep the necessary information in the TIB updated and correct.

nagisa (Mar 18 2019 at 15:57, on Zulip):

i.e. the regular mov %rax, %rsp is not enough.

Zoxc (Mar 18 2019 at 15:57, on Zulip):

No, that doesn't work. I've tried. Leads to fun subtle bugs.

nagisa (Mar 18 2019 at 15:58, on Zulip):

What kinds of bugs have you encountered?

nagisa (Mar 18 2019 at 15:58, on Zulip):

Either way windows fiber code is not gone, so it is always possible to make stacker switch to either method

nagisa (Mar 18 2019 at 15:59, on Zulip):

I’m fine with the cost of fibers if that means we are sure that stuff works.

nagisa (Mar 18 2019 at 15:59, on Zulip):

but I’m fairly confident in my implementation of 64-bit stack switching -- it was made after reverse-engineering the fibers after all ;)

Zoxc (Mar 18 2019 at 16:00, on Zulip):

Well it's possible I missed something, but MS can break it when they want to anyway.

nagisa (Mar 18 2019 at 16:00, on Zulip):

@Zoxc one interesting thing that I managed to make work is regular stack unwinding for backtraces -- it works fully and very well

nagisa (Mar 18 2019 at 16:01, on Zulip):

So instead of getting the backtrace for just the "current" stack you get the backtrace for the whole chain.

Zoxc (Mar 18 2019 at 16:01, on Zulip):

I wonder how you managed that? MS added security checks which stop unwinding outside the current stack.

Zoxc (Mar 18 2019 at 16:01, on Zulip):

Err, backtraces should work, actual SEH unwinding won't

nagisa (Mar 18 2019 at 16:02, on Zulip):

yeah, SEH won’t look past the current stack

nagisa (Mar 18 2019 at 16:02, on Zulip):

I still need to put a catch-all handler that restores the stack and reinvokes SEH unwinding...

nagisa (Mar 18 2019 at 16:03, on Zulip):

Ok, gotta run to walk doggo

nagisa (Nov 14 2019 at 15:31, on Zulip):

@oli fwiw I fixed psm, should be good to just rebase the repo and update the dep for the PR

Last update: Nov 22 2019 at 05:35UTC