Stream: t-compiler/major changes

Topic: Never Rollup When Changing the `compiler`… compiler-team#407


triagebot (Feb 15 2021 at 16:40, on Zulip):

A new proposal has been announced: Never Rollup When Changing the compiler Directory #407. It will be announced at the next meeting to try and draw attention to it, but usually MCPs are not discussed during triage meetings. If you think this would benefit from discussion amongst the team, consider proposing a design meeting.

nagisa (Feb 15 2021 at 16:42, on Zulip):

thought: this could also have a milder alternative: when people create a rollup (via https://bors.rust-lang.org/queue/rust I believe is the state of art today), make it obvious to people rolling up that they should prefer rolling up certain PRs but not others.

nagisa (Feb 15 2021 at 16:42, on Zulip):

and possibly warn that they are rolling up PRs that generally should go through queue normally.

nagisa (Feb 15 2021 at 16:43, on Zulip):

I believe we already have a rollup=iffy, so we could piggy-back on that.

Joshua Nelson (Feb 15 2021 at 17:21, on Zulip):

The problem is that even reviewers will mark the PRs as rollup=always

Joshua Nelson (Feb 15 2021 at 17:22, on Zulip):

I don't think the people rolling up are the ones who need to change, they don't have time to look at each PR individually

nagisa (Feb 15 2021 at 17:41, on Zulip):

yeah I meant to suggest that we'd apply rollup=iffy by default. Maybe even disallow rollup=always for changes touching compiler/. That would be less strict than forcing rollup=never.

nagisa (Feb 15 2021 at 17:41, on Zulip):

I don't know if the people doing rollups take the iffy flag into account when rolling up, but if they don't then I think the homu/bors interface could be adjusted to make it more apparent that these are iffy.

tm (Feb 15 2021 at 17:42, on Zulip):

Do we know what impact this will have on time to land compiler related PR? Is the average daily number of successful bors runs > average daily number of compiler related PRs that land currently?

nagisa (Feb 15 2021 at 17:42, on Zulip):

My primary concern is that we may get into a situation where we don't have any other choice but to rollup something, because the queue is full but everything that could be rolled up already was.

rylev (Feb 15 2021 at 17:47, on Zulip):

I'm not familiar enough with the bors queue to say how frequently we run into an issue where we need to roll changes up to have any hope of getting the queue down. rollup=iffy is definitely another choice. It seems that many PRs are being labeled as "maybe", and still be rolled up despite (in my very humble opinion) they being very poor candidates for rollup.

rylev (Feb 15 2021 at 17:48, on Zulip):

Encouraging those performing rollups to be cautious with certain PRs would be another positive change IMO

nagisa (Feb 15 2021 at 17:54, on Zulip):

Is maybe a separate rollup flag? I only know of never, iffy, default, always (in order from least rollable-up to the more rollable-up PRs)

nagisa (Feb 15 2021 at 17:55, on Zulip):

I think maybe might just be the default value?

Jonas Schievink [he/him] (Feb 15 2021 at 17:55, on Zulip):

maybe is the default yeah

rylev (Feb 15 2021 at 18:03, on Zulip):

Yes maybe is the default. I'd be curious to hear from those that do rollups on how they treat maybes - is there a general tendy to roll them up, do they check each one, etc.

Jonas Schievink [he/him] (Feb 15 2021 at 18:04, on Zulip):

I tend to decide based on what code they touch

Jonas Schievink [he/him] (Feb 15 2021 at 18:04, on Zulip):

If they have a small diff, don't touch too many existing tests, and don't touch code that isn't tested during PR CI, I roll them up

rylev (Feb 15 2021 at 18:08, on Zulip):

Yea it's tough. We've seen small refactorings (simply moving some code from one file to another) impact performance (usually do to inlining changes).

Jack Huey (Feb 15 2021 at 18:42, on Zulip):

I think my thoughts on this MCP are: by default, if compiler code is touched, then it should be rollup=iffy; other changes can stay rollup=maybe by default. But otherwise, the policy around manually changing from to rollup=always stays mostly the same: only in cases where it's obviously not going to affect perf, i.e. doc changes, variable renaming, typos, adding tests (and compiler-touching code can be a bit stricter here; small cleanups in library code might also be okay to be marked as rollup=always).

This way, the policy around rollup=never stays mostly the same: PRs that have a high likelihood of perf regressions, regressions, high impact, etc. This might be a good compromise between allowing small (3-5 PR) rollups when CI time is strained (where those might be a mix of rollup=iffy and rollup=maybe) (while also ensuring that PRs that really should never be rolled up aren't) and generally preferring compiler-touching code to not be rolled up.

Eric Huss (Feb 15 2021 at 19:17, on Zulip):

Just a little roughly scanned data: In the past 30 days there have been 171 merges (5.7/day). 31 of those were rollups, totaling 347 PRs (487 total PRs this past 30 days). Of the rollups, there were 157 total PRs that touched /compiler/ (so roughly 5 compiler PRs per rollup).

(This doesn't count beta or stable PRs.)

My instinct is that it will be difficult to reduce the number of compiler PRs in rollups. There's only going to be more PRs over time, and fewer merge slots per day.

rylev (Feb 16 2021 at 09:29, on Zulip):

Thanks for the data. So ~45% of PRs involved in rollups touch the compiler folder. This certainly would decrease our ability to get through the bors queue. Perhaps we could first experiment with rollup=iffyas @nagisa has suggested, but my worry would be that those doing rollups would learn to ignore the iffy labelling.

Perhaps we can come up with ways to more actively encouraging following the guidelines for which PRs to include in rollups. For instance, if a PR labeled as "maybe" or "iffy" contains more than n number of ".rs" files, we could require that the rollup paricipant actively opt into the "risky" rollup and we could specifically mention these PRs in the rollup PR.

rylev (Feb 16 2021 at 09:51, on Zulip):

It's also interesting to note that I mostly see r+ rollup which defaults to rollup=maybe even for changes to documentation or other changes that are IMO very rollup friendly and should be marked rollup=always. Perhaps the entire rollup system is not being used as it was intended and needs some tweaking.

nagisa (Feb 16 2021 at 10:01, on Zulip):

My guess is that we can achieve almost anything with high precision by adjusting the interface people creating roll ups use. Right now it's checking a number of boxes and clicking a button. It is pretty easy to ignore the roll-up flag. If we grouped the prs by roll-up status in separate tables that would already make it way more what is safe to roll-up and what should be left alone. Grouping would also allow us to add some brief informational blurbs directly on the roll-up interface.

nagisa (Feb 16 2021 at 10:03, on Zulip):

Basically what I'm trying to say is we should put the responsibility on getting it right (more often) on the tooling not the people using it.

nagisa (Feb 16 2021 at 10:04, on Zulip):

(the other side of this equation is then automatically marking things as iffy ir never)

Joshua Nelson (Feb 16 2021 at 14:09, on Zulip):

@rylev r+ rollup is the same as rollup=always

Joshua Nelson (Feb 16 2021 at 14:10, on Zulip):

https://forge.rust-lang.org/compiler/reviews.html?highlight=Rollup#rollups

rylev (Feb 16 2021 at 14:11, on Zulip):

Ah the documentation is wrong in another part then:

rollup=maybe: This is the default if you do not specify a rollup status.

Joshua Nelson (Feb 16 2021 at 14:12, on Zulip):

I think it's just worded poorly and referring to r+ without a rollup keyword

rylev (Feb 16 2021 at 14:13, on Zulip):

Ok makes sense. We should clarify that.

rylev (Feb 16 2021 at 14:14, on Zulip):

I'm not sure what good next steps are. I've seen PRs marked as r+ rollup that seem in direct contradiction to the guidelines for when to rollup. And it's clear that we're not doing a good job of rolling up because we almost always have at least one rollup regression in performance triage.

scottmcm (Feb 16 2021 at 18:16, on Zulip):

Note that the prioritization of rollup=never meant that last time I tried rollup=iffy it meant that it took a really long time to get merged, since all the nevers went first, and if it wasn't one of those, then the rollup got priority over it.

nikomatsakis (Feb 24 2021 at 10:55, on Zulip):

I find myself a bit uncertain about how to use the rollup flags

nikomatsakis (Feb 24 2021 at 10:55, on Zulip):

I feel like we could probably automate the default a lot better by looking at the size of the diff and/or changed files

nikomatsakis (Feb 24 2021 at 10:56, on Zulip):

My hunch is that some simple heuristics would get us very far

rylev (Feb 24 2021 at 12:24, on Zulip):

FYI: I'm going to revisit this to include suggestions based on the feedback so far.

rylev (Feb 25 2021 at 14:52, on Zulip):

So to recap: we have a fundamental conflict of wanting to rollup as much as we can (to get through the queue faster) but to never rollup anything that breaks the build or hurts performance. I think an iterative approach to fixing rollups might be best. Some ideas:

Joshua Nelson (Feb 25 2021 at 14:54, on Zulip):

diffs larger than 10 lines in either compiler or bootstrap

I don't think any heuristic for bootstrap can be accurate - half the time it's fixing typos, and the other half one-line changes have enormous impact: https://github.com/rust-lang/rust/pull/81214

rylev (Feb 25 2021 at 15:01, on Zulip):

@Joshua Nelson that's true though this is only for PRs where an explicit rollup status was not specified. In the case of only changing typos, the hope is that the reviewer explicitly marks it for rollup.

Joshua Nelson (Feb 25 2021 at 15:01, on Zulip):

sure, I guess I just don't think the heuristic will have any benefit over the reviewer's judgement

Joshua Nelson (Feb 25 2021 at 15:02, on Zulip):

usually when I leave out a status, it's because I mean rollup=maybe

nagisa (Feb 25 2021 at 15:48, on Zulip):

In my experience performance changes in libstd are almost universally very small PRs.

rylev (Feb 26 2021 at 09:46, on Zulip):

Ok so you're saying that you don't believe any heuristics will be able to better judge whether a PR should be rollup=maybe vs rollup=iffy?

rylev (Feb 26 2021 at 09:48, on Zulip):

Having better metrics on this would help. We should be tracking how often rollups fail (either fail to build) or have performance regressions of some sort and ideally we could track it back to the PR that caused this and how that PR was labeled (rollup=always, rollup=maybe, etc.). It would be nice to know how often a PR labeled rollup=$label causes a failure.

Joshua Nelson (Feb 26 2021 at 14:35, on Zulip):

rylev said:

Ok so you're saying that you don't believe any heuristics will be able to better judge whether a PR should be rollup=maybe vs rollup=iffy?

specifically for src/bootstrap, yes

rylev (Mar 02 2021 at 15:34, on Zulip):

When rollups fail because of a particular PR is the rollup status of that PR changed? Presumably if a PR causes a rollup build to fail, it should be labeled at least rollup=iffy, no?

oli (Mar 02 2021 at 15:41, on Zulip):

the problem is that we don't know that automatically, so it would need to be done manually. Right now, a comment is left on the PR and it is unassigned

oli (Mar 02 2021 at 15:42, on Zulip):

Anecdotally I would say that these PRs aren't more likely to fail future rollups, but... I guess I could measure that by looking at the history of some rollups

rylev (Mar 02 2021 at 16:17, on Zulip):

I'd like to do an analysis of what rollupm status failed PRs received and how likely they are to fail future PRs.

rylev (Mar 03 2021 at 10:57, on Zulip):

Some interesting patterns I've seen:

rylev (Mar 03 2021 at 11:06, on Zulip):

Some concrete suggestions:

rylev (Mar 03 2021 at 11:07, on Zulip):

We can take start here and then improve from there. In particular, I think it would be useful to better track that a PR has failed bors before. Those PRs should most likely be automatically labeled as at least rollup=iffy.

rylev (Mar 03 2021 at 11:45, on Zulip):

(I've closed the MCP issue)

Last update: May 07 2021 at 06:00UTC