Stream: t-compiler/cargo-bisect-rustc

Topic: Use github api


pnkfelix (Feb 26 2020 at 18:00, on Zulip):

I have a prototype branch of cargo-bisect-rustc that uses Github API directly. It is pretty cool.

pnkfelix (Feb 26 2020 at 18:03, on Zulip):

I originally tried to build it atop the hubcaps crate, but I eventually gave up on that (due to the current release of hubcaps not having support for requesting lists of commits) and made an ad-hoc solution atop reqwest. Luckily we really don't need to use a very large API surface at all.

Chris Simpkins (Feb 26 2020 at 18:40, on Zulip):

Excellent! I was just taking a look at the API last night to do the same thing. It looks like much of the git module can go away, including the clone of the rust repository.

Chris Simpkins (Feb 26 2020 at 18:41, on Zulip):

I submitted a PR last night that updates reqwest to the current release. The GET request API changed in the current release. Async is default now so imports will change to the blocking library.

Chris Simpkins (Feb 26 2020 at 18:56, on Zulip):

This part of the API: https://developer.github.com/v3/repos/commits/#list-pull-requests-associated-with-commit

looks like it might also support automation of the regression commit SHA -> PR in the final report

Chris Simpkins (Feb 26 2020 at 19:01, on Zulip):

@pnkfelix If you don't mind pushing a branch, I'd be interested to see how you are approaching it

pnkfelix (Feb 26 2020 at 20:27, on Zulip):

@Chris Simpkins : https://github.com/pnkfelix/cargo-bisect-rustc/tree/use-github-api

Chris Simpkins (Feb 26 2020 at 20:28, on Zulip):

pnkfelix said:

Chris Simpkins : https://github.com/pnkfelix/cargo-bisect-rustc/tree/use-github-api

Tyvm! I will take a look tonight.

pnkfelix (Feb 26 2020 at 20:28, on Zulip):

I'm going to do a bunch of cleanup of that commit series before I make any PR for it

pnkfelix (Feb 26 2020 at 20:28, on Zulip):

but you can at least see what I settled on in my final approach

pnkfelix (Feb 27 2020 at 12:06, on Zulip):

Chris Simpkins said:

This part of the API: https://developer.github.com/v3/repos/commits/#list-pull-requests-associated-with-commit

looks like it might also support automation of the regression commit SHA -> PR in the final report

interesting. In practice I don't think we need this, because we only bisect down to the bors commits, which have the PR's in their commit message. (I.e. that mapping isn't happening, I think, in cargo-bisect-rustc; its just an artifact of how bors formats its merge commit messages.)

simulacrum (Feb 27 2020 at 12:26, on Zulip):

@pnkfelix one thing about using github API -- it might make sense for us to stand up a proxy or something to it that authenticates the specific requests we need. the rate limits without a token are pretty painful...

simulacrum (Feb 27 2020 at 12:27, on Zulip):

e.g. we can add an endpoint to triagebot that'll return exactly what bisect needs, but without requiring auth on the bisect end

Pietro Albini (Feb 27 2020 at 12:27, on Zulip):

couldn't that be used to dos triagebot?

pnkfelix (Feb 27 2020 at 12:27, on Zulip):

eh, I figured if one hits the rate limit the code in bisect could just fall back on the old strategy?

simulacrum (Feb 27 2020 at 12:27, on Zulip):

hm, how many requests do we make?

Pietro Albini (Feb 27 2020 at 12:27, on Zulip):

the unauthenticated rate limit is basically unusable, 60 req/hour from an ip

pnkfelix (Feb 27 2020 at 12:27, on Zulip):

(I haven't implemented that strategy yet, but it was my plan)

pnkfelix (Feb 27 2020 at 12:28, on Zulip):

((by "old strategy", I mean making a local checkout of the repo.))

Pietro Albini (Feb 27 2020 at 12:28, on Zulip):

so if you're under a corp nat or on ci it's practically 0 req/forever

simulacrum (Feb 27 2020 at 12:28, on Zulip):

triagebot will likely soon have access to 12,500 requests/hour, roughly, and presumably we can even go further than that (i.e., cache the results -- we don't have that many PRs)

simulacrum (Feb 27 2020 at 12:30, on Zulip):

I would be very hesitant on relying on the non-authenticated API though to be clear, even if we have fallback

simulacrum (Feb 27 2020 at 12:30, on Zulip):

I'd guess we'd hit that fallback a ton and it's pretty expensive fallback

Pietro Albini (Feb 27 2020 at 12:31, on Zulip):

hmm, what we could do is store a plaintext mapping of commit => pr somewhere

simulacrum (Feb 27 2020 at 12:32, on Zulip):

well sure that's basically the same as triagebot caching it in the db, imo

simulacrum (Feb 27 2020 at 12:32, on Zulip):

(well, worse, to be honest)

Pietro Albini (Feb 27 2020 at 12:33, on Zulip):

isn't that out of scope for triagebot though?

Pietro Albini (Feb 27 2020 at 12:33, on Zulip):

it seems to be like it's drifting to be more of the "everything the team needs bot"

simulacrum (Feb 27 2020 at 12:33, on Zulip):

I consider triagebot our single solution to triage (including things like this). But of course we can stand up a separate service too...

Pietro Albini (Feb 27 2020 at 12:34, on Zulip):

which is not... bad, and I sort of like that, but we would need to at least rename it :)

simulacrum (Feb 27 2020 at 12:34, on Zulip):

well, I mean, at github it's "rustbot" already :)

simulacrum (Feb 27 2020 at 12:34, on Zulip):

it'll probably be a github app soon which'll also mean it's in the bot namespace of usernames

Chris Simpkins (Feb 27 2020 at 17:58, on Zulip):

Thoughts about implementing @pnkfelix approach until the triagebot support is implemented? Even with the GH API rate limit issue, this will be a significant improvement for those of us who are using the tool for one off bisect diagnostics on ICE reports. These are low volume API hits and would stay well within the 60/hr range in my case/IP. And the only "downside" for users who hit the rate limit is that they fall back to the current approach to bare clone the Rust git repository that is the current default in the tool

simulacrum (Feb 27 2020 at 20:14, on Zulip):

I think it's fine to merge it in, but I'd want to contemplate it in a release at least a bit

simulacrum (Feb 27 2020 at 20:15, on Zulip):

(maybe gather some feedback from internal dogfooding)

pnkfelix (Feb 27 2020 at 20:28, on Zulip):

My current implementation puts github support up under a command line flag that you use to choose which strategy you want to use. (And there's no fallback from github to local git yet). Would people prefer the local git checkout strategy to remain the default?

simulacrum (Feb 27 2020 at 20:29, on Zulip):

I think we should have the github be the default and no auto fallback, personally

simulacrum (Feb 27 2020 at 20:30, on Zulip):

(we should tell people: "you ran up against rate limits, please pass --clone-repo or wait until $DATE"

Chris Simpkins (Feb 27 2020 at 20:33, on Zulip):

I think we should have the github be the default and no auto fallback, personally

Is it worth considering GH API authentication support in that case?

simulacrum (Feb 27 2020 at 20:34, on Zulip):

I wouldn't, I think

simulacrum (Feb 27 2020 at 20:34, on Zulip):

I guess we could in theory

simulacrum (Feb 27 2020 at 20:34, on Zulip):

but it's a bit not great to have that in e.g. an environment variable (for security reasons)

pnkfelix (Feb 27 2020 at 21:03, on Zulip):

oh, is it not standard practice to access the GITHUB_TOKEN via an env var?

Pietro Albini (Feb 27 2020 at 21:03, on Zulip):

I have a command that pulls it out of my password manager when needed

Pietro Albini (Feb 27 2020 at 21:03, on Zulip):

to avoid every single thing you run on your computer to have access to it

pnkfelix (Feb 27 2020 at 21:04, on Zulip):

I see. I guess I'm too trusting.

pnkfelix (Feb 27 2020 at 21:04, on Zulip):

@Pietro Albini what is the interface for your thing like, from the viewpoint of an app that wants access to the github token?

pnkfelix (Feb 27 2020 at 21:04, on Zulip):

does it take it on the command line, or via stdin, or ... ?

Pietro Albini (Feb 27 2020 at 21:04, on Zulip):

still the env var

Pietro Albini (Feb 27 2020 at 21:05, on Zulip):

I basically do eval $(pass vars/github-token)

pnkfelix (Feb 27 2020 at 21:05, on Zulip):

oh. So you just decide whether or not to set the env var on the case by case basis

Pietro Albini (Feb 27 2020 at 21:05, on Zulip):

(with a nicer alias on top of that)

pnkfelix (Feb 27 2020 at 21:05, on Zulip):

So it probably is okay for me to set this up so that it queries the GITHUB_TOKEN env var?

Pietro Albini (Feb 27 2020 at 21:05, on Zulip):

yep, on most of the shells it's not set, and when I need it I have to explicitly want that and press the physical button on the yubikey

pnkfelix (Feb 27 2020 at 21:06, on Zulip):

(the other option I was considering was to make that env var access condtional on the user passing in some command line flag.)

Pietro Albini (Feb 27 2020 at 21:06, on Zulip):

well if the env var is there I don't see why not use it

Pietro Albini (Feb 27 2020 at 21:07, on Zulip):

my interpretation of mark's comment is that people shouldn't just stick it in their bashrc

pnkfelix (Feb 27 2020 at 21:07, on Zulip):

simulacrum said:

I think we should have the github be the default and no auto fallback, personally

Well maybe for the initial landing I'd make github be opt-in, with the intention of switching the default after everyone's had a chance to stress it a little...

pnkfelix (Feb 27 2020 at 21:07, on Zulip):

Pietro Albini said:

my interpretation of mark's comment is that people shouldn't just stick it in their bashrc

okay now I see. I think.

pnkfelix (Feb 27 2020 at 21:08, on Zulip):

boy and here I was thinking I was a good citizen for making sure it (my setting of the env var) wasn't part of the dotfiles that I have under version control in a public server.

simulacrum (Feb 27 2020 at 21:16, on Zulip):

heh :)

simulacrum (Feb 27 2020 at 21:16, on Zulip):

yeah I basically don't want to encourage a bashrc type configuration of it

simulacrum (Feb 27 2020 at 21:16, on Zulip):

but reading it opportunistically seems good

simulacrum (Feb 27 2020 at 21:17, on Zulip):

maybe warning if it's not set and in github api mode?

Chris Simpkins (Feb 29 2020 at 13:09, on Zulip):

pnkfelix said:

Chris Simpkins said:

This part of the API: https://developer.github.com/v3/repos/commits/#list-pull-requests-associated-with-commit

looks like it might also support automation of the regression commit SHA -> PR in the final report

interesting. In practice I don't think we need this, because we only bisect down to the bors commits, which have the PR's in their commit message. (I.e. that mapping isn't happening, I think, in cargo-bisect-rustc; its just an artifact of how bors formats its merge commit messages.)

This may be dated information, but it looks like Niko wanted to see this format in the final report that is generated for IR threads by bisect-rustc:

* "blah blah blah", PR #123 by @so-and-so

I don't believe that we have support to dive into the commits in a rollup PR yet so we may not be ready to automate the individual PR link and PR author notification anyways. Could we improve the final bisect-rustc report by adding the regressed commit message? This would get us to a report that includes the SHA range tested, regression commit SHA, and the commit message. A human still needs to be involved to review the PR's identified in the rollups but the information is all in the final bisect-rustc report pushed to the tracker. With your recent changes to support --regress we should be able to automate the stderr stacktrace message in the regressed commit build too? That gets us much closer to a direct copy/paste report

pnkfelix (Mar 02 2020 at 21:07, on Zulip):

okay I posted the rebased PR: cargo-bisect-rustc#63

pnkfelix (Mar 02 2020 at 21:08, on Zulip):

unfortunately I have not been able to test this rebased one as thoroughly as my original version. I'm pretty sure it all works, but I kind of had to make some semi-guesses at how to handle the switch from reqwest 0.9 to reqwest 0.10

pnkfelix (Mar 02 2020 at 21:08, on Zulip):

(I have tested it, and I even have come up with a more general testing strategy. I'll post in that topic about that.)

Chris Simpkins (Mar 11 2020 at 13:37, on Zulip):

I came across this issue when I used the new Github API query in bisect-rustc for an ICE that fell outside of the 167 day CI commit window: https://github.com/rust-lang/cargo-bisect-rustc/issues/69

Chris Simpkins (Mar 11 2020 at 13:41, on Zulip):

IIRC, when the Rust repository is cloned the tool reports the git commit SHA's that are included in the nightly? Thoughts about the more informative/useful approach?

pnkfelix (Mar 11 2020 at 14:59, on Zulip):

seems reasonable enough.

pnkfelix (Mar 11 2020 at 15:00, on Zulip):

the main issue I can imagine (and this may be what stopped me from doing this myself) is how to handle presentation: if you emit the SHA before the description, then that is yet more text that appears before the description, causing it to flow into the next line (or off the window if one is not line wrapping the output)

pnkfelix (Mar 11 2020 at 15:01, on Zulip):

but if you emit the SHA after the description, then the SHA's don't line up in the output. :laughing:

Chris Simpkins (Mar 11 2020 at 15:04, on Zulip):

We may be able to shorten the SHA? I think that Github still recognizes it in issue threads when the SHA is ~6-7 chars?

Chris Simpkins (Mar 11 2020 at 15:05, on Zulip):

and auto links / generates hover popup of the commit IIRC

pnkfelix (Mar 11 2020 at 15:19, on Zulip):

okay. and we could also get rid of the current commit[{}] part of the output, replace that with the SHA

pnkfelix (Mar 11 2020 at 15:19, on Zulip):

or maybe write it as {SHA}[{I}] instead of commit[{I}]

Chris Simpkins (Mar 11 2020 at 15:40, on Zulip):

checking the length of sha comment now to confirm that this actually works

Chris Simpkins (Mar 11 2020 at 15:47, on Zulip):

It looks like GH formats down to 7 characters. When I drop to 6, it does not auto link. When I increase to 8 it shows 7 of them.

2020-03-11_11-46-00.png

Chris Simpkins (Mar 11 2020 at 15:49, on Zulip):

this is without the full URL string, only the initial SHA hex characters

simulacrum (Mar 11 2020 at 15:55, on Zulip):

GitHub will autocondense long SHAs in markdown, please use at least 8 characters or so to avoid ambiguity :)

Chris Simpkins (Mar 11 2020 at 15:59, on Zulip):

Excellent point!

Last update: Apr 03 2020 at 18:30UTC