This keeps coming up: https://github.com/rust-secure-code/wg/issues/29
Since this is a second line of defence and you're pwned regardless of whether we have it or not, I'm not sure I'm on board with this being a 2019 goal. There is already so much on our plates, and I'd expect better authenticating downloaded packages in the first place to take priority.
you're pwned regardless of whether we have it or not
Are you referring to having the resulting build contaminated whether there is a sandbox or not?
I was encouraged by seeing crater disable network access
that leads me to believe that crates aren't presently doing things like hitting the network during builds
for a "do no harm" sandbox, the longer you wait, the harder it will be to retrofit restrictions
and yes... it seems people have pretty polarized opinions on this
on the one hand, there's the "RCE? game over man, game over" argument https://users.rust-lang.org/t/how-does-crates-io-differ-from-npm/22658/21
Attacking at build-time allows for a fly-by-night attack that leaves no forensic evidence and potentially permits lateral movement
as such, it's also the superset of the alternative, which is to trojan the target artifact
to me, one of these things is clearly worse than the other
speaking as someone who used to work on a DFIR team for several years... a non-build-script attack leaves forensic evidence not just in the target binary, but in the original source code
that makes finding the payload a matter of examining the source of the original crate
you don't get that guarantee with build scripts
they could grab a malicious payload off the Internet, and do it in such a way that thwarts attempts by researchers to discover it
build-time attacks are much, much more worrisome to me
that leads me to believe that crates aren't presently doing things like hitting the network during builds
I've seen crates that download code using git to build a C dependency.
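the pattern in question looks roughly like this in a build.rs (an illustrative sketch; the URL and paths are made up, and a real script would then drive the C compiler over the checkout):

```rust
// Illustrative sketch of the pattern under discussion: a build.rs that
// fetches C sources over the network before compiling them. The URL and
// destination are hypothetical. This is exactly the kind of access a
// build-time sandbox would deny by default.
use std::process::Command;

/// Build the `git clone` invocation a typical build.rs would run.
fn fetch_sources(url: &str, dest: &str) -> Command {
    let mut cmd = Command::new("git");
    cmd.args(["clone", "--depth=1", url, dest]);
    cmd
}

fn main() {
    let cmd = fetch_sources("https://example.com/vendored-clib.git", "vendor/clib");
    // A real build.rs would call `cmd.status()` and then invoke the C
    // toolchain on the checkout; here we only show what would execute.
    println!("would run: {:?}", cmd);
}
```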
huh... do those pass crater?
I'm going to guess no
No idea. The only example I know of is in crosvm's new vTPM code.
happen to know a crate name?
dtolnay might. I got the impression he got the idea from another crate.
to me the "proper" way to do that is a git submodule in the original repo, and then that can just package the source at the commit the submodule is pinned to into the resulting crate...
The idea is that the source is downloaded and built only when the platform hasn't already installed a static lib.
Agreed that crates should have a more graceful way of using third-party code than downloading in build.rs.
if you happen to know the name of such a crate, I'd be interested in investigating how crater handles those crates
the fact crater is shutting off things like network access to operate kind of speaks to the need for the general feature, IMO...
but maybe I'm just a fan of POLA :stuck_out_tongue_wink:
I just looked through a few pages of crates that depend on git2 and this was the first one I found that used it in build.rs for downloading.
it's listed as "skipped" in crater
Where can I get this crater information?
it doesn't exactly have the world's most usable/browsable UI, heh
hmmm some discussion of external dependency handling here https://internals.rust-lang.org/t/external-dependencies-in-declarative-format/9372
I'm not wading into this thread, but if everyone is on the same page that we should sandbox, I'm happy to review/write whatever's needed by way of a sandboxing library.
I think it'd at least be interesting to build a prototype with something like gaol
having something tangible to talk about at least breaks the endless pontificating cycle
Is depending on containers (at least for right now) out of the question?
I know that could make for a heavy-weight dependency, and it could be a headache for supporting multiple platforms, but until gaol is considered to be production ready, containers may be a good option.
Containers are chroot+cgroups with a bunch of tooling on top. They do not really introduce much in the way of dependencies - they just need a kernel with those facilities, such as Linux, FreeBSD or Solaris/Illumos. This means that MacOS and Windows would be missing out.
see also namespaces+seccomp, which are IMO the more interesting tools for security. as it were, gaol provides a cross-platform abstraction to those and, well... not entirely dissimilar facilities on other operating systems (not entirely similar either, but "best sandbox for the current OS" is probably a reasonable goal)
spitballing here, for tackling the cross-platform issue: what if the build script were compiled to WASM and run with an OS-agnostic ABI?
I saw there was a WASM + CloudABI project as it were... but that seems like a substantially larger change than just sandboxing the build script
what's nice about using something like gaol is crate consumers could opt into giving the build script all of the access they have today
aah yeah this: https://github.com/CraneStation/wasmtime
fun name :wink:
apparently Amazon wrote a tool of this ilk for Firecracker: https://github.com/firecracker-microvm/firecracker/blob/master/docs/jailer.md
Seems similar to minijail which we use in crosvm and in Chrome OS
hi there, considering that most people are importing dependencies without reading their code and that these can contain build scripts or procedural macros that can compromise one's computer, I was thinking we should maybe provide a sandboxed offline compilation process by default in Cargo (SELinux, Hyper-V APIs, jail). What is your opinion on such a thing?
I'm also thinking about RLS that automatically compiles dependencies, which means it's automatically running untrusted code unsandboxed.
There is already a topic for this that you might want to read through and repost your question: "build-time sandboxing"
Yes sorry, I have realized, happy to see it's already being discussed.
So I wanted to insist on the fact that this has to be the default in everyone's development environment, else the risk won't be mitigated
There's also a problem that I see: we couldn't flag crates that perform network access in build scripts or macros on crates.io, because these can always set up a buffer with shellcode and run networked malware there.
I don't think that any build script or procedural macro code should ever be allowed to access the network.
I mean, I'm sure someone will come up with reasons why they need their build script to do so. Off-hand, I'm thinking of cases where you want to ask the build script to automatically update to the upstream version. But by default, no, no build-time code should be able to access the network or arbitrary places on the FS or even arbitrary syscalls. Whatever solution is engineered will need an escape hatch.
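if an escape hatch ever existed, one could imagine consumer-side opt-in in Cargo.toml along these lines (purely hypothetical syntax, nothing like this exists today):

```toml
# Hypothetical, NOT real Cargo syntax: the crate *consumer* explicitly
# grants a dependency's build script a narrow extra capability, with
# everything else denied by default. Key names are made up.
[package.metadata.sandbox.allow]
openssl-sys = { network = ["download.example.org"], read-env = ["OPENSSL_DIR"] }
```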
Is downloading a C library if it's not installed locally still a thing? I imagine it would be pretty big on Windows
It was a thing?!
Yeah, it's pretty convenient for ssl and zlib wrappers that are in the ecosystem.
Oh, you mean for -sys crates.
Probably. submodules would be better, but people don't use/understand them.
What kind of submodules? I hope you don't mean git submodules?
But yeah, this is a thing, and this practice is especially horrifying for something like OpenSSL that gets a new batch of vulnerabilities every 3 months or so, because you get this thing statically linked with no record of which version was compiled or even downloaded. And the worst part is that while on Linux you have an easy way to install these libs, on Windows you don't - or at least, there wasn't anything like that back when I last looked 10 years ago.
Yes, git submodules? Why not?
Opinion/rant: git submodules solves its problem domain in such a confusing way that it poisons any future attempts to improve on its problem domain. Kind of like PGP.
No submodules would be way better than what git ended up with.
Although compared to the rest of user-facing parts of git it doesn't even stand out all that much
Frankly I do not see how git submodules are relevant to crates published on crates.io
I guess the equivalent idea is one could upload sidecar zips to crates.io that could be optionally downloaded if the build script wanted it.
You can't have real networking, but you can have safety-scissors that are impossible to cut yourself with.
This... actually sounds like a pretty great idea.
Quick, jot it down in https://github.com/rust-secure-code/wg/issues/29
Well, I like submodules as implemented in the plumbing (the porcelain for them is terrible). Most importantly, they propagate the core git guarantee -- if you download a specific sha1 you get exactly the same tree.
I don't think cargo downloads crates with git though. I thought it downloaded zips.
Ah, that I did not know. Why not have people package up their C dependencies then?
I mean, first question is how hard is that to do? If it's hard to do, we should fix that first.
(I have never published a crate...)
Depending on the source being packaged, it may be very large.
It's also not always necessary if pkg-config can just give them a lib from the system.
Hence sidecar zips?
Alternatively one could specify URL+hash pairs somewhere.
I guess. Honestly I'm not certain it's a good solution. I was only spit-balling ideas.
It's not a terrible idea. But I'm reading the crates.io docs and they have a strict 10MB limit on .crate files.
URL+hash seems reasonable, assuming the URLs are pinned down to trusted domains
Are you concerned about leaking information about who's downloading?
Yeah. If cargo contacted a domain controlled by an attacker, that would probably violate what people imagine a "sandbox" is good for.
I also want to reserve the option of caching/rewriting/mirroring domains for the purposes of hermetic build systems, like the kind in Chrome OS.
Although I suppose the "hash" part solves the issue well enough.
You get that from the hash, though. Your build system can look up that hash wherever it wants
The URL is just a hint
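a minimal sketch of that resolution logic (names made up): the pinned digest is the key, the URL is only a fallback hint, and a hermetic build system can substitute its own mirror:

```rust
// Content-addressed resolution sketch: the digest pinned in the crate
// is the source of truth; the URL is only a hint for where upstream
// happens to host the artifact today. All names here are hypothetical.
use std::collections::HashMap;

struct ArtifactRef<'a> {
    sha256: &'a str,   // hex digest pinned in the crate metadata
    url_hint: &'a str, // upstream location, untrusted and replaceable
}

/// Prefer a mirror keyed by digest; fall back to the hinted URL.
fn resolve<'a>(artifact: &ArtifactRef<'a>, mirror: &HashMap<&str, &'a str>) -> &'a str {
    mirror.get(artifact.sha256).copied().unwrap_or(artifact.url_hint)
}

fn main() {
    let artifact = ArtifactRef {
        sha256: "3b4f5c2d",
        url_hint: "https://example.com/libfoo-1.0.tar.gz",
    };
    let mut mirror = HashMap::new();
    mirror.insert("3b4f5c2d", "https://mirror.example.org/3b4f5c2d");
    println!("fetching from: {}", resolve(&artifact, &mirror));
}
```

either way the downloaded bytes would still be verified against the digest before use.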
But most people won't have a pre-mirrored system but would still like a sandbox.
Most people won't care about the leak.
Most people don't care about build time sandboxing either.
They want a sandbox to protect their systems, not their privacy.
Who are we building this sandbox for?
Time for a product requirements document!
Also, malware that is being distributed through crates.io can be purged. Malware that is distributed from an attacker-controlled server that cargo is directed to download from is harder to purge.
So I would argue it's not only privacy being preserved.
you also get an audit trail that way, since crates.io is supposedly immutable
Audit trail of... the malware?
The people who downloaded it?
Sorry, getting distracted.
Who is the sandbox for?
The first tranche is obviously the majority -- don't care until they get malware on their box or in their product.
I would think it is for everybody to run by default unless they have a very good reason not to.
We can't do anything about the product side, obviously.
Good reasons might be legacy crates or the crate is from a trusted party (i.e. an internal team wrote it).
(We should also model the attack side, which means we're really building a full threat model.)
I guess we can start with the easy stuff:
The attacker has control over the Cargo.toml/Cargo.lock and can upload arbitrary zips to crates.io under crate names that they own.
The attacker can perform a `cargo build` but cannot do a `cargo test` or …
Control over my Cargo.toml? How?
They gave you a really cool demo on hackernews and explained how you too can make your own ray-traced doom clone with this nice crate.
Sure, so you copy-pasted their Cargo.toml including some weird directives they said were needed to make it work.
Why can the attacker build but not run?
How are we going to sandbox arbitrary rust programs?
Oh, I see. You're saying only build is in scope.
The rust program is probably going to be doing something useful most of the time. We can't hope to sandbox it without everybody disabling it all the time.
Yeah, most build scripts do a common subset of simple things.
I think "download this external archive so I can compile it" is a pretty common thing and easy to provide a safe(r) API for.
Probably what we need to do is make a simple sandbox and crater it.
See what breaks.
Crater already prohibits network access, so there's that
I believe it is pretty common for people to do it (see e.g. https://github.com/danburkert/prost/commit/e0317f83958892d716e99423f07525db5c7469e6#diff-3457fb1ebde739813ad9692cad895f1f) and people don't understand (for so many reasons) why they shouldn't. I see that as a quite distinct issue from sandboxing the build, though.
Note in particular that I ended up embedding like 20MB of executables into that crate in order to avoid the network access.
...so you've embedded OpenSSL and now you need to issue a security update to your crate every time they find yet another CVE in that thing?
I don't think we embedded OpenSSL into prost directly or indirectly, since I think those executables don't do network I/O.
But, in the case of rust-openssl or similar, yes you would.
"Importantly, this eliminates very heavy and brittle non-Rust dependencies, including in particular curl and OpenSSL." says the commit message
Right, because OpenSSL was used to download the files.
Since the downloading was removed, OpenSSL dep was removed.
Anyway, I think that to realistically have a chance at implementing a sandbox that blocks network I/O by default (and if not by default, why bother?) and/or all the time (ideally), one would need to implement a new build stage, separate from build.rs, that can be used for downloading dependencies.
Similarly, people embed executables in build.rs; there would probably need to be some mechanism for whitelisting/approving the execution of such embedded executables, if running them is to be blocked by default.
I don't actually mind embedding executables because proc macros can do literally anything anyway. You already got arbitrary code in, congratulations.
I would assume that people expect proc macros to be safer because they can read the code.
I seem to remember that when the executables in PROST get updated, some doc enumerating their SHA-256(?) hashes and the source of the executable gets updated.
Well, you can always embed a long hex string with a compiled binary in your source code
Let's assume the download of source code is handled safely. The next thing that is typically done is executing gcc/clang/make on it. That seems really hard to sandbox because essentially arbitrary and highly complex binaries are being fed attacker controlled input.
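not a sandbox, but the hygiene end of the same idea: when build code must run the C toolchain over untrusted input, at least invoke it with a scrubbed environment and a fixed working directory, so attacker-influenced env vars and cwd don't widen the attack surface (a sketch with placeholder paths; real isolation still needs namespaces/seccomp or similar underneath):

```rust
// Least-privilege invocation of the C compiler: drop the inherited
// environment, pin a minimal PATH, fix the working directory. This
// limits incidental attack surface but is NOT containment.
use std::process::Command;

fn hermetic_cc(src_dir: &str, out_obj: &str) -> Command {
    let mut cmd = Command::new("cc");
    cmd.env_clear() // drop all inherited environment variables
        .env("PATH", "/usr/bin:/bin") // minimal, known-good PATH
        .current_dir(src_dir) // never run from an attacker-chosen cwd
        .args(["-c", "lib.c", "-o", out_obj]);
    cmd
}

fn main() {
    // Paths are placeholders; we only show the constructed invocation.
    let cmd = hermetic_cc("/path/to/vendored/src", "lib.o");
    println!("would run: {:?}", cmd);
}
```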
/me scrolls up
people who want to hit the network to download some external asset/what have you
that seems like a gross hack
what is the justification as opposed to packaging the thing you'd otherwise download in the crate itself?
it seems like there's a boring KISS solution to this problem and it's "put the thing you'd otherwise hit the network to download in the crate and you're done"
shelling out to git/curl/etc from build.rs seems like a cargo-culted antipattern which is probably best eliminated
Probably the best way to eliminate it is to find the commonality and make it easier to do something better.
1) add a git submodule for the code you'd otherwise use build.rs to go clone
2) package said code into your crate
3) you're done
[img:thereisnospoon] "there is no …" or any need for new cargo features, barring a legitimate need to microoptimize bandwidth in the case crates have, umm, "optional assets"
The code is usually C, so a build.rs is needed to build it.
that sort of thing feels like playing with fire in terms of things like reliable or better yet reproducible builds... or binary transparency efforts (for, say, a hypothetical future community build server)
The size restrictions of 10MB on crates.io means you can't put all the code in.
haha yeah sure I'm not talking about the `cc` crate :wink:
so the counterpoint to concerns about the feasibility of shoving everything into a crate are...
or reproducible builds
I guess my question about 10MB is: what isn't fitting into 10MB at the moment?
some option for optional external assets would be interesting, but I'd also be curious what the real-world use cases are
I guess the next question is "who is the custodian of these assets who is volunteering to indefinitely and reliably host them for free?"
but that said, aside from the 10MB limit the right answer to me is to put the relevant external artifacts directly into the published crate
Usually the external artifacts are optional.
sure, but nobody's forcing you to use them
That is, if your system already has LLVM or openssl or whatever, you can simply use that.
if you download them and don't use them, that's what I was describing as "optimizing bandwidth"
My thinking is that as long as people want to optimize bandwidth, which they seem to do, we need to make it easiest to do it in a safe manner.
which is a concern I would rate lower than "build.rs is shelling out to random tools to attempt to obtain essential build artifacts that may have disappeared from wherever they were originally supposed to be hosted"
yes, but that's a microoptimization...
I can understand that perspective.
if a Rust-friendly CDN provider were to volunteer to be the custodian of the "optional crate asset archive" I could see something like that happening
But on the other hand, it's usually downloading several times the size of the original crate.
I would assume the custodian would have to be crates.io
but anything short of that seems kinda gross to me
well I assume that 10MB limit is there for a reason
but perhaps it could be raised?
It would be nice to find out. Does anybody have contacts on the team?
can look in either the infra IRC or Discord channels I guess? I assume the latter is the current place
but I'm really wondering now how much crazysauce abuse of build.rs to go grab random code from git could be trivially replaced with a git submodule whose contents are published as part of a crate
say, small, infrequently updated libraries
there might be a system version to link to, but if not, use the source
I imagine there might be a surprising amount of that sort of thing
@Tony Arcieri Downloading things in build scripts is what makes Google Chrome's build system horrible and what makes NPM packages unportable. It's literally impossible to port some Electron applications to a new platform even if Electron itself was ported, because most NPM packages blatantly download x86 binaries to compile or do various other tasks. It clearly should be eliminated; if you require third-party stuff, tell the user to install a package from their distribution and add it to the PATH variable, just like the openssl-sys or libcurl-sys crates.
Also for the 10MB limit, can't they create another crate and set it as a dev-dependency?
I feel like creating as many reusable crates as possible is the way to go.
oh, reusability is a good point, I have not considered it
yeah for sure
@Leo Le Bouter I am definitely not a fan of using ad hoc mechanisms for fetching build dependencies exactly for those reasons. Right now, aside from that, Cargo and crates.io are quite good at making sure all build artifacts remain available indefinitely (and are all checksummed in the index)
This also ties into auditing binaries for vulnerable libs: as soon as you get those statically linked C blobs versioned and checksummed, you can have an audit trail and check your binaries for vulnerable dependencies even if those came from C
made a crate for this: https://github.com/rust-secure-code/cargo-sandbox
like cargo-repro it's empty/vaporware
here's an issue to discuss: https://github.com/rust-secure-code/cargo-sandbox/issues/3
Just to be sure I understand it correctly again! Is it actually about building in a sandboxed environment? Or about running the binary in a sandboxed environment?
If it's the first one!: Do we mean that we should be able to download crates in the sandboxed environment? Or that everything should already be there and isolated?
/me thinking the former: crates get downloaded in advance, and then subsequent build-time code execution occurs in a sandbox
gaol specifically defines a set of operations the build process is allowed to perform. they could be tweakable, but denying network access by default, well... that's what crater does already
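as a toy model (this is not gaol's actual API, just an illustration of the default-deny shape): the build script starts with read access to its sources and write access to OUT_DIR, and everything else, network included, has to be explicitly granted:

```rust
// Toy capability-profile model for a build-time sandbox. Names are
// invented for illustration; gaol's real API differs.
use std::collections::HashSet;
use std::path::PathBuf;

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum Operation {
    ReadPath(PathBuf),
    WritePath(PathBuf),
    Network,
    SpawnProcess,
}

struct Profile {
    allowed: HashSet<Operation>,
}

impl Profile {
    /// Default-deny profile: read the crate source, write OUT_DIR,
    /// and nothing else.
    fn default_for(src: &str, out_dir: &str) -> Self {
        let mut allowed = HashSet::new();
        allowed.insert(Operation::ReadPath(PathBuf::from(src)));
        allowed.insert(Operation::WritePath(PathBuf::from(out_dir)));
        Profile { allowed }
    }

    fn permits(&self, op: &Operation) -> bool {
        self.allowed.contains(op)
    }
}

fn main() {
    let profile = Profile::default_for("/crate/src", "/crate/out");
    println!("network allowed? {}", profile.permits(&Operation::Network));
}
```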
Ahh I think I understand now. It's mainly because there can be a build.rs script that can potentially do anything it would like right? I hope to check the mentioned issue/project and contribute some ideas somewhere this week!
build.rs, proc macros, I think there's stuff beyond that
More reasons why builds having access to the network is a bad idea: https://twitter.com/lukejacksonn/status/1131506699356037121
So pm2 (a node process manager package on npm) just caused thousands of CI builds to fail because of an "optionalDependency" on a package called gkt which is requested as a tarball from a server that was returning 503. That package consists of one file which contains this: https://twitter.com/lukejacksonn/status/1131506699356037121/photo/1 - Luke Jackson (@lukejacksonn)
"domain.com/not-a-targeted-backdoor.js". I've seen popular packages in npm using installation telemetry too
hey look, another case-in-point :wink: https://github.com/RustSec/advisory-db/pull/104/files
some fun discussion on this thread https://internals.rust-lang.org/t/pre-rfc-procmacros-implemented-in-wasm/10860/75?u=bascule
Whoops wrong thread
The desire to do build-time sandboxing also came up a few times in the enterprise rust meetup at rustconf. one of the other people there even broke down a list of things build.rs files tend to do:
1. search for or otherwise wrangle external libraries
2. check rustc version and enable/disable features
3. generate some code to be included in the build
4. (there were 4 of these but i don't remember the fourth one, unfortunately).
most of these are amenable to some form of sandboxing, although number 1 is probably the hardest. number 2 could hypothetically be avoided in more cases if there were a way to query the compiler version info via a builtin cfg(...) thing, which i think there's an RFC for that just nobody has implemented.
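for the record, the usual shape of number 2 looks something like this (a sketch; the cfg name and the version cutoff are arbitrary examples):

```rust
// Common build.rs pattern for feature detection: ask the compiler named
// by $RUSTC for its version and emit a cfg flag accordingly. The cfg
// name and the 1.36 cutoff are made-up examples; parsing is separated
// out so the interesting logic needs no process spawn.
use std::process::Command;

/// Extract the minor version from `rustc --version` output,
/// e.g. "rustc 1.39.0 (4560ea788 2019-11-04)" -> Some(39).
fn parse_rustc_minor(version_output: &str) -> Option<u32> {
    let mut words = version_output.split_whitespace();
    if words.next()? != "rustc" {
        return None;
    }
    let mut parts = words.next()?.split('.');
    parts.next()?; // skip the major version
    parts.next()?.parse().ok()
}

fn main() {
    let rustc = std::env::var("RUSTC").unwrap_or_else(|_| "rustc".into());
    if let Ok(out) = Command::new(rustc).arg("--version").output() {
        let text = String::from_utf8_lossy(&out.stdout);
        if parse_rustc_minor(&text).map_or(false, |minor| minor >= 36) {
            // In a real build.rs this println is how the flag reaches rustc.
            println!("cargo:rustc-cfg=has_maybe_uninit");
        }
    }
}
```

a builtin `cfg(...)` for compiler version would make this whole dance unnecessary.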
there's been some interesting discussion of using WASM/WASI for this on rust-internals
which I hadn't seriously considered before but it seems there's a great deal of interest
a new if_rust_version without build.rs showed up on Reddit a couple of days ago
without a build.rs? how?
There's been a crate that does this via a proc-macro attribute for some time: https://crates.io/crates/rustversion
ah right, that makes sense. not really that meaningfully different than build.rs though, although i guess proc macros will probably be easier to sandbox
(OTOH proc macros that depend on the target environment are probably not very well behaved from the standpoint of related goals like being able to download prebuilt artifacts from crates.io)
https://crates.io/crates/if_rust_version this one, doesn't depend on syn either so probably not a proc macro?
oh it comes with build.rs too, just pushes it one level down
this looks interesting: https://github.com/rust-lang/rustwide
wrapper for a Docker sandbox used by crater and docs.rs
https://github.com/dtolnay/watt - this sure is a step in the right direction, with proc macros being walled off in a small, 100% safe code WASM environment
Also helps with reproducible builds
Yeah saw the relevant rust-internals thread. WASM certainly seems like an interesting way to do sandboxing
I still want to play with
We had a long talk about sandboxing the other day at work. I'm really excited for wasms potential there
build.rs + WASI might be interesting
been playing with Rustwide. It seems pretty cool
Nice! I'm thinking of making Clippy security lints into their own category and then running them with rustwide
Docker provides a stable CLI for both Linux and Windows
yeah, but there are also at least two promising crates which wrap up docker-based builds (…)
the latter seems really cool for reproducible builds of released crates
not so sure about the sandboxing use case
I think that it's been a while and something should be written even if it's as simple as a wrapper over the Docker/Podman CLI
Perfect solutions certainly do not exist w.r.t sandboxing
The issues with the command I posted are:
- Does not work with "path" dependencies
- Does not make sources readonly
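both issues could be handled by a wrapper that assembles the docker invocation itself, e.g. (a sketch using standard docker CLI flags; the mount points and image are placeholders):

```rust
// Sketch of a docker-based build sandbox invocation: network off,
// sources mounted read-only, target dir read-write, and each resolved
// path dependency mounted read-only too. Container paths and the image
// tag are placeholders, not a fixed scheme.
use std::process::Command;

fn sandboxed_build(src: &str, target: &str, path_deps: &[&str]) -> Command {
    let mut cmd = Command::new("docker");
    cmd.args(["run", "--rm", "--network=none"]);
    cmd.arg("-v").arg(format!("{src}:/build/src:ro"));
    cmd.arg("-v").arg(format!("{target}:/build/target"));
    for dep in path_deps {
        // path dependencies parsed out of Cargo.toml get their own
        // read-only mounts so the build can still see them
        cmd.arg("-v").arg(format!("{dep}:/build/deps{dep}:ro"));
    }
    cmd.args(["rust:1", "cargo", "build", "--manifest-path", "/build/src/Cargo.toml"]);
    cmd
}

fn main() {
    let cmd = sandboxed_build("/work/mycrate", "/work/mycrate/target", &["/work/dep-a"]);
    println!("would run: {:?}", cmd);
}
```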
rustwide is fairly new
rustwide is old, but that's in terms of it being used to drive crater
the extraction into a crate is relatively recent
I guess the question is: what would yet another docker wrapper do differently?
Aim at the sandboxing use case specifically
...and what does that entail in your opinion?
the tricky things to me are the little bits like caching intermediate build artifacts
which it looks like cargo-wharf does well
cargo-wharf looks overcomplicated, my command line uses the current directory's target and caches that and it works well, what else do you want exactly?
ok, would you like to submit a PR?
I will do so ASAP! I propose to integrate something simple first then change when better solutions or motivated people come up.
cargo-wharf looks better suited for CI use cases
for caching artifacts across workspaces (which cargo doesn't currently provide by default), the cargo documentation says you can use sccache (cargo install sccache)
OK, I get it better now. It builds each and every crate in its own sandbox to avoid them affecting each other's build artifacts.
Either way, a single artifact can get control over the final binary's execution
I have to think about that. A TOML parser/rewriter for passing in path dependencies, a few ro/rw bind mounts, some CLI args to control network access with it off by default, a way to provide a custom Dockerfile, or find a way to base the container on a CoW view of the host's file system?
@Tony Arcieri ^ To answer on what I think it entails
yeah, sounds like the right general direction