Stream: wg-secure-code

Topic: preallocation in serde


Stuart Small (Aug 28 2019 at 17:24, on Zulip):

I bring this up because of https://github.com/3Hren/msgpack-rust/issues/151 but I've seen this a couple times now. It seems to be a reoccurring bug in serde crates that have a binary stream with the style of [length][datastream]. I fixed a similar bug in the bson crate but luckily with that one the spec has a small maximum size defined.. I looked at fixing the msgpack one with https://github.com/rust-lang/rust/issues/48043 but that might not be the best path since an attacker can still use the bug to gobble RAM but just won't be able to hit the same DOS panic bug. Does anyone have a better approach to this class of bugs? Should I look to the serde comunity?

Tony Arcieri (Aug 28 2019 at 17:37, on Zulip):

hahahaha

Tony Arcieri (Aug 28 2019 at 17:38, on Zulip):

deep rabbit hole

Tony Arcieri (Aug 28 2019 at 17:38, on Zulip):

some would argue this is a problematic family in formal language theory

Tony Arcieri (Aug 28 2019 at 17:38, on Zulip):

it wasn't even properly classified until a few years ago

Tony Arcieri (Aug 28 2019 at 17:38, on Zulip):

http://spw17.langsec.org/papers/grosch-taming-length-fiels.pdf

Stuart Small (Aug 28 2019 at 17:39, on Zulip):

Thanks for the link on more reading

Tony Arcieri (Aug 28 2019 at 17:39, on Zulip):

Screen-Shot-2019-08-28-at-10.39.13-AM.png

Tony Arcieri (Aug 28 2019 at 17:39, on Zulip):

it's a really good read if you're interested in this particular problem

Tony Arcieri (Aug 28 2019 at 17:40, on Zulip):

lots of language theory, but they do boil it down to some practical advice about how to parse this family of languages

Tony Arcieri (Aug 28 2019 at 17:42, on Zulip):

in general I would say making efficient zero-copy parsers for these sorts of data formats is very much a classical source of vulnerabilities

Tony Arcieri (Aug 28 2019 at 17:42, on Zulip):

and there are tons of sharp edges that are rarely considered

Stuart Small (Aug 28 2019 at 17:43, on Zulip):

Looking at this in the larger context of serde is making a lot of this sharp edges jump to the forefront too.

Tony Arcieri (Aug 28 2019 at 17:44, on Zulip):

yeah, I think in general serde is designed with context-free grammars in mind

Tony Arcieri (Aug 28 2019 at 17:44, on Zulip):

or if not that, "packed structs"

Tony Arcieri (Aug 28 2019 at 17:45, on Zulip):

I've written a serde parser/serializer for a format that's a combination thereof (which I didn't design), although that makes things a little easier: packed struct in front, with some messages having a variable length (length-prefixed) payload at the end

Tony Arcieri (Aug 28 2019 at 17:46, on Zulip):

that avoids nested length fields, which are where things generally start to go awry

Stuart Small (Aug 28 2019 at 17:49, on Zulip):

Makes sense. For most of these I'm looking the format is already defined and I just want to be able to handle safely. I was thinking about peeking ahead to see if the length value in the protocol is longer than what is left in the stream. Last I played with this was in the winter but IIRC serde didn't make that easy. This is all just coming up again because of that thread getting bumped.

Stuart Small (Aug 28 2019 at 17:50, on Zulip):

I think I need to do some reading and catch back on what I was playing with last time I looked at this. It makes me feel better that this is an active research problem.

Alex Gaynor (Aug 29 2019 at 01:09, on Zulip):

This isn't really just a Rust problem -- fuzzing ImageMagick/GraphicsMagick turned up tons of ooms of the shape "allocate however many X the length prefix said!"

RalfJ (Aug 29 2019 at 06:58, on Zulip):

http://spw17.langsec.org/papers/grosch-taming-length-fiels.pdf

thanks for sharing!

RalfJ (Aug 29 2019 at 06:58, on Zulip):

funny that they claim "(1) Length-prefix languages arenot context-free" is a contribution of theirs, AFAIK that as well-known and I saw talks about it many years ago... but it might be just clumsy wording in the abstract

RalfJ (Aug 29 2019 at 06:59, on Zulip):

oh wait, they claim to disprove (1)? Now I am confused.^^

Tony Arcieri (Aug 29 2019 at 07:01, on Zulip):

@RalfJ it's a little more nuanced than that. If you look at the image I pasted above (Fig 3), their novel contribution is that "Calc-Regular" languages exist outside the normal confines of the Chomsky Hierarchy

RalfJ (Aug 29 2019 at 07:02, on Zulip):

that makes sense

RalfJ (Aug 29 2019 at 07:02, on Zulip):

but then what do they mean by "disproving (1)"...

RalfJ (Aug 29 2019 at 07:02, on Zulip):

oh, they mean the "parsing is inefficient" part

RalfJ (Aug 29 2019 at 07:02, on Zulip):

got it :D

Tony Arcieri (Aug 29 2019 at 07:03, on Zulip):

interpreting a "Calc-Regular" language requires a sort of "recursive regular" automaton

Tony Arcieri (Aug 29 2019 at 07:06, on Zulip):

which first parses and comprehends the length prefix and then uses that as its Kleene star when consuming the message, rinse repeat. but in addition to that, those sort of recursive sub-machines allow you to recursively descend into nested length-prefix data, providing properties closer to a context-free language, but without the need for a pushdown automaton, i.e. while it's popular to use a stack machine for comprehending nested length prefix data, the paper shows it isn't strictly necessary

RalfJ (Aug 29 2019 at 07:07, on Zulip):

cool

Tony Arcieri (Aug 29 2019 at 07:08, on Zulip):

Calc-regular languages are almost as easy to parse as regular languages, using finite state machines with additional accumulators.

Shnatsel (Aug 29 2019 at 17:50, on Zulip):

AFAIK this is basically unsolved as of right now. Every single image or audio decoder has also bumped into this, see e.g. https://libpng.sourceforge.io/decompression_bombs.html
So far the approach in C libraries is "put checks everywhere and hope you got them all and nothing ever overflows"
I've found a number of such issues in Lewton, the Vorbis decoder in Rust, and the author came up with an interesting idea: make the memory allocator keep track of memory used by the crate and panic or abort or return error if someone tries to exceed the limit. This way deserialization of crafted input fails safely and in an isolated manner, without impacting the rest of the application. Here's their implementation of that approach, but sadly it's not complete enough to be actually used in lewton: https://github.com/est31/balloc
More info on the issue and their approach: https://github.com/RustAudio/lewton/issues/34

Tony Arcieri (Aug 29 2019 at 17:53, on Zulip):

there's a simple solution. it's just not always applicable.

Tony Arcieri (Aug 29 2019 at 17:54, on Zulip):

make the maximum field length a knob in the parser

Shnatsel (Aug 29 2019 at 17:55, on Zulip):

Well, that's what they're trying to do, but actually implementing that is a game of whack-a-mole.

Tony Arcieri (Aug 29 2019 at 17:56, on Zulip):

https://github.com/clasp-lang/veriform/blob/master/rust/src/parser.rs#L18

Tony Arcieri (Aug 29 2019 at 17:56, on Zulip):

heh

Tony Arcieri (Aug 29 2019 at 17:56, on Zulip):

I'd like to rewrite that into a zero copy pull parser based on iterators

Tony Arcieri (Aug 29 2019 at 17:56, on Zulip):

using the techniques in the Calc-Regular paper

Shnatsel (Aug 29 2019 at 17:57, on Zulip):

libpng has a knob, a bunch of Rust imaging crates got a knob after I bugged the maintainers about it, but it's still not very reliable because you need to get all the math right, never overflow, and not forget a single place with a single check. That's like avoiding buffer overflows in C all over again.

Shnatsel (Aug 29 2019 at 17:58, on Zulip):

https://github.com/est31/balloc is supposed to handle that for you transparently and reliably... or at least that's the idea. I don't think it's mature enough for production use.

Tony Arcieri (Aug 29 2019 at 18:02, on Zulip):

@Shnatsel my parser neatly sidesteps all that. unfortunately the solution means it isn't applicable to those sorts of problems (max overall message size cap which defaults to 1kB :stuck_out_tongue_wink:)

Tony Arcieri (Aug 29 2019 at 18:02, on Zulip):

I could probably turn it up just a wee bit, heh

Shnatsel (Aug 29 2019 at 18:03, on Zulip):

1kb compressed or decompressed?

Tony Arcieri (Aug 29 2019 at 18:06, on Zulip):

there is no compression :stuck_out_tongue_wink:

Tony Arcieri (Aug 29 2019 at 18:06, on Zulip):

it's for credentials

Last update: Nov 11 2019 at 22:40UTC