Stream: general

Topic: Proptest generate sorted vec


Jake Goulding (Mar 20 2020 at 23:04, on Zulip):

@centril do you have any suggestions on how I could generate an arbitrary Vec<Option<T>> where all of the Some values are sorted?

Said another way, a sorted Vec with Nones scattered throughout.

Jake Goulding (Mar 20 2020 at 23:06, on Zulip):

My current path is an arbitrary vec that I've sorted, then I map over that with prop_perturb, convert it into an iterator, then generate a random number of Nones for each step. This works-ish, but doesn't have great shrinking.

centril (Mar 20 2020 at 23:06, on Zulip):

@Jake Goulding does the vector need to be sorted across Nones?

centril (Mar 20 2020 at 23:07, on Zulip):

Or is [S(1), None, S(0)] valid?

Jake Goulding (Mar 20 2020 at 23:07, on Zulip):

I'm not 100% I know what you mean. This would be a valid result: vec![None, Some(1), None, None, None, Some(2)]

Jake Goulding (Mar 20 2020 at 23:08, on Zulip):

(modulo ascending/descending — that should be trivial :wink:)

centril (Mar 20 2020 at 23:08, on Zulip):

another example then: [S(0), N, S(1), N, S(0)]

centril (Mar 20 2020 at 23:09, on Zulip):

in other words, is None a boundary for sorting?

Jake Goulding (Mar 20 2020 at 23:09, on Zulip):

Ah, gotcha. No, I do not want that.

Jake Goulding (Mar 20 2020 at 23:10, on Zulip):

Another tack would be if there was a way I could sort the Vec<Option<T>> leaving all of the Nones in place.

centril (Mar 20 2020 at 23:10, on Zulip):

that's what I would do

Jake Goulding (Mar 20 2020 at 23:11, on Zulip):

... but I don't actually know how to do that ;-)

Jake Goulding (Mar 20 2020 at 23:11, on Zulip):

Like, I know that's not proptest, but my brain isn't braining a way to do that in Rust

centril (Mar 20 2020 at 23:11, on Zulip):

@Jake Goulding what happens if you use a stable sort and deem None equal to anything?

Jake Goulding (Mar 20 2020 at 23:12, on Zulip):

I'll whip up a playground and see.

Jake Goulding (Mar 20 2020 at 23:15, on Zulip):

Looks like that treats the None as boundaries. https://play.integer32.com/?version=stable&mode=debug&edition=2018&gist=3f3356592813dea501c31110446169b4

centril (Mar 20 2020 at 23:15, on Zulip):

yea, https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=23be3c5dc4c67795852866d122904872 is my version

centril (Mar 20 2020 at 23:16, on Zulip):

@Jake Goulding what if you change your requirements :upside_down: :D

Jake Goulding (Mar 20 2020 at 23:17, on Zulip):

So, I said Option, but it's actually Poll — this is for mimicking a stream that returns sorted data but that might not be ready for a number of calls.

Jake Goulding (Mar 20 2020 at 23:18, on Zulip):

So solving that is my goal and real requirements ;-)

centril (Mar 20 2020 at 23:18, on Zulip):

isomorphic, so same thing :P

Jake Goulding (Mar 20 2020 at 23:18, on Zulip):

exactly, and I went with Option because it's more familiar

centril (Mar 20 2020 at 23:22, on Zulip):

@Jake Goulding here's a different idea that might be worse, who knows: Generate a (elems: Vec<T>, insertions: Vec<(where: usize, number: usize)>), sort elems, throw away anything in insertions which would be out of bounds in elems, and then insert number of Nones at elems[where]

centril (Mar 20 2020 at 23:22, on Zulip):

the throwing away avoids prop_flat_map

Jake Goulding (Mar 20 2020 at 23:28, on Zulip):

I'm not actually using prop_flat_map — just Iterator::flat_map:

    fn arb_sorted_data<T: Arbitrary + Ord>() -> impl Strategy<Value = Vec<T>> {
        any::<Vec<T>>().prop_map(|mut d| {
            d.sort();
            d
        })
    }

    /// Adds a small arbitrary number of `Poll::Pending` values before
    /// and after each existing sorted value.
    fn arb_proto_stream<T: Arbitrary + Ord + Clone>() -> impl Strategy<Value = ProtoStream<T>> {
        let max_pending_in_a_row = 10;

        arb_sorted_data().prop_perturb(move |values, mut rng| {
            let original = values.clone();

            let mut poll_values: Vec<_> = values.into_iter().flat_map(|v| {
                let n_pending = rng.gen_range(0, max_pending_in_a_row);
                let mut results = vec![Poll::Pending; n_pending];
                results.push(Poll::Ready(v));
                results
            }).collect();

            let n_pending = rng.gen_range(0, max_pending_in_a_row);
            poll_values.extend(vec![Poll::Pending; n_pending]);

            ProtoStream { original, poll_values }
        })
    }
centril (Mar 20 2020 at 23:30, on Zulip):

@Jake Goulding yeah sure; my mention of flat mapping is unrelated to your current thing

centril (Mar 20 2020 at 23:36, on Zulip):

@Jake Goulding how about this: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=ff7bfe6a1adf92c82202a4603f67fa60

centril (Mar 20 2020 at 23:39, on Zulip):

basically, selection sort seems to be perfect for this since it will keep scanning for things ahead that might be smaller, as opposed to fixating on two things

Jake Goulding (Mar 20 2020 at 23:39, on Zulip):

Thanks. I'll give a look / try after dinner. :heart:

centril (Mar 20 2020 at 23:39, on Zulip):

Enjoy :food: :slight_smile:

centril (Mar 20 2020 at 23:44, on Zulip):

N.B. selection sort is O(n^2), but works well on smaller inputs

centril (Mar 20 2020 at 23:45, on Zulip):

so it should work well for any::<Vec<T>>(), but if you start jacking up the size into 100'000s or so or millions things will go slower :P

centril (Mar 20 2020 at 23:49, on Zulip):

though at that point I suspect the selection sort is the least of your problems :joy:

Jake Goulding (Mar 21 2020 at 12:14, on Zulip):

I’m wondering how much that differs from the concepts in sort_by_cached_key https://doc.rust-lang.org/src/alloc/slice.rs.html#331-375

Jake Goulding (Mar 21 2020 at 17:47, on Zulip):

And folk over in Stack Overflow chat suggest something like

    fn sort_present_values<T: Ord>(d: &mut Vec<Option<T>>) {
        let mut values = vec![];
        let mut indices = vec![];

        for (i, v) in d.iter_mut() .enumerate() {
            if let Some(v) = v.take() {
                values.push(v);
                indices.push(i);
            }
        }

        values.sort();

        for (v, i) in values.into_iter().zip(indices) {
            d[i] = Some(v);
        }
    }
centril (Mar 22 2020 at 05:30, on Zulip):

@Jake Goulding seems like a fine way to do it; at least in terms of complexity you get O(n log n), but on the other hand, there's more allocating back and forth

Jake Goulding (Mar 23 2020 at 16:50, on Zulip):

And of course now the requirements have changed. Now all the Some values need to be unique. Sigh.

Jake Goulding (Mar 23 2020 at 17:04, on Zulip):

I wonder if it would be simpler to create a Vec<T> of length N and a Vec<usize> of length N+1 and then zip them, mapping the first to Some and the second to usize Nones.

Jake Goulding (Mar 23 2020 at 17:04, on Zulip):

Then we could use a BTreeSet instead of the first vec and get the sorted / unique for free

Jake Goulding (Mar 23 2020 at 17:05, on Zulip):

It's closer to my original idea, but avoids the direct RNG usage

Last update: Nov 25 2020 at 03:15UTC