## Stream: general

### Topic: Proptest generate sorted vec

Jake Goulding (Mar 20 2020 at 23:04, on Zulip):

@centril do you have any suggestions on how I could generate an arbitrary `Vec<Option<T>>` where all of the `Some` values are sorted?

Said another way, a sorted `Vec` with `None`s scattered throughout.

Jake Goulding (Mar 20 2020 at 23:06, on Zulip):

My current path is an arbitrary vec that I've sorted, then I map over that with `prop_perturb`, convert it into an iterator, then generate a random number of `None`s for each step. This works-ish, but doesn't have great shrinking.

centril (Mar 20 2020 at 23:06, on Zulip):

@Jake Goulding does the vector need to be sorted across `None`s?

centril (Mar 20 2020 at 23:07, on Zulip):

Or is `[S(1), None, S(0)]` valid?

Jake Goulding (Mar 20 2020 at 23:07, on Zulip):

I'm not 100% I know what you mean. This would be a valid result: `vec![None, Some(1), None, None, None, Some(2)]`

Jake Goulding (Mar 20 2020 at 23:08, on Zulip):

(modulo ascending/descending — that should be trivial :wink:)

centril (Mar 20 2020 at 23:08, on Zulip):

another example then: `[S(0), N, S(1), N, S(0)]`

centril (Mar 20 2020 at 23:09, on Zulip):

in other words, is `None` a boundary for sorting?

Jake Goulding (Mar 20 2020 at 23:09, on Zulip):

Ah, gotcha. No, I do not want that.

Jake Goulding (Mar 20 2020 at 23:10, on Zulip):

Another tack would be if there was a way I could sort the `Vec<Option<T>>` leaving all of the `None`s in place.

centril (Mar 20 2020 at 23:10, on Zulip):

that's what I would do

Jake Goulding (Mar 20 2020 at 23:11, on Zulip):

... but I don't actually know how to do that ;-)

Jake Goulding (Mar 20 2020 at 23:11, on Zulip):

Like, I know that's not proptest, but my brain isn't braining a way to do that in Rust

centril (Mar 20 2020 at 23:11, on Zulip):

@Jake Goulding what happens if you use a stable sort and deem `None` equal to anything?

Jake Goulding (Mar 20 2020 at 23:12, on Zulip):

I'll whip up a playground and see.

Jake Goulding (Mar 20 2020 at 23:15, on Zulip):

Looks like that treats the `None` as boundaries. https://play.integer32.com/?version=stable&mode=debug&edition=2018&gist=3f3356592813dea501c31110446169b4

centril (Mar 20 2020 at 23:15, on Zulip):
centril (Mar 20 2020 at 23:16, on Zulip):

@Jake Goulding what if you change your requirements :upside_down: :D

Jake Goulding (Mar 20 2020 at 23:17, on Zulip):

So, I said `Option`, but it's actually `Poll` — this is for mimicking a stream that returns sorted data but that might not be ready for a number of calls.

Jake Goulding (Mar 20 2020 at 23:18, on Zulip):

So solving that is my goal and real requirements ;-)

centril (Mar 20 2020 at 23:18, on Zulip):

isomorphic, so same thing :P

Jake Goulding (Mar 20 2020 at 23:18, on Zulip):

exactly, and I went with `Option` because it's more familiar

centril (Mar 20 2020 at 23:22, on Zulip):

@Jake Goulding here's a different idea that might be worse, who knows: Generate a `(elems: Vec<T>, insertions: Vec<(where: usize, number: usize)>)`, sort `elems`, throw away anything in `insertions` which would be out of bounds in `elems`, and then insert `number` of `None`s at `elems[where]`

centril (Mar 20 2020 at 23:22, on Zulip):

the throwing away avoids `prop_flat_map`

Jake Goulding (Mar 20 2020 at 23:28, on Zulip):

I'm not actually using `prop_flat_map` — just `Iterator::flat_map`:

```    fn arb_sorted_data<T: Arbitrary + Ord>() -> impl Strategy<Value = Vec<T>> {
any::<Vec<T>>().prop_map(|mut d| {
d.sort();
d
})
}

/// Adds a small arbitrary number of `Poll::Pending` values before
/// and after each existing sorted value.
fn arb_proto_stream<T: Arbitrary + Ord + Clone>() -> impl Strategy<Value = ProtoStream<T>> {
let max_pending_in_a_row = 10;

arb_sorted_data().prop_perturb(move |values, mut rng| {
let original = values.clone();

let mut poll_values: Vec<_> = values.into_iter().flat_map(|v| {
let n_pending = rng.gen_range(0, max_pending_in_a_row);
let mut results = vec![Poll::Pending; n_pending];
results
}).collect();

let n_pending = rng.gen_range(0, max_pending_in_a_row);
poll_values.extend(vec![Poll::Pending; n_pending]);

ProtoStream { original, poll_values }
})
}
```
centril (Mar 20 2020 at 23:30, on Zulip):

@Jake Goulding yeah sure; my mention of flat mapping is unrelated to your current thing

centril (Mar 20 2020 at 23:36, on Zulip):

centril (Mar 20 2020 at 23:39, on Zulip):

basically, selection sort seems to be perfect for this since it will keep scanning for things ahead that might be smaller, as opposed to fixating on two things

Jake Goulding (Mar 20 2020 at 23:39, on Zulip):

Thanks. I'll give a look / try after dinner. :heart:

centril (Mar 20 2020 at 23:39, on Zulip):

Enjoy :food: :slight_smile:

centril (Mar 20 2020 at 23:44, on Zulip):

N.B. selection sort is O(n^2), but works well on smaller inputs

centril (Mar 20 2020 at 23:45, on Zulip):

so it should work well for `any::<Vec<T>>()`, but if you start jacking up the size into 100'000s or so or millions things will go slower :P

centril (Mar 20 2020 at 23:49, on Zulip):

though at that point I suspect the selection sort is the least of your problems :joy:

Jake Goulding (Mar 21 2020 at 12:14, on Zulip):

I’m wondering how much that differs from the concepts in `sort_by_cached_key` https://doc.rust-lang.org/src/alloc/slice.rs.html#331-375

Jake Goulding (Mar 21 2020 at 17:47, on Zulip):

And folk over in Stack Overflow chat suggest something like

```    fn sort_present_values<T: Ord>(d: &mut Vec<Option<T>>) {
let mut values = vec![];
let mut indices = vec![];

for (i, v) in d.iter_mut() .enumerate() {
if let Some(v) = v.take() {
values.push(v);
indices.push(i);
}
}

values.sort();

for (v, i) in values.into_iter().zip(indices) {
d[i] = Some(v);
}
}
```
centril (Mar 22 2020 at 05:30, on Zulip):

@Jake Goulding seems like a fine way to do it; at least in terms of complexity you get `O(n log n)`, but on the other hand, there's more allocating back and forth

Jake Goulding (Mar 23 2020 at 16:50, on Zulip):

And of course now the requirements have changed. Now all the `Some` values need to be unique. Sigh.

Jake Goulding (Mar 23 2020 at 17:04, on Zulip):

I wonder if it would be simpler to create a `Vec<T>` of length N and a `Vec<usize>` of length N+1 and then zip them, mapping the first to `Some` and the second to `usize` `None`s.

Jake Goulding (Mar 23 2020 at 17:04, on Zulip):

Then we could use a `BTreeSet` instead of the first vec and get the sorted / unique for free

Jake Goulding (Mar 23 2020 at 17:05, on Zulip):

It's closer to my original idea, but avoids the direct RNG usage

Last update: Nov 25 2020 at 03:15UTC