Stream: t-cargo/PubGrub

Topic: random bench perf changes


Matthieu Pizenberg (May 23 2021 at 15:26):

Something that's been kind of annoying me is that I'm getting random perf changes between bench runs, without touching the code and with no other software or hardware changes. Here is the latest I got:

Benchmarking large_cases/elm_packages_str_SemanticVersion.ron: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 20.0s. You may wish to increase target time to 22.8s, or reduce sample count to 80.
large_cases/elm_packages_str_SemanticVersion.ron
                        time:   [225.16 ms 226.06 ms 226.93 ms]
                        change: [+3.0095% +4.1701% +5.3402%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
Benchmarking large_cases/zuse_str_SemanticVersion.ron: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 20.0s. You may wish to increase target time to 38.5s, or reduce sample count to 50.
large_cases/zuse_str_SemanticVersion.ron
                        time:   [354.92 ms 357.46 ms 360.05 ms]
                        change: [-5.7625% -5.0405% -4.3488%] (p = 0.00 < 0.05)
                        Performance has improved.
large_cases/large_case_u16_NumberVersion.ron
                        time:   [9.9669 ms 10.014 ms 10.064 ms]
                        change: [+4.8475% +5.8246% +6.8089%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild
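
As the warnings above suggest, Criterion cannot finish its 100 samples for the slow cases within the 20 s target time. A minimal sketch of the two fixes the warning proposes, using Criterion's standard CLI options passed through cargo bench (the exact values here are illustrative):

    cargo bench -- --measurement-time 40   # raise the target time per benchmark (in seconds)
    cargo bench -- --sample-size 50        # or collect fewer samples per benchmark

Either option silences the warning; neither addresses the run-to-run variance this thread is about.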

Matthieu Pizenberg (May 23 2021 at 15:29):

And 2 min later, the following run:

Benchmarking large_cases/elm_packages_str_SemanticVersion.ron: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 20.0s. You may wish to increase target time to 24.1s, or reduce sample count to 80.
large_cases/elm_packages_str_SemanticVersion.ron
                        time:   [226.96 ms 227.74 ms 228.55 ms]
                        change: [+0.2257% +0.7434% +1.2806%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Benchmarking large_cases/zuse_str_SemanticVersion.ron: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 20.0s. You may wish to increase target time to 39.0s, or reduce sample count to 50.
large_cases/zuse_str_SemanticVersion.ron
                        time:   [380.32 ms 381.46 ms 382.64 ms]
                        change: [+5.8767% +6.7136% +7.5496%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
large_cases/large_case_u16_NumberVersion.ron
                        time:   [9.2866 ms 9.2912 ms 9.2961 ms]
                        change: [-7.6816% -7.2217% -6.7840%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
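
Note that the same benchmarks flip between "regressed" and "improved" across the two runs, with swings of roughly 5-7% in both directions, which looks like machine noise rather than a code change. If that noise floor is accepted, Criterion's noise threshold can be raised so swings of this size are reported as "Change within noise threshold" instead; a hedged sketch, where 0.05 (5%) is an illustrative value rather than a recommendation from this thread (Criterion's default is 0.01):

    cargo bench -- --noise-threshold 0.05   # treat changes under 5% as noise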

Matthieu Pizenberg (May 23 2021 at 15:30):

Do you experience similar bench instability?

Eh2406 (May 23 2021 at 16:39):

In short, yes. Criterion is incredibly sensitive to everything. At some point I was able to count the number of minimized documentation tabs I had open just by looking at the perf numbers. That is when I realized that listening to music, browsing the internet, even interacting with an IDE, are all out while running perf tests. And you never know what the OS is doing in the background. So I try hard not to compare numbers that were collected more than ~5 mins apart.

My preferred workflow is with only a terminal open (no other programs, not even minimized): git checkout base then cargo bench ..., then git checkout pr then cargo bench ..., then git checkout base then cargo bench .... If the improvement from base -> pr is about the same as the degradation from pr -> base, then I have some confidence in the test. I don't always live up to that standard.

This all got a lot easier when I got a desktop. I do all my development, IDE, documentation, and music on my laptop. When things are working and I am ready for a perf test, I use ssh to git pull the two branches and run the benchmarks on my desktop, which is otherwise completely idle.
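
A sketch of that A/B/A workflow, relying on Criterion's default behavior of comparing each run against the previously saved baseline (branch names are placeholders, and the cargo bench arguments elided above are left out here too):

    git checkout base && cargo bench    # establish the reference numbers
    git checkout pr   && cargo bench    # base -> pr: the claimed improvement
    git checkout base && cargo bench    # pr -> base: should show a roughly matching regression

The same comparison can be made explicit, independent of run order, with Criterion's --save-baseline <name> and --baseline <name> flags.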

Josh Triplett (May 23 2021 at 18:13):

Another option would be to use an AWS instance that's running nothing else.

Josh Triplett (May 23 2021 at 18:13):

Easy to spin up.

