Level 3: Calculating the Probability of Improvement – VWO

In this article, you’ll learn the following:

Overview
What is an Improvement Distribution
Estimating Statistical Significance
Interpreting the Probabilities
Key Takeaways
Conclusion

Overview

Imagine you're testing two versions of your website and want to know whether the new version is actually better. The expected improvement parameter can help with this. But to give you a reliable result, the SmartStats engine goes a step further and processes this data statistically. The Probability of Improvement tells you how reliable the observed improvement is: it denotes the probability of an improvement being statistically significant.

But why should the data be statistically significant? Think of it like flipping a coin. If you flip a coin 10 times and get 7 heads and 3 tails, you might wonder if the coin is biased. However, since 10 flips is a small sample, this result could easily happen by chance. To be sure, you would need to flip the coin many more times, say 1000 times. If you then get 700 heads and 300 tails, you can be more confident that the coin is indeed biased because such a large difference is less likely to happen by chance.
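The coin example can be checked with a quick calculation. The sketch below is only an illustration, assuming a perfectly fair coin; the helper name prob_at_least is made up for this example. It computes how likely it is to see at least 70% heads purely by chance in 10 flips and in 1000 flips.

```python
from math import comb


def prob_at_least(heads: int, flips: int) -> float:
    """Chance of getting at least `heads` heads in `flips` tosses of a fair coin."""
    # Count every outcome with `heads` or more heads, then divide by all outcomes.
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips


small_sample = prob_at_least(7, 10)       # 7 or more heads out of 10
large_sample = prob_at_least(700, 1000)   # 700 or more heads out of 1000

print(f"10 flips:   {small_sample:.4f}")
print(f"1000 flips: {large_sample:.3e}")
```

With 10 flips, a fair coin produces 7 or more heads about 17% of the time, so the result says little about bias. With 1000 flips, the same 70% share is so unlikely under a fair coin that chance stops being a reasonable explanation.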

Similarly, when testing your website versions, you want to ensure that the observed improvement in the new version isn’t just due to random chance. Statistical significance helps with this. It tells you whether the observed improvement is likely to be genuine or just a fluke.

What is an Improvement Distribution

An improvement distribution represents the plausible values of the difference between the variation (new version) and the baseline (current version). It uses Bayesian inference to estimate the true uplift from your sample data. At VWO, these distributions are modeled as normal distributions.
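To make the idea concrete, here is a rough sketch of how a normal improvement distribution could be built from two conversion rates. It is not VWO's actual computation, which this article does not spell out. It assumes that each conversion rate estimate is approximately normal, that the two versions are independent, and that improvement means the absolute difference in conversion rate. The visitor and conversion counts are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def rate_estimate(conversions: int, visitors: int) -> NormalDist:
    """Approximate a conversion-rate estimate as a normal distribution (assumption)."""
    rate = conversions / visitors
    std_error = sqrt(rate * (1 - rate) / visitors)
    return NormalDist(mu=rate, sigma=std_error)


# Hypothetical campaign data
baseline = rate_estimate(conversions=200, visitors=4000)    # 5% conversion rate
variation = rate_estimate(conversions=240, visitors=4000)   # 6% conversion rate

# The difference of two independent normal variables is also normal:
# the means subtract and the variances add.
improvement = variation - baseline

print(f"mean improvement: {improvement.mean:.4f}")
print(f"spread (std dev): {improvement.stdev:.4f}")
```

The resulting curve is centered on the observed difference, and its width shrinks as more visitors are collected.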

Estimating Statistical Significance

To see if your new version is better, we look at the area under the improvement distribution curve. The ROPE divides this area into three regions:

Below the ROPE: This is the probability that the new version performs worse than the current version.
Within the ROPE: This shows the probability that the new version performs about the same as the current version.
Above the ROPE: This is the probability that the new version performs better than the current version.

ROPE stands for "Region of Practical Equivalence." It defines a range around 0 within which an improvement is too small to matter in practice, so the variation is considered equivalent to the baseline.
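The three regions can be sketched in a few lines of code. This is only an illustration, assuming a normal improvement distribution with a known mean and standard deviation. The function name rope_probabilities, the 4% mean, the 2% standard deviation, and the ROPE of plus or minus 1% are all hypothetical values chosen for the example.

```python
from statistics import NormalDist


def rope_probabilities(mean: float, std: float, rope_low: float, rope_high: float):
    """Split the area under a normal improvement curve into three regions."""
    dist = NormalDist(mu=mean, sigma=std)
    below = dist.cdf(rope_low)            # area left of the ROPE: worse
    within = dist.cdf(rope_high) - below  # area inside the ROPE: about the same
    above = 1.0 - dist.cdf(rope_high)     # area right of the ROPE: better
    return below, within, above


# Hypothetical improvement distribution and ROPE
below, within, above = rope_probabilities(
    mean=0.04, std=0.02, rope_low=-0.01, rope_high=0.01
)

print(f"Below the ROPE:  {below:.3f}")
print(f"Within the ROPE: {within:.3f}")
print(f"Above the ROPE:  {above:.3f}")
```

The three values always add up to 1, because together they cover the whole area under the curve.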

Interpreting the Probabilities

Interpreting these probabilities is straightforward:

Estimating Proportions: By analyzing the improvement distribution, you can see how much of it falls into each region (worse, equivalent, better).
Decision Making: These probabilities guide your decisions on whether to adopt the new version or stick with the current one. For instance, if the probability that the new version is better (Above the ROPE) is high enough, you can confidently implement the change.
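A decision rule based on these probabilities might look like the sketch below. The 95% threshold is used as an example value. The function name recommend and the outcomes other than implementing the variation are assumptions made for illustration, not a description of how VWO reports results.

```python
def recommend(below: float, within: float, above: float, threshold: float = 0.95) -> str:
    """Turn the three region probabilities into a suggested action (illustrative)."""
    if above >= threshold:
        return "implement variation"     # confident the new version is better
    if below >= threshold:
        return "keep baseline"           # confident the new version is worse
    if within >= threshold:
        return "practically equivalent"  # confident there is no meaningful difference
    return "keep collecting data"        # no region has reached the threshold yet


print(recommend(below=0.01, within=0.02, above=0.97))  # implement variation
print(recommend(below=0.05, within=0.15, above=0.80))  # keep collecting data
```

A higher threshold makes the rule more cautious: it takes more data before any region reaches it.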

Key Takeaways

The "Probability of Improvement" is essentially the chance that your new version is better than the current one.
These probabilities help you make data-driven decisions rather than relying on guesswork.
A configurable threshold, like a 95% Probability of Improvement, indicates when a new version can be considered significantly better.

Conclusion

Using improvement distributions and understanding the Probability of Improvement can transform raw test data into actionable insights. This statistical approach ensures that your decisions are backed by solid evidence, making your optimization strategies more effective and reliable. By focusing on these probabilities, you can confidently implement changes that enhance your website's performance.

INFO: Next up, learn about the 4th level of statistical inference to apply these concepts more comprehensively and make decisions based on your campaign data.