Although A/B tests look simple to execute, getting them right requires great discipline. These tests often follow the Frequentist model, which requires you to run the test for a set period of time to get correct data from it.
However, most testers underestimate the importance of time and are instead obsessed with reaching the significance level of the test. Reaching significance ends up being the be-all and end-all for most testers, and the approach is often called significance testing. If a test does not run for the recommended period of time, the significance level will not be correct and your results will be inaccurate.
Even if you run your tests for the recommended period of time, Frequentist-model tests can only tell you whether A will beat B. They cannot estimate how close or far apart A and B are. They also never tell you the probability of A beating B or the uncertainty involved. These limitations exist because A/B testing did not evolve with conversion rate optimization in mind. It is a statistical method that has been adopted but never customized to follow the workflow of conversion experts.
SmartStats: The Bayesian Way to Find Your Winning Variation
SmartStats is our new stats engine based on Bayesian statistics, which gives you more control over your testing. You can now plan better, have a more accurate reason to end tests, and understand how close or far apart A and B are. SmartStats understands what improvements you care about and how certain you want to be, and it helps you at every step to make your testing smarter.
Conversion Rate as a Range
VWO reports a conversion rate range within which the actual conversion rate of your website lies with 99% certainty. This allows us to always be certain about the conversion rate we report.
When the test starts, the conversion rate is considered to lie in the 0–100% range, and this range is updated as the test progresses. To update the conversion rate range, we use the Bayesian statistical model. We do this because we are predicting the likelihood of something that hasn’t occurred yet. Inferential statistics is the branch of statistics that deals with this, and it has two major models: Frequentist and Bayesian.
Both models have their own advantages, and you choose one over the other based on what you are trying to do.
For example, simply dividing conversions by visitors to determine the conversion rate does not give a true and accurate picture, because 7 conversions out of 100 visitors and 700 conversions out of 10,000 visitors both result in a 7% conversion rate. Yet one additional conversion raises the conversion rate to 8% in the first case and to just 7.01% in the second.
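The arithmetic in this example can be checked with a few lines of Python. This is only an illustration; the helper name `conversion_rate` is hypothetical and not part of VWO.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Plain point estimate: conversions divided by visitors."""
    return conversions / visitors

# Both data sets start at the same 7% point estimate.
small_before = conversion_rate(7, 100)
large_before = conversion_rate(700, 10_000)

# One extra conversion moves the small sample far more than the large one.
small_after = conversion_rate(8, 100)
large_after = conversion_rate(701, 10_000)

print(f"100 visitors:    {small_before:.2%} -> {small_after:.2%}")
print(f"10,000 visitors: {large_before:.2%} -> {large_after:.2%}")
```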
Thus, we are trying to make an inference about the conversion rate of a variation with the changes you made, and it cannot be determined by measuring conversions for 100 visitors on your website. Because we cannot devote a high volume of traffic (and time) to testing, we need a way to represent the conversion rate that takes into account all uncertainty and fluctuations.
For this, we need to stop looking at the conversion rate as a single value. A single value is almost never going to be accurate if we are trying to compute it using data collected over a short duration. Instead of expressing the number of conversions as a percentage, SmartStats calculates a conversion rate range where the true conversion rate lies with a 99% probability. The more data it collects, the smaller this range of the highest likely values gets.
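A minimal sketch of how such a range could be computed is shown below. It assumes a uniform Beta(1, 1) prior (every rate from 0% to 100% equally plausible before data arrives) and a Beta posterior. The text does not state which distribution SmartStats uses, so treat this as an illustration, not as VWO's implementation. The function name `conversion_rate_range` and its parameters are hypothetical.

```python
import random

def conversion_rate_range(conversions, visitors, certainty=0.99,
                          draws=50_000, seed=0):
    """Approximate a credible range for the conversion rate by sampling.

    Assumption: uniform Beta(1, 1) prior, so the posterior is
    Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    alpha = 1 + conversions
    beta = 1 + (visitors - conversions)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1 - certainty) / 2
    low = samples[round(tail * (draws - 1))]
    high = samples[round((1 - tail) * (draws - 1))]
    return low, high

# Same 7% point estimate, very different amounts of data.
for conversions, visitors in [(7, 100), (700, 10_000)]:
    low, high = conversion_rate_range(conversions, visitors)
    print(f"{conversions}/{visitors}: {low:.2%} to {high:.2%}")
```

Under these assumptions, the range for 700 conversions out of 10,000 visitors comes out much narrower than the range for 7 out of 100, even though both share the same 7% point estimate.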
By default, you see the most likely value of the conversion rate in the Stats table. You can change this by using View Settings and adding the conversion rate range to your report.
Improvement as a Probability
The improvement we report is the median improvement that you can expect over the baseline if you implement the variation. The "best case" and "worst case" values represent the bounds of the 99% credible interval in which the improvement is likely to lie.
Traditional Frequentist statistics approximate the mean (along with the standard deviation) of the samples where A beat B and completely ignore the instances where B beat A. Bayesian statistics account for this possibility as well, to calculate the probability of A beating B and the range of improvement you can expect. So, we take random samples (as many as 7,000,000) of the conversion rates of both the control and the variations and compare them to find the improvement.
Specifically, we draw random samples from the conversion rate ranges of control and each of the variations, and then compute how much improvement each variation gets over control. Here’s the formula we use:
Improvement over Control = (SampleVariation - SampleControl) / SampleControl
As explained above, we repeat this about 7 million times with random samples drawn from the conversion rate ranges of Control and the Variations to get a range of improvement that we expect for each variation. Like the conversion rate ranges, the improvement can also be plotted.
By default, we show the median value of the improvement you can expect if you apply the variation. You can change this by using View Settings and adding the improvement range to your report.
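The sampling procedure described above can be sketched as follows. The sketch assumes Beta posteriors under a uniform prior, and it uses far fewer draws than the roughly 7 million mentioned in the text so that it runs quickly. The visitor and conversion counts are made-up example data, and the names `sample_rate` and `improvement_summary` are hypothetical.

```python
import random
import statistics

def sample_rate(rng, conversions, visitors):
    # Assumed Beta posterior under a uniform prior; the text does not
    # name the distribution SmartStats samples from.
    return rng.betavariate(1 + conversions, 1 + visitors - conversions)

def improvement_summary(control, variation, draws=50_000, seed=0):
    """Return (worst case, median, best case) of the relative improvement.

    control and variation are (conversions, visitors) tuples.
    """
    rng = random.Random(seed)
    improvements = []
    for _ in range(draws):
        sample_control = sample_rate(rng, *control)
        sample_variation = sample_rate(rng, *variation)
        # Improvement over Control, as in the formula above.
        improvements.append((sample_variation - sample_control) / sample_control)
    improvements.sort()
    worst = improvements[round(0.005 * (draws - 1))]  # lower end of 99% interval
    best = improvements[round(0.995 * (draws - 1))]   # upper end of 99% interval
    return worst, statistics.median(improvements), best

# Hypothetical data: control converts 200/4,000, variation 260/4,000.
worst, median, best = improvement_summary(control=(200, 4_000),
                                          variation=(260, 4_000))
print(f"worst {worst:+.1%}, median {median:+.1%}, best {best:+.1%}")
```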
Probability to Beat Baseline
To figure out the probability to beat the baseline, we again compare random samples from the conversion rate ranges, but this time we count the number of times the variation beats the control (baseline) out of the 7 million sample sets. We then express it as a percentage. Here’s the formula we use:
Probability of Variation1 beating Control = (Number of times Variation1 sample beats Control sample) / (Total number of samples)
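As a rough illustration of this counting step, the sketch below draws paired samples and reports the fraction in which the variation wins. It assumes Beta posteriors under a uniform prior and uses a reduced number of draws. The function names and example counts are hypothetical.

```python
import random

def sample_rate(rng, conversions, visitors):
    # Assumed Beta posterior under a uniform prior (not stated in the text).
    return rng.betavariate(1 + conversions, 1 + visitors - conversions)

def probability_to_beat_baseline(control, variation, draws=50_000, seed=0):
    """Fraction of paired samples in which the variation beats the control.

    control and variation are (conversions, visitors) tuples.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        if sample_rate(rng, *variation) > sample_rate(rng, *control):
            wins += 1
    return wins / draws

# Hypothetical data: control 200/4,000, variation 260/4,000.
p = probability_to_beat_baseline(control=(200, 4_000), variation=(260, 4_000))
print(f"Probability to beat baseline: {p:.1%}")
```

When control and variation have identical data, the result should hover around 50%, which is a quick sanity check for this kind of estimate.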
Probability to Be the Best
The probability to be the best is calculated in the same way as the probability to beat baseline, but here we look at how often the variation beats all other variations in each of the 7 million sample sets. We then express it as a percentage. Here’s the formula we use:
Probability of Variation1 being the best = (Number of times Variation1 sample is the best) / (Total number of samples)
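The same idea extends to more than two arms. In the sketch below, one sample is drawn per arm in each round and the arm with the highest sampled rate is counted as the best. It assumes Beta posteriors under a uniform prior, includes the control among the competing arms (an assumption), and uses made-up counts. The function name `probability_to_be_best` is hypothetical.

```python
import random

def probability_to_be_best(arms, draws=50_000, seed=0):
    """Estimate, for each arm, how often its sample is the highest.

    arms maps a name to a (conversions, visitors) tuple.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in arms}
    for _ in range(draws):
        # One draw per arm from its assumed Beta posterior (uniform prior).
        sampled = {
            name: rng.betavariate(1 + conversions, 1 + visitors - conversions)
            for name, (conversions, visitors) in arms.items()
        }
        wins[max(sampled, key=sampled.get)] += 1
    return {name: count / draws for name, count in wins.items()}

# Hypothetical test with a control and two variations.
result = probability_to_be_best({
    "Control": (200, 4_000),
    "Variation1": (260, 4_000),
    "Variation2": (230, 4_000),
})
for name, probability in result.items():
    print(f"{name}: {probability:.1%}")
```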
SmartStats Calculates the Potential Loss to Reduce the Risk of Choosing a False Winner
SmartStats takes into account the probability that B may beat A. The potential loss is the lift you can lose out on if you deploy A as the winner when B is better. With traditional Frequentist statistics, you rely on reaching significance. If you haven’t committed to a sample size, you are at a high risk of getting a false positive, that is, a false winner.
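The text does not give a formula for the potential loss. One common way to quantify this kind of risk is the expected shortfall: the average amount by which the other option's sampled rate exceeds the deployed option's, counting zero whenever the deployed option is ahead. The sketch below uses that definition as an assumption, measured in absolute conversion rate, with Beta posteriors under a uniform prior. It may differ from what SmartStats actually computes, and all names and counts are hypothetical.

```python
import random

def sample_rate(rng, conversions, visitors):
    # Assumed Beta posterior under a uniform prior (not stated in the text).
    return rng.betavariate(1 + conversions, 1 + visitors - conversions)

def potential_loss(deployed, other, draws=50_000, seed=0):
    """Average conversion rate given up by deploying `deployed`.

    Assumed definition: mean of max(other - deployed, 0) over paired
    samples. Both arguments are (conversions, visitors) tuples.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        shortfall = sample_rate(rng, *other) - sample_rate(rng, *deployed)
        total += max(shortfall, 0.0)
    return total / draws

# Hypothetical data: A converts 260/4,000 and B converts 200/4,000.
a, b = (260, 4_000), (200, 4_000)
print(f"Loss if A is deployed: {potential_loss(a, b):.5f}")
print(f"Loss if B is deployed: {potential_loss(b, a):.5f}")
```

With this definition, deploying the stronger option carries a much smaller potential loss than deploying the weaker one.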
SmartStats uses the potential loss to decide when to end a test. Based on the test results, you are shown one of two statuses: Winner or Smart Decision.
VWO does not call a test until the potential loss of a variation is below a certain threshold, the threshold of caring. This threshold is calculated based on the conversion rate of Control, the number of visitors that become part of the test per week, and a constant value.
For a variation to be called a Winner or a Smart Decision, its potential loss must be less than the threshold of caring.
Winner
We call a variation a Winner when its potential loss is below the threshold of caring and its chance to beat all other variations is greater than or equal to 95%.
Smart Decision
We call a variation a Smart Decision when its potential loss is below the threshold of caring and its chance to beat the baseline is greater than or equal to 95%.
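The two rules above can be summarized as a small decision function. This is a sketch only: the threshold of caring is passed in as a number because its exact formula is not given here, checking Winner before Smart Decision is an assumption, and the function name `classify_variation` is hypothetical.

```python
def classify_variation(potential_loss, threshold_of_caring,
                       chance_to_beat_all, chance_to_beat_baseline,
                       required_chance=0.95):
    """Return "Winner", "Smart Decision", or None if neither rule is met."""
    # Both statuses require the potential loss to be below the threshold.
    if potential_loss >= threshold_of_caring:
        return None
    # Assumption: Winner takes precedence when both rules are satisfied.
    if chance_to_beat_all >= required_chance:
        return "Winner"
    if chance_to_beat_baseline >= required_chance:
        return "Smart Decision"
    return None

# Hypothetical inputs for three variations.
print(classify_variation(0.0004, 0.001, 0.97, 0.99))  # Winner
print(classify_variation(0.0004, 0.001, 0.80, 0.96))  # Smart Decision
print(classify_variation(0.0030, 0.001, 0.97, 0.99))  # None
```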