SmartStats calculations are far more accurate than our old ones. Our old method used a common trick: it relied on the Central Limit Theorem and approximated the true statistical distribution with a normal distribution. This approximation works well only when you have a lot of samples. Not every customer has that much traffic, so it became clear that we needed a better way to get accurate answers.
Note: As a best practice, we recommend that you run each test for an integer (whole) number of weeks so that the data your test collects is not biased. Visitor traffic on your website on weekdays is not equivalent to traffic on weekends, so running the test for whole weeks gives you the most accurate results.
You can do this by scheduling your test to end at a specific date and time. To learn more about scheduling a test, refer to How to Schedule Your VWO Test.
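If you want to work out such an end time yourself before entering it in the scheduler, the arithmetic is simple: round the planned duration up to the next whole week. The helper below is a minimal, hypothetical sketch (whole_week_end is our illustrative name, not part of VWO); it assumes you know the test's start time and the minimum number of days you want it to run.

```python
import math
from datetime import datetime, timedelta


def whole_week_end(start: datetime, minimum_days: float) -> datetime:
    """Hypothetical helper: earliest end time that is at least `minimum_days`
    after `start` AND exactly a whole number of weeks after it."""
    # Round the planned duration up to whole weeks (at least one week).
    weeks = max(1, math.ceil(minimum_days / 7))
    # Naive datetimes: this ignores daylight-saving shifts, which is fine for a rough plan.
    return start + timedelta(weeks=weeks)


if __name__ == "__main__":
    start = datetime(2024, 3, 4, 9, 0)  # a Monday at 09:00
    print(whole_week_end(start, 10))    # a 10-day plan rounds up to 2 full weeks
```

Because the end time falls on the same weekday and hour as the start, every day of the week is covered the same number of times.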
Let’s look at a couple of graphs that illustrate the problem. Consider an A/B test with a 5% conversion rate. With 10,000 samples, the true distribution and the normal approximation line up perfectly:
That’s great for anyone who has tens of thousands of visitors reaching the end of their funnel every week. In contrast, when we have only 100 samples, the green and blue lines differ significantly.
Our old method used the approximation (the green line) in its calculations, while our new method uses the true distribution (the blue line). This is more work for us, because the exact calculations are harder for both our math geeks and our computers, but they give more accurate and faster results.
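To get a feel for the gap between the two curves in the graphs, you can compare them numerically. The sketch below is only an illustration, not SmartStats' implementation: it assumes the "true distribution" is the exact Binomial distribution of the number of conversions, compares it with the normal approximation that has the same mean and variance (with the usual continuity correction), and reports the largest difference between their cumulative probabilities.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact Binomial(n, p) probability of exactly k conversions, computed in
    log space so that large n does not overflow."""
    log_pmf = (
        math.lgamma(n + 1)
        - math.lgamma(k + 1)
        - math.lgamma(n - k + 1)
        + k * math.log(p)
        + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """CDF of the normal approximation N(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def max_cdf_gap(n: int, p: float) -> float:
    """Largest absolute difference between the exact and the approximate
    cumulative probabilities, checked at every count k = 0..n."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    exact_cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        exact_cdf += binomial_pmf(k, n, p)
        # k + 0.5 is the standard continuity correction for a discrete count.
        approx_cdf = normal_cdf(k + 0.5, mean, sd)
        worst = max(worst, abs(exact_cdf - approx_cdf))
    return worst


if __name__ == "__main__":
    for n in (100, 10_000):
        print(f"n={n:>6}: max CDF gap = {max_cdf_gap(n, 0.05):.4f}")
```

With a 5% conversion rate, the gap at 100 samples is several times larger than at 10,000 samples, which matches what the two graphs show.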
Many statistical techniques come with a caveat such as “this method is accurate only if the number of conversions > 3 sqrt(num_visitors x ctr x (1 - ctr))” or something along those lines. Rules of thumb like this ensure that the green and blue lines agree with each other. Because SmartStats uses exact calculations, it does not need any such rules.
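For concreteness, here is what checking such a rule of thumb looks like. This is a minimal sketch under one assumption: we read the quantity under the square root as the number of visitors (trials), which is the usual form of this rule, i.e. the conversion count must sit more than three standard deviations above zero. The function name is ours, for illustration only.

```python
import math


def rule_of_thumb_ok(visitors: int, conversions: int) -> bool:
    """Illustrative check of the rule
    conversions > 3 * sqrt(visitors * ctr * (1 - ctr)), with ctr = conversions / visitors."""
    ctr = conversions / visitors
    sd = math.sqrt(visitors * ctr * (1 - ctr))  # standard deviation of the conversion count
    return conversions > 3 * sd


if __name__ == "__main__":
    print(rule_of_thumb_ok(100, 5))       # 5 conversions vs. a threshold of about 6.5
    print(rule_of_thumb_ok(10_000, 500))  # 500 conversions vs. a threshold of about 65
```

At a 5% conversion rate the rule reduces algebraically to visitors > 9 x (1 - ctr) / ctr, so a test needs more than 171 visitors before the normal approximation is considered acceptable; the 100-sample case from the graphs fails it.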
Comments
I got here when I attempted to pause a test that had not run for an "integer" number of weeks, that is, it had a fractional week's worth of data. However, this article doesn't explain why that would increase the likelihood of error. Please explain.
Hi Charles,
As a best practice, we recommend that you run the test for an integer number of weeks to ensure that the data collected by your test is not biased.
We have added this information as a note in the article. In case you still face any issues, feel free to reach out to us at support@vwo.com.
Thanks,
Vaibhav.