Changing the traffic distribution while a test is running can invalidate your test results, regardless of the statistical method used to calculate them.
The following simple example shows why:
- Suppose you split your traffic equally between “A” and “B” on Monday. Variation “A” gets 1,000 visitors and 250 conversions, while variation “B” gets 1,000 visitors and 200 conversions. On Monday, the conversion rates of A and B are therefore 25% and 20%.
- Next, from Tuesday to Sunday, you switch the traffic distribution to 90% A and 10% B. You’ve observed that A is probably the winner, so it seems sensible to send most of the traffic to it. After this change, variation “A” gets 1,800 visitors and 90 conversions per day, while B gets 200 visitors and 8 conversions per day. That is a conversion rate of 5% for A and 4% for B on each day from Tuesday to Sunday. So A *should* win this test: it performs better on every day of the week.
- By the end of the A/B test, variation “A” has received 11,800 visitors (1,000 on Monday + 1,800 × 6 days from Tuesday to Sunday) and 790 conversions (250 + 90 × 6), for a conversion rate of 6.7%. Over the same period, variation “B” has received 2,200 visitors (1,000 + 200 × 6) and 248 conversions (200 + 8 × 6), for a conversion rate of 11.3%. Variation “B” wins the test.
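The arithmetic above can be checked with a short script. This is a minimal sketch that simply re-adds the numbers from the example (Monday at 50/50, then six identical days at 90/10); the variable and function names are illustrative and not part of any testing tool.

```python
# Numbers from the example: (visitors, conversions) per variation.
# Monday uses a 50/50 split; Tuesday-Sunday (6 days) uses a 90/10 split.
DAYS_AFTER_CHANGE = 6

monday = {"A": (1_000, 250), "B": (1_000, 200)}
per_day_after = {"A": (1_800, 90), "B": (200, 8)}


def rate(visitors: int, conversions: int) -> float:
    """Conversion rate as a fraction (0.25 means 25%)."""
    return conversions / visitors


def pooled_totals(variation: str) -> tuple:
    """Monday plus six identical Tuesday-Sunday days for one variation."""
    mon_v, mon_c = monday[variation]
    day_v, day_c = per_day_after[variation]
    return (mon_v + DAYS_AFTER_CHANGE * day_v,
            mon_c + DAYS_AFTER_CHANGE * day_c)


# Day by day, A beats B.
for name in ("A", "B"):
    print(f"{name} Monday: {rate(*monday[name]):.1%}, "
          f"Tue-Sun daily: {rate(*per_day_after[name]):.1%}")

# Pooled over the whole week, B beats A.
totals = {name: pooled_totals(name) for name in ("A", "B")}
for name, (v, c) in totals.items():
    print(f"{name} pooled: {v} visitors, {c} conversions, {rate(v, c):.1%}")
```

Running it prints 25.0% vs 20.0% on Monday and 5.0% vs 4.0% per day afterwards, yet pooled rates of 6.7% for A and 11.3% for B.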
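So even though A is better on every day of the week, B wins the test. The change in traffic distribution is what causes this: 1,000 of B’s 2,200 visitors (about 45%) arrived on Monday, when both variations converted at a much higher rate, while only 1,000 of A’s 11,800 visitors (about 8%) did. Pooling the days therefore weights B’s result toward the high-converting Monday, which reverses the outcome. This is called Simpson's Paradox. For this reason, we strongly recommend fixing the traffic distribution before starting a test and not changing it while the test is running.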
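To see why a fixed split avoids the problem, here is a hypothetical counterfactual, not part of the original example: assume the 50/50 split is kept all week, each variation gets 1,000 visitors per day, and the daily conversion rates stay the same (25% vs 20% on Monday, 5% vs 4% from Tuesday to Sunday). Both variations then share the same mix of days, so the pooled totals agree with the daily results.

```python
# Hypothetical: the 50/50 split from Monday is kept for all seven days.
# Assumption: each variation gets 1,000 visitors per day, and the daily
# conversion rates match the example (Mon 25% vs 20%, Tue-Sun 5% vs 4%).
VISITORS_PER_DAY = 1_000
daily_conversions = {
    "A": [250] + [50] * 6,  # 25% on Monday, then 5% for six days
    "B": [200] + [40] * 6,  # 20% on Monday, then 4% for six days
}

fixed_split = {}
for name, conversions in daily_conversions.items():
    visitors = VISITORS_PER_DAY * len(conversions)
    total = sum(conversions)
    fixed_split[name] = (visitors, total)
    print(f"{name}: {visitors} visitors, {total} conversions, "
          f"{total / visitors:.1%}")
```

Under these assumptions A ends at 7.9% and B at 6.3%, so A wins both day by day and overall.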