Overview
Imagine you're a website manager aiming to increase newsletter sign-ups. You're curious if changing the position of the sign-up form from the footer to the top of the homepage will result in more subscriptions. Determining the ideal visitor count for your A/B test will help you confidently assess whether the new placement genuinely boosts sign-up rates.
This is where the concept of maximum visitor requirement comes into play. It defines the upper limit of visitors needed for your test to achieve statistical significance, ensuring you have enough traffic to accurately measure the impact of the change.
Why Does Visitor Count Matter?
Simply put, the visitor count determines how much data is enough to detect a desired change as statistically significant (in other words, when to conclude your experiment), and that threshold depends on the statistical parameters of your test. In a testing campaign, the number of visitors acts like fuel for drawing statistically sound conclusions. With limited data (visitors), it's difficult to determine whether observed improvements are genuine or just random fluctuations. Conversely, with enough visitors, even tiny uplifts can be detected as statistically significant. This makes sample size a critical parameter in any testing analysis.
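To get a feel for this, consider a minimal simulation sketch in Python (the 3% conversion rate and visitor counts are assumed purely for illustration, not VWO defaults). Both variations convert at exactly the same rate, so any observed uplift is noise; with a small sample the noise can look like a large effect, while with a large sample it shrinks toward zero:

```python
import numpy as np

rng = np.random.default_rng(42)
true_rate = 0.03  # both variations truly convert at 3%, so any observed "uplift" is noise

for visitors_per_variant in (500, 5_000, 500_000):
    control = rng.binomial(visitors_per_variant, true_rate) / visitors_per_variant
    variation = rng.binomial(visitors_per_variant, true_rate) / visitors_per_variant
    uplift = (variation - control) / control * 100
    print(f"{visitors_per_variant:>7} visitors/variant -> observed uplift {uplift:+.1f}%")
```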
Factors Affecting Visitor Requirement
Several statistical parameters influence the number of visitors needed in your test (a worked example follows this list):
- Minimum Detectable Effect (MDE): This represents the smallest improvement you want to be able to confidently detect as significant. A lower MDE typically translates to a higher visitor requirement.
- False Positive Rate (FPR): This reflects the chance of mistakenly declaring a winner when there's no real difference. A lower FPR (indicating stricter criteria) usually requires more visitors.
- Statistical Power: This signifies the test's sensitivity to detect true uplifts. Higher power (greater confidence in results) often translates to a need for more visitors.
- Region of Practical Equivalence (RPE): This defines a range of uplifts considered practically equivalent to the control group. Uplifts falling within this range might not be worth pursuing, even if statistically significant. A narrower RPE might necessitate a higher visitor requirement.
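To see how these parameters combine, here is a sketch based on the classical fixed-horizon formula for comparing two conversion rates. This is a textbook frequentist approximation, not the exact calculation VWO performs, and the 2% baseline rate, 5% relative MDE, 5% false positive rate, and 80% power are assumed values:

```python
from scipy.stats import norm

def visitors_per_variant(baseline_rate, mde_relative, fpr=0.05, power=0.80):
    """Classical two-proportion sample-size estimate (illustrative only)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    z_alpha = norm.ppf(1 - fpr / 2)  # stricter FPR -> larger z -> more visitors
    z_beta = norm.ppf(power)        # higher power -> larger z -> more visitors
    pooled = (p1 + p2) / 2
    n = ((z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
          + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
         / (p2 - p1) ** 2)
    return round(n)

# A 2% baseline with a 5% relative MDE needs roughly 315,000 visitors per variant;
# halving the MDE to 2.5% roughly quadruples that requirement.
print(visitors_per_variant(0.02, 0.05))
print(visitors_per_variant(0.02, 0.025))
```

Lowering the FPR or raising the power moves the requirement in the same direction: the corresponding z values grow, and the visitor count grows with their square.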
Fixed Horizon vs. Sequential Testing
Traditionally, in Fixed Horizon Testing, the sample size is determined upfront, before any data is collected. In Sequential Testing, however, statistical significance is calculated iteratively as data accumulates, allowing the test to stop once significance is reached.
This means that in sequential testing, the maximum visitor requirement might not be fully utilized. If the actual uplift is larger than the defined MDE, significance can be achieved before all the planned visitors participate.
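The sketch below illustrates this control flow. The true conversion rates, batch size, and decision boundary are all assumed for illustration; a real sequential engine uses properly corrected boundaries (rather than the crude fixed z threshold here) to avoid inflating false positives:

```python
import numpy as np

rng = np.random.default_rng(7)
p_control, p_variation = 0.030, 0.036  # assumed true rates: a 20% uplift, well above a 5% MDE
max_visitors_per_variant = 200_000     # the pre-computed maximum visitor requirement
batch = 5_000                          # re-evaluate after every batch of visitors
z_boundary = 3.0                       # crude stand-in for a corrected sequential boundary

c_conv = v_conv = n = 0
while n < max_visitors_per_variant:
    n += batch
    c_conv += rng.binomial(batch, p_control)
    v_conv += rng.binomial(batch, p_variation)
    p1, p2 = c_conv / n, v_conv / n
    pooled = (c_conv + v_conv) / (2 * n)
    se = (2 * pooled * (1 - pooled) / n) ** 0.5  # standard error of the difference
    if (p2 - p1) / se > z_boundary:              # evidence strong enough: stop early
        print(f"Stopped at {n:,} visitors/variant, well before the "
              f"{max_visitors_per_variant:,} maximum")
        break
else:
    print("Reached the maximum visitor requirement without a conclusive result")
```

Because the assumed uplift here is much larger than the MDE, the boundary is typically crossed long before the maximum visitor requirement is used up.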
Testing Objective: Better vs. Better or Equivalent
- Better (Strict improvement): This mode focuses on detecting variations that strictly and significantly outperform the baseline. Since it requires a higher level of confidence to declare one variation better than another, it demands a larger sample size.
- Better or Equivalent: This mode identifies variations that are either better than or perform equivalently to the baseline. It allows for a more lenient interpretation of performance, leading to smaller sample size requirements compared to the "Better (Strict improvement)" mode, as the sketch after this list illustrates.
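One way to build intuition for the difference, in a classical framing, is to treat "Better or Equivalent" like a test with an allowed equivalence margin: the margin widens the difference the test has to resolve, so fewer visitors are needed. The sketch below is only an analogy with assumed numbers (2% baseline, 5% relative MDE, 1% relative margin), not VWO's actual procedure:

```python
from scipy.stats import norm

def visitors_needed(p1, p2, margin=0.0, fpr=0.05, power=0.80):
    """One-sided two-proportion sample size with an optional absolute
    equivalence margin. Illustrative textbook approximation only."""
    z_a, z_b = norm.ppf(1 - fpr), norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return round((z_a + z_b) ** 2 * variance / (p2 - p1 + margin) ** 2)

baseline = 0.02
variation = baseline * 1.05                                   # 5% relative MDE

strict = visitors_needed(baseline, variation)                 # "Better": no margin
lenient = visitors_needed(baseline, variation,
                          margin=0.01 * baseline)             # allow a 1% relative margin
print(f"Better (strict improvement): ~{strict:,} visitors per variant")   # ~248,000
print(f"Better or Equivalent:        ~{lenient:,} visitors per variant")  # ~172,000
```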
Finding the Optimal Sample Size
While choosing the ideal sample size can be complex, platforms like VWO offer smart defaults for underlying statistical parameters, simplifying the process for experimenters.
VWO also employs a learning mechanism. It analyzes campaign data and estimates the maximum visitor requirement after observing 500 visitors and 1 conversion on the baseline group. Keep in mind that, even if the statistical parameters remain unchanged, these requirements might fluctuate slightly within a running test due to variations in the observed baseline average.
Best Practices for Managing Visitor Requirements
Here are some key strategies to consider:
- Adjusting the Metric: If traffic is limited, think about switching to a more sensitive metric for your experiment. For instance, "time spent on page" might be more sensitive than conversion rates, potentially revealing larger uplifts with a smaller sample size.
- Focus on Pronounced Changes: When starting testing with limited traffic, opt for well-defined changes backed by strong reasoning. Experimenting with subtle variations is a luxury for websites with high visitor volumes.
- Resist Shortening the Campaign After Observing Uplifts: If you observe a significantly larger uplift than the defined MDE (e.g., 15% vs a 5% MDE), avoid the temptation to increase the MDE to shorten the test. This can lead to inflated results and increased false positives.
- Increasing Visitor Requirement (Safe): It's perfectly acceptable to increase the maximum visitor requirement during a running campaign. This enhances the test's power and allows smaller uplift values to be detected with the additional data; however, observed uplift values are unlikely to increase with more data.
- Optimize Visitor Usage: Save visitors by disabling underperforming variations recommended by the system. However, only disable variations that fall below the defined disabling threshold.
By understanding maximum visitor requirements and following these best practices, you can design efficient tests that deliver reliable results while making the best use of your available traffic.