Overview
Imagine you're running a lemonade stand competition with two new flavors: watermelon and mango. You want to know which one sells better, but how long should you keep the stand open? This is where understanding Statistical Parameters is useful for your testing.
Think of these parameters as key factors that help figure out the perfect test duration for your lemonade competition. You can adjust them as needed, but there are good default settings for most cases.
Statistical Parameters: The Key Factors
There are five main factors to consider:
Minimum Detectable Effect (MDE)
- What it is: This is the smallest difference in sales (improvement) you want to be able to spot between your flavors (variations). Think of it as the smallest change you care about.
- Impact: A larger MDE means the test can potentially end sooner, but you might miss smaller improvements. Conversely, a smaller MDE requires more visitors to detect a nuanced difference.
- Example: If you expect your baseline sales rate to be around 10% and you set the MDE to ±5%, the test is set up to detect a difference as small as 5% of the baseline. That is a change of 0.5 percentage points, to a rate of 9.5% or 10.5%. Changes smaller than this might not be detected as significant.
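The arithmetic in the example above can be sketched in a few lines. This is only an illustration that assumes the MDE is expressed relative to the baseline, as in the example; the function name `mde_bounds` is hypothetical and is not part of VWO.

```python
def mde_bounds(baseline_rate: float, relative_mde: float) -> tuple[float, float]:
    """Return the (lower, upper) rates that sit exactly one MDE away from baseline.

    Assumption: the MDE is a fraction of the baseline rate (a relative change).
    """
    absolute_change = baseline_rate * relative_mde
    return baseline_rate - absolute_change, baseline_rate + absolute_change


lower, upper = mde_bounds(baseline_rate=0.10, relative_mde=0.05)
print(f"Detectable at or below {lower:.1%}, or at or above {upper:.1%}")
```

With a 10% baseline and a 5% MDE, the bounds come out at 9.5% and 10.5%, matching the example.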
Testing Objective
- What it is: With the objective of “Improvement”, you test for a strict improvement in the primary metric. With the objective of “Improvement or Equivalence”, you test that a variation is NOT WORSE than the baseline. Hence, any variation that is at least as good as the baseline will be declared a winner.
- Impact: The “Improvement or Equivalence” mode helps you save a significant number of visitors, as it can reach statistical significance much faster than the “Improvement” mode. You can toggle between the two modes to observe the difference in the maximum visitor requirement.
- Example: If you want to launch a new design for the homepage for strategic reasons but want to ensure that the new design does not hamper your existing conversion rates, you can save a lot of visitors by running the test in “Improvement or Equivalence” mode.
Region of Practical Equivalence (ROPE)
- What it is: This is the range of sales differences you consider "basically the same". This means that small changes within this range are not important when making a decision.
- Impact: A wider ROPE means variations that show only small improvements may be treated as equivalent to the baseline and rejected. This helps you save visitors, because underperforming variations can be disabled and their traffic invested in variations with larger improvements.
- Example: If you expect your baseline sales rate to be around 10% and set the ROPE to ±1%, sales rates between 9.9% and 10.1% (1% of the baseline, or 0.1 percentage points either way) are considered practically the same as the baseline.
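As a rough sketch of the ROPE idea, the check below tests whether a variation's rate falls inside a relative band around the baseline. It assumes a ROPE expressed relative to the baseline and compares plain point estimates; a real test would also account for the uncertainty in those estimates. The helper `within_rope` is a hypothetical name, not a VWO function.

```python
def within_rope(baseline_rate: float, variation_rate: float, relative_rope: float) -> bool:
    """True if the variation is 'practically the same' as the baseline.

    Assumption: the ROPE is a fraction of the baseline rate (a relative band).
    """
    relative_diff = (variation_rate - baseline_rate) / baseline_rate
    return abs(relative_diff) <= relative_rope


# Baseline of 10% with a ROPE of 1%: the band runs from 9.9% to 10.1%.
print(within_rope(0.10, 0.1005, 0.01))  # True: 10.05% is inside the band
print(within_rope(0.10, 0.1030, 0.01))  # False: 10.3% is outside the band
```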
Statistical Power
- What it is: This is the chance of correctly identifying the best flavor (variation) if there really is a clear winner. Think of it as your test’s sensitivity.
- Impact: Higher statistical power means you are less likely to miss a true winner, but it requires a larger sample size.
- Example: With a statistical power of 80%, your test has an 80% chance of detecting a real difference if there is one.
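To see how power depends on the number of visitors, here is a minimal sketch using the classical normal approximation for comparing two conversion rates. This is a textbook frequentist formula chosen for illustration and is not necessarily how VWO computes power; the function name, the two-sided test, and the example rates are all assumptions.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_base: float, p_var: float, n_per_variation: int, fpr: float = 0.10) -> float:
    """Approximate power of a two-sided two-proportion z-test (normal approximation)."""
    std_normal = NormalDist()
    # Standard error of the difference between the two observed rates.
    se = sqrt(p_base * (1 - p_base) / n_per_variation
              + p_var * (1 - p_var) / n_per_variation)
    z_crit = std_normal.inv_cdf(1 - fpr / 2)
    # Chance that the observed difference clears the significance threshold
    # (the negligible opposite-tail probability is ignored).
    return std_normal.cdf(abs(p_var - p_base) / se - z_crit)


for n in (10_000, 50_000, 100_000):
    print(f"{n:>7} visitors per variation -> power {approx_power(0.10, 0.105, n):.0%}")
```

Under these assumptions, the printed power rises as the number of visitors per variation grows, which is the relationship described in the Impact bullet above.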
False Positive Rate (FPR)
- What it is: This is the chance of mistakenly declaring a winner when there's no real difference between flavors (variations). Think of it as the "mistake rate."
- Impact: Lower FPR reduces the chance of false positives but requires a larger sample size.
- Why it’s needed: No test is perfect, and there's always a chance of detecting an effect that isn't there. The FPR helps balance this by setting an acceptable level of risk for these false positives. This is crucial because, without accounting for it, you might end up with results that seem significant but are just due to random chance.
- Example: An FPR of 10% means there's a 10% chance of incorrectly declaring a winner when there isn't one.
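One way to build intuition for the FPR is to simulate many tests in which both flavors truly sell at the same rate and count how often a difference is flagged anyway. The sketch below does this with a simple pooled two-proportion z-test. It is an illustration under assumed settings (10% true rate, 500 visitors per arm, two-sided test), not a reproduction of VWO's statistics engine.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_aa_false_positive_rate(
    true_rate=0.10, visitors_per_arm=500, n_tests=2000, fpr=0.10, seed=7
):
    """Fraction of tests with no real difference that still look significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - fpr / 2)  # two-sided threshold
    false_winners = 0
    for _ in range(n_tests):
        # Both arms share the same true rate, so any "winner" is a false positive.
        conv_a = sum(rng.random() < true_rate for _ in range(visitors_per_arm))
        conv_b = sum(rng.random() < true_rate for _ in range(visitors_per_arm))
        pooled = (conv_a + conv_b) / (2 * visitors_per_arm)
        se = sqrt(2 * pooled * (1 - pooled) / visitors_per_arm)
        if se == 0:
            continue  # no variation observed in either arm, nothing to flag
        z_score = (conv_b - conv_a) / visitors_per_arm / se
        if abs(z_score) > z_crit:
            false_winners += 1
    return false_winners / n_tests


observed = simulate_aa_false_positive_rate()
print(f"Share of no-difference tests flagged as significant: {observed:.1%}")
```

With the threshold set for a 10% FPR, the flagged share should land close to 10%, with some run-to-run noise from the simulation.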
The Trade-off
These factors all work together to determine how many visitors (samples) your test needs. The more visitors, the more confident you can be in the results. But a longer test with more visitors means keeping your lemonade stand open for a longer time!
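The trade-off can be made concrete with the standard sample-size approximation for comparing two conversion rates. This is a textbook formula used purely as a sketch; VWO's own calculation may differ. The function name and the default values (80% power, 10% FPR, taken from the examples in this article) are assumptions.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variation(
    baseline_rate: float, relative_mde: float, power: float = 0.80, fpr: float = 0.10
) -> int:
    """Approximate visitors needed per variation (two-sided, normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # rate at the edge of the MDE
    std_normal = NormalDist()
    z_total = std_normal.inv_cdf(1 - fpr / 2) + std_normal.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z_total ** 2 * variance / (p2 - p1) ** 2)


# Smaller MDE, higher power, or lower FPR each push the requirement up.
for mde in (0.20, 0.10, 0.05):
    print(f"MDE {mde:.0%}: {visitors_per_variation(0.10, mde)} visitors per variation")
print("Power 90%:", visitors_per_variation(0.10, 0.05, power=0.90))
print("FPR 5%:  ", visitors_per_variation(0.10, 0.05, fpr=0.05))
```

Because the MDE enters the formula squared, halving it roughly quadruples the visitors required, which is why the MDE usually has the largest effect on test duration.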
Where to Access the Statistical Parameters?
The Statistical Parameters are configured while creating a metric. However, you can still vary these parameters for a metric at the campaign level via the following:
Under Statistical Configuration:
Once VWO calculates the campaign duration, you can perform the following steps to access the statistical parameters:
- From the main menu, open the campaign whose reports you want to access.
- Go to the Reports tab.
- On the report header, click on Statistical Configuration.
- The section with the metric name features the statistical parameters you can modify to adjust the duration and precision of your campaign. To modify, click on the pencil icon.
- After applying the necessary changes, click on Save Changes.
Under Probability of Better:
Once VWO calculates the campaign duration, and if you have turned ON the In-depth data review toggle switch, you can view the statistical parameters in the header of the Probability of Better column in the report table.
To modify the parameters, click on the pencil icon. After applying the necessary changes, click on Save Changes.
Recommendations
There are pre-configured defaults for these factors that work well for most tests. You can adjust them if you have a specific reason, but for beginners, sticking with the defaults is a safe bet.