Have you ever run an experiment and declared a winner, only to discover later that there was no real difference between the options being tested? This is a frustrating experience, and it's exactly what understanding the False Positive Rate (FPR) helps you address.
What is FPR?
FPR is a statistical concept that reflects the likelihood of mistakenly identifying a difference between two options (control and variation) when there's no true difference. In other words, FPR is the chance of getting a false winner.
Imagine conducting 100 tests where you compare a version against an identical copy of itself (an A/A test). With a 10% FPR, you could statistically expect about 10 of those tests to declare one version a "winner", even though there's no real difference. The short simulation below illustrates this.
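As a minimal sketch of this idea, the Python snippet below simulates A/A tests with a textbook two-sided z-test at a 10% FPR and counts how often a "winner" is declared. The visitor counts and conversion rate are illustrative assumptions, and the z-test stands in for whatever statistics engine your platform actually uses:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
fpr = 0.10            # configured false positive rate
n_tests = 1_000       # number of A/A experiments to simulate
visitors = 5_000      # visitors per arm (assumed)
true_rate = 0.05      # identical conversion rate for both arms

z_crit = norm.ppf(1 - fpr / 2)   # two-sided critical value (~1.645)
false_winners = 0
for _ in range(n_tests):
    conv_a = rng.binomial(visitors, true_rate)
    conv_b = rng.binomial(visitors, true_rate)
    p_pool = (conv_a + conv_b) / (2 * visitors)
    se = np.sqrt(p_pool * (1 - p_pool) * (2 / visitors))
    z = (conv_b - conv_a) / visitors / se   # two-proportion z statistic
    if abs(z) > z_crit:                     # a "winner" is declared
        false_winners += 1

print(f"false winners: {false_winners / n_tests:.1%} (expected ~{fpr:.0%})")
```

Run it and roughly 10% of the identical-versus-identical comparisons come back as "winners", which is exactly what the configured FPR predicts.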
FPR in Practice
In A/B testing platforms like VWO, decisions typically involve deploying a variation (new version) or disabling it. The FPR helps determine the thresholds for making these decisions. VWO uses a default FPR of 10%, which translates to two key decision thresholds:
- Deployment Threshold (95%): A variation needs to show an improvement over the control with at least 95% statistical confidence to be considered for deployment. (VWO runs a two-sided test, which is why a 10% FPR translates to a 95% deployment threshold rather than 90%: 100% - (FPR / 2) = 100% - (10% / 2) = 95%.)
- Disabling Threshold (5%): Conversely, if the confidence that a variation beats the control drops to 5% or below (i.e., it is worse with at least 95% confidence), it might be recommended for disabling. (FPR / 2 = 10% / 2 = 5%.) The sketch after this list expresses both thresholds in code.
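The mapping from a configured FPR to these two thresholds is simple arithmetic. The function below is an illustrative sketch of that mapping, not VWO's actual code:

```python
def two_sided_thresholds(fpr: float) -> tuple[float, float]:
    """Split the FPR evenly across the two tails of a two-sided test."""
    deployment = 1 - fpr / 2   # e.g. 1 - 0.10 / 2 = 0.95
    disabling = fpr / 2        # e.g. 0.10 / 2 = 0.05
    return deployment, disabling

print(two_sided_thresholds(0.10))  # (0.95, 0.05)
```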
FPR in Different Cases
In some cases, an experiment is only interested in detecting an improvement or a reduction, but not both (a one-sided test). Here, the entire FPR (10%) is applied to a single decision threshold, giving a deployment threshold of (100 - FPR)% = 90%.
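In code, the one-sided case simply stops splitting the error budget (again an illustrative sketch, not platform code):

```python
def one_sided_threshold(fpr: float) -> float:
    """The entire error budget backs the single deployment decision."""
    return 1 - fpr  # e.g. 1 - 0.10 = 0.90

print(one_sided_threshold(0.10))  # 0.9
```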
Imagine you're an e-commerce store owner who wants to test a new product image on your homepage. You believe the new image might increase click-through rates (CTR) on the product page.
Traditional A/B Test (Two-Sided):
- You set a default FPR of 10% in your A/B testing platform (VWO in this case).
- This translates to two decision thresholds:
- Deployment Threshold (95%): The new image needs to show an improvement in CTR over the original image with at least 95% statistical confidence to be considered for deployment. [100% - (FPR / 2) = 100% - (10% / 2) = 95%]
- Disabling Threshold (5%): If the confidence that the new image beats the original drops to 5% or below, it might be recommended for disabling. (FPR / 2 = 10% / 2 = 5%) The decision rule that falls out of these two thresholds is sketched below.
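Here is a hedged sketch of the resulting decision rule. The input `p_better` is a hypothetical stand-in for the probability your platform reports that the variation beats the control; it is not a VWO API:

```python
def decide(p_better: float, fpr: float = 0.10) -> str:
    """Apply the two decision thresholds to a probability-of-being-better."""
    if p_better >= 1 - fpr / 2:   # >= 95%: the new image wins
        return "deploy variation"
    if p_better <= fpr / 2:       # <= 5%: the new image loses
        return "disable variation"
    return "keep collecting data"

print(decide(0.97))  # deploy variation
print(decide(0.03))  # disable variation
print(decide(0.60))  # keep collecting data
```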
One-Sided Test (Focused on Improvement):
- Here, you're only interested in whether the new image improves CTR, and you don't plan to stop an underperforming variation early.
- In this case, only half of the overall FPR (5%) backs the deployment threshold, which stays at 95%. Hence, if you never act on an underperforming variation, your actual false positive rate is half the configured value, as the quick check below shows.
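A quick numeric check of this halving, under the same z-test assumptions as the earlier A/A sketch: when there is no real difference, only the upper tail of the two-sided test can trigger a false deployment.

```python
from scipy.stats import norm

fpr = 0.10
z_crit = norm.ppf(1 - fpr / 2)     # two-sided critical value (~1.645)
upper_tail = 1 - norm.cdf(z_crit)  # chance of a false "deploy" under the null
print(f"effective FPR when ignoring disables: {upper_tail:.1%}")  # 5.0%
```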
Best Practices for FPR
Here are some key considerations when working with FPR:
- Lower FPR for Surprising Results: If an A/B test result significantly deviates from your expectations, consider lowering the FPR for a stricter decision threshold. This helps ensure the observed effect is genuine and not a random fluctuation.
- Half FPR for One-Sided Decisions: Remember, if you only care about identifying improvements (one-sided test) and disregard recommendations to disable variations, you're effectively using half the FPR (5% for a 10% default).
- Conservative FPR: Avoid increasing the FPR beyond the recommended level. A higher FPR leads to a higher chance of false positives (mistakenly declaring a winner).
- Segment-Wise Analysis: When analyzing results for different segments (e.g., mobile vs. desktop users), divide the FPR by the number of segments (a Bonferroni-style correction, sketched below). This tightens the decision threshold for each segment to account for the increased chance of false positives across multiple analyses.
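A minimal sketch of that segment adjustment, assuming the two-sided threshold formulas from earlier; the function name and interface are illustrative:

```python
def segment_thresholds(fpr: float, n_segments: int) -> tuple[float, float]:
    """Divide the FPR across segments, then derive two-sided thresholds."""
    per_segment_fpr = fpr / n_segments       # e.g. 0.10 / 2 = 0.05
    return 1 - per_segment_fpr / 2, per_segment_fpr / 2

# Two segments (mobile and desktop) at the 10% default:
print(segment_thresholds(0.10, 2))  # (0.975, 0.025)
```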
Understanding FPR is crucial for making informed decisions based on your testing results. By setting a conservative FPR and following best practices, you can minimize the risk of false positives and ensure your experiments deliver reliable insights for optimizing your campaigns.