Overview
Have you ever encountered a situation where data seems to contradict itself? Imagine analyzing website data and seeing an overall decline in conversions, even though every product category shows improvement. This puzzling scenario is a classic example of Simpson's Paradox.
What is Simpson's Paradox?
Simpson's Paradox is a statistical phenomenon in which a trend that appears in several different groups of data disappears or reverses when these groups are combined. This paradox highlights the danger of overlooking hidden factors, known as lurking variables, when analyzing data.
Example: E-commerce Experiment
Consider an example in the context of an online e-commerce store during Black Friday sales. The store wants to test whether increasing the price and offering a 5% discount will boost sales.
- Week 1 (Black Friday Sale): The store allocates 66% of the traffic to the Control group (40 visitors) and 33% to the Variation group (20 visitors).
- Week 2 (Post-Sale): The traffic allocation is reversed, with 33% to Control (20 visitors) and 66% to Variation (40 visitors).
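Observe how both weeks individually show an increase in conversion rate with the variation. However, when you look at the overall results, the variation shows a lower conversion rate than the control. This happens because the traffic allocation changed between the two weeks: most of the control's visitors come from Week 1, while most of the variation's visitors come from Week 2, so the combined numbers compare two different mixes of visitors rather than the control against the variation. Had the traffic allocation been the same in both weeks, such a paradox would not have occurred.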
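The reversal can be reproduced with a few lines of Python. This is a minimal sketch: the visitor counts match the allocation described above, but the conversion counts are hypothetical values chosen only to illustrate the effect, and the names `weeks`, `rate`, and `overall_rate` are illustrative.

```python
# Visitor counts follow the example above. The conversion counts are
# HYPOTHETICAL and were picked only to make the paradox visible.
# Each entry is (visitors, conversions).
weeks = {
    "week 1": {"control": (40, 20), "variation": (20, 11)},
    "week 2": {"control": (20, 2), "variation": (40, 6)},
}


def rate(visitors, conversions):
    """Conversion rate for a single group."""
    return conversions / visitors


def overall_rate(group):
    """Conversion rate for one group after pooling both weeks."""
    visitors = sum(week[group][0] for week in weeks.values())
    conversions = sum(week[group][1] for week in weeks.values())
    return rate(visitors, conversions)


# Per-week comparison: the variation is ahead in each week.
for name, groups in weeks.items():
    control = rate(*groups["control"])
    variation = rate(*groups["variation"])
    print(f"{name}: control {control:.1%}, variation {variation:.1%}")

# Pooled comparison: the variation falls behind.
print(
    f"overall: control {overall_rate('control'):.1%}, "
    f"variation {overall_rate('variation'):.1%}"
)
```

With these assumed numbers, the variation leads in each week (55.0% vs. 50.0%, then 15.0% vs. 10.0%) but trails once the weeks are pooled (28.3% vs. 36.7%), because the control's total is dominated by the high-converting first week.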
How Does VWO Handle Simpson's Paradox?
VWO safeguards your experiments against Simpson's Paradox by warning you when you attempt to change traffic splits in an ongoing experiment. If you persist, a "Faulty Experimentation Conduct" alert is generated, highlighting the potential for misleading results.
Why Does it Matter?
Simpson's Paradox matters because it shows how easily data can be misinterpreted when underlying factors are ignored. In website optimization, overlooking this paradox can lead to:
- Incorrect conclusions about campaign performance.
- Misunderstanding visitor behavior.
- Making costly decisions based on flawed data.
The e-commerce example above demonstrates how aggregated data can be misleading if we don't examine the groups that make it up.
When Does it Occur?
Simpson’s Paradox occurs when certain properties of visitors participating in your test change over time. For example, in the first week, your visitor base might be skewed towards “easy to convince” buyers, while in the second week, it shifts towards “hard to convince” buyers. Under proper experimental conditions, such fluctuations in visitor properties are typically balanced between the control and variation groups, leading to sound statistical calculations. However, if a change in traffic allocation coincides with these shifts in visitor properties, Simpson’s Paradox can arise.
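To see why the allocation change is the trigger, the sketch below holds the per-week conversion rates fixed and varies only the traffic split. The rates in `RATES` are hypothetical (an "easy to convince" week followed by a "hard to convince" week), and `pooled_rates` is an illustrative helper, not part of any VWO API.

```python
# HYPOTHETICAL conversion rates per week as (control, variation).
# Week 1 stands in for "easy to convince" visitors, week 2 for
# "hard to convince" visitors. The variation is better in both weeks.
RATES = [(0.50, 0.55), (0.10, 0.15)]


def pooled_rates(allocations):
    """Return pooled (control, variation) conversion rates.

    allocations: one (control_visitors, variation_visitors) pair per week,
    in the same order as RATES.
    """
    control_visitors = sum(c for c, _ in allocations)
    variation_visitors = sum(v for _, v in allocations)
    control_conversions = sum(
        c * rc for (c, _), (rc, _) in zip(allocations, RATES)
    )
    variation_conversions = sum(
        v * rv for (_, v), (_, rv) in zip(allocations, RATES)
    )
    return (
        control_conversions / control_visitors,
        variation_conversions / variation_visitors,
    )


# Split changes between weeks: the pooled result reverses.
changed = pooled_rates([(40, 20), (20, 40)])
# Split stays the same in both weeks: the pooled result agrees with each week.
constant = pooled_rates([(30, 30), (30, 30)])

print(f"changed split:  control {changed[0]:.1%}, variation {changed[1]:.1%}")
print(f"constant split: control {constant[0]:.1%}, variation {constant[1]:.1%}")
```

Under these assumptions, the shift in visitor behavior alone does not cause a reversal. With a constant split, both groups see the same mix of easy and hard weeks, so the pooled comparison points the same way as the weekly ones.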
Simpson's Paradox serves as a powerful reminder of the complexities involved in data interpretation. In website optimization, where data drives decision-making, understanding and addressing this paradox is essential. Always approach data analysis with a critical eye and consider potential confounding factors. By doing so, you can ensure your decisions are based on a comprehensive and accurate understanding of your data.