Bayesian statistics, at its core, is about changing your opinion. Bayesian statistics begins with an uneducated opinion called the prior. This is what one believes before any evidence is available. However, this stage is completely subjective; the best thing to do here is to avoid having a strong opinion. The prior is represented mathematically as a probability distribution—a way of assigning a probability to an event or state of the world.
The next step is to gather evidence. In A/B testing, the evidence-gathering stage consists of running an end-to-end A/B test. After observing the evidence, the opinion should change.
“When events change, I change my mind. What do you do?”
– Paul Samuelson, renowned economist and the first American to win the Nobel Memorial Prize in Economic Sciences.
Bayesian statistics gives us the Bayes Theorem, which is a mathematically optimal way of changing our opinion. This theorem ensures that we neither overestimate nor underestimate the evidence we have seen.
Bayes’ theorem is stated mathematically as the following equation:
where A and B are the events.
- P(A) is the prior probability for A to occur. Similarly, P(B) is the prior probability for B to occur.
- P(A | B) is the conditional probability of A occurring given that B is true.
- P(B | A) is the conditional probability of B occurring given that A is true.
Bayesian: A Simplified Explanation
On a bus stuck in traffic, there was a fine musician and a drunk. The driver was bored and seeing the musician’s cello, he struck up a conversation critiquing the works of Mozart and Beethoven. The drunk, being at the rough end of a 12-hour bender, was in a particularly surly mood and felt the desperate need to push them off their high horses.
So he bet the musician that he could guess more composers than him if the driver could play random snatches of tracks on his iPhone.
Sensing the chance to have a few laughs at the drunk’s expense, the musician took the bet. The driver had a vast collection of classical music and played three pieces for each. Both guessed two tracks correctly and failed one.
What is the probability of each getting the next track right?
A Frequentist approach would give the same probability to each person given the current data—two correct answers out of three. The Bayesian approach takes into account that one is a trained musician and the other is drunk, so gives the musician a higher probability of getting the next track correct.
If you were to bet $1,000, who would you place it on?
Inspired from Jim Berger’s excellent book, The Likelihood Principle.