The frequentist approach to statistical significance testing is prone to errors when the sample size is not fixed in advance
✏️ Notes
Don't fall prey to "repeated significance testing errors"
When an A/B test reports a "95% chance of beating the original" or a "90% probability of statistical significance"
It is answering the question: "What is the chance that we would observe a difference like the one in our data purely at random?"
And that chance is judged against the significance threshold (0.05 or 0.01)
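To make that concrete, a minimal sketch (hypothetical conversion counts, assuming Python with NumPy/SciPy) of the two-proportion z-test behind such statements:

```python
# Two-sided p-value for an observed difference, under the null hypothesis
# that both variants share the same true conversion rate.
# Counts below are made-up illustrations.
import numpy as np
from scipy import stats

a_conv, a_n = 100, 1000   # original: conversions / visitors
b_conv, b_n = 130, 1000   # variant

p_pool = (a_conv + b_conv) / (a_n + b_n)              # pooled rate under the null
se = np.sqrt(p_pool * (1 - p_pool) * (1 / a_n + 1 / b_n))
z = (b_conv / b_n - a_conv / a_n) / se
p_value = 2 * stats.norm.sf(abs(z))                   # two-sided tail probability
print(f"z = {z:.2f}, p = {p_value:.4f}")              # "significant" if p < 0.05
```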
However, this is predicated on a key assumption -> the sample size was fixed in advance
Otherwise we might be tempted to take action as soon as we see a significant result, and if it later became insignificant we would never know
So the key is to determine a sample size before the experiment
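To see why peeking matters, a quick A/A simulation sketch (all numbers hypothetical): both variants have the same true rate, yet testing for significance after every batch of visitors pushes the false positive rate well above the nominal 5%.

```python
import numpy as np

rng = np.random.default_rng(42)

def peeking_false_positive_rate(n_experiments=2000, n_peeks=20, batch=500, p=0.05):
    """Fraction of A/A tests wrongly declared significant when we check
    after every batch instead of once at a pre-determined sample size."""
    false_positives = 0
    for _ in range(n_experiments):
        a = b = n = 0
        for _ in range(n_peeks):
            a += rng.binomial(batch, p)   # conversions in control
            b += rng.binomial(batch, p)   # conversions in variant (same true rate!)
            n += batch
            pooled = (a + b) / (2 * n)
            se = np.sqrt(pooled * (1 - pooled) * 2 / n)
            z = (b / n - a / n) / se
            if abs(z) > 1.96:             # looks "significant at 0.05" at this peek
                false_positives += 1
                break                     # we stop early and ship the variant
    return false_positives / n_experiments

print(peeking_false_positive_rate())      # well above the nominal 0.05
```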
One rule of thumb is N = 16σ²/δ², where δ is the minimum effect you want to detect and σ² is the variance of the metric
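A sketch of the rule applied to a conversion metric (made-up baseline and effect), using the Bernoulli variance σ² = p(1 − p):

```python
baseline = 0.10                         # current conversion rate (hypothetical)
delta = 0.02                            # minimum detectable effect: 10% -> 12%
variance = baseline * (1 - baseline)    # Bernoulli variance sigma^2

n_per_group = 16 * variance / delta**2
print(round(n_per_group))               # -> 3600 visitors per variant
```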
For Bayesian A/B testing this is less of a problem than in the frequentist approach, but it is not completely foolproof -> [[Is Bayesian AB testing immune to peeking]]
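For contrast, a minimal Beta-Binomial sketch (assumed uniform priors and hypothetical counts) of how a Bayesian test produces the "chance of beating the original" directly from posterior samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data: conversions / visitors
a_conv, a_n = 120, 1000   # original
b_conv, b_n = 145, 1000   # variant

# Beta(1, 1) uniform prior + binomial likelihood -> Beta posterior
a_post = rng.beta(1 + a_conv, 1 + a_n - a_conv, size=100_000)
b_post = rng.beta(1 + b_conv, 1 + b_n - b_conv, size=100_000)

p_beats = (b_post > a_post).mean()
print(f"P(variant beats original) = {p_beats:.3f}")
```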