- Metadata
- Predictive models on their own can’t answer the questions businesses usually want answered -> will doing A cause B?
- Prediction and inference are opposite goals
- Correct inference often requires us to sacrifice predictive power
- Maximum predictive power can lead to incorrect causal inference
- Experiments
- Simple A/B Test
- From a frequentist POV, you run a t-test to check whether there is a “significant” difference between the two populations
- Calculate power as a function of duration (holding $\alpha$ and effect size constant)
- Use arbitrary $\alpha = 0.1$ and power $= 0.9$ (i.e. $\beta = 0.1$) if you think a false negative is just as bad as a false positive
- Perform a one-sided test; we usually care more about the sign than the magnitude
- The outcomes
- B is significantly better than A. Ship B
- B is significantly worse than A. Keep A
- B not statistically different from A??? Do we keep A or ship B?
- But detecting a small effect size usually requires a sample larger than is feasible
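The power-vs-duration calculation above can be sketched for a one-sided two-sample z-test. The effect size and daily traffic numbers below are made-up assumptions, just to show the shape of the curve:

```python
# Sketch: power as a function of test duration, holding alpha and the
# standardized effect size fixed. Traffic and effect size are hypothetical.
from math import sqrt
from statistics import NormalDist

def power(effect_size: float, n_per_arm: int, alpha: float) -> float:
    """Power of a one-sided two-sample z-test for a standardized effect."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    noncentrality = effect_size * sqrt(n_per_arm / 2)
    return NormalDist().cdf(noncentrality - z_alpha)

alpha = 0.1
effect_size = 0.05       # standardized difference between A and B (assumed)
users_per_day = 200      # hypothetical daily traffic per arm

for days in (7, 14, 28, 56):
    n = days * users_per_day
    print(f"{days:>3} days -> n={n:>5} per arm, power={power(effect_size, n, alpha):.2f}")
```

This makes the L16 point concrete: with a small effect size, reaching high power can require running the test for many weeks.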
- Bayesian Approach
- Treat the group assignment as a random effect
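One simple Bayesian treatment (a Beta-Binomial conjugate model, not the full random-effects formulation) replaces the binary significance verdict with a posterior probability that B beats A. The counts below are invented:

```python
# Sketch of a Beta-Binomial Bayesian A/B comparison via Monte Carlo.
# Instead of "significant / not significant", report P(B beats A).
import random

random.seed(0)

# Observed conversions / trials (hypothetical data)
conv_a, n_a = 120, 1000
conv_b, n_b = 140, 1000

# Beta(1, 1) prior -> Beta(successes + 1, failures + 1) posterior
draws = 20_000
wins_b = 0
for _ in range(draws):
    p_a = random.betavariate(conv_a + 1, n_a - conv_a + 1)
    p_b = random.betavariate(conv_b + 1, n_b - conv_b + 1)
    wins_b += p_b > p_a

print(f"P(B > A) ~= {wins_b / draws:.3f}")
```

A probability like this sidesteps the awkward third outcome in the frequentist decision: you can ship B when P(B > A) clears a pre-agreed threshold.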
- Factorial Design
- Crossover Design
- Blocking
- Quasi-Experiments
- Difference-in-differences
- Interrupted time series
- Synthetic controls
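The difference-in-differences estimator is just two subtractions; a minimal sketch with invented group means:

```python
# Minimal difference-in-differences sketch. The means are made up; the
# estimator is (post - pre) for the treated group minus (post - pre) for
# the control group, which nets out the shared trend.
pre_treated, post_treated = 10.0, 14.0
pre_control, post_control = 9.0, 11.0

did = (post_treated - pre_treated) - (post_control - pre_control)
print(f"DiD estimate: {did:.1f}")  # treated moved 2.0 more than control did
```

The key identifying assumption (parallel trends) is that without the intervention, the treated group would have moved by the same amount as the control group.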
- Google’s CausalImpact package
- Observe a time series X with some intervention
- Build a counterfactual: what would the time series have been without the intervention
- Look for ingredients to put into a blender
- End result is a good counterfactual
- The difference between observed and counterfactual is the ==causal effect estimate==
- ==Key Assumptions==
- Changes in X do not affect the ingredients in the synthetic control
- Relationship between X and ingredients would have continued the same way without the intervention
- Most of the work is in finding the ingredients and making sure they are not producing arbitrary estimates
- ==Rule of thumb: the post-intervention period shouldn’t be too long, because forecasts break down the farther ahead we look. The pre-intervention period should be 3 to 4 times the length of the post-intervention period==
- If there is lots of pre-intervention data, you can split it into 3 periods: exploration (oldest data, to find ingredients), validation (middle), and estimation (most recent)
- Choose ingredients that are correlated with X
- Choose the ingredients and X before the quasi-experiment is run
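The blender idea can be sketched with plain least squares: fit weights on the ingredient series in the pre-intervention window only, forecast the counterfactual into the post window, and read the effect off the gap. All data below is simulated (this is not the CausalImpact API, which uses a Bayesian structural time-series model):

```python
# Synthetic-control-style counterfactual sketch on simulated data.
# Real effort goes into choosing the ingredients, not this regression.
import numpy as np

rng = np.random.default_rng(42)
T_pre, T_post = 60, 15            # pre window ~4x the post window (rule of thumb)
ingredients = rng.normal(size=(T_pre + T_post, 3))
true_w = np.array([0.5, 0.3, 0.2])
x = ingredients @ true_w + rng.normal(scale=0.05, size=T_pre + T_post)
x[T_pre:] += 1.0                  # intervention lifts X by 1.0 post-period

# Fit blend weights on the pre-intervention window only
w, *_ = np.linalg.lstsq(ingredients[:T_pre], x[:T_pre], rcond=None)

# Counterfactual: what X would have been without the intervention
counterfactual = ingredients[T_pre:] @ w
effect = (x[T_pre:] - counterfactual).mean()
print(f"Estimated causal effect: {effect:.2f}")  # close to the injected 1.0
```

Note the sketch respects the assumptions above: the intervention changes X but not the ingredients, and the pre-period relationship is assumed to carry forward.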
- Observational Data
- When we cannot intervene due to real-life constraints and we can only observe
- Causal DAG