Law of large numbers
- Ensures that empirical averages computed from large samples are good estimates of the population mean
Central limit theorem
- States that the sampling distribution of the sample mean is approximately normal for a sufficiently large sample, even if the population itself is not normally distributed (provided the population variance is finite)
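A minimal simulation sketch of both theorems, using only the Python standard library. The exponential population (mean 1, standard deviation 1), the sample sizes, and the seed are illustrative assumptions, not part of the notes above.

```python
import random
import statistics


def sample_mean(n, rng):
    """Mean of n draws from an exponential population with mean 1."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))


rng = random.Random(0)  # fixed seed (arbitrary choice) so runs are repeatable

# Law of large numbers: the sample mean drifts toward the population mean (1.0)
# as the sample size grows.
lln_means = {n: sample_mean(n, rng) for n in (10, 1_000, 100_000)}

# Central limit theorem: the means of many samples of size 50 cluster around 1.0
# with a spread close to sigma / sqrt(n) = 1 / sqrt(50), although the
# exponential population itself is strongly skewed.
clt_means = [sample_mean(50, rng) for _ in range(5_000)]
clt_center = statistics.fmean(clt_means)
clt_spread = statistics.stdev(clt_means)

print(lln_means)
print(clt_center, clt_spread, 50 ** -0.5)
```

Plotting a histogram of `clt_means` would be the natural next step to see the bell shape emerge.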
Normal Distribution (Gaussian)
- A symmetrical, bell-shaped distribution where about 68% of data fall within 1 standard deviation of the mean, 95% within 2 standard deviations and 99.7% within 3 standard deviations
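- The PDF is given by $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, where $\mu$ is the mean and $\sigma$ is the standard deviation
- $e$ is the base of the natural logarithm ([[Euler's number]])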
- A special case is the standard normal (also called the z-distribution)
- where the mean is 0 and the standard deviation is 1, so the variance is also 1
- denoted as $Z \sim N(0, 1)$
- The z-score describes a value's position relative to the mean of a group of values, i.e. how many standard deviations away from the mean it is: $z = \frac{x - \mu}{\sigma}$
- Key properties
- a linear combination of independent normally distributed variables is also normally distributed
- Central limit theorem
- transformation: non-normal distributions can often be made approximately normal using a log, square-root or Box-Cox transformation -> [[Feature Transformation Techniques]]
- Applications
- Many ML and statistical inference techniques assume the data follows a normal distribution
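A short sketch of the normal PDF, the z-score and the 68/95/99.7 rule, checked against `statistics.NormalDist` from the standard library. The helper names `normal_pdf` and `z_score` and the example score (85, with mean 70 and standard deviation 10) are made up for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


std_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability mass within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

print(normal_pdf(0.0), std_normal.pdf(0.0))  # hand-written formula vs library
print(z_score(85.0, mu=70.0, sigma=10.0))    # hypothetical score: 1.5
print(within)                                # roughly 0.68, 0.95, 0.997
```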
Binomial Distribution
- A discrete probability distribution which models the number of successful outcomes in a fixed number of independent Bernoulli trials, used to model situations where each trial has exactly 2 outcomes
- The distribution gives the probability of having $k$ successes out of $n$ trials given a success probability of $p$ and a failure probability of $1 - p$
- calculated by $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$
- $\binom{n}{k}$ is the binomial coefficient representing the number of ways of choosing $k$ out of $n$ trials, calculated by $\binom{n}{k} = \frac{n!}{k!(n - k)!}$
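- a factorial $n!$ is the product of all positive integers less than or equal to $n$
- factorial simplifications
- common factors cancel: $\frac{n!}{k!} = n(n - 1)\cdots(k + 1)$ for $k < n$, so $15!/3! = 15 \times 14 \times \dots \times 4$; note that this is not $(15/3)! = 5!$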
- The properties of the distribution are: mean ($\mu = np$), variance ($\sigma^2 = np(1 - p)$), and standard deviation ($\sigma = \sqrt{np(1 - p)}$)
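A small sketch of the binomial PMF using `math.comb`, with the mean and variance recomputed from the PMF to compare against the closed forms. The values $n = 10$ and $p = 0.5$ (ten fair coin flips) are an assumed example.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


n, p = 10, 0.5  # assumed example: ten fair coin flips

# The binomial coefficient equals n! / (k! (n - k)!).
assert math.comb(10, 3) == math.factorial(10) // (
    math.factorial(3) * math.factorial(7)
)

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the PMF.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(binomial_pmf(3, n, p))        # 120 / 1024
print(mean, n * p)                  # both about 5.0
print(variance, n * p * (1.0 - p))  # both about 2.5
```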
Poisson Distribution
- A discrete probability distribution which models the number of times an event occurs within a fixed interval of time or space
- Assumptions
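- The occurrence rate is a known constant ($\lambda$, the average number of events per interval)
- The events occur independently of each other
- Gives the probability of observing exactly $k$ events in the interval
- calculated by $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$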
- Applications
- Queueing theory to model the number of customers arriving
- Epidemiology to model the number of cases of a rare disease occurring in a population
- Manufacturing to estimate the number of defects in a batch of products
- Relation to other distributions
- The Poisson distribution is a limiting case of the binomial distribution where $n$ is large, $p$ is small and $np = \lambda$ remains constant
- The inter-event time follows an exponential distribution with parameter $\lambda$
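A sketch of the Poisson PMF and its two relations to other distributions, in plain Python. The rate $\lambda = 3$, the trial count $n = 10{,}000$ and the waiting time $t = 0.5$ are assumed values chosen only for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when lam events are expected."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


lam = 3.0    # assumed average number of events per interval
n = 10_000   # large number of trials
p = lam / n  # small success probability, so that n * p stays equal to lam

# Binomial limit: the two PMFs nearly coincide for large n and small p.
max_gap = max(
    abs(poisson_pmf(k, lam) - binomial_pmf(k, n, p)) for k in range(15)
)

# Exponential link: the chance of waiting longer than t for the next event,
# exp(-lam * t), equals the Poisson chance of zero events in a window of length t.
t = 0.5
p_no_event = poisson_pmf(0, lam * t)

print(poisson_pmf(2, lam))
print(max_gap)
print(p_no_event, math.exp(-lam * t))
```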