2021-11-14

  • ☀️Daily Log:
    • Dealing with lots of zeros in data #data-science
      • Data transformation
        • Use log(1 + x), sqrt, or Box-Cox to reduce the skew; plain log and Box-Cox are undefined at 0, so shift the data by a positive constant first
      • Zero-inflated models (count)
        • Designed to deal with situations where there is an "excessive" number of individuals with count of 0
        • When excess zeros cause overdispersion (the conditional variance exceeds the conditional mean), a zero-inflated model such as zero-inflated Poisson (ZIP) fits better than plain Poisson
          • This model assumes there are two sorts of individuals: one group whose counts are generated by the standard Poisson regression model and another group (absolute zero group) who have zero probability of a count greater than 0
          • The model typically includes a logistic regression component that predicts which group an individual belongs to
        • Alternatives: the negative binomial model, which handles overdispersion through an extra dispersion parameter, and the zero-inflated negative binomial (ZINB), which adds the same structural-zero component on top of it
      • Hurdle Model
        • Two parts: a binomial model predicts whether a value is 0 or > 0, then a second model (linear, Gamma, log-normal, or truncated normal) fits the observed non-zero values
      • Tobit Model
        • Assumes a normally distributed latent variable that is left-censored at zero: latent values at or below 0 are recorded as 0
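      • Sketch of the ZIP idea (my own illustration, not from any library): an intercept-only zero-inflated Poisson fitted by EM on simulated counts, standard library only. The names `zip_pmf`, `fit_zip`, `sample_poisson` and the simulation settings (30% structural zeros, Poisson mean 2.5) are assumptions made up for this example. A real analysis would more likely use a regression package with covariates in both the count part and the zero part.

```python
import math
import random


def zip_pmf(k, pi, lam):
    """P(Y = k) for a ZIP with structural-zero probability pi and Poisson mean lam."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return (pi if k == 0 else 0.0) + (1 - pi) * poisson


def fit_zip(counts, n_iter=200):
    """Intercept-only ZIP fit by EM (no covariates); returns (pi, lam)."""
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi, lam = 0.5, max(total / n, 1e-8)  # crude starting values
    for _ in range(n_iter):
        # E-step: probability that an observed zero comes from the "absolute zero" group
        w = pi / (pi + (1 - pi) * math.exp(-lam))
        expected_structural = n_zero * w
        # M-step: update the mixing weight and the Poisson mean of the count group
        pi = expected_structural / n
        lam = total / (n - expected_structural)
    return pi, lam


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


# Simulate the two groups: structural zeros vs. ordinary Poisson counts.
rng = random.Random(0)
true_pi, true_lam = 0.3, 2.5  # hypothetical values chosen for the demo
data = [0 if rng.random() < true_pi else sample_poisson(true_lam, rng)
        for _ in range(5000)]

pi_hat, lam_hat = fit_zip(data)
print(f"share of zeros in data: {data.count(0) / len(data):.3f}")
print(f"estimated pi = {pi_hat:.3f}, estimated lambda = {lam_hat:.3f}")
```

      • The observed share of zeros is larger than pi because the Poisson group also produces some zeros; the E-step is what splits the observed zeros between the two groups.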
  • Retrospective::
    • One week ago: [[November 7th, 2021]]
    • One month ago: [[October 14th, 2021]]
    • One quarter ago: [[August 14th, 2021]]
    • One year ago: [[November 14th, 2020]]
  • Daily Stoic::