
2021-11-15

  • ☀️Daily Log:

    • Hurdle Models #data-science
      • Linear models assume roughly normal errors, so a heavily skewed response variable is usually a warning sign
        • If the response variable is skewed and doesn't follow a normal distribution, we correct the skew with a transformation such as log, sqrt or a Box-Cox power transform
      • However, in some cases the response is clearly multi-modal, with a large spike of values at exactly zero
        • A monotonic transformation only changes the scale of the variable: all the zeros still map to a single value, so the spike remains
        • This is a good indication that the data come from two or more distinct underlying data-generating processes
      • Common applications: insurance claims (where most people don't claim)
        • Also applies to us, where most site-days don't experience waiting
      • First model - a binary classifier (zero vs. non-zero response) trained and tested on all the data
      • Second model - a regressor trained only on the samples with a non-zero response, but used to make predictions on all the test data
      • Implementation
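
A minimal sketch of the two-model setup described above, assuming scikit-learn is available. The synthetic data, the choice of `LogisticRegression` and `LinearRegression`, the log transform on the non-zero part, and the way the two predictions are combined (probability of non-zero times predicted amount) are all assumptions for illustration, not something these notes specify.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical synthetic data: most rows have a zero response, the rest
# have a right-skewed positive response that depends on the features.
n = 2000
X = rng.normal(size=(n, 2))
is_nonzero = rng.random(n) < 1 / (1 + np.exp(-(X[:, 0] - 1.0)))
amount = np.exp(1.0 + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n))
y = np.where(is_nonzero, amount, 0.0)

X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

# First model: binary classifier (zero vs. non-zero) fit on all training rows.
clf = LogisticRegression().fit(X_train, y_train > 0)

# Second model: regressor fit only on the rows with a non-zero response.
# The log transform is an assumed way to handle skew in the positive part.
pos = y_train > 0
reg = LinearRegression().fit(X_train[pos], np.log(y_train[pos]))

# Both models predict on every test row; the results are then combined.
p_nonzero = clf.predict_proba(X_test)[:, 1]      # P(response > 0)
amount_if_nonzero = np.exp(reg.predict(X_test))  # back-transformed amount
# Assumed combination: probability-weighted amount. Note that exp() of a
# log-scale prediction tends to understate the conditional mean.
y_pred = p_nonzero * amount_if_nonzero
```

An alternative combination is hard gating, i.e. predicting zero wherever `clf.predict(X_test)` is `False` and the regressor's value otherwise. Which one fits better depends on whether an expected value or a most-likely outcome is wanted.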

    • Classifying Histograms #data-science

    • Use [[hierarchical clustering]] or [[DBSCAN]]
      • They suit this better than [[k-means]] because they work with arbitrary distance measures, such as Jensen-Shannon divergence, whereas standard k-means is tied to Euclidean distance
      • Jensen-Shannon divergence is designed to capture the similarity of probability distributions
    • [[hierarchical clustering]] builds a hierarchy of clusters, in one of two ways
      • Agglomerative - a bottom-up approach
      • Divisive - a top-down approach
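
A rough sketch of the agglomerative route with a Jensen-Shannon distance, assuming SciPy is available. The synthetic histograms, the bin edges, the `make_hist` helper, the average linkage and the choice of two clusters are all hypothetical and picked only to make the example small.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import jensenshannon, pdist

rng = np.random.default_rng(0)
bins = np.linspace(-5, 10, 31)  # shared bin edges so histograms are comparable


def make_hist(center):
    """Hypothetical helper: normalised histogram of a normal sample."""
    sample = rng.normal(loc=center, scale=1.0, size=500)
    counts, _ = np.histogram(sample, bins=bins)
    return counts / counts.sum()


# Ten histograms drawn from two clearly different underlying distributions.
hists = np.array(
    [make_hist(0.0) for _ in range(5)] + [make_hist(5.0) for _ in range(5)]
)

# Pairwise Jensen-Shannon distances (the square root of the divergence),
# returned in the condensed form that linkage() expects.
dists = pdist(hists, metric=jensenshannon)

# Agglomerative (bottom-up) clustering; average linkage accepts any distance.
Z = linkage(dists, method="average")

# Cut the hierarchy into at most two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
```

For [[DBSCAN]], the same pairwise distances could be turned into a square matrix with `scipy.spatial.distance.squareform` and passed to a clusterer that accepts precomputed distances.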
  • Retrospective::

    • One week ago: [[November 8th, 2021]]
    • One month ago: [[October 15th, 2021]]
    • One quarter ago: [[August 15th, 2021]]
    • One year ago: [[November 15th, 2020]]
  • Daily Stoic::