2021-11-15

  • ☀️Daily Log:

    • Hurdle Models #data-science
      • Linear models assume some level of normality in the response variable being predicted
        • If the response variable is skewed and doesn’t follow a normal distribution, we correct the skew with transformations like log, sqrt, and Box-Cox power
      • However, in some instances there is a clearly multi-modal distribution with lots of values at zero
        • Transformations only change the scale of the variable; they don’t change the shape of the distribution
        • This is a good indication that the data come from two or more distinct underlying data-generating processes
      • Common applications: insurance claims (where most people don’t claim)
        • Also applies to us, where most site-days don’t experience waiting
      • First model - a binomial classifier trained and tested on all the data
      • Second model - a regressor trained only on the samples with a positive (non-zero) response, but used to make predictions on all the test data
      • Implementation (see the sketch below)
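        • A minimal sketch of the two-stage setup, assuming scikit-learn gradient-boosted models and synthetic zero-inflated data; the estimator choices, features, and data-generation details are illustrative assumptions, not from this note.

```python
# Hurdle-model sketch: stage 1 classifies zero vs. non-zero,
# stage 2 regresses the magnitude using only the positive rows.
# Data and model choices below are assumptions for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5_000
X = rng.normal(size=(n, 4))

# Simulate a zero-inflated response: most rows are exactly zero,
# the remainder follow a skewed positive distribution.
is_positive = rng.random(n) < 0.2
y = np.where(is_positive, rng.lognormal(mean=1.0, sigma=0.5, size=n), 0.0)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# First model: binomial classifier trained on all the data (zero vs. non-zero).
clf = GradientBoostingClassifier().fit(X_train, (y_train > 0).astype(int))

# Second model: regressor trained only on rows with a positive response...
mask = y_train > 0
reg = GradientBoostingRegressor().fit(X_train[mask], y_train[mask])

# ...but used to predict on all the test data, combined with the classifier:
# E[y | x] = P(y > 0 | x) * E[y | y > 0, x]
p_positive = clf.predict_proba(X_test)[:, 1]
y_pred = p_positive * reg.predict(X_test)
```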

    • Classifying Histograms #data-science
      • Use hierarchical clustering or DBSCAN (see the sketch below)
        • They are better than k-means because they work with arbitrary distance measures such as Jensen-Shannon divergence
          • Which is designed to capture similarity of distributions
      • Hierarchical clustering builds a hierarchy of clusters
        • Agglomerative - a bottom-up approach
        • Divisive - a top-down approach
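      • A minimal sketch of clustering histograms on a precomputed Jensen-Shannon distance matrix with scikit-learn; the synthetic histograms, cluster count, and DBSCAN eps are assumptions for illustration.

```python
# Cluster histograms using pairwise Jensen-Shannon distances with
# agglomerative clustering and DBSCAN. The histogram data is synthetic.
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.cluster import AgglomerativeClustering, DBSCAN

rng = np.random.default_rng(0)

# Each row is a normalized histogram (a discrete probability distribution).
hists = rng.dirichlet(alpha=np.ones(20), size=50)

# Pairwise Jensen-Shannon distance (square root of JS divergence; a metric).
n = len(hists)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = jensenshannon(hists[i], hists[j])

# Agglomerative (bottom-up) hierarchical clustering on the precomputed matrix.
# Note: older scikit-learn versions name this parameter `affinity`.
agg = AgglomerativeClustering(n_clusters=3, metric="precomputed", linkage="average")
agg_labels = agg.fit_predict(dist)

# DBSCAN also accepts a precomputed distance matrix; eps here is a guess.
db = DBSCAN(eps=0.2, min_samples=3, metric="precomputed")
db_labels = db.fit_predict(dist)
```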

  • Retrospective::

  • Daily Stoic::