-
☀️Daily Log:
- Hurdle Models #data-science
- Linear models assume roughly normal errors, which in practice breaks down when the response variable being predicted is heavily skewed
- If the response is skewed away from a normal distribution, we usually correct the skew with transformations like log, sqrt and Box-Cox power
- However, in some instances the distribution is clearly multi-modal, with lots of values sitting at exactly zero
- A transformation only rescales the variable; it can't spread out the spike at zero or merge the separate modes
- This is a good indication that the data comes from two or more distinct underlying data-generating processes
- Common applications: insurance claims (where most people don’t claim)
- Also applies to us, where most site-days don’t experience waiting
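- First model - a binary classifier, trained and tested on all the data, that predicts whether the response is non-zero
- Second model - a regressor trained only on the positive (non-zero) samples but used to make predictions on all the test data; its output is kept only where the classifier predicts non-zero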
- Implementation
- You can build two separate models
- Or extend scikit-learn's `sklearn.base.BaseEstimator` (with `RegressorMixin`) so that it can be passed to grid search functions and evaluation functions - https://geoffruddock.com/building-a-hurdle-regression-estimator-in-scikit-learn/
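- A minimal sketch of the second option, assuming scikit-learn and NumPy are installed. The class name `HurdleRegressor`, its `classifier`/`regressor` parameters and the logistic/linear defaults are my own placeholders, not taken from the linked post; any classifier/regressor pair should slot in.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import LinearRegression, LogisticRegression


class HurdleRegressor(RegressorMixin, BaseEstimator):
    """Two-stage hurdle model: classify zero vs non-zero, then regress on the non-zeros."""

    def __init__(self, classifier=None, regressor=None):
        # Hypothetical parameter names; None falls back to simple linear defaults.
        self.classifier = classifier
        self.regressor = regressor

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        is_positive = y > 0
        clf = self.classifier if self.classifier is not None else LogisticRegression()
        reg = self.regressor if self.regressor is not None else LinearRegression()
        # Stage 1: the classifier is trained on every row.
        self.classifier_ = clone(clf).fit(X, is_positive)
        # Stage 2: the regressor is trained only on rows with a non-zero response.
        self.regressor_ = clone(reg).fit(X[is_positive], y[is_positive])
        return self

    def predict(self, X):
        X = np.asarray(X)
        passes_hurdle = self.classifier_.predict(X).astype(bool)
        # The regressor predicts for every row; rows that fail the hurdle are set to zero.
        amount = self.regressor_.predict(X)
        return np.where(passes_hurdle, amount, 0.0)
```

- Because the wrapper follows the estimator conventions (`fit`, `predict`, constructor-only parameters), it should work with `GridSearchCV` and `cross_val_score`; if the sub-models are passed in explicitly, their settings can be tuned with keys like `classifier__C`.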
- Classifying Histograms #data-science
- Use hierarchical clustering or DBSCAN
- They are better than k-means because they work with arbitrary distance measures such as Jensen-Shannon divergence, which is designed to capture the similarity of distributions
- Hierarchical clustering builds a hierarchy of clusters
- Agglomerative - a bottom-up approach
- Divisive - a top-down approach
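- A rough sketch of the agglomerative route, assuming SciPy is available and each histogram is a row of bin counts over the same bins. The function name `cluster_histograms` and the choice of average linkage are my own assumptions. Note that SciPy's `"jensenshannon"` metric is the Jensen-Shannon distance, i.e. the square root of the divergence.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_histograms(histograms, n_clusters):
    """Group histograms (rows of bin counts) by Jensen-Shannon distance."""
    hists = np.asarray(histograms, dtype=float)
    # Normalise each row so it sums to 1 and can be treated as a distribution.
    probs = hists / hists.sum(axis=1, keepdims=True)
    # Condensed matrix of pairwise Jensen-Shannon distances.
    distances = pdist(probs, metric="jensenshannon")
    # Agglomerative (bottom-up) clustering with average linkage.
    tree = linkage(distances, method="average")
    # Cut the tree so that at most n_clusters flat clusters remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

- The same distance matrix (expanded with `scipy.spatial.distance.squareform`) could be passed to scikit-learn's `DBSCAN(metric="precomputed")` instead, which avoids fixing the number of clusters up front.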
-
Retrospective::
- One week ago: November 8th, 2021
- One month ago: October 15th, 2021
- One quarter ago: August 15th, 2021
- One year ago: November 15th, 2020
-
Daily Stoic::