Library that implements many ready to use Machine Learning algorithms
Core API design principles
- Consistency - all objects share the same simple interface
  - Estimators
    - Can estimate some parameters based on a dataset
    - Done using fit() method
    - Takes the dataset as a parameter (2 for supervised) and maybe another parameter as the hyperparameter
  - Transformers
    - Some estimators can also modify the dataset by transforming it
    - Done using transform() method
    - Sometimes there is a combined method fit_transform()
  - Predictors
    - Can make predictions based on a dataset
    - Done using predict() method
    - Usually have a score() method that returns the quality of prediction
- Inspection - all of the estimator's learned parameter and hyperparameters are publicly accessible
- Nonproliferation of classes - uses Numpy arrays for storing datasets
- Composition - reuses the same building blocks
- Sensible defaults - makes reasonable defaults so it is easy to get an E2E going without tuning
To make a custom transformer that still works with other Scikit-Learn functionalities, you need to create a class and implement fit(), transform() and fit_transform()

Small transfomer that creaters the combined features

from sklearn.base import BaseEstimator, TransformerMixin

room_ix, bedrooms_ix, population_ix, households_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room = True): # no *args or **kargs
        self.add_bedrooms_per_room = add_bedrooms_per_room
    
    def fit(self, X, y=None):
        return self
    
    def transform(self, X, y=None):
        rooms_per_household = X[:, room_ix] / X[:, households_ix]
        population_per_household = X[:, population_ix] / X[:, households_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, room_ix]
            return np.c_[X, rooms_per_household, population_per_household, bedrooms_per_room]

        else:
            return np.c_[X, rooms_per_household, population_per_household]

attr_adder = CombinedAttributesAdder(add_bedrooms_per_room=False)
housing_extra_attrsibs = attr_adder.transform(housing_df.values)```
- Transformation pipelines in `scikit-learn` helps you to automate the transformers needed to be applied
    - ```python
# Create a simple piepline to auotmate the transformers
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy="median")),
    ('attrsibs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

housing_num_tr = num_pipeline.fit_transform(housing_num)

All but the last estimator must be transformers (they must have fit_transform() method)
When you call pipeline's fit() method, it will chain call the fit_transform() method of the transfomers